schemas/VegCore/VegCore.ERD.mwb: taxon_path: converted to an auxiliary table of taxon_name instead of a subclass of it (like geopath for the place table). this causes distinct taxon_paths to be stored only once, instead of repeatedly for each taxon_name.
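a minimal sketch of why an auxiliary table stores each distinct taxon_path only once: taxon_names hold a key into the path table instead of their own copy of the path. the in-memory TaxonPathTable and get_or_insert() are hypothetical stand-ins for the real table and its unique constraint, not VegCore code.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonPathTable:
    """hypothetical auxiliary table: each distinct path is stored exactly once."""
    rows: dict = field(default_factory=dict)  # path tuple -> taxon_path_id

    def get_or_insert(self, path):
        key = tuple(path)
        if key not in self.rows:
            # new distinct path: assign the next surrogate key
            self.rows[key] = len(self.rows) + 1
        return self.rows[key]


paths = TaxonPathTable()
# each taxon_name row references a taxon_path_id instead of repeating the path
taxon_names = [
    ("Poa annua", paths.get_or_insert(["Plantae", "Poaceae", "Poa"])),
    ("Poa pratensis", paths.get_or_insert(["Plantae", "Poaceae", "Poa"])),
    ("Quercus alba", paths.get_or_insert(["Plantae", "Fagaceae", "Quercus"])),
]
```

three taxon_names share only two stored paths; with the subclass design, the Poaceae path would have been stored twice.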
schemas/VegCore/VegCore.ERD.mwb: place hierarchy: reorganized to store scrubbed geoplaces in a containment hierarchy instead of a denormalized geopath. this allows each source-specific place to be GNRS-scrubbed to a GADM place, and then have its coordinates geovalidated to see if it is within the matched GADM place. this uses the georeferencing table to store the matched GADM place (scrubbed_geoplace) for each input place, instead of geopath_scrub to store the matched GADM geo*path* for each input geo*path*. (this avoids the need to scrub every combination of place ranks, because just the name of each place is scrubbed relative to its parent place.) geopath instead becomes an auxiliary table to store the place table's verbatim ranks, for easy access and storage.
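a sketch of scrubbing a place path through a containment hierarchy, where each rank's name is matched only relative to its already-scrubbed parent, so the number of lookups equals the path length rather than the number of rank combinations. the GEOPLACES gazetteer and scrub_path() are hypothetical; exact normalized lookups stand in for GNRS's fuzzy matching.

```python
# hypothetical gazetteer: (parent geoplace id, normalized name) -> geoplace id
GEOPLACES = {
    (None, "united states"): 1,
    (1, "arizona"): 2,
    (2, "pima"): 3,
}


def scrub_path(verbatim_ranks):
    """scrub each rank's name relative to its already-scrubbed parent.

    returns the deepest matched geoplace id (None if even the top rank fails)
    and the verbatim ranks that could not be matched.
    """
    parent = None
    for i, name in enumerate(verbatim_ranks):
        match = GEOPLACES.get((parent, name.strip().lower()))
        if match is None:
            return parent, verbatim_ranks[i:]
        parent = match
    return parent, []
```

a partially matching path still yields the deepest matched geoplace (e.g. the state), which is what the input place's coordinates would then be geovalidated against.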
inputs/SpeciesLink/Specimen/map.csv: conceptual_darwin_2003_1_0_BoundingBox: remapped to UNUSED
schemas/VegCore/VegCore.ERD.mwb: place: renamed to local_place to distinguish it from geoplace, which is not a subclass of place (it is a separate, global table, while local_place is source-specific). note that renames sometimes need to be done manually on vegbiendev, to avoid triggering a MySQL bug that blocks the new table from being created and requires the entire database to be recreated to clear the error.
schemas/VegCore/VegCore.ERD.mwb: stem, stem_observation: made associated individual/individual_observation optional, because some stems (e.g. in VegBank) are not grouped together into individuals. note that a stem is still considered to BE-AN individual, but it is a type of individual which may be grouped under another, plant-level individual.
schemas/VegCore/VegCore.ERD.mwb: fixed lines
schemas/VegCore/VegCore.ERD.mwb: specimen_observation: added description (vegcore.vegpath.org?specimenDescription). taxon_presence: added occurrence_status (vegcore.vegpath.org?occurrenceStatus). stem_observation, aggregate_observation: made room for them to expand with additional first-class fields.
schemas/VegCore/VegCore.ERD.mwb: taxon_presence, taxon_absence: inherit from taxon_determination rather than taxon_observation, so that the taxon_determination's taxon can be used as the identifying taxon (i.e. the authorPlantName, VegCore.vegpath.org?authorPlantName)
schemas/VegCore/VegCore.ERD.mwb: taxon_determination: inherit from taxon_observation again because now that redeterminations can only occur on reobservable things, it makes sense to only allow one taxon_determination per observation event. this means that each redetermination on a specimen would get its own taxon_observation (where any additional attributes noted in the reobservation could also be included).
schemas/VegCore/VegCore.ERD.mwb: taxon_occurrence: renamed to reobservable to emphasize that this is only for things on which taxon redeterminations can be made, such as individuals and specimens (including voucher specimens). a redetermination on an aggregate_observation would instead be made on its voucher specimen, which is the only reobservable part of it.
schemas/VegCore/VegCore.ERD.mwb: moved taxon_observation subclasses closer to taxon_observation so that it would be clear they were observation-related rather than occurrence-related (e.g. there is no concept of "repeat-sampling" of an aggregate_observation, because in each sampling it is the collector's opinion that the plants correspond to a particular taxon)
bugfix: schemas/VegCore/VegCore.ERD.png: switched back to attaching the sRGB color profile directly, because the native->sRGB translation in fact happens in the monitor driver itself (and can be adjusted in System Preferences > Displays > Color), rather than in the specific application. this means that the hex color values color-matched in MySQL Workbench were actually sRGB (translated by the OS to monitor-native for display), and that the sRGB profile merely needed to be explicitly indicated for other monitors that are not close to sRGB (and thus need the translation). the closeness of the 27-inch iMac screen to sRGB can be verified by selecting sRGB in System Preferences > Displays > Color and noting that the desktop background does not change from when the default "iMac" setting is selected.
bugfix: schemas/VegCore/VegCore.ERD.png: convert to sRGB color profile after attaching the native monitor profile instead of attaching it directly. this allows the hex colors that were color-matched in MySQL Workbench (which presumably uses raw monitor RGB) to be translated to the universal sRGB space, where they can then be localized to a different monitor's local color space. note that this does not visibly change the image on the 27-inch iMac screen from what was produced via the previous, incorrect method (attaching the sRGB profile without conversion from native), which would imply that the iMac's screen is very close to the sRGB color space already. if this is the case, it is instead older LCDs that have off-white color spaces that need translation from sRGB.
schemas/VegCore/VegCore.ERD.png: attached sRGB color profile using Gimp (gimp.org), so that the colors don't look completely washed out and off-hue on older LCDs (i.e. other than the 27-inch iMac screen)
schemas/VegCore/VegCore.ERD.mwb: regenerated exports
schemas/VegCore/VegCore.ERD.mwb: added separate geo category (turquoise) to visually distinguish the broader geoplace tables from the more specific plot tables. (note that georeferencing is actually a plot table despite geo- in its name, because it assigns a geoplace to a plot.)
schemas/VegCore/VegCore.ERD.mwb: georeferencing: added georeferenced_by
schemas/VegCore/VegCore.ERD.mwb: added georeferencing table for georeference* DwC fields. this can be used to link a place to a georeferenced geoplace other than (or in addition to) the original geoplace.
schemas/VegCore/VegCore.ERD.mwb: added geopath_scrub for GNRS results (separate from point-in-polygon validation)
schemas/VegCore/VegCore.ERD.mwb: place: factored optional geocoords, geopath out into separate geoplace table (with both nullable), which validatable_geoplace (renamed from geoplace, with both NOT NULL) extends
bugfix: schemas/VegCore/VegCore.ERD.mwb: geovalidation: made scrubbed_geoplace optional because not all geoplaces will scrub to a valid geoplace
bugfix: schemas/VegCore/VegCore.ERD.mwb: geovalidation: need to inherit from record now that this is source-specific
bugfix: schemas/VegCore/VegCore.ERD.mwb: geovalidation: HAVE-AN input geoplace rather than BEING-ONE, to allow multiple geovalidations for a geoplace by different sources
schemas/VegCore/VegCore.ERD.mwb: taxon_determination: changed IS-A relationship with taxon_observation to HAS-A so that a separate taxon_observation doesn't need to be created for each taxon_determination (even though each taxon_determination event is theoretically a reobservation of the specimen, etc.). instead, inherit from sampling_event to include the necessary event-related fields.
bugfix: schemas/VegCore/VegCore.ERD.mwb: geopath: made country NOT NULL so that every geoplace (for input to geovalidation) has something on the geopath side. geocoords: made latitude_deg/longitude_deg NOT NULL so that every geoplace (for input to geovalidation) has something on the geocoords side. added geocoords_unique constraint since this is a global table with one entry for each lat/long.
schemas/VegCore/VegCore.ERD.mwb: place: added coords hstore extender, for verbatim coordinates, etc.
schemas/VegCore/VegCore.ERD.mwb: coordinates: abbreviated to coords (unambiguous abbreviation)
schemas/VegCore/VegCore.ERD.mwb: replaced parsed_taxon_assertion with taxon_scrub, which HAS-A parsed taxon_assertion rather than BEING-A parsed_taxon_assertion. (multiple TNRS results may parse to the same thing.)
schemas/VegCore/VegCore.ERD.mwb: geovalidatable_place: renamed to geoplace, since this uniquification is useful independently of geovalidation. note that the MySQL upgrade on vegbiendev has now reordered the fkeys again, this time in forwards order.
planning/timeline/timeline.2013.xls: updated for July progress
schemas/VegCore/VegCore.ERD.mwb: place tables that are absolute within Earth rather than relative to a parent place: prefixed geo- to table name for clarity
schemas/VegCore/VegCore.ERD.mwb: plot, subplot: added hstore extenders (dimensions, coordinates)
schemas/VegCore/VegCore.ERD.mwb: fixed inheritance connectors to be 1:1, optional on subclass
schemas/VegCore/VegCore.ERD.mwb: plot: added shape. bounding_box: changed units to rect, since this just needs a width/height (the x/y coord is the lat/long).
schemas/VegCore/VegCore.ERD.mwb: plot: added footprint_geom_WKT. bounding_box: added units (WKT).
schemas/VegCore/VegCore.ERD.mwb: back-synced from staging copy on vegbiendev to flush out sync changes that it kept trying to re-make
schemas/VegCore/VegCore.ERD.mwb: event: moved method to separate sampling_event subclass
schemas/VegCore/VegCore.ERD.mwb: aggregate_observation: inherit from taxon_presence, since this is a type of taxon_presence and it avoids duplicating the taxon_concept field
schemas/VegCore/VegCore.ERD.mwb: added taxon_absence, to avoid including absence observations in the same table as presence observations (which needlessly complicates queries). note that the fkey order now gets set back to forwards whenever a table is changed.
schemas/VegCore/VegCore.ERD.mwb: re-saved. the fkey order is now apparently reversed for recently-changed tables.
schemas/VegCore/VegCore.ERD.mwb: collector, identified_by: allow multiple parties for these fields, using the new party_list array table
schemas/VegCore/VegCore.ERD.mwb: party arrays: use new party_list array table instead of adding a separate many:many table for each table that uses a party array. this also allows using the party_list ID in a unique constraint, because it is now a first-class field.
schemas/VegCore/VegCore.ERD.mwb: party: added party_list array table
schemas/VegCore/VegCore.ERD.mwb: party: added optional fkey to organization
schemas/VegCore/VegCore.ERD.mwb: geovalidation: renamed lat_long_in_ranks to lat_long_in_place_ranks for clarity
schemas/VegCore/VegCore.ERD.mwb: individual: added tag_history hstore to store custom identity attributes
schemas/VegCore/VegCore.ERD.mwb: taxon_string: documented that to get the parsed_taxon_assertion (TNRS result) for a taxon_string, you would join using the SQL dotpath taxon_string.string<-taxon_assertion(string)::parsed_taxon_assertion[source='TNRS.version'] (see wiki.vegpath.org/SQL_dotpaths). important how-to comments such as this one are now included in the version-controlled MySQL schema file itself, not just the .mwb file and the staging copy on vegbiendev.
bin/my2pg: use s!...!...! when either the regexp or the replacement contains /, to avoid unnecessary \ escapes
bin/my2pg: commenting out table options: added explanatory comment, because it is not obvious from the regexp what this does
lib/sh/db.sh: mysqldump(): don't use --compatible=postgresql when the table structure is being exported, because this removes the table options (which include the COMMENT attribute). --compatible=postgresql remains on in data-only mode because embedded ` in data cannot easily be distinguished from ` around column names, so ANSI_QUOTES is needed to do the translation to " (and data sections do not contain table options). note that all --compatible modes that offer ANSI_QUOTES unfortunately exclude the table options, and there is no way to run a SQL query to set the SQL mode before beginning the dump, so ANSI_QUOTES translation must be handled by my2pg instead.
bin/my2pg: comment out table options (http://dev.mysql.com/doc/refman/5.5/en/server-sql-mode.html#sqlmode_no_table_options) instead of removing them, because they include table COMMENTs, which contain important metadata such as table definitions. (note that table COMMENTs use a slightly different syntax than column COMMENTs, so the table COMMENTs will not be commented out twice.)
bin/my2pg: comment out COMMENTs instead of removing them so that they will be included in the PostgreSQL translation. COMMENTs contain important metadata about columns, such as definitions and the meanings of integer flag values.
inputs/{.,}*/*.schema.sql: regenerated using the instructions in bin/my2pg. this primarily replaces timestamp with text/*timestamp*/ (to preserve indefinite dates).
bin/my2pg: added instructions for regenerating *.schema.sql whenever this script is changed
bin/my2pg: COMMENT: also match COMMENTs with embedded ', because there will only be one COMMENT per line, so the contents of the COMMENT can just extend to the last ' on the line
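a Python sketch of commenting out column COMMENTs, assuming they are wrapped as /*...*/ (the exact sed expression in bin/my2pg is not reproduced here). the greedy .* runs to the last ' on the line, so embedded quotes stay inside the match; this relies on there being at most one COMMENT per line. table options use COMMENT= rather than COMMENT ', so they are not matched a second time.

```python
import re

# greedy '.*' extends to the LAST quote on the line, so embedded ' (e.g. the
# doubled '' in "plant''s") remain inside the matched COMMENT
COMMENT_RE = re.compile(r"(COMMENT '.*')")


def comment_out_comments(line):
    """wrap a column COMMENT in a block comment so PostgreSQL ignores it."""
    return COMMENT_RE.sub(r"/*\1*/", line)


col = "  `height` int COMMENT 'the plant''s height',"
table = ") ENGINE=InnoDB COMMENT='plant heights';"
```

comment_out_comments(col) keeps the trailing comma outside the block comment, and comment_out_comments(table) returns the table line unchanged.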
bugfix: lib/sh/util.sh: $sed_cmd: make output unbuffered, so that running e.g. bin/my2pg at the command line produces output as each line is read
bin/my2pg: replace MySQL ` quotes with " quotes to support exports that were generated without ANSI_QUOTES mode. (this replacement only applies to schema exports, not data.) ANSI_QUOTES is only available with mysqldump --compatible modes that also include NO_TABLE_OPTIONS, which omits important table options such as comments. in particular, these comments are part of schemas/VegCore/VegCore.ERD.mwb but were not being included in VegCore.my.sql.
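a plain ` -> " replacement is safe for schema exports, but a ` inside a string literal (such as a COMMENT) would also be changed. the hypothetical backticks_to_ansi() below is a slightly more careful Python variant, not what bin/my2pg does: it skips quotes inside '...' literals. it assumes '' escaping; backslash-escaped \' (common in mysqldump data sections) would confuse it.

```python
def backticks_to_ansi(sql):
    """replace MySQL `identifier` quotes with ANSI "identifier" quotes.

    backticks inside single-quoted string literals are left alone. a doubled
    '' toggles in_string twice, so it correctly stays inside the literal.
    """
    out = []
    in_string = False
    for ch in sql:
        if ch == "'":
            in_string = not in_string
        elif ch == "`" and not in_string:
            ch = '"'
        out.append(ch)
    return "".join(out)
```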
schemas/VegCore/VegCore.ERD.mwb: taxon_string: removed parsed_taxon_assertion field, since there may be more than one parsing (TNRS result) for a given taxon_string. the parsing relationship can better be represented by adding a parsed_taxon_assertion whose taxon_assertion.string points to the parsed taxon_string. getting the parsed_taxon_assertion for a taxon_string now requires joining on parsed_taxon_assertion using a backwards instead of forwards fkey, and filtering the corresponding assertions to include only the ones for TNRS (of the desired TNRS version). documented that taxon_assertion.string was previously the concatenated matched name, but is now the TNRS input name. the concatenated matched name is still in parsed_taxon_assertion.matched_taxon_concept->:taxon_name.unique_name.
schemas/VegCore/VegCore.my.sql: regenerated from .mwb schema, which apparently reverses the order of the fkeys (possibly a Linux MySQL bug?)
inputs/SpeciesLink/Specimen/map.csv: remapped Darwin Core synonyms to DUPLICATE. this avoids the need to translate these to postprocessing derived columns for new-style import, and also speeds up column-based import because there are fewer automatic _alts to perform to resolve filter-less collisions. the svn diff was verified by replacing DUPLICATE#of:dwc_terms<term>#... with <term>, removing the comment, and checking that this removes the diff (except where VegCore has renamed a DwC term).
bugfix: inputs/SpeciesLink/Specimen/map.csv: *scientificName: remapped to scientificName instead of taxonName to match the DwC term's name (this is the same dwc_terms_scientificName mismapping that was fixed in r10434)
bugfix: inputs/SpeciesLink/Specimen/map.csv: dwc_terms_scientificName: remapped to scientificName instead of taxonName to match that DwC term name, as well as the mappings of other *scientificName terms
inputs/SpeciesLink/Specimen/map.csv: marked dwc_geospatial_VerbatimLatitude,Longitude as exact duplicates of dwc_terms_*
inputs/SpeciesLink/Specimen/map.csv: remapped identical _alt-ed fields to DUPLICATE. this avoids the need to translate these to postprocessing derived columns for new-style import, and also speeds up column-based import because there are fewer automatic _alts to perform to resolve filter-less collisions.
bugfix: inputs/SpeciesLink/Specimen/map.csv: *CollectorNumber: moved these to the same _alt group as recordNumber, because they are actually duplicates
correction: inputs/SpeciesLink/Specimen/map.csv: FieldNumber: fixed an incorrect comment stating that these fields are identical to recordNumber, when they actually have the same *meaning* but not the same values: each value is stored under one or the other of the two terms. the previous conclusion had been based on an incorrect query, which used != instead of the NULL-sensitive IS NOT DISTINCT FROM.
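a sketch of how != hides this case, emulating SQL's three-valued logic in Python (None stands for NULL; the rows are made-up recordNumber/FieldNumber pairs). a != comparison with a NULL operand yields NULL, which a WHERE clause drops, so a search for differing rows finds none when each value is stored under only one of the two columns. PostgreSQL's NULL-safe IS DISTINCT FROM (the negation of IS NOT DISTINCT FROM) does find them.

```python
def sql_ne(a, b):
    """SQL `a != b`: NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a != b


def is_distinct_from(a, b):
    """SQL `a IS DISTINCT FROM b`: NULL-safe; two NULLs are not distinct."""
    if a is None and b is None:
        return False
    if a is None or b is None:
        return True
    return a != b


# hypothetical (recordNumber, FieldNumber) pairs
rows = [("101", "101"), ("102", None), (None, "103"), (None, None)]
# a WHERE clause keeps a row only when the condition is TRUE
differ_ne = [r for r in rows if sql_ne(*r) is True]
differ_safe = [r for r in rows if is_distinct_from(*r)]
```

differ_ne comes out empty, making the columns look identical, while differ_safe contains the two rows where only one column is populated.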
planning/timeline/timeline.2013.xls: Adding derived columns: extended to overlap with all subtasks
planning/timeline/timeline.2013.xls: Geoscrubbing: split into separate re-run and automated pipeline tasks
planning/timeline/timeline.2013.xls: moved Data provider validations before Adding derived columns because ensuring that the source data is in the database is more important than the derived data, which can always be added later
planning/timeline/timeline.2013.xls: Data provider validations: added dot in July because some amount of datasource-level validation happens when mapping issues are discovered during the refactoring
bugfix: inputs/*/*/map.csv for specimen tables: remapped eventDate,day,month,year to *Collected, because a general date always applies to the observation itself rather than to any parent event (specimens don't have a parent event)
inputs/*/*/map.csv for IndividualObservation tables: also mapped eventDate,day,month,year to *Collected, because a general date always applies to the observation itself in addition to any parent event which it may be a part of
bugfix: inputs/XAL/Specimen/, NY/Ecatalog_all/: *JulianDay: remapped to dayOfYear instead of day (the day of the month)
inputs/SpeciesLink/Specimen/map.csv: remapped *dayOfYear-related terms to UNUSED
bugfix: inputs/SpeciesLink/Specimen/map.csv: remapped conceptual_darwin_2003_1_0_JulianDay, dwc_dwcore_DayOfYear to dayOfYear instead of day (the day of the month)
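a sketch of the difference between dayOfYear and day, using Python's datetime. it assumes the sources' "Julian day" means the ordinal day within the year (1..366), as the remapping implies, not the astronomical Julian Day Number. a value such as 206 cannot be a day of the month, which is why mapping it to day was a bug.

```python
from datetime import date, timedelta


def day_of_year(d):
    """dayOfYear (called JulianDay in some sources): 1..366 within the year."""
    return d.timetuple().tm_yday


def date_from_day_of_year(year, doy):
    """inverse: the calendar date for a given year and dayOfYear."""
    return date(year, 1, 1) + timedelta(days=doy - 1)
```

for 2013-07-25, day_of_year() gives 206 while .day gives 25; converting back needs the year, since the same dayOfYear falls on different dates in leap years.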
mappings/VegCore.htm: regenerated from wiki. added dayOfYear (=julianDay), which is different from startDayOfYear/endDayOfYear.
inputs/CTFS/: switched to new-style import, using the steps at wiki.vegpath.org/Adding_new-style_import_to_a_datasource
inputs/CTFS/StemObservation/: translated collisions (missing filters) to postprocessing derived columns, using the steps at wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns
planning/timeline/timeline.2013.xls: rebalanced tasks across the remaining months, taking into account priority changes made in the conference call (e.g. that we should not be handling people's individual data requests (Brad, wiki.vegpath.org/2013-07-25_conference_call#Decisions-made))
planning/timeline/timeline.2013.xls: updated with additional tasks added in conference call: translate source-specific derived columns to plain SQL, flatten the datasources, automated geoscrubbing pipeline
planning/goals/BIEN_3_derived_data_products_NormalizedDB_only.docx: removed BIEN species-level phylogeny, which Brad says is out of scope for the BIEN DB
removed planning/workflow/bien3_architecture.odp because the current version is now in bien3_architecture.pptx
added planning/workflow/validation/TNRS_results.ppt symlink to inputs/test_taxonomic_names/_scrub/TNRS_results.ppt
inputs/test_taxonomic_names/_scrub/TNRS_results.ppt: highlighted the sample row and related rows
inputs/test_taxonomic_names/_scrub/TNRS_results.xls: moved arrows to TNRS_results.ppt so they can be changed more easily
inputs/test_taxonomic_names/_scrub/TNRS_results.ppt: TNRS.tnrs: added diagram labels for the various names and steps
inputs/test_taxonomic_names/_scrub/TNRS_results.xls: use "Poa annua var. eriolepis"->"Poaceae Poa annua L." as the synonym example instead of "Poa annua fo. lanuginosa"->"Poaceae Poa annua var. annua" because the input name is simpler and it's closer to the beginning of the list
inputs/test_taxonomic_names/_scrub/run: exports/make(): tnrs.csv: include Name_matched instead of Genus_matched+Specific_epithet_matched because this also contains lower ranks, which are used in the TNRS synonymizing
inputs/test_taxonomic_names/_scrub/TNRS_results.ppt: added annotations explaining the import steps
added inputs/test_taxonomic_names/_scrub/TNRS_results.ppt, containing the *.png screenshots with tables labeled
added inputs/test_taxonomic_names/_scrub/*.png, screenshots of the TNRS_results.xls tabs (LibreOffice does not preserve the formatting when pasting a spreadsheet to a PowerPoint as a table, and the table editing options are limited)
added inputs/test_taxonomic_names/_scrub/TNRS_results.xls with formatted versions of the *.csv tables
inputs/test_taxonomic_names/_scrub/run: exports/make(): subset the columns to include only the most important to demo how the data is represented
lib/sh/db.sh: mk_select(): support passing $cols as array instead of SQL string, which is easier to enter in a shell script (fewer quotes, \ escapes, etc.)
lib/sh/db.sh: added cols2list()
lib/sh/util.sh: added is_array()
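a Python analogue of how mk_select(), cols2list() and is_array() fit together. the real functions are shell code in lib/sh/db.sh and lib/sh/util.sh; the names below only mirror them, and the ANSI double-quoting of each column is an assumption. a list of names is converted to a quoted select list, while a plain string is passed through as raw SQL.

```python
def cols2list(cols):
    """hypothetical analogue: column names -> SQL select list.

    each name is wrapped in ANSI double quotes, doubling any embedded ".
    """
    return ", ".join('"' + c.replace('"', '""') + '"' for c in cols)


def mk_select(table, cols=None):
    """build a SELECT; cols may be a list of names or a ready-made SQL string."""
    if cols is None:
        col_sql = "*"
    elif isinstance(cols, (list, tuple)):  # plays the role of is_array()
        col_sql = cols2list(cols)
    else:
        col_sql = cols  # already SQL; used verbatim
    return f"SELECT {col_sql} FROM {table}"
```

passing a list avoids hand-quoting each column, which is the convenience the array form is meant to provide in a shell script.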
inputs/test_taxonomic_names/_scrub/run: exports/make(): allow specifying an explicit columns list for each table using cols=... (initially set to all columns)
added inputs/test_taxonomic_names/_scrub/*.csv exports
added inputs/test_taxonomic_names/_scrub/run, which exports the test_scrub-populated tables to CSV