bugfix: inputs/NY/validations.sql: _specimens_07_list_of_verbatim_subspecific_taxa_with_author: updated filter condition to match output query
inputs/NY/run: `make inputs/NY/validate`: updated runtime (8 min, with added queries)
fix: inputs/NY/Ecatalog_all/map.csv, postprocess.sql: remapped substrate, vegetation to locationRemarks
bugfix: inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13*: also need to include coordinate pairs which have one of their coordinates NULL, by using OR instead of AND
bugfix: inputs/NY/validations.sql: _specimens_13b_list_of_all_decimal_lat_long: matched column types to output query
bugfix: inputs/NY/validations.sql: _specimens_13a_list_of_all_verbatim_lat_long: matched column types to output query
inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: added breakdowns _specimens_13a_list_of_all_verbatim_lat_long, _specimens_13b_list_of_all_decimal_lat_long to help troubleshoot the diff
fix: inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: count lat/longs together instead of separately, because the DISTINCT is by coordinate pair, not individual coordinate value (which wouldn't make much sense)
fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: use new is_castable(), which is much more accurate than Brad's custom regexp for determining if something is numeric
inputs/NY/validations.-.util.sql: added util.is_castable() wrapper
bugfix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to include both lat and long in the value to DISTINCT on
fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to DISTINCT the values that are being counted, because they are merged by the coordinates_unique unique constraint in the import
inputs/NY/run: `make inputs/NY/validate`: documented slow queries: _specimens_12_distinct_collector_name_collect_num_date_w_count
inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented slow queries (_plots_06a_list_of_stems). these may need to have their query plans rechecked.
inputs/NY/run, inputs/SALVIAS/run_: `make inputs/.../validate`: updated runtime (+2 min)
fix: inputs/NY/validations.sql: specimens*_of_unique_verbatim_author_taxa_with_genus: use scientificName rather than the concatenated ranks, because that is what is imported to taxonlabel.taxonomicname
mappings/VegCore-VegBIEN.csv: mapped subspecies to new taxonverbatim.subspecies for easier access by validations queries
fix: inputs/test_taxonomic_names/Taxon/map.csv: scientificName: remapped to scientificName instead of taxonName as this does include the author for some names
fix: inputs/NY/Ecatalog_all/map.csv: ScientificName: remapped to scientificName instead of taxonName as this does include the author
fix: inputs/NY/validations.sql: specimens*_of_unique_verb_subsp_taxa_with_author: use taxonName instead of concatenating the ranks, as that corresponds to what we use as the concatenated taxonomic name
bugfix: inputs/NY/validations.sql: specimens*_of_verbatim_subspecific_taxa_with_author: need `subspecies IS NOT NULL` filter
bugfix: inputs/NY/validations.sql: _specimens_07_list_of_verbatim_subspecific_taxa_with_author: need to include subspecies (as _specimens_06_count_of_unique_verb_subsp_taxa_with_author does)
bugfix: inputs/NY/validations.sql: specimens*_of_species_binomials: removed incorrect `subspecies IS NOT NULL` filter (this should be on *_of_unique_verb_subsp_taxa_with_author instead)
fix: inputs/NY/validations.sql: _specimens_16_list_distinct_specimen_descriptions: removed duplicated rows using DISTINCT
bugfix: inputs/NY/validations.sql: _specimens_03_list_of_verbatim_families: use family as specified in query description, not as implemented
bugfix: schemas/vegbien.sql, inputs/NY/validations.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: cast this to text rather than date because some values for this field are not valid dates and will throw an error if cast to date
fix: inputs/NY/validations.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: matched type to output query
fix: inputs/U/Specimen/map.csv: Genus: remapped to taxonName because this field is actually mislabeled in the original column names
inputs/NY/run: `make inputs/NY/validate`: updated runtime (6.5 min). this increases as more queries are able to run successfully.
inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented runtime (5 min)
inputs/bien2_traits/run: documented `make inputs/bien2_traits/validate` runtime (9 min)
inputs/NY/run: `make inputs/NY/validate`: updated runtime (5 min)
inputs/NY/run: documented `make inputs/NY/validate` runtime (2 min, currently for the input queries)
added inputs/Madidi/_src/ to match wiki steps in wiki.vegpath.org/Adding_a_flat-file_datasource
validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_species_binomials: renamed columns to species_binomial to reflect reverted query name
validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_verbatim_species_excluding_author: renamed to *_species_binomials for clarity
validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _specimens_04_count_of_unique_verbatim_species_with_author, _specimens_05_list_of_unique_verbatim_species_with_author: switched back to original names because #6,7 now do the same thing as #4,5, so we should include the differing result set of #4,5 for datasources that provide it
validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: use taxon_name*_with_author everywhere instead of custom column names, for consistency
validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_verbatim_subspecific_taxa_without_author, etc.: renamed to *_with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_verbatim_species_without_author, etc.: renamed to *_with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
lib/common.Makefile: added $(nice) and use it everywhere its definition is used
inputs/input.Makefile: validate: redirect the output to the log, as for other import-related operations
inputs/input.Makefile: import: validate at the end of the import
inputs/input.Makefile: added new-style aggregating validations (`validate` target)
added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt, which is useful to see what fields will be available when we switch to the new GBIF export format
added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt.run
*{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly progagated to applicable invoked targets. (previously, $rm was progagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)
lib/sh/make.sh: self_make(): renamed to with_rm() for clarity, since this is used only to progagate $rm, and does not also invoke a command with the same name as the current function, as the name might suggest
fix: inputs/*/*/map.csv: remapped occurrenceID-mapped fields to dataProviderRecordID when these were not globally unique DwC occurrenceIDs (http://rs.tdwg.org/dwc/terms/#occurrenceID)
fix: inputs/CTFS/AggregateObservation/map.csv: field mapped to occurrenceID: remapped to aggregateOrganismObservationID because these are not specimen occurrences
fix: mappings/VegCore-VegBIEN.csv: taxonoccurrence.sourceaccessioncode: need to populate from aggregateOrganismObservationID when only that is available
bugfix: inputs/NY/Ecatalog_all/map.csv: can't use CatalogNumber as pkey because it's not unique and not always populated. this fixes the NY NULL accessionNumbers bug (wiki.vegpath.org/Aggregating_validations_status#bugs).
inputs/XAL/Specimen/header.csv: updated
added inputs/NY/validations*.sql*
bugfix: lib/common.Makefile: $(add*): need to wrap w/ $(wildcard) to prevent "targets don't exist" error, because svn 1.7 does not suppress this error even with --force
bugfix: inputs/input.Makefile: add!: add* of $(svnFiles): need to ignore errors because svn 1.7 does not suppress the "targets don't exist" error even with --force
inputs/run: postprocess(): documented runtime on vegbiendev (1 h)
schemas/vegbien.sql: specimenreplicate.institution_id: renamed to duplicate_institutions_sourcelist_id, as decided in the conference calls (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2)
inputs/run: postprocess(): updated runtime (25 min)
inputs/run: postprocess(): updated runtime (20 min)
mappings/VegCore.htm: regenerated from wiki: rename specimenHolderInstitutions to specimen_duplicate_institutions, as decided in the 2014-03-13 conference call (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2). note that most schema changes (such as this one) involve mappings changes, which are handled automatically by `inputs/run postprocess; yes|make inputs/{NVS,SALVIAS,TEAM}/test`.
bugfix: inputs/GBIF/table.run: switched to using lib/runscripts/table.run instead of mysql.table.run because some subdirs (Source/) need the regular table.run to work properly. mysql.table.run should instead be used directly by subdirs that use the MySQL install.
inputs/XAL/Specimen/test.xml.ref: updated for sample data.csv, which contains the columns as a CSV. this fixes a bug where a map.csv must be used on a table that contains the same set of columns (ie. not one with no columns if there are any mappings).
fix: inputs/input.Makefile: don't treat *.xml as data files since these are not currently supported
fix: inputs/input.Makefile: removed no longer used special handling of XML inputs, support for which was never added to the Makefile. (bin/map, however, does support importing an XML file into a database.) this fixes a bug in XAL, which used to abort with an error but now just imports an empty table.
fix: inputs/input.Makefile: %/install: don't ignore errors if table does not exist, to ensure a proper errexit. this is now possible because every dir that this target is being run on should be a data dir. (Source/ used to be a metadata-only dir.)
bugfix: inputs/input.Makefile: $(cleanup): need `set -o pipefail`
inputs/VegBank/run: `rm=1 import()`: updated runtime (1 h)
inputs/VegBank/taxon_observation.**/test.xml.ref: updated inserted row count
inputs/VegBank/projectcontributor_/test.xml.ref: updated inserted row count
bugfix: inputs/VegBank/import_order.txt: added missing project, needed to trigger the staging table renaming for the project table
inputs/VegBank/run: documented `rm=1 import()` runtime (>1.5 h)
inputs/VegBank/run: documented `datasrc_make sql/install` runtime (25 min)
inputs/MO/Specimen/test.xml.ref: updated, which adds dateCollected mappings
inputs/WIN/Specimen/test.xml.ref: updated to map.csv, which has eventDate->dateCollected
inputs/VegBank/plantconcept_/create.sql: updated runtime (25 min, ~same)
*{.sh,run}: use new begin_target instead of `echo_func; set_make_vars`
inputs/VegBank/plot/postprocess.sql: remove institutions that we have direct data for: CVS: updated runtime (same)
bugfix: inputs/VegBank/plot/postprocess.sql: use CVS.plot_ instead because that has the renamed staging table columns, and is compatible with auto-renaming of the SQL script columns
inputs/CVS/plot_/postprocess.sql: add unique constraint on locationName (analogous to the unique constraint in plot), for use by inputs/VegBank/plot/postprocess.sql in removing inter-datasource duplication
inputs/run: postprocess(): documented runtime (30 min)
bugfix: inputs/input.Makefile: %/postprocess.sql: don't perform replacements using map.csv, because map.csv is not idempotent. this functionality was only there to facilitate switching to new-style import, which is now largely done. (the remaining datasources NVS, SALVIAS, TEAM contain only 1 postprocess.sql: inputs/SALVIAS/projects/postprocess.sql (`st inputs/{NVS,SALVIAS,TEAM}/*/postprocess.sql`).)
inputs/input.Makefile: %/postprocess.sql: always run this, not just if the associated map spreadsheets change, to avoid needing to `touch` them to cause %/postprocess.sql to run
fix: inputs/*/*/postprocess.sql: un-doubled *
bugfix: inputs/input.Makefile: %/postprocess.sql: also need to apply renames from mappings/VegCore.thesaurus.csv, as these have been applied to map.csv
added inputs/run, which runs all the inputs' runscripts using the new auto-forwarding
removed unused inputs/table.run. inputs/*/table.run include lib/runscripts/table.run directly.
inputs/SALVIAS/validations.sql: implemented _plots_19_count_of_censuses_per_plot_in_each_project
bugfix: inputs/SALVIAS/validations.sql: plots_07_list_of_plots_with_counts_of_individuals_per_species: renamed to _plots_07_list_of_plots*which_use*_... because this query is not intended to include the actual counts, just to say which plots have them (the correct "which use" wording is also used in queries #8, 9)
schemas/vegbien.sql, inputs/SALVIAS/validations.sql: added _plots_06a_list_of_stems, for use in figuring out the diff in _plots_06_list_of_plots_with_stem_measurements
fix: inputs/SALVIAS/validations.sql: _plots_18_list_of_subplots_codes_for_each_plot_for_each_project: changed columns to match output query
fix: inputs/SALVIAS/validations.sql: _plots_15_pct_cover_of_each_verb_taxon_in_each_plot_in_each_pro: changed types to match output query
bugfix: inputs/SALVIAS/validations.sql: _plots_15_pct_cover_of_each_verb_taxon_in_each_plot_in_each_pro: changed summarizing column from mean_cover->totalpercentcover to match output query
bugfix: inputs/SALVIAS/validations.sql: _plots_10a_aggregate_observation_individual_counts: changed individual_id type to match output query
schemas/vegbien.sql, inputs/SALVIAS/validations.sql: added _plots_10a_aggregate_observation_individual_counts, for use in debugging diffs in _plots_10_count_of_individuals_per_plot_in_each_proj
fix: inputs/SALVIAS/validations.sql: renamed SiteCode to plot_code to match output queries
inputs/SALVIAS/validations.sql: use plot_code instead of plotcode for easier readability
bugfix: *.sql: public.source_by_shortname(): need to wrap it in a nested SELECT because Postgres incorrectly does not constant-fold (inline) it, leading to a slowdown when it is therefore run many times. this is done using the steps at wiki.vegpath.org/Postgres_queries#wrap-function-call-in-nested-SELECT .