inputs/bien2_traits/run: documented `make inputs/bien2_traits/validate` runtime (9 min)
inputs/NY/run: `make inputs/NY/validate`: updated runtime (5 min)
inputs/NY/run: documented `make inputs/NY/validate` runtime (2 min, currently for the input queries)
added inputs/Madidi/_src/ to match wiki steps in wiki.vegpath.org/Adding_a_flat-file_datasource
validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_species_binomials: renamed columns to species_binomial to reflect reverted query name
validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_verbatim_species_excluding_author: renamed to *_species_binomials for clarity
validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _specimens_04_count_of_unique_verbatim_species_with_author, _specimens_05_list_of_unique_verbatim_species_with_author: switched back to original names because #6,7 now do the same thing as #4,5, so we should include the differing result set of #4,5 for datasources that provide it
validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: use taxon_name*_with_author everywhere instead of custom column names, for consistency
validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_verbatim_subspecific_taxa_without_author, etc.: renamed to *_with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_verbatim_species_without_author, etc.: renamed to *_with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
lib/common.Makefile: added $(nice) and use it everywhere its definition is used
inputs/input.Makefile: validate: redirect the output to the log, as for other import-related operations
inputs/input.Makefile: import: validate at the end of the import
inputs/input.Makefile: added new-style aggregating validations (`validate` target)
added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt, which is useful to see what fields will be available when we switch to the new GBIF export format
added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt.run
*{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly propagated to applicable invoked targets. (previously, $rm was propagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)
lib/sh/make.sh: self_make(): renamed to with_rm() for clarity, since this is used only to propagate $rm, and does not also invoke a command with the same name as the current function, as the name might suggest
fix: inputs/*/*/map.csv: remapped occurrenceID-mapped fields to dataProviderRecordID when these were not globally unique DwC occurrenceIDs (http://rs.tdwg.org/dwc/terms/#occurrenceID)
fix: inputs/CTFS/AggregateObservation/map.csv: field mapped to occurrenceID: remapped to aggregateOrganismObservationID because these are not specimen occurrences
fix: mappings/VegCore-VegBIEN.csv: taxonoccurrence.sourceaccessioncode: need to populate from aggregateOrganismObservationID when only that is available
bugfix: inputs/NY/Ecatalog_all/map.csv: can't use CatalogNumber as pkey because it's not unique and not always populated. this fixes the NY NULL accessionNumbers bug (wiki.vegpath.org/Aggregating_validations_status#bugs).
inputs/XAL/Specimen/header.csv: updated
added inputs/NY/validations*.sql*
bugfix: lib/common.Makefile: $(add*): need to wrap w/ $(wildcard) to prevent "targets don't exist" error, because svn 1.7 does not suppress this error even with --force
bugfix: inputs/input.Makefile: add!: add* of $(svnFiles): need to ignore errors because svn 1.7 does not suppress the "targets don't exist" error even with --force
inputs/run: postprocess(): documented runtime on vegbiendev (1 h)
schemas/vegbien.sql: specimenreplicate.institution_id: renamed to duplicate_institutions_sourcelist_id, as decided in the conference calls (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2)
inputs/run: postprocess(): updated runtime (25 min)
inputs/run: postprocess(): updated runtime (20 min)
mappings/VegCore.htm: regenerated from wiki: rename specimenHolderInstitutions to specimen_duplicate_institutions, as decided in the 2014-03-13 conference call (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2). note that most schema changes (such as this one) involve mappings changes, which are handled automatically by `inputs/run postprocess; yes|make inputs/{NVS,SALVIAS,TEAM}/test`.
bugfix: inputs/GBIF/table.run: switched to using lib/runscripts/table.run instead of mysql.table.run because some subdirs (Source/) need the regular table.run to work properly. mysql.table.run should instead be used directly by subdirs that use the MySQL install.
inputs/XAL/Specimen/test.xml.ref: updated for sample data.csv, which contains the columns as a CSV. this fixes a bug where a map.csv must be used on a table that contains the same set of columns (i.e. not one with no columns if there are any mappings).
fix: inputs/input.Makefile: don't treat *.xml as data files since these are not currently supported
fix: inputs/input.Makefile: removed no longer used special handling of XML inputs, support for which was never added to the Makefile. (bin/map, however, does support importing an XML file into a database.) this fixes a bug in XAL, which used to abort with an error but now just imports an empty table.
fix: inputs/input.Makefile: %/install: don't ignore errors if table does not exist, to ensure a proper errexit. this is now possible because every dir that this target is being run on should be a data dir. (Source/ used to be a metadata-only dir.)
bugfix: inputs/input.Makefile: $(cleanup): need `set -o pipefail`
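for reference, a minimal sketch of why `set -o pipefail` matters in a pipeline such as the one in $(cleanup). the helper name pipeline_status is hypothetical and is not part of input.Makefile; `false | cat` merely stands in for a failing command piped into a filter that succeeds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repo): report the exit status of a
# pipeline whose first command fails and whose last command succeeds.
pipeline_status() {
    # $1 = "on" to enable pipefail; the subshell keeps the option local
    (
        if [ "$1" = on ]; then set -o pipefail; fi
        rc=0
        false | cat || rc=$?  # capture the status without tripping errexit
        echo "$rc"
    )
}

status_without=$(pipeline_status off)
status_with=$(pipeline_status on)
echo "without pipefail: $status_without"
echo "with pipefail: $status_with"
```

without pipefail the pipeline reports the status of `cat` (0), so the failure of the first command is hidden; with pipefail it reports 1.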
inputs/VegBank/run: `rm=1 import()`: updated runtime (1 h)
inputs/VegBank/taxon_observation.**/test.xml.ref: updated inserted row count
inputs/VegBank/projectcontributor_/test.xml.ref: updated inserted row count
bugfix: inputs/VegBank/import_order.txt: added missing project, needed to trigger the staging table renaming for the project table
inputs/VegBank/run: documented `rm=1 import()` runtime (>1.5 h)
inputs/VegBank/run: documented `datasrc_make sql/install` runtime (25 min)
inputs/MO/Specimen/test.xml.ref: updated, which adds dateCollected mappings
inputs/WIN/Specimen/test.xml.ref: updated to map.csv, which has eventDate->dateCollected
inputs/VegBank/plantconcept_/create.sql: updated runtime (25 min, ~same)
*{.sh,run}: use new begin_target instead of `echo_func; set_make_vars`
inputs/VegBank/plot/postprocess.sql: remove institutions that we have direct data for: CVS: updated runtime (same)
bugfix: inputs/VegBank/plot/postprocess.sql: use CVS.plot_ instead because that has the renamed staging table columns, and is compatible with auto-renaming of the SQL script columns
inputs/CVS/plot_/postprocess.sql: add unique constraint on locationName (analogous to the unique constraint in plot), for use by inputs/VegBank/plot/postprocess.sql in removing inter-datasource duplication
inputs/run: postprocess(): documented runtime (30 min)
bugfix: inputs/input.Makefile: %/postprocess.sql: don't perform replacements using map.csv, because map.csv is not idempotent. this functionality was only there to facilitate switching to new-style import, which is now largely done. (the remaining datasources NVS, SALVIAS, TEAM contain only 1 postprocess.sql: inputs/SALVIAS/projects/postprocess.sql (`st inputs/{NVS,SALVIAS,TEAM}/*/postprocess.sql`).)
inputs/input.Makefile: %/postprocess.sql: always run this, not just if the associated map spreadsheets change, to avoid needing to `touch` them to cause %/postprocess.sql to run
fix: inputs/*/*/postprocess.sql: un-doubled *
bugfix: inputs/input.Makefile: %/postprocess.sql: also need to apply renames from mappings/VegCore.thesaurus.csv, as these have been applied to map.csv
added inputs/run, which runs all the inputs' runscripts using the new auto-forwarding
removed unused inputs/table.run. inputs/*/table.run include lib/runscripts/table.run directly.
inputs/SALVIAS/validations.sql: implemented _plots_19_count_of_censuses_per_plot_in_each_project
bugfix: inputs/SALVIAS/validations.sql: plots_07_list_of_plots_with_counts_of_individuals_per_species: renamed to _plots_07_list_of_plots*which_use*_... because this query is not intended to include the actual counts, just to say which plots have them (the correct "which use" wording is also used in queries #8, 9)
schemas/vegbien.sql, inputs/SALVIAS/validations.sql: added _plots_06a_list_of_stems, for use in figuring out the diff in _plots_06_list_of_plots_with_stem_measurements
fix: inputs/SALVIAS/validations.sql: _plots_18_list_of_subplots_codes_for_each_plot_for_each_project: changed columns to match output query
fix: inputs/SALVIAS/validations.sql: _plots_15_pct_cover_of_each_verb_taxon_in_each_plot_in_each_pro: changed types to match output query
bugfix: inputs/SALVIAS/validations.sql: _plots_15_pct_cover_of_each_verb_taxon_in_each_plot_in_each_pro: changed summarizing column from mean_cover->totalpercentcover to match output query
bugfix: inputs/SALVIAS/validations.sql: _plots_10a_aggregate_observation_individual_counts: changed individual_id type to match output query
schemas/vegbien.sql, inputs/SALVIAS/validations.sql: added _plots_10a_aggregate_observation_individual_counts, for use in debugging diffs in _plots_10_count_of_individuals_per_plot_in_each_proj
fix: inputs/SALVIAS/validations.sql: renamed SiteCode to plot_code to match output queries
inputs/SALVIAS/validations.sql: use plot_code instead of plotcode for easier readability
bugfix: *.sql: public.source_by_shortname(): need to wrap it in a nested SELECT because Postgres incorrectly does not constant-fold (inline) it, leading to a slowdown when it is therefore run many times. this is done using the steps at wiki.vegpath.org/Postgres_queries#wrap-function-call-in-nested-SELECT .
fix: inputs/SALVIAS/validations.sql: plotMetadata.SiteCode: need to match types with the output query column
fix: inputs/SALVIAS/validations.sql: _plots_02_list_of_project_names: altered column aliases to match output query
inputs/SALVIAS/validations.sql: added Brad's comments from validation/aggregating/plots/SALVIAS/bien3_validations_salvias_db_original.VegCore.sql
added inputs/SALVIAS/validations*.sql
fix: schemas/vegbien.sql: _traits_08_taxonname_trait_and_value_for_first_5000_records: renamed to _traits_08_taxonname_trait_and_value because this actually includes all the records, not just the first 5000. this uses the new public_validations.rename_query_view() to rename all associated tables and views, including handling truncated names.
bugfix: inputs/bien2_traits/validations.sql: _traits_01_count_records: changed column names to match public_validations._traits_01_count_records
bugfix: inputs/bien2_traits/validations.sql: use a wrapper function for util.ifnull() so that the views don't get dropped when the util schema is reinstalled
validation/aggregating/*/*.sql, schemas/vegbien.sql, lib/runscripts/validations.pg.sql.run, inputs/bien2_traits/validations.sql: added _ to beginning of each view name so the validation views would sort at the top in the datasource's tables list. this will also make the validation result sets easily distinguishable from the data tables.
added inputs/bien2_traits/validations.sql, from validation/aggregating/traits/BIEN2_traits/bien3_validations_traits_original_mysql.VegCore.sql
inputs/input.Makefile: $(svnFilesGlob): added validations.sql
added inputs/bien2_traits/validations.sql.run
inputs/import.stats.xls: updated import times
fix: inputs/VegBIEN/Redmine/wiki/.htaccess: redirect to new main page when accessed without trailing /
inputs/bien2_traits/TraitObservation/postprocess.sql: remove rows with no taxon name, which are invalid, and which helps simplify the aggregating validations queries
fix: inputs/VegBIEN/Redmine/svn/.htaccess: updated repository URL to point to trunk/
bugfix: inputs/SALVIAS/verify/plots.out.sql: fixed ' quoting syntax to use '' instead of \' to escape '
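for reference, a minimal sketch of the '' escaping convention described above, as a shell helper that builds a SQL string literal. sql_quote is a hypothetical name and is not a function from this repository; it assumes the standard-SQL rule that a single quote inside a literal is written as two single quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repo): wrap a value in single quotes,
# doubling any embedded single quote as standard SQL requires.
sql_quote() {
    local escaped
    escaped=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "'%s'" "$escaped"
}

literal=$(sql_quote "O'Brien")
echo "SELECT $literal AS collector;"
```

the doubled form does not depend on backslash-escape handling in the server, which is one reason to prefer it over \'.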
inputs/input.Makefile: verify/%.out: use a *.sql file in the verify/ directory itself to generate *.out, so that each datasource can have its own set of output queries. for datasources that should share the same set of queries, they can instead be symlinked to the same file.
fix: inputs/CVS/project/: added _no_import since this should not also be imported separately from taxon_observation.**
added inputs/XAL/Specimen/_no_import, since this is a demo-only datasource and there isn't a staging table for it
inputs/.geoscrub/county_centroids/test.xml.ref, inputs/.NCBI/{names.src,nodes.src}/test.xml.ref: accepted test outputs (generated now that these tables are in import_order.txt)
inputs/FIA/taxon_observation.**/header.csv: updated for new REF_RESEARCH_STATION.country metadata value col
inputs/input.Makefile: add!: verify/: also svn:ignore *.tsv, *.txt
inputs/publishable datasources.xlsx: updated
fix: inputs/SALVIAS/projects/postprocess.sql: remove private data that should not be publicly visible: remove projects that do not have "There are no specific use conditions attached to this dataset"
fix: inputs/SALVIAS/salvias_plots.~.clean_up.sql: Remove private data that should not be publicly visible: also need to remove metadata-only plots
bugfix: inputs/SALVIAS/plotMetadata_/map.csv: things mapped to project_participant: remapped to event__participant because these actually relate to the event, not the project, even though they seem like project-related fields
fix: inputs/SALVIAS/plotMetadata_/map.csv, inputs/Madidi/LocationObservation/map.csv: things mapped to communityID: remapped to communityName, which is what's used in analytical_stem (communityID is for numeric IDs)
inputs/SALVIAS/plotMetadata_/create.sql, map.csv: expanded plot_administrator:party_code_party_ and mapped plot_administrator_name to a 2nd project_participant
mappings/VegCore-VegBIEN.csv: project_participant: use [!...] negative lookahead assertion so that multiple project_participant columns will properly map to separate projectcontributor rows