/trunk/inputs - Changes - BIEN 3 - NCEAS Projects

root/trunk/inputs @ 13130

svn:ignore: .~*

#	Date	Author	Comment
13130	04/14/2014 04:51 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to include both lat and long in the value to DISTINCT on
13129	04/14/2014 04:48 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to DISTINCT the values that are being counted, because they are merged by the coordinates_unique unique constraint in the import
13126	04/14/2014 03:58 PM	Aaron Marcuse-Kubitza	inputs/NY/run: `make inputs/NY/validate`: documented slow queries: _specimens_12_distinct_collector_name_collect_num_date_w_count
13125	04/14/2014 03:23 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented slow queries (_plots_06a_list_of_stems). these may need to have their query plans rechecked.
13124	04/14/2014 03:22 PM	Aaron Marcuse-Kubitza	inputs/NY/run, inputs/SALVIAS/run_: `make inputs/.../validate`: updated runtime (+2 min)
13123	04/10/2014 04:06 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: specimens*_of_unique_verbatim_author_taxa_with_genus: use scientificName rather than the concatenated ranks, because that is what is imported to taxonlabel.taxonomicname
13115	04/10/2014 02:24 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: mapped subspecies to new taxonverbatim.subspecies for easier access by validations queries
13113	04/10/2014 01:25 PM	Aaron Marcuse-Kubitza	fix: inputs/test_taxonomic_names/Taxon/map.csv: scientificName: remapped to scientificName instead of taxonName as this does include the author for some names
13112	04/10/2014 01:25 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/Ecatalog_all/map.csv: ScientificName: remapped to scientificName instead of taxonName as this does include the author
13111	04/10/2014 01:17 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: specimens*_of_unique_verb_subsp_taxa_with_author: use taxonName instead of concatenating the ranks, as that corresponds to what we use as the concatenated taxonomic name
13110	04/10/2014 12:59 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: specimens*_of_verbatim_subspecific_taxa_with_author: need `subspecies IS NOT NULL` filter
13109	04/10/2014 12:57 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: _specimens_07_list_of_verbatim_subspecific_taxa_with_author: need to include subspecies (as _specimens_06_count_of_unique_verb_subsp_taxa_with_author does)
13107	04/10/2014 12:03 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: specimens_of_species_binomials: removed incorrect `subspecies IS NOT NULL` filter (this should be on _of_unique_verb_subsp_taxa_with_author instead)
13095	04/10/2014 03:45 AM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: _specimens_16_list_distinct_specimen_descriptions: removed duplicated rows using DISTINCT
13089	04/10/2014 02:34 AM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: _specimens_03_list_of_verbatim_families: use family as specified in query description, not as implemented
13087	04/10/2014 02:07 AM	Aaron Marcuse-Kubitza	bugfix: schemas/vegbien.sql, inputs/NY/validations.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: cast this to text rather than date because some values for this field are not valid dates and will throw an error if cast to date
13086	04/09/2014 08:19 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: matched type to output query
13075	04/08/2014 03:49 PM	Aaron Marcuse-Kubitza	fix: inputs/U/Specimen/map.csv: Genus: remapped to taxonName because this field is actually mislabeled in the original column names
13070	04/08/2014 01:40 PM	Aaron Marcuse-Kubitza	inputs/NY/run: `make inputs/NY/validate`: updated runtime (6.5 min). this increases as more queries are able to run successfully.
13068	04/08/2014 01:19 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented runtime (5 min)
13067	04/08/2014 12:49 PM	Aaron Marcuse-Kubitza	inputs/bien2_traits/run: documented `make inputs/bien2_traits/validate` runtime (9 min)
13065	04/07/2014 06:19 PM	Aaron Marcuse-Kubitza	inputs/NY/run: `make inputs/NY/validate`: updated runtime (5 min)
13056	04/07/2014 09:47 AM	Aaron Marcuse-Kubitza	inputs/NY/run: documented `make inputs/NY/validate` runtime (2 min, currently for the input queries)
13055	04/04/2014 06:13 PM	Aaron Marcuse-Kubitza	added inputs/Madidi/_src/ to match wiki steps in wiki.vegpath.org/Adding_a_flat-file_datasource
13042	04/02/2014 05:21 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_species_binomials: renamed columns to species_binomial to reflect reverted query name
13041	04/02/2014 05:16 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _of_verbatim_species_excluding_author: renamed to _species_binomials for clarity
13040	04/02/2014 05:14 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _specimens_04_count_of_unique_verbatim_species_with_author, _specimens_05_list_of_unique_verbatim_species_with_author: switched back to original names because #6,7 now do the same thing as #4,5, so we should include the differing result set of #4,5 for datasources that provide it
13038	04/02/2014 04:38 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: use taxon_name*_with_author everywhere instead of custom column names, for consistency
13037	04/02/2014 04:09 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _of_verbatim_subspecific_taxa_without_author, etc.: renamed to _with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
13035	04/02/2014 03:54 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _verbatim_species_without_author, etc.: renamed to _with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
13018	04/01/2014 01:29 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: added $(nice) and use it everywhere its definition is used
12993	03/30/2014 06:12 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: validate: redirect the output to the log, as for other import-related operations
12992	03/30/2014 06:08 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: import: validate at the end of the import
12991	03/30/2014 06:02 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: added new-style aggregating validations (`validate` target)
12988	03/30/2014 05:41 PM	Aaron Marcuse-Kubitza	added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt, which is useful to see what fields will be available when we switch to the new GBIF export format
12985	03/30/2014 05:11 PM	Aaron Marcuse-Kubitza	added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt.run
12968	03/29/2014 04:06 AM	Aaron Marcuse-Kubitza	*{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly propagated to applicable invoked targets. (previously, $rm was propagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)
12967	03/29/2014 03:58 AM	Aaron Marcuse-Kubitza	lib/sh/make.sh: self_make(): renamed to with_rm() for clarity, since this is used only to propagate $rm, and does not also invoke a command with the same name as the current function, as the name might suggest
12963	03/28/2014 02:39 AM	Aaron Marcuse-Kubitza	fix: inputs///map.csv: remapped occurrenceID-mapped fields to dataProviderRecordID when these were not globally unique DwC occurrenceIDs (http://rs.tdwg.org/dwc/terms/#occurrenceID)
12962	03/28/2014 02:34 AM	Aaron Marcuse-Kubitza	fix: inputs/CTFS/AggregateObservation/map.csv: field mapped to occurrenceID: remapped to aggregateOrganismObservationID because these are not specimen occurrences
12961	03/28/2014 02:32 AM	Aaron Marcuse-Kubitza	fix: mappings/VegCore-VegBIEN.csv: taxonoccurrence.sourceaccessioncode: need to populate from aggregateOrganismObservationID when only that is available
12960	03/28/2014 02:03 AM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/Ecatalog_all/map.csv: can't use CatalogNumber as pkey because it's not unique and not always populated. this fixes the NY NULL accessionNumbers bug (wiki.vegpath.org/Aggregating_validations_status#bugs).
12958	03/28/2014 01:29 AM	Aaron Marcuse-Kubitza	inputs/XAL/Specimen/header.csv: updated
12922	03/27/2014 03:36 AM	Aaron Marcuse-Kubitza	added inputs/NY/validations.sql
12920	03/27/2014 03:31 AM	Aaron Marcuse-Kubitza	bugfix: lib/common.Makefile: $(add*): need to wrap w/ $(wildcard) to prevent "targets don't exist" error, because svn 1.7 does not suppress this error even with --force
12919	03/27/2014 03:27 AM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: add!: add* of $(svnFiles): need to ignore errors because svn 1.7 does not suppress the "targets don't exist" error even with --force
12891	03/25/2014 04:18 AM	Aaron Marcuse-Kubitza	inputs/run: postprocess(): documented runtime on vegbiendev (1 h)
12886	03/24/2014 05:35 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: specimenreplicate.institution_id: renamed to duplicate_institutions_sourcelist_id, as decided in the conference calls (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2)
12885	03/24/2014 05:32 PM	Aaron Marcuse-Kubitza	inputs/run: postprocess(): updated runtime (25 min)
12882	03/24/2014 05:02 PM	Aaron Marcuse-Kubitza	inputs/run: postprocess(): updated runtime (20 min)
12879	03/24/2014 01:49 AM	Aaron Marcuse-Kubitza	mappings/VegCore.htm: regenerated from wiki: rename specimenHolderInstitutions to specimen_duplicate_institutions, as decided in the 2014-03-13 conference call (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2). note that most schema changes (such as this one) involve mappings changes, which are handled automatically by `inputs/run postprocess; yes\|make inputs/{NVS,SALVIAS,TEAM}/test`.
12873	03/23/2014 11:43 PM	Aaron Marcuse-Kubitza	bugfix: inputs/GBIF/table.run: switched to using lib/runscripts/table.run instead of mysql.table.run because some subdirs (Source/) need the regular table.run to work properly. mysql.table.run should instead be used directly by subdirs that use the MySQL install.
12869	03/22/2014 05:56 AM	Aaron Marcuse-Kubitza	inputs/XAL/Specimen/test.xml.ref: updated for sample data.csv, which contains the columns as a CSV. this fixes a bug where a map.csv must be used on a table that contains the same set of columns (i.e. not one with no columns if there are any mappings).
12867	03/22/2014 05:06 AM	Aaron Marcuse-Kubitza	fix: inputs/input.Makefile: don't treat *.xml as data files since these are not currently supported
12795	03/21/2014 02:16 AM	Aaron Marcuse-Kubitza	fix: inputs/input.Makefile: removed no longer used special handling of XML inputs, support for which was never added to the Makefile. (bin/map, however, does support importing an XML file into a database.) this fixes a bug in XAL, which used to abort with an error but now just imports an empty table.
12794	03/21/2014 12:34 AM	Aaron Marcuse-Kubitza	fix: inputs/input.Makefile: %/install: don't ignore errors if table does not exist, to ensure a proper errexit. this is now possible because every dir that this target is being run on should be a data dir. (Source/ used to be a metadata-only dir.)
12793	03/21/2014 12:31 AM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: $(cleanup): need `set -o pipefail`
12792	03/21/2014 12:02 AM	Aaron Marcuse-Kubitza	inputs/VegBank/run: `rm=1 import()`: updated runtime (1 h)
12791	03/20/2014 11:54 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxon_observation.**/test.xml.ref: updated inserted row count
12790	03/20/2014 11:54 PM	Aaron Marcuse-Kubitza	inputs/VegBank/projectcontributor_/test.xml.ref: updated inserted row count
12788	03/20/2014 10:44 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/import_order.txt: added missing project, needed to trigger the staging table renaming for the project table
12787	03/20/2014 10:42 PM	Aaron Marcuse-Kubitza	inputs/VegBank/run: documented `rm=1 import()` runtime (>1.5 h)
12786	03/20/2014 10:40 PM	Aaron Marcuse-Kubitza	inputs/VegBank/run: documented `datasrc_make sql/install` runtime (25 min)
12785	03/20/2014 08:27 PM	Aaron Marcuse-Kubitza	inputs/MO/Specimen/test.xml.ref: updated, which adds dateCollected mappings
12784	03/20/2014 08:20 PM	Aaron Marcuse-Kubitza	inputs/WIN/Specimen/test.xml.ref: updated to map.csv, which has eventDate->dateCollected
12783	03/20/2014 08:13 PM	Aaron Marcuse-Kubitza	inputs/VegBank/plantconcept_/create.sql: updated runtime (25 min, ~same)
12779	03/20/2014 07:58 PM	Aaron Marcuse-Kubitza	*{.sh,run}: use new begin_target instead of `echo_func; set_make_vars`
12776	03/20/2014 07:47 PM	Aaron Marcuse-Kubitza	inputs/VegBank/plot/postprocess.sql: remove institutions that we have direct data for: CVS: updated runtime (same)
12758	03/18/2014 05:47 PM	Aaron Marcuse-Kubitza	bugfix: inputs/VegBank/plot/postprocess.sql: use CVS.plot_ instead because that has the renamed staging table columns, and is compatible with auto-renaming of the SQL script columns
12757	03/18/2014 05:41 PM	Aaron Marcuse-Kubitza	inputs/CVS/plot_/postprocess.sql: add unique constraint on locationName (analogous to the unique constraint in plot), for use by inputs/VegBank/plot/postprocess.sql in removing inter-datasource duplication
12753	03/18/2014 05:10 PM	Aaron Marcuse-Kubitza	inputs/VegBank/taxon_observation.**/test.xml.ref: updated inserted row count
12752	03/18/2014 05:34 AM	Aaron Marcuse-Kubitza	inputs/run: postprocess(): documented runtime (30 min)
12751	03/18/2014 05:16 AM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: %/postprocess.sql: don't perform replacements using map.csv, because map.csv is not idempotent. this functionality was only there to facilitate switching to new-style import, which is now largely done. (the remaining datasources NVS, SALVIAS, TEAM contain only 1 postprocess.sql: inputs/SALVIAS/projects/postprocess.sql (`st inputs/{NVS,SALVIAS,TEAM}/*/postprocess.sql`).)
12747	03/18/2014 04:33 AM	Aaron Marcuse-Kubitza	inputs/input.Makefile: %/postprocess.sql: always run this, not just if the associated map spreadsheets change, to avoid needing to `touch` them to cause %/postprocess.sql to run
12745	03/18/2014 04:24 AM	Aaron Marcuse-Kubitza	fix: inputs///postprocess.sql: un-doubled *
12744	03/18/2014 04:06 AM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: %/postprocess.sql: also need to apply renames from mappings/VegCore.thesaurus.csv, as these have been applied to map.csv
12714	03/14/2014 07:35 PM	Aaron Marcuse-Kubitza	added inputs/run, which runs all the inputs' runscripts using the new auto-forwarding
12703	03/14/2014 05:25 PM	Aaron Marcuse-Kubitza	removed unused inputs/table.run. inputs/*/table.run include lib/runscripts/table.run directly.
12679	03/13/2014 05:03 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/validations.sql: implemented _plots_19_count_of_censuses_per_plot_in_each_project
12638	03/11/2014 09:56 PM	Aaron Marcuse-Kubitza	bugfix: inputs/SALVIAS/validations.sql: _plots_07_list_of_plots_with_counts_of_individuals_per_species: renamed to _plots_07_list_of_plots_which_use_... because this query is not intended to include the actual counts, just to say which plots have them (the correct "which use" wording is also used in queries #8, 9)
12635	03/07/2014 10:49 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql, inputs/SALVIAS/validations.sql: added _plots_06a_list_of_stems, for use in figuring out the diff in _plots_06_list_of_plots_with_stem_measurements
12605	03/06/2014 08:52 AM	Aaron Marcuse-Kubitza	fix: inputs/SALVIAS/validations.sql: _plots_18_list_of_subplots_codes_for_each_plot_for_each_project: changed columns to match output query
12603	03/06/2014 08:29 AM	Aaron Marcuse-Kubitza	fix: inputs/SALVIAS/validations.sql: _plots_15_pct_cover_of_each_verb_taxon_in_each_plot_in_each_pro: changed types to match output query
12602	03/06/2014 08:14 AM	Aaron Marcuse-Kubitza	bugfix: inputs/SALVIAS/validations.sql: _plots_15_pct_cover_of_each_verb_taxon_in_each_plot_in_each_pro: changed summarizing column from mean_cover->totalpercentcover to match output query
12601	03/06/2014 08:12 AM	Aaron Marcuse-Kubitza	bugfix: inputs/SALVIAS/validations.sql: _plots_10a_aggregate_observation_individual_counts: changed individual_id type to match output query
12596	03/06/2014 12:07 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql, inputs/SALVIAS/validations.sql: added _plots_10a_aggregate_observation_individual_counts, for use in debugging diffs in _plots_10_count_of_individuals_per_plot_in_each_proj
12538	02/27/2014 07:56 PM	Aaron Marcuse-Kubitza	fix: inputs/SALVIAS/validations.sql: renamed SiteCode to plot_code to match output queries
12526	02/27/2014 06:58 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/validations.sql: use plot_code instead of plotcode for easier readability
12516	02/27/2014 01:27 PM	Aaron Marcuse-Kubitza	bugfix: *.sql: public.source_by_shortname(): need to wrap it in a nested SELECT because Postgres incorrectly does not constant-fold (inline) it, leading to a slowdown when it is therefore run many times. this is done using the steps at wiki.vegpath.org/Postgres_queries#wrap-function-call-in-nested-SELECT .
12508	02/26/2014 11:58 PM	Aaron Marcuse-Kubitza	fix: inputs/SALVIAS/validations.sql: plotMetadata.SiteCode: need to match types with the output query column
12417	02/24/2014 10:51 PM	Aaron Marcuse-Kubitza	fix: inputs/SALVIAS/validations.sql: _plots_02_list_of_project_names: altered column aliases to match output query
12407	02/24/2014 08:58 AM	Aaron Marcuse-Kubitza	inputs/SALVIAS/validations.sql: added Brad's comments from validation/aggregating/plots/SALVIAS/bien3_validations_salvias_db_original.VegCore.sql
12406	02/24/2014 08:53 AM	Aaron Marcuse-Kubitza	added inputs/SALVIAS/validations*.sql
12367	02/23/2014 12:13 PM	Aaron Marcuse-Kubitza	fix: schemas/vegbien.sql: _traits_08_taxonname_trait_and_value_for_first_5000_records: renamed to _traits_08_taxonname_trait_and_value because this actually includes all the records, not just the first 5000. this uses the new public_validations.rename_query_view() to rename all associated tables and views, including handling truncated names.
12286	02/17/2014 01:58 PM	Aaron Marcuse-Kubitza	bugfix: inputs/bien2_traits/validations.sql: _traits_01_count_records: changed column names to match public_validations._traits_01_count_records
12246	02/16/2014 04:22 PM	Aaron Marcuse-Kubitza	bugfix: inputs/bien2_traits/validations.sql: use a wrapper function for util.ifnull() so that the views don't get dropped when the util schema is reinstalled
12224	02/14/2014 03:09 PM	Aaron Marcuse-Kubitza	validation/aggregating//.sql, schemas/vegbien.sql, lib/runscripts/validations.pg.sql.run, inputs/bien2_traits/validations.sql: added _ to beginning of each view name so the validation views would sort at the top in the datasource's tables list. this will also make the validation result sets easily distinguishable from the data tables.
12221	02/14/2014 12:20 PM	Aaron Marcuse-Kubitza	added inputs/bien2_traits/validations.sql, from validation/aggregating/traits/BIEN2_traits/bien3_validations_traits_original_mysql.VegCore.sql
12220	02/14/2014 12:20 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: $(svnFilesGlob): added validations.sql
12213	02/14/2014 11:00 AM	Aaron Marcuse-Kubitza	added inputs/bien2_traits/validations.sql.run
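
Revisions 13129 and 13130 both concern how coordinate values are counted: the values must be DISTINCT because the import merges duplicates, and the DISTINCT must cover latitude and longitude together. The following is a minimal sketch of that difference, not the actual query in inputs/NY/validations.sql. It uses Python's built-in sqlite3 as a stand-in for Postgres, and the table name `specimens`, the column names `latitude` and `longitude`, and the sample rows are all hypothetical.

```python
import sqlite3


def count_coordinates(rows):
    """Return (raw row count, distinct latitudes, distinct lat/long pairs)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; the real staging table and columns differ.
    conn.execute("CREATE TABLE specimens (latitude REAL, longitude REAL)")
    conn.executemany("INSERT INTO specimens VALUES (?, ?)", rows)

    # No DISTINCT: duplicate coordinates are counted once per row.
    raw = conn.execute("SELECT COUNT(*) FROM specimens").fetchone()[0]

    # DISTINCT on latitude alone: points sharing a latitude collapse together.
    lat_only = conn.execute(
        "SELECT COUNT(DISTINCT latitude) FROM specimens"
    ).fetchone()[0]

    # DISTINCT on both columns: one row per unique coordinate pair.
    pairs = conn.execute(
        "SELECT COUNT(*) FROM "
        "(SELECT DISTINCT latitude, longitude FROM specimens)"
    ).fetchone()[0]

    conn.close()
    return raw, lat_only, pairs


# Four rows, one exact duplicate, three sharing the same latitude.
rows = [(40.0, -73.0), (40.0, -74.0), (40.0, -73.0), (41.0, -73.0)]
raw, lat_only, pairs = count_coordinates(rows)
print(raw, lat_only, pairs)
```

With these sample rows the three counts differ, which is the kind of mismatch between input and output validation queries that the two revisions describe fixing.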
