/trunk/inputs - Changes - BIEN 3 - NCEAS Projects

root/trunk/inputs @ 13503

svn:ignore: .~*

#	Date	Author	Comment
13503	05/21/2014 04:13 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: map_taxonomic_status(): need to use accepted name instead of scrubbed name (which also includes no-opinion names), as described at http://wiki.vegpath.org/2013-11-14_conference_call#taxonomic-fields. this used to be the accepted name, but got switched when the concatenated name was also used to store the matched name for no-opinion names.
13501	05/21/2014 01:27 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon: documented how to modify it (using util.force_recreate())
13498	05/20/2014 05:46 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon, etc.: added accepted_morphospecies_binomial derived field
13444	05/13/2014 04:50 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon.Accepted_name_species: mapped to accepted_species_binomial
13443	05/13/2014 04:09 AM	Aaron Marcuse-Kubitza	fix: inputs/.TNRS/schema.sql: COMMENTs: always include newline before and after
13441	05/13/2014 03:46 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/schema.sql: taxon_scrub, etc.: undid rename of accepted name columns to scrubbed_* (r13435), because these are actually not the same (scrubbed_* is the combination of accepted and no-opinion names). the accepted name columns will now be named accepted_*, following the standard naming scheme.
13439	05/13/2014 03:13 AM	Aaron Marcuse-Kubitza	fix: inputs/.TNRS/schema.sql: taxon_scrub, etc.: scrubbed_: use columns from MatchedTaxon whenever possible, to avoid, as much as possible, the need to join to taxon_scrub.scrubbed_unique_taxon_name.
13437	05/13/2014 02:29 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.TNRS/grants.sql: added GRANT statements from schema.sql because these aren't run by `make inputs/.TNRS/reinstall`
13418	05/07/2014 07:17 PM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: $(datasrc_schema_exists): need to use $(datasrc), not $(schema), as $schema is only what this var is called in the runscripts
13417	05/07/2014 06:48 PM	Aaron Marcuse-Kubitza	bugfix: inputs/analytical_db/: need dummy table.run file to cause a schema to be created for this datasource
13416	05/07/2014 06:44 PM	Aaron Marcuse-Kubitza	fix: inputs/input.Makefile: $(sortFile): don't print the "add any missing tables to $(sortFile)" message every time the Makefile is run
13415	05/07/2014 06:44 PM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: install: only run this for datasource dirs
13414	05/07/2014 05:18 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: install: use ./run's install target for clarity
13412	05/07/2014 04:56 PM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: install: made it idempotent (using new $(datasrc_schema_exists)) so that it could be run by `make install` on an existing system
13411	05/07/2014 04:02 PM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: $(datasrc_schema_exists): need to use $(shell ...)
13410	05/07/2014 03:31 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: added $(datasrc_schema_exists)
13402	05/03/2014 02:03 PM	Aaron Marcuse-Kubitza	added inputs/VegBank/verify/outputBien.log.url
13401	05/03/2014 02:03 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: add: verify/: also svn:ignore *.log
13375	05/01/2014 01:58 PM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: %/postprocess: invoke runscript if it exists
13374	05/01/2014 01:37 PM	Aaron Marcuse-Kubitza	lib/runscripts/validations.pg.sql.run: export_(): make the export idempotent for easier re-runnability
13372	05/01/2014 01:29 PM	Aaron Marcuse-Kubitza	fix: lib/runscripts/file.pg.sql.run: removed include of in_datasrc_dir.run, because this location does not apply to all .sql export scripts
13371	05/01/2014 01:15 PM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: validations.sql must be in a subdir so it won't get run by sql/install
13370	05/01/2014 01:11 PM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: validations.sql must be in a subdir so it won't get run by sql/install
13369	05/01/2014 05:20 AM	Aaron Marcuse-Kubitza	inputs/input.Makefile: install: also run validate/install
13368	05/01/2014 04:44 AM	Aaron Marcuse-Kubitza	inputs/input.Makefile: added validate/install
13367	05/01/2014 04:09 AM	Aaron Marcuse-Kubitza	lib/runscripts/validations.pg.sql.run: export_(): make the export idempotent for easier re-runnability
13366	05/01/2014 03:22 AM	Aaron Marcuse-Kubitza	bugfix: inputs/SALVIAS/validations.sql: need to cast character varying to text so that the types of each side of if() match
13357	04/30/2014 05:46 PM	Aaron Marcuse-Kubitza	bugfix: **/postprocess.sql: don't use the public schema, because this creates an unsatisfied dependency while the database is being installed, and breaks `make install`
13316	04/24/2014 05:29 PM	Aaron Marcuse-Kubitza	inputs/GBIF/_MySQL/.rsync_ignore: added GBIFPortalDB-*.data.sql.gz, because these are intermediate files
13195	04/19/2014 10:14 PM	Aaron Marcuse-Kubitza	inputs/Madidi/_src/: set svn:ignore
13164	04/17/2014 08:21 PM	Aaron Marcuse-Kubitza	fix: inputs/SALVIAS/projects/postprocess.sql: remove private data that should not be publicly visible: preserve datasets with ipr_specific = '', because they are actually redistributable, according to Brad (http://wiki.vegpath.org/2014-04-17_conference_call#conditions-of-use)
13152	04/16/2014 10:49 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: _specimens_07_list_of_verbatim_subspecific_taxa_with_author: updated filter condition to match output query
13151	04/16/2014 10:48 PM	Aaron Marcuse-Kubitza	inputs/NY/run: `make inputs/NY/validate`: updated runtime (8 min, with added queries)
13150	04/16/2014 10:24 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/Ecatalog_all/map.csv, postprocess.sql: remapped substrate, vegetation to locationRemarks
13149	04/16/2014 06:41 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/Ecatalog_all/map.csv, postprocess.sql: remapped substrate, vegetation to locationRemarks
13147	04/16/2014 04:24 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13*: also need to include coordinate pairs which have one of their coordinates NULL, by using OR instead of AND
13146	04/16/2014 04:15 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: _specimens_13b_list_of_all_decimal_lat_long: matched column types to output query
13145	04/16/2014 04:14 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: _specimens_13a_list_of_all_verbatim_lat_long: matched column types to output query
13144	04/16/2014 03:13 PM	Aaron Marcuse-Kubitza	inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: added breakdowns _specimens_13a_list_of_all_verbatim_lat_long, _specimens_13b_list_of_all_decimal_lat_long to help troubleshoot the diff
13143	04/16/2014 02:04 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: count lat/longs together instead of separately, because the DISTINCT is by coordinate pair, not individual coordinate value (which wouldn't make much sense)
13138	04/15/2014 06:52 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: use new is_castable(), which is much more accurate than Brad's custom regexp for determining if something is numeric
13137	04/15/2014 06:29 PM	Aaron Marcuse-Kubitza	inputs/NY/validations.-.util.sql: added util.is_castable() wrapper
13130	04/14/2014 04:51 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to include both lat and long in the value to DISTINCT on
13129	04/14/2014 04:48 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to DISTINCT the values that are being counted, because they are merged by the coordinates_unique unique constraint in the import
13126	04/14/2014 03:58 PM	Aaron Marcuse-Kubitza	inputs/NY/run: `make inputs/NY/validate`: documented slow queries: _specimens_12_distinct_collector_name_collect_num_date_w_count
13125	04/14/2014 03:23 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented slow queries (_plots_06a_list_of_stems). these may need to have their query plans rechecked.
13124	04/14/2014 03:22 PM	Aaron Marcuse-Kubitza	inputs/NY/run, inputs/SALVIAS/run_: `make inputs/.../validate`: updated runtime (+2 min)
13123	04/10/2014 04:06 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: specimens*_of_unique_verbatim_author_taxa_with_genus: use scientificName rather than the concatenated ranks, because that is what is imported to taxonlabel.taxonomicname
13115	04/10/2014 02:24 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: mapped subspecies to new taxonverbatim.subspecies for easier access by validations queries
13113	04/10/2014 01:25 PM	Aaron Marcuse-Kubitza	fix: inputs/test_taxonomic_names/Taxon/map.csv: scientificName: remapped to scientificName instead of taxonName as this does include the author for some names
13112	04/10/2014 01:25 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/Ecatalog_all/map.csv: ScientificName: remapped to scientificName instead of taxonName as this does include the author
13111	04/10/2014 01:17 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: specimens*_of_unique_verb_subsp_taxa_with_author: use taxonName instead of concatenating the ranks, as that corresponds to what we use as the concatenated taxonomic name
13110	04/10/2014 12:59 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: specimens*_of_verbatim_subspecific_taxa_with_author: need `subspecies IS NOT NULL` filter
13109	04/10/2014 12:57 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: _specimens_07_list_of_verbatim_subspecific_taxa_with_author: need to include subspecies (as _specimens_06_count_of_unique_verb_subsp_taxa_with_author does)
13107	04/10/2014 12:03 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: specimens_of_species_binomials: removed incorrect `subspecies IS NOT NULL` filter (this should be on _of_unique_verb_subsp_taxa_with_author instead)
13095	04/10/2014 03:45 AM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: _specimens_16_list_distinct_specimen_descriptions: removed duplicated rows using DISTINCT
13089	04/10/2014 02:34 AM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: _specimens_03_list_of_verbatim_families: use family as specified in query description, not as implemented
13087	04/10/2014 02:07 AM	Aaron Marcuse-Kubitza	bugfix: schemas/vegbien.sql, inputs/NY/validations.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: cast this to text rather than date because some values for this field are not valid dates and will throw an error if cast to date
13086	04/09/2014 08:19 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: matched type to output query
13075	04/08/2014 03:49 PM	Aaron Marcuse-Kubitza	fix: inputs/U/Specimen/map.csv: Genus: remapped to taxonName because this field is actually mislabeled in the original column names
13070	04/08/2014 01:40 PM	Aaron Marcuse-Kubitza	inputs/NY/run: `make inputs/NY/validate`: updated runtime (6.5 min). this increases as more queries are able to run successfully.
13068	04/08/2014 01:19 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented runtime (5 min)
13067	04/08/2014 12:49 PM	Aaron Marcuse-Kubitza	inputs/bien2_traits/run: documented `make inputs/bien2_traits/validate` runtime (9 min)
13065	04/07/2014 06:19 PM	Aaron Marcuse-Kubitza	inputs/NY/run: `make inputs/NY/validate`: updated runtime (5 min)
13056	04/07/2014 09:47 AM	Aaron Marcuse-Kubitza	inputs/NY/run: documented `make inputs/NY/validate` runtime (2 min, currently for the input queries)
13055	04/04/2014 06:13 PM	Aaron Marcuse-Kubitza	added inputs/Madidi/_src/ to match wiki steps in wiki.vegpath.org/Adding_a_flat-file_datasource
13042	04/02/2014 05:21 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_species_binomials: renamed columns to species_binomial to reflect reverted query name
13041	04/02/2014 05:16 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _of_verbatim_species_excluding_author: renamed to _species_binomials for clarity
13040	04/02/2014 05:14 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _specimens_04_count_of_unique_verbatim_species_with_author, _specimens_05_list_of_unique_verbatim_species_with_author: switched back to original names because #6,7 now do the same thing as #4,5, so we should include the differing result set of #4,5 for datasources that provide it
13038	04/02/2014 04:38 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: use taxon_name*_with_author everywhere instead of custom column names, for consistency
13037	04/02/2014 04:09 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _of_verbatim_subspecific_taxa_without_author, etc.: renamed to _with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
13035	04/02/2014 03:54 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _verbatim_species_without_author, etc.: renamed to _with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
13018	04/01/2014 01:29 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: added $(nice) and use it everywhere its definition is used
12993	03/30/2014 06:12 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: validate: redirect the output to the log, as for other import-related operations
12992	03/30/2014 06:08 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: import: validate at the end of the import
12991	03/30/2014 06:02 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: added new-style aggregating validations (`validate` target)
12988	03/30/2014 05:41 PM	Aaron Marcuse-Kubitza	added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt, which is useful to see what fields will be available when we switch to the new GBIF export format
12985	03/30/2014 05:11 PM	Aaron Marcuse-Kubitza	added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt.run
12968	03/29/2014 04:06 AM	Aaron Marcuse-Kubitza	*{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly propagated to applicable invoked targets. (previously, $rm was propagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)
12967	03/29/2014 03:58 AM	Aaron Marcuse-Kubitza	lib/sh/make.sh: self_make(): renamed to with_rm() for clarity, since this is used only to propagate $rm, and does not also invoke a command with the same name as the current function, as the name might suggest
12963	03/28/2014 02:39 AM	Aaron Marcuse-Kubitza	fix: inputs///map.csv: remapped occurrenceID-mapped fields to dataProviderRecordID when these were not globally unique DwC occurrenceIDs (http://rs.tdwg.org/dwc/terms/#occurrenceID)
12962	03/28/2014 02:34 AM	Aaron Marcuse-Kubitza	fix: inputs/CTFS/AggregateObservation/map.csv: field mapped to occurrenceID: remapped to aggregateOrganismObservationID because these are not specimen occurrences
12961	03/28/2014 02:32 AM	Aaron Marcuse-Kubitza	fix: mappings/VegCore-VegBIEN.csv: taxonoccurrence.sourceaccessioncode: need to populate from aggregateOrganismObservationID when only that is available
12960	03/28/2014 02:03 AM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/Ecatalog_all/map.csv: can't use CatalogNumber as pkey because it's not unique and not always populated. this fixes the NY NULL accessionNumbers bug (wiki.vegpath.org/Aggregating_validations_status#bugs).
12958	03/28/2014 01:29 AM	Aaron Marcuse-Kubitza	inputs/XAL/Specimen/header.csv: updated
12922	03/27/2014 03:36 AM	Aaron Marcuse-Kubitza	added inputs/NY/validations.sql
12920	03/27/2014 03:31 AM	Aaron Marcuse-Kubitza	bugfix: lib/common.Makefile: $(add*): need to wrap w/ $(wildcard) to prevent "targets don't exist" error, because svn 1.7 does not suppress this error even with --force
12919	03/27/2014 03:27 AM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: add!: add* of $(svnFiles): need to ignore errors because svn 1.7 does not suppress the "targets don't exist" error even with --force
12891	03/25/2014 04:18 AM	Aaron Marcuse-Kubitza	inputs/run: postprocess(): documented runtime on vegbiendev (1 h)
12886	03/24/2014 05:35 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: specimenreplicate.institution_id: renamed to duplicate_institutions_sourcelist_id, as decided in the conference calls (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2)
12885	03/24/2014 05:32 PM	Aaron Marcuse-Kubitza	inputs/run: postprocess(): updated runtime (25 min)
12882	03/24/2014 05:02 PM	Aaron Marcuse-Kubitza	inputs/run: postprocess(): updated runtime (20 min)
12879	03/24/2014 01:49 AM	Aaron Marcuse-Kubitza	mappings/VegCore.htm: regenerated from wiki: rename specimenHolderInstitutions to specimen_duplicate_institutions, as decided in the 2014-03-13 conference call (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2). note that most schema changes (such as this one) involve mappings changes, which are handled automatically by `inputs/run postprocess; yes|make inputs/{NVS,SALVIAS,TEAM}/test`.
12873	03/23/2014 11:43 PM	Aaron Marcuse-Kubitza	bugfix: inputs/GBIF/table.run: switched to using lib/runscripts/table.run instead of mysql.table.run because some subdirs (Source/) need the regular table.run to work properly. mysql.table.run should instead be used directly by subdirs that use the MySQL install.
12869	03/22/2014 05:56 AM	Aaron Marcuse-Kubitza	inputs/XAL/Specimen/test.xml.ref: updated for sample data.csv, which contains the columns as a CSV. this fixes a bug where a map.csv must be used on a table that contains the same set of columns (i.e. not one with no columns if there are any mappings).
12867	03/22/2014 05:06 AM	Aaron Marcuse-Kubitza	fix: inputs/input.Makefile: don't treat *.xml as data files since these are not currently supported
12795	03/21/2014 02:16 AM	Aaron Marcuse-Kubitza	fix: inputs/input.Makefile: removed no longer used special handling of XML inputs, support for which was never added to the Makefile. (bin/map, however, does support importing an XML file into a database.) this fixes a bug in XAL, which used to abort with an error but now just imports an empty table.
12794	03/21/2014 12:34 AM	Aaron Marcuse-Kubitza	fix: inputs/input.Makefile: %/install: don't ignore errors if table does not exist, to ensure a proper errexit. this is now possible because every dir that this target is being run on should be a data dir. (Source/ used to be a metadata-only dir.)
12793	03/21/2014 12:31 AM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: $(cleanup): need `set -o pipefail`
12792	03/21/2014 12:02 AM	Aaron Marcuse-Kubitza	inputs/VegBank/run: `rm=1 import()`: updated runtime (1 h)
