/ - Changes - BIEN 3 - NCEAS Projects

root @ 7238

#	Date	Author	Comment
7238	01/16/2013 06:09 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: unscrubbed_taxondetermination_view: Do the tnrs_canon joins manually instead of using tnrs_canon, to allow PostgreSQL to use a nested loop join on just the needed tnrs rows instead of a hash self-join of all tnrs rows. The query planner is not yet advanced enough to automatically integrate the select on the view into the top-level joins list, which would make this change automatically.
7237	01/16/2013 05:52 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: rowsAdded(): Look at last 100 rows instead of last 10, because rows are added to the log file each time the script waits and the Inserted # new rows message must be in the tailed rows
7236	01/16/2013 05:48 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: rowsAdded(): Fixed bug where need to test if log file exists before using it in tail, because if tail fails and causes rowsAdded to return false, this error exit status will be indistinguishable from false for no rows added and the script will keep going
7235	01/16/2013 05:40 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Fixed bug where need special handling to support being run as a .make script
7234	01/16/2013 03:35 AM	Aaron Marcuse-Kubitza	input.Makefile: Editing import: Added unscrub to remove TNRS taxondeterminations
7233	01/16/2013 03:34 AM	Aaron Marcuse-Kubitza	psql_script_vegbien: Added no_query_results option to hide results of calls to void functions
7232	01/16/2013 03:33 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added delete_scrubbed_taxondeterminations()
7231	01/16/2013 01:43 AM	Aaron Marcuse-Kubitza	root Makefile: python-Darwin: Added instructions to install dateutil for Python 3 as well as Python 2, for use in PL/Python functions
7230	01/16/2013 01:42 AM	Aaron Marcuse-Kubitza	root Makefile: python-Darwin: Added note that Python 2 comes preinstalled
7229	01/16/2013 01:15 AM	Aaron Marcuse-Kubitza	Added inputs/GBIF/Specimen/postprocess.sql to remove institutions that we have direct data for
7228	01/15/2013 10:42 PM	Aaron Marcuse-Kubitza	import_all: Run disown_all after background processes have been created, so that they will not be aborted if the shell exits (e.g. due to a broken connection). Note that with_all processes are automatically disowned as they are created, but other processes, such as after_import, were not.
7227	01/14/2013 05:21 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: Removed no longer used array_to_string(). The IMMUTABLE wrapper is only needed for index conditions and other places that require an IMMUTABLE function.
7226	01/14/2013 05:14 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/new_terms.csv: Filter out terms that map to UNUSED, because these are not mappings that are useful as VegCore synonyms
7225	01/14/2013 05:13 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/new_terms.csv: Filter out terms that map to UNUSED, because these are not mappings that are useful as VegCore synonyms
7224	01/14/2013 05:12 PM	Aaron Marcuse-Kubitza	README.TXT: Data import: Checking free disk space: Updated import schema size to 110GB
7223	01/14/2013 04:37 PM	Aaron Marcuse-Kubitza	Added inputs/Madidi/_README.TXT
7222	01/14/2013 04:35 PM	Aaron Marcuse-Kubitza	new_terms.csv: Regenerated
7221	01/14/2013 04:34 PM	Aaron Marcuse-Kubitza	inputs/Madidi/new_terms.csv: Regenerated
7220	01/14/2013 04:19 PM	Aaron Marcuse-Kubitza	inputs/Madidi/_archive/2010-1-2/: Set svn:ignore
7219	01/14/2013 04:18 PM	Aaron Marcuse-Kubitza	inputs/Madidi/_README.TXT: Archived to _archive/2010-1-2/
7218	01/14/2013 03:43 PM	Aaron Marcuse-Kubitza	inputs/Madidi/: Refreshed. Note that new export has a completely new schema.
7217	01/14/2013 03:42 PM	Aaron Marcuse-Kubitza	inputs/Madidi/: Refreshed. Note that new export has a completely new schema.
7216	01/14/2013 01:53 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/new_terms.csv: Filter out terms that map to UNUSED, because these are not mappings that are useful as VegCore synonyms
7215	01/14/2013 01:18 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: fieldNumber (authorEventCode): Fixed bug where locationevent.authorlocationcode should be authoreventcode
7214	01/14/2013 12:19 PM	Aaron Marcuse-Kubitza	Added inputs/Madidi/map.csv, created from new_terms.csv
7213	01/14/2013 12:16 PM	Aaron Marcuse-Kubitza	inputs/Madidi/_archive/: Set svn:ignore
7212	01/14/2013 12:15 PM	Aaron Marcuse-Kubitza	csvs.py: sniff(): TSVs: Don't turn off quoting, because some TSVs (such as Madidi.IndividualObservation) do quote fields
7211	01/14/2013 12:13 PM	Aaron Marcuse-Kubitza	csvs.py: TsvReader: Use csv.reader.next() when possible to support quoted fields, such as in Madidi.IndividualObservation
7210	01/14/2013 11:43 AM	Aaron Marcuse-Kubitza	input.Makefile: Configuration: $(exts): Added .dat, which the new Madidi files use
7209	01/14/2013 08:39 AM	Aaron Marcuse-Kubitza	mappings/Makefile: VegCore.tables.csv: Removed no longer needed removal of Namespaces table, which is now marked as just a section, not a table
7208	01/14/2013 08:37 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Regenerated from wiki
7207	01/14/2013 07:39 AM	Aaron Marcuse-Kubitza	Added to_do/timeline.2013.xls (from Brad, converted to .xls)
7206	01/14/2013 07:30 AM	Aaron Marcuse-Kubitza	to_do/timeline.doc: Renamed to timeline.2012.doc to allow for a separate 2013 timeline
7205	01/11/2013 05:05 PM	Aaron Marcuse-Kubitza	README.TXT: Data import: Deleting imports before the last: Added instructions to keep a previous import instead of deleting it
7204	01/11/2013 04:22 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: $(logInstall): Always log the installation, regardless of the $log env var, because $log is set by default on development machines but an install log should still be created
7203	01/11/2013 01:03 PM	Aaron Marcuse-Kubitza	schemas/vegbien.ERD.mwb: Regenerated exports
7202	01/11/2013 10:19 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: unscrubbed_taxondetermination_view: Fixed bug where need to handle the case where (SELECT source.source_id FROM source WHERE source.shortname = 'TNRS') is NULL because no TNRS names have been imported yet
7201	01/11/2013 09:44 AM	Aaron Marcuse-Kubitza	/new_terms.csv, /unmapped_terms.csv: Regenerated using `make missing_mappings`
7200	01/11/2013 09:19 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: morphoname: Remapped to the original rather than current taxondetermination because this is the original name applied by the author
7199	01/11/2013 09:16 AM	Aaron Marcuse-Kubitza	inputs/SALVIAS*/Organism/map.csv: Remapped voucher_string/coll_number to recordNumber instead of catalogNumber, because this number is actually applied by the collector rather than by a herbarium
7198	01/11/2013 09:11 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped recordNumber to new specimenreplicate.collectionnumber
7197	01/11/2013 09:02 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Also map recordNumber (collectionnumber) to the indirect voucher's specimenreplicate
7196	01/11/2013 08:48 AM	Aaron Marcuse-Kubitza	inputs///map.csv: Remapped recordNumber to new individualCode where applicable
7195	01/11/2013 08:44 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped individualCode. authortaxoncode: Prefer tag over recordNumber (collectionnumber), because this applies to the plant rather than the specimen.
7194	01/11/2013 08:17 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped morphoname
7193	01/11/2013 08:16 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Regenerated from wiki
7192	01/11/2013 08:14 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Regenerated from wiki
7191	01/11/2013 08:04 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonverbatim: Added morphoname (which is different from the morphospecies suffix)
7190	01/11/2013 07:33 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: plantobservation: Renamed collectionnumber to authorplantcode since this number, which identifies the plant, is actually different from the collectionnumber that identifies the specimen collected from it. This distinction is meaningful for plots data, but generally not for specimens data.
7189	01/11/2013 07:28 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: plantobservation: Renamed collectionnumber to authorplantcode since this number, which identifies the plant, is actually different from the collectionnumber that identifies the specimen collected from it. This distinction is meaningful for plots data, but generally not for specimens data.
7188	01/11/2013 07:23 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: specimenreplicate: Added collectionnumber
7187	01/11/2013 07:17 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonlabel: Removed no longer used matched_label_fit_fraction. Use taxondetermination.taxonfit instead.
7186	01/11/2013 07:02 AM	Aaron Marcuse-Kubitza	inputs///test.xml.ref: Restored inserted row counts, which had gotten auto-accepted from a test run on a non-empty DB
7185	01/11/2013 06:55 AM	Aaron Marcuse-Kubitza	schemas/vegbien.ERD.mwb: Expanded analytical_stem to fit the width of all fields
7184	01/11/2013 06:53 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxondetermination: taxondetermination_computer_min_fit CHECK constraint: Fixed bug where need to use CASE instead of OR when a branch of an OR shouldn't be evaluated, because PostgreSQL does not guarantee short-circuit evaluation of OR
7183	01/11/2013 06:38 AM	Aaron Marcuse-Kubitza	README.TXT: Debugging: Added instructions for "binary chop" debugging, which requires syncing the DB schema to the svn working copy
7182	01/11/2013 06:08 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Removed no longer used mappings for verbatimScientificName in _if conditions
7181	01/11/2013 06:08 AM	Aaron Marcuse-Kubitza	inputs/.NCBI/nodes/test.xml.ref: Restored inserted row counts, which had gotten auto-accepted from a test run on a non-empty DB
7180	01/11/2013 06:06 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): DuplicateKeyException: Uniquifying input table to avoid internal duplicate keys: Also filter out duplicate rows in the out_table, so that they don't create duplicate key errors and the resulting index holes
7179	01/11/2013 06:01 AM	Aaron Marcuse-Kubitza	sql.py: distinct_table(): Added support for custom joins used in creating the new table. This can then be used by sql_io.put_table() to filter out duplicate rows in the out_table, so that they don't create duplicate key errors and the resulting index holes.
7178	01/11/2013 05:53 AM	Aaron Marcuse-Kubitza	README.TXT: Documentation: Redmine-formatted list of steps for column-based import: Added step to reinstall public schema first, to reset the sequences so that they don't create a diff when the new steps.by_col.log.sql is committed
7177	01/11/2013 05:48 AM	Aaron Marcuse-Kubitza	Added inputs/ACAD/Specimen/logs/steps.by_col.log.sql
7176	01/11/2013 05:45 AM	Aaron Marcuse-Kubitza	sql_gen.py: Join: Added support for mapping values which are lists, for use in USING joins
7175	01/11/2013 05:40 AM	Aaron Marcuse-Kubitza	inputs/SALVIAS//test.xml.ref: Restored SALVIAS inserted row counts, which had gotten auto-accepted from a test run on a non-empty DB
7174	01/11/2013 05:01 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem: Added locationName (authorPlotCode), subplot, individualCode (authorPlantCode) for use in validation
7173	01/11/2013 04:57 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: sync_analytical_stem_to_view(): Drop and re-create dependent objects to avoid errors that analytical_stem can't be dropped because of dependents
7172	01/11/2013 04:56 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: sync_analytical_stem_to_view(): Changed to PL/pgSQL function to allow adding PL/pgSQL commands
7171	01/11/2013 03:26 AM	Aaron Marcuse-Kubitza	schemas/vegbien.ERD.mwb: Moved family_higher_plant_group to leave room for analytical_stem to expand
7170	01/11/2013 03:08 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Removed no longer used mappings for verbatimScientificName in _if conditions
7169	01/11/2013 02:59 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Removed taxonlabel for original taxondetermination, because the original taxondetermination is not scrubbed by scrub.make (only the most current taxondetermination gets scrubbed, because only a single scrubbed determination is added by scrub.make). This still leaves the original taxondetermination's taxonverbatim, which stores the taxonomic information for historical purposes.
7168	01/11/2013 02:44 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Removed no longer used accepted and verbatim (parsed) taxonlabels, which have been replaced by a single accepted or matched taxondetermination created by scrub.make
7167	01/11/2013 02:34 AM	Aaron Marcuse-Kubitza	Removed no longer used inputs/.TNRS/tnrs_accepted, tnrs_other. Use the tnrs_canon view instead.
7166	01/11/2013 02:22 AM	Aaron Marcuse-Kubitza	Removed no longer used inputs/.TNRS/tnrs_accepted, tnrs_other. Use the tnrs_canon view instead.
7165	01/11/2013 02:18 AM	Aaron Marcuse-Kubitza	Added inputs/.TNRS/_archive/
7164	01/11/2013 02:18 AM	Aaron Marcuse-Kubitza	Added inputs/.TNRS/tnrs/cleanup.sql to prevent running the default cleanup operations, which don't work on tables which have views referencing them (as is the case for tnrs, which is referenced by tnrs_canon)
7163	01/11/2013 02:07 AM	Aaron Marcuse-Kubitza	import_all: Removed no longer needed TNRS import, which has been replaced by scrub.make (which adds TNRS taxondeterminations after the import instead of creating taxonlabel links before it)
7162	01/11/2013 02:03 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Removed TNRS input taxonlabels meant to cross-link to taxonlabels added by the TNRS import, because TNRS taxondeterminations are now created instead
7161	01/11/2013 01:42 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem_view: Use just the main taxonlabel created by scrub.make instead of all the additional taxonlabels created by the TNRS import
7160	01/11/2013 01:11 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: main taxonverbatim.morphospecies "if has verbatim name" condition: Fixed bug where need to remove the taxonIsCanonical flag, because the TNRS.public.unscrubbed_taxondetermination_view table (which uses this flag) should include this field (although not other places where the morphospecies is stored by other TNRS tables)
7159	01/11/2013 12:49 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxondetermination: taxondetermination_set_iscurrent() trigger: Also run on delete, to mark another taxondetermination as the current one when a current taxondetermination is deleted
7158	01/11/2013 12:18 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_canon: Annotations: Always use value from the matched name, because the accepted name does not have this
7157	01/11/2013 12:05 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: primary taxonlabel's parent taxonlabel: Fixed bug where a taxonverbatim was incorrectly being created solely to store the taxonRank, even though it was already stored in the taxonlabel's rank field
7156	01/10/2013 11:52 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Don't map morphospecies to the parsed taxonlabel's taxonepithet, because this causes an extra, parsed taxonlabel to be created for TNRS.public.unscrubbed_taxondetermination_view. It is not needed by the other TNRS tables.
7155	01/10/2013 11:45 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/public.unscrubbed_taxondetermination_view/map.csv: Omit Infraspecific_rank to help avoid creating a separate, parsed taxonlabel. Don't map to taxonRank because Name_matched_rank is populated more often.
7154	01/10/2013 11:34 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Reduced $maxPause to 4 hr, because new taxondeterminations are being added throughout the import, so it is unlikely that more than 4 hr would pass between successive imports of taxondeterminations (causing scrub.make to stop prematurely)
7153	01/10/2013 11:23 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: Removed no longer used tnrs+accepted. Use tnrs_canon or a self-join of tnrs instead
7152	01/10/2013 11:22 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: tnrs_input_name: Use TNRS.tnrs directly instead of the now-deprecated tnrs+accepted
7151	01/10/2013 11:12 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Use new TNRS.tnrs_canon instead of tnrs+accepted to avoid creating additional taxonlabels for the parsed, matched, and accepted names and instead just use the most-canonicalized name of the names output by TNRS (the accepted name if available, or the matched name otherwise)
7150	01/10/2013 10:50 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: "if has verbatim name" _if statements that filter something out for TNRS mappings: Also assume true if taxonIsCanonical is specified, because some TNRS tables (eventually such as public.unscrubbed_taxondetermination_view) do not specify a separate "verbatim" taxondetermination but do provide taxonIsCanonical as a flag to turn various mappings on and off
7149	01/10/2013 09:06 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Remapped matched*Fit_fraction to taxondetermination.taxonfit when a taxondetermination, not just a taxonlabel, is provided
7148	01/10/2013 09:03 PM	Aaron Marcuse-Kubitza	bin/map: map_table(): Resolving prefixes: Fixed bug where need to use list instead of tuple for metadata value mappings
7147	01/10/2013 08:16 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxondetermination: Added CHECK constraint to allow only taxondeterminations with a minimum fit fraction of 80%, analogous to taxonlabel's taxonlabel_1_matched_label_min_fit() trigger
7146	01/09/2013 05:34 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Don't create a separate TNRS input taxonlabel if taxonIsCanonical exists
7145	01/09/2013 05:24 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_canon: Fixed bug where need to always use Unmatched_terms from tnrs rather than tnrs_accepted
7144	01/09/2013 05:07 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: Added tnrs_canon, which stores the most canonicalized name output by TNRS
7143	01/09/2013 04:17 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem_view: accepted_taxonverbatim: Fixed bug where need to join only to the taxonverbatim whose morphospecies is NULL, to avoid joining to multiple taxonverbatims at once. This extra filter is now needed because there can be multiple taxonverbatims for a taxonlabel with different morphospecies.
7142	01/09/2013 03:59 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: taxonlabel.taxonomicname: Prepend the family to the rest of the name using new _merge_prefix() instead of _join_words()/_nullIf(), so that any input taxonomic name that includes the family will not have the family duplicated in the combined taxonomic name. Previously, the duplication was removed only when the rest of the input name was equal to the family. This change fixes a bug in the new TNRS import where a pre-concatenated taxonomic name (Accepted_scientific_name) which includes the family is now used instead of Accepted_name, which only includes it when it's equal to the family.
7141	01/09/2013 03:52 PM	Aaron Marcuse-Kubitza	xml_func.py: Simplifying functions: Merging: Added _merge_prefix() passthru
7140	01/09/2013 03:33 PM	Aaron Marcuse-Kubitza	schemas/functions.sql: Added _merge_prefix()
7139	01/09/2013 02:42 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_accepted_scientific_name(): Fixed bug where Accepted_name_family shouldn't be prefixed to Accepted_name if Accepted_name is itself the family, to avoid duplicating the family in the Accepted_scientific_name
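
Revisions 7139-7142 above describe prepending the family to a taxonomic name without duplicating it when the name already includes the family. The actual _merge_prefix() in schemas/functions.sql is not shown in this log, so the following is only a minimal Python sketch of that behavior. The function name merge_prefix, its sep parameter, the NULL/empty handling, and the whole-word comparison are all assumptions made for illustration.

```python
from typing import Optional


def merge_prefix(prefix: Optional[str], value: Optional[str],
                 sep: str = " ") -> Optional[str]:
    """Prepend prefix to value unless value already starts with it.

    Hypothetical sketch; not the project's _merge_prefix() implementation.
    """
    # Assumption: a missing prefix or value simply yields the other one.
    if not prefix:
        return value
    if not value:
        return prefix
    # Compare whole words so that "Poa" is not treated as a prefix of "Poaceae".
    if value == prefix or value.startswith(prefix + sep):
        return value
    return prefix + sep + value


if __name__ == "__main__":
    # Family missing from the name: it gets prepended.
    print(merge_prefix("Poaceae", "Poa annua"))
    # Name already includes the family (pre-concatenated): left unchanged.
    print(merge_prefix("Poaceae", "Poaceae Poa annua"))
    # Name is itself the family: not duplicated.
    print(merge_prefix("Poaceae", "Poaceae"))
```

Under these assumptions, a pre-concatenated name such as Accepted_scientific_name and a bare name such as Accepted_name would both come out with the family appearing exactly once.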
