Changes - BIEN 3 - NCEAS Projects

root @ 7282

#	Date	Author	Comment
7282	01/18/2013 05:21 AM	Aaron Marcuse-Kubitza	input.Makefile: Taxonomic scrubbing: Added scrub target and use it in import_scrub
7281	01/18/2013 05:18 AM	Aaron Marcuse-Kubitza	input.Makefile: Import to VegBIEN: Moved import, rm to top of section since they are top-level targets and don't depend on the variables defined for %/import
7280	01/18/2013 05:17 AM	Aaron Marcuse-Kubitza	input.Makefile: Moved rm to Import to VegBIEN section
7279	01/18/2013 05:16 AM	Aaron Marcuse-Kubitza	input.Makefile: Moved taxonomic scrubbing targets to separate Taxonomic scrubbing section
7278	01/18/2013 04:43 AM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times
7277	01/18/2013 03:34 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: provider_count_view: Include only sources with at least one row. Currently (as of r7023), all entries in BIEN2's geoscrub.herbaria are also in VegBIEN, so the filter is not yet necessary, but switching to bien3_adb.ih could create source entries without data rows which should be excluded from the providers list.
7276	01/18/2013 03:25 AM	Aaron Marcuse-Kubitza	import_all: Output the PIDs of the import_scrub and after_import processes, so those processes can be managed without shell job control. This is useful if the connection is lost to the remote shell running the import, which prevents using job control on the import processes.
7275	01/18/2013 01:23 AM	Aaron Marcuse-Kubitza	input.Makefile: Import to VegBIEN: import_scrub: Run `make scrub` in the background, to allow the import to continue with the next table rather than having to wait for the current table to be scrubbed
7274	01/18/2013 12:53 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Moved waitself call to top of script
7273	01/18/2013 12:52 AM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times
7272	01/18/2013 12:24 AM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Added Postprocessing section for use with the next import
7271	01/18/2013 12:05 AM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times. Total does not yet include postprocessing.
7270	01/17/2013 11:29 PM	Aaron Marcuse-Kubitza	import_times: Add blank line before "Postprocessing logs" to separate it from the input logs
7269	01/17/2013 11:28 PM	Aaron Marcuse-Kubitza	import_times: Separate out the postprocessing logs (e.g. public.unscrubbed_taxondetermination_view), as the import times in these logs are not aggregated together (each input has its own run of the postprocessing script)
7268	01/16/2013 02:55 PM	Aaron Marcuse-Kubitza	root Makefile: Datasources: import: Use new import_scrub instead of import (input.Makefile)
7267	01/16/2013 02:51 PM	Aaron Marcuse-Kubitza	import_all: Use new import_scrub (input.Makefile) instead of import, which avoids needing to start background processes for tnrs-remake and scrub-remake
7266	01/16/2013 02:50 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Fixed bug where need to use tnrs.make's lockfile instead because can't be importing while tnrs.make is scrubbing. tnrs.make leaves tnrs in an incomplete state while running because the accepted names are parsed after their matched names. Using a separate lockfile would cause some accepted names to be missing.
7265	01/16/2013 02:27 PM	Aaron Marcuse-Kubitza	input.Makefile: Import to VegBIEN: Added import_scrub, which runs `make scrub` after the import
7264	01/16/2013 02:26 PM	Aaron Marcuse-Kubitza	root Makefile: Datasources: Added scrub, which runs tnrs-remake and scrub-remake
7263	01/16/2013 02:18 PM	Aaron Marcuse-Kubitza	inputs/.TNRS//.make: Only allow one instance of the script to be running at any time, by using new waitself
7262	01/16/2013 02:15 PM	Aaron Marcuse-Kubitza	waitpid, lockfile: Changed $interval default to 5s to work with smaller imports, where less waiting is needed
7261	01/16/2013 02:14 PM	Aaron Marcuse-Kubitza	Added waitself
7260	01/16/2013 02:11 PM	Aaron Marcuse-Kubitza	bin/lockfile: Include the PID in the lockfile to avoid the need to manually remove lockfiles. On Mac, this requires using shlock instead of lockfile.
7259	01/16/2013 01:35 PM	Aaron Marcuse-Kubitza	Added bin/lockfile
7258	01/16/2013 01:34 PM	Aaron Marcuse-Kubitza	Added pid2name
7257	01/16/2013 01:33 PM	Aaron Marcuse-Kubitza	Added name2pids
7256	01/16/2013 01:33 PM	Aaron Marcuse-Kubitza	waitpid: Use `ps` instead of /proc to also work on Mac
7255	01/16/2013 01:07 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/tnrs/tnrs.make: Fixed bug where need special handling to support being run as a .make script
7254	01/16/2013 11:59 AM	Aaron Marcuse-Kubitza	inputs/.geoscrub/_src/README.TXT: Added dates for e-mails from Jim
7253	01/16/2013 11:57 AM	Aaron Marcuse-Kubitza	inputs/.geoscrub/_src/README.TXT: Added e-mail from Jim about repository with scripts to generate the geoscrub_output table
7252	01/16/2013 11:02 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: unscrubbed_taxondetermination_view: Fixed bug where need to use tnrs_accepted.Name_submitted IS NOT NULL rather than tnrs_accepted.* IS NOT NULL, because tnrs_accepted.* (which plain tnrs_accepted gets changed to by PostgreSQL) checks each field of the tnrs_accepted tuple rather than checking if the tuple itself is NULL
7251	01/16/2013 10:23 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: Added back tnrs+accepted view, which is useful for debugging the import of the TNRS results
7250	01/16/2013 09:21 AM	Aaron Marcuse-Kubitza	inputs/REMIB/Specimen/postprocess.sql: Added back ARIZ, NY because some REMIB specimens for these datasources are not yet in the datasources themselves
7249	01/16/2013 08:43 AM	Aaron Marcuse-Kubitza	Added inputs/REMIB/Specimen/postprocess.sql to remove institutions that we have direct data for
7248	01/16/2013 08:43 AM	Aaron Marcuse-Kubitza	Placed inputs/REMIB/_archive/ under version control
7247	01/16/2013 08:23 AM	Aaron Marcuse-Kubitza	Added inputs/SpeciesLink/Specimen/postprocess.sql to remove institutions that we have direct data for
7246	01/16/2013 08:21 AM	Aaron Marcuse-Kubitza	Placed inputs/SpeciesLink/_archive/ under version control
7245	01/16/2013 07:56 AM	Aaron Marcuse-Kubitza	input.Makefile: $(import?): Renamed $public_import option to $full_import because it applies to any import of all datasources, not just a public import on vegbiendev
7244	01/16/2013 07:23 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem_view: Changed `WHERE COALESCE` to a join condition to enable using the taxondetermination_single_current_determination index, which produces the filtered rows directly. Note that this index will not be used for full-database imports, because the query planner uses hash joins everywhere instead of nested loops.
7243	01/16/2013 06:47 AM	Aaron Marcuse-Kubitza	db_xml.py: put_table(): Fixed bug where for views, shouldn't advance start (OFFSET clause) after each chunk, because views are typically dynamic and will contain a new set of rows after the first set is imported
7242	01/16/2013 06:41 AM	Aaron Marcuse-Kubitza	sql.py: Added view_exists()
7241	01/16/2013 06:16 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: Removed no longer used tnrs_canon. unscrubbed_taxondetermination_view uses its definition directly instead.
7240	01/16/2013 06:14 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: unscrubbed_taxondetermination_view: Added comment from tnrs_canon
7239	01/16/2013 06:12 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: unscrubbed_taxondetermination_view: Added comment from tnrs_canon
7238	01/16/2013 06:09 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: unscrubbed_taxondetermination_view: Do the tnrs_canon joins manually instead of using tnrs_canon, to allow PostgreSQL to use a nested loop join on just the needed tnrs rows instead of a hash self-join of all tnrs rows. The query planner is not yet advanced enough to automatically integrate the select on the view into the top-level joins list, which would make this change automatically.
7237	01/16/2013 05:52 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: rowsAdded(): Look at last 100 rows instead of last 10, because rows are added to the log file each time the script waits and the Inserted # new rows message must be in the tailed rows
7236	01/16/2013 05:48 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: rowsAdded(): Fixed bug where need to test if log file exists before using it in tail, because if tail fails and causes rowsAdded to return false, this error exit status will be indistinguishable from false for no rows added and the script will keep going
7235	01/16/2013 05:40 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Fixed bug where need special handling to support being run as a .make script
7234	01/16/2013 03:35 AM	Aaron Marcuse-Kubitza	input.Makefile: Editing import: Added unscrub to remove TNRS taxondeterminations
7233	01/16/2013 03:34 AM	Aaron Marcuse-Kubitza	psql_script_vegbien: Added no_query_results option to hide results of calls to void functions
7232	01/16/2013 03:33 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: Added delete_scrubbed_taxondeterminations()
7231	01/16/2013 01:43 AM	Aaron Marcuse-Kubitza	root Makefile: python-Darwin: Added instructions to install dateutil for Python 3 as well as Python 2, for use in PL/Python functions
7230	01/16/2013 01:42 AM	Aaron Marcuse-Kubitza	root Makefile: python-Darwin: Added note that Python 2 comes preinstalled
7229	01/16/2013 01:15 AM	Aaron Marcuse-Kubitza	Added inputs/GBIF/Specimen/postprocess.sql to remove institutions that we have direct data for
7228	01/15/2013 10:42 PM	Aaron Marcuse-Kubitza	import_all: Run disown_all after background processes have been created, so that they will not be aborted if the shell exits (e.g. due to a broken connection). Note that with_all processes are automatically disowned as they are created, but other processes, such as after_import, were not.
7227	01/14/2013 05:21 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: Removed no longer used array_to_string(). The IMMUTABLE wrapper is only needed for index conditions and other places that require an IMMUTABLE function.
7226	01/14/2013 05:14 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/new_terms.csv: Filter out terms that map to UNUSED, because these are not mappings that are useful as VegCore synonyms
7225	01/14/2013 05:13 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/new_terms.csv: Filter out terms that map to UNUSED, because these are not mappings that are useful as VegCore synonyms
7224	01/14/2013 05:12 PM	Aaron Marcuse-Kubitza	README.TXT: Data import: Checking free disk space: Updated import schema size to 110GB
7223	01/14/2013 04:37 PM	Aaron Marcuse-Kubitza	Added inputs/Madidi/_README.TXT
7222	01/14/2013 04:35 PM	Aaron Marcuse-Kubitza	new_terms.csv: Regenerated
7221	01/14/2013 04:34 PM	Aaron Marcuse-Kubitza	inputs/Madidi/new_terms.csv: Regenerated
7220	01/14/2013 04:19 PM	Aaron Marcuse-Kubitza	inputs/Madidi/_archive/2010-1-2/: Set svn:ignore
7219	01/14/2013 04:18 PM	Aaron Marcuse-Kubitza	inputs/Madidi/_README.TXT: Archived to _archive/2010-1-2/
7218	01/14/2013 03:43 PM	Aaron Marcuse-Kubitza	inputs/Madidi/: Refreshed. Note that new export has a completely new schema.
7217	01/14/2013 03:42 PM	Aaron Marcuse-Kubitza	inputs/Madidi/: Refreshed. Note that new export has a completely new schema.
7216	01/14/2013 01:53 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps validation: %/new_terms.csv: Filter out terms that map to UNUSED, because these are not mappings that are useful as VegCore synonyms
7215	01/14/2013 01:18 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: fieldNumber (authorEventCode): Fixed bug where locationevent.authorlocationcode should be authoreventcode
7214	01/14/2013 12:19 PM	Aaron Marcuse-Kubitza	Added inputs/Madidi/map.csv, created from new_terms.csv
7213	01/14/2013 12:16 PM	Aaron Marcuse-Kubitza	inputs/Madidi/_archive/: Set svn:ignore
7212	01/14/2013 12:15 PM	Aaron Marcuse-Kubitza	csvs.py: sniff(): TSVs: Don't turn off quoting, because some TSVs (such as Madidi.IndividualObservation) do quote fields
7211	01/14/2013 12:13 PM	Aaron Marcuse-Kubitza	csvs.py: TsvReader: Use csv.reader.next() when possible to support quoted fields, such as in Madidi.IndividualObservation
7210	01/14/2013 11:43 AM	Aaron Marcuse-Kubitza	input.Makefile: Configuration: $(exts): Added .dat, which the new Madidi files use
7209	01/14/2013 08:39 AM	Aaron Marcuse-Kubitza	mappings/Makefile: VegCore.tables.csv: Removed no longer needed removal of Namespaces table, which is now marked as just a section, not a table
7208	01/14/2013 08:37 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Regenerated from wiki
7207	01/14/2013 07:39 AM	Aaron Marcuse-Kubitza	Added to_do/timeline.2013.xls (from Brad, converted to .xls)
7206	01/14/2013 07:30 AM	Aaron Marcuse-Kubitza	to_do/timeline.doc: Renamed to timeline.2012.doc to allow for a separate 2013 timeline
7205	01/11/2013 05:05 PM	Aaron Marcuse-Kubitza	README.TXT: Data import: Deleting imports before the last: Added instructions to keep a previous import instead of deleting it
7204	01/11/2013 04:22 PM	Aaron Marcuse-Kubitza	input.Makefile: Staging tables installation: $(logInstall): Always log the installation, regardless of the $log env var, because $log is set by default on development machines but an install log should still be created
7203	01/11/2013 01:03 PM	Aaron Marcuse-Kubitza	schemas/vegbien.ERD.mwb: Regenerated exports
7202	01/11/2013 10:19 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: unscrubbed_taxondetermination_view: Fixed bug where need to handle the case where (SELECT source.source_id FROM source WHERE source.shortname = 'TNRS') is NULL because no TNRS names have been imported yet
7201	01/11/2013 09:44 AM	Aaron Marcuse-Kubitza	/new_terms.csv, /unmapped_terms.csv: Regenerated using `make missing_mappings`
7200	01/11/2013 09:19 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: morphoname: Remapped to the original rather than current taxondetermination because this is the original name applied by the author
7199	01/11/2013 09:16 AM	Aaron Marcuse-Kubitza	inputs/SALVIAS*/Organism/map.csv: Remapped voucher_string/coll_number to recordNumber instead of catalogNumber, because this number is actually applied by the collector rather than by a herbarium
7198	01/11/2013 09:11 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped recordNumber to new specimenreplicate.collectionnumber
7197	01/11/2013 09:02 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Also map recordNumber (collectionnumber) to the indirect voucher's specimenreplicate
7196	01/11/2013 08:48 AM	Aaron Marcuse-Kubitza	inputs///map.csv: Remapped recordNumber to new individualCode where applicable
7195	01/11/2013 08:44 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped individualCode. authortaxoncode: Prefer tag over recordNumber (collectionnumber), because this applies to the plant rather than the specimen.
7194	01/11/2013 08:17 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Mapped morphoname
7193	01/11/2013 08:16 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Regenerated from wiki
7192	01/11/2013 08:14 AM	Aaron Marcuse-Kubitza	mappings/VegCore.csv: Regenerated from wiki
7191	01/11/2013 08:04 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonverbatim: Added morphoname (which is different from the morphospecies suffix)
7190	01/11/2013 07:33 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: plantobservation: Renamed collectionnumber to authorplantcode since this number, which identifies the plant, is actually different from the collectionnumber that identifies the specimen collected from it. This distinction is meaningful for plots data, but generally not for specimens data.
7189	01/11/2013 07:28 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: plantobservation: Renamed collectionnumber to authorplantcode since this number, which identifies the plant, is actually different from the collectionnumber that identifies the specimen collected from it. This distinction is meaningful for plots data, but generally not for specimens data.
7188	01/11/2013 07:23 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: specimenreplicate: Added collectionnumber
7187	01/11/2013 07:17 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonlabel: Removed no longer used matched_label_fit_fraction. Use taxondetermination.taxonfit instead.
7186	01/11/2013 07:02 AM	Aaron Marcuse-Kubitza	inputs///test.xml.ref: Restored inserted row counts, which had gotten auto-accepted from a test run on a non-empty DB
7185	01/11/2013 06:55 AM	Aaron Marcuse-Kubitza	schemas/vegbien.ERD.mwb: Expanded analytical_stem to fit the width of all fields
7184	01/11/2013 06:53 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxondetermination: taxondetermination_computer_min_fit CHECK constraint: Fixed bug where need to use CASE instead of OR when a branch of an OR shouldn't be evaluated, because PostgreSQL doesn't guarantee short-circuit evaluation of OR
7183	01/11/2013 06:38 AM	Aaron Marcuse-Kubitza	README.TXT: Debugging: Added instructions for "binary chop" debugging, which requires syncing the DB schema to the svn working copy
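
Note on r7243 (db_xml.py: put_table()): the following is a minimal sketch of why the start (OFFSET) should not advance between chunks when the input is a dynamic view. It is not the actual put_table() code. The names import_in_chunks, PendingView, fetch, and mark_done are hypothetical and exist only for this illustration, and the view is simulated with an in-memory list whose rows drop out once they have been imported.

```python
def import_in_chunks(fetch_chunk, process_rows, chunk_size, is_view):
    """Import rows chunk by chunk and return how many rows were processed.

    Hypothetical sketch: fetch_chunk(start, limit) stands in for a
    SELECT ... OFFSET start LIMIT limit query.
    """
    start = 0
    total = 0
    while True:
        rows = fetch_chunk(start, chunk_size)
        if not rows:
            break
        process_rows(rows)
        total += len(rows)
        if not is_view:
            # A static table keeps its rows, so skip past the ones just read.
            start += len(rows)
        # For a dynamic view, the processed rows are no longer returned,
        # so the next chunk again starts at offset 0.
    return total


class PendingView:
    """Simulates a view that only lists rows that are not yet imported."""

    def __init__(self, items):
        self.pending = list(items)

    def fetch(self, start, limit):
        return self.pending[start:start + limit]

    def mark_done(self, rows):
        done = set(rows)
        self.pending = [r for r in self.pending if r not in done]


if __name__ == "__main__":
    # Treating the input as a view: every row is imported.
    view = PendingView(range(10))
    print(import_in_chunks(view.fetch, view.mark_done, 3, is_view=True))
    print(view.pending)

    # Advancing the offset on the same kind of view: rows are skipped.
    view = PendingView(range(10))
    print(import_in_chunks(view.fetch, view.mark_done, 3, is_view=False))
    print(view.pending)
```

In this simulation, advancing the offset skips rows because each processed chunk shrinks the view, so the next offset lands past rows that were never read. Keeping the offset at 0 drains the view completely.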
