# Date Author Comment
7273 01/18/2013 12:52 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times

7272 01/18/2013 12:24 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Added Postprocessing section for use with the next import

7271 01/18/2013 12:05 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times. Total does not yet include postprocessing.

7270 01/17/2013 11:29 PM Aaron Marcuse-Kubitza

import_times: Add blank line before "Postprocessing logs" to separate it from the input logs

7269 01/17/2013 11:28 PM Aaron Marcuse-Kubitza

import_times: Separate out the postprocessing logs (e.g. public.unscrubbed_taxondetermination_view), as the import times in these logs are not aggregated together (each input has its own run of the postprocessing script)

7268 01/16/2013 02:55 PM Aaron Marcuse-Kubitza

root Makefile: Datasources: import: Use new import_scrub instead of import (input.Makefile)

7267 01/16/2013 02:51 PM Aaron Marcuse-Kubitza

import_all: Use new import_scrub (input.Makefile) instead of import, which avoids needing to start background processes for tnrs-remake and scrub-remake

7266 01/16/2013 02:50 PM Aaron Marcuse-Kubitza

inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Fixed bug: use tnrs.make's lockfile instead of a separate one, because the import must not run while tnrs.make is scrubbing. tnrs.make leaves tnrs in an incomplete state while running, because the accepted names are parsed after their matched names, so a separate lockfile would cause some accepted names to be missing.

7265 01/16/2013 02:27 PM Aaron Marcuse-Kubitza

input.Makefile: Import to VegBIEN: Added import_scrub, which runs `make scrub` after the import

7264 01/16/2013 02:26 PM Aaron Marcuse-Kubitza

root Makefile: Datasources: Added scrub, which runs tnrs-remake and scrub-remake

7263 01/16/2013 02:18 PM Aaron Marcuse-Kubitza

inputs/.TNRS/*/*.make: Only allow one instance of the script to be running at any time, by using new waitself

7262 01/16/2013 02:15 PM Aaron Marcuse-Kubitza

waitpid, lockfile: Changed $interval default to 5s to work with smaller imports, where less waiting is needed

7261 01/16/2013 02:14 PM Aaron Marcuse-Kubitza

Added waitself

7260 01/16/2013 02:11 PM Aaron Marcuse-Kubitza

bin/lockfile: Include the PID in the lockfile to avoid the need to manually remove lockfiles. On Mac, this requires using shlock instead of lockfile.
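
The PID-in-lockfile idea can be sketched as follows. This is a minimal Python illustration, not the contents of bin/lockfile (which wraps lockfile/shlock); the function names `pid_is_running` and `acquire_lock` are hypothetical. It assumes a POSIX system and does not try to be race-free when two processes clean up the same stale lock at once.

```python
import os

def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def acquire_lock(path):
    """Return True if the lock was taken, False if a live process holds it."""
    while True:
        try:
            # O_EXCL makes creation fail if the lockfile already exists
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                with open(path) as f:
                    holder = int(f.read().strip())
            except FileNotFoundError:
                continue  # the lock was released in between; retry
            except ValueError:
                return False  # holder has not written its PID yet
            if pid_is_running(holder):
                return False
            try:
                os.remove(path)  # stale lock: its owner is gone
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
```

Because the lockfile records its owner, a lock left behind by a crashed process is detected and replaced on the next attempt instead of needing manual removal.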

7259 01/16/2013 01:35 PM Aaron Marcuse-Kubitza

Added bin/lockfile

7258 01/16/2013 01:34 PM Aaron Marcuse-Kubitza

Added pid2name

7257 01/16/2013 01:33 PM Aaron Marcuse-Kubitza

Added name2pids

7256 01/16/2013 01:33 PM Aaron Marcuse-Kubitza

waitpid: Use `ps` instead of /proc to also work on Mac

7255 01/16/2013 01:07 PM Aaron Marcuse-Kubitza

inputs/.TNRS/tnrs/tnrs.make: Fixed bug where need special handling to support being run as a .make script

7254 01/16/2013 11:59 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/_src/README.TXT: Added dates for e-mails from Jim

7253 01/16/2013 11:57 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/_src/README.TXT: Added e-mail from Jim about repository with scripts to generate the geoscrub_output table

7252 01/16/2013 11:02 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: unscrubbed_taxondetermination_view: Fixed bug: test tnrs_accepted.Name_submitted IS NOT NULL rather than tnrs_accepted.* IS NOT NULL. PostgreSQL rewrites plain tnrs_accepted to tnrs_accepted.*, and IS NOT NULL on a row checks every field of the tuple rather than whether the tuple itself is NULL.
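
The difference between the two tests can be modeled in a few lines of Python. This is only a simulation of the SQL semantics described above, not code from the repository; `Some_optional_field` is a hypothetical second column, and None stands in for SQL NULL.

```python
from collections import namedtuple

# Name_submitted comes from the commit message; the second field is hypothetical
TnrsRow = namedtuple("TnrsRow", ["Name_submitted", "Some_optional_field"])

def row_is_not_null(row):
    """Model of SQL `row IS NOT NULL`: true only if every field is non-null."""
    return all(field is not None for field in row)

def name_submitted_is_not_null(row):
    """Model of `row.Name_submitted IS NOT NULL`: tests one column only."""
    return row.Name_submitted is not None

# A LEFT JOIN match whose optional field happens to be NULL
matched = TnrsRow("Poa annua", None)
# A LEFT JOIN miss: every field is NULL
unmatched = TnrsRow(None, None)

for row in (matched, unmatched):
    print(row, row_is_not_null(row), name_submitted_is_not_null(row))
```

The whole-row test reports the matched row as missing because one of its fields is NULL, while the single-column test separates matched rows from unmatched ones.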

7251 01/16/2013 10:23 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: Added back tnrs+accepted view, which is useful for debugging the import of the TNRS results

7250 01/16/2013 09:21 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/postprocess.sql: Added back ARIZ, NY because some REMIB specimens for these datasources are not yet in the datasources themselves

7249 01/16/2013 08:43 AM Aaron Marcuse-Kubitza

Added inputs/REMIB/Specimen/postprocess.sql to remove institutions that we have direct data for

7248 01/16/2013 08:43 AM Aaron Marcuse-Kubitza

Placed inputs/REMIB/_archive/ under version control

7247 01/16/2013 08:23 AM Aaron Marcuse-Kubitza

Added inputs/SpeciesLink/Specimen/postprocess.sql to remove institutions that we have direct data for

7246 01/16/2013 08:21 AM Aaron Marcuse-Kubitza

Placed inputs/SpeciesLink/_archive/ under version control

7245 01/16/2013 07:56 AM Aaron Marcuse-Kubitza

input.Makefile: $(import?): Renamed $public_import option to $full_import because it applies to any import of all datasources, not just a public import on vegbiendev

7244 01/16/2013 07:23 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: Changed `WHERE COALESCE` to a join condition to enable using the taxondetermination_single_current_determination index, which produces the filtered rows directly. Note that this index will not be used for full-database imports, because the query planner uses hash joins everywhere instead of nested loops.

7243 01/16/2013 06:47 AM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Fixed bug: for views, don't advance start (OFFSET clause) after each chunk, because views are typically dynamic and will contain a new set of rows once the first set is imported
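
A small simulation shows why advancing the offset skips rows on a dynamic view. This is a sketch under assumptions, not the put_table() code: `DynamicView` and `import_in_chunks` are hypothetical names, and the view is modeled as a list that only contains rows not yet imported.

```python
class DynamicView:
    """Stand-in for a view that lists only rows not yet imported."""

    def __init__(self, rows):
        self.pending = list(rows)

    def fetch(self, offset, limit):
        # Equivalent of SELECT ... OFFSET offset LIMIT limit
        return self.pending[offset:offset + limit]

    def mark_imported(self, rows):
        done = set(rows)
        self.pending = [r for r in self.pending if r not in done]

def import_in_chunks(view, chunk_size, advance_offset):
    """Import every chunk the view returns; return the number of rows imported."""
    start = 0
    total = 0
    while True:
        rows = view.fetch(start, chunk_size)
        if not rows:
            break
        view.mark_imported(rows)  # imported rows drop out of the view
        total += len(rows)
        if advance_offset:
            start += chunk_size  # correct for static tables, wrong for this view
    return total

fixed = import_in_chunks(DynamicView(range(10)), 3, advance_offset=False)
buggy = import_in_chunks(DynamicView(range(10)), 3, advance_offset=True)
print(fixed, buggy)  # 10 6
```

With the offset held at zero, each fetch returns the next unimported rows. With the offset advanced, the rows that slid into the earlier positions are never fetched.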

7242 01/16/2013 06:41 AM Aaron Marcuse-Kubitza

sql.py: Added view_exists()

7241 01/16/2013 06:16 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: Removed no longer used tnrs_canon. unscrubbed_taxondetermination_view uses its definition directly instead.

7240 01/16/2013 06:14 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: unscrubbed_taxondetermination_view: Added comment from tnrs_canon

7239 01/16/2013 06:12 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: unscrubbed_taxondetermination_view: Added comment from tnrs_canon

7238 01/16/2013 06:09 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: unscrubbed_taxondetermination_view: Do the tnrs_canon joins manually instead of using tnrs_canon, to allow PostgreSQL to use a nested loop join on just the needed tnrs rows instead of a hash self-join of all tnrs rows. The query planner is not yet advanced enough to automatically integrate the select on the view into the top-level joins list, which would make this change automatically.

7237 01/16/2013 05:52 AM Aaron Marcuse-Kubitza

inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: rowsAdded(): Look at last 100 rows instead of last 10, because rows are added to the log file each time the script waits and the Inserted # new rows message must be in the tailed rows

7236 01/16/2013 05:48 AM Aaron Marcuse-Kubitza

inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: rowsAdded(): Fixed bug: test that the log file exists before passing it to tail. If tail fails, rowsAdded returns a nonzero exit status that is indistinguishable from false (no rows added), and the script would keep going.
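
The two rowsAdded() fixes (a wider tail window, and keeping "file missing" distinct from "no rows added") might look like this in Python. The real function is shell code in scrub.make; `rows_added` is a hypothetical name, and the exact message format `Inserted <n> new rows` is an assumption based on the wording of the commit messages.

```python
import re
from collections import deque

def rows_added(log_path, tail_lines=100):
    """Return True if the most recent 'Inserted N new rows' message has N > 0.

    Raises FileNotFoundError if the log is missing and LookupError if the
    message is not within the tailed lines, so neither case is mistaken
    for "no rows added".
    """
    with open(log_path) as f:  # raises FileNotFoundError if absent
        tail = deque(f, maxlen=tail_lines)  # keep only the last lines
    for line in reversed(tail):
        match = re.search(r"Inserted (\d+) new rows", line)
        if match:
            return int(match.group(1)) > 0
    raise LookupError("no 'Inserted # new rows' message in the tailed lines")
```

Raising exceptions for the error cases plays the role that the existence check plays in the shell version, where a failing tail and a false result share the same exit status.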

7235 01/16/2013 05:40 AM Aaron Marcuse-Kubitza

inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Fixed bug where need special handling to support being run as a .make script

7234 01/16/2013 03:35 AM Aaron Marcuse-Kubitza

input.Makefile: Editing import: Added unscrub to remove TNRS taxondeterminations

7233 01/16/2013 03:34 AM Aaron Marcuse-Kubitza

psql_script_vegbien: Added no_query_results option to hide results of calls to void functions

7232 01/16/2013 03:33 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added delete_scrubbed_taxondeterminations()

7231 01/16/2013 01:43 AM Aaron Marcuse-Kubitza

root Makefile: python-Darwin: Added instructions to install dateutil for Python 3 as well as Python 2, for use in PL/Python functions

7230 01/16/2013 01:42 AM Aaron Marcuse-Kubitza

root Makefile: python-Darwin: Added note that Python 2 comes preinstalled

7229 01/16/2013 01:15 AM Aaron Marcuse-Kubitza

Added inputs/GBIF/Specimen/postprocess.sql to remove institutions that we have direct data for

7228 01/15/2013 10:42 PM Aaron Marcuse-Kubitza

import_all: Run disown_all after background processes have been created, so that they will not be aborted if the shell exits (e.g. due to a broken connection). Note that with_all processes are automatically disowned as they are created, but other processes, such as after_import, were not.

7227 01/14/2013 05:21 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: Removed no longer used array_to_string(). The IMMUTABLE wrapper is only needed for index conditions and other places that require an IMMUTABLE function.

7226 01/14/2013 05:14 PM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: %/new_terms.csv: Filter out terms that map to UNUSED, because these are not mappings that are useful as VegCore synonyms

7225 01/14/2013 05:13 PM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: %/new_terms.csv: Filter out terms that map to UNUSED, because these are not mappings that are useful as VegCore synonyms

7224 01/14/2013 05:12 PM Aaron Marcuse-Kubitza

README.TXT: Data import: Checking free disk space: Updated import schema size to 110GB

7223 01/14/2013 04:37 PM Aaron Marcuse-Kubitza

Added inputs/Madidi/_README.TXT

7222 01/14/2013 04:35 PM Aaron Marcuse-Kubitza

new_terms.csv: Regenerated

7221 01/14/2013 04:34 PM Aaron Marcuse-Kubitza

inputs/Madidi/new_terms.csv: Regenerated

7220 01/14/2013 04:19 PM Aaron Marcuse-Kubitza

inputs/Madidi/_archive/2010-1-2/: Set svn:ignore

7219 01/14/2013 04:18 PM Aaron Marcuse-Kubitza

inputs/Madidi/_README.TXT: Archived to _archive/2010-1-2/

7218 01/14/2013 03:43 PM Aaron Marcuse-Kubitza

inputs/Madidi/: Refreshed. Note that new export has a completely new schema.

7217 01/14/2013 03:42 PM Aaron Marcuse-Kubitza

inputs/Madidi/: Refreshed. Note that new export has a completely new schema.

7216 01/14/2013 01:53 PM Aaron Marcuse-Kubitza

input.Makefile: Maps validation: %/new_terms.csv: Filter out terms that map to UNUSED, because these are not mappings that are useful as VegCore synonyms

7215 01/14/2013 01:18 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: fieldNumber (authorEventCode): Fixed bug where locationevent.authorlocationcode should be authoreventcode

7214 01/14/2013 12:19 PM Aaron Marcuse-Kubitza

Added inputs/Madidi/map.csv, created from new_terms.csv

7213 01/14/2013 12:16 PM Aaron Marcuse-Kubitza

inputs/Madidi/_archive/: Set svn:ignore

7212 01/14/2013 12:15 PM Aaron Marcuse-Kubitza

csvs.py: sniff(): TSVs: Don't turn off quoting, because some TSVs (such as Madidi.IndividualObservation) do quote fields

7211 01/14/2013 12:13 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader: Use csv.reader.next() when possible to support quoted fields, such as in Madidi.IndividualObservation
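
The effect of leaving quoting on for TSVs can be seen with the standard csv module. This sketch uses Python 3 rather than the Python 2 API named above, and the sample data is made up for illustration.

```python
import csv
import io

# A TSV row whose second field is quoted and contains a tab
data = 'id\tnote\n1\t"tab\there"\n'

# Quoting left on (the csv module default): the quoted field stays whole
quoted = list(csv.reader(io.StringIO(data), delimiter="\t"))

# Quoting turned off: the quotes are kept literally and the field is split
unquoted = list(csv.reader(io.StringIO(data), delimiter="\t",
                           quoting=csv.QUOTE_NONE))

print(quoted[1])    # ['1', 'tab\there']
print(unquoted[1])  # ['1', '"tab', 'here"']
```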

7210 01/14/2013 11:43 AM Aaron Marcuse-Kubitza

input.Makefile: Configuration: $(exts): Added .dat, which the new Madidi files use

7209 01/14/2013 08:39 AM Aaron Marcuse-Kubitza

mappings/Makefile: VegCore.tables.csv: Removed no longer needed removal of Namespaces table, which is now marked as just a section, not a table

7208 01/14/2013 08:37 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Regenerated from wiki

7207 01/14/2013 07:39 AM Aaron Marcuse-Kubitza

Added to_do/timeline.2013.xls (from Brad, converted to .xls)

7206 01/14/2013 07:30 AM Aaron Marcuse-Kubitza

to_do/timeline.doc: Renamed to timeline.2012.doc to allow for a separate 2013 timeline

7205 01/11/2013 05:05 PM Aaron Marcuse-Kubitza

README.TXT: Data import: Deleting imports before the last: Added instructions to keep a previous import instead of deleting it

7204 01/11/2013 04:22 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: $(logInstall): Always log the installation, regardless of the $log env var, because $log is set by default on development machines but an install log should still be created

7203 01/11/2013 01:03 PM Aaron Marcuse-Kubitza

schemas/vegbien.ERD.mwb: Regenerated exports

7202 01/11/2013 10:19 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: unscrubbed_taxondetermination_view: Fixed bug where need to handle the case where (SELECT source.source_id FROM source WHERE source.shortname = 'TNRS') is NULL because no TNRS names have been imported yet

7201 01/11/2013 09:44 AM Aaron Marcuse-Kubitza

*/new_terms.csv, */unmapped_terms.csv: Regenerated using `make missing_mappings`

7200 01/11/2013 09:19 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: morphoname: Remapped to the original rather than current taxondetermination because this is the original name applied by the author

7199 01/11/2013 09:16 AM Aaron Marcuse-Kubitza

inputs/SALVIAS*/Organism/map.csv: Remapped voucher_string/coll_number to recordNumber instead of catalogNumber, because this number is actually applied by the collector rather than by a herbarium

7198 01/11/2013 09:11 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped recordNumber to new specimenreplicate.collectionnumber

7197 01/11/2013 09:02 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Also map recordNumber (collectionnumber) to the indirect voucher's specimenreplicate

7196 01/11/2013 08:48 AM Aaron Marcuse-Kubitza

inputs/*/*/map.csv: Remapped recordNumber to new individualCode where applicable

7195 01/11/2013 08:44 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped individualCode. authortaxoncode: Prefer tag over recordNumber (collectionnumber), because this applies to the plant rather than the specimen.

7194 01/11/2013 08:17 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped morphoname

7193 01/11/2013 08:16 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Regenerated from wiki

7192 01/11/2013 08:14 AM Aaron Marcuse-Kubitza

mappings/VegCore.csv: Regenerated from wiki

7191 01/11/2013 08:04 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonverbatim: Added morphoname (which is different from the morphospecies suffix)

7190 01/11/2013 07:33 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: plantobservation: Renamed collectionnumber to authorplantcode since this number, which identifies the plant, is actually different from the collectionnumber that identifies the specimen collected from it. This distinction is meaningful for plots data, but generally not for specimens data.

7189 01/11/2013 07:28 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: plantobservation: Renamed collectionnumber to authorplantcode since this number, which identifies the plant, is actually different from the collectionnumber that identifies the specimen collected from it. This distinction is meaningful for plots data, but generally not for specimens data.

7188 01/11/2013 07:23 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: specimenreplicate: Added collectionnumber

7187 01/11/2013 07:17 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel: Removed no longer used matched_label_fit_fraction. Use taxondetermination.taxonfit instead.

7186 01/11/2013 07:02 AM Aaron Marcuse-Kubitza

inputs/*/*/test.xml.ref: Restored inserted row counts, which had gotten auto-accepted from a test run on a non-empty DB

7185 01/11/2013 06:55 AM Aaron Marcuse-Kubitza

schemas/vegbien.ERD.mwb: Expanded analytical_stem to fit the width of all fields

7184 01/11/2013 06:53 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxondetermination: taxondetermination_computer_min_fit CHECK constraint: Fixed bug: use CASE instead of OR when a branch of the OR must not be evaluated, because PostgreSQL does not guarantee short-circuit evaluation of OR

7183 01/11/2013 06:38 AM Aaron Marcuse-Kubitza

README.TXT: Debugging: Added instructions for "binary chop" debugging, which requires syncing the DB schema to the svn working copy

7182 01/11/2013 06:08 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Removed no longer used mappings for verbatimScientificName in _if conditions

7181 01/11/2013 06:08 AM Aaron Marcuse-Kubitza

inputs/.NCBI/nodes/test.xml.ref: Restored inserted row counts, which had gotten auto-accepted from a test run on a non-empty DB

7180 01/11/2013 06:06 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): DuplicateKeyException: Uniquifying input table to avoid internal duplicate keys: Also filter out duplicate rows in the out_table, so that they don't create duplicate key errors and the resulting index holes

7179 01/11/2013 06:01 AM Aaron Marcuse-Kubitza

sql.py: distinct_table(): Added support for custom joins used in creating the new table. This can then be used by sql_io.put_table() to filter out duplicate rows in the out_table, so that they don't create duplicate key errors and the resulting index holes.
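
The filtering described in these two entries can be sketched with an in-memory SQLite database standing in for PostgreSQL. The table and column names are hypothetical, and this is not the query that sql.distinct_table() or sql_io.put_table() generates; it only illustrates uniquifying the input and joining against the output table so that no duplicate key is ever attempted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE out_table (name TEXT PRIMARY KEY);
CREATE TABLE in_table (name TEXT);
INSERT INTO out_table VALUES ('a');
INSERT INTO in_table VALUES ('a'), ('b'), ('b'), ('c');
""")

# DISTINCT removes duplicates within the input; the LEFT JOIN with an
# IS NULL filter removes rows whose key already exists in the output.
conn.execute("""
INSERT INTO out_table (name)
SELECT DISTINCT in_table.name
FROM in_table
LEFT JOIN out_table ON out_table.name = in_table.name
WHERE out_table.name IS NULL
""")

names = [row[0] for row in
         conn.execute("SELECT name FROM out_table ORDER BY name")]
print(names)  # ['a', 'b', 'c']
```

Since no insert fails, no sequence values are consumed by rejected rows, which is what avoids the index holes mentioned above.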

7178 01/11/2013 05:53 AM Aaron Marcuse-Kubitza

README.TXT: Documentation: Redmine-formatted list of steps for column-based import: Added step to reinstall public schema first, to reset the sequences so that they don't create a diff when the new steps.by_col.log.sql is committed

7177 01/11/2013 05:48 AM Aaron Marcuse-Kubitza

Added inputs/ACAD/Specimen/logs/steps.by_col.log.sql

7176 01/11/2013 05:45 AM Aaron Marcuse-Kubitza

sql_gen.py: Join: Added support for mapping values which are lists, for use in USING joins

7175 01/11/2013 05:40 AM Aaron Marcuse-Kubitza

inputs/SALVIAS/*/test.xml.ref: Restored SALVIAS* inserted row counts, which had gotten auto-accepted from a test run on a non-empty DB

7174 01/11/2013 05:01 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem: Added locationName (authorPlotCode), subplot, individualCode (authorPlantCode) for use in validation