bugfix: inputs/import.stats.xls: need to exclude postprocessing from the ms/row and Change formulas, also for the "<2014-2-2" tab
inputs/import.stats.xls: updated import times
bugfix: inputs/import.stats.xls: need to exclude postprocessing from the ms/row and Change formulas. removed deleted rows that don't apply to the most recent imports. updated runtime formulas to match bin/import_all.
bugfix: inputs/import.stats.xls: restored missing formatting for multi-day times. prepped tabs to have new import stats data added.
moved everything into /trunk/ to create the standard svn layout, for use with tools that require it (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
inputs/import.stats.xls: removed table names from datasources where only one table is imported
fix: inputs/import.stats.xls: removed deleted tables from current import
inputs/import.stats.xls: analytical DB: updated row count
inputs/*/*/test.xml.ref: updated source.shortname for the new datasource name, which now starts out with a .new suffix
inputs/import.stats.xls: added backup MD5 test time for the last import
inputs/import.stats.xls: added backup upload time for the last import
inputs/import.stats.xls: added backup times from the last import
inputs/import.stats.xls: Updated import times
fix: inputs/import.stats.xls: removed spurious diff comment on total time, which only applied to the previous import
inputs/import.stats.xls: reformatted times longer than one day as a # of days instead of hours, for clarity. the days format is chosen automatically when the time exceeds 24 hours.
inputs/import.stats.xls: Postprocessing: populated entries for the analytical DB for the last 4 imports, and for the backup and backup test for the last import. note that the combined import time for the last import is 3.5 days, compared to 3 days for the column-based import portion alone.
inputs/import.stats.xls: Postprocessing: added (empty) entries for analytical DB, backup, and backup test
inputs/import.stats.xls: Updated import times. GBIF has been refreshed (with the range modeling column subset), and column-based import now takes 3 days for 88.4 million rows.
inputs/import.stats.xls: Removed the previous imports from the current tab because they are also in the 2012-6~9 tab, and should not be in two places
inputs/import.stats.xls: Updated import times. MO and FIA have been refreshed.
inputs/import.stats.xls: Updated import times. The core import time has dropped by more than half (!) to ~12 hours, now that TNRS scrubbing is done with a simple LEFT JOIN instead of being pushed through the normalized schema. Not since October has the import been this fast!
inputs/import.stats.xls: Updated import times using the import_times bugfix for times longer than a day
inputs/import.stats.xls: Added Postprocessing section for use with the next import
inputs/import.stats.xls: Updated import times. Total does not yet include postprocessing.
inputs/import.stats.xls: Moved CTFS to Deleted section
inputs/import.stats.xls: Reformatted so the first by column import and the comparison by row import will fit on the same page when printed on portrait-mode letter paper
inputs/import.stats.xls: Changed import type labels to By row/By column so they would fit into one field, leaving the extra field free to contain the revision #
Renamed inputs/NCU-NCSC/ to NCU because this is the primary herbarium contained in the data
inputs/import.stats.xls: Added a separate tab with stats for 2012-6~9. The Excel .xls format only supports 256 columns, so previous imports had been silently truncated. Note that once the 2012-10 imports reach the last column, a new tab will need to be created for the 2012-10+ imports.
inputs/import.stats.xls: Updated import times. This now includes the Canadensys plants-related datasources HIBG, JBM, QFA, TRT, TRTE, UBC, VASCAN, and WIN.
inputs/import.stats.xls: Updated import times. Fixed input row counts and import times to include derived data, such as TNRS and geoscrub, which adds to the import time and therefore should be considered in the import's speed. (TNRS was already being included in the import time for some, but not all, imports.)
inputs/import.stats.xls: Updated import times. The TNRS import has slowed down significantly, possibly due to a bug in the autopopulation of the taxonlabel_relationship table when the input data contains cycles.
inputs/import.stats.xls: Updated import times. This now includes the half-hour-long pre-import of the TNRS taxonomic names (which the datasources then match up with), as well as the concatenation of the datasource's taxonomic name components to create or match up with the TNRS input name.
inputs/import.stats.xls: Copied the Change factor formula to all rows (it displays an empty string for rows that don't have both a row-based and a column-based import)
inputs/import.stats.xls: Updated with stats from latest import
inputs/import.stats.xls: Updated with stats from latest import. Corrected the input row count of CTFS.TaxonOccurrence, which had mistakenly been set to the inserted row count (the line right above it in the log file).
inputs/import.stats.xls: Updated with stats from latest import. This now includes CTFS.TaxonOccurrence (presence-only observations), FIA (11 million rows!), and Madidi.Organism. The addition of FIA almost doubles the # of rows to 26 million and increases the import time from 9.5 to 11.5 hours.
inputs/import.stats.xls: Updated with stats from latest import. This now includes the core CTFS tables.
inputs/import.stats.xls: Updated with stats from latest import. The import time for SpeciesLink (the slowest datasource) went back down to 9 hours after replacing the slower _merge with _alt.
inputs/import.stats.xls: Updated with stats from latest import. The import time for SpeciesLink (the slowest datasource) doubled, to 16 hours, most likely due to replacing _alt with the slower _merge, which preserves more input data.
inputs/import.stats.xls: Updated with stats from latest import. Note that the import now includes additional date parsing on all date fields, which adds half an hour to an hour to the import time. Eventually, we will want to translate _date() to PL/pgSQL and only use the extra date processing if PostgreSQL's cast to timestamp fails, which should greatly reduce this time.