| Name | Size | Revision | Age | Author | Comment |
|---|---|---|---|---|---|
| Source | | 11396 | about 11 years | Aaron Marcuse-Kubitza | fix: bin/map: put template: comment out the "Pu... |
| Specimen | | 12516 | over 10 years | Aaron Marcuse-Kubitza | bugfix: *.sql: public.source_by_shortname(): ne... |
| Specimen.src | | 10091 | over 11 years | Aaron Marcuse-Kubitza | added inputs/*/*/header.csv for CSV inputs, whi... |
| _archive | | 7737 | over 11 years | Aaron Marcuse-Kubitza | Added inputs/REMIB/_archive/remib_raw.0.header.... |
| logs | | 8801 | over 11 years | Aaron Marcuse-Kubitza | inputs/input.Makefile: SVN: add, %/add: */logs:... |
| verify | | 12018 | over 10 years | Aaron Marcuse-Kubitza | inputs/input.Makefile: add!: verify/: also svn:... |
| Makefile | 27 Bytes | 10177 | over 11 years | Aaron Marcuse-Kubitza | inputs/*/: added top-level Makefile which inclu... |
| import_order.txt | 16 Bytes | 6394 | almost 12 years | Aaron Marcuse-Kubitza | Added inputs/REMIB/Source/, containing referenc... |
| new_terms.csv | 323 Bytes | 11788 | almost 11 years | Aaron Marcuse-Kubitza | **/new_terms.csv, unmapped_terms.csv updated (u... |
| run | 87 Bytes | 10349 | over 11 years | Aaron Marcuse-Kubitza | inputs/REMIB/: switched to new-style import, us... |
| table.run | 81 Bytes | 10179 | over 11 years | Aaron Marcuse-Kubitza | inputs/*/: added table.run for use by the table... |
| unmapped_terms.csv | 261 Bytes | 11788 | almost 11 years | Aaron Marcuse-Kubitza | **/new_terms.csv, unmapped_terms.csv updated (u... |
  • svn:ignore: *

Latest revisions

# Date Author Comment
12516 02/27/2014 01:27 PM Aaron Marcuse-Kubitza

bugfix: *.sql: public.source_by_shortname(): need to wrap it in a nested SELECT because Postgres incorrectly does not constant-fold (inline) it, leading to a slowdown when it is therefore run many times. this is done using the steps at wiki.vegpath.org/Postgres_queries#wrap-function-call-in-nested-SELECT .

12018 02/02/2014 12:49 AM Aaron Marcuse-Kubitza

inputs/input.Makefile: add!: verify/: also svn:ignore *.tsv, *.txt

11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

11788 11/26/2013 11:11 PM Aaron Marcuse-Kubitza

**/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`)

11396 10/21/2013 07:14 PM Aaron Marcuse-Kubitza

fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error

11107 09/29/2013 08:58 PM Aaron Marcuse-Kubitza

bugfix: mappings/VegCore-VegBIEN.csv: nest all taxonoccurrences inside a stratum event, so that the parent locationevent is always fully populated before child locationevents point to it. (previously, a stub parent event was created when the child event was imported first, which blocked the fully-populated parent event from being inserted later on.) this uses auto-folding (for VegBank/CVS) and auto-forwarding (for other datasources) to prune empty stratum events for taxonoccurrences that don't have strata. (see wiki.vegpath.org/Auto-folding, wiki.vegpath.org/Auto-forwarding for more info about these normalization techniques.) note that the inserted row counts stay exactly the same for all datasources except VegBank (which was being fixed), indicating that this significant change to the mappings did not change the semantics of the import of taxonoccurrences.

10866 09/04/2013 11:06 PM Aaron Marcuse-Kubitza

inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix

10377 07/20/2013 05:09 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/postprocess.sql: map_nulls() derived cols: documented total runtime (7.5 min on vegbiendev)

10376 07/20/2013 05:07 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/postprocess.sql: map_nulls() derived cols: updated runtimes for map_nulls() inlining, which created a speed improvement of 7x for the numeric columns and 2.5x for the text columns (292563.362->41929.772 ms and 83640.424->35690.797 ms, respectively). note that the map_nulls__coord__*() calls could be optimized further by combining the successive map_nulls() calls into one, with the hstores merged.

10361 07/20/2013 01:27 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/postprocess.sql: map_nulls__*(): turned off STRICT to allow dynamic inlining, which speeds up the mk_derived_col() statements by 5x (342799.823 ms -> 71533.252 ms (6 min -> 1 min) for latitude_sec)
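
The speedups quoted in revisions 10376 and 10361 can be rechecked from the raw timings given in those comments. The following is a minimal sketch of that arithmetic only; the helper names `speedup` and `to_minutes` and the dictionary labels are illustrative and do not come from the repository. By this calculation the ratios come out at roughly 7x (numeric columns), roughly 2.3x (text columns) and roughly 4.8x (latitude_sec), so the quoted figures appear to be rounded.

```python
# Timings in milliseconds, copied from the revision 10376 and 10361 comments.
# The labels are descriptive only (hypothetical names, not repository identifiers).
timings_ms = {
    "map_nulls() numeric columns (r10376)": (292563.362, 41929.772),
    "map_nulls() text columns (r10376)": (83640.424, 35690.797),
    "mk_derived_col() latitude_sec (r10361)": (342799.823, 71533.252),
}


def speedup(before_ms, after_ms):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_ms / after_ms


def to_minutes(ms):
    """Convert milliseconds to minutes."""
    return ms / 60000.0


if __name__ == "__main__":
    for label, (before, after) in timings_ms.items():
        print(
            f"{label}: {speedup(before, after):.2f}x "
            f"({to_minutes(before):.1f} min -> {to_minutes(after):.1f} min)"
        )
```
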
