Name                  Size        Revision  Age              Author                 Comment
_archive                          1598      over 12 years    Aaron Marcuse-Kubitza  Moved _archive/tapir2flatClient/trunk/client/ t...
analysis                          3076      over 12 years    Aaron Marcuse-Kubitza  Added top-level analysis dir for range modeling
backups                           10267     over 11 years    Aaron Marcuse-Kubitza  backups/Makefile: %.backup/restore: documented ...
bin                               10283     over 11 years    Aaron Marcuse-Kubitza  bugfix: bin/*: spell out [:alnum:] as [a-zA-Z0-...
config                            7801      over 11 years    Aaron Marcuse-Kubitza  root Makefile: VegBIEN DB: mk_db: Added command...
exports                           9928      over 11 years    Aaron Marcuse-Kubitza  added exports/_archive/
inputs                            10339     over 11 years    Aaron Marcuse-Kubitza  inputs/REMIB/Specimen/: translated single-colum...
lib                               10302     over 11 years    Aaron Marcuse-Kubitza  lib/sql_io.py: put_table(): documented that Pos...
mappings                          10289     over 11 years    Aaron Marcuse-Kubitza  mappings/VegCore.htm: regenerated from wiki. Sp...
planning                          10311     over 11 years    Aaron Marcuse-Kubitza  planning/timeline/timeline.2013.xls: moved Indi...
schemas                           10329     over 11 years    Aaron Marcuse-Kubitza  bugfix: schemas/util.sql: not_empty(anyarray): ...
web                               10306     over 11 years    Aaron Marcuse-Kubitza  web/links/index.htm: updated to Firefox bookmar...
.htaccess             326 Bytes   8771      over 11 years    Aaron Marcuse-Kubitza  /.htaccess: use canonical URL without symlinks
.rsync_filter.upload  33 Bytes    10042     over 11 years    Aaron Marcuse-Kubitza  /.rsync_ignore: temp files: hide them on upload...
.rsync_ignore         12 Bytes    10042     over 11 years    Aaron Marcuse-Kubitza  /.rsync_ignore: temp files: hide them on upload...
Makefile              12.6 KB     10223     over 11 years    Aaron Marcuse-Kubitza  /Makefile: mysql-Linux: also install mysql-work...
README.TXT            23.2 KB     10286     over 11 years    Aaron Marcuse-Kubitza  /README.TXT: Maintenance: regenerate mappings/V...
fix_perms             97 Bytes    7560      over 11 years    Aaron Marcuse-Kubitza  Added root fix_perms
map                   1001 Bytes  6949      almost 12 years  Aaron Marcuse-Kubitza  vegbien_dest: Changed default $prefix to "", so...
new_terms.csv         38.1 KB     7222      almost 12 years  Aaron Marcuse-Kubitza  new_terms.csv: Regenerated
run                   433 Bytes   9916      over 11 years    Aaron Marcuse-Kubitza  /run: geoscrub_input/make(): documented runtime...
unmapped_terms.csv    13.1 KB     7201      almost 12 years  Aaron Marcuse-Kubitza  **/new_terms.csv, **/unmapped_terms.csv: Regene...

Latest revisions

# Date Author Comment
10339 07/19/2013 08:32 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/: translated single-column filters to postprocessing derived columns, using the steps at wiki.vegpath.org/Switching_to_new-style_import#stage-I-source-specific > "translate single-column filters to postprocessing derived columns". null-mapping filters now use wrappers around new util.map_nulls(). note that the verbatim columns input to the filters need to be renamed to avoid name collisions with their filtered columns, which must be VegCore terms for new-style import.

10338 07/19/2013 07:53 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/postprocess.sql: remove frameshifted rows: also filter out non-numbers for long_sec, lat_min, lat_sec

10337 07/19/2013 07:18 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/postprocess.sql: remove frameshifted rows: remove rows where long_min is not a number
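
Revisions 10337 and 10338 both delete rows whose coordinate fields are not numbers. The following is a minimal sketch of that kind of filter, using Python's sqlite3 module as a stand-in for PostgreSQL. The table layout, the sample rows, and the NUMBER pattern are assumptions made for illustration and are not taken from postprocess.sql.

```python
import re
import sqlite3

# Assumed definition of "a number": optional sign, digits, optional fraction.
NUMBER = r"^-?[0-9]+(\.[0-9]+)?$"

def regexp(pattern, value):
    """Back SQLite's `value REGEXP pattern` operator; NULL in, NULL out."""
    if value is None:
        return None
    return int(re.search(pattern, str(value)) is not None)

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE specimen (id INTEGER, long_min TEXT, lat_min TEXT)")
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?)",
    [
        (1, "30", "15"),      # both fields numeric: kept
        (2, "Oaxaca", "15"),  # text shifted into long_min: removed
        (3, "30", "N"),       # text shifted into lat_min: removed
        (4, None, "15"),      # NULL is not treated as a non-number: kept
    ],
)

# One DELETE per column, mirroring the one-filter-per-column commits.
for column in ("long_min", "lat_min"):
    conn.execute(f"DELETE FROM specimen WHERE NOT ({column} REGEXP ?)", (NUMBER,))
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT id FROM specimen ORDER BY id")]
```

In this sketch a NULL field makes the condition NULL rather than true, so the row is left in place.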

10336 07/19/2013 07:15 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/postprocess.sql: change E'' to regular '' to avoid the need to double \ (instead ' would be doubled). E'' used to be necessary in previous versions of PostgreSQL to avoid a warning about escape string syntax.
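
The difference between the two literal forms can be illustrated with a small sketch. The helper names below are hypothetical, and the regular '' form assumes standard_conforming_strings is on (the PostgreSQL default since 9.1), so that backslashes are taken literally. These helpers only show the escaping rules and are not a substitute for parameterized queries.

```python
def standard_literal(text):
    """Quote text as a regular '' literal: only ' needs doubling."""
    return "'" + text.replace("'", "''") + "'"

def escape_literal(text):
    """Quote text as an E'' literal: backslashes must be doubled as well."""
    return "E'" + text.replace("\\", "\\\\").replace("'", "''") + "'"

pattern = r"^\d+$"                   # a regexp containing one backslash
plain = standard_literal(pattern)    # backslash stays single
escaped = escape_literal(pattern)    # backslash is doubled
```

For a file made up mostly of regexp patterns, the regular form keeps the patterns readable as written.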

10335 07/19/2013 07:09 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/postprocess.sql: remove frameshifted rows: removed unnecessary () around `DELETE FROM :table WHERE long_deg ...`

10334 07/19/2013 07:03 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/postprocess.sql: removed coll_year, country, long_deg indexes because the frameshift filter conditions on these columns do not use index scans (because their regexp patterns do not contain a fixed prefix). eventually, some regexp patterns may be able to be modified to use prefixes.

10333 07/19/2013 07:01 AM Aaron Marcuse-Kubitza

bugfix: inputs/REMIB/Specimen/postprocess.sql: remove frameshifted rows: can't OR together conditions to determine rows to delete, because if any condition is NULL instead of true/false, this will NULL out the entire WHERE condition and prevent any other true conditions from causing a deletion. the best way to fix this is to use a separate DELETE statement for each condition, so that NULLs only impact that particular condition's DELETE. unlike using a modified, NULL-insensitive OR, which would prevent the use of index scans, this allows indexes to be used for conditions that support them.
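
The one-DELETE-per-condition approach can be sketched as follows, again using Python's sqlite3 module as a stand-in for PostgreSQL. The conditions, table layout, and sample rows are hypothetical and are not the actual patterns in postprocess.sql. Each condition is evaluated in its own statement, so a NULL in one column only affects the DELETE that tests that column.

```python
import sqlite3

# Hypothetical frameshift conditions, one per DELETE statement.
CONDITIONS = [
    "long_deg NOT BETWEEN -180 AND 180",  # e.g. a year shifted into long_deg
    "country GLOB '*[0-9]*'",             # e.g. digits shifted into country
]

def remove_frameshifted(conn, table="specimen"):
    """Run one DELETE per condition; return the rows removed by each."""
    removed = []
    for condition in CONDITIONS:
        cursor = conn.execute(f"DELETE FROM {table} WHERE {condition}")
        removed.append(cursor.rowcount)
    conn.commit()
    return removed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (id INTEGER, long_deg INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?)",
    [
        (1, -99, "Mexico"),   # valid row: kept
        (2, 1987, "Mexico"),  # out-of-range long_deg: removed
        (3, None, "1987"),    # NULL long_deg, digits in country: removed
        (4, None, None),      # every condition is NULL: kept
    ],
)

first = remove_frameshifted(conn)   # rows removed on the first run
second = remove_frameshifted(conn)  # rerunning removes nothing further
remaining = [row[0] for row in conn.execute("SELECT id FROM specimen ORDER BY id")]
```

Because each statement only deletes rows that match its own condition, the filter can be rerun safely, and a new condition can be appended to the list without rebuilding the table.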

10332 07/19/2013 06:05 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/postprocess.sql: removed duplicate CREATE INDEX for the acronym column

10331 07/19/2013 05:59 AM Aaron Marcuse-Kubitza

bugfix: inputs/REMIB/Specimen/postprocess.sql: switched back to the input column names, since the renaming to *_verbatim is part of a later step

10330 07/19/2013 05:26 AM Aaron Marcuse-Kubitza

inputs/REMIB/Specimen/create.sql: moved filtering out of frameshifted rows to postprocess.sql, where it can happen in an idempotent DELETE. this allows filters to remove additional rows to easily be added on top of the existing filters, without needing to remake Specimen (which takes a long time, because of the many stage I derived columns that get added). the logical inversion inherent in the DELETE condition has been factored through rather than wrapped in NOT (...), because removal of frameshifted rows is more accurately specified as the detection of specific patterns that indicate frameshifting rather than the validation of all fields.
