fix: *Makefile: changed line endings to \n so that `patch` can work with pasted input. use `svn di --extensions --ignore-eol-style` to verify no diff.
fix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: make(): added warning that this will truncate the geoscrub database tables
inputs/.geoscrub/geoscrub_output/run: documented postprocess() rm=1 runtime (6 min)
fix: inputs/.geoscrub/geoscrub_output/postprocess.sql: map_geovalidity(): unscrubbable names should actually be geo*in*valid, not geovalid=NULL, according to Brad
fix: inputs/.geoscrub/geoscrub_output/: added _no_import because these tables are metadata that is used in the analytical DB. this is better than relying on bin/import_all not to import these.
inputs/input.Makefile: add: verify/: also svn:ignore *.log
*{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly propagated to applicable invoked targets. (previously, $rm was propagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)
inputs/.geoscrub/county_centroids/test.xml.ref, inputs/.NCBI/{names.src,nodes.src}/test.xml.ref: accepted test outputs (generated now that these tables are in import_order.txt)
inputs/input.Makefile: add!: verify/: also svn:ignore *.tsv, *.txt
moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
inputs/.geoscrub/import_order.txt: added county_centroids so that it would be installed by new-style import
inputs/.geoscrub/run: documented import() runtime (20 min)
inputs/.geoscrub/geoscrub_output/run: import() runtime: added starscream runtime (20 min)
inputs/.geoscrub/geoscrub_output/run: documented import() runtime (15 min)
inputs/.geoscrub/Source/map.csv: source__modified_date: updated for current run
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: updated upload time (30 s)
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: export_(): updated runtime (25 s)
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: make(): derived/biengeo/geoscrub.sh: documented runtime (2.5 h)
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: don't connect to DB as the root user, because this is not needed now that the geoscrub schema is owned by the bien user. this avoids a sudo password prompt at the end of the geoscrubbing run.
inputs/.geoscrub/geoscrub_output/run: load_data(): updated runtime (4 min)
bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: invoking derived/biengeo/geoscrub.sh: need to split the input file into separate dir and filename parts, because $DATAFILE actually is just the filename, not the entire path, and will otherwise get prepended with the default value for $DATADIR
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: also run geoscrub.sh. added export_() target to run just the export of the result table separately.
fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
inputs/.geoscrub/geoscrub_output/postprocess.sql: added nullable unique index on the inputs, for use by analytical_stem_view. note that it must be nullable in order to create a match when not all of the input fields are populated. this uses array[] to create a nullable index, which is much better than the COALESCE approach used by column-based import and VegBIEN, because the expression is the same for every type and no NULL sentinel value is needed.
inputs/.geoscrub/geoscrub_output/postprocess.sql: added geovalid derived column, for use by analytical_stem_view
inputs/.geoscrub/geoscrub_output/postprocess.sql, run: updated runtimes
inputs/.geoscrub/geoscrub_output/run: documented full load_data() runtime (9 min @starscream)
inputs/.geoscrub/geoscrub_output/postprocess.sql: updated runtimes for refreshed data, which now has 4x as many rows (1,707,970->6,747,650)
inputs/.geoscrub/geoscrub_output/: refreshed geoscrub data. removed +header.csv because the extract now contains the header in the first row of the file.
bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: include only the columns that Jim provided in his extract (the geoscrub table contains additional internal columns that are not part of the geovalidation data for VegBIEN). documented runtime (30 s) and upload time (1.5 min).
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: removed no longer needed setting of $local_server, $local_user (and use of $local_pg_database instead of $database) because the use_local bug in local.sh has been fixed
bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: need to manually set local_server, local_user to "" so that they do not default to their bien-user values
bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: need to set $local_pg_database instead of $database because use_local (in psql()) does not currently avoid clobbering already-set versions of the applicable env vars
added inputs/.geoscrub/geoscrub_output/geoscrub.csv.run to export the geoscrub table (must be run on vegbiendev)
inputs/.geoscrub/_src/README.TXT: added e-mail from John Donoghue with general description of the BIEN2 geovalidation workflow
inputs/.geoscrub/_src/README.TXT: added link to geovalidation description in wiki
inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix
inputs/.geoscrub/Source/map.csv: mapped datasetURL
fix: mappings/VegCore-VegBIEN.csv: source__modified_date: remapped to pubdate instead of datelastmodified because this is actually metadata for the source itself, rather than for the VegBIEN record of the source
fix: inputs/.geoscrub/Source/map.csv: source__modified_date: use the mtime of the CSV file instead, since this is closer to the actual version of the biengeo code at the time it was run
inputs/.geoscrub/Source/map.csv: mapped source__modified_date. note that the test must be run with inputs/.geoscrub/Source/run instead of `make inputs/.geoscrub/Source/test` to add these metadata columns to the staging table.
mappings/VegCore.htm: regenerated from wiki. added source__version (= edition), source__modified_date.
mappings/VegCore-VegBIEN.csv: mapped edition
bugfix: inputs/.geoscrub/{Source,geoscrub_output}/VegBIEN.csv: switched to the version needed for new-style datasources
inputs/.geoscrub/Source/map.csv: mapped edition (the version), using `svn info derived/biengeo/`
inputs/.geoscrub/: switched to new-style import, using the steps at wiki.vegpath.org/Adding_new-style_import_to_a_datasource
inputs/.geoscrub/geoscrub_output/: translated single-column filters to postprocessing derived columns, using the steps at wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns
bugfix: inputs/*/*/postprocess.sql: made all operations idempotent, so that postprocess.sql can be run repeatedly (e.g. by new-style import)
inputs/*/*/map.csv for CSV tables with a row_num column: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table
bugfix: inputs/*/Source/map.csv: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table. the staging table column renaming is now used by all Source tables.
inputs/*/: added table.run for use by the table subdirs in new-style import. datasources without table subdirs do not need this.
inputs/*/: added top-level Makefile which includes inputs/input.Makefile, so that make can be run directly on the datasrc dir without needing to specify `--makefile=../input.Makefile` (see input.Makefile $(selfMake))
bugfix: inputs/*/Source/: use installed staging table (with blank-line data.csv) in order to also work with new-style import. this also fixes a benign diff between the by-row and by-col test outputs, where row-based import would not import the Source/ entries because there was not at least one row in the input. note that in order to ensure that all datasources are properly run, you need to check `svn st|sort` against the datasource schema names to see if any are missing.
added inputs/*/*/header.csv for CSV inputs, which are now generated by inputs/input.Makefile %/install
inputs/.geoscrub/geoscrub_output/postprocess.sql: set decimallatitude, decimallongitude types to double precision to facilitate joining with other double precision values
inputs/.geoscrub/geoscrub_output/postprocess.sql: coords index: added rest of input columns so this can be used to check the existence of a result by input. added runtime (55 s). use idempotent create_if_not_exists().
bugfix: mappings/VegCore-VegBIEN.csv: place.geovalid: added missing /1 after _alt
schemas/vegbien.sql: place.geovalid: added latLongDomainValid to the values to _and together
mappings/VegCore-VegBIEN.csv: place.geovalid: use false instead of NULL
inputs/.geoscrub/geoscrub_output/map.csv: *validity: added definitions of the numeric codes from _src/README.TXT
inputs/input.Makefile: SVN: add, %/add: */logs: also svn:ignore *.gz, used for compressed log files
inputs/input.Makefile: %/.map.csv.last_cleanup: Run fix_line_endings after canon/translate to standardize Python's \r\n line endings back to \n. This prevents issues with mixed line endings because LibreOffice (and probably Excel) treat all cell-internal line endings as \n but row line endings as whatever the file had, while text editors like jEdit translate all line endings to whatever the autodetected line ending is. (This creates spurious line ending diffs when a map spreadsheet containing multiline cells is edited in a text editor.)
inputs/.geoscrub/import_order.txt: Added Source
mappings/VegCore-VegBIEN.csv: locationID->location.sourceaccessioncode: Removed restriction that this mapping can't occur if geovalidation information is present. The locationID is no longer mapped to the place.sourceaccessioncode, so this filter is not necessary.
inputs/.geoscrub/_src/README.TXT: Added e-mails from Jim about how the county_centroids data was generated
Added inputs/.geoscrub/county_centroids/ from Jim
inputs/.geoscrub/import_order.txt: Added geoscrub_output
inputs/.geoscrub/_src/README.TXT: Added dates for e-mails from Jim
inputs/.geoscrub/_src/README.TXT: Added e-mail from Jim about repository with scripts to generate the geoscrub_output table
Added inputs/.geoscrub/_src/geovalidity-table.txt, which was attached to Jim's geovalidation e-mail (provided in README.TXT)
inputs/*/Source/map.csv without mappings: Added referenceType, etc. mappings. This also ensures that the source table entry for the datasource will be created before the herbaria list is imported, causing all top-level datasources to sort at the top of the source table.
input.Makefile: SVN: add: verify: Also ignore *.xlsx
inputs/.geoscrub/geoscrub_output/map.csv: Removed no longer accurate comment that county is not yet used by VegBIEN
inputs/.geoscrub/geoscrub_output/map.csv: *validity: Remapped 2 ("Point is <=5km from putative GADM polygon, but still outside it") to true instead of false, because 5km is close enough to the polygon that the mismatch could result from shapefile simplification, boundary changes, or other factors that don't affect geovalidity
inputs/.geoscrub/geoscrub_output/map.csv: *validity: Remapped 0 ("Complete name provided, but couldn't be scrubbed to GADM") to NULL instead of false, because the absence of a name match does not mean the coordinates are invalid
input.Makefile: SVN: add: Add a Source table to store datasource metadata. This adds a Source table to all herbaria which are listed in .herbaria, and therefore didn't previously need a Source table to indicate their referenceType and sampleType.
inputs/input.Makefile: SVN: add: verify/: Added *.xls to svn:ignore
inputs/.geoscrub/geoscrub_output/postprocess.sql: Added index on decimallatitude, decimallongitude
Added inputs/.geoscrub/geoscrub_output/postprocess.sql, which adds NOT NULL constraints on decimallatitude, decimallongitude
Removed no longer used geoscrub.*.sql. Use geoscrub_output instead.
Removed no longer used geoscrub_cleaned_unique. Use geoscrub_output instead.
Removed no longer used geoscrub_cultivated. Use analytical_stem_view.cultivated instead.
db_xml.py: put(): _setDefault(): Support setting multiple col_defaults at once by using the param names themselves as the column names
mappings/VegCore-VegBIEN.csv: Set the source_id col_default to the datasource name using the new _setDefault() built-in function and _env()
inputs/.geoscrub/geoscrub_output/map.csv: Mapped to county, acceptedCounty
inputs/.geoscrub/geoscrub_output/map.csv: Mapped countyvalidity to latLongInCounty
schemas/vegbien.sql: place: Removed placecode to prevent datasources from creating duplicate entries for the same place, with different placecodes. This was a problem with the original BIEN2 geoscrub dataset, which contained duplicates.
Added inputs/.geoscrub/geoscrub_cleaned_unique/_no_import to disable geoscrub_cleaned_unique, since the new geoscrub_output supersedes it
Added inputs/.geoscrub/geoscrub_output/
Added inputs/.geoscrub/_src/README.TXT
Added inputs/.geoscrub/_src/ to store Jim's geoscrub CSV
inputs/.geoscrub/geoscrub_cultivated/create.sql: Fixed bug where NULL lat/longs need to be filtered out because primary keys can't contain NULL values
schemas/vegbien.sql: place: Renamed geosource_valid to geovalid. (It had gotten renamed in the reference -> source rename.)
schemas/vegbien.sql: Renamed reference -> source to make this table more broadly applicable, and because this now stores the datasource metadata
inputs/.geoscrub/import_order.txt: Fixed bug where geoscrub_cultivated needs to be installed after geoscrub_cleaned_unique, not before as it would be with the default alphabetical sort order
inputs/.geoscrub/geoscrub_cultivated/: Use _no_import file to exclude geoscrub_cultivated from the import, because it's used directly as a lookup table by analytical_stem rather than being imported. This ensures that there is no import log or input row count for geoscrub_cultivated in the import times, which would skew the import row count because the row count would be included even though no columns are mapped.
mappings/VegCore-VegBIEN.csv: matched place's coordinates: Fixed bug where coordinates entry itself needed to have its datasource (reference) set to geoscrub, in addition to the place entry that uses it, in order to match up properly with geoscrub's corresponding input place (whose coordinates as well as place are owned by the geoscrub datasource)