fix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: make(): added warning that this will truncate the geoscrub database tables
inputs/.geoscrub/geoscrub_output/run: documented postprocess() rm=1 runtime (6 min)
fix: inputs/.geoscrub/geoscrub_output/postprocess.sql: map_geovalidity(): unscrubbable names should actually be geo*in*valid, not geovalid=NULL, according to Brad
fix: inputs/.geoscrub/geoscrub_output/: added _no_import because these tables are metadata that is used in the analytical DB. this is better than relying on bin/import_all not to import these.
*{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly progagated to applicable invoked targets. (previously, $rm was progagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)
moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
inputs/.geoscrub/geoscrub_output/run: import() runtime: added starscream runtime (20 min)
inputs/.geoscrub/geoscrub_output/run: documented import() runtime (15 min)
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: updated upload time (30 s)
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: export_(): updated runtime (25 s)
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: make(): derived/biengeo/geoscrub.sh: documented runtime (2.5 h)
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: don't connect to DB as the root user, because this is not needed now that the geoscrub schema is owned by the bien user. this avoids a sudo password prompt at the end of the geoscrubbing run.
inputs/.geoscrub/geoscrub_output/run: load_data(): updated runtime (4 min)
bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: invoking derived/biengeo/geoscrub.sh: need to split the input file into separate dir and filename parts, because $DATAFILE actually is just the filename, not the entire path, and will otherwise get prepended with the default value for $DATADIR
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: also run geoscrub.sh. added export_() target to run just the export of the result table separately.
fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
inputs/.geoscrub/geoscrub_output/postprocess.sql: added nullable unique index on the inputs, for use by analytical_stem_view. note that it must be nullable in order to create a match when not all of the input fields are populated. this uses array[] to create a nullable index, which is much better than column-based import and VegBIEN's use of COALESCE because the expression is the same for every type and no NULL sentinel value is needed.
inputs/.geoscrub/geoscrub_output/postprocess.sql: added geovalid derived column, for use by analytical_stem_view
inputs/.geoscrub/geoscrub_output/postprocess.sql, run: updated runtimes
inputs/.geoscrub/geoscrub_output/run: documented full load_data() runtime (9 min @starscream)
inputs/.geoscrub/geoscrub_output/postprocess.sql: updated runtimes for refreshed data, which now has 4x as many rows (1,707,970->6,747,650)
inputs/.geoscrub/geoscrub_output/: refreshed geoscrub data. removed +header.csv because the extract now contains the header in the first row of the file.
bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: include only the columns that Jim provided in his extract (the geoscrub table contains additional internal columns that are not part of the geovalidation data for VegBIEN). documented runtime (30 s) and upload time (1.5 min).
inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: removed no longer needed setting of $local_server, $local_user (and use of $local_pg_database instead of $database) because the use_local bug in local.sh has been fixed
bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: need to manually set local_server, local_user to "" so that they do not default to their bien-user values
bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: need to set $local_pg_database instead of $database because use_local (in psql()) does not currently avoid clobbering already-set versions of the applicable env vars
added inputs/.geoscrub/geoscrub_output/geoscrub.csv.run to export the geoscrub table (must be run on vegbiendev)
inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix
bugfix: inputs/.geoscrub/{Source,geoscrub_output}/VegBIEN.csv: switched to the version needed for new-style datasources
inputs/.geoscrub/: switched to new-style import, using the steps at wiki.vegpath.org/Adding_new-style_import_to_a_datasource
inputs/.geoscrub/geoscrub_output/: translated single-column filters to postprocessing derived columns, using the steps at wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns
inputs/*/*/map.csv for CSV tables with a row_num column: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table
added inputs/*/*/header.csv for CSV inputs, which are now generated by inputs/input.Makefile %/install
inputs/.geoscrub/geoscrub_output/postprocess.sql: set decimallatitude, decimallongitude types to double precision to facilitate joining with other double precision values
inputs/.geoscrub/geoscrub_output/postprocess.sql: coords index: added rest of input columns so this can be used to check the existence of a result by input. added runtime (55 s). use idempotent create_if_not_exists().
bugfix: mappings/VegCore-VegBIEN.csv: place.geovalid: added missing /1 after _alt
schemas/vegbien.sql: place.geovalid: added latLongDomainValid to the values to _and together
mappings/VegCore-VegBIEN.csv: place.geovalid: use false instead of NULL
inputs/.geoscrub/geoscrub_output/map.csv: *validity: added definitions of the numeric codes from _src/README.TXT
inputs/input.Makefile: SVN: add, %/add: */logs: also svn:ignore *.gz, used for compressed log files
inputs/input.Makefile: %/.map.csv.last_cleanup: Run fix_line_endings after canon/translate to standardize Python's \r\n line endings back to \n. This prevents issues with mixed line endings because LibreOffice (and probably Excel) treat all cell-internal line endings as \n but row line endings as whatever the file had, while text editors like jEdit translate all line endings to whatever the autodetected line ending is. (This creates spurious line ending diffs when a map spreadsheet containing multiline cells is edited in a text editor.)
mappings/VegCore-VegBIEN.csv: locationID->location.sourceaccessioncode: Removed restriction that this mapping can't occur if geovalidation information is present. The locationID is no longer mapped to the place.sourceaccessioncode, so this filter is not necessary.
inputs/.geoscrub/geoscrub_output/map.csv: Removed no longer accurate comment that county is not yet used by VegBIEN
inputs/.geoscrub/geoscrub_output/map.csv: *validity: Remapped 2 ("Point is <=5km from putative GADM polygon, but still outside it") to true instead of false, because 5km is close enough to the polygon that the mismatch could result from shapefile simplifying, boundary changes, or other factors that don't affect geovalidity
inputs/.geoscrub/geoscrub_output/map.csv: *validity: Remapped 0 ("Complete name provided, but couldn't be scrubbed to GADM") to NULL instead of false, because the absence of a name match does not mean the coordinates are invalid
inputs/.geoscrub/geoscrub_output/postprocess.sql: Added index on decimallatitude, decimallongitude
Added inputs/.geoscrub/geoscrub_output/postprocess.sql, which adds NOT NULL constraints on decimallatitude, decimallongitude
db_xml.py: put(): _setDefault(): Support setting multiple col_defaults at once by using the param names themselves as the column names
mappings/VegCore-VegBIEN.csv: Set the source_id col_default to the datasource name using the new _setDefault() built-in function and _env()
inputs/.geoscrub/geoscrub_output/map.csv: Mapped to county, acceptedCounty
inputs/.geoscrub/geoscrub_output/map.csv: Mapped countyvalidity to latLongInCounty
schemas/vegbien.sql: place: Removed placecode to prevent datasources from creating duplicate entries for the same place, with different placecodes. This was a problem with the original BIEN2 geoscrub dataset, which contained duplicates.
Added inputs/.geoscrub/geoscrub_output/