  • svn:ignore: *

# Date Author Comment
14579 08/26/2014 02:52 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/run: documented postprocess() rm=1 runtime (6 min)

14556 08/21/2014 07:31 PM Aaron Marcuse-Kubitza

fix: inputs/.geoscrub/geoscrub_output/postprocess.sql: map_geovalidity(): unscrubbable names should actually be geo*in*valid, not geovalid=NULL, according to Brad

13975 07/11/2014 07:34 AM Aaron Marcuse-Kubitza

fix: inputs/.geoscrub/geoscrub_output/: added _no_import because these tables are metadata that is used in the analytical DB. this is better than relying on bin/import_all not to import these.

12968 03/29/2014 04:06 AM Aaron Marcuse-Kubitza

*{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly propagated to applicable invoked targets. (previously, $rm was propagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)

11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

11792 11/27/2013 09:24 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/run: import() runtime: added starscream runtime (20 min)

11790 11/27/2013 08:33 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/run: documented import() runtime (15 min)

11786 11/26/2013 11:07 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: updated upload time (30 s)

11785 11/26/2013 11:00 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: export_(): updated runtime (25 s)

11782 11/26/2013 09:57 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: make(): derived/biengeo/geoscrub.sh: documented runtime (2.5 h)

11781 11/26/2013 09:45 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: don't connect to DB as the root user, because this is not needed now that the geoscrub schema is owned by the bien user. this avoids a sudo password prompt at the end of the geoscrubbing run.

11595 11/07/2013 04:00 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/run: load_data(): updated runtime (4 min)

11593 11/07/2013 08:34 AM Aaron Marcuse-Kubitza

bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: invoking derived/biengeo/geoscrub.sh: need to split the input file into separate dir and filename parts, because $DATAFILE actually is just the filename, not the entire path, and will otherwise get prepended with the default value for $DATADIR
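
The dir/filename split described in this bugfix can be sketched as follows. This is only an illustration in Python; the actual runscript is a shell script whose code is not shown in this log. The helper name split_datafile and the example path are hypothetical, and only $DATADIR and $DATAFILE come from the commit message.

```python
import os.path

def split_datafile(path):
    """Split a full input path into (directory, filename) parts.

    Hypothetical helper: the directory part would be passed as $DATADIR
    and the filename part as $DATAFILE.
    """
    datadir, datafile = os.path.split(path)
    # os.path.split returns "" for the directory when the path has no
    # slash; fall back to the current directory in that case
    return (datadir or ".", datafile)

# hypothetical input path, for illustration only
datadir, datafile = split_datafile("/tmp/geoscrub/input.csv")
```

Passing the two parts separately avoids the problem named above, where a full path in $DATAFILE gets prepended with the default $DATADIR.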

11592 11/06/2013 04:57 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: also run geoscrub.sh. added export_() target to run just the export of the result table separately.

11396 10/21/2013 07:14 PM Aaron Marcuse-Kubitza

fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error

11388 10/20/2013 04:37 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql: added nullable unique index on the inputs, for use by analytical_stem_view. note that it must be nullable in order to create a match when not all of the input fields are populated. this uses array[] to create a nullable index, which is much better than column-based import and VegBIEN's use of COALESCE because the expression is the same for every type and no NULL sentinel value is needed.

11375 10/20/2013 12:48 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql: added geovalid derived column, for use by analytical_stem_view

11369 10/19/2013 01:29 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql, run: updated runtimes

11368 10/19/2013 01:13 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/run: documented full load_data() runtime (9 min @starscream)

11367 10/19/2013 01:12 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql: updated runtimes for refreshed data, which now has 4x as many rows (1,707,970->6,747,650)

11366 10/19/2013 12:54 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/: refreshed geoscrub data. removed +header.csv because the extract now contains the header in the first row of the file.

11364 10/19/2013 12:27 AM Aaron Marcuse-Kubitza

bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: include only the columns that Jim provided in his extract (the geoscrub table contains additional internal columns that are not part of the geovalidation data for VegBIEN). documented runtime (30 s) and upload time (1.5 min).

11363 10/18/2013 10:33 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: removed no longer needed setting of $local_server, $local_user (and use of $local_pg_database instead of $database) because the use_local bug in local.sh has been fixed

11361 10/18/2013 09:55 PM Aaron Marcuse-Kubitza

bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: need to manually set local_server, local_user to "" so that they do not default to their bien-user values

11354 10/18/2013 06:13 PM Aaron Marcuse-Kubitza

bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: need to set $local_pg_database instead of $database because use_local (in psql()) does not currently avoid clobbering already-set versions of the applicable env vars

11351 10/18/2013 05:30 PM Aaron Marcuse-Kubitza

added inputs/.geoscrub/geoscrub_output/geoscrub.csv.run to export the geoscrub table (must be run on vegbiendev)

10866 09/04/2013 11:06 PM Aaron Marcuse-Kubitza

inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix

10713 08/22/2013 04:36 PM Aaron Marcuse-Kubitza

bugfix: inputs/.geoscrub/{Source,geoscrub_output}/VegBIEN.csv: switched to the version needed for new-style datasources

10390 07/24/2013 12:44 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/: switched to new-style import, using the steps at wiki.vegpath.org/Adding_new-style_import_to_a_datasource

10389 07/24/2013 12:15 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/: translated single-column filters to postprocessing derived columns, using the steps at wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns

10209 07/10/2013 02:32 AM Aaron Marcuse-Kubitza

inputs/*/*/map.csv for CSV tables with a row_num column: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table

10091 06/27/2013 12:28 PM Aaron Marcuse-Kubitza

added inputs/*/*/header.csv for CSV inputs, which are now generated by inputs/input.Makefile %/install

9921 06/19/2013 09:11 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql: set decimallatitude, decimallongitude types to double precision to facilitate joining with other double precision values

9920 06/19/2013 09:02 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql: coords index: added rest of input columns so this can be used to check the existence of a result by input. added runtime (55 s). use idempotent create_if_not_exists().
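
What "idempotent" means for the index creation can be sketched as follows. This is an illustration only: it uses SQLite's built-in IF NOT EXISTS clause through Python's sqlite3 module, not the project's create_if_not_exists() function, whose definition is not shown in this log. The table name, index name, and the country column are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# assumed table layout; only decimallatitude and decimallongitude
# are named in this log
conn.execute(
    "CREATE TABLE geoscrub_output "
    "(decimallatitude REAL, decimallongitude REAL, country TEXT)"
)

def create_coords_index(conn):
    # IF NOT EXISTS makes the statement a no-op when the index already exists
    conn.execute(
        "CREATE INDEX IF NOT EXISTS geoscrub_output_coords "
        "ON geoscrub_output (decimallatitude, decimallongitude, country)"
    )

create_coords_index(conn)
create_coords_index(conn)  # running it again succeeds and changes nothing

# the second column of each PRAGMA index_list row is the index name
index_names = [row[1] for row in conn.execute("PRAGMA index_list('geoscrub_output')")]
```

Because the second call does nothing, a postprocessing script written this way can be re-run without first dropping the index.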

9459 05/17/2013 06:00 PM Aaron Marcuse-Kubitza

bugfix: mappings/VegCore-VegBIEN.csv: place.geovalid: added missing /1 after _alt

9415 05/16/2013 04:15 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place.geovalid: added latLongDomainValid to the values to _and together

9413 05/16/2013 04:06 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: place.geovalid: use false instead of NULL

9404 05/16/2013 02:24 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/map.csv: *validity: added definitions of the numeric codes from _src/README.TXT

8801 05/02/2013 08:53 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: SVN: add, %/add: */logs: also svn:ignore *.gz, used for compressed log files

8176 03/25/2013 09:01 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: %/.map.csv.last_cleanup: Run fix_line_endings after canon/translate to standardize Python's \r\n line endings back to \n. This prevents issues with mixed line endings because LibreOffice (and probably Excel) treat all cell-internal line endings as \n but row line endings as whatever the file had, while text editors like jEdit translate all line endings to whatever the autodetected line ending is. (This creates spurious line ending diffs when a map spreadsheet containing multiline cells is edited in a text editor.)
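
The normalization step can be sketched as follows. This is an assumption about what such a step does, not the project's fix_line_endings tool, whose implementation is not shown in this log. The sample CSV text is made up for the example.

```python
def fix_line_endings(text):
    # Convert Windows-style \r\n row endings to \n; existing \n endings
    # (such as those inside multiline cells) are left untouched
    return text.replace("\r\n", "\n")

# a row ending in \r\n, followed by a quoted cell containing an internal \n
mixed = 'a,b\r\n"x\ny",z\r\n'
normalized = fix_line_endings(mixed)
```

After this step every line ending in the file is \n, so a text editor no longer has a mixed file to autodetect and rewrite.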

7464 02/05/2013 03:40 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: locationID->location.sourceaccessioncode: Removed restriction that this mapping can't occur if geovalidation information is present. The locationID is no longer mapped to the place.sourceaccessioncode, so this filter is not necessary.

6665 12/06/2012 08:58 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/map.csv: Removed no longer accurate comment that county is not yet used by VegBIEN

6664 12/06/2012 08:56 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/map.csv: *validity: Remapped 2 ("Point is <=5km from putative GADM polygon, but still outside it") to true instead of false, because 5km is close enough to the polygon that the mismatch could result from shapefile simplifying, boundary changes, or other factors that don't affect geovalidity
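
The *validity remapping as of revisions 6663 and 6664 can be sketched as a lookup table. This is an illustration only: the real mapping lives in map.csv, and the names VALIDITY_TO_GEOVALID and map_validity are hypothetical. Only the two codes quoted in this log are included; the other codes are defined in _src/README.TXT and are not reproduced here. Note that revision 14556 later changed unscrubbable names to invalid rather than NULL.

```python
# Only the two codes quoted in this log; None stands in for SQL NULL
VALIDITY_TO_GEOVALID = {
    0: None,  # complete name provided, but couldn't be scrubbed to GADM
    2: True,  # point is <=5km from putative GADM polygon, but still outside it
}

def map_validity(code):
    """Return the geovalid value for a numeric validity code (hypothetical helper)."""
    if code not in VALIDITY_TO_GEOVALID:
        # codes not quoted in this log are deliberately not guessed at
        raise ValueError(f"validity code {code} is not covered by this sketch")
    return VALIDITY_TO_GEOVALID[code]
```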

6663 12/06/2012 08:52 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/map.csv: *validity: Remapped 0 ("Complete name provided, but couldn't be scrubbed to GADM") to NULL instead of false, because the absence of a name match does not mean the coordinates are invalid

6658 12/06/2012 08:33 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql: Added index on decimallatitude, decimallongitude

6657 12/06/2012 08:30 PM Aaron Marcuse-Kubitza

Added inputs/.geoscrub/geoscrub_output/postprocess.sql, which adds NOT NULL constraints on decimallatitude, decimallongitude

6406 11/24/2012 07:50 AM Aaron Marcuse-Kubitza

db_xml.py: put(): _setDefault(): Support setting multiple col_defaults at once by using the param names themselves as the column names

6403 11/24/2012 07:29 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Set the source_id col_default to the datasource name using the new _setDefault() built-in function and _env()

6280 11/19/2012 02:53 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/map.csv: Mapped to county, acceptedCounty

6272 11/19/2012 01:25 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/map.csv: Mapped countyvalidity to latLongInCounty

6265 11/19/2012 11:48 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place: Removed placecode to prevent datasources from creating duplicate entries for the same place, with different placecodes. This was a problem with the original BIEN2 geoscrub dataset, which contained duplicates.

6237 11/16/2012 12:47 PM Aaron Marcuse-Kubitza

Added inputs/.geoscrub/geoscrub_output/