/trunk/inputs/.geoscrub/geoscrub_output - Changes - BIEN 3 - NCEAS Projects

root/trunk/inputs/.geoscrub/geoscrub_output @ 14222

svn:ignore: *

#	Date	Author	Comment
13975	07/11/2014 07:34 AM	Aaron Marcuse-Kubitza	fix: inputs/.geoscrub/geoscrub_output/: added _no_import because these tables are metadata that is used in the analytical DB. this is better than relying on bin/import_all not to import these.
12968	03/29/2014 04:06 AM	Aaron Marcuse-Kubitza	*{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly propagated to applicable invoked targets. (previously, $rm was propagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
11792	11/27/2013 09:24 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/run: import() runtime: added starscream runtime (20 min)
11790	11/27/2013 08:33 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/run: documented import() runtime (15 min)
11786	11/26/2013 11:07 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: updated upload time (30 s)
11785	11/26/2013 11:00 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: export_(): updated runtime (25 s)
11782	11/26/2013 09:57 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: make(): derived/biengeo/geoscrub.sh: documented runtime (2.5 h)
11781	11/26/2013 09:45 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: don't connect to DB as the root user, because this is not needed now that the geoscrub schema is owned by the bien user. this avoids a sudo password prompt at the end of the geoscrubbing run.
11595	11/07/2013 04:00 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/run: load_data(): updated runtime (4 min)
11593	11/07/2013 08:34 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: invoking derived/biengeo/geoscrub.sh: need to split the input file into separate dir and filename parts, because $DATAFILE actually is just the filename, not the entire path, and will otherwise get prepended with the default value for $DATADIR
11592	11/06/2013 04:57 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: also run geoscrub.sh. added export_() target to run just the export of the result table separately.
11396	10/21/2013 07:14 PM	Aaron Marcuse-Kubitza	fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
11388	10/20/2013 04:37 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/postprocess.sql: added nullable unique index on the inputs, for use by analytical_stem_view. note that it must be nullable in order to create a match when not all of the input fields are populated. this uses array[] to create a nullable index, which is much better than column-based import and VegBIEN's use of COALESCE because the expression is the same for every type and no NULL sentinel value is needed.
11375	10/20/2013 12:48 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/postprocess.sql: added geovalid derived column, for use by analytical_stem_view
11369	10/19/2013 01:29 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/postprocess.sql, run: updated runtimes
11368	10/19/2013 01:13 AM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/run: documented full load_data() runtime (9 min @starscream)
11367	10/19/2013 01:12 AM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/postprocess.sql: updated runtimes for refreshed data, which now has 4x as many rows (1,707,970->6,747,650)
11366	10/19/2013 12:54 AM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/: refreshed geoscrub data. removed +header.csv because the extract now contains the header in the first row of the file.
11364	10/19/2013 12:27 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: include only the columns that Jim provided in his extract (the geoscrub table contains additional internal columns that are not part of the geovalidation data for VegBIEN). documented runtime (30 s) and upload time (1.5 min).
11363	10/18/2013 10:33 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: removed no longer needed setting of $local_server, $local_user (and use of $local_pg_database instead of $database) because the use_local bug in local.sh has been fixed
11361	10/18/2013 09:55 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: need to manually set local_server, local_user to "" so that they do not default to their bien-user values
11354	10/18/2013 06:13 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: need to set $local_pg_database instead of $database because use_local (in psql()) does not currently avoid clobbering already-set versions of the applicable env vars
11351	10/18/2013 05:30 PM	Aaron Marcuse-Kubitza	added inputs/.geoscrub/geoscrub_output/geoscrub.csv.run to export the geoscrub table (must be run on vegbiendev)
10866	09/04/2013 11:06 PM	Aaron Marcuse-Kubitza	inputs///test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix
10713	08/22/2013 04:36 PM	Aaron Marcuse-Kubitza	bugfix: inputs/.geoscrub/{Source,geoscrub_output}/VegBIEN.csv: switched to the version needed for new-style datasources
10390	07/24/2013 12:44 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/: switched to new-style import, using the steps at wiki.vegpath.org/Adding_new-style_import_to_a_datasource
10389	07/24/2013 12:15 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/: translated single-column filters to postprocessing derived columns, using the steps at wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns
10209	07/10/2013 02:32 AM	Aaron Marcuse-Kubitza	inputs///map.csv for CSV tables with a row_num column: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table
10091	06/27/2013 12:28 PM	Aaron Marcuse-Kubitza	added inputs///header.csv for CSV inputs, which are now generated by inputs/input.Makefile %/install
9921	06/19/2013 09:11 AM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/postprocess.sql: set decimallatitude, decimallongitude types to double precision to facilitate joining with other double precision values
9920	06/19/2013 09:02 AM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/postprocess.sql: coords index: added rest of input columns so this can be used to check the existence of a result by input. added runtime (55 s). use idempotent create_if_not_exists().
9459	05/17/2013 06:00 PM	Aaron Marcuse-Kubitza	bugfix: mappings/VegCore-VegBIEN.csv: place.geovalid: added missing /1 after _alt
9415	05/16/2013 04:15 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place.geovalid: added latLongDomainValid to the values to _and together
9413	05/16/2013 04:06 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: place.geovalid: use false instead of NULL
9404	05/16/2013 02:24 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/map.csv: *validity: added definitions of the numeric codes from _src/README.TXT
8801	05/02/2013 08:53 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: SVN: add, %/add: /logs: also svn:ignore .gz, used for compressed log files
8176	03/25/2013 09:01 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: %/.map.csv.last_cleanup: Run fix_line_endings after canon/translate to standardize Python's \r\n line endings back to \n. This prevents issues with mixed line endings because LibreOffice (and probably Excel) treat all cell-internal line endings as \n but row line endings as whatever the file had, while text editors like jEdit translate all line endings to whatever the autodetected line ending is. (This creates spurious line ending diffs when a map spreadsheet containing multiline cells is edited in a text editor.)
7464	02/05/2013 03:40 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: locationID->location.sourceaccessioncode: Removed restriction that this mapping can't occur if geovalidation information is present. The locationID is no longer mapped to the place.sourceaccessioncode, so this filter is not necessary.
6665	12/06/2012 08:58 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/map.csv: Removed no longer accurate comment that county is not yet used by VegBIEN
6664	12/06/2012 08:56 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/map.csv: *validity: Remapped 2 ("Point is <=5km from putative GADM polygon, but still outside it") to true instead of false, because 5km is close enough to the polygon that the mismatch could result from shapefile simplifying, boundary changes, or other factors that don't affect geovalidity
6663	12/06/2012 08:52 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/map.csv: *validity: Remapped 0 ("Complete name provided, but couldn't be scrubbed to GADM") to NULL instead of false, because the absence of a name match does not mean the coordinates are invalid
6658	12/06/2012 08:33 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/postprocess.sql: Added index on decimallatitude, decimallongitude
6657	12/06/2012 08:30 PM	Aaron Marcuse-Kubitza	Added inputs/.geoscrub/geoscrub_output/postprocess.sql, which adds NOT NULL constraints on decimallatitude, decimallongitude
6406	11/24/2012 07:50 AM	Aaron Marcuse-Kubitza	db_xml.py: put(): _setDefault(): Support setting multiple col_defaults at once by using the param names themselves as the column names
6403	11/24/2012 07:29 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: Set the source_id col_default to the datasource name using the new _setDefault() built-in function and _env()
6280	11/19/2012 02:53 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/map.csv: Mapped to county, acceptedCounty
6272	11/19/2012 01:25 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/map.csv: Mapped countyvalidity to latLongInCounty
6265	11/19/2012 11:48 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: place: Removed placecode to prevent datasources from creating duplicate entries for the same place, with different placecodes. This was a problem with the original BIEN2 geoscrub dataset, which contained duplicates.
6237	11/16/2012 12:47 PM	Aaron Marcuse-Kubitza	Added inputs/.geoscrub/geoscrub_output/
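
Revision 11593 above notes that $DATAFILE holds only the filename, so a full input path has to be split into separate directory and filename parts before invoking derived/biengeo/geoscrub.sh. A minimal sketch of that split, assuming a POSIX shell; the helper name split_input_path and the example path are hypothetical and are not taken from geoscrub.csv.run:

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): split a full input path into
# the separate directory and filename parts described in r11593.
split_input_path() {
    # $1: full path to the input file
    DATADIR=$(dirname "$1")    # directory part only
    DATAFILE=$(basename "$1")  # filename part only, no leading directories
}

# Example path is an assumption for illustration.
split_input_path /tmp/geoscrub/input.csv
echo "DATADIR=$DATADIR DATAFILE=$DATAFILE"
```

Setting both variables explicitly avoids the failure mode the commit describes, where a full path placed in $DATAFILE would be prepended with the default $DATADIR.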
