
# Date Author Comment
12029 02/02/2014 11:10 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/county_centroids/test.xml.ref, inputs/.NCBI/{names.src,nodes.src}/test.xml.ref: accepted test outputs (generated now that these tables are in import_order.txt)

12018 02/02/2014 12:49 AM Aaron Marcuse-Kubitza

inputs/input.Makefile: add!: verify/: also svn:ignore *.tsv, *.txt

11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT run `svn up`. instead, reuse your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

11873 12/09/2013 04:16 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/import_order.txt: added county_centroids so that it would be installed by new-style import

11864 12/06/2013 06:56 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/run: documented import() runtime (20 min)

11792 11/27/2013 09:24 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/run: import() runtime: added starscream runtime (20 min)

11790 11/27/2013 08:33 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/run: documented import() runtime (15 min)

11789 11/26/2013 11:18 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/Source/map.csv: source__modified_date: updated for current run

11786 11/26/2013 11:07 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: updated upload time (30 s)

11785 11/26/2013 11:00 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: export_(): updated runtime (25 s)

11782 11/26/2013 09:57 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: make(): derived/biengeo/geoscrub.sh: documented runtime (2.5 h)

11781 11/26/2013 09:45 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: don't connect to DB as the root user, because this is not needed now that the geoscrub schema is owned by the bien user. this avoids a sudo password prompt at the end of the geoscrubbing run.

11595 11/07/2013 04:00 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/run: load_data(): updated runtime (4 min)

11593 11/07/2013 08:34 AM Aaron Marcuse-Kubitza

bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: invoking derived/biengeo/geoscrub.sh: need to split the input file into separate dir and filename parts, because $DATAFILE actually is just the filename, not the entire path, and will otherwise get prepended with the default value for $DATADIR
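
The dir/filename split described in this fix can be sketched as follows. This is a minimal Python illustration rather than the project's shell code; the helper name `split_datafile` and the example path are hypothetical.

```python
import os.path

def split_datafile(path):
    """Split a full input path into (directory, filename) parts.

    Hypothetical helper: the directory would be passed as $DATADIR and the
    bare filename as $DATAFILE, so that no default directory is prepended.
    """
    datadir, datafile = os.path.split(path)
    return datadir, datafile

# Example path is an assumption, not taken from the repository.
datadir, datafile = split_datafile("/tmp/geoscrub/input.csv")
```

Passing both parts explicitly avoids relying on the default value of $DATADIR.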

11592 11/06/2013 04:57 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: also run geoscrub.sh. added export_() target to run just the export of the result table separately.

11396 10/21/2013 07:14 PM Aaron Marcuse-Kubitza

fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error

11388 10/20/2013 04:37 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql: added nullable unique index on the inputs, for use by analytical_stem_view. note that it must be nullable in order to create a match when not all of the input fields are populated. this uses array[] to create a nullable index, which is much better than column-based import and VegBIEN's use of COALESCE because the expression is the same for every type and no NULL sentinel value is needed.
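
The idea of matching on a composite key even when some input fields are missing can be illustrated with an in-memory analogue. This is a Python sketch only, not the PostgreSQL index itself: a dict keyed by the tuple of inputs stands in for the unique index, None stands in for NULL, and the field values and function names are hypothetical.

```python
# In-memory analogue only: the dict plays the role of the unique index, and the
# tuple plays the role of the array[] expression over the input columns.
index = {}

def make_key(*fields):
    """Build one composite key; None elements compare equal to each other."""
    return tuple(fields)

def insert_unique(row_id, *fields):
    """Add a row, rejecting a second row with the same inputs (None included)."""
    key = make_key(*fields)
    if key in index:
        raise ValueError("duplicate inputs: %r" % (key,))
    index[key] = row_id

# Hypothetical inputs: the third field is not populated.
insert_unique(1, "US", "Arizona", None)
match = index.get(make_key("US", "Arizona", None))  # lookup still finds row 1
```

Because the key is built the same way for every field type, no per-type sentinel value is needed to stand in for a missing field.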

11375 10/20/2013 12:48 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql: added geovalid derived column, for use by analytical_stem_view

11369 10/19/2013 01:29 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql, run: updated runtimes

11368 10/19/2013 01:13 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/run: documented full load_data() runtime (9 min @starscream)

11367 10/19/2013 01:12 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql: updated runtimes for refreshed data, which now has 4x as many rows (1,707,970->6,747,650)

11366 10/19/2013 12:54 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/: refreshed geoscrub data. removed +header.csv because the extract now contains the header in the first row of the file.

11364 10/19/2013 12:27 AM Aaron Marcuse-Kubitza

bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: include only the columns that Jim provided in his extract (the geoscrub table contains additional internal columns that are not part of the geovalidation data for VegBIEN). documented runtime (30 s) and upload time (1.5 min).

11363 10/18/2013 10:33 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: removed no longer needed setting of $local_server, $local_user (and use of $local_pg_database instead of $database) because the use_local bug in local.sh has been fixed

11361 10/18/2013 09:55 PM Aaron Marcuse-Kubitza

bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: need to manually set local_server, local_user to "" so that they do not default to their bien-user values

11354 10/18/2013 06:13 PM Aaron Marcuse-Kubitza

bugfix: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: need to set $local_pg_database instead of $database because use_local (in psql()) does not currently avoid clobbering already-set versions of the applicable env vars

11351 10/18/2013 05:30 PM Aaron Marcuse-Kubitza

added inputs/.geoscrub/geoscrub_output/geoscrub.csv.run to export the geoscrub table (must be run on vegbiendev)

11168 10/08/2013 12:36 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/_src/README.TXT: added e-mail from John Donoghue with general description of the BIEN2 geovalidation workflow

11164 10/03/2013 12:25 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/_src/README.TXT: added link to geovalidation description in wiki

10866 09/04/2013 11:06 PM Aaron Marcuse-Kubitza

inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix

10723 08/23/2013 11:43 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/Source/map.csv: mapped datasetURL

10720 08/22/2013 06:12 PM Aaron Marcuse-Kubitza

fix: mappings/VegCore-VegBIEN.csv: source__modified_date: remapped to pubdate instead of datelastmodified because this is actually metadata for the source itself, rather than for the VegBIEN record of the source

10719 08/22/2013 05:56 PM Aaron Marcuse-Kubitza

fix: inputs/.geoscrub/Source/map.csv: source__modified_date: use the mtime of the CSV file instead, since this is closer to the actual version of the biengeo code at the time it was run

10718 08/22/2013 05:41 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/Source/map.csv: mapped source__modified_date. note that the test must be run with inputs/.geoscrub/Source/run instead of `make inputs/.geoscrub/Source/test` to add these metadata columns to the staging table.

10716 08/22/2013 05:36 PM Aaron Marcuse-Kubitza

mappings/VegCore.htm: regenerated from wiki. added source__version (= edition), source__modified_date.

10714 08/22/2013 04:38 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: mapped edition

10713 08/22/2013 04:36 PM Aaron Marcuse-Kubitza

bugfix: inputs/.geoscrub/{Source,geoscrub_output}/VegBIEN.csv: switched to the version needed for new-style datasources

10712 08/22/2013 04:12 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/Source/map.csv: mapped edition (the version), using `svn info derived/biengeo/`

10390 07/24/2013 12:44 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/: switched to new-style import, using the steps at wiki.vegpath.org/Adding_new-style_import_to_a_datasource

10389 07/24/2013 12:15 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/: translated single-column filters to postprocessing derived columns, using the steps at wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns

10245 07/11/2013 12:55 AM Aaron Marcuse-Kubitza

bugfix: inputs/*/*/postprocess.sql: made all operations idempotent, so that postprocess.sql can be run repeatedly (e.g. by new-style import)
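
What "idempotent" means here can be sketched as follows. This is a minimal Python example using the standard-library sqlite3 module as a stand-in for the project's PostgreSQL database; the table and index names are hypothetical.

```python
import sqlite3

def postprocess(conn):
    """Apply schema changes that are safe to repeat."""
    # IF NOT EXISTS makes each statement a no-op on the second and later runs.
    conn.execute("CREATE TABLE IF NOT EXISTS geoscrub_demo (lat REAL, lon REAL)")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS geoscrub_demo_coords ON geoscrub_demo (lat, lon)"
    )

conn = sqlite3.connect(":memory:")
postprocess(conn)
postprocess(conn)  # running again does not raise and creates nothing new
```

Without the IF NOT EXISTS guards, the second call would fail because the table and index already exist.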

10209 07/10/2013 02:32 AM Aaron Marcuse-Kubitza

inputs/*/*/map.csv for CSV tables with a row_num column: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table

10208 07/10/2013 02:27 AM Aaron Marcuse-Kubitza

bugfix: inputs/*/Source/map.csv: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table. the staging table column renaming is now used by all Source tables.

10199 07/09/2013 04:44 PM Aaron Marcuse-Kubitza

bugfix: inputs/*/Source/map.csv: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table. the staging table column renaming is now used by all Source tables.

10179 07/06/2013 05:39 PM Aaron Marcuse-Kubitza

inputs/*/: added table.run for use by the table subdirs in new-style import. datasources without table subdirs do not need this.

10178 07/06/2013 05:35 PM Aaron Marcuse-Kubitza

inputs/*/: added top-level Makefile which includes inputs/input.Makefile, so that make can be run directly on the datasrc dir without needing to specify `--makefile=../input.Makefile` (see input.Makefile $(selfMake))

10170 07/06/2013 02:26 PM Aaron Marcuse-Kubitza

bugfix: inputs/*/Source/: use installed staging table (with blank-line data.csv) in order to also work with new-style import. this also fixes a benign diff between the by-row and by-col test outputs, where row-based import would not import the Source/ entries because there was not at least one row in the input. note that in order to ensure that all datasources are properly run, you need to check `svn st|sort` against the datasource schema names to see if any are missing.

10091 06/27/2013 12:28 PM Aaron Marcuse-Kubitza

added inputs/*/*/header.csv for CSV inputs, which are now generated by inputs/input.Makefile %/install

9921 06/19/2013 09:11 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql: set decimallatitude, decimallongitude types to double precision to facilitate joining with other double precision values

9920 06/19/2013 09:02 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql: coords index: added rest of input columns so this can be used to check the existence of a result by input. added runtime (55 s). use idempotent create_if_not_exists().

9459 05/17/2013 06:00 PM Aaron Marcuse-Kubitza

bugfix: mappings/VegCore-VegBIEN.csv: place.geovalid: added missing /1 after _alt

9415 05/16/2013 04:15 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place.geovalid: added latLongDomainValid to the values to _and together

9413 05/16/2013 04:06 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: place.geovalid: use false instead of NULL

9404 05/16/2013 02:24 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/map.csv: *validity: added definitions of the numeric codes from _src/README.TXT

8801 05/02/2013 08:53 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: SVN: add, %/add: */logs: also svn:ignore *.gz, used for compressed log files

8176 03/25/2013 09:01 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: %/.map.csv.last_cleanup: Run fix_line_endings after canon/translate to standardize Python's \r\n line endings back to \n. This prevents issues with mixed line endings because LibreOffice (and probably Excel) treat all cell-internal line endings as \n but row line endings as whatever the file had, while text editors like jEdit translate all line endings to whatever the autodetected line ending is. (This creates spurious line ending diffs when a map spreadsheet containing multiline cells is edited in a text editor.)
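
The normalization step described above can be sketched as follows. This is a minimal Python illustration, not the project's fix_line_endings tool; the function name `normalize_line_endings` and the sample CSV text are hypothetical.

```python
def normalize_line_endings(text):
    """Replace every CRLF row ending with LF, leaving existing LFs untouched."""
    return text.replace("\r\n", "\n")

# A map spreadsheet with a multiline cell: LF inside the cell, CRLF between rows.
mixed = 'a,"multi\nline"\r\nb,c\r\n'
clean = normalize_line_endings(mixed)
```

After this step the file uses a single line ending throughout, so editing it in a text editor no longer produces line-ending diffs.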

7765 02/27/2013 07:27 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/import_order.txt: Added Source

7464 02/05/2013 03:40 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: locationID->location.sourceaccessioncode: Removed restriction that this mapping can't occur if geovalidation information is present. The locationID is no longer mapped to the place.sourceaccessioncode, so this filter is not necessary.

7324 01/22/2013 02:06 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/_src/README.TXT: Added e-mails from Jim about how the county_centroids data was generated

7322 01/22/2013 01:10 PM Aaron Marcuse-Kubitza

Added inputs/.geoscrub/county_centroids/ from Jim

7321 01/22/2013 01:09 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/import_order.txt: Added geoscrub_output

7254 01/16/2013 11:59 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/_src/README.TXT: Added dates for e-mails from Jim

7253 01/16/2013 11:57 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/_src/README.TXT: Added e-mail from Jim about repository with scripts to generate the geoscrub_output table

7029 01/02/2013 07:03 PM Aaron Marcuse-Kubitza

Added inputs/.geoscrub/_src/geovalidity-table.txt, which was attached to Jim's geovalidation e-mail (provided in README.TXT)

6855 12/14/2012 09:29 AM Aaron Marcuse-Kubitza

inputs/*/Source/map.csv without mappings: Added referenceType, etc. mappings. This also ensures that the source table entry for the datasource will be created before the herbaria list is imported, causing all top-level datasources to sort at the top of the source table.

6805 12/12/2012 06:07 PM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: verify: Also ignore *.xlsx

6665 12/06/2012 08:58 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/map.csv: Removed no longer accurate comment that county is not yet used by VegBIEN

6664 12/06/2012 08:56 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/map.csv: *validity: Remapped 2 ("Point is <=5km from putative GADM polygon, but still outside it") to true instead of false, because 5km is close enough to the polygon that the mismatch could result from shapefile simplifying, boundary changes, or other factors that don't affect geovalidity

6663 12/06/2012 08:52 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/map.csv: *validity: Remapped 0 ("Complete name provided, but couldn't be scrubbed to GADM") to NULL instead of false, because the absence of a name match does not mean the coordinates are invalid
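
The two *validity remappings in revisions 6664 and 6663 can be summarized as a small lookup. This is a Python sketch under stated assumptions: only codes 0 and 2 are given in this log, so the other codes (defined in _src/README.TXT) are deliberately left out, and the names `VALIDITY_TO_GEOVALID` and `map_validity` are hypothetical.

```python
# Only the two codes whose mapping this log states are included.
VALIDITY_TO_GEOVALID = {
    0: None,  # name couldn't be scrubbed to GADM: validity unknown (NULL)
    2: True,  # point is <=5km outside the putative GADM polygon: treated as valid
}

def map_validity(code):
    """Return True or None (NULL) for a numeric validity code listed above."""
    if code not in VALIDITY_TO_GEOVALID:
        raise KeyError("mapping for code %r is not given in this log" % (code,))
    return VALIDITY_TO_GEOVALID[code]
```

Using None rather than False for code 0 keeps "no name match" distinct from "coordinates are invalid".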

6661 12/06/2012 08:50 PM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: Add a Source table to store datasource metadata. This adds a Source table to all herbaria which are listed in .herbaria, and therefore didn't previously need a Source table to indicate their referenceType and sampleType.

6660 12/06/2012 08:44 PM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: Add a Source table to store datasource metadata. This adds a Source table to all herbaria which are listed in .herbaria, and therefore didn't previously need a Source table to indicate their referenceType and sampleType.

6659 12/06/2012 08:43 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: SVN: add: verify/: Added *.xls to svn:ignore

6658 12/06/2012 08:33 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/postprocess.sql: Added index on decimallatitude, decimallongitude

6657 12/06/2012 08:30 PM Aaron Marcuse-Kubitza

Added inputs/.geoscrub/geoscrub_output/postprocess.sql, which adds NOT NULL constraints on decimallatitude, decimallongitude

6443 11/24/2012 02:29 PM Aaron Marcuse-Kubitza

Removed no longer used geoscrub.*.sql. Use geoscrub_output instead.

6442 11/24/2012 02:27 PM Aaron Marcuse-Kubitza

Removed no longer used geoscrub_cleaned_unique. Use geoscrub_output instead.

6441 11/24/2012 02:25 PM Aaron Marcuse-Kubitza

Removed no longer used geoscrub_cultivated. Use analytical_stem_view.cultivated instead.

6440 11/24/2012 02:25 PM Aaron Marcuse-Kubitza

Removed no longer used geoscrub_cultivated. Use analytical_stem_view.cultivated instead.

6406 11/24/2012 07:50 AM Aaron Marcuse-Kubitza

db_xml.py: put(): _setDefault(): Support setting multiple col_defaults at once by using the param names themselves as the column names

6403 11/24/2012 07:29 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Set the source_id col_default to the datasource name using the new _setDefault() built-in function and _env()

6280 11/19/2012 02:53 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/map.csv: Mapped to county, acceptedCounty

6272 11/19/2012 01:25 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_output/map.csv: Mapped countyvalidity to latLongInCounty

6265 11/19/2012 11:48 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place: Removed placecode to prevent datasources from creating duplicate entries for the same place, with different placecodes. This was a problem with the original BIEN2 geoscrub dataset, which contained duplicates.

6238 11/16/2012 12:49 PM Aaron Marcuse-Kubitza

Added inputs/.geoscrub/geoscrub_cleaned_unique/_no_import to disable geoscrub_cleaned_unique, since the new geoscrub_output supersedes it

6237 11/16/2012 12:47 PM Aaron Marcuse-Kubitza

Added inputs/.geoscrub/geoscrub_output/

6236 11/16/2012 12:46 PM Aaron Marcuse-Kubitza

Added inputs/.geoscrub/_src/README.TXT

6234 11/16/2012 12:24 PM Aaron Marcuse-Kubitza

Added inputs/.geoscrub/_src/ to store Jim's geoscrub CSV

6205 11/15/2012 06:27 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_cultivated/create.sql: Fixed bug: NULL lat/longs need to be filtered out because primary keys can't contain NULL values

6185 11/15/2012 02:16 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place: Renamed geosource_valid to geovalid. (It had gotten renamed in the reference -> source rename.)

6179 11/14/2012 06:30 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Renamed reference -> source to make this table more broadly applicable, and because this now stores the datasource metadata

6159 11/14/2012 02:25 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/import_order.txt: Fixed bug where geoscrub_cultivated needs to be installed after geoscrub_cleaned_unique, not before as it would be with the default alphabetical sort order

6158 11/14/2012 02:24 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_cultivated/: Use _no_import file to exclude geoscrub_cultivated from the import, because it's used directly as a lookup table by analytical_stem rather than being imported. This ensures that there is no import log or input row count for geoscrub_cultivated in the import times, which would skew the import row count because the row count would be included even though no columns are mapped.

6123 11/13/2012 02:30 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: matched place's coordinates: Fixed bug where coordinates entry itself needed to have its datasource (reference) set to geoscrub, in addition to the place entry that uses it, in order to match up properly with geoscrub's corresponding input place (whose coordinates as well as place are owned by the geoscrub datasource)

6122 11/13/2012 02:22 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: matched place's coordinates: Fixed bug where coordinates mappings with and without matched_place_id=0 need to sort together in order to be merged, by prepending ".," to the place attrs list

6112 11/09/2012 06:37 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_cleaned_unique/create.sql: Removed no longer needed index on latitudeDecimalVerbatim, longitudeDecimalVerbatim, which is now on geoscrub_cultivated instead

6110 11/09/2012 06:26 PM Aaron Marcuse-Kubitza

Added inputs/.geoscrub/geoscrub_cultivated/

6109 11/09/2012 06:04 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_cleaned_unique/create.sql: Added index on latitudeDecimalVerbatim, longitudeDecimalVerbatim for use by analytical_stem_view

6106 11/09/2012 05:25 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/geoscrub_cleaned_unique/create.sql: Change latitudeDecimalVerbatim, longitudeDecimalVerbatim types to double precision to allow merge-joining with coordinates.latitude_deg, longitude_deg in analytical_stem_view

6035 11/06/2012 03:23 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Always map taxonNameOrEpithet to taxonomicname, now that it's globally unique at all ranks in the datasource that provides it (NCBI)

6012 11/06/2012 09:27 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/_MySQL/geoscrub.*.sql.make: Use new my2pg_export