  • svn:ignore: *

# Date Author Comment
12018 02/02/2014 12:49 AM Aaron Marcuse-Kubitza

inputs/input.Makefile: add!: verify/: also svn:ignore *.tsv, *.txt

11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

11961 01/15/2014 10:18 AM Aaron Marcuse-Kubitza

inputs/VegBank/import_order.txt: added projectcontributor_

11960 01/15/2014 10:11 AM Aaron Marcuse-Kubitza

inputs/VegBank/projectcontributor_/map.csv, postprocess.sql: added project_participant

11957 01/15/2014 09:41 AM Aaron Marcuse-Kubitza

added inputs/VegBank/projectcontributor_/

11956 01/15/2014 09:29 AM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql: projectcontributor.surname: prepend table name to avoid join collisions

11955 01/15/2014 09:23 AM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql, inputs/CVS/cvs.~.clean_up.sql: Prevent "column name specified more than once" errors when tables are joined: put tables in alphabetical order for consistency

11934 12/20/2013 04:41 PM Aaron Marcuse-Kubitza

inputs/VegBank/^taxon_observation.**.sample/create.sql, map.csv: added new project columns

11933 12/20/2013 04:31 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxon_observation.**/postprocess.sql: added the project table

11932 12/20/2013 04:25 PM Aaron Marcuse-Kubitza

mapped inputs/VegBank/project/, which includes the projectName for attribution

11904 12/11/2013 10:42 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/plot/: added _no_import because this table is left-joined and should not be imported separately

11801 12/03/2013 06:51 AM Aaron Marcuse-Kubitza

inputs/VegBank/plot/postprocess.sql: locality: include the site name (authorlocation), because this is part of the unique specification of the place that was sampled

11788 11/26/2013 11:11 PM Aaron Marcuse-Kubitza

**/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`)

11705 11/21/2013 12:24 AM Aaron Marcuse-Kubitza

copyright scrub: inputs/: removed data provider-owned schema and documentation files, which are not BIEN copyright and should not be part of what is submitted for open-sourcing. these files will remain accessible via the web interface (fs.vegpath.org), but will not be in the repository.

11679 11/18/2013 04:27 AM Aaron Marcuse-Kubitza

inputs/VegBank/stemlocation_/header.csv: updated from reinstalling stemlocation_

11604 11/09/2013 02:20 AM Aaron Marcuse-Kubitza

inputs/VegBank/^taxon_observation.**.sample/test.xml.ref: updated inserted row count, now that CVS plots have been removed

11601 11/08/2013 10:28 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/: need to remove inter-datasource duplicates from plot instead of the left-joined plot_ table, because the fkeys needed to do the cascading deletes are all to the plot table. this requires doing the column-renaming and postprocessing on plot before it's left-joined.

11600 11/08/2013 09:57 PM Aaron Marcuse-Kubitza

inputs/VegBank/plot_/create.sql: updated runtime (5 s) for previous bugfix

11539 10/31/2013 07:51 AM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/import_order.txt: updated name of ^taxon_observation.**.sample table

11538 10/31/2013 07:16 AM Aaron Marcuse-Kubitza

fix: inputs/VegBank/^taxon_observation.**.sample/create.sql: moved continent before country

11537 10/31/2013 06:54 AM Aaron Marcuse-Kubitza

inputs/VegBank/^taxon_observation.**.sample/create.sql: added missing columns that were recently mapped to VegBIEN (identifiedBy)

11536 10/31/2013 06:52 AM Aaron Marcuse-Kubitza

inputs/VegBank/^taxon_observation.**.sample/create.sql: synced column order to analytical_plot

11535 10/31/2013 06:49 AM Aaron Marcuse-Kubitza

inputs/VegBank/^taxon_observation.**.sample/create.sql: synced column order to analytical_plot

11534 10/31/2013 06:47 AM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/map.csv, postprocess.sql: mapped identifiedBy (the join_words() of identifiedBy_first, etc.)

11524 10/31/2013 02:46 AM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/map.csv, postprocess.sql: mapped identifiedBy (the join_words() of identifiedBy_first, etc.)

11523 10/31/2013 02:34 AM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/create.sql: also join party_id to get the identifiedBy (not mapped yet). note that the inserted row count changes, because taxonobservation_ does not yet have a pkey to do a stable ordering with.

11521 10/31/2013 02:06 AM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql: taxoninterpretation.party_id: don't rename to taxoninterpretation_party_id, so that this can be used directly in taxonobservation_/create.sql with a USING join

11520 10/31/2013 01:52 AM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/create.sql: join taxonobservation to taxoninterpretation (as in CVS) instead of vice versa, since taxonobservation is the primary, operative table. having VegBank and CVS do things the same way helps ensure that fixes in one can transfer easily to the other.

11518 10/31/2013 01:30 AM Aaron Marcuse-Kubitza

inputs/VegBank/^taxon_observation.**.sample/create.sql: synced with taxon_observation.**

11517 10/31/2013 01:22 AM Aaron Marcuse-Kubitza

(for r11396) fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error

11514 10/30/2013 11:03 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: mapped taxon_determination__is_current, taxon_determination__is_original

11513 10/30/2013 09:49 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: mapped taxon_determination__is_current, taxon_determination__is_original

11511 10/30/2013 09:07 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/map.csv: originalinterpretation, currentinterpretation: removed table name prefix so these would automap

11488 10/30/2013 04:23 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/plot_/postprocess.sql: coordinateUncertaintyInMeters__from_fuzzing: need to convert km to m in the fuzzing radii. updated derived cols runtimes.

11487 10/30/2013 04:05 PM Aaron Marcuse-Kubitza

inputs/VegBank/plot_/postprocess.sql: remove duplicated CVS plots (2323 of 7079 CVS plots are removed by this)

11439 10/25/2013 09:24 AM Aaron Marcuse-Kubitza

fix: inputs/VegBank/taxonobservation_/map.csv: remapped authorplantname to OMIT because these are not specific to the taxoninterpretation row (this is in a separate taxoninterpretation for the original determination instead). see wiki.vegpath.org/Spot-checking#2013-10-10 > Mike Lee's conference call feedback.

11438 10/25/2013 09:22 AM Aaron Marcuse-Kubitza

fix: inputs/VegBank/taxonobservation_/map.csv: remapped int_* to OMIT because these are not specific to the taxoninterpretation row (this is in a separate taxoninterpretation for the original determination instead). see wiki.vegpath.org/Spot-checking#2013-10-10 > Mike Lee's conference call feedback.

11396 10/21/2013 07:14 PM Aaron Marcuse-Kubitza

fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error

11265 10/13/2013 12:10 AM Aaron Marcuse-Kubitza

inputs/VegBank/plot_/create.sql: documented runtime (5 min)

11261 10/12/2013 04:20 PM Aaron Marcuse-Kubitza

inputs/VegBank/verify/input_cols.txt, inputs/VegBank/+taxon_observation.**.sample/create.sql: updated to match taxon_observation.** columns

11260 10/12/2013 04:16 PM Aaron Marcuse-Kubitza

inputs/VegBank/verify/input_cols.txt, inputs/VegBank/+taxon_observation.**.sample/create.sql: updated to match taxon_observation.** columns

11257 10/12/2013 03:05 PM Aaron Marcuse-Kubitza

inputs/VegBank/verify/input_cols.txt, inputs/VegBank/+taxon_observation.**.sample/create.sql: updated to match taxon_observation.** columns

11256 10/12/2013 03:03 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/+taxon_observation.**.sample/: renamed to ^taxon_observation.**.sample because a leading + has a special meaning to bash (it indicates a shell option, and you will get an error "invalid option name"), as well as to make (it indicates that a recipe command invokes make recursively)

11255 10/12/2013 02:14 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/taxon_observation.**/header.csv: updated for observation_/map.csv bugfix, which added new hasobservationsynonym field. this fixes a strange test bug caused by the taxon_observation.**/map.csv column list being mismatched/misaligned with what was in the underlying tables. (column mismatches will often cause unexplainable errors in unrelated sections of code the same way that buffer overflows do in C++.)

11254 10/12/2013 02:01 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/taxon_observation.**.sample/: renamed to +taxon_observation.**.sample so that the wildcard expansion of taxon_observation.* doesn't add taxon_observation.**.sample (which causes it to attempt to install taxon_observation.**.sample before taxon_observation.** is installed)

11249 10/10/2013 06:50 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/observation_/header.csv, map.csv: updated for refresh, which inserts hasobservationsynonym at the end of the observation table

11248 10/10/2013 05:46 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxon_observation.**.sample/create.sql: reordered columns in the same order as analytical_plot, for easier validation

11244 10/10/2013 02:40 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxon_observation.**.sample/create.sql: include only the subset of columns that is imported to VegBIEN

11243 10/10/2013 02:32 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxon_observation.**.sample/test.xml.ref: updated inserted row count (which was most likely generated before the output column names had been set to the input column names)

11242 10/10/2013 01:55 PM Aaron Marcuse-Kubitza

added inputs/VegBank/verify/input_cols.include.txt, with runscript to generate it

11241 10/10/2013 01:26 PM Aaron Marcuse-Kubitza

inputs/VegBank/verify/input_cols.unmapped.txt*: renamed to input_cols.exclude.txt* because this now includes mapped columns as well

11240 10/10/2013 01:18 PM Aaron Marcuse-Kubitza

inputs/VegBank/verify/input_cols.unmapped.txt*: renamed to input_cols.exclude.txt* because this now includes mapped columns as well

11239 10/10/2013 01:11 PM Aaron Marcuse-Kubitza

inputs/VegBank/verify/input_cols.unmapped.txt.run: remove unmapped join columns, since these would be included in the extract

11238 10/10/2013 01:09 PM Aaron Marcuse-Kubitza

inputs/VegBank/verify/input_cols.unmapped.txt.run: take input directly from input_cols.txt to avoid needing to first copy and paste it into input_cols.unmapped.txt

11237 10/10/2013 01:03 PM Aaron Marcuse-Kubitza

inputs/VegBank/verify/input_cols.unmapped.txt.run: added back deliberately excluded columns (DUPLICATE#of:..., etc.) so that the # of rows in the file can be subtracted from the total # of columns to get the # of input columns that would be included in the extract
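The subtraction described in r11237 can be sketched as follows. This is only an illustration: the function name and the column names are hypothetical, and it assumes each file lists one unique column name per row.

```python
def count_included_columns(all_columns, excluded_columns):
    """Return the number of input columns that would remain in the extract.

    all_columns: every input column name (hypothetical stand-in for input_cols.txt)
    excluded_columns: names to leave out (hypothetical stand-in for the filtered list)
    """
    excluded = set(excluded_columns)
    unknown = excluded - set(all_columns)
    if unknown:
        # An excluded name that is not an input column would make the subtraction wrong.
        raise ValueError(f"excluded columns not in input: {sorted(unknown)}")
    return len(all_columns) - len(excluded)


# Hypothetical column names, for illustration only.
included = count_included_columns(
    ["col_a", "col_b", "col_c", "col_d"],
    ["col_c", "col_d"],
)
```

The subtraction only gives the right answer if the exclude list contains every omitted column, which is why the deliberately excluded columns had to be added back.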

11235 10/10/2013 12:23 PM Aaron Marcuse-Kubitza

added inputs/VegBank/verify/input_cols.txt, input_cols.unmapped.txt (with runscript to filter input_cols.unmapped.txt)

11233 10/10/2013 08:18 AM Aaron Marcuse-Kubitza

inputs/VegBank/stratum/postprocess.sql: added pkey

11232 10/10/2013 08:05 AM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/postprocess.sql: added __parent index on locationID to facilitate the LEFT JOINs used to create the validation input

11231 10/10/2013 07:54 AM Aaron Marcuse-Kubitza

inputs/VegBank/observation_/postprocess.sql: added __parent index on locationID to facilitate the LEFT JOINs used to create the validation input

11230 10/10/2013 07:45 AM Aaron Marcuse-Kubitza

inputs/VegBank/import_order.txt: added taxon_observation.**.sample so it will automatically be kept up to date

11229 10/10/2013 07:32 AM Aaron Marcuse-Kubitza

inputs/VegBank/taxon_observation.**.sample/create.sql: set runtime (1 s)

11228 10/10/2013 07:30 AM Aaron Marcuse-Kubitza

inputs/VegBank/: added taxon_observation.**.sample subset of plots to use in the validation. this avoids the need to import all of VegBank just to validate a few of the plots.

11225 10/09/2013 06:31 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxon_observation.**/: updated for data refresh

11224 10/09/2013 06:25 PM Aaron Marcuse-Kubitza

inputs/VegBank/plantconcept_/: mapped columns, since this is now included in import_order.txt and therefore gets processed by the column-renaming runscripts. note that this means that in taxonobservation_/map.csv, the plantconcept_ input column names need to be changed to what they are mapped to.

11223 10/09/2013 06:16 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/create.sql: updated runtime (20 s)

11178 10/09/2013 08:54 AM Aaron Marcuse-Kubitza

inputs/VegBank/plantconcept_/create.sql: documented runtime (21 min)

11177 10/09/2013 08:28 AM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/plantconcept_/: added new-style import files

11176 10/09/2013 08:27 AM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/import_order.txt: added plantconcept_, because new-style import needs it to be explicitly listed in import_order.txt in order to run it

11175 10/09/2013 08:24 AM Aaron Marcuse-Kubitza

inputs/VegBank/run: refresh(): added usage

11174 10/09/2013 08:22 AM Aaron Marcuse-Kubitza

inputs/VegBank/run: refresh(): documented that this should be run on vegbiendev

11173 10/09/2013 07:58 AM Aaron Marcuse-Kubitza

inputs/VegBank/_archive/2012-8-30/: svn:ignore the data exports

11172 10/09/2013 07:57 AM Aaron Marcuse-Kubitza

inputs/VegBank/run: added refresh() target

11171 10/09/2013 07:54 AM Aaron Marcuse-Kubitza

inputs/VegBank/: refreshed VegBank so that all of Mike Lee's sample plots would be included in the input data. (VegBank was last refreshed from the live DB on 2012-8-30.) split vegbank.sql into vegbank.schema.sql and vegbank.data.sql so that the schema can be examined and imported separately, like for MySQL datasources. inputs/VegBank/vegbank.~.clean_up.sql: commented out setting comminterpretation.commname to NOT NULL, because after the refresh it is now NULL in 10 rows, where commconcept_id is also NULL.

11160 10/02/2013 05:56 AM Aaron Marcuse-Kubitza

inputs/VegBank/: added observation__community/

11159 10/02/2013 05:49 AM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql: commclass.commcode,commname: rename to prevent collisions

11158 10/02/2013 04:26 AM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql: indicate required column comminterpretation.commname

11157 10/02/2013 04:20 AM Aaron Marcuse-Kubitza

inputs/VegBank/vegbank.~.clean_up.sql: commconcept.commname: rename to prevent collision with commname.commname

11107 09/29/2013 08:58 PM Aaron Marcuse-Kubitza

bugfix: mappings/VegCore-VegBIEN.csv: nest all taxonoccurrences inside a stratum event, so that the parent locationevent is always fully populated before child locationevents point to it. (previously, a stub parent event was created when the child event was imported first, which blocked the fully-populated parent event from being inserted later on.) this uses auto-folding (for VegBank/CVS) and auto-forwarding (for other datasources) to prune empty stratum events for taxonoccurrences that don't have strata. (see wiki.vegpath.org/Auto-folding, wiki.vegpath.org/Auto-forwarding for more info about these normalization techniques.) note that the inserted row counts stay exactly the same for all datasources except VegBank (which was being fixed), indicating that this significant change to the mappings did not change the semantics of the import of taxonoccurrences.

11106 09/29/2013 08:37 PM Aaron Marcuse-Kubitza

inputs/VegBank/observationcontributor_/test.xml.ref: updated inserted row count

11105 09/28/2013 10:40 PM Aaron Marcuse-Kubitza

bugfix: mappings/VegCore-VegBIEN.csv: stratum's locationevent: link this to the parent locationevent, so that the parent locationevent's information (such as locationeventcontributors) is accessible to the stratum's locationevent

11104 09/28/2013 09:08 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/taxon_observation.**/postprocess.sql: inlined _join() so that taxon_observation.** wouldn't get cascadingly deleted whenever the util schema (where this normally resides) gets reinstalled

11098 09/28/2013 06:53 AM Aaron Marcuse-Kubitza

added inputs/VegBank/observationcontributor_/

11095 09/28/2013 05:04 AM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/: taxonOccurrenceID: include the aggregateOrganismObservationID in this so that there is one taxonoccurrence for each stratum's taxonImportance. this allows the different strata to have separate taxonoccurrences that are associated with the stratum-specific locationevents, rather than all being lumped into one taxonoccurrence, with only one locationevent.

11082 09/24/2013 02:14 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: mapped stratum__name

11071 09/22/2013 08:25 PM Aaron Marcuse-Kubitza

mappings/VegCore.htm: regenerated from wiki. added Stratum table.

11024 09/19/2013 06:49 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/import_order.txt: added stratum

11023 09/19/2013 06:48 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxon_observation.**/postprocess.sql: added stratum, stratumtype to the left-join

11022 09/19/2013 06:46 PM Aaron Marcuse-Kubitza

inputs/VegBank/stemcount_/map.csv: stratum_id: removed table prefix so it can be used as a join column

11021 09/19/2013 06:45 PM Aaron Marcuse-Kubitza

inputs/VegBank/: mapped stratum

11020 09/19/2013 06:39 PM Aaron Marcuse-Kubitza

inputs/VegBank/: mapped stratumtype

11016 09/19/2013 11:31 AM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/map.csv: taxonomic ranks not in VegCore: removed table prefix so they will be automapped (they are globally unique)

11013 09/19/2013 02:55 AM Aaron Marcuse-Kubitza

inputs//: don't import joined tables, because they are now imported in the taxon_observation.** left-join instead

11012 09/19/2013 01:22 AM Aaron Marcuse-Kubitza

inputs/VegBank/taxon_observation.**/postprocess.sql: run mk_subset_by_row_num_func() to add a subset function that uses sort_col. this is used by column-based import, and also provides a common subsetting/sorting API for all the left-joined views. test.xml.ref: the inserted row count most likely changes because the sort order changes.

11009 09/18/2013 11:53 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/stemlocation_/map.csv: remapped stemcount-related fields to OMIT, so that these don't collide with fields of the same name in stemcount_ when they are left-joined together in taxon_observation.** . having the same name causes these to be incorrectly interpreted as shared fkey columns in the NATURAL JOIN (and without the NATURAL JOIN, they would instead be collision errors).
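The NATURAL JOIN pitfall described in r11009 can be reproduced in a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in database, and the table and column names are hypothetical. The project's own database may report or handle the duplicate column differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical demo tables: both carry a stemdiameter column,
    -- but only occurrence_id is meant to be the join key.
    CREATE TABLE stemcount_demo (occurrence_id INTEGER, stemdiameter REAL);
    CREATE TABLE stemlocation_demo (occurrence_id INTEGER, stemdiameter REAL, stemcode TEXT);
    INSERT INTO stemcount_demo VALUES (1, 10.0);
    INSERT INTO stemlocation_demo VALUES (1, 12.5, 'A');
""")

# NATURAL JOIN matches on every same-named column, so stemdiameter is
# silently treated as a join key and the rows fail to match.
natural_rows = conn.execute(
    "SELECT * FROM stemcount_demo NATURAL LEFT JOIN stemlocation_demo"
).fetchall()

# Naming the intended key explicitly matches the rows, but the result
# then contains two columns called stemdiameter.
using_rows = conn.execute(
    "SELECT * FROM stemcount_demo LEFT JOIN stemlocation_demo USING (occurrence_id)"
).fetchall()

print(natural_rows)
print(using_rows)
```

Omitting or renaming the colliding columns before the join, as the commit does by remapping them to OMIT, avoids both outcomes.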

11008 09/18/2013 10:35 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/stemlocation_/postprocess.sql: added missing index on aggregateOrganismObservationID, needed for the 1:many portion of the taxon_observation.** left-join

11007 09/18/2013 10:28 PM Aaron Marcuse-Kubitza

inputs/VegBank/stemcount_/postprocess.sql: moved stemcount___parent index before the derived columns section because it does not depend on them

11006 09/18/2013 10:26 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/stemcount_/postprocess.sql: added missing index on taxonOccurrenceID, needed for the 1:many portion of the taxon_observation.** left-join

11004 09/18/2013 03:50 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/taxon_observation.**/postprocess.sql: added sort_col (=identificationID) at beginning because column-based import will always sort a view by the first column, which may lead to slow query plans if the first column is not a joined table's pkey

11003 09/18/2013 02:04 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxon_observation.**/postprocess.sql: documented that there is no row_num because left-join to stemcount_, stemlocation_ adds rows to each taxonobservation_

11002 09/18/2013 02:03 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/taxon_observation.**/postprocess.sql: removed row_num (=identificationID), because there is actually more than one row per VegBank taxonobservation_, so this does not properly enumerate the view rows. this is because there is a 1:many left-join to stemcount_, stemlocation_ which adds rows to each taxonobservation_. since the row_num is gone, any row-subsetting of the view using OFFSET will always need to materialize the entire view up to the OFFSET value. this works for smaller datasources like VegBank that fit almost entirely into one column-based import chunk (1 million rows), but not for larger datasources like FIA where it would be much slower to materialize all preceding 16 million rows on the last chunk (which is what OFFSET normally does with left-joins).
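The trade-off in r11002 between OFFSET-based and row_num-based subsetting can be sketched as below. It uses Python's sqlite3 module with a hypothetical table, and it assumes row_num uniquely enumerates the rows, which is exactly the property that no longer held for the VegBank view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical demo table in which row_num uniquely enumerates the rows.
conn.execute("CREATE TABLE obs_demo (row_num INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO obs_demo VALUES (?, ?)",
    [(i, f"v{i}") for i in range(1, 11)],
)


def chunk_by_offset(conn, start, size):
    """Select a chunk with OFFSET: the database must step past `start` rows first."""
    rows = conn.execute(
        "SELECT row_num FROM obs_demo ORDER BY row_num LIMIT ? OFFSET ?",
        (size, start),
    ).fetchall()
    return [r[0] for r in rows]


def chunk_by_row_num(conn, start, size):
    """Select the same chunk with a range condition on the indexed row_num."""
    rows = conn.execute(
        "SELECT row_num FROM obs_demo WHERE row_num > ? AND row_num <= ? ORDER BY row_num",
        (start, start + size),
    ).fetchall()
    return [r[0] for r in rows]


offset_chunk = chunk_by_offset(conn, 6, 3)
range_chunk = chunk_by_row_num(conn, 6, 3)
```

Both calls return the same rows here. The range form can seek directly to the chunk through the index, but it is only correct when row_num has exactly one value per row, so a view with a 1:many left-join has to fall back to OFFSET.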