/trunk/inputs/FIA - Changes - BIEN 3 - NCEAS Projects

root/trunk/inputs/FIA @ 12623

svn:ignore: *

#	Date	Author	Comment
12028	02/02/2014 10:31 PM	Aaron Marcuse-Kubitza	inputs/FIA/taxon_observation.**/header.csv: updated for new REF_RESEARCH_STATION.country metadata value col
12018	02/02/2014 12:49 AM	Aaron Marcuse-Kubitza	inputs/input.Makefile: add!: verify/: also svn:ignore .tsv, .txt
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
11871	12/09/2013 03:37 PM	Aaron Marcuse-Kubitza	inputs/FIA/TREE/run: documented import() runtime (1.5 h), which includes table cleanup runtime (1 h)
11843	12/05/2013 11:38 PM	Aaron Marcuse-Kubitza	inputs/FIA/COND/postprocess.sql: filtering formula: documented that this was created by Brad, and provided the URL to it on nimoy
11788	11/26/2013 11:11 PM	Aaron Marcuse-Kubitza	**/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`)
11705	11/21/2013 12:24 AM	Aaron Marcuse-Kubitza	copyright scrub: inputs/: removed data provider-owned schema and documentation files, which are not BIEN copyright and should not be part of what is submitted for open-sourcing. these files will remain accessible via the web interface (fs.vegpath.org), but will not be in the repository.
11396	10/21/2013 07:14 PM	Aaron Marcuse-Kubitza	fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
11107	09/29/2013 08:58 PM	Aaron Marcuse-Kubitza	bugfix: mappings/VegCore-VegBIEN.csv: nest all taxonoccurrences inside a stratum event, so that the parent locationevent is always fully populated before child locationevents point to it. (previously, a stub parent event was created when the child event was imported first, which blocked the fully-populated parent event from being inserted later on.) this uses auto-folding (for VegBank/CVS) and auto-forwarding (for other datasources) to prune empty stratum events for taxonoccurrences that don't have strata. (see wiki.vegpath.org/Auto-folding, wiki.vegpath.org/Auto-forwarding for more info about these normalization techniques.) note that the inserted row counts stay exactly the same for all datasources except VegBank (which was being fixed), indicating that this signficant change to the mappings did not change the semantics of the import of taxonoccurrences.
10950	09/14/2013 09:26 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/: renamed to taxon_observation. to clarify what an occurrence is. taxon_observation has the same meaning as in VegCore, where each taxon_determination is considered a separate taxon_observation of the associated specimen or vouchered plant (vegpath.org/VegCore/ERD/). (note that having multiple taxon_determinations only makes sense when there is something to reobserve.) the . is SQL dotpath syntax (wiki.vegpath.org/SQL_dotpaths) for the recursive expansion of all tables to which taxon_observation has forward fkeys (i.e. "the left-join").
10949	09/14/2013 09:01 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/postprocess.sql: use :table_str var instead of hardcoding the table name as occurrence_all, to avoid needing to change this file when renaming the view
10948	09/14/2013 05:38 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/postprocess.sql: renamed occurrence_all--row_num to just row_num because this is for the entire view, rather than a specific table in it, and thus does not need a disambiguating table prefix. this also avoids embedding the view name in its own columns.
10947	09/14/2013 04:34 PM	Aaron Marcuse-Kubitza	inputs/FIA/*/map.csv: use -- to separate the table and column name instead of - , to conform with the u-name format (wiki.vegpath.org/u-name#format), which works even when only one of _- can be used in the name. -s are needed in this case to linewrap the column on a separate line as the table in phpPgAdmin.
10930	09/12/2013 03:37 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/postprocess.sql: use much simpler LEFT JOINs instead of nested RIGHT JOINs, which required lots of () to get them to happen in the right order. note that the columns are now provided in reverse instead of forwards path order, but this is still much clearer than the nested mess of RIGHT JOINs. this approach can also be used to simplify VegBank's joins.
10866	09/04/2013 11:06 PM	Aaron Marcuse-Kubitza	inputs///test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix
10242	07/10/2013 10:07 PM	Aaron Marcuse-Kubitza	inputs/*/Source/VegBIEN.csv: regenerated for new-style import, which uses a symlink to mappings/VegCore-VegBIEN.csv instead of a custom mapping using the original column names
10199	07/09/2013 04:44 PM	Aaron Marcuse-Kubitza	bugfix: inputs/*/Source/map.csv: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table. the staging table column renaming is now used by all Source tables.
10174	07/06/2013 03:55 PM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: %/VegBIEN.csv: for new-style datasources, use a symlink to mappings/VegCore-VegBIEN.csv directly instead of prefiltering VegCore-VegBIEN.csv to include only the columns in map.csv. prefiltering used to be performed as part of mapping the map.csv VegCore output terms to VegBIEN using bin/join, but is no longer needed because the staging table columns are now VegCore terms. instead, the full VegCore-VegBIEN.csv is needed so that derived columns added in stage I or II validations are detected by bin/map (rather than just the original source columns in map.csv).
10173	07/06/2013 03:37 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: cultivated, oldGrowth: use just cultivated if it's provided, rather than /_alt-ing it back with oldGrowth (which it was generated from)
10172	07/06/2013 03:30 PM	Aaron Marcuse-Kubitza	bugfix: mappings/VegCore-VegBIEN.csv: fixed priority of cultivated and oldGrowth so cultivated is used first if it's available
10166	07/06/2013 11:29 AM	Aaron Marcuse-Kubitza	bugfix: inputs/*/Source/data.csv for new-style datasources: need to include a blank row (plus a blank header) so that the metadata values are imported at least once instead of zero times, now that there is an installed staging table that will be iterated over. the blank row did not used to be necessary, because db_xml.put_table() has a special case for metadata-only tables with no installed table, which avoids iterating over the table's rows.
10163	07/03/2013 10:20 PM	Aaron Marcuse-Kubitza	inputs/*/Source/ for new-style datasources: use an actual staging table instead of a metadata-only table, so that metadata values can be stored in the staging table instead of the map.csv (as will be required by new-style import)
10091	06/27/2013 12:28 PM	Aaron Marcuse-Kubitza	added inputs///header.csv for CSV inputs, which are now generated by inputs/input.Makefile %/install
10090	06/27/2013 12:23 PM	Aaron Marcuse-Kubitza	added inputs/FIA/*/{VegBIEN.csv,test.xml.ref}, which are now generated by the mapping process for the joined-together tables (even though they are not used by the import, because only occurrence_all is imported)
10086	06/27/2013 12:12 PM	Aaron Marcuse-Kubitza	inputs/FIA/*/postprocess.sql: removed svn:executable attribute using `svn pdel svn:executable ...` now that these are not shell scripts
10085	06/27/2013 12:11 PM	Aaron Marcuse-Kubitza	removed no longer needed inputs/FIA/import. use inputs/FIA/run instead.
10084	06/27/2013 12:10 PM	Aaron Marcuse-Kubitza	inputs/FIA/*/import: changed to postprocess.sql for use by the runscripts
10083	06/27/2013 04:27 AM	Aaron Marcuse-Kubitza	added inputs/FIA/run
10082	06/27/2013 04:26 AM	Aaron Marcuse-Kubitza	added inputs/FIA//run. these do not yet use the postprocessing operations in /import.
10081	06/27/2013 04:24 AM	Aaron Marcuse-Kubitza	added inputs/FIA/table.run (for use by table subdirs) and helper Makefile
10078	06/27/2013 01:41 AM	Aaron Marcuse-Kubitza	bugfix: inputs/FIA/occurrence_all/import: don't re-prepend * to terms because this is a view, and the underlying columns have already been mapped
10073	06/26/2013 06:57 PM	Aaron Marcuse-Kubitza	bugfix: inputs/FIA/REF_SPECIES/import: PLANT_SYMBOL_TYPE: prepended * since it's a datasource column, and needs to match up with *PLANT_SYMBOL_TYPE in other table for joins
10072	06/26/2013 06:57 PM	Aaron Marcuse-Kubitza	bugfix: inputs/FIA/REF_SPECIES/import: PLANT_SYMBOL_TYPE: prepended * since it's a datasource column, and needs to match up with *PLANT_SYMBOL_TYPE in other table for joins
10070	06/26/2013 03:36 PM	Aaron Marcuse-Kubitza	added inputs/FIA/_src/run, which runs ./download
10058	06/26/2013 12:00 PM	Aaron Marcuse-Kubitza	inputs/FIA/_src/Makefile: Extraction: $(zips): use $(allZips) containing a zip for each state so that states that have not yet been downloaded and extracted (or had an empty dir created for them) will be downloaded. previously, the extract target only expanded existing zips but did not download new zips unless no zips had yet been downloaded. (this had been necessary because some states do not have a download, and the download of them would be continuously retried every time the Makefile was run.)
10057	06/26/2013 11:51 AM	Aaron Marcuse-Kubitza	bugfix: inputs/FIA/_src/Makefile: `%: %.zip`: if unzip fails because the download does not exist, create an empty dir for the state instead of aborting make
10056	06/26/2013 11:33 AM	Aaron Marcuse-Kubitza	inputs/FIA/_src/Makefile: use curl instead of wget because that is also available on Mac
10054	06/26/2013 11:05 AM	Aaron Marcuse-Kubitza	inputs/FIA/SUBPLOT/map.csv, import: prepended * to all FIA terms to clearly distinguish them from the VegCore terms. this is the standard convention for all datasources, to indicate which terms have not yet been mapped, but was not yet implemented at the beginning of new-style import (the FIA refresh was the first new-style datasource)....
10053	06/26/2013 08:59 AM	Aaron Marcuse-Kubitza	inputs/FIA/import_order.txt: added remaining src tables, whose runscripts will be invoked in the order listed by lib/runscripts/datasrc_dir.run
10052	06/26/2013 08:58 AM	Aaron Marcuse-Kubitza	added inputs/FIA/*/_no_import to src tables that are joined together in occurrence_all and should not also be imported separately once they are in import_order.txt
9882	06/12/2013 10:49 AM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: genus->taxonlabel.taxonomicname: filter out genera that contain numbers (using new _filter_genus()), which break TNRS and prevent it from matching any other parts of the name. later, these genera can instead be moved to the end of the name, where TNRS will correctly match them as Unmatched_terms.
9863	06/12/2013 06:32 AM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/test.xml.ref: update inserted row count
8949	05/09/2013 02:06 PM	Aaron Marcuse-Kubitza	inputs/FIA/_archive/2011-10-17/: set svn:ignore to *
8839	05/06/2013 05:27 AM	Aaron Marcuse-Kubitza	inputs/FIA/: archived no longer used BIEN2 FIA data from the nimoy geoscrub DB
8838	05/06/2013 05:26 AM	Aaron Marcuse-Kubitza	inputs/FIA/_archive/: moved FIA_COND_unique, Organism into 2011-10-17/ subdir
8801	05/02/2013 08:53 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: SVN: add, %/add: /logs: also svn:ignore .gz, used for compressed log files
8332	04/04/2013 08:48 AM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Run mk_subset_by_row_num_func() to make the subset functions available for fast querying at large OFFSET values
8327	04/04/2013 08:14 AM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Added occurrence_all-row_num column for use with mk_subset_by_row_num_func()
8322	04/04/2013 05:24 AM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Use new force_update_view(), which only drops the view if its columns have changed and otherwise just uses CREATE OR REPLACE VIEW, rather than always first running DROP VIEW IF EXISTS
8318	04/04/2013 03:53 AM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/: Updated header.csv for new column order
8317	04/04/2013 03:40 AM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Use directional joins (LEFT/RIGHT JOIN) instead of inner joins to ensure that the PostgreSQL query planner always joins starting with the TREE table. Note that the directional joins are now needed for a different reason than when they were initially added, which had been to avoid slow sorts. The sorts (at least for LIMIT-only queries) went away when small tables such as COUNTY and REF_UNIT were added to the joins.
8316	04/04/2013 01:16 AM	Aaron Marcuse-Kubitza	inputs/FIA/*/map.csv: Changed newlines between table and field name to - because the newlines mess up the flow of queries and also break pgAdmin's display of EXPLAIN output. The - was chosen because it's a non-whitespace character that linewraps in browsers, phpPgAdmin, and Google spreadsheets (although unfortunately not in pgAdmin). It is better than space because you can set a text editor to treat it as a word character, allowing the entire column name (<table>-<field>) to be selected by double-clicking it.
8256	03/28/2013 07:36 PM	Aaron Marcuse-Kubitza	inputs/FIA/: Archived no longer used subdirs from BIEN2 export
8255	03/28/2013 07:29 PM	Aaron Marcuse-Kubitza	inputs/FIA/: Archived no longer used subdirs from BIEN2 export
8253	03/28/2013 07:19 PM	Aaron Marcuse-Kubitza	Added inputs/FIA/_archive
8248	03/28/2013 06:14 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Run remake_VegBIEN_mappings at end to keep mappings to next stage of import process up to date
8247	03/28/2013 06:14 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/: Accepted new test output
8243	03/28/2013 06:07 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/map.csv: Regenerated using new input table mappings
8240	03/28/2013 05:21 PM	Aaron Marcuse-Kubitza	inputs/FIA/: Ran inputs/FIA/import. This maps to VegCore's commonName.
8237	03/28/2013 04:59 PM	Aaron Marcuse-Kubitza	inputs/FIA/PLOT/map.csv: ELEV: Remapped to elevation_ft, assuming units based on the actual elevation of the region for a sample plot record
8234	03/28/2013 04:18 PM	Aaron Marcuse-Kubitza	mappings/VegCore.htm: Regenerated from wiki. Added flower, fruit, commonName.
8222	03/28/2013 12:20 AM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/test.xml.ref: Updated inserted row count for new row sort order
8220	03/27/2013 11:43 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Removed no longer applicable comment that directional joins are needed for PostgreSQL query planner to avoid slow sorts
8219	03/27/2013 11:40 PM	Aaron Marcuse-Kubitza	inputs/FIA/TREE/import: Reclustered table by TREE.parent path index, to facilitate path-order joins
8218	03/27/2013 11:39 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Changed all RIGHT JOINs to inner joins so that tables would be joined in path order (i.e. general->specific). This optimizes the incremental joins so that the small tables are joined to each other before being joined to the large tables, rather than each row of the large tables being looked up in the small tables. This effect may not be noticeable for small LIMIT values, but would become apparent for large LIMIT values, such as the 1-million-row partitions used by db_xml.put_table() for column-based import. Note that inner joins used to cause the query planner to produce incorrect results containing slow sorts, but now this appears to no longer be an issue, perhaps because the result is not sorted by the TREE.ID index (which is not in the same order as the path indexes .unique, .parent).
8217	03/27/2013 10:46 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Removed trailing whitespace
8216	03/27/2013 10:30 PM	Aaron Marcuse-Kubitza	Removed unused inputs/FIA/COND_unique/. Use COND instead.
8215	03/27/2013 09:52 PM	Aaron Marcuse-Kubitza	inputs/FIA/import: Use `set -o errexit` instead of putting ` \|\| exit` after each command
8208	03/27/2013 08:35 PM	Aaron Marcuse-Kubitza	inputs/FIA/*/import: Run mk_derived after postprocessing commands
8207	03/27/2013 08:28 PM	Aaron Marcuse-Kubitza	inputs/FIA/import_order.txt: Added occurrence_all/
8206	03/27/2013 08:23 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: subplotID,subplot -> location.sourceaccessioncode: Fixed bug where need /_first to handle the case where both subplotID and subplot are provided
8205	03/27/2013 08:15 PM	Aaron Marcuse-Kubitza	Added inputs/FIA/map.csv, which maps shared columns to VegCore
8204	03/27/2013 08:12 PM	Aaron Marcuse-Kubitza	inputs/FIA/FIA_COND_unique/test.xml.ref: Updated now that PLOT, CONDID have been mapped
8203	03/27/2013 08:12 PM	Aaron Marcuse-Kubitza	inputs/FIA//map.csv for pre-refresh tables: Added back before unmapped column names
8192	03/27/2013 03:11 PM	Aaron Marcuse-Kubitza	inputs/FIA//import: Use new mk__col()
8186	03/27/2013 02:22 PM	Aaron Marcuse-Kubitza	inputs/FIA/COND/import: oldGrowth: Updated expr column names
8184	03/27/2013 12:54 PM	Aaron Marcuse-Kubitza	inputs/FIA/*/import: Removed util. before function names because util is in the search_path
8183	03/27/2013 12:43 PM	Aaron Marcuse-Kubitza	schemas/functions.sql: Renamed to util.sql because now that these schemas are used by the new-style import scripts, there can be more than just functions in them
8179	03/25/2013 10:31 PM	Aaron Marcuse-Kubitza	inputs/FIA/COND/map.csv: Mapped SLOPE, ASPECT
8177	03/25/2013 09:45 PM	Aaron Marcuse-Kubitza	inputs/FIA//map.csv: Replaced . between table and column name with newline, so that table viewers like pgAdmin will display both the table and column name at the left edge of the header cell, rather than displaying only the table name because the column name doesn't fit. This fixes the problem of seeing a bunch of columns whose names all start with a table name, and not knowing what each of them is. It also preserves the ability to see at a glance which table a column is in, which helps in navigating wide tables. Removed before unmapped terms, because whether a term is mapped is generally obvious from the table name itself.
8176	03/25/2013 09:01 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: %/.map.csv.last_cleanup: Run fix_line_endings after canon/translate to standardize Python's \r\n line endings back to \n. This prevents issues with mixed line endings because LibreOffice (and probably Excel) treat all cell-internal line endings as \n but row line endings as whatever the file had, while text editors like jEdit translate all line endings to whatever the autodetected line ending is. (This creates spurious line ending diffs when a map spreadsheet containing multiline cells is edited in a text editor.)
8174	03/25/2013 08:12 PM	Aaron Marcuse-Kubitza	inputs/FIA/COND/import: Renamed COND.oldgrowth to VegCore name oldGrowth
8173	03/25/2013 07:52 PM	Aaron Marcuse-Kubitza	inputs/FIA/*/map.csv: Ensured that joined columns are globally unique, so they don't map to an ambiguous VegCore term in the future
8172	03/25/2013 07:38 PM	Aaron Marcuse-Kubitza	inputs/FIA/*/map.csv: Mapped terms to VegCore
8170	03/25/2013 06:57 PM	Aaron Marcuse-Kubitza	inputs/FIA/*/import: Updated column names to match map.csv
8168	03/25/2013 06:34 PM	Aaron Marcuse-Kubitza	inputs/FIA/import: Abort if any invoked script encounters an error
8166	03/25/2013 04:55 PM	Aaron Marcuse-Kubitza	inputs/FIA/*/map.csv: Removed no longer needed leading . from joined fields (globally-unique terms), because functions.to_global_col_names() is not used anymore
8165	03/25/2013 04:46 PM	Aaron Marcuse-Kubitza	Added inputs/FIA/occurrence_all/, which combines all the core tables in a denormalized view. Note that it is not necessary to materialize this view into a (large) denormalized table, because the unique indexes and left/right joins allow the rows to be denormalized on the fly.
8164	03/25/2013 04:36 PM	Aaron Marcuse-Kubitza	inputs/FIA/*/import: Use map_table to set column names based on the contents of map.csv, instead of using functions.to_global_col_names() and functions.rename_if_exists(). Added map.csv for all tables.
8163	03/25/2013 03:19 PM	Aaron Marcuse-Kubitza	inputs/FIA/: Changed postprocess.sql scripts to import scripts that can be run directly. Added top-level inputs/FIA/import to run all of them together.
8162	03/25/2013 03:05 PM	Aaron Marcuse-Kubitza	inputs/FIA/COND/postprocess.sql: Removed trailing whitespace
8136	03/21/2013 06:01 AM	Aaron Marcuse-Kubitza	inputs/FIA/*/postprocess.sql: Avoid using :table, :table_str so that the commands in the script can also be run by pasting them into pgAdmin
8130	03/21/2013 01:27 AM	Aaron Marcuse-Kubitza	inputs/FIA//postprocess.sql: Added index on .CN (autogen IDs)
8128	03/21/2013 01:08 AM	Aaron Marcuse-Kubitza	inputs/FIA/TREE/postprocess.sql: TREE.unique index: Renamed to TREE.ID because this is on an autogenerated pkey rather than on domain values (for which a set of unique columns has not yet been found and may not exist)
8127	03/21/2013 01:03 AM	Aaron Marcuse-Kubitza	inputs/FIA/REF_SPECIES/postprocess.sql: Matched SPECIES_SYMBOL to .SYMBOL. Added .SYMBOL_TYPE for use in joining to REF_PLANT_DICTIONARY.
8126	03/21/2013 12:41 AM	Aaron Marcuse-Kubitza	Added inputs/FIA/REF_UNIT/postprocess.sql
8125	03/21/2013 12:36 AM	Aaron Marcuse-Kubitza	Added inputs/FIA/REF_RESEARCH_STATION/postprocess.sql
8124	03/21/2013 12:17 AM	Aaron Marcuse-Kubitza	Added inputs/FIA/COUNTY/postprocess.sql
8123	03/21/2013 12:02 AM	Aaron Marcuse-Kubitza	Added inputs/FIA/REF_PLANT_DICTIONARY/postprocess.sql
8122	03/20/2013 05:16 PM	Aaron Marcuse-Kubitza	inputs/FIA/COND/postprocess.sql: Matched COND.HABTYPCD1, COND.HABTYPCD1_PUB_CD to REF_HABTYP_DESCRIPTION

Project

General

Profile