inputs/input.Makefile: add!: verify/: also svn:ignore *.tsv, *.txt
moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
inputs/FIA/TREE/run: documented import() runtime (1.5 h), which includes table cleanup runtime (1 h)
inputs/FIA/COND/postprocess.sql: filtering formula: documented that this was created by Brad, and provided the URL to it on nimoy
**/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`)
copyright scrub: inputs/: removed data provider-owned schema and documentation files, which are not BIEN copyright and should not be part of what is submitted for open-sourcing. these files will remain accessible via the web interface (fs.vegpath.org), but will not be in the repository.
fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
bugfix: mappings/VegCore-VegBIEN.csv: nest all taxonoccurrences inside a stratum event, so that the parent locationevent is always fully populated before child locationevents point to it. (previously, a stub parent event was created when the child event was imported first, which blocked the fully-populated parent event from being inserted later on.) this uses auto-folding (for VegBank/CVS) and auto-forwarding (for other datasources) to prune empty stratum events for taxonoccurrences that don't have strata. (see wiki.vegpath.org/Auto-folding, wiki.vegpath.org/Auto-forwarding for more info about these normalization techniques.) note that the inserted row counts stay exactly the same for all datasources except VegBank (which was being fixed), indicating that this signficant change to the mappings did not change the semantics of the import of taxonoccurrences.
inputs/FIA/occurrence_all/: renamed to taxon_observation.** to clarify what an occurrence is. taxon_observation has the same meaning as in VegCore, where each taxon_determination is considered a separate taxon_observation of the associated specimen or vouchered plant (vegpath.org/VegCore/ERD/). (note that having multiple taxon_determinations only makes sense when there is something to reobserve.) the .** is SQL dotpath syntax (wiki.vegpath.org/SQL_dotpaths) for the recursive expansion of all tables to which taxon_observation has forward fkeys (i.e. "the left-join").
inputs/FIA/occurrence_all/postprocess.sql: use :table_str var instead of hardcoding the table name as occurrence_all, to avoid needing to change this file when renaming the view
inputs/FIA/occurrence_all/postprocess.sql: renamed occurrence_all--row_num to just row_num because this is for the entire view, rather than a specific table in it, and thus does not need a disambiguating table prefix. this also avoids embedding the view name in its own columns.
inputs/FIA/*/map.csv: use -- to separate the table and column name instead of - , to conform with the u-name format (wiki.vegpath.org/u-name#format), which works even when only one of _- can be used in the name. -s are needed in this case to linewrap the column on a separate line as the table in phpPgAdmin.
inputs/FIA/occurrence_all/postprocess.sql: use much simpler LEFT JOINs instead of nested RIGHT JOINs, which required lots of () to get them to happen in the right order. note that the columns are now provided in reverse instead of forwards path order, but this is still much clearer than the nested mess of RIGHT JOINs. this approach can also be used to simplify VegBank's joins.
inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix
inputs/*/Source/VegBIEN.csv: regenerated for new-style import, which uses a symlink to mappings/VegCore-VegBIEN.csv instead of a custom mapping using the original column names
bugfix: inputs/*/Source/map.csv: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table. the staging table column renaming is now used by all Source tables.
bugfix: inputs/input.Makefile: %/VegBIEN.csv: for new-style datasources, use a symlink to mappings/VegCore-VegBIEN.csv directly instead of prefiltering VegCore-VegBIEN.csv to include only the columns in map.csv. prefiltering used to be performed as part of mapping the map.csv VegCore output terms to VegBIEN using bin/join, but is no longer needed because the staging table columns are now VegCore terms. instead, the full VegCore-VegBIEN.csv is needed so that derived columns added in stage I or II validations are detected by bin/map (rather than just the original source columns in map.csv).
mappings/VegCore-VegBIEN.csv: cultivated, oldGrowth: use just cultivated if it's provided, rather than /_alt-ing it back with oldGrowth (which it was generated from)
bugfix: mappings/VegCore-VegBIEN.csv: fixed priority of cultivated and oldGrowth so cultivated is used first if it's available
bugfix: inputs/*/Source/data.csv for new-style datasources: need to include a blank row (plus a blank header) so that the metadata values are imported at least once instead of zero times, now that there is an installed staging table that will be iterated over. the blank row did not used to be necessary, because db_xml.put_table() has a special case for metadata-only tables with no installed table, which avoids iterating over the table's rows.
inputs/*/Source/ for new-style datasources: use an actual staging table instead of a metadata-only table, so that metadata values can be stored in the staging table instead of the map.csv (as will be required by new-style import)
added inputs/*/*/header.csv for CSV inputs, which are now generated by inputs/input.Makefile %/install
added inputs/FIA/*/{VegBIEN.csv,test.xml.ref}, which are now generated by the mapping process for the joined-together tables (even though they are not used by the import, because only occurrence_all is imported)
inputs/FIA/*/postprocess.sql: removed svn:executable attribute using `svn pdel svn:executable ...` now that these are not shell scripts
removed no longer needed inputs/FIA/import. use inputs/FIA/run instead.
inputs/FIA/*/import: changed to postprocess.sql for use by the runscripts
added inputs/FIA/run
added inputs/FIA/*/run. these do not yet use the postprocessing operations in */import.
added inputs/FIA/table.run (for use by table subdirs) and helper Makefile
bugfix: inputs/FIA/occurrence_all/import: don't re-prepend * to terms because this is a view, and the underlying columns have already been mapped
bugfix: inputs/FIA/REF_SPECIES/import: PLANT_SYMBOL_TYPE: prepended * since it's a datasource column, and needs to match up with *PLANT_SYMBOL_TYPE in other table for joins
added inputs/FIA/_src/run, which runs ./download
inputs/FIA/_src/Makefile: Extraction: $(zips): use $(allZips) containing a zip for each state so that states that have not yet been downloaded and extracted (or had an empty dir created for them) will be downloaded. previously, the extract target only expanded existing zips but did not download new zips unless no zips had yet been downloaded. (this had been necessary because some states do not have a download, and the download of them would be continuously retried every time the Makefile was run.)
bugfix: inputs/FIA/_src/Makefile: `%: %.zip`: if unzip fails because the download does not exist, create an empty dir for the state instead of aborting make
inputs/FIA/_src/Makefile: use curl instead of wget because that is also available on Mac
inputs/FIA/SUBPLOT/map.csv, import: prepended * to all FIA terms to clearly distinguish them from the VegCore terms. this is the standard convention for all datasources, to indicate which terms have not yet been mapped, but was not yet implemented at the beginning of new-style import (the FIA refresh was the first new-style datasource)....
inputs/FIA/import_order.txt: added remaining src tables, whose runscripts will be invoked in the order listed by lib/runscripts/datasrc_dir.run
added inputs/FIA/*/_no_import to src tables that are joined together in occurrence_all and should not also be imported separately once they are in import_order.txt
mappings/VegCore-VegBIEN.csv: genus->taxonlabel.taxonomicname: filter out genera that contain numbers (using new _filter_genus()), which break TNRS and prevent it from matching any other parts of the name. later, these genera can instead be moved to the end of the name, where TNRS will correctly match them as Unmatched_terms.
inputs/FIA/occurrence_all/test.xml.ref: update inserted row count
inputs/FIA/_archive/2011-10-17/: set svn:ignore to *
inputs/FIA/: archived no longer used BIEN2 FIA data from the nimoy geoscrub DB
inputs/FIA/_archive/: moved FIA_COND_unique, Organism into 2011-10-17/ subdir
inputs/input.Makefile: SVN: add, %/add: */logs: also svn:ignore *.gz, used for compressed log files
inputs/FIA/occurrence_all/import: Run mk_subset_by_row_num_func() to make the subset functions available for fast querying at large OFFSET values
inputs/FIA/occurrence_all/import: Added occurrence_all-row_num column for use with mk_subset_by_row_num_func()
inputs/FIA/occurrence_all/import: Use new force_update_view(), which only drops the view if its columns have changed and otherwise just uses CREATE OR REPLACE VIEW, rather than always first running DROP VIEW IF EXISTS
inputs/FIA/occurrence_all/: Updated header.csv for new column order
inputs/FIA/occurrence_all/import: Use directional joins (LEFT/RIGHT JOIN) instead of inner joins to ensure that the PostgreSQL query planner always joins starting with the TREE table. Note that the directional joins are now needed for a different reason than when they were initially added, which had been to avoid slow sorts. The sorts (at least for LIMIT-only queries) went away when small tables such as COUNTY and REF_UNIT were added to the joins.
inputs/FIA/*/map.csv: Changed newlines between table and field name to - because the newlines mess up the flow of queries and also break pgAdmin's display of EXPLAIN output. The - was chosen because it's a non-whitespace character that linewraps in browsers, phpPgAdmin, and Google spreadsheets (although unfortunately not in pgAdmin). It is better than space because you can set a text editor to treat it as a word character, allowing the entire column name (<table>-<field>) to be selected by double-clicking it.
inputs/FIA/: Archived no longer used subdirs from BIEN2 export
Added inputs/FIA/_archive
inputs/FIA/occurrence_all/import: Run remake_VegBIEN_mappings at end to keep mappings to next stage of import process up to date
inputs/FIA/occurrence_all/: Accepted new test output
inputs/FIA/occurrence_all/map.csv: Regenerated using new input table mappings
inputs/FIA/: Ran inputs/FIA/import. This maps to VegCore's commonName.
inputs/FIA/PLOT/map.csv: ELEV: Remapped to elevation_ft, assuming units based on the actual elevation of the region for a sample plot record
mappings/VegCore.htm: Regenerated from wiki. Added flower, fruit, commonName.
inputs/FIA/occurrence_all/test.xml.ref: Updated inserted row count for new row sort order
inputs/FIA/occurrence_all/import: Removed no longer applicable comment that directional joins are needed for PostgreSQL query planner to avoid slow sorts
inputs/FIA/TREE/import: Reclustered table by TREE.parent path index, to facilitate path-order joins
inputs/FIA/occurrence_all/import: Changed all RIGHT JOINs to inner joins so that tables would be joined in path order (i.e. general->specific). This optimizes the incremental joins so that the small tables are joined to each other before being joined to the large tables, rather than each row of the large tables being looked up in the small tables. This effect may not be noticeable for small LIMIT values, but would become apparent for large LIMIT values, such as the 1-million-row partitions used by db_xml.put_table() for column-based import. Note that inner joins used to cause the query planner to produce incorrect results containing slow sorts, but now this appears to no longer be an issue, perhaps because the result is not sorted by the TREE.ID index (which is not in the same order as the path indexes *.unique, *.parent).
inputs/FIA/occurrence_all/import: Removed trailing whitespace
Removed unused inputs/FIA/COND_unique/. Use COND instead.
inputs/FIA/import: Use `set -o errexit` instead of putting ` || exit` after each command
inputs/FIA/*/import: Run mk_derived after postprocessing commands
inputs/FIA/import_order.txt: Added occurrence_all/
mappings/VegCore-VegBIEN.csv: subplotID,subplot -> location.sourceaccessioncode: Fixed bug where need /_first to handle the case where both subplotID and subplot are provided
Added inputs/FIA/map.csv, which maps shared columns to VegCore
inputs/FIA/FIA_COND_unique/test.xml.ref: Updated now that PLOT, CONDID have been mapped
inputs/FIA/*/map.csv for pre-refresh tables: Added back * before unmapped column names
inputs/FIA/*/import: Use new mk_*_col()
inputs/FIA/COND/import: oldGrowth: Updated expr column names
inputs/FIA/*/import: Removed util. before function names because util is in the search_path
schemas/*functions.sql: Renamed to *util.sql because now that these schemas are used by the new-style import scripts, there can be more than just functions in them
inputs/FIA/COND/map.csv: Mapped SLOPE, ASPECT
inputs/FIA/*/map.csv: Replaced . between table and column name with newline, so that table viewers like pgAdmin will display both the table and column name at the left edge of the header cell, rather than displaying only the table name because the column name doesn't fit. This fixes the problem of seeing a bunch of columns whose names all start with a table name, and not knowing what each of them is. It also preserves the ability to see at a glance which table a column is in, which helps in navigating wide tables. Removed * before unmapped terms, because whether a term is mapped is generally obvious from the table name itself.
inputs/input.Makefile: %/.map.csv.last_cleanup: Run fix_line_endings after canon/translate to standardize Python's \r\n line endings back to \n. This prevents issues with mixed line endings because LibreOffice (and probably Excel) treat all cell-internal line endings as \n but row line endings as whatever the file had, while text editors like jEdit translate all line endings to whatever the autodetected line ending is. (This creates spurious line ending diffs when a map spreadsheet containing multiline cells is edited in a text editor.)
inputs/FIA/COND/import: Renamed COND.oldgrowth to VegCore name oldGrowth
inputs/FIA/*/map.csv: Ensured that joined columns are globally unique, so they don't map to an ambiguous VegCore term in the future
inputs/FIA/*/map.csv: Mapped terms to VegCore
inputs/FIA/*/import: Updated column names to match map.csv
inputs/FIA/import: Abort if any invoked script encounters an error
inputs/FIA/*/map.csv: Removed no longer needed leading . from joined fields (globally-unique terms), because functions.to_global_col_names() is not used anymore
Added inputs/FIA/occurrence_all/, which combines all the core tables in a denormalized view. Note that it is not necessary to materialize this view into a (large) denormalized table, because the unique indexes and left/right joins allow the rows to be denormalized on the fly.
inputs/FIA/*/import: Use map_table to set column names based on the contents of map.csv, instead of using functions.to_global_col_names() and functions.rename_if_exists(). Added map.csv for all tables.
inputs/FIA/: Changed postprocess.sql scripts to import scripts that can be run directly. Added top-level inputs/FIA/import to run all of them together.
inputs/FIA/COND/postprocess.sql: Removed trailing whitespace
inputs/FIA/*/postprocess.sql: Avoid using :table, :table_str so that the commands in the script can also be run by pasting them into pgAdmin
inputs/FIA/*/postprocess.sql: Added index on *.CN (autogen IDs)
inputs/FIA/TREE/postprocess.sql: TREE.unique index: Renamed to TREE.ID because this is on an autogenerated pkey rather than on domain values (for which a set of unique columns has not yet been found and may not exist)
inputs/FIA/REF_SPECIES/postprocess.sql: Matched SPECIES_SYMBOL to .SYMBOL. Added .SYMBOL_TYPE for use in joining to REF_PLANT_DICTIONARY.
Added inputs/FIA/REF_UNIT/postprocess.sql
Added inputs/FIA/REF_RESEARCH_STATION/postprocess.sql
Added inputs/FIA/COUNTY/postprocess.sql
Added inputs/FIA/REF_PLANT_DICTIONARY/postprocess.sql
inputs/FIA/COND/postprocess.sql: Matched COND.HABTYPCD1, COND.HABTYPCD1_PUB_CD to REF_HABTYP_DESCRIPTION
inputs/FIA/REF_HABTYP_DESCRIPTION/postprocess.sql: Prepare columns for joining with COND