Project

General

Profile

Statistics
| Revision:
  • svn:ignore: *

# Date Author Comment
11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

11871 12/09/2013 03:37 PM Aaron Marcuse-Kubitza

inputs/FIA/TREE/run: documented import() runtime (1.5 h), which includes table cleanup runtime (1 h)

11843 12/05/2013 11:38 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/postprocess.sql: filtering formula: documented that this was created by Brad, and provided the URL to it on nimoy

11788 11/26/2013 11:11 PM Aaron Marcuse-Kubitza

**/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`)

11705 11/21/2013 12:24 AM Aaron Marcuse-Kubitza

copyright scrub: inputs/: removed data provider-owned schema and documentation files, which are not BIEN copyright and should not be part of what is submitted for open-sourcing. these files will remain accessible via the web interface (fs.vegpath.org), but will not be in the repository.

11396 10/21/2013 07:14 PM Aaron Marcuse-Kubitza

fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error

11107 09/29/2013 08:58 PM Aaron Marcuse-Kubitza

bugfix: mappings/VegCore-VegBIEN.csv: nest all taxonoccurrences inside a stratum event, so that the parent locationevent is always fully populated before child locationevents point to it. (previously, a stub parent event was created when the child event was imported first, which blocked the fully-populated parent event from being inserted later on.) this uses auto-folding (for VegBank/CVS) and auto-forwarding (for other datasources) to prune empty stratum events for taxonoccurrences that don't have strata. (see wiki.vegpath.org/Auto-folding, wiki.vegpath.org/Auto-forwarding for more info about these normalization techniques.) note that the inserted row counts stay exactly the same for all datasources except VegBank (which was being fixed), indicating that this signficant change to the mappings did not change the semantics of the import of taxonoccurrences.

10950 09/14/2013 09:26 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/: renamed to taxon_observation.** to clarify what an occurrence is. taxon_observation has the same meaning as in VegCore, where each taxon_determination is considered a separate taxon_observation of the associated specimen or vouchered plant (vegpath.org/VegCore/ERD/). (note that having multiple taxon_determinations only makes sense when there is something to reobserve.) the .** is SQL dotpath syntax (wiki.vegpath.org/SQL_dotpaths) for the recursive expansion of all tables to which taxon_observation has forward fkeys (i.e. "the left-join").

10949 09/14/2013 09:01 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/postprocess.sql: use :table_str var instead of hardcoding the table name as occurrence_all, to avoid needing to change this file when renaming the view

10948 09/14/2013 05:38 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/postprocess.sql: renamed occurrence_all--row_num to just row_num because this is for the entire view, rather than a specific table in it, and thus does not need a disambiguating table prefix. this also avoids embedding the view name in its own columns.

10947 09/14/2013 04:34 PM Aaron Marcuse-Kubitza

inputs/FIA/*/map.csv: use -- to separate the table and column name instead of - , to conform with the u-name format (wiki.vegpath.org/u-name#format), which works even when only one of _- can be used in the name. -s are needed in this case to linewrap the column on a separate line as the table in phpPgAdmin.

10930 09/12/2013 03:37 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/postprocess.sql: use much simpler LEFT JOINs instead of nested RIGHT JOINs, which required lots of () to get them to happen in the right order. note that the columns are now provided in reverse instead of forwards path order, but this is still much clearer than the nested mess of RIGHT JOINs. this approach can also be used to simplify VegBank's joins.

10866 09/04/2013 11:06 PM Aaron Marcuse-Kubitza

inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix

10242 07/10/2013 10:07 PM Aaron Marcuse-Kubitza

inputs/*/Source/VegBIEN.csv: regenerated for new-style import, which uses a symlink to mappings/VegCore-VegBIEN.csv instead of a custom mapping using the original column names

10199 07/09/2013 04:44 PM Aaron Marcuse-Kubitza

bugfix: inputs/*/Source/map.csv: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table. the staging table column renaming is now used by all Source tables.

10174 07/06/2013 03:55 PM Aaron Marcuse-Kubitza

bugfix: inputs/input.Makefile: %/VegBIEN.csv: for new-style datasources, use a symlink to mappings/VegCore-VegBIEN.csv directly instead of prefiltering VegCore-VegBIEN.csv to include only the columns in map.csv. prefiltering used to be performed as part of mapping the map.csv VegCore output terms to VegBIEN using bin/join, but is no longer needed because the staging table columns are now VegCore terms. instead, the full VegCore-VegBIEN.csv is needed so that derived columns added in stage I or II validations are detected by bin/map (rather than just the original source columns in map.csv).

10173 07/06/2013 03:37 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: cultivated, oldGrowth: use just cultivated if it's provided, rather than /_alt-ing it back with oldGrowth (which it was generated from)

10172 07/06/2013 03:30 PM Aaron Marcuse-Kubitza

bugfix: mappings/VegCore-VegBIEN.csv: fixed priority of cultivated and oldGrowth so cultivated is used first if it's available

10166 07/06/2013 11:29 AM Aaron Marcuse-Kubitza

bugfix: inputs/*/Source/data.csv for new-style datasources: need to include a blank row (plus a blank header) so that the metadata values are imported at least once instead of zero times, now that there is an installed staging table that will be iterated over. the blank row did not used to be necessary, because db_xml.put_table() has a special case for metadata-only tables with no installed table, which avoids iterating over the table's rows.

10163 07/03/2013 10:20 PM Aaron Marcuse-Kubitza

inputs/*/Source/ for new-style datasources: use an actual staging table instead of a metadata-only table, so that metadata values can be stored in the staging table instead of the map.csv (as will be required by new-style import)

10091 06/27/2013 12:28 PM Aaron Marcuse-Kubitza

added inputs/*/*/header.csv for CSV inputs, which are now generated by inputs/input.Makefile %/install

10090 06/27/2013 12:23 PM Aaron Marcuse-Kubitza

added inputs/FIA/*/{VegBIEN.csv,test.xml.ref}, which are now generated by the mapping process for the joined-together tables (even though they are not used by the import, because only occurrence_all is imported)

10086 06/27/2013 12:12 PM Aaron Marcuse-Kubitza

inputs/FIA/*/postprocess.sql: removed svn:executable attribute using `svn pdel svn:executable ...` now that these are not shell scripts

10085 06/27/2013 12:11 PM Aaron Marcuse-Kubitza

removed no longer needed inputs/FIA/import. use inputs/FIA/run instead.

10084 06/27/2013 12:10 PM Aaron Marcuse-Kubitza

inputs/FIA/*/import: changed to postprocess.sql for use by the runscripts

10083 06/27/2013 04:27 AM Aaron Marcuse-Kubitza

added inputs/FIA/run

10082 06/27/2013 04:26 AM Aaron Marcuse-Kubitza

added inputs/FIA/*/run. these do not yet use the postprocessing operations in */import.

10081 06/27/2013 04:24 AM Aaron Marcuse-Kubitza

added inputs/FIA/table.run (for use by table subdirs) and helper Makefile

10078 06/27/2013 01:41 AM Aaron Marcuse-Kubitza

bugfix: inputs/FIA/occurrence_all/import: don't re-prepend * to terms because this is a view, and the underlying columns have already been mapped

10073 06/26/2013 06:57 PM Aaron Marcuse-Kubitza

bugfix: inputs/FIA/REF_SPECIES/import: PLANT_SYMBOL_TYPE: prepended * since it's a datasource column, and needs to match up with *PLANT_SYMBOL_TYPE in other table for joins

10072 06/26/2013 06:57 PM Aaron Marcuse-Kubitza

bugfix: inputs/FIA/REF_SPECIES/import: PLANT_SYMBOL_TYPE: prepended * since it's a datasource column, and needs to match up with *PLANT_SYMBOL_TYPE in other table for joins

10070 06/26/2013 03:36 PM Aaron Marcuse-Kubitza

added inputs/FIA/_src/run, which runs ./download

10058 06/26/2013 12:00 PM Aaron Marcuse-Kubitza

inputs/FIA/_src/Makefile: Extraction: $(zips): use $(allZips) containing a zip for each state so that states that have not yet been downloaded and extracted (or had an empty dir created for them) will be downloaded. previously, the extract target only expanded existing zips but did not download new zips unless no zips had yet been downloaded. (this had been necessary because some states do not have a download, and the download of them would be continuously retried every time the Makefile was run.)

10057 06/26/2013 11:51 AM Aaron Marcuse-Kubitza

bugfix: inputs/FIA/_src/Makefile: `%: %.zip`: if unzip fails because the download does not exist, create an empty dir for the state instead of aborting make

10056 06/26/2013 11:33 AM Aaron Marcuse-Kubitza

inputs/FIA/_src/Makefile: use curl instead of wget because that is also available on Mac

10054 06/26/2013 11:05 AM Aaron Marcuse-Kubitza

inputs/FIA/SUBPLOT/map.csv, import: prepended * to all FIA terms to clearly distinguish them from the VegCore terms. this is the standard convention for all datasources, to indicate which terms have not yet been mapped, but was not yet implemented at the beginning of new-style import (the FIA refresh was the first new-style datasource)....

10053 06/26/2013 08:59 AM Aaron Marcuse-Kubitza

inputs/FIA/import_order.txt: added remaining src tables, whose runscripts will be invoked in the order listed by lib/runscripts/datasrc_dir.run

10052 06/26/2013 08:58 AM Aaron Marcuse-Kubitza

added inputs/FIA/*/_no_import to src tables that are joined together in occurrence_all and should not also be imported separately once they are in import_order.txt

9882 06/12/2013 10:49 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: genus->taxonlabel.taxonomicname: filter out genera that contain numbers (using new _filter_genus()), which break TNRS and prevent it from matching any other parts of the name. later, these genera can instead be moved to the end of the name, where TNRS will correctly match them as Unmatched_terms.

9863 06/12/2013 06:32 AM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/test.xml.ref: update inserted row count

8949 05/09/2013 02:06 PM Aaron Marcuse-Kubitza

inputs/FIA/_archive/2011-10-17/: set svn:ignore to *

8839 05/06/2013 05:27 AM Aaron Marcuse-Kubitza

inputs/FIA/: archived no longer used BIEN2 FIA data from the nimoy geoscrub DB

8838 05/06/2013 05:26 AM Aaron Marcuse-Kubitza

inputs/FIA/_archive/: moved FIA_COND_unique, Organism into 2011-10-17/ subdir

8801 05/02/2013 08:53 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: SVN: add, %/add: */logs: also svn:ignore *.gz, used for compressed log files

8332 04/04/2013 08:48 AM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/import: Run mk_subset_by_row_num_func() to make the subset functions available for fast querying at large OFFSET values

8327 04/04/2013 08:14 AM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/import: Added occurrence_all-row_num column for use with mk_subset_by_row_num_func()

8322 04/04/2013 05:24 AM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/import: Use new force_update_view(), which only drops the view if its columns have changed and otherwise just uses CREATE OR REPLACE VIEW, rather than always first running DROP VIEW IF EXISTS

8318 04/04/2013 03:53 AM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/: Updated header.csv for new column order

8317 04/04/2013 03:40 AM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/import: Use directional joins (LEFT/RIGHT JOIN) instead of inner joins to ensure that the PostgreSQL query planner always joins starting with the TREE table. Note that the directional joins are now needed for a different reason than when they were initially added, which had been to avoid slow sorts. The sorts (at least for LIMIT-only queries) went away when small tables such as COUNTY and REF_UNIT were added to the joins.

8316 04/04/2013 01:16 AM Aaron Marcuse-Kubitza

inputs/FIA/*/map.csv: Changed newlines between table and field name to - because the newlines mess up the flow of queries and also break pgAdmin's display of EXPLAIN output. The - was chosen because it's a non-whitespace character that linewraps in browsers, phpPgAdmin, and Google spreadsheets (although unfortunately not in pgAdmin). It is better than space because you can set a text editor to treat it as a word character, allowing the entire column name (<table>-<field>) to be selected by double-clicking it.

8256 03/28/2013 07:36 PM Aaron Marcuse-Kubitza

inputs/FIA/: Archived no longer used subdirs from BIEN2 export

8255 03/28/2013 07:29 PM Aaron Marcuse-Kubitza

inputs/FIA/: Archived no longer used subdirs from BIEN2 export

8253 03/28/2013 07:19 PM Aaron Marcuse-Kubitza

Added inputs/FIA/_archive

8248 03/28/2013 06:14 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/import: Run remake_VegBIEN_mappings at end to keep mappings to next stage of import process up to date

8247 03/28/2013 06:14 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/: Accepted new test output

8243 03/28/2013 06:07 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/map.csv: Regenerated using new input table mappings

8240 03/28/2013 05:21 PM Aaron Marcuse-Kubitza

inputs/FIA/: Ran inputs/FIA/import. This maps to VegCore's commonName.

8237 03/28/2013 04:59 PM Aaron Marcuse-Kubitza

inputs/FIA/PLOT/map.csv: ELEV: Remapped to elevation_ft, assuming units based on the actual elevation of the region for a sample plot record

8234 03/28/2013 04:18 PM Aaron Marcuse-Kubitza

mappings/VegCore.htm: Regenerated from wiki. Added flower, fruit, commonName.

8222 03/28/2013 12:20 AM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/test.xml.ref: Updated inserted row count for new row sort order

8220 03/27/2013 11:43 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/import: Removed no longer applicable comment that directional joins are needed for PostgreSQL query planner to avoid slow sorts

8219 03/27/2013 11:40 PM Aaron Marcuse-Kubitza

inputs/FIA/TREE/import: Reclustered table by TREE.parent path index, to facilitate path-order joins

8218 03/27/2013 11:39 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/import: Changed all RIGHT JOINs to inner joins so that tables would be joined in path order (i.e. general->specific). This optimizes the incremental joins so that the small tables are joined to each other before being joined to the large tables, rather than each row of the large tables being looked up in the small tables. This effect may not be noticeable for small LIMIT values, but would become apparent for large LIMIT values, such as the 1-million-row partitions used by db_xml.put_table() for column-based import. Note that inner joins used to cause the query planner to produce incorrect results containing slow sorts, but now this appears to no longer be an issue, perhaps because the result is not sorted by the TREE.ID index (which is not in the same order as the path indexes *.unique, *.parent).

8217 03/27/2013 10:46 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/import: Removed trailing whitespace

8216 03/27/2013 10:30 PM Aaron Marcuse-Kubitza

Removed unused inputs/FIA/COND_unique/. Use COND instead.

8215 03/27/2013 09:52 PM Aaron Marcuse-Kubitza

inputs/FIA/import: Use `set -o errexit` instead of putting ` || exit` after each command

8208 03/27/2013 08:35 PM Aaron Marcuse-Kubitza

inputs/FIA/*/import: Run mk_derived after postprocessing commands

8207 03/27/2013 08:28 PM Aaron Marcuse-Kubitza

inputs/FIA/import_order.txt: Added occurrence_all/

8206 03/27/2013 08:23 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: subplotID,subplot -> location.sourceaccessioncode: Fixed bug where need /_first to handle the case where both subplotID and subplot are provided

8205 03/27/2013 08:15 PM Aaron Marcuse-Kubitza

Added inputs/FIA/map.csv, which maps shared columns to VegCore

8204 03/27/2013 08:12 PM Aaron Marcuse-Kubitza

inputs/FIA/FIA_COND_unique/test.xml.ref: Updated now that PLOT, CONDID have been mapped

8203 03/27/2013 08:12 PM Aaron Marcuse-Kubitza

inputs/FIA/*/map.csv for pre-refresh tables: Added back * before unmapped column names

8192 03/27/2013 03:11 PM Aaron Marcuse-Kubitza

inputs/FIA/*/import: Use new mk_*_col()

8186 03/27/2013 02:22 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/import: oldGrowth: Updated expr column names

8184 03/27/2013 12:54 PM Aaron Marcuse-Kubitza

inputs/FIA/*/import: Removed util. before function names because util is in the search_path

8183 03/27/2013 12:43 PM Aaron Marcuse-Kubitza

schemas/*functions.sql: Renamed to *util.sql because now that these schemas are used by the new-style import scripts, there can be more than just functions in them

8179 03/25/2013 10:31 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/map.csv: Mapped SLOPE, ASPECT

8177 03/25/2013 09:45 PM Aaron Marcuse-Kubitza

inputs/FIA/*/map.csv: Replaced . between table and column name with newline, so that table viewers like pgAdmin will display both the table and column name at the left edge of the header cell, rather than displaying only the table name because the column name doesn't fit. This fixes the problem of seeing a bunch of columns whose names all start with a table name, and not knowing what each of them is. It also preserves the ability to see at a glance which table a column is in, which helps in navigating wide tables. Removed * before unmapped terms, because whether a term is mapped is generally obvious from the table name itself.

8176 03/25/2013 09:01 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: %/.map.csv.last_cleanup: Run fix_line_endings after canon/translate to standardize Python's \r\n line endings back to \n. This prevents issues with mixed line endings because LibreOffice (and probably Excel) treat all cell-internal line endings as \n but row line endings as whatever the file had, while text editors like jEdit translate all line endings to whatever the autodetected line ending is. (This creates spurious line ending diffs when a map spreadsheet containing multiline cells is edited in a text editor.)

8174 03/25/2013 08:12 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/import: Renamed COND.oldgrowth to VegCore name oldGrowth

8173 03/25/2013 07:52 PM Aaron Marcuse-Kubitza

inputs/FIA/*/map.csv: Ensured that joined columns are globally unique, so they don't map to an ambiguous VegCore term in the future

8172 03/25/2013 07:38 PM Aaron Marcuse-Kubitza

inputs/FIA/*/map.csv: Mapped terms to VegCore

8170 03/25/2013 06:57 PM Aaron Marcuse-Kubitza

inputs/FIA/*/import: Updated column names to match map.csv

8168 03/25/2013 06:34 PM Aaron Marcuse-Kubitza

inputs/FIA/import: Abort if any invoked script encounters an error

8166 03/25/2013 04:55 PM Aaron Marcuse-Kubitza

inputs/FIA/*/map.csv: Removed no longer needed leading . from joined fields (globally-unique terms), because functions.to_global_col_names() is not used anymore

8165 03/25/2013 04:46 PM Aaron Marcuse-Kubitza

Added inputs/FIA/occurrence_all/, which combines all the core tables in a denormalized view. Note that it is not necessary to materialize this view into a (large) denormalized table, because the unique indexes and left/right joins allow the rows to be denormalized on the fly.

8164 03/25/2013 04:36 PM Aaron Marcuse-Kubitza

inputs/FIA/*/import: Use map_table to set column names based on the contents of map.csv, instead of using functions.to_global_col_names() and functions.rename_if_exists(). Added map.csv for all tables.

8163 03/25/2013 03:19 PM Aaron Marcuse-Kubitza

inputs/FIA/: Changed postprocess.sql scripts to import scripts that can be run directly. Added top-level inputs/FIA/import to run all of them together.

8162 03/25/2013 03:05 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/postprocess.sql: Removed trailing whitespace

8136 03/21/2013 06:01 AM Aaron Marcuse-Kubitza

inputs/FIA/*/postprocess.sql: Avoid using :table, :table_str so that the commands in the script can also be run by pasting them into pgAdmin

8130 03/21/2013 01:27 AM Aaron Marcuse-Kubitza

inputs/FIA/*/postprocess.sql: Added index on *.CN (autogen IDs)

8128 03/21/2013 01:08 AM Aaron Marcuse-Kubitza

inputs/FIA/TREE/postprocess.sql: TREE.unique index: Renamed to TREE.ID because this is on an autogenerated pkey rather than on domain values (for which a set of unique columns has not yet been found and may not exist)

8127 03/21/2013 01:03 AM Aaron Marcuse-Kubitza

inputs/FIA/REF_SPECIES/postprocess.sql: Matched SPECIES_SYMBOL to .SYMBOL. Added .SYMBOL_TYPE for use in joining to REF_PLANT_DICTIONARY.

8126 03/21/2013 12:41 AM Aaron Marcuse-Kubitza

Added inputs/FIA/REF_UNIT/postprocess.sql

8125 03/21/2013 12:36 AM Aaron Marcuse-Kubitza

Added inputs/FIA/REF_RESEARCH_STATION/postprocess.sql

8124 03/21/2013 12:17 AM Aaron Marcuse-Kubitza

Added inputs/FIA/COUNTY/postprocess.sql

8123 03/21/2013 12:02 AM Aaron Marcuse-Kubitza

Added inputs/FIA/REF_PLANT_DICTIONARY/postprocess.sql

8122 03/20/2013 05:16 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/postprocess.sql: Matched COND.HABTYPCD1, COND.HABTYPCD1_PUB_CD to REF_HABTYP_DESCRIPTION

8119 03/20/2013 04:30 PM Aaron Marcuse-Kubitza

inputs/FIA/REF_HABTYP_DESCRIPTION/postprocess.sql: Prepare columns for joining with COND

8112 03/20/2013 09:34 AM Aaron Marcuse-Kubitza

inputs/FIA/REF_SPECIES/postprocess.sql: Cast ID column to integer