input.Makefile: Add the bin folder to the PATH so .make scripts can easily use programs in it
input.Makefile: Staging tables installation: Support installing a DB export directly into the staging schema, without needing to first export it as CSVs
inputs/SALVIAS/: Added _src/ subdir to store original DB export (before re-export in a PostgreSQL-compatible form)
input.Makefile: `%: %.make`: Only remake if the target doesn't exist. This prevents unintentional remaking when the make script is newly checked out from svn (which sets the mod time to now) but the output is synced externally.
input.Makefile: `%: %.make`: Removed no-longer-applicable comment, which applied when there were two separate `: %.make`-related rules
input.Makefile: Use $(inDatasrc) wherever its value was used
input.Makefile: Added $(inDatasrc)
sql_io.py: cleanup_table(): Only clean up text columns, to support staging tables with other column types
sql_gen.py: Added is_text_col()
sql_io.py: cleanup_table(): Add table to each column so its type can later be determined from the DB
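A minimal sketch of the text-column filter described in the three sql_io.py/sql_gen.py entries above. The column types are passed in as a plain dict rather than looked up in the DB, and `TEXT_TYPES`, `text_cols()`, the signature of `is_text_col()`, and the sample columns are all assumptions for illustration, not the actual code.

```python
# Hypothetical stand-in for the DB lookup: column name -> PostgreSQL type name
TEXT_TYPES = {'text', 'character varying', 'character'}

def is_text_col(col_types, col):
    '''Returns whether col has a text-like type (sketch, not the sql_gen.py code)'''
    return col_types[col] in TEXT_TYPES

def text_cols(col_types):
    '''Selects the columns that a text-only cleanup pass would touch'''
    return [col for col in col_types if is_text_col(col_types, col)]

# Made-up staging table with mixed column types
col_types = {'plot_id': 'integer', 'plot_name': 'text',
    'elev_m': 'double precision'}
print(text_cols(col_types))
```

Only plot_name is selected here, so the non-text columns would be left untouched by the cleanup.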
inputs/NY/verify/specimens.ref: Regenerated from specimens.ref.sql. The counts have changed slightly because this is derived directly from the NY CSV file, rather than from the nybg_raw BIEN2 staging table.
inputs/NY/verify/specimens.ref.sql: Retrofitted to use PostgreSQL instead of MySQL syntax, since this now runs on the PostgreSQL staging tables
input.Makefile: Verification of import: Added `%.ref: %.ref.sql` rule to make a datasource's summary statistics from its staging tables. (This was previously run on a MySQL installation of the datasource, and thus limited to MySQL inputs, but we are now able to use the staging tables for this.)
input.Makefile: Verification of import: $(verify): Factored psql command with output format settings into separate $(psqlExport) var
schemas/vegbien.sql: analytical_db_view: Switched join order of location and party (datasource) tables, to facilitate using a nested loop join to fill in the datasource names
schemas/vegbien.sql: party: Added party_datasource index on just the organizationname to facilitate querying just the datasources
schemas/vegbien.sql: make_analytical_db(): Removed explicit schema reference so that the function can be redirected to use the current (rotated) schema using the search_path
schemas/Makefile: Removed no longer needed analytical_db, which has been replaced by bin/make_analytical_db
README.TXT: After a new import: Use bin/make_analytical_db instead of `make schemas/analytical_db`, and run it asynchronously because it takes a long time
Added bin/make_analytical_db
schemas/Makefile: Analytical DB: analytical_db: Time the creation of the analytical DB
README.TXT: After a new import: Added command to make the analytical DB
schemas/Makefile: Added analytical_db target
schemas/vegbien.sql: Added make_analytical_db() and helper view analytical_db_view. Note that adding a view which depends on other tables will cause those tables to be reordered in dependency order to appear before the view, causing the svn diff to change completely even though the DB structure has only been added to.
schemas/vegbien.sql: Removed OIDs from tables because we don't use them (tables have primary keys instead)
inputs/import.stats.xls: Updated with stats from latest import. This now includes CTFS.TaxonOccurrence (presence-only observations), FIA (11 million rows!), and Madidi.Organism. The addition of FIA almost doubles the # of rows to 26 million and increases the import time from 9.5 to 11.5 hours.
sql_io.py: null_strs: Added 'UNKNOWN'
Added inputs/FIA/
inputs/: Renamed subfolders to VegCSV names, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-to-VegCSV-names>
inputs/Madidi/1.organisms/map.csv: Mapped columns
inputs/Madidi/0.plots/map.csv: Remapped DMS Latitude/Longitude to verbatimLatitude/verbatimLongitude, since degrees-minutes-seconds values are not decimalLatitude/decimalLongitude
input.Makefile: Testing: %-ok: Rename the test output to the accepted test output instead of copying it, because outputs of successful (including newly accepted) tests should be removed to reduce clutter (as $(runTest) does)
mappings/Veg+-VegCore.csv: Remapped CTFS QuadratID to subplot rather than subplotID, because it's only unique within the parent plot, not globally unique, in CTFS
inputs/import.stats.xls: Updated with stats from latest import. This now includes the core CTFS tables.
Added inputs/VegBank/ with DB export
input.Makefile: General targets: `%: %.make`: Don't always remake the target whenever it's visited, as other targets may depend on this file and it should not be remade whenever they are visited
input.Makefile: General targets: `%: %.make`: Changed log file suffix to .log, because this log does not necessarily contain SQL statements
input.Makefile: General targets: `%: %.make`: Time the creating command
input.Makefile: General targets: Removed duplicate `%: %.make` rule
inputs/CTFS/TaxonOccurrence/map.csv: Documented that InfraSpecificLevel is unused
mappings/Veg+-VegCore.csv: Mapped speciesInvID
mappings/Veg+.terms.csv: Added speciesInvID
mappings/VegCore-VegBIEN.csv: Mapped taxonOccurrenceID
mappings/Veg+.terms.csv: Added taxonOccurrenceID
inputs/CTFS/: Added TaxonOccurrence/ and its joined tables
inputs/CTFS/_archive/Organism.VegX/README.TXT: Added calculation of StemObservation rows distribution for each plot, which indicates that the bci plot actually contains 90% of the StemObservation rows. This brings the size inflation of VegX down to ~6x.
inputs/CTFS/_archive/Organism.VegX/: Added README.TXT describing that this VegX export includes only one of 157 CTFS plots. This is important, because it indicates that VegX creates a ~1000x (!) increase in storage size (613.6 MB for bci.sql with 157 plots vs. 3.78 GB for VegX_CTFS_row_*.xml with 1 plot, assuming roughly equal #s of stems per plot).
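The two size-inflation estimates in the Organism.VegX README entries above can be roughly reproduced as follows. This is only a sketch of the arithmetic: it assumes 1 GB = 1000 MB and that dump size scales with the number of StemObservation rows, and the variable names are illustrative.

```python
sql_mb = 613.6          # bci.sql, all 157 plots
vegx_mb = 3.78 * 1000   # VegX_CTFS_row_*.xml, 1 plot (assumes 1 GB = 1000 MB)
n_plots = 157

# Original estimate: roughly equal numbers of stems per plot
equal_share_mb = sql_mb / n_plots
inflation_equal = vegx_mb / equal_share_mb
print(round(inflation_equal))  # on the order of 1000x

# Revised estimate: the exported (bci) plot holds ~90% of StemObservation rows
bci_share_mb = sql_mb * 0.9
inflation_revised = vegx_mb / bci_share_mb
print(round(inflation_revised, 1))  # single-digit inflation
```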
inputs/CTFS/StemObservation/map.csv: Remapped StemID to authorStemCode since it's only unique within the parent organism (Tree), not a globally unique ID as is required for stemID
mappings/VegCore-VegBIEN.csv: Mapped authorStemCode
mappings/Veg+.terms.csv: Added authorStemCode
mappings/VegCore-VegBIEN.csv: Mapped stemID
inputs/SALVIAS/2.stems/map.csv: Mapped stem_id
README.TXT: Datasource setup: Added steps to install any MySQL export
mappings/Veg+-VegCore.csv: Mapped stem_id
repl: Support treating all patterns as plain text (non-regexp)
mappings/Veg+.terms.csv: Added stem_id
mappings/Veg+.terms.csv: Added stemID
mappings/Veg+-VegCore.csv: Mapped speciesName, subSpeciesName
mappings/Veg+.terms.csv: Added CTFS taxonomic name columns
mappings/Veg+.terms.csv: Removed comments not applicable to the term itself
Inputs with multiple tables: Added explicit import_order.txt files, so that sort orders can later be removed from the subdir names
inputs/CTFS/: Added StemObservation/ and tables it is joined from
mappings/Veg+-VegCore.csv: Mapped stemTag
mappings/Veg+.terms.csv: Added stemTag
mappings/Veg+-VegCore.csv: Mapped DBH
mappings/Veg+.terms.csv: Added DBH
input.Makefile: Maps building: Added comment that you cannot make a subdir separately from the entire datasource dir
inputs/CTFS/Plot/create.sql: Added newline at end of file
inputs/CTFS/: Renamed Site.src to Plot.src to use a VegCSV name for the table
README.TXT: Datasource setup: Adding input data for each table: `make inputs/<datasrc>/<table>/add`: Added note explaining why you need to use this command instead of just creating an empty directory of the desired name
inputs/CTFS/: Added SubplotObservation/
mappings/VegCore-VegBIEN.csv: Redirect eventID, fieldNumber (authoreventcode) to parent locationevent when subplot columns exist
inputs/CTFS/import_order.txt: Added PlotObservation
inputs/CTFS/PlotObservation/: Remade (hadn't been automatically remade because it wasn't part of import_order.txt)
mappings/VegCore-VegBIEN.csv: Also redirect locationID/plotName to parent location if subplotID column was provided
mappings/VegCore-VegBIEN.csv: location.authorlocationcode mappings: Use _first to remove specimens-related alternatives for this field from consideration when plots-related alternatives exist. This avoids unintentionally using specimens-related columns for this field in plots data.
xml_func.py: Added _first() simplifying function
xml_func.py: Added helper functions variadic_args() and map_names()
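A sketch of what a _first()-style simplifying function might look like: it keeps the first alternative that is present and discards the rest. It operates on plain Python values rather than XML nodes, and the name `first()` and its signature are assumptions, not the xml_func.py code.

```python
def first(*values):
    '''Returns the first value that is not None or empty, else None'''
    for value in values:
        if value is not None and value != '':
            return value
    return None

# Plots-related alternatives are listed before the specimens-related one,
# so the specimens-related value is used only when no plots value exists
print(first(None, 'plot 12', 'specimen A-7'))
print(first(None, '', 'specimen A-7'))
```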
mappings/VegCore-VegBIEN.csv: location.authorlocationcode mappings: Placed inside "if subplot" _if statement along with sourceaccessioncode to reduce the number of separate _if statements needing a condition mapping
xml_dom.py: NodeEntryIter: Support entries with multiple children
xml_dom.py: replace(): Support a list of new nodes to replace the old node with
xml_dom.py: Moved only_child() near related method has_one_child()
xml_dom.py: only_child(): Raise exception instead of failing assertion. Include invalid node in exception message for easier debugging.
xml_dom.py: Added only_child() and use it where its definition was used
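A sketch of the only_child() behavior described in the xml_dom.py entries above, written against the standard library's xml.dom.minidom. The exception class `InvalidNodeError` and the function bodies are assumptions for illustration.

```python
from xml.dom import minidom

class InvalidNodeError(Exception):
    '''Hypothetical exception type; the real module may use a different one'''

def has_one_child(node):
    return len(node.childNodes) == 1

def only_child(node):
    '''Returns the single child of node, or raises with the node in the message'''
    if not has_one_child(node):
        raise InvalidNodeError('Node must have exactly one child: '
            + node.toxml())
    return node.firstChild

good = minidom.parseString('<a><b/></a>').documentElement
bad = minidom.parseString('<a><b/><c/></a>').documentElement
print(only_child(good).nodeName)
try:
    only_child(bad)
except InvalidNodeError as e:
    print(e)  # the message includes the offending node's XML
```

Unlike a failed assertion, the exception carries the invalid node in its message, which makes the bad input easier to locate.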
mappings/VegCore-VegBIEN.csv: Changed _merge to _join wherever the duplicate-eliminating functionality of _merge is not needed and a simple concatenation of non-NULL values is sufficient
xml_func.py: Added _join() simplifying function
schemas/functions.sql: Added _join()
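The difference between _merge and _join noted in the entries above can be sketched as follows. These are plain Python stand-ins, not the xml_func.py or schemas/functions.sql implementations, and the '; ' separator is an assumption.

```python
def join(values, sep='; '):
    '''Concatenates the non-None values, keeping duplicates'''
    kept = [v for v in values if v is not None]
    return sep.join(kept) if kept else None

def merge(values, sep='; '):
    '''Like join(), but drops repeated values (first occurrence wins)'''
    seen = []
    for v in values:
        if v is not None and v not in seen:
            seen.append(v)
    return sep.join(seen) if seen else None

values = ['a', None, 'a', 'b']
print(join(values))
print(merge(values))
```

When the inputs cannot contain duplicates, the two give the same result, so the simpler concatenation is sufficient.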
mappings/VegCore-VegBIEN.csv: Moved "if subplot" _if statement around /location/parent_id and /location/sourceaccessioncode themselves, so that only one _if cond mapping for subplot is needed. Note that this is only possible because this _if statement uses _exists, allowing it to be fully evaluated by the XML template simplifying mechanism, which supports subtrees as arguments to _if.
mappings/VegCore-VegBIEN.csv: Removed no longer used parentLocationID, parentPlotName (locationID and plotName now automatically map to the correct location). mappings/Veg+-VegCore.csv: Removed no longer used parentPlotID.
xml_func.py: passthru(): Use xml_dom.prune() so that after empty children are removed, the node itself is also removed if it's empty. This enables further pruning of any node that contains the pruned node.
xml_dom.py: Added prune()
xml_func.py: Removed no longer used prune() (use xml_dom.prune_children() instead)
xml_func.py: Use new xml_dom.prune_children()
xml_dom.py: Added prune_empty() and prune_children()
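A sketch of the pruning behavior described in the entries above, using the standard library's xml.dom.minidom. The function names follow the log, but the bodies, the `is_empty()` helper, and its definition of "empty" (an element with no child nodes) are assumptions.

```python
from xml.dom import minidom

def is_empty(node):
    '''Whether node is an element with no child nodes at all'''
    return node.nodeType == node.ELEMENT_NODE and not node.hasChildNodes()

def prune_empty(node):
    '''Removes node from its parent if it is empty'''
    if is_empty(node) and node.parentNode is not None:
        node.parentNode.removeChild(node)

def prune_children(node):
    '''Removes the empty children of node'''
    for child in list(node.childNodes):  # copy, since children are removed
        prune_empty(child)

def prune(node):
    '''Removes empty children, then node itself if that left it empty'''
    prune_children(node)
    prune_empty(node)

doc = minidom.parseString(
    '<location><parent><code/></parent><name>x</name></location>')
parent = doc.documentElement.firstChild
prune(parent)
print(doc.documentElement.toxml())
```

Because prune() also removes the node once its children are gone, a caller can apply it again to the containing node to continue pruning upward.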
inputs/CTFS/: Moved VegX export subdir to _archive and renamed it to remove ".disabled" suffix and have a VegCSV-like name
inputs/CTFS/: Renamed README.TXT to DFtemp.analysis_query.txt because it relates only to a particular query from Shash, and moved it to the _archive/ subdir
inputs/CTFS/: Moved source files into new _src/ subdir to avoid cluttering up the main dir