# Date Author Comment
4392 08/31/2012 08:15 PM Aaron Marcuse-Kubitza

schemas/Makefile: Added analytical_db target

4391 08/31/2012 08:09 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added make_analytical_db() and helper view analytical_db_view. Note that adding a view which depends on other tables causes those tables to be reordered in dependency order so that they appear before the view. This makes the svn diff change completely even though the DB structure has only been added to.

4390 08/31/2012 08:05 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed OIDs from tables because we don't use them (tables have primary keys instead)

4389 08/31/2012 02:23 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import. This now includes CTFS.TaxonOccurrence (presence-only observations), FIA (11 million rows!), and Madidi.Organism. The addition of FIA almost doubles the # of rows to 26 million and increases the import time from 9.5 to 11.5 hours.

4388 08/30/2012 04:54 PM Aaron Marcuse-Kubitza

sql_io.py: null_strs: Added 'UNKNOWN'
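
A minimal sketch of how a list like null_strs might be applied when cleaning input values. Only the 'UNKNOWN' entry comes from the commit message above; the other entries, the whitespace stripping, and the helper name null_if_placeholder are assumptions for illustration, not the contents of sql_io.py.

```python
# Hypothetical list: only 'UNKNOWN' is confirmed by the commit message.
NULL_STRS = ['', 'NULL', 'UNKNOWN']

def null_if_placeholder(value, null_strs=NULL_STRS):
    """Return None when value is a placeholder string, else the value unchanged."""
    if value is None:
        return None
    # Assumption: surrounding whitespace is ignored when matching placeholders.
    if value.strip() in null_strs:
        return None
    return value
```

With this sketch, null_if_placeholder('UNKNOWN') yields None while a real value such as 'Quercus' passes through unchanged.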

4387 08/30/2012 04:02 PM Aaron Marcuse-Kubitza

Added inputs/FIA/

4386 08/30/2012 12:45 PM Aaron Marcuse-Kubitza

inputs/: Renamed subfolders to VegCSV names, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-to-VegCSV-names>

4385 08/30/2012 12:37 PM Aaron Marcuse-Kubitza

inputs/Madidi/1.organisms/map.csv: Mapped columns

4384 08/30/2012 11:46 AM Aaron Marcuse-Kubitza

inputs/Madidi/0.plots/map.csv: Remapped DMS Latitude/Longitude to verbatimLatitude/verbatimLongitude, since these are degrees-minutes-seconds values rather than the decimal degrees expected by decimalLatitude/decimalLongitude

4383 08/30/2012 11:40 AM Aaron Marcuse-Kubitza

input.Makefile: Testing: %-ok: Rename the test output to the accepted test output instead of copying it, because outputs of successful (including newly accepted) tests should be removed to reduce clutter (as $(runTest) does)

4382 08/30/2012 11:35 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Remapped CTFS QuadratID to subplot rather than subplotID, because it's only unique within the parent plot, not globally unique, in CTFS

4381 08/30/2012 11:23 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import. This now includes the core CTFS tables.

4380 08/30/2012 11:10 AM Aaron Marcuse-Kubitza

Added inputs/VegBank/ with DB export

4379 08/30/2012 11:04 AM Aaron Marcuse-Kubitza

input.Makefile: General targets: `%: %.make`: Don't always remake the target whenever it's visited, as other targets may depend on this file and it should not be remade whenever they are visited

4378 08/30/2012 11:00 AM Aaron Marcuse-Kubitza

input.Makefile: General targets: `%: %.make`: Changed log file suffix to .log, because this log does not necessarily contain SQL statements

4377 08/30/2012 10:57 AM Aaron Marcuse-Kubitza

input.Makefile: General targets: `%: %.make`: Time the creating command

4376 08/30/2012 10:55 AM Aaron Marcuse-Kubitza

input.Makefile: General targets: Removed duplicate `%: %.make` rule

4375 08/30/2012 10:43 AM Aaron Marcuse-Kubitza

inputs/CTFS/TaxonOccurrence/map.csv: Documented that InfraSpecificLevel is unused

4374 08/30/2012 10:42 AM Aaron Marcuse-Kubitza

inputs/CTFS/TaxonOccurrence/map.csv: Documented that InfraSpecificLevel is unused

4373 08/30/2012 10:32 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Mapped speciesInvID

4372 08/30/2012 10:27 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added speciesInvID

4371 08/30/2012 10:25 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped taxonOccurrenceID

4370 08/30/2012 10:22 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added taxonOccurrenceID

4369 08/30/2012 10:14 AM Aaron Marcuse-Kubitza

inputs/CTFS/: Added TaxonOccurrence/ and its joined tables

4368 08/30/2012 10:13 AM Aaron Marcuse-Kubitza

inputs/CTFS/: Added TaxonOccurrence/ and its joined tables

4367 08/30/2012 10:06 AM Aaron Marcuse-Kubitza

inputs/CTFS/_archive/Organism.VegX/README.TXT: Added calculation of StemObservation rows distribution for each plot, which indicates that the bci plot actually contains 90% of the StemObservation rows. This brings the size inflation of VegX down to ~6x.

4366 08/30/2012 09:42 AM Aaron Marcuse-Kubitza

inputs/CTFS/_archive/Organism.VegX/: Added README.TXT describing that this VegX export includes only one of 157 CTFS plots. This is important, because it indicates that VegX creates a ~1000x (!) increase in storage size (613.6 MB for bci.sql with 157 plots vs. 3.78 GB for VegX_CTFS_row_*.xml with 1 plot, assuming roughly equal #s of stems per plot).
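
The ~1000x estimate above, and the later correction in r4367, can be re-derived from the file sizes quoted in these two commit messages. This is only a sketch of the arithmetic: it assumes 1 GB = 1000 MB and treats the 90% share of StemObservation rows as exact, neither of which the commit messages state.

```python
bci_sql_mb = 613.6       # bci.sql, covering all 157 plots (from the commit message)
vegx_mb = 3.78 * 1000    # VegX export of 1 plot; assumes 1 GB = 1000 MB
n_plots = 157

# r4366 assumption: stems are spread evenly across plots,
# so one plot accounts for 1/157 of bci.sql.
even_share_mb = bci_sql_mb / n_plots
inflation_even = vegx_mb / even_share_mb

# r4367 correction: the bci plot holds about 90% of the StemObservation rows,
# so it accounts for roughly 90% of bci.sql.
bci_share_mb = bci_sql_mb * 0.90
inflation_corrected = vegx_mb / bci_share_mb

print(round(inflation_even), round(inflation_corrected, 1))
```

Under these assumptions the first ratio comes out on the order of 1000x and the second at roughly 6 to 7x, in line with the figures reported in the two revisions.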

4365 08/30/2012 09:08 AM Aaron Marcuse-Kubitza

inputs/CTFS/StemObservation/map.csv: Remapped StemID to authorStemCode since it's only unique within the parent organism (Tree), not a globally unique ID as is required for stemID

4364 08/30/2012 09:05 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped authorStemCode

4363 08/30/2012 08:58 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added authorStemCode

4362 08/30/2012 08:58 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped stemID

4361 08/30/2012 08:52 AM Aaron Marcuse-Kubitza

inputs/SALVIAS/2.stems/map.csv: Mapped stem_id

4360 08/30/2012 08:46 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Added steps to install any MySQL export

4359 08/30/2012 08:13 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped stemID

4358 08/30/2012 08:10 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Mapped stem_id

4357 08/30/2012 08:05 AM Aaron Marcuse-Kubitza

repl: Support treating all patterns as plain text (non-regexp)
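
One common way to treat patterns as plain text is to escape them before compiling. The sketch below illustrates that technique only; the function name replace_all and the plain_text flag are hypothetical, and the actual implementation and options of repl are not shown in this log.

```python
import re

def replace_all(text, pattern, replacement, plain_text=False):
    """Replace every match of pattern in text.

    With plain_text=True, the pattern and the replacement are both taken
    literally instead of being interpreted as regexp syntax.
    """
    if plain_text:
        regex = re.compile(re.escape(pattern))
        # A function replacement keeps backslashes and group references literal.
        return regex.sub(lambda match: replacement, text)
    return re.sub(pattern, replacement, text)
```

For example, the pattern 'a.b' matches both 'a.b' and 'axb' in regexp mode, but only the literal 'a.b' in plain-text mode.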

4356 08/30/2012 07:52 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added stem_id

4355 08/30/2012 07:51 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added stemID

4354 08/30/2012 07:44 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Mapped speciesName, subSpeciesName

4353 08/30/2012 07:43 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added CTFS taxonomic name columns

4352 08/30/2012 07:28 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Removed comments not applicable to the term itself

4351 08/30/2012 07:25 AM Aaron Marcuse-Kubitza

Inputs with multiple tables: Added explicit import_order.txt files, so that sort orders can later be removed from the subdir names

4350 08/29/2012 11:17 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Added StemObservation/ and tables it is joined from

4349 08/29/2012 11:09 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Mapped stemTag

4348 08/29/2012 11:08 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added stemTag

4347 08/29/2012 11:04 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Mapped DBH

4346 08/29/2012 11:02 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added DBH

4345 08/29/2012 10:58 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Added comment that you cannot make a subdir separately from the entire datasource dir

4344 08/29/2012 10:17 PM Aaron Marcuse-Kubitza

inputs/CTFS/Plot/create.sql: Added newline at end of file

4343 08/29/2012 10:04 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Renamed Site.src to Plot.src to use a VegCSV name for the table

4342 08/29/2012 10:01 PM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Adding input data for each table: `make inputs/<datasrc>/<table>/add`: Added note explaining why you need to use this command instead of just creating an empty directory of the desired name

4341 08/29/2012 08:44 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Added SubplotObservation/

4340 08/29/2012 08:38 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Redirect eventID, fieldNumber (authoreventcode) to parent locationevent when subplot columns exist

4339 08/29/2012 08:23 PM Aaron Marcuse-Kubitza

inputs/CTFS/import_order.txt: Added PlotObservation

4338 08/29/2012 08:23 PM Aaron Marcuse-Kubitza

inputs/CTFS/PlotObservation/: Remade (hadn't been automatically remade because it wasn't part of import_order.txt)

4337 08/29/2012 08:13 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Also redirect locationID/plotName to parent location if subplotID column was provided

4336 08/29/2012 08:08 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: location.authorlocationcode mappings: Use _first to remove specimens-related alternatives for this field from consideration when plots-related alternatives exist. This avoids unintentionally using specimens-related columns for this field in plots data.
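
The effect described above can be pictured as "take the first alternative that has a value and ignore the rest". The sketch below assumes that reading of _first; the function name first_non_empty and the plain-list representation are illustrative and do not reflect how xml_func.py operates on XML nodes.

```python
def first_non_empty(alternatives):
    """Return the first alternative that is neither None nor empty, else None."""
    for value in alternatives:
        if value is not None and value != '':
            return value
    return None

# Plots-related columns are listed first, so a specimens-related value
# is only used when no plots-related value exists.
code = first_non_empty([None, 'plot-12', 'specimen-7'])
```

Here code ends up as 'plot-12', so the specimens-related alternative is never considered for plots data.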

4335 08/29/2012 08:06 PM Aaron Marcuse-Kubitza

xml_func.py: Added _first() simplifying function

4334 08/29/2012 08:05 PM Aaron Marcuse-Kubitza

xml_func.py: Added helper functions variadic_args() and map_names()

4333 08/29/2012 07:38 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: location.authorlocationcode mappings: Placed inside "if subplot" _if statement along with sourceaccessioncode to reduce the number of separate _if statements needing a condition mapping

4332 08/29/2012 07:32 PM Aaron Marcuse-Kubitza

xml_dom.py: NodeEntryIter: Support entries with multiple children

4331 08/29/2012 07:20 PM Aaron Marcuse-Kubitza

xml_dom.py: replace(): Support a list of new nodes to replace the old node with

4330 08/29/2012 07:01 PM Aaron Marcuse-Kubitza

xml_dom.py: Moved only_child() near related method has_one_child()

4329 08/29/2012 07:00 PM Aaron Marcuse-Kubitza

xml_dom.py: only_child(): Raise exception instead of failing assertion. Include invalid node in exception message for easier debugging.
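
A minimal sketch of this pattern using the standard library's xml.dom.minidom. The exception class name InvalidNodeError and the message wording are assumptions; the actual xml_dom.py code is not shown in this log.

```python
from xml.dom import minidom

class InvalidNodeError(Exception):
    """Hypothetical exception type for nodes that violate an expectation."""

def only_child(node):
    """Return the single child of node, or raise with the offending node's XML."""
    if len(node.childNodes) != 1:
        # Including the serialized node makes the failure easier to debug
        # than a bare assertion error would.
        raise InvalidNodeError('Node must have exactly one child: ' + node.toxml())
    return node.firstChild

doc = minidom.parseString('<a><b/></a>')
child = only_child(doc.documentElement)
```

Unlike an assert statement, the exception still fires when Python runs with optimizations enabled (-O), and its message identifies the node that caused the failure.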

4328 08/29/2012 06:57 PM Aaron Marcuse-Kubitza

xml_dom.py: Added only_child() and use it where its definition was used

4327 08/29/2012 06:33 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Changed _merge to _join wherever the duplicate-eliminating functionality of _merge is not needed and a simple concatenation of non-NULL values is sufficient
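
The difference between the two functions, as described above, can be sketched on plain lists. The function names join_values and merge_values, the '; ' separator, and the use of None for NULL are assumptions for illustration; the real _join and _merge live in xml_func.py and schemas/functions.sql.

```python
def join_values(values, sep='; '):
    """Concatenate non-None values, keeping duplicates."""
    return sep.join(v for v in values if v is not None)

def merge_values(values, sep='; '):
    """Concatenate non-None values, keeping only the first occurrence of each."""
    seen = []
    for v in values:
        if v is not None and v not in seen:
            seen.append(v)
    return sep.join(seen)

joined = join_values(['a', None, 'a', 'b'])
merged = merge_values(['a', None, 'a', 'b'])
```

When the inputs cannot contain duplicates, both functions give the same result, which is why the simpler join is sufficient in those mappings.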

4326 08/29/2012 06:24 PM Aaron Marcuse-Kubitza

xml_func.py: Added _join() simplifying function

4325 08/29/2012 06:22 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added _join()

4324 08/29/2012 06:18 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Moved "if subplot" _if statement around /location/parent_id and /location/sourceaccessioncode themselves, so that only one _if cond mapping for subplot is needed. Note that this is only possible because this _if statement uses _exists, allowing it to be fully evaluated by the XML template simplifying mechanism, which supports subtrees as arguments to _if.

4323 08/29/2012 06:06 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Removed no longer used parentLocationID, parentPlotName (locationID and plotName now automatically map to the correct location). mappings/Veg+-VegCore.csv: Removed no longer used parentPlotID.

4322 08/29/2012 05:57 PM Aaron Marcuse-Kubitza

xml_func.py: passthru(): Use xml_dom.prune() so that after empty children are removed, the node itself is also removed if it's empty. This enables further pruning of any node that contains the pruned node.
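
The cascading behavior described above (an emptied node is itself removed, which may in turn empty its parent) can be sketched with xml.dom.minidom. This is an illustrative recursive version, not the xml_dom.py implementation; it assumes that "empty" means no child nodes, or whitespace-only content for text nodes, and it ignores attributes.

```python
from xml.dom import minidom

def is_empty(node):
    """Assumed definition: whitespace-only text, or a node without children."""
    if node.nodeType == node.TEXT_NODE:
        return node.data.strip() == ''
    return not node.hasChildNodes()

def prune(node):
    """Remove empty descendants bottom-up, then node itself if it ends up empty."""
    for child in list(node.childNodes):  # copy, since children may be removed
        prune(child)
    parent = node.parentNode
    # The document's root element is never removed.
    if (is_empty(node) and parent is not None
            and parent.nodeType != parent.DOCUMENT_NODE):
        parent.removeChild(node)

doc = minidom.parseString('<root><a><b/></a><c>text</c></root>')
prune(doc.documentElement)
result = doc.documentElement.toxml()
```

In this example b is removed first, which leaves a empty, so a is removed as well, while c survives because it contains text.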

4321 08/29/2012 05:55 PM Aaron Marcuse-Kubitza

xml_dom.py: Added prune()

4320 08/29/2012 05:52 PM Aaron Marcuse-Kubitza

xml_func.py: Removed no longer used prune() (use xml_dom.prune_children() instead)

4319 08/29/2012 05:51 PM Aaron Marcuse-Kubitza

xml_func.py: Use new xml_dom.prune_children()

4318 08/29/2012 05:51 PM Aaron Marcuse-Kubitza

xml_dom.py: Added prune_empty() and prune_children()

4317 08/29/2012 05:29 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Moved VegX export subdir to _archive and renamed it to remove ".disabled" suffix and have a VegCSV-like name

4316 08/29/2012 05:24 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Renamed README.TXT to DFtemp.analysis_query.txt because it relates only to a particular query from Shash, and moved it to the _archive/ subdir

4315 08/29/2012 05:21 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Moved source files into new _src/ subdir to avoid cluttering up the main dir

4314 08/29/2012 05:16 PM Aaron Marcuse-Kubitza

Added inputs/CTFS/_src/

4313 08/29/2012 05:02 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Added non-data files that weren't under version control

4312 08/29/2012 04:59 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Moved _scripts_to_drop_extra_tables to _archive because they are for a different version of the CTFS database than the extract we received (bci.sql)

4311 08/29/2012 04:57 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Moved DBv5.txt to _archive because it's for a different version of the CTFS database than the extract we received (bci.sql)

4310 08/29/2012 04:49 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Moved CTFS_conversion_bci.php to _archive since it's just for the DFtemp (aggregated) mapping

4309 08/29/2012 04:48 PM Aaron Marcuse-Kubitza

Added inputs/CTFS/_archive

4308 08/29/2012 04:39 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

4307 08/28/2012 07:56 PM Aaron Marcuse-Kubitza

Added inputs/CTFS/PlotObservation/

4306 08/28/2012 07:54 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: fieldNumber (authoreventcode): Don't copy to location.authorlocationcode if an actual locationID was specified

4305 08/28/2012 07:51 PM Aaron Marcuse-Kubitza

xml_func.py: simplify(): Removed no longer needed pass-through optimizations for XML functions, which are now handled by each function's own simplifying function

4304 08/28/2012 07:50 PM Aaron Marcuse-Kubitza

xml_func.py: Added _name simplifying function

4303 08/28/2012 07:48 PM Aaron Marcuse-Kubitza

xml_func.py: Added _alt, _merge simplifying functions

4302 08/28/2012 07:45 PM Aaron Marcuse-Kubitza

xml_func.py: passthru(): First prune the node

4301 08/28/2012 07:43 PM Aaron Marcuse-Kubitza

xml_func.py: simplify(): Use new passthru()

4300 08/28/2012 07:43 PM Aaron Marcuse-Kubitza

xml_func.py: Added passthru()

4299 08/28/2012 07:36 PM Aaron Marcuse-Kubitza

xml_func.py: simplify(): Use new prune()

4298 08/28/2012 07:36 PM Aaron Marcuse-Kubitza

xml_func.py: Added prune()

4297 08/28/2012 07:26 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped eventID

4296 08/28/2012 07:24 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Mapped CTFS Census terms

4295 08/28/2012 07:20 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added CTFS Census terms

4294 08/28/2012 07:17 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Changed plotEventStartDate, plotEventEndDate to startDate, endDate because a date range always applies to the event

4293 08/28/2012 07:13 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added startDate, endDate