# Date Author Comment
4429 09/05/2012 01:39 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Also translate MySQL data to PostgreSQL

4428 09/05/2012 01:38 AM Aaron Marcuse-Kubitza

Added my2pg.data

4427 09/05/2012 01:28 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Place MySQL exports in separate _MySQL/ subdir so they don't clutter up the main dir, which will contain PostgreSQL translations

4426 09/05/2012 01:03 AM Aaron Marcuse-Kubitza

Added my2pg

4425 09/05/2012 01:02 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: DB exports: Concatenate all exports together, with schemas first, so that any config options which were applied only in the schema export will remain active when the data is imported. Changed `%.pg.sql: %.my.sql` to `%.schema.sql: %.schema.my.sql` so there doesn't need to be a .pg suffix for PostgreSQL schemas and only the schema gets translated.
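
The schemas-first ordering described in this entry can be sketched in Python as follows. This is an illustration only, not the Makefile's actual logic: the names `order_exports` and `concatenate` are hypothetical, and any file suffix other than `.schema.sql` is an assumption.

```python
def order_exports(paths):
    """Return export paths with schema files first, then everything else.

    Hypothetical helper: sorts *.schema.sql ahead of the other exports so
    that settings applied by a schema file are already in effect when the
    data files are loaded.
    """
    # False sorts before True, so schema files come first; ties break by name.
    return sorted(paths, key=lambda p: (not p.endswith('.schema.sql'), p))


def concatenate(paths):
    """Join the contents of the ordered exports into one SQL script."""
    parts = []
    for path in order_exports(paths):
        with open(path) as f:
            parts.append(f.read())
    return ''.join(parts)
```

For example, `order_exports(['b.data.sql', 'a.schema.sql', 'a.data.sql'])` puts `a.schema.sql` at the front, so a single import of the concatenated script sees the schema's settings before any data.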

4424 09/05/2012 12:15 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: $(dbExports): Don't consider MySQL DB exports as part of the DB exports that get installed, because they are not directly installable

4423 09/05/2012 12:13 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Added `%.pg.sql: %.my.sql` to translate MySQL DB schemas to PostgreSQL

4422 09/04/2012 09:20 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/_src/: Added salvias_plots.sql.url to provide a link to where salvias_plots.sql was exported from (it was not a raw file given to us by the data provider)

4421 09/04/2012 08:57 PM Aaron Marcuse-Kubitza

Added cc_tty

4420 09/04/2012 08:57 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: `%: %.make`: Don't automatically redirect stderr to a log file, because some .make scripts need to display password prompts, etc. on the TTY and output them to stderr instead of /dev/tty

4419 09/04/2012 08:49 PM Aaron Marcuse-Kubitza

inputs/REMIB/nodes.make: Fixed bin dir path for new subdir layout

4418 09/04/2012 08:48 PM Aaron Marcuse-Kubitza

inputs/SpeciesLink/tapir.make: Write log messages to a log file ($0.log) instead of to stderr, because the verbose log messages should not fill up stderr. To view the progress, you should instead tail the created log file.

4417 09/04/2012 08:41 PM Aaron Marcuse-Kubitza

inputs/REMIB/nodes.make: Updated path to node exports to use new subdir layout (in Specimen subdir, and without .specimens suffix)

4416 09/04/2012 08:38 PM Aaron Marcuse-Kubitza

inputs/REMIB/nodes.make: Fixed lib dir path in sys.path.append() for new subdir layout

4415 09/04/2012 08:37 PM Aaron Marcuse-Kubitza

inputs/REMIB/nodes.make: Write log messages to a log file ($0.log) instead of to sys.stderr, because the verbose log messages should not fill up stderr. To view the progress, you should instead tail the created log file.
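
A minimal sketch of the log-to-file approach, assuming Python's standard `logging` module rather than whatever nodes.make actually uses. The function name `make_file_logger` is hypothetical; the `<script>.log` naming mirrors the `$0.log` convention mentioned above.

```python
import logging


def make_file_logger(script_path):
    """Return (logger, log_path) writing to '<script_path>.log'.

    Hypothetical helper: verbose progress goes to the file, leaving
    stderr free for real errors and interactive prompts.
    """
    log_path = script_path + '.log'
    logger = logging.getLogger(log_path)
    logger.setLevel(logging.INFO)
    # Do not pass records up to the root logger, which may write to stderr.
    logger.propagate = False
    if not logger.handlers:
        logger.addHandler(logging.FileHandler(log_path))
    return logger, log_path
```

Progress can then be followed from another terminal with `tail -f` on the created log file.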

4414 09/04/2012 08:23 PM Aaron Marcuse-Kubitza

input.Makefile: Add the bin folder to the PATH so .make scripts can easily use programs in it

4413 09/04/2012 08:06 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Support installing a DB export directly into the staging schema, without needing to first export it as CSVs

4412 09/04/2012 07:52 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/: Added _src/ subdir to store original DB export (before re-export in a PostgreSQL-compatible form)

4411 09/04/2012 07:31 PM Aaron Marcuse-Kubitza

input.Makefile: `%: %.make`: Only remake the target if it doesn't exist. This prevents unintentional remaking when the make script is newly checked out from svn (which sets the mod time to now) but the output is synced externally.
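
The difference between the usual timestamp comparison and the existence-only check can be sketched as below. Both function names are hypothetical and this is not how make itself is implemented; it only illustrates why a fresh svn checkout of the script no longer triggers a rebuild.

```python
import os


def needs_remake_by_mtime(target, script):
    """Timestamp rule: remake if the target is missing or older than the script."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(target) < os.path.getmtime(script)


def needs_remake_if_missing(target, script):
    """Existence-only rule: the script's mod time is ignored entirely."""
    return not os.path.exists(target)
```

With an externally synced output and a script whose mod time was just reset by a checkout, the first function reports a remake and the second does not.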

4410 09/04/2012 07:23 PM Aaron Marcuse-Kubitza

input.Makefile: `%: %.make`: Removed no longer applicable comment, which applied when there were two separate `%: %.make`-related rules

4409 09/04/2012 06:55 PM Aaron Marcuse-Kubitza

input.Makefile: Use $(inDatasrc) wherever its value was used

4408 09/04/2012 06:54 PM Aaron Marcuse-Kubitza

input.Makefile: Added $(inDatasrc)

4407 09/04/2012 06:40 PM Aaron Marcuse-Kubitza

sql_io.py: cleanup_table(): Only clean up text columns, to support staging tables with other column types

4406 09/04/2012 06:40 PM Aaron Marcuse-Kubitza

sql_gen.py: Added is_text_col()

4405 09/04/2012 06:29 PM Aaron Marcuse-Kubitza

sql_io.py: cleanup_table(): Add table to each column so its type can later be determined from the DB

4404 09/04/2012 06:13 PM Aaron Marcuse-Kubitza

inputs/NY/verify/specimens.ref: Regenerated from specimens.ref.sql. The counts have changed slightly because this is derived directly from the NY CSV file, rather than from the nybg_raw BIEN2 staging table.

4403 09/04/2012 06:11 PM Aaron Marcuse-Kubitza

inputs/NY/verify/specimens.ref.sql: Retrofitted to use PostgreSQL instead of MySQL syntax, since this now runs on the PostgreSQL staging tables

4402 09/04/2012 06:09 PM Aaron Marcuse-Kubitza

input.Makefile: Verification of import: Added `%.ref: %.ref.sql` rule to make datasource's summary statistics from its staging tables. (This was previously run on a MySQL installation of the datasource, and thus limited to MySQL inputs, but we are now able to use the staging tables for this.)

4401 09/04/2012 06:04 PM Aaron Marcuse-Kubitza

input.Makefile: Verification of import: $(verify): Factored psql command with output format settings into separate $(psqlExport) var

4400 09/04/2012 05:57 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_db_view: Switched join order of location and party (datasource) tables, to facilitate using a nested loop join to fill in the datasource names

4399 09/04/2012 05:55 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: party: Added party_datasource index on just the organizationname to facilitate querying just the datasources

4398 09/04/2012 04:25 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: make_analytical_db(): Removed explicit schema reference so that the function can be redirected to use the current (rotated) schema using the search_path

4397 08/31/2012 08:32 PM Aaron Marcuse-Kubitza

schemas/Makefile: Removed no longer needed analytical_db, which has been replaced by bin/make_analytical_db

4396 08/31/2012 08:31 PM Aaron Marcuse-Kubitza

README.TXT: After a new import: Use bin/make_analytical_db instead of `make schemas/analytical_db`, and run it asynchronously because it takes a long time

4395 08/31/2012 08:29 PM Aaron Marcuse-Kubitza

Added make_analytical_db

4394 08/31/2012 08:22 PM Aaron Marcuse-Kubitza

schemas/Makefile: Analytical DB: analytical_db: Time the creation of the analytical DB

4393 08/31/2012 08:18 PM Aaron Marcuse-Kubitza

README.TXT: After a new import: Added command to make the analytical DB

4392 08/31/2012 08:15 PM Aaron Marcuse-Kubitza

schemas/Makefile: Added analytical_db target

4391 08/31/2012 08:09 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added make_analytical_db() and helper view analytical_db_view. Note that adding a view which depends on other tables will cause those tables to be reordered in dependency order to appear before the view, causing the svn diff to change completely even though the DB structure has only been added to.

4390 08/31/2012 08:05 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed OIDs from tables because we don't use them (tables have primary keys instead)

4389 08/31/2012 02:23 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import. This now includes CTFS.TaxonOccurrence (presence-only observations), FIA (11 million rows!), and Madidi.Organism. The addition of FIA almost doubles the # of rows to 26 million and increases the import time from 9.5 to 11.5 hours.

4388 08/30/2012 04:54 PM Aaron Marcuse-Kubitza

sql_io.py: null_strs: Added 'UNKNOWN'

4387 08/30/2012 04:02 PM Aaron Marcuse-Kubitza

Added inputs/FIA/

4386 08/30/2012 12:45 PM Aaron Marcuse-Kubitza

inputs/: Renamed subfolders to VegCSV names, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-to-VegCSV-names>

4385 08/30/2012 12:37 PM Aaron Marcuse-Kubitza

inputs/Madidi/1.organisms/map.csv: Mapped columns

4384 08/30/2012 11:46 AM Aaron Marcuse-Kubitza

inputs/Madidi/0.plots/map.csv: Remapped DMS Latitude/Longitude to verbatimLatitude/verbatimLongitude, since these are DMS values rather than decimalLatitude/decimalLongitude

4383 08/30/2012 11:40 AM Aaron Marcuse-Kubitza

input.Makefile: Testing: %-ok: Rename the test output to the accepted test output instead of copying it, because outputs of successful (including newly accepted) tests should be removed to reduce clutter (as $(runTest) does)

4382 08/30/2012 11:35 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Remapped CTFS QuadratID to subplot rather than subplotID, because it's only unique within the parent plot, not globally unique, in CTFS

4381 08/30/2012 11:23 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import. This now includes the core CTFS tables.

4380 08/30/2012 11:10 AM Aaron Marcuse-Kubitza

Added inputs/VegBank/ with DB export

4379 08/30/2012 11:04 AM Aaron Marcuse-Kubitza

input.Makefile: General targets: `%: %.make`: Don't always remake the target whenever it's visited, as other targets may depend on this file and it should not be remade whenever they are visited

4378 08/30/2012 11:00 AM Aaron Marcuse-Kubitza

input.Makefile: General targets: `%: %.make`: Changed log file suffix to .log, because this log does not necessarily contain SQL statements

4377 08/30/2012 10:57 AM Aaron Marcuse-Kubitza

input.Makefile: General targets: `%: %.make`: Time the creating command

4376 08/30/2012 10:55 AM Aaron Marcuse-Kubitza

input.Makefile: General targets: Removed duplicate `%: %.make` rule

4375 08/30/2012 10:43 AM Aaron Marcuse-Kubitza

inputs/CTFS/TaxonOccurrence/map.csv: Documented that InfraSpecificLevel is unused

4374 08/30/2012 10:42 AM Aaron Marcuse-Kubitza

inputs/CTFS/TaxonOccurrence/map.csv: Documented that InfraSpecificLevel is unused

4373 08/30/2012 10:32 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Mapped speciesInvID

4372 08/30/2012 10:27 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added speciesInvID

4371 08/30/2012 10:25 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped taxonOccurrenceID

4370 08/30/2012 10:22 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added taxonOccurrenceID

4369 08/30/2012 10:14 AM Aaron Marcuse-Kubitza

inputs/CTFS/: Added TaxonOccurrence/ and its joined tables

4368 08/30/2012 10:13 AM Aaron Marcuse-Kubitza

inputs/CTFS/: Added TaxonOccurrence/ and its joined tables

4367 08/30/2012 10:06 AM Aaron Marcuse-Kubitza

inputs/CTFS/_archive/Organism.VegX/README.TXT: Added calculation of StemObservation rows distribution for each plot, which indicates that the bci plot actually contains 90% of the StemObservation rows. This brings the size inflation of VegX down to ~6x.

4366 08/30/2012 09:42 AM Aaron Marcuse-Kubitza

inputs/CTFS/_archive/Organism.VegX/: Added README.TXT describing that this VegX export includes only one of 157 CTFS plots. This is important, because it indicates that VegX creates a ~1000x (!) increase in storage size (613.6 MB for bci.sql with 157 plots vs. 3.78 GB for VegX_CTFS_row_*.xml with 1 plot, assuming roughly equal #s of stems per plot).
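
The size-inflation estimate can be reproduced with a short calculation. The file sizes and plot count are taken from this entry, and the 90% share from the later revision 4367; treating 1 GB as 1000 MB is an assumption (using 1024 shifts the results by about 2%).

```python
MB_PER_GB = 1000.0  # assumption: decimal units

sql_mb = 613.6               # bci.sql, all 157 plots
vegx_mb = 3.78 * MB_PER_GB   # VegX_CTFS_row_*.xml, 1 plot
plots = 157

# Estimate 1: stems spread evenly, so one plot is 1/157 of the SQL export.
even_ratio = vegx_mb / (sql_mb / plots)

# Estimate 2: the exported plot holds 90% of the StemObservation rows,
# so it accounts for about 90% of the SQL export's size.
skewed_ratio = vegx_mb / (sql_mb * 0.90)

print(round(even_ratio), round(skewed_ratio, 1))
```

The first ratio comes out close to 1000 and the second between 6 and 7, matching the orders of magnitude quoted in the two README entries.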

4365 08/30/2012 09:08 AM Aaron Marcuse-Kubitza

inputs/CTFS/StemObservation/map.csv: Remapped StemID to authorStemCode since it's only unique within the parent organism (Tree), not a globally unique ID as is required for stemID

4364 08/30/2012 09:05 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped authorStemCode

4363 08/30/2012 08:58 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added authorStemCode

4362 08/30/2012 08:58 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped stemID

4361 08/30/2012 08:52 AM Aaron Marcuse-Kubitza

inputs/SALVIAS/2.stems/map.csv: Mapped stem_id

4360 08/30/2012 08:46 AM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Added steps to install any MySQL export

4359 08/30/2012 08:13 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped stemID

4358 08/30/2012 08:10 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Mapped stem_id

4357 08/30/2012 08:05 AM Aaron Marcuse-Kubitza

repl: Support treating all patterns as plain text (non-regexp)

4356 08/30/2012 07:52 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added stem_id

4355 08/30/2012 07:51 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added stemID

4354 08/30/2012 07:44 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Mapped speciesName, subSpeciesName

4353 08/30/2012 07:43 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added CTFS taxonomic name columns

4352 08/30/2012 07:28 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Removed comments not applicable to the term itself

4351 08/30/2012 07:25 AM Aaron Marcuse-Kubitza

Inputs with multiple tables: Added explicit import_order.txt files, so that sort orders can later be removed from the subdir names

4350 08/29/2012 11:17 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Added StemObservation/ and tables it is joined from

4349 08/29/2012 11:09 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Mapped stemTag

4348 08/29/2012 11:08 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added stemTag

4347 08/29/2012 11:04 PM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Mapped DBH

4346 08/29/2012 11:02 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added DBH

4345 08/29/2012 10:58 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Added comment that you cannot make a subdir separately from the entire datasource dir

4344 08/29/2012 10:17 PM Aaron Marcuse-Kubitza

inputs/CTFS/Plot/create.sql: Added newline at end of file

4343 08/29/2012 10:04 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Renamed Site.src to Plot.src to use a VegCSV name for the table

4342 08/29/2012 10:01 PM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Adding input data for each table: `make inputs/<datasrc>/<table>/add`: Added note explaining why you need to use this command instead of just creating an empty directory of the desired name

4341 08/29/2012 08:44 PM Aaron Marcuse-Kubitza

inputs/CTFS/: Added SubplotObservation/

4340 08/29/2012 08:38 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Redirect eventID, fieldNumber (authoreventcode) to parent locationevent when subplot columns exist

4339 08/29/2012 08:23 PM Aaron Marcuse-Kubitza

inputs/CTFS/import_order.txt: Added PlotObservation

4338 08/29/2012 08:23 PM Aaron Marcuse-Kubitza

inputs/CTFS/PlotObservation/: Remade (hadn't been automatically remade because it wasn't part of import_order.txt)

4337 08/29/2012 08:13 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Also redirect locationID/plotName to parent location if subplotID column was provided

4336 08/29/2012 08:08 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: location.authorlocationcode mappings: Use _first to remove specimens-related alternatives for this field from consideration when plots-related alternatives exist. This avoids unintentionally using specimens-related columns for this field in plots data.
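
One way to picture a "take the first available alternative" function is sketched below. This is a guess at the general idea, not the actual `_first()` in xml_func.py, whose signature and semantics are not shown in this log; the name `first_nonempty` is hypothetical.

```python
def first_nonempty(*alternatives):
    """Return the first alternative that is neither None nor an empty string.

    Hypothetical sketch: alternatives are listed in priority order, so
    plots-related values placed first shadow specimens-related ones.
    """
    for value in alternatives:
        if value is not None and value != '':
            return value
    return None
```

Calling `first_nonempty(plot_code, specimen_code)` (both argument names are illustrative) returns the plot value whenever one exists and falls back to the specimen value only otherwise.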

4335 08/29/2012 08:06 PM Aaron Marcuse-Kubitza

xml_func.py: Added _first() simplifying function

4334 08/29/2012 08:05 PM Aaron Marcuse-Kubitza

xml_func.py: Added helper functions variadic_args() and map_names()

4333 08/29/2012 07:38 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: location.authorlocationcode mappings: Placed inside "if subplot" _if statement along with sourceaccessioncode to reduce the number of separate _if statements needing a condition mapping

4332 08/29/2012 07:32 PM Aaron Marcuse-Kubitza

xml_dom.py: NodeEntryIter: Support entries with multiple children

4331 08/29/2012 07:20 PM Aaron Marcuse-Kubitza

xml_dom.py: replace(): Support a list of new nodes to replace the old node with

4330 08/29/2012 07:01 PM Aaron Marcuse-Kubitza

xml_dom.py: Moved only_child() near related method has_one_child()