inputs/SALVIAS/: Switched to using the DB export's staging tables instead of the exported CSVs
input.Makefile: Staging tables installation: Treat empty subdirs as referencing an already-installed staging table, and run cleanup and header export operations on them
input.Makefile: Staging tables installation: `%/install: %/create.sql`: Factored out cleanup and header export operations for reuse in other types of table subdirs
input.Makefile: Staging tables installation: `%/install: %/create.sql`: Removed deprecated (but benign) errors_table_only option to csv2db. Run csv2db without a command in order to clean up the created staging table.
sql_io.py: cleanup_table(): Removed no longer used cols param
csv2db: When no command is specified, just clean up the specified table
sql_io.py: cleanup_table(): Always clean up all columns in the table
sql_io.py: cleanup_table(): Handle NullValueExceptions (due to setting values to NULL in a NOT NULL column) by dropping the NOT NULL constraint
sql.py: Added drop_not_null()
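
A minimal sketch of how drop_not_null() and the NullValueException handling in cleanup_table() might fit together. The function and exception names come from the entries above, but the signatures and the db.run_query()/esc_name() helpers shown here are assumptions for illustration:

```python
class NullValueException(Exception):
    '''Stands in for the exception raised when an UPDATE tries to set a
    NOT NULL column to NULL (hypothetical definition).'''

def esc_name(name):
    '''Double-quote a PostgreSQL identifier (hypothetical helper).'''
    return '"' + name.replace('"', '""') + '"'

def drop_not_null(db, table, col):
    '''Drop a column's NOT NULL constraint so cleanup can NULL its values.'''
    db.run_query('ALTER TABLE %s ALTER COLUMN %s DROP NOT NULL'
        % (esc_name(table), esc_name(col)))

def cleanup_col(db, table, col):
    '''Map empty strings to NULL, dropping NOT NULL on demand.'''
    update = ("UPDATE %s SET %s = NULL WHERE %s = ''"
        % (esc_name(table), esc_name(col), esc_name(col)))
    try: db.run_query(update)
    except NullValueException:
        drop_not_null(db, table, col)
        db.run_query(update)  # retry now that NULL is allowed
```
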
sql_gen.py: is_text_col(): Also consider character varying to be a text type
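
A sketch of what is_text_col() might look like after this change; the col_type() lookup and the db.value() call are assumptions for illustration:

```python
text_types = set(['text', 'character varying'])

def col_type(db, table, col):
    '''Look up a column's type from the DB (hypothetical helper); needing
    the table name here is why cleanup_table() attaches the table to each
    column, per the entry further below.'''
    return db.value('''\
SELECT data_type FROM information_schema.columns
WHERE table_name = %s AND column_name = %s''', (table, col))

def is_text_col(db, table, col):
    return col_type(db, table, col) in text_types
```
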
csv2db: Removed no longer used errors_table_only option
README.TXT: Schema changes: Removed step to reinstall errors tables, because they are now created automatically by column-based import
csv2db: Removed no longer needed creation of errors table, because it is now created automatically by column-based import
input.Makefile: Staging tables installation: $(dbExports): Fixed bug where it would be non-empty even when the input contains no DB exports: += appends a separating space, so the variable contained whitespace even when nothing had been added to it. This caused sql/install to be incorrectly included as part of $(allInstalls).
db_xml.py: put_table(): Create errors table if it doesn't exist
sql_io.py: Added mk_errors_table()
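
A sketch of mk_errors_table(), assuming it creates the errors table idempotently so put_table() can call it on every import; the name convention and column layout here are assumptions:

```python
def mk_errors_table(db, table):
    '''Create <table>.errors if missing, to hold values rejected during
    column-based import (illustrative schema).'''
    db.run_query('''\
CREATE TABLE IF NOT EXISTS "%s.errors"
(
    "column" text NOT NULL,
    value text,
    error_code text NOT NULL,
    error text NOT NULL
)''' % table)
```
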
inputs/Makefile: Input data: $(rsyncSrcs): Also exclude logs subdirs located at more than one level below the root, which occurs for example when a table subdir is moved into _archive/
input.Makefile: Staging tables installation: sql/install: Fixed bug where _always was included in $+, causing cat to be run on this nonexistent file
Added inputs/SALVIAS/salvias_plots.schema.sql
Added inputs/SALVIAS/_MySQL/
input.Makefile: Staging tables installation: MySQL exports: Run all non-data-only exports through my2pg, not just schema-only exports. This supports transforming a combined schema+data export.
my2pg: Also perform the data-only replacements, since column default values can contain data that needs the data-specific replacements. This also allows my2pg to transform a combined schema+data export.
input.Makefile: Staging tables installation: Also translate MySQL data to PostgreSQL
Added my2pg.data
input.Makefile: Staging tables installation: Place MySQL exports in separate _MySQL/ subdir so they don't clutter up the main dir, which will contain PostgreSQL translations
Added my2pg
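
my2pg is presumably a stream filter; below is a minimal Python sketch of the kind of replacements a MySQL-to-PostgreSQL translation involves. The actual rule sets in my2pg and my2pg.data are not reproduced here, and these rules are assumptions:

```python
import re, sys

# (pattern, replacement) pairs; IGNORECASE because MySQL keyword case varies
replacements = [
    (r'`([^`]*)`', r'"\1"'),            # backtick-quoted identifiers
    (r'\bauto_increment\b', ''),        # emulate with sequences instead
    (r'\bunsigned\b', ''),              # PostgreSQL has no unsigned ints
    (r'\btinyint\(\d+\)', 'smallint'),  # drop MySQL display widths
    (r'\bint\(\d+\)', 'integer'),
    (r'\bdatetime\b', 'timestamp'),
    (r'\)\s*ENGINE=\w+[^;]*;', ');'),   # strip table options
]

def my2pg(sql):
    for pattern, repl in replacements:
        sql = re.sub(pattern, repl, sql, flags=re.IGNORECASE)
    return sql

if __name__ == '__main__': sys.stdout.write(my2pg(sys.stdin.read()))
```

Illustrative usage as a filter, matching the `%.schema.sql: %.schema.my.sql` rule in the next entry: `my2pg <salvias_plots.schema.my.sql >salvias_plots.schema.sql`.
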
input.Makefile: Staging tables installation: DB exports: Concatenate all exports together, with schemas first, so that any config options which were applied only in the schema export will remain active when the data is imported. Changed `%.pg.sql: %.my.sql` to `%.schema.sql: %.schema.my.sql` so that PostgreSQL schemas don't need a .pg suffix and only the schema gets translated.
input.Makefile: Staging tables installation: $(dbExports): Don't consider MySQL DB exports as part of the DB exports that get installed, because they are not directly installable
input.Makefile: Staging tables installation: Added `%.pg.sql: %.my.sql` to translate MySQL DB schemas to PostgreSQL
inputs/SALVIAS/_src/: Added salvias_plots.sql.url to provide a link to where salvias_plots.sql was exported from (it was not a raw file given to us by the data provider)
Added cc_tty
inputs/input.Makefile: `%: %.make`: Don't automatically redirect stderr to a log file, because some .make scripts need to display password prompts, etc. on the TTY but write them to stderr instead of /dev/tty, so redirecting stderr would hide the prompts
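
Judging by its name, cc_tty likely carbon-copies its input to the controlling terminal as well as stdout, so prompts embedded in piped output still reach the user; this Python sketch of that behavior is a guess:

```python
import sys

def cc_tty():
    with open('/dev/tty', 'w') as tty:
        for line in sys.stdin:
            sys.stdout.write(line)
            tty.write(line)  # also echo to the terminal

if __name__ == '__main__': cc_tty()
```
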
inputs/REMIB/nodes.make: Fixed bin dir path for new subdir layout
inputs/SpeciesLink/tapir.make: Write log messages to a log file ($0.log) instead of to stderr, because the verbose log messages should not fill up stderr. To view the progress, you should instead tail the created log file.
inputs/REMIB/nodes.make: Updated path to node exports to use new subdir layout (in Specimen subdir, and without .specimens suffix)
inputs/REMIB/nodes.make: Fixed lib dir path in sys.path.append() for new subdir layout
inputs/REMIB/nodes.make: Write log messages to a log file ($0.log) instead of to sys.stderr, because the verbose log messages should not fill up stderr. To view the progress, you should instead tail the created log file.
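
A sketch of the logging pattern these .make scripts now use: verbose progress goes to a <script>.log file rather than sys.stderr. sys.argv[0] stands in for the $0 mentioned above, and the helper name is illustrative:

```python
import sys

log_file = open(sys.argv[0] + '.log', 'w')  # e.g. nodes.make.log

def log(msg):
    log_file.write(msg + '\n')
    log_file.flush()  # flush so `tail -f` shows progress immediately
```
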
input.Makefile: Add the bin folder to the PATH so .make scripts can easily use programs in it
input.Makefile: Staging tables installation: Support installing a DB export directly into the staging schema, without needing to first export it as CSVs
inputs/SALVIAS/: Added _src/ subdir to store original DB export (before re-export in a PostgreSQL-compatible form)
input.Makefile: `%: %.make`: Only remake if doesn't exist. This prevents unintentional remaking when the make script is newly checked out from svn (which sets the mod time to now) but the output is synced externally.
input.Makefile: `%: %.make`: Removed no longer applicable comment, which applied when there were two separate `%: %.make`-related rules
input.Makefile: Use $(inDatasrc) wherever its literal value had appeared
input.Makefile: Added $(inDatasrc)
sql_io.py: cleanup_table(): Only clean up text columns, to support staging tables with other column types
sql_gen.py: Added is_text_col()
sql_io.py: cleanup_table(): Add table to each column so its type can later be determined from the DB
inputs/NY/verify/specimens.ref: Regenerated from specimens.ref.sql. The counts have changed slightly because this is derived directly from the NY CSV file, rather than from the nybg_raw BIEN2 staging table.
inputs/NY/verify/specimens.ref.sql: Retrofitted to use PostgreSQL instead of MySQL syntax, since this now runs on the PostgreSQL staging tables
input.Makefile: Verification of import: Added `%.ref: %.ref.sql` rule to make datasource's summary statistics from its staging tables. (This was previously run on a MySQL installation of the datasource, and thus limited to MySQL inputs, but we are now able to use the staging tables for this.)
input.Makefile: Verification of import: $(verify): Factored psql command with output format settings into separate $(psqlExport) var
schemas/vegbien.sql: analytical_db_view: Switched join order of location and party (datasource) tables, to facilitate using a nested loop join to fill in the datasource names
schemas/vegbien.sql: party: Added party_datasource index on just the organizationname to facilitate querying just the datasources
schemas/vegbien.sql: make_analytical_db(): Removed explicit schema reference so that the function can be redirected to use the current (rotated) schema using the search_path
schemas/Makefile: Removed no longer needed analytical_db, which has been replaced by bin/make_analytical_db
README.TXT: After a new import: Use bin/make_analytical_db instead of `make schemas/analytical_db`, and run it asynchronously because it takes a long time
Added make_analytical_db
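
bin/make_analytical_db presumably just invokes the make_analytical_db() function added in vegbien.sql; here is a Python sketch of the equivalent, using a DB-API connection and selecting the rotated schema via search_path as described above (the connection handling and schema argument are assumptions):

```python
import time

def make_analytical_db(conn, schema):
    '''Run the make_analytical_db() DB function against the given (possibly
    rotated) schema, timing it since it takes a long time.'''
    cur = conn.cursor()
    # redirect the function's unqualified table references to this schema
    # (string interpolation is fine here only for a trusted schema name)
    cur.execute('SET search_path TO %s, public' % schema)
    start = time.time()
    cur.execute('SELECT make_analytical_db()')
    conn.commit()
    print('analytical DB created in %d s' % (time.time() - start))
```
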
schemas/Makefile: Analytical DB: analytical_db: Time the creation of the analytical DB
README.TXT: After a new import: Added command to make the analytical DB
schemas/Makefile: Added analytical_db target
schemas/vegbien.sql: Added make_analytical_db() and helper view analytical_db_view. Note that adding a view which depends on other tables will cause those tables to be reordered in dependency order to appear before the view, causing the svn diff to change completely even though the DB structure has only been added to.
schemas/vegbien.sql: Removed OIDs from tables because we don't use them (tables have primary keys instead)
inputs/import.stats.xls: Updated with stats from latest import. This now includes CTFS.TaxonOccurrence (presence-only observations), FIA (11 million rows!), and Madidi.Organism. The addition of FIA almost doubles the # of rows to 26 million and increases the import time from 9.5 to 11.5 hours.
sql_io.py: null_strs: Added 'UNKNOWN'
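
For context, null_strs is presumably the list of strings that cleanup_table() maps to NULL in text columns; aside from 'UNKNOWN' (added here), the entries shown are assumptions:

```python
null_strs = ['', 'UNKNOWN']  # text-column values rewritten to NULL on cleanup
```
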
Added inputs/FIA/
inputs/: Renamed subfolders to VegCSV names, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Rename-subfolders-to-VegCSV-names>
inputs/Madidi/1.organisms/map.csv: Mapped columns
inputs/Madidi/0.plots/map.csv: Remapped DMS Latitude/Longitude to verbatimLatitude/verbatimLongitude, since these DMS values are not decimalLatitude/decimalLongitude
input.Makefile: Testing: `%-ok`: Rename the test output to the accepted test output instead of copying it, because outputs of successful (including newly accepted) tests should be removed to reduce clutter (as $(runTest) does)
mappings/Veg+-VegCore.csv: Remapped CTFS QuadratID to subplot rather than subplotID, because in CTFS it's only unique within the parent plot, not globally unique
inputs/import.stats.xls: Updated with stats from latest import. This now includes the core CTFS tables.
Added inputs/VegBank/ with DB export
input.Makefile: General targets: `%: %.make`: Don't always remake the target whenever it's visited, as other targets may depend on this file and it should not be remade whenever they are visited
input.Makefile: General targets: `%: %.make`: Changed log file suffix to .log, because this log does not necessarily contain SQL statements
input.Makefile: General targets: `%: %.make`: Time the creating command
input.Makefile: General targets: Removed duplicate `%: %.make` rule
inputs/CTFS/TaxonOccurrence/map.csv: Documented that InfraSpecificLevel is unused
mappings/Veg+-VegCore.csv: Mapped speciesInvID
mappings/Veg+.terms.csv: Added speciesInvID
mappings/VegCore-VegBIEN.csv: Mapped taxonOccurrenceID
mappings/Veg+.terms.csv: Added taxonOccurrenceID
inputs/CTFS/: Added TaxonOccurrence/ and its joined tables
inputs/CTFS/_archive/Organism.VegX/README.TXT: Added calculation of StemObservation rows distribution for each plot, which indicates that the bci plot actually contains 90% of the StemObservation rows. This brings the size inflation of VegX down to ~6x.
inputs/CTFS/_archive/Organism.VegX/: Added README.TXT describing that this VegX export includes only one of 157 CTFS plots. This is important, because it indicates that VegX creates a ~1000x (!) increase in storage size: 613.6 MB for bci.sql with 157 plots (~3.9 MB/plot) vs. 3.78 GB for VegX_CTFS_row_*.xml with 1 plot, i.e. 3,780 MB / 3.9 MB ≈ 970x, assuming roughly equal #s of stems per plot.
inputs/CTFS/StemObservation/map.csv: Remapped StemID to authorStemCode since it's only unique within the parent organism (Tree), not a globally unique ID as is required for stemID
mappings/VegCore-VegBIEN.csv: Mapped authorStemCode
mappings/Veg+.terms.csv: Added authorStemCode
mappings/VegCore-VegBIEN.csv: Mapped stemID
inputs/SALVIAS/2.stems/map.csv: Mapped stem_id
README.TXT: Datasource setup: Added steps to install any MySQL export
mappings/Veg+-VegCore.csv: Mapped stem_id
repl: Support treating all patterns as plain text (non-regexp)
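
A sketch of the repl change: when patterns are treated as plain text, re.escape() makes regexp metacharacters match literally (the flag name here is illustrative):

```python
import re

def compile_pattern(pattern, as_text=False):
    if as_text: pattern = re.escape(pattern)  # match literally, not as a regexp
    return re.compile(pattern)
```
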
mappings/Veg+.terms.csv: Added stem_id
mappings/Veg+.terms.csv: Added stemID
mappings/Veg+-VegCore.csv: Mapped speciesName, subSpeciesName
mappings/Veg+.terms.csv: Added CTFS taxonomic name columns
mappings/Veg+.terms.csv: Removed comments not applicable to the term itself