/ - Changes - BIEN 3 - NCEAS Projects

root @ 8341

#	Date	Author	Comment
8341	04/09/2013 03:25 AM	Aaron Marcuse-Kubitza	schemas/VegCore/VegCore.ERD.mwb: Populated legend
8340	04/05/2013 12:23 AM	Aaron Marcuse-Kubitza	schemas/VegCore/VegCore.ERD.mwb: Added exports
8339	04/05/2013 12:23 AM	Aaron Marcuse-Kubitza	schemas/VegCore/VegCore.ERD.mwb: Fixed lines and settings for the Linux MySQL Workbench
8338	04/04/2013 10:01 PM	Aaron Marcuse-Kubitza	schemas/VegCore/VegCore.ERD.mwb: Added table colors
8337	04/04/2013 10:01 PM	Aaron Marcuse-Kubitza	Removed backup file schemas/VegCore/VegCore.ERD.mwb.bak
8336	04/04/2013 09:48 PM	Aaron Marcuse-Kubitza	Added schemas/VegCore/VegCore.ERD.mwb, VegCore.my.sql with first VegCore ERD and MySQL schema. All tables are in the ERD, but contain only pkey and fkey columns.
8335	04/04/2013 09:52 AM	Aaron Marcuse-Kubitza	lib/sql.py: mk_select(): using subset function: Turn off enable_sort (within the transaction) to avoid unwanted slow sorts. This change (along with the subset functions themselves) should significantly reduce the long FIA.occurrence_all table subset time (~8 hours altogether) and with it the total import time, which had more than doubled as a result of the FIA refresh. Note that this issue would have been even more pronounced for larger datasets, such as the GBIF refresh, which would have taken ~2.5 days longer (400 million rows * ~30% are plants * (FIA: ~8 hours/16.7 million rows) * 1 day/24 hours).
8334	04/04/2013 09:30 AM	Aaron Marcuse-Kubitza	lib/sql.py: mk_select(): Use subset function when it's available for fast querying at large OFFSET values
8333	04/04/2013 09:29 AM	Aaron Marcuse-Kubitza	lib/sql.py: Added has_subset_func()
8332	04/04/2013 08:48 AM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Run mk_subset_by_row_num_func() to make the subset functions available for fast querying at large OFFSET values
8331	04/04/2013 08:43 AM	Aaron Marcuse-Kubitza	schemas/util.sql: mk_subset_by_row_num_func(): regular subset function: Fixed bug where need to add 1 to the 0-based offset_ to get the 1-based row_num (which is usually a serial column)
8330	04/04/2013 08:38 AM	Aaron Marcuse-Kubitza	schemas/util.sql: mk_subset_by_row_num_func(): regular subset function: Fixed bug where need to subtract 1 from the end row_num because BETWEEN limits are inclusive of the bounds
8329	04/04/2013 08:33 AM	Aaron Marcuse-Kubitza	schemas/util.sql: mk_subset_by_row_num_func(): regular subset function: Fixed bug where also need to COALESCE offset_ to 0 when it's added to the limit_
8328	04/04/2013 08:20 AM	Aaron Marcuse-Kubitza	schemas/util.sql: mk_subset_by_row_num_func(): subset function which turns off enable_sort: Fixed bug where need to pass ($2, $3) instead of ($1, $2) to the regular subset function
8327	04/04/2013 08:14 AM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Added occurrence_all-row_num column for use with mk_subset_by_row_num_func()
8326	04/04/2013 08:12 AM	Aaron Marcuse-Kubitza	schemas/util.sql: mk_subset_by_row_num_func(): Also create subset function which turns off enable_sort. This is used for limit values greater than ~100,000 to avoid unwanted slow sorts. The regular subset function is still needed to work with EXPLAIN, so that it produces expanded output instead of just a function scan.
8325	04/04/2013 07:27 AM	Aaron Marcuse-Kubitza	schemas/util.sql: Added mk_subset_by_row_num_func()
8324	04/04/2013 07:10 AM	Aaron Marcuse-Kubitza	schemas/util.sql: Added type_qual_name()
8323	04/04/2013 06:33 AM	Aaron Marcuse-Kubitza	schemas/util.sql: force_update_view(): Fixed bug where also need to drop view for "cannot change name of view column" errors
8322	04/04/2013 05:24 AM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Use new force_update_view(), which only drops the view if its columns have changed and otherwise just uses CREATE OR REPLACE VIEW, rather than always first running DROP VIEW IF EXISTS
8321	04/04/2013 05:20 AM	Aaron Marcuse-Kubitza	schemas/util.sql: Added force_update_view()
8320	04/04/2013 04:23 AM	Aaron Marcuse-Kubitza	bin/make_analytical_db: Commented out export_analytical_db because we are not yet using the analytical DB in MySQL, and it doesn't make sense to generate a large, unused CSV export each time
8319	04/04/2013 04:19 AM	Aaron Marcuse-Kubitza	bin/export_analytical_db: Replaced analytical_aggregate with analytical_stem
8318	04/04/2013 03:53 AM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/: Updated header.csv for new column order
8317	04/04/2013 03:40 AM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Use directional joins (LEFT/RIGHT JOIN) instead of inner joins to ensure that the PostgreSQL query planner always joins starting with the TREE table. Note that the directional joins are now needed for a different reason than when they were initially added, which had been to avoid slow sorts. The sorts (at least for LIMIT-only queries) went away when small tables such as COUNTY and REF_UNIT were added to the joins.
8316	04/04/2013 01:16 AM	Aaron Marcuse-Kubitza	inputs/FIA/*/map.csv: Changed newlines between table and field name to - because the newlines mess up the flow of queries and also break pgAdmin's display of EXPLAIN output. The - was chosen because it's a non-whitespace character that linewraps in browsers, phpPgAdmin, and Google spreadsheets (although unfortunately not in pgAdmin). It is better than space because you can set a text editor to treat it as a word character, allowing the entire column name (<table>-<field>) to be selected by double-clicking it.
8315	04/03/2013 09:55 PM	Aaron Marcuse-Kubitza	Added planning/workflow/normalized_vs_denormalized/denormalized.generic_standardizations.png (a slide from Brad's bien3_architecture_denormalized.pptx PowerPoint), which shows the staging table preprocessing particularly well
8314	04/03/2013 09:45 PM	Aaron Marcuse-Kubitza	README.TXT: Full database import: record the import times in inputs/import.stats.xls: Added instructions for what to do if the rightmost imports start getting truncated due to the 255-column limit in spreadsheets. (This will occur in 8 imports.)
8313	04/03/2013 09:32 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Removed the previous imports from the current tab because they are also in the 2012-6~9 tab, and should not be in two places
8312	04/03/2013 09:28 PM	Aaron Marcuse-Kubitza	inputs/import.stats.xls: Updated import times. MO and FIA have been refreshed.
8311	04/02/2013 04:17 PM	Aaron Marcuse-Kubitza	Removed no longer needed inputs/GBIF/import. Use ./run instead.
8310	04/02/2013 04:17 PM	Aaron Marcuse-Kubitza	Removed no longer needed inputs/GBIF/_MySQL/import. Use ./run instead.
8309	04/02/2013 04:16 PM	Aaron Marcuse-Kubitza	inputs/GBIF/_MySQL/run: import: Run make directly instead of via ./import
8308	04/02/2013 04:15 PM	Aaron Marcuse-Kubitza	inputs/GBIF/_MySQL/run: Use new import.run, which defines all()
8307	04/02/2013 04:06 PM	Aaron Marcuse-Kubitza	Added planning/workflow/normalized_vs_denormalized/bien3_architecture_(de)normalized.pptx
8306	04/02/2013 03:57 PM	Aaron Marcuse-Kubitza	Added planning/workflow/normalized_vs_denormalized/BIEN-modArch-Dec2010 NS-SBD 1.4.ppt.url
8305	04/02/2013 03:50 PM	Aaron Marcuse-Kubitza	planning/workflow/: Moved normalized vs. denormalized files to separate normalized_vs_denormalized/ subfolder
8304	04/02/2013 03:21 PM	Aaron Marcuse-Kubitza	Regenerated inputs/ACAD/Specimen/logs/steps.by_col.log.sql
8303	04/02/2013 03:15 PM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record/run: Override MySQL_export() so $filter can be customized
8302	04/02/2013 03:13 PM	Aaron Marcuse-Kubitza	inputs/GBIF/table.run: import(): Updated for lib/table.run template changes
8301	04/02/2013 03:09 PM	Aaron Marcuse-Kubitza	lib/table.run: template: import(): Also pass "$@" to superclass method
8300	04/02/2013 03:08 PM	Aaron Marcuse-Kubitza	lib/table.run: template: Use "$FUNCNAME" instead of hardcoding import
8299	04/02/2013 03:02 PM	Aaron Marcuse-Kubitza	Added inputs/GBIF/MySQL_export, used by ./table.run
8298	04/02/2013 02:57 PM	Aaron Marcuse-Kubitza	lib/util.run: echo_func: Fixed bug where need to use BASH_LINENO⁰ for the line #s to match up with the files. For some reason the required array indexes for BASH_SOURCE (1) and BASH_LINENO (0) differ by one.
8297	04/02/2013 02:51 PM	Aaron Marcuse-Kubitza	inputs/GBIF/run: Use new import.run, which defines all()
8296	04/02/2013 02:51 PM	Aaron Marcuse-Kubitza	lib/table.run: Use new import.run, which defines all()
8295	04/02/2013 02:49 PM	Aaron Marcuse-Kubitza	Added lib/import.run
8294	04/02/2013 02:48 PM	Aaron Marcuse-Kubitza	lib/util.run: echo_func: Include the line # of the function to make it easier to find where the code being run is
8293	04/02/2013 02:32 PM	Aaron Marcuse-Kubitza	lib/table.run: Added all (default target)
8292	04/02/2013 02:26 PM	Aaron Marcuse-Kubitza	lib/util.run: run_cmd: If bash exited with an error, don't run the "$@" command. This test is necessary because `trap run_cmd EXIT` will run run_cmd as the result of any exit from the shell, including an error.
8291	04/02/2013 02:21 PM	Aaron Marcuse-Kubitza	*run: Use -e option to bash on the #! line instead of separate `set -o errexit` line so that there is no issue with the `set -o errexit` line getting separated from the #! line (errexit is required for the scripts to work properly)
8290	04/02/2013 02:09 PM	Aaron Marcuse-Kubitza	lib/util.run: run_cmd: When no command specified, default to running the `all` command, just like make
8289	04/02/2013 02:07 PM	Aaron Marcuse-Kubitza	lib/util.run: Run run_cmd at shell exit (using trap) instead of requiring every runscript to have `run_cmd ` at the end of it
8288	04/02/2013 01:49 PM	Aaron Marcuse-Kubitza	Added inputs/GBIF/run
8287	04/02/2013 01:48 PM	Aaron Marcuse-Kubitza	Added inputs/GBIF/raw_occurrence_record/run
8286	04/02/2013 01:47 PM	Aaron Marcuse-Kubitza	Added inputs/GBIF/table.run
8285	04/02/2013 01:45 PM	Aaron Marcuse-Kubitza	Added inputs/GBIF/_MySQL/run
8284	04/02/2013 01:42 PM	Aaron Marcuse-Kubitza	lib/util.run: fwd: Check that $subdirs is defined. Added $subdirs to usage.
8283	04/02/2013 01:39 PM	Aaron Marcuse-Kubitza	lib/util.run: fwd: Added usage
8282	04/02/2013 01:32 PM	Aaron Marcuse-Kubitza	lib/table.run: Switched from echo_run to echo_func
8281	04/02/2013 01:16 PM	Aaron Marcuse-Kubitza	lib/util.run: run_cmd: Echo the command being run, including the top-level run script. This is in addition to the echoing of the command in the function itself (using echo_func), which provides both the runscript that was run and the file where the invoked command was actually located (which may be different due to includes).
8280	04/02/2013 01:12 PM	Aaron Marcuse-Kubitza	lib/util.run: Echo the command at the beginning of each function using new echo_func, instead of having to type echo_run before every call to a function. Note that because echo_func uses BASH_SOURCE, the path to the file containing the function will be included in the debug message, which greatly facilitates locating which file a command is in.
8279	04/02/2013 01:08 PM	Aaron Marcuse-Kubitza	lib/util.run: Added echo_func
8278	04/02/2013 12:50 PM	Aaron Marcuse-Kubitza	lib/util.run: Added echo_cmd and use it in echo_run
8277	04/02/2013 12:46 PM	Aaron Marcuse-Kubitza	lib/util.run: echo_cmd(): Renamed to echo_run for clarity, because it also runs the command
8276	04/02/2013 12:39 PM	Aaron Marcuse-Kubitza	lib/util.run: Added inline_make()
8275	04/02/2013 12:39 PM	Aaron Marcuse-Kubitza	lib/util.run: Added echo_stdin()
8274	04/02/2013 12:30 PM	Aaron Marcuse-Kubitza	bin/my2pg_export: Put --password first because it's an authentication-related option
8273	04/02/2013 10:52 AM	Aaron Marcuse-Kubitza	Added lib/table.run, which includes the commands in import.sh but uses run scripts to allow running commands other than just import. (For example, map_table or postprocess can be run separately. Uninstall-related commands which would not belong in an import script can also be added, because import is only one of many commands a run script can offer.)
8272	04/02/2013 10:35 AM	Aaron Marcuse-Kubitza	Added lib/util.run with general functions and template for run scripts (a bash-based replacement for make). Unlike make, run scripts support full bash functionality including multiline commands. The run script template also includes syntax for various kinds of relative includes in bash.
8271	04/02/2013 12:03 AM	Aaron Marcuse-Kubitza	lib/common.Makefile: Added $(require_var)
8270	04/01/2013 10:42 PM	Aaron Marcuse-Kubitza	bin/publish_analytical_db: Fixed bug where need to remove `ESCAPED BY '"'` because this would causing " followed by an escape sequence char to be interpreted specially (e.g. "n -> \n). MySQL automatically takes care of quote doubling when you specify `FIELDS OPTIONALLY ENCLOSED BY`.
8269	04/01/2013 10:13 PM	Aaron Marcuse-Kubitza	lib/common.Makefile: Compression: Added `%:: .gz`, `.gz: %`
8268	04/01/2013 08:07 PM	Aaron Marcuse-Kubitza	planning/workflow/import_process_comparison.odg: Moved "staging tables" under the method labels to reduce empty space
8267	04/01/2013 07:52 PM	Aaron Marcuse-Kubitza	planning/workflow/import_process_comparison.odg: Removed margins so the labels would align with the page margin on the Import process wiki page <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Import_process>
8266	04/01/2013 07:32 PM	Aaron Marcuse-Kubitza	Added planning/workflow/import_process_comparison.odg and .png export
8265	04/01/2013 06:12 PM	Aaron Marcuse-Kubitza	lib/db_xml.py: put_table(): Fixed bug where command to advance start to fetch next set was unintentionally deleted when removing the is_view check
8264	04/01/2013 06:11 PM	Aaron Marcuse-Kubitza	inputs/UNCC/Specimen/new_terms.csv: Updated for updated VegCore vocab
8263	04/01/2013 03:53 PM	Aaron Marcuse-Kubitza	inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.data.sql.md5: Regenerated after appending agent table to GBIFPortalDB-2013-02-20.data.sql
8262	04/01/2013 03:51 PM	Aaron Marcuse-Kubitza	Added inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.data.sql.gz.md5
8261	03/28/2013 08:16 PM	Aaron Marcuse-Kubitza	Added inputs/GBIF/raw_occurrence_record/ from refresh
8260	03/28/2013 08:07 PM	Aaron Marcuse-Kubitza	inputs/GBIF/MySQL.schema.sql: Regenerated with inline enum type translated to CHECK constraint
8259	03/28/2013 08:07 PM	Aaron Marcuse-Kubitza	bin/my2pg: Translate inline enum type to CHECK constraint
8258	03/28/2013 07:43 PM	Aaron Marcuse-Kubitza	Added inputs/GBIF/**/MySQL.schema.sql
8257	03/28/2013 07:42 PM	Aaron Marcuse-Kubitza	Added inputs/GBIF/_MySQL/MySQL.*.sql.make
8256	03/28/2013 07:36 PM	Aaron Marcuse-Kubitza	inputs/FIA/: Archived no longer used subdirs from BIEN2 export
8255	03/28/2013 07:29 PM	Aaron Marcuse-Kubitza	inputs/FIA/: Archived no longer used subdirs from BIEN2 export
8254	03/28/2013 07:22 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: SVN: add: Removed Source/map.csv prerequisite because it is not related to adding unversioned files in the dir. It was originally a prerequisite in order to auto-create it when the datasource dir is first created, but the map.csv recipe does not currently create metadata-only map.csvs. In the future, metadata-only map.csvs will be replaced with constant columns added to the applicable tables.
8253	03/28/2013 07:19 PM	Aaron Marcuse-Kubitza	Added inputs/FIA/_archive
8252	03/28/2013 07:19 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: %/map.csv: Fixed bug where can only make header.csv if map.csv does not exist, because some subdirs are metadata-only and don't have a corresponding DB table
8251	03/28/2013 07:02 PM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Install the staging tables: For a MySQL .sql export: Documented which password to use at each of the two password prompts my2pg_export will give you. You could also embed the value of the 2nd prompt in the _MySQL/*.make file using `--password="$(cat path/to/config/bien_password)"`.
8250	03/28/2013 06:56 PM	Aaron Marcuse-Kubitza	README.TXT: Datasource setup: Install the staging tables: Removed requirement that `make inputs/<datasrc>/reinstall quiet=1 &` be run on vegbiendev for MySQL .sql exports, because the hostname is now set to vegbiendev instead of localhost
8249	03/28/2013 06:38 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: sql/install: Use psql_script_vegbien instead of $(psqlNoSearchPath) (which uses psql_verbose_vegbien) because the insert statement for each data row should not be echoed
8248	03/28/2013 06:14 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/import: Run remake_VegBIEN_mappings at end to keep mappings to next stage of import process up to date
8247	03/28/2013 06:14 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/: Accepted new test output
8246	03/28/2013 06:13 PM	Aaron Marcuse-Kubitza	lib/import.sh: remake_VegBIEN_mappings(): Also remake VegBIEN.csv and test.xml.ref use `make test`
8245	03/28/2013 06:11 PM	Aaron Marcuse-Kubitza	lib/import.sh: Added remake_VegBIEN_mappings()
8244	03/28/2013 06:10 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: %/map.csv: make $*/header.csv first in case it doesn't exist (e.g. if it has been deleted so that it will be remade)
8243	03/28/2013 06:07 PM	Aaron Marcuse-Kubitza	inputs/FIA/occurrence_all/map.csv: Regenerated using new input table mappings
8242	03/28/2013 05:47 PM	Aaron Marcuse-Kubitza	lib/import.sh: Added make() and use it instead of the full make command

Project

General

Profile