/trunk - Changes - BIEN 3 - NCEAS Projects

root/trunk @ 13125

svn:ignore: extern

#	Date	Author	Comment
13125	04/14/2014 03:23 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented slow queries (_plots_06a_list_of_stems). these may need to have their query plans rechecked.
13124	04/14/2014 03:22 PM	Aaron Marcuse-Kubitza	inputs/NY/run, inputs/SALVIAS/run_: `make inputs/.../validate`: updated runtime (+2 min)
13123	04/10/2014 04:06 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: specimens*_of_unique_verbatim_author_taxa_with_genus: use scientificName rather than the concatenated ranks, because that is what is imported to taxonlabel.taxonomicname
13122	04/10/2014 03:52 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to inputs/NY/validations.sql
13121	04/10/2014 03:50 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: updated to DB
13120	04/10/2014 03:41 PM	Aaron Marcuse-Kubitza	fix: schemas/vegbien.sql: specimens*_of_unique_verb_subsp_taxa_with_author: include only names with subspecies (filtering by taxonverbatim.subspecies rather than taxonlabel.taxonomicname)
13119	04/10/2014 03:13 PM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: Full database import: to import just a subset of the datasources: array env var needs to be set after opening the `screen` shell because array vars are apparently not inherited by the `screen` shell
13118	04/10/2014 02:42 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: to import just a subset of the datasources: added step to set custom import name
13117	04/10/2014 02:41 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: added instructions for importing just a subset of the datasources
13116	04/10/2014 02:38 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: local_array/export_array: do need -a, because the () on Mac apparently does not autodetect that the variable is an array
13115	04/10/2014 02:24 PM	Aaron Marcuse-Kubitza	mappings/VegCore-VegBIEN.csv: mapped subspecies to new taxonverbatim.subspecies for easier access by validations queries
13114	04/10/2014 02:05 PM	Aaron Marcuse-Kubitza	bugfix: web/.phpPgAdmin/.htaccess: work around phpPgAdmin bug that causes page to be ignored when not logged in
13113	04/10/2014 01:25 PM	Aaron Marcuse-Kubitza	fix: inputs/test_taxonomic_names/Taxon/map.csv: scientificName: remapped to scientificName instead of taxonName as this does include the author for some names
13112	04/10/2014 01:25 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/Ecatalog_all/map.csv: ScientificName: remapped to scientificName instead of taxonName as this does include the author
13111	04/10/2014 01:17 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: specimens*_of_unique_verb_subsp_taxa_with_author: use taxonName instead of concatenating the ranks, as that corresponds to what we use as the concatenated taxonomic name
13110	04/10/2014 12:59 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: specimens*_of_verbatim_subspecific_taxa_with_author: need `subspecies IS NOT NULL` filter
13109	04/10/2014 12:57 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: _specimens_07_list_of_verbatim_subspecific_taxa_with_author: need to include subspecies (as _specimens_06_count_of_unique_verb_subsp_taxa_with_author does)
13108	04/10/2014 12:35 PM	Aaron Marcuse-Kubitza	web/.phpPgAdmin/.htaccess: extract path components 1st->last: documented that the subject param can't be used for this because that goes to the last selected tab, not the default (leftmost) tab
13107	04/10/2014 12:03 PM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: specimens_of_species_binomials: removed incorrect `subspecies IS NOT NULL` filter (this should be on _of_unique_verb_subsp_taxa_with_author instead)
13106	04/10/2014 11:41 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonverbatim: added subspecies, as decided in the conference call (wiki.vegpath.org/2014-04-10_conference_call#VegBIEN-schema-2)
13105	04/10/2014 06:54 AM	Aaron Marcuse-Kubitza	fix: schemas/vegbien.sql: plots* with duplicated rows: removed duplicated rows
13104	04/10/2014 06:45 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: specimens*: ran through pipeline
13103	04/10/2014 06:38 AM	Aaron Marcuse-Kubitza	removed old version validation/aggregating/plots/SALVIAS/bien3_validations_salvias_db_original.sql. use validation/aggregating/plots/SALVIAS/_archive/bien3_validations_salvias_db_original.sql instead.
13102	04/10/2014 06:19 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to inputs/NY/validations.sql
13101	04/10/2014 06:17 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: updated to DB
13100	04/10/2014 06:07 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: _specimens_16_list_distinct_specimen_descriptions: re-ran through pipeline after removing duplicated rows
13099	04/10/2014 06:02 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: rm_output_queries(): also support removing just a particular output query
13098	04/10/2014 05:26 AM	Aaron Marcuse-Kubitza	bugfix: schemas/util.sql: remake_diff_table(): need to rm_freq() type_table, because left/right_table don't have freq yet
13097	04/10/2014 05:18 AM	Aaron Marcuse-Kubitza	schemas/util.sql: auto_rm_freq(): use new rm_freq()
13096	04/10/2014 05:17 AM	Aaron Marcuse-Kubitza	schemas/util.sql: added rm_freq(regclass[])
13095	04/10/2014 03:45 AM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: _specimens_16_list_distinct_specimen_descriptions: removed duplicated rows using DISTINCT
13094	04/10/2014 03:33 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: _specimens_11_list_of_three_standard_political_divisions: ran through pipeline
13093	04/10/2014 03:31 AM	Aaron Marcuse-Kubitza	fix: schemas/vegbien.sql: _specimens_11_list_of_three_standard_political_divisions: use same column names as input query
13092	04/10/2014 03:10 AM	Aaron Marcuse-Kubitza	schemas/util.sql: remake_diff_table(): result table comment: documented how to display NULL values that are extra or missing
13091	04/10/2014 02:40 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: ran through pipeline
13090	04/10/2014 02:38 AM	Aaron Marcuse-Kubitza	fix: schemas/vegbien.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: also need to convert to text in GROUP BY/ORDER BY
13089	04/10/2014 02:34 AM	Aaron Marcuse-Kubitza	bugfix: inputs/NY/validations.sql: _specimens_03_list_of_verbatim_families: use family as specified in query description, not as implemented
13088	04/10/2014 02:32 AM	Aaron Marcuse-Kubitza	_license/UCSB/LICENSE.TXT: use (c) verbatim from the e-mail, not as displayed as © by Thunderbird
13087	04/10/2014 02:07 AM	Aaron Marcuse-Kubitza	bugfix: schemas/vegbien.sql, inputs/NY/validations.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: cast this to text rather than date because some values for this field are not valid dates and will throw an error if cast to date
13086	04/09/2014 08:19 PM	Aaron Marcuse-Kubitza	fix: inputs/NY/validations.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: matched type to output query
13085	04/09/2014 06:23 PM	Aaron Marcuse-Kubitza	validation/aggregating/pipeline/aggregating_validations_pipeline.odg: show that the staging table(s) are denormalized before running the input queries on them. clarified that what is compared are the input and output query results, not the queries themselves.
13084	04/09/2014 02:55 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: _specimens_10_count_number_of_records_by_institution: ran through pipeline
13083	04/09/2014 02:48 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: removed `public.` prefix to avoid cluttering up the SQL
13082	04/09/2014 02:46 PM	Aaron Marcuse-Kubitza	bugfix: schemas/vegbien.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_10_count_number_of_records_by_institution: need to dereference specimenreplicate.duplicate_institutions_sourcelist_id to the corresponding sourcelist.name
13081	04/09/2014 02:40 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: public_validations._specimens_*: added comments from validation/aggregating/specimens/qualitative_validations_specimens.sql
13080	04/09/2014 02:25 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: synced to schemas/vegbien.sql so that it can be diffed with it to sync qualitative_validations_specimens.sql to the DB
13079	04/09/2014 02:55 AM	Aaron Marcuse-Kubitza	lib/sql_gen.py: map_expr(): documented that unlike bin/repl SQL identifier handling, this does simplify the resulting expression
13078	04/09/2014 02:54 AM	Aaron Marcuse-Kubitza	lib/sql_gen.py: map_expr(): documented that this is a special case of bin/repl SQL identifier handling which does not handle entire source files
13077	04/09/2014 02:52 AM	Aaron Marcuse-Kubitza	bin/repl: match as whole-word text (like SQL identifier): documented that this is a generalization of lib/sql_gen.py map_expr() to work on entire source files
13076	04/09/2014 02:50 AM	Aaron Marcuse-Kubitza	bin/repl, lib/sql_gen.py Expression transforming: documented that this can also be done in Postgres with expression substitution (wiki.vegpath.org/Postgres_queries#expression-substitution)
13075	04/08/2014 03:49 PM	Aaron Marcuse-Kubitza	fix: inputs/U/Specimen/map.csv: Genus: remapped to taxonName because this field is actually mislabeled in the original column names
13074	04/08/2014 02:55 PM	Aaron Marcuse-Kubitza	validation/aggregating/pipeline/validations_on_sparse_datasources.odg: not applicable "✓": increased font size so the size of the character matches the surrounding text
13073	04/08/2014 02:52 PM	Aaron Marcuse-Kubitza	validation/aggregating/pipeline/validations_on_sparse_datasources.odg: removed = lines for each input query, because they clutter up the diagram and the "same, so don't need to rewrite" message now shows this as well
13072	04/08/2014 02:50 PM	Aaron Marcuse-Kubitza	validation/aggregating/pipeline/validations_on_sparse_datasources.odg: added the denormalized VegCore schema approach for comparison, as requested by Mark
13071	04/08/2014 01:52 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: remake_diff_tables(schema text): removed bien2_traits runtime because this applies only to one datasource. the bien2_traits runtime is now documented in inputs/bien2_traits/run.
13070	04/08/2014 01:40 PM	Aaron Marcuse-Kubitza	inputs/NY/run: `make inputs/NY/validate`: updated runtime (6.5 min). this increases as more queries are able to run successfully.
13069	04/08/2014 01:38 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: public_validations: schema comment: documented how to run the validations. this information is also in the usage comment for public_validations.remake_diff_table(), but is copied here for easy reference.
13068	04/08/2014 01:19 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented runtime (5 min)
13067	04/08/2014 12:49 PM	Aaron Marcuse-Kubitza	inputs/bien2_traits/run: documented `make inputs/bien2_traits/validate` runtime (9 min)
13066	04/07/2014 06:21 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: public_validations: specimens queries: added autogenerated ~type tables
13065	04/07/2014 06:19 PM	Aaron Marcuse-Kubitza	inputs/NY/run: `make inputs/NY/validate`: updated runtime (5 min)
13064	04/07/2014 06:09 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: removed DDL statements, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#remove-DDL-statements
13063	04/07/2014 06:07 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: public_validations: added specimens queries to pipeline
13062	04/07/2014 05:51 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: parameterize queries by datasource
13061	04/07/2014 05:35 PM	Aaron Marcuse-Kubitza	validation/aggregating/**.sql output queries: use `SET join_collapse_limit = 1;` to match public_validations.rematerialize_out_view()
13060	04/07/2014 05:17 PM	Aaron Marcuse-Kubitza	fix: schemas/vegbien.sql: public_validations.rematerialize_out_view(text, regclass): run with join_collapse_limit = 1 to fix query planner issues. this option has been tested on the queries that do not yet use the standard join sequence (plots #11,12,13,14,16,17,18), and all of these queries also work fine with join_collapse_limit = 1. (the standard join sequence is used to ensure both correctness of the query and compatibility with join_collapse_limit = 1, but in some cases is not needed for join_collapse_limit.)
13059	04/07/2014 04:35 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: turn off join_collapse_limit instead of enable_mergejoin/enable_hashjoin, because join_collapse_limit is something that we will eventually want to turn off for all queries, which would avoid this query needing special handling. (on the other hand, enable_mergejoin/enable_hashjoin may be necessary for some queries and we probably won't turn them off for all queries.)
13058	04/07/2014 01:43 PM	Aaron Marcuse-Kubitza	bugfix: lib/runscripts/table.run: table_make_install(): need to ignore skip_table() errexit
13057	04/07/2014 10:39 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: import_vars: documented that vars already set will not be overwritten
13056	04/07/2014 09:47 AM	Aaron Marcuse-Kubitza	inputs/NY/run: documented `make inputs/NY/validate` runtime (2 min, currently for the input queries)
13055	04/04/2014 06:13 PM	Aaron Marcuse-Kubitza	added inputs/Madidi/_src/ to match wiki steps in wiki.vegpath.org/Adding_a_flat-file_datasource
13054	04/03/2014 07:31 PM	Aaron Marcuse-Kubitza	added validation/aggregating/pipeline/validations_on_sparse_datasources.odg
13053	04/03/2014 04:13 PM	Aaron Marcuse-Kubitza	planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx
13052	04/03/2014 04:09 PM	Aaron Marcuse-Kubitza	planning/workflow/bien3_architecture.pptx: stage I: made all datasources the same height so that the denormalized VegCore schema boxes would all look exactly the same. widened the denormalized VegCore schema boxes to make it visually clear that they have more columns than the staging tables denormalized together
13051	04/03/2014 03:40 PM	Aaron Marcuse-Kubitza	planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx
13050	04/03/2014 03:39 PM	Aaron Marcuse-Kubitza	planning/workflow/bien3_architecture.pptx: updated to reflect decisions made in the 2014-04-03 conference call (wiki.vegpath.org/2014-04-03_conference_call#import-process-2)
13049	04/03/2014 08:53 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_14_count_of_all_invalid_verbatim_lat_long
13048	04/03/2014 08:35 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_12_distinct_collector_name_collect_num_date_w_count
13047	04/03/2014 08:04 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: fixed whitespace
13046	04/03/2014 07:32 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: removed trailing whitespace
13045	04/03/2014 07:31 AM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_13_count_of_all_verbatim_and_decimal_lat_long
13044	04/02/2014 05:55 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_11_list_of_three_standard_political_divisions
13043	04/02/2014 05:36 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: *_of_species_binomials: switched back to the old queries that use the split-apart ranks instead of the concatenated taxon name. note that these will not work on all specimens datasources, but now that #6,7 were selected to use the concatenated taxon name, this isn't a problem.
13042	04/02/2014 05:21 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_species_binomials: renamed columns to species_binomial to reflect reverted query name
13041	04/02/2014 05:16 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _of_verbatim_species_excluding_author: renamed to _species_binomials for clarity
13040	04/02/2014 05:14 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _specimens_04_count_of_unique_verbatim_species_with_author, _specimens_05_list_of_unique_verbatim_species_with_author: switched back to original names because #6,7 now do the same thing as #4,5, so we should include the differing result set of #4,5 for datasources that provide it
13039	04/02/2014 05:01 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_10_count_number_of_records_by_institution
13038	04/02/2014 04:38 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: use taxon_name*_with_author everywhere instead of custom column names, for consistency
13037	04/02/2014 04:09 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _of_verbatim_subspecific_taxa_without_author, etc.: renamed to _with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
13036	04/02/2014 04:03 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_06_count_of_unique_verb_subsp_taxa_without_author, _specimens_07_list_of_verbatim_subspecific_taxa_without_author
13035	04/02/2014 03:54 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _verbatim_species_without_author, etc.: renamed to _with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
13034	04/02/2014 03:14 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: removed extra ; at ends of queries
13033	04/02/2014 03:13 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: use the concatenated taxon name instead of concatenating the ranks, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
13032	04/02/2014 03:05 PM	Aaron Marcuse-Kubitza	validation/aggregating/specimens/qualitative_validations_specimens.sql: use the concatenated taxon name instead of concatenating the ranks, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
13031	04/02/2014 11:17 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: disk space: added high-water mark of 1.8 TB @11:15:05
13030	04/02/2014 10:56 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: added steps to figure out which datasource tables were not successfully imported due to disk space errors
13029	04/02/2014 10:45 AM	Aaron Marcuse-Kubitza	fix: /README.TXT: Full database import: moved verification of exit statuses before verification of DB contents because there is no point in verifying the DB if the datasources didn't finish importing
13028	04/02/2014 09:01 AM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: disk space: documented that the entire disk again gets used long after the beginning of the import, when only a few datasources are running (i.e. it definitely seems to be a recent bug in Postgres, and not a latent problem)
13027	04/01/2014 05:40 PM	Aaron Marcuse-Kubitza	/README.TXT: Maintenance: added task to regularly re-run full-database import so that bugs in it don't pile up. it needs to be kept in working order so that it works when it's needed.
13026	04/01/2014 04:24 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: added steps to manually reimport the applicable datasources if there are errors due to exceeding available disk space
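
Revisions 13076-13079 describe bin/repl and lib/sql_gen.py map_expr() as matching "as whole-word text (like SQL identifier)". The sketch below shows one way such whole-word substitution might look. It is not the code in bin/repl or lib/sql_gen.py: the function name, the use of regex word boundaries, and the sample query and column names are all assumptions made for illustration.

```python
import re


def replace_whole_words(text, mapping):
    """Replace each key of `mapping` wherever it occurs as a whole word.

    Hypothetical helper for illustration only; the real tools may tokenize
    SQL differently.
    """
    if not mapping:
        return text
    # Try longer keys first so a key that is a prefix of another key
    # cannot shadow it in the alternation.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Hypothetical query: `genus` is renamed, but `genus_id` is left alone because
# the underscore is a word character, so there is no word boundary after `genus`.
query = "SELECT genus, genus_id FROM specimens WHERE genus IS NOT NULL"
result = replace_whole_words(query, {"genus": "taxonName"})
print(result)
```

A plain substring replace would also rewrite `genus_id`, which is the failure that whole-word matching avoids. This sketch does not skip string literals or quoted identifiers, so it is only a starting point.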
