Activity
From 03/17/2014 to 04/15/2014
04/15/2014
- 08:12 PM Revision 13142: bugfix: schemas/vegbien.sql: rm_output_queries(): need to account for the fact that util.truncated_prefixed_name_regexp() returns a whole-string regexp. this drops support for removing output queries with a particular group prefix, which we no longer use.
- 07:59 PM Revision 13141: bugfix: schemas/vegbien.sql: rm_output_queries(): need to include relations whose names were truncated, as well
- 07:14 PM Revision 13140: fix: schemas/vegbien.sql: public_validations schema comment: to remove a validations query so its columns can be changed: use rm_output_queries() rather than rm_query_view() because that also removes input queries
- 07:00 PM Revision 13139: bugfix: schemas/util.sql: is_castable(): need to pass NULL through, for proper NULL propagation
- 06:52 PM Revision 13138: fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: use new is_castable(), which is much more accurate than Brad's custom regexp for determining if something is numeric
- 06:29 PM Revision 13137: inputs/NY/validations.-.util.sql: added util.is_castable() wrapper
- 06:12 PM Revision 13136: schemas/util.sql: added is_castable()
- 06:10 PM Revision 13135: schemas/util.sql: added try_cast()
- 05:51 PM Revision 13134: schemas/util.sql: added util.cast(), which allows casting to an arbitrary type without eval()
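A minimal Python sketch of the semantics described in r13134 to r13139. It assumes that try_cast() yields NULL instead of raising when a value cannot be cast, and that is_castable() passes NULL through instead of answering true or false. The names mirror the SQL functions, but this is an illustration, not their implementation. Python's None stands in for SQL NULL, and a Python callable such as float stands in for the Postgres target type.

```python
from typing import Any, Callable, Optional

# Illustration only: None plays the role of SQL NULL, and to_type plays the
# role of the Postgres target type (an assumption, not util.cast()'s API).

def try_cast(value: Optional[str], to_type: Callable[[str], Any]) -> Any:
    """Cast value with to_type, returning None (NULL) instead of raising."""
    if value is None:
        return None
    try:
        return to_type(value)
    except (ValueError, TypeError):
        return None

def is_castable(value: Optional[str], to_type: Callable[[str], Any]) -> Optional[bool]:
    """True/False for non-NULL input; NULL in gives NULL out (NULL propagation)."""
    if value is None:
        return None  # propagate NULL instead of reporting "not castable"
    try:
        to_type(value)
        return True
    except (ValueError, TypeError):
        return False

print(is_castable("40.7128", float))  # True
print(is_castable("40°42'N", float))  # False
print(is_castable(None, float))       # None
```

Passing NULL through means that, in a SQL WHERE clause, a NULL coordinate is counted as neither castable nor non-castable, which is what a count of numeric lat/long values needs.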
04/14/2014
- 05:04 PM Revision 13133: bugfix: schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: DISTINCT: added coordsaccuracy_m
- 05:02 PM Revision 13132: bugfix: schemas/vegbien.sql: coordinates_unique: added coordsaccuracy_m
- 04:56 PM Revision 13131: fix: schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to DISTINCT the values that are being counted, because the coordinates_unique unique constraint includes other columns as well, so there may be multiple instances of each lat/long
- 04:51 PM Revision 13130: bugfix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to include both lat and long in the value to DISTINCT on
- 04:48 PM Revision 13129: fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to DISTINCT the values that are being counted, because they are merged by the coordinates_unique unique constraint in the import
- 04:24 PM Revision 13128: validation/aggregating/pipeline/aggregating_validations_pipeline.odg: diff tables: integrated row labels into table
- 04:04 PM Revision 13127: validation/aggregating/pipeline/aggregating_validations_pipeline.odg: diff tables: added line for different rows (vs. missing/extra)
- 03:58 PM Revision 13126: inputs/NY/run: `make inputs/NY/validate`: documented slow queries: _specimens_12_distinct_collector_name_collect_num_date_w_count
- 03:23 PM Revision 13125: inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented slow queries (_plots_06a_list_of_stems). these may need to have their query plans rechecked.
- 03:22 PM Revision 13124: inputs/NY/run, inputs/SALVIAS/run_: `make inputs/.../validate`: updated runtime (+2 min)
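A minimal Python sketch of why the lat/long count needs DISTINCT (r13129 to r13133). If the import merges rows that agree on the coordinates_unique columns, the input-side count has to count distinct tuples of those same columns, or it will overcount relative to the output. The column list and the sample rows below are assumptions made up for illustration; the real constraint may include other columns.

```python
from typing import Any, Iterable, Mapping, Sequence

# Hypothetical stand-in for the columns of the coordinates_unique constraint.
KEY_COLUMNS = ("verbatimlatitude", "verbatimlongitude",
               "decimallatitude", "decimallongitude", "coordsaccuracy_m")

def count_distinct_coordinates(rows: Iterable[Mapping[str, Any]],
                               key_columns: Sequence[str] = KEY_COLUMNS) -> int:
    """Count rows as the import would store them: one per distinct key tuple."""
    return len({tuple(row.get(col) for col in key_columns) for row in rows})

# Made-up rows: two are identical on every key column, one differs only in accuracy.
specimens = [
    {"decimallatitude": 40.86, "decimallongitude": -73.88, "coordsaccuracy_m": 10},
    {"decimallatitude": 40.86, "decimallongitude": -73.88, "coordsaccuracy_m": 10},
    {"decimallatitude": 40.86, "decimallongitude": -73.88, "coordsaccuracy_m": 100},
]
print(len(specimens), count_distinct_coordinates(specimens))  # 3 2
```

Leaving coordsaccuracy_m out of the key would collapse all three rows into one, which is why r13132 and r13133 add it to both the constraint and the DISTINCT.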
04/11/2014
- 04:02 PM Task #887 (Rejected): fix disk space leak that fills the disk and crashes the import
- _the bug that triggers this Postgres bug (#902) has now been fixed, so no need to fix this_
h3. issue
* in the ...
04/10/2014
- 04:06 PM Revision 13123: fix: inputs/NY/validations.sql: _specimens_*_of_unique_verbatim_author_taxa_with_genus: use scientificName rather than the concatenated ranks, because that is what is imported to taxonlabel.taxonomicname
- 03:52 PM Revision 13122: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to inputs/NY/validations.sql
- 03:50 PM Revision 13121: validation/aggregating/specimens/qualitative_validations_specimens.sql: updated to DB
- 03:41 PM Revision 13120: fix: schemas/vegbien.sql: _specimens_*_of_unique_verb_subsp_taxa_with_author: include only names with subspecies (filtering by taxonverbatim.subspecies rather than taxonlabel.taxonomicname)
- 03:13 PM Revision 13119: bugfix: /README.TXT: Full database import: to import just a subset of the datasources: array env var needs to be set *after* opening the `screen` shell because array vars are apparently *not* inherited by the `screen` shell
- 02:42 PM Revision 13118: /README.TXT: Full database import: to import just a subset of the datasources: added step to set custom import name
- 02:41 PM Revision 13117: /README.TXT: Full database import: added instructions for importing just a subset of the datasources
- 02:38 PM Revision 13116: bugfix: lib/sh/util.sh: local_array/export_array: *do* need -a because the fact that it's an array is apparently *not* autodetected from the () on Mac
- 02:24 PM Revision 13115: mappings/VegCore-VegBIEN.csv: mapped subspecies to new taxonverbatim.subspecies for easier access by validations queries
- 02:05 PM Revision 13114: bugfix: web/.phpPgAdmin/.htaccess: work around phpPgAdmin bug that causes page to be ignored when not logged in
- 01:25 PM Revision 13113: fix: inputs/test_taxonomic_names/Taxon/map.csv: scientificName: remapped to scientificName instead of taxonName as this does include the author for some names
- 01:25 PM Revision 13112: fix: inputs/NY/Ecatalog_all/map.csv: ScientificName: remapped to scientificName instead of taxonName as this does include the author
- 01:17 PM Revision 13111: fix: inputs/NY/validations.sql: _specimens_*_of_unique_verb_subsp_taxa_with_author: use taxonName instead of concatenating the ranks, as that corresponds to what we use as the concatenated taxonomic name
- 12:59 PM Revision 13110: bugfix: inputs/NY/validations.sql: _specimens_*_of_verbatim_subspecific_taxa_with_author: need `subspecies IS NOT NULL` filter
- 12:57 PM Revision 13109: bugfix: inputs/NY/validations.sql: _specimens_07_list_of_verbatim_subspecific_taxa_with_author: need to include subspecies (as _specimens_06_count_of_unique_verb_subsp_taxa_with_author does)
- 12:35 PM Revision 13108: web/.phpPgAdmin/.htaccess: extract path components 1st->last: documented that can't use subject param for this because that goes to the last selected tab, not the default (leftmost) tab
- 12:03 PM Revision 13107: bugfix: inputs/NY/validations.sql: _specimens_*_of_species_binomials: removed incorrect `subspecies IS NOT NULL` filter (this should be on *_of_unique_verb_subsp_taxa_with_author instead)
- 11:41 AM Revision 13106: schemas/vegbien.sql: taxonverbatim: added subspecies, as decided in the conference call (wiki.vegpath.org/2014-04-10_conference_call#VegBIEN-schema-2)
- 06:54 AM Revision 13105: fix: schemas/vegbien.sql: _plots_* with duplicated rows: removed duplicated rows
- 06:45 AM Revision 13104: schemas/vegbien.sql: _specimens_*: ran through pipeline
- 06:38 AM Revision 13103: removed old version validation/aggregating/plots/SALVIAS/bien3_validations_salvias_db_original.sql. use validation/aggregating/plots/SALVIAS/_archive/bien3_validations_salvias_db_original.sql instead.
- 06:19 AM Revision 13102: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to inputs/NY/validations.sql
- 06:17 AM Revision 13101: validation/aggregating/specimens/qualitative_validations_specimens.sql: updated to DB
- 06:07 AM Revision 13100: schemas/vegbien.sql: _specimens_16_list_distinct_specimen_descriptions: re-ran through pipeline after removing duplicated rows
- 06:02 AM Revision 13099: schemas/vegbien.sql: rm_output_queries(): also support removing just a particular output query
- 05:26 AM Revision 13098: bugfix: schemas/util.sql: remake_diff_table(): need to rm_freq() type_table, because left/right_table don't have freq yet
- 05:18 AM Revision 13097: schemas/util.sql: auto_rm_freq(): use new rm_freq()
- 05:17 AM Revision 13096: schemas/util.sql: added rm_freq(regclass[])
- 03:45 AM Revision 13095: fix: inputs/NY/validations.sql: _specimens_16_list_distinct_specimen_descriptions: removed duplicated rows using DISTINCT
- 03:33 AM Revision 13094: schemas/vegbien.sql: _specimens_11_list_of_three_standard_political_divisions: ran through pipeline
- 03:31 AM Revision 13093: fix: schemas/vegbien.sql: _specimens_11_list_of_three_standard_political_divisions: use same column names as input query
- 03:24 AM Task #345 (Resolved): integrate GNRS into VegBIEN
- see "biengeo":http://vegbiendev.nceas.ucsb.edu/fs/derived/biengeo/
- 03:21 AM Task #326 (Rejected): generic MOU template to request data
- making the database public instead
- 03:19 AM Task #485: track data provider's citation requirements in VegBIEN
- the [[Datasource conditions of use|conditions of use]] have been gathered
- 03:10 AM Revision 13092: schemas/util.sql: remake_diff_table(): result table comment: documented how to display NULL values that are extra or missing
- 02:40 AM Revision 13091: schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: ran through pipeline
- 02:38 AM Revision 13090: fix: schemas/vegbien.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: also need to convert to text in GROUP BY/ORDER BY
- 02:34 AM Revision 13089: bugfix: inputs/NY/validations.sql: _specimens_03_list_of_verbatim_families: use family as specified in query description, not as implemented
- 02:32 AM Revision 13088: _license/UCSB/LICENSE.TXT: use (c) verbatim from the e-mail, not as displayed as © by Thunderbird
- 02:07 AM Revision 13087: bugfix: schemas/vegbien.sql, inputs/NY/validations.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: cast this to text rather than date because some values for this field are not valid dates and will throw an error if cast to date
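The diff tables (remake_diff_table(), r13092, r13098) compare the *results* of an input query with those of the corresponding output query. A minimal Python sketch of that comparison, treating each result set as a multiset of rows so that duplicated rows are not hidden. It is not util.remake_diff_table(), and the taxon names are made-up sample data. Note that here None compares equal to None, as with SQL's IS NOT DISTINCT FROM, not as with plain =.

```python
from collections import Counter
from typing import Any, Iterable, List, Tuple

Row = Tuple[Any, ...]

def diff_results(input_rows: Iterable[Row],
                 output_rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (missing, extra): rows only in the input, rows only in the output.

    Counter subtraction keeps duplicate counts, so a row imported twice
    shows up as extra even though it also appears in the input.
    """
    inp, out = Counter(input_rows), Counter(output_rows)
    missing = sorted((inp - out).elements(), key=repr)
    extra = sorted((out - inp).elements(), key=repr)
    return missing, extra

input_rows = [("Quercus alba",), ("Acer rubrum",), (None,)]
output_rows = [("Quercus alba",), ("Quercus alba",), (None,)]
missing, extra = diff_results(input_rows, output_rows)
print(missing)  # [('Acer rubrum',)]
print(extra)    # [('Quercus alba',)]
```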
04/09/2014
- 08:19 PM Revision 13086: fix: inputs/NY/validations.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: matched type to output query
- 06:23 PM Revision 13085: validation/aggregating/pipeline/aggregating_validations_pipeline.odg: show that the staging table(s) are denormalized before running the input queries on them. clarified that what is compared are the input and output query *results*, not the queries themselves.
- 02:55 PM Revision 13084: schemas/vegbien.sql: _specimens_10_count_number_of_records_by_institution: ran through pipeline
- 02:48 PM Revision 13083: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed `public.` prefix to avoid cluttering up the SQL
- 02:46 PM Revision 13082: bugfix: schemas/vegbien.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_10_count_number_of_records_by_institution: need to dereference specimenreplicate.duplicate_institutions_sourcelist_id to the corresponding sourcelist.name
- 02:40 PM Revision 13081: schemas/vegbien.sql: public_validations._specimens_*: added comments from validation/aggregating/specimens/qualitative_validations_specimens.sql
- 02:25 PM Revision 13080: validation/aggregating/specimens/qualitative_validations_specimens.sql: synced to schemas/vegbien.sql so that it can be diffed with it to sync qualitative_validations_specimens.sql to the DB
- 02:55 AM Revision 13079: lib/sql_gen.py: map_expr(): documented that unlike bin/repl SQL identifier handling, this does simplify the resulting expression
- 02:54 AM Revision 13078: lib/sql_gen.py: map_expr(): documented that this is a special case of bin/repl SQL identifier handling which does not handle entire source files
- 02:52 AM Revision 13077: bin/repl: match as whole-word text (like SQL identifier): documented that this is a generalization of lib/sql_gen.py map_expr() to work on entire source files
- 02:50 AM Revision 13076: bin/repl, lib/sql_gen.py Expression transforming: documented that this can also be done in Postgres with expression substitution (wiki.vegpath.org/Postgres_queries#expression-substitution)
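r13076 to r13079 describe replacing SQL identifiers as whole words, as map_expr() in lib/sql_gen.py does for single expressions and bin/repl does for entire source files. A minimal Python sketch of only the whole-word part, assuming identifiers are runs of letters, digits, and underscores. replace_identifiers() is a hypothetical name; it does not simplify the resulting expression and ignores quoted identifiers and string literals, so it is not a substitute for either tool.

```python
import re
from typing import Mapping

def replace_identifiers(sql: str, mapping: Mapping[str, str]) -> str:
    """Replace each whole-word identifier found in mapping; leave others alone.

    The \\b anchors mean "genus" does not match inside "genus_id",
    because "_" counts as a word character.
    """
    if not mapping:
        return sql
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], sql)

print(replace_identifiers(
    "SELECT family, genus FROM specimens WHERE genus_id > 0",
    {"genus": "taxon_genus"},
))  # SELECT family, taxon_genus FROM specimens WHERE genus_id > 0
```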
04/08/2014
- 03:49 PM Revision 13075: fix: inputs/U/Specimen/map.csv: Genus: remapped to taxonName because this field is actually mislabeled in the original column names
- 02:55 PM Revision 13074: validation/aggregating/pipeline/validations_on_sparse_datasources.odg: not applicable "✓": increased font size so the size of the character matches the surrounding text
- 02:52 PM Revision 13073: validation/aggregating/pipeline/validations_on_sparse_datasources.odg: removed = lines for each input query, because they clutter up the diagram and the "same, so don't need to rewrite" message now shows this as well
- 02:50 PM Revision 13072: validation/aggregating/pipeline/validations_on_sparse_datasources.odg: added the denormalized VegCore schema approach for comparison, as requested by Mark
- 01:52 PM Revision 13071: schemas/vegbien.sql: remake_diff_tables(schema text): removed bien2_traits runtime because this applies only to one datasource. the bien2_traits runtime is now documented in inputs/bien2_traits/run.
- 01:40 PM Revision 13070: inputs/NY/run: `make inputs/NY/validate`: updated runtime (6.5 min). this increases as more queries are able to run successfully.
- 01:38 PM Revision 13069: schemas/vegbien.sql: public_validations: schema comment: documented how to run the validations. this information is also in the usage comment for public_validations.remake_diff_table(), but is copied here for easy reference.
- 01:19 PM Revision 13068: inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented runtime (5 min)
- 12:49 PM Revision 13067: inputs/bien2_traits/run: documented `make inputs/bien2_traits/validate` runtime (9 min)
04/07/2014
- 06:21 PM Revision 13066: schemas/vegbien.sql: public_validations: specimens queries: added autogenerated ~type tables
- 06:19 PM Revision 13065: inputs/NY/run: `make inputs/NY/validate`: updated runtime (5 min)
- 06:09 PM Revision 13064: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed DDL statements, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#remove-DDL-statements
- 06:07 PM Revision 13063: schemas/vegbien.sql: public_validations: added specimens queries to pipeline
- 05:51 PM Revision 13062: validation/aggregating/specimens/qualitative_validations_specimens.sql: parameterize queries by datasource
- 05:35 PM Revision 13061: validation/aggregating/**.sql output queries: use `SET join_collapse_limit = 1;` to match public_validations.rematerialize_out_view()
- 05:17 PM Revision 13060: fix: schemas/vegbien.sql: public_validations.rematerialize_out_view(text, regclass): run with join_collapse_limit = 1 to fix query planner issues. this option has been tested on the queries that do not yet use the standard join sequence (plots #11,12,13,14,16,17,18), and all of these queries also work fine with join_collapse_limit = 1. (the standard join sequence is used to ensure *both* correctness of the query and compatibility with join_collapse_limit = 1, but in some cases is not needed for join_collapse_limit.)
- 04:35 PM Revision 13059: validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: turn off join_collapse_limit instead of enable_mergejoin/enable_hashjoin, because join_collapse_limit is something that we will eventually want to turn off for all queries, which would avoid this query needing special handling. (on the other hand, enable_mergejoin/enable_hashjoin may be necessary for some queries and we probably won't turn them off for all queries.)
- 01:43 PM Revision 13058: bugfix: lib/runscripts/table.run: table_make_install(): need to ignore skip_table() errexit
- 12:13 PM Task #886 (New): move test DB to vegbiendev VM
- * avoids needing to maintain a separate testing machine for the purposes of using the test DB
* helps remove depende...
- 10:39 AM Revision 13057: lib/sh/util.sh: import_vars: documented that vars already set will *not* be overwritten
- 09:47 AM Revision 13056: inputs/NY/run: documented `make inputs/NY/validate` runtime (2 min, currently for the input queries)
04/04/2014
04/03/2014
- 07:31 PM Revision 13054: added validation/aggregating/pipeline/validations_on_sparse_datasources.odg
- 04:13 PM Revision 13053: planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx
- 04:09 PM Revision 13052: planning/workflow/bien3_architecture.pptx: stage I: made all datasources the same height so that the denormalized VegCore schema boxes would all look exactly the same. widened the denormalized VegCore schema boxes to make it visually clear that they have more columns than the staging tables denormalized together
- 03:40 PM Revision 13051: planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx
- 03:39 PM Revision 13050: planning/workflow/bien3_architecture.pptx: updated to reflect decisions made in the 2014-04-03 conference call (wiki.vegpath.org/2014-04-03_conference_call#import-process-2)
- 08:53 AM Revision 13049: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_14_count_of_all_invalid_verbatim_lat_long
- 08:35 AM Revision 13048: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_12_distinct_collector_name_collect_num_date_w_count
- 08:04 AM Revision 13047: validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: fixed whitespace
- 07:32 AM Revision 13046: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed trailing whitespace
- 07:31 AM Revision 13045: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_13_count_of_all_verbatim_and_decimal_lat_long
04/02/2014
- 05:55 PM Revision 13044: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_11_list_of_three_standard_political_divisions
- 05:36 PM Revision 13043: validation/aggregating/specimens/qualitative_validations_specimens.sql: *_of_species_binomials: switched back to the old queries that use the split-apart ranks instead of the concatenated taxon name. note that these will not work on all specimens datasources, but now that #6,7 were selected to use the concatenated taxon name, this isn't a problem.
- 05:21 PM Revision 13042: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_species_binomials: renamed columns to species_binomial to reflect reverted query name
- 05:16 PM Revision 13041: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_verbatim_species_excluding_author: renamed to *_species_binomials for clarity
- 05:14 PM Revision 13040: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _specimens_04_count_of_unique_verbatim_species_with_author, _specimens_05_list_of_unique_verbatim_species_with_author: switched back to original names because #6,7 now do the same thing as #4,5, so we should include the differing result set of #4,5 for datasources that provide it
- 05:01 PM Revision 13039: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_10_count_number_of_records_by_institution
- 04:38 PM Revision 13038: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: use taxon_name*_with_author everywhere instead of custom column names, for consistency
- 04:09 PM Revision 13037: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_verbatim_subspecific_taxa_without_author, etc.: renamed to *_with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
- 04:03 PM Revision 13036: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_06_count_of_unique_verb_subsp_taxa_without_author, _specimens_07_list_of_verbatim_subspecific_taxa_without_author
- 03:54 PM Revision 13035: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_verbatim_species_without_author, etc.: renamed to *_with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
- 03:32 PM Task #884 (Rejected): fix Postgres bug that causes query planner to use seq scans and slow sorts instead of index scans in the import
- h3. issue
* see the following @pg_stat_activity@ snapshots (note the @EXPLAIN@ output below each query):...
- 03:14 PM Revision 13034: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed extra ; at ends of queries
- 03:13 PM Revision 13033: validation/aggregating/specimens/qualitative_validations_specimens.sql: use the concatenated taxon name instead of concatenating the ranks, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
- 03:05 PM Revision 13032: validation/aggregating/specimens/qualitative_validations_specimens.sql: use the concatenated taxon name instead of concatenating the ranks, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
- 11:17 AM Revision 13031: /README.TXT: Full database import: disk space: added high-water mark of 1.8 TB @11:15:05
- 10:56 AM Revision 13030: /README.TXT: Full database import: added steps to figure out which datasource tables were not successfully imported due to disk space errors
- 10:45 AM Revision 13029: fix: /README.TXT: Full database import: moved verification of exit statuses before verification of DB contents because there is no point in verifying the DB if the datasources didn't finish importing
- 10:10 AM Task #882 (Rejected): add limit on the # of parallel import processes
- it turns out this would not fix the problem, because it occurs even when only a few datasources are running
- 10:07 AM Task #883: have import scripts regularly check disk space and pause processes if getting close to limit
- merging info in #882, so that this info is not maintained in two places
- 09:01 AM Revision 13028: /README.TXT: Full database import: disk space: documented that the entire disk again gets used long after the beginning of the import, when only a few datasources are running (i.e. it definitely seems to be a recent bug in Postgres, and not a latent problem)
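Task #883 proposes having the import scripts check disk space regularly and pause before the hard limit is reached (the task is marked Rejected). A minimal Python sketch of such a check, assuming the 1 TB minimum-free-space recommendation from r13022 as the threshold (taken here as a decimal terabyte). The function names and the polling policy are hypothetical.

```python
import shutil
import time
from typing import Optional

ONE_TB = 1000 ** 4  # assumed threshold, matching the 1 TB minimum in r13022

def enough_free_space(path: str = "/", min_free_bytes: int = ONE_TB) -> bool:
    """Return True if the filesystem containing path has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes

def wait_for_free_space(path: str = "/", min_free_bytes: int = ONE_TB,
                        poll_seconds: float = 60.0,
                        max_polls: Optional[int] = None) -> bool:
    """Hypothetical "pause" policy: block until enough space is free.

    Returns True once space is available, or False after max_polls failed checks.
    """
    polls = 0
    while not enough_free_space(path, min_free_bytes):
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        time.sleep(poll_seconds)
    return True
```

Because Postgres has no soft limit on disk space (see #883), a check like this would have to run outside the database, between import steps.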
04/01/2014
- 05:40 PM Revision 13027: /README.TXT: Maintenance: added task to regularly re-run full-database import so that bugs in it don't pile up. it needs to be kept in working order so that it works when it's needed.
- 05:02 PM Task #883 (Rejected): have import scripts regularly check disk space and pause processes if getting close to limit
- h3. issue
* there is no soft limit on disk space inside Postgres, so the hard limit gets reached instead, causing ...
- 04:24 PM Revision 13026: /README.TXT: Full database import: added steps to manually reimport the applicable datasources if there are errors due to exceeding available disk space
- 04:13 PM Revision 13025: /README.TXT: Full database import: removed extra `ssh -t vegbiendev.nceas.ucsb.edu` before "upload logs", because the previous steps also occur on vegbiendev
- 04:11 PM Task #882 (Rejected): add limit on the # of parallel import processes
- see description of problem in #883
- 04:04 PM Revision 13024: /README.TXT: Notes on system stability: added recommendation to maintain a snapshot copy of the VM as it was at the last successful import, for fallback use if a system upgrade breaks anything. system upgrades on the snapshot VM should be disabled completely, and because this will also disable security fixes, the snapshot VM should be disconnected from the internet and all networking interfaces. (this is an unfortunate consequence of modern OSes being written in non-memory-safe languages such as C and C++.)
- 03:43 PM Revision 13023: /README.TXT: Full database import: disk space: documented that a higher high-water mark actually occurs later in the import, so that the disk usage issue actually remains a problem after the very beginning
- 03:37 PM Revision 13022: fix: /README.TXT: Full database import: disk space: increased the minimum free space recommendation to 1 TB, because analysis of the disk usage during the beginning of the import shows that actually close to the entire amount is being used. however, this problem is normally undetectable unless the disk space is specifically checked, because it only manifests itself if the available disk space is exceeded completely.
- 02:04 PM Revision 13021: /README.TXT: Full database import: documented that the beginning of the import should be scheduled at a time when the DB will not be needed for other uses, because vegbiendev will be slow for the first few hours of the import due to the import using all the available cores
- 01:36 PM Revision 13020: /README.TXT: Full database import: documented that CPU load warning e-mails can safely be ignored. they happen because the parallel imports use all the available cores.
- 01:31 PM Revision 13019: fix: lib/common.Makefile: $(nice): use an increment of +10 instead of +5 because +5 still leaves the shell sluggish
- 01:29 PM Revision 13018: lib/common.Makefile: added $(nice) and use it everywhere its definition is used
- 01:14 PM Revision 13017: /README.TXT: Full database import: exiting `screen`: clarify that you must use `exit`, as Ctrl+D gets disabled to prevent accidental exits
- 12:47 PM Revision 13016: /README.TXT: Full database import: added step to restart Postgres to free up any disk space used by temp tables from the last import (this is apparently not automatically reclaimed)
- 12:45 PM Revision 13015: /Makefile: postgres_restart-Linux: documented that the manual running of the command is needed because for some reason, pg_ctl does not work when run inside make
- 12:43 PM Revision 13014: fix: /Makefile: postgres_restart-Linux: added pause after telling the user the command to run
- 12:42 PM Revision 13013: /Makefile: $(postgresReload-*): use postgres_restart for the postgres-restarting step
- 12:30 PM Revision 13012: bugfix: /Makefile: postgres_restart: added separate Linux version that deals with Linux-specific issues (as in $(postgresReload-Linux))
- 12:15 PM Revision 13011: /Makefile: added postgres_restart, since this is often invoked separately from the entire postgres_reload target
- 11:40 AM Revision 13010: /README.TXT: Full database import: disk space: increased minimum requirement to 500GB (~200GB extra), as the import may use significant additional space for temp tables
- 11:37 AM Revision 13009: /README.TXT: Full database import: documented that env vars set before invoking `screen` will be inherited by it, so these steps will work even if they come before `screen`
- 11:26 AM Revision 13008: backups/TNRS.backup.md5: updated
- 11:23 AM Revision 13007: /README.TXT: Full database import: added steps to set a custom version, if the auto-assigned one would cause a collision with the last import
- 11:08 AM Revision 13006: /README.TXT: Full database import: `unset version`: documented that this is needed because it may have been set in the outer shell
03/30/2014
- 07:54 PM Revision 13005: fix: lib/sql_io.py: put_table(): don't warn if can't create pkey, because this just indicates that a set-returning function was used. this should get rid of the last of the confusing benign warnings in the test output.
- 07:53 PM Revision 13004: fix: lib/sql.py: flatten(): don't warn if can't create pkey, because this just indicates that a set-returning function was used
- 07:52 PM Revision 13003: lib/sql.py: run_query_into(): added add_pkey_warn param to support turning off "could not create unique index" warnings, which are sometimes benign (e.g. when using set-returning functions with column-based import)
- 06:52 PM Revision 13002: /README.TXT: Full database import: disk space: updated schema size (315GB)
- 06:45 PM Revision 13001: /README.TXT: Full database import: removed `up` on jupiter because this is done as part of "do steps under Maintenance > "to synchronize vegbiendev, ..."
- 06:44 PM Revision 13000: /README.TXT: Full database import: moved "do steps under Maintenance > "to synchronize vegbiendev, ..." outside of "On local machine" because these steps don't only take place on the local machine
- 06:41 PM Revision 12999: /README.TXT: use `up` instead of `svn up --force` for consistency
- 06:40 PM Revision 12998: fix: /README.TXT: always use `up` instead of `svn up` since this includes --force
- 06:39 PM Revision 12997: /README.TXT: Full database import: removed unneeded `ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk` at beginning since this is performed again the first time it's needed
- 06:38 PM Revision 12996: fix: /README.TXT: Full database import: removed erroneous line that resulted from a search-and-replace of connection commands in r12396. (it used to read "Follow the steps under Connecting to vegbiendev above, using jupiter instead". this step is now performed on the line below it.)
- 06:31 PM Revision 12995: bin/make_analytical_db: removed remake_diff_tables() because this is now done for each datasource in inputs/input.Makefile
- 06:28 PM Revision 12994: bugfix: schemas/vegbien.sql: schemas/vegbien.sql(): need to util.use_schema(schema_anchor) *before* initializing vars that use own-schema functions
- 06:12 PM Revision 12993: inputs/input.Makefile: validate: redirect the output to the log, as for other import-related operations
- 06:08 PM Revision 12992: inputs/input.Makefile: import: validate at the end of the import
- 06:02 PM Revision 12991: inputs/input.Makefile: added new-style aggregating validations (`validate` target)
- 06:02 PM Revision 12990: bin/make_analytical_db: removed no longer needed "${public}_validations" schema qualifier, now that it is in the search_path
- 06:00 PM Revision 12989: fix: bin/vegbien_dest: added public_validations
- 05:41 PM Revision 12988: added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt, which is useful to see what fields will be available when we switch to the new GBIF export format
- 05:39 PM Revision 12987: lib/sh/util.sh: removed end_try_subshell, which now does the same thing as end_try
- 05:38 PM Revision 12986: fix: lib/sh/archives.sh: unzip(): support -p option, which pipes extracted data to stdout
- 05:11 PM Revision 12985: added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt.run
- 05:11 PM Revision 12984: added lib/runscripts/extract_header.run
- 05:09 PM Revision 12983: fix: lib/sh/make.sh: direct the user to use begin_target instead of set_make_vars (set_make_vars is now used by begin_target)
- 05:06 PM Revision 12982: fix: lib/runscripts/util.run: to_top_file(): handle $_remake properly, without requiring deferred_check_target_exists to set to_file()'s flags
- 05:03 PM Revision 12981: bugfix: lib/sh/util.sh: die(): usage: documented that if msg uses $(...), save_e is needed
- 04:59 PM Revision 12980: bugfix: lib/sh/util.sh: already_exists_msg(): need to save_e, because new $(mk_hint) call resets $?
- 04:55 PM Revision 12979: lib/sh/util.sh: die(): always errexit even if $e = 0, because die always indicates an error
- 04:53 PM Revision 12978: lib/sh/util.sh: added rethrow!(), which always errexits, even if $e = 0
- 04:53 PM Revision 12977: lib/sh/util.sh: rethrow(): also work in situations where $e is not set
- 04:50 PM Revision 12976: lib/sh/util.sh: rethrow: made it a function since there is now no need for it to be an alias
- 04:47 PM Revision 12975: lib/sh/util.sh: rethrow: removed `test "$e" != 0` since errexit only does anything if $e != 0
- 04:45 PM Revision 12974: lib/sh/util.sh: removed separate rethrow_exit*, rethrow_subshell*, since they now do the same thing as rethrow*
- 04:42 PM Revision 12973: lib/sh/util.sh: rethrow*!: use new errexit, which works in functions *and* subshells
- 04:38 PM Revision 12972: lib/sh/util.sh: added errexit(), used in place of (exit "$1") because a bug in bash prevents subshells from triggering errexit
- 04:18 PM Revision 12971: lib/sh/util.sh: added bool!()
- 03:08 PM Revision 12970: fix: lib/sh/util.sh: redir(): need to indent before invoking an external command (not just in command__exec(), but for all redir() calls)
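Revisions 12980 and 12981 rest on a general bash rule: an assignment whose value contains a command substitution sets `$?` to the substitution's exit status, so an earlier failure status is lost unless it was saved first. The sketch below is a minimal illustration. It is not the project's save_e, whose code this log does not show, and `failing_cmd`, `e`, `msg`, and `after` are hypothetical names:

```bash
#!/usr/bin/env bash
# Illustrative only: why an exit status must be saved before any $(...) call.
# failing_cmd, e, msg, and after are hypothetical names.

failing_cmd() { return 3; }

if failing_cmd; then
    e=0
else
    e=$?                                   # saved right away: 3
    msg="hint: $(echo 'check the input')"  # sets $? to the substitution's status
    after=$?                               # 0, so the original 3 is gone from $?
fi

echo "saved e=$e, \$? after \$(...)=$after"
# prints: saved e=3, $? after $(...)=0
```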
03/29/2014
- 04:10 AM Revision 12969: lib/sh/make.sh: with_rm(): documented that it only works inside a runscript target that starts w/ begin_target
- 04:06 AM Revision 12968: *{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly propagated to applicable invoked targets. (previously, $rm was propagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)
- 03:58 AM Revision 12967: lib/sh/make.sh: self_make(): renamed to with_rm() for clarity, since this is used only to propagate $rm, and does not also invoke a command with the same name as the current function, as the name might suggest
03/28/2014
- 07:17 AM Revision 12966: schemas/vegbien.sql: updated _specimens_01_count_of_total_records_specimens_in_source_db
- 07:10 AM Revision 12965: validation/aggregating/specimens/qualitative_validations_specimens.sql: use taxonoccurrence instead of location as the table that all specimens should have, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
- 07:03 AM Revision 12964: lib/runscripts/util.run: support conventional main() method as well as `all` target
- 03:03 AM Task #562 (New): flatten the mappings
- normalized VegCore's @traceable.id_by_source@ now provides an alternate pkey that can be used for duplicate-merging, ...
- 02:55 AM Task #539 (Rejected): get analytical_stem_view to use merge joins instead of hash joins
- the query planner is likely right that hash joins are faster when joining entire tables rather than just the first fe...
- 02:53 AM Task #440: aggregating validations of imports
- see [[Aggregating validations]]
- 02:52 AM Task #290 (Resolved): benchmark tests for database loading
- this is now the [[Aggregating validations]]
- 02:39 AM Revision 12963: fix: inputs/*/*/map.csv: remapped occurrenceID-mapped fields to dataProviderRecordID when these were not globally unique DwC occurrenceIDs (http://rs.tdwg.org/dwc/terms/#occurrenceID)
- 02:34 AM Revision 12962: fix: inputs/CTFS/AggregateObservation/map.csv: field mapped to occurrenceID: remapped to aggregateOrganismObservationID because these are not specimen occurrences
- 02:32 AM Revision 12961: fix: mappings/VegCore-VegBIEN.csv: taxonoccurrence.sourceaccessioncode: need to populate from aggregateOrganismObservationID when only that is available
- 02:03 AM Revision 12960: bugfix: inputs/NY/Ecatalog_all/map.csv: can't use CatalogNumber as pkey because it's not unique and not always populated. this fixes the NY NULL accessionNumbers bug (wiki.vegpath.org/Aggregating_validations_status#bugs).
- 01:31 AM Revision 12959: /README.TXT: moved "to back up e-mails" and "to back up the version history" before settings backup so that the local backup of these is up to date when everything gets backed up
- 01:29 AM Revision 12958: inputs/XAL/Specimen/header.csv: updated
- 12:45 AM Revision 12957: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: backups/TNRS.backup: do this before the general sync so that any reverse sync that's needed won't include it
- 12:44 AM Revision 12956: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: backups/TNRS.backup: use bin/sync_upload now that this works for rsync-ignored files
- 12:36 AM Revision 12955: bugfix: lib/sh/sync.sh: don't unintentionally rsync-ignore explicitly-specified files
- 12:32 AM Revision 12954: lib/sh/util.sh: filesystem: added is_*(), could_be_*()
- 12:31 AM Revision 12953: lib/sh/util.sh: added contains_match()
- 12:31 AM Revision 12952: lib/sh/util.sh: added ends_with()
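Revisions 12952 to 12954 only name the new helpers; their actual signatures are not shown here. As a rough sketch, assuming a suffix-first argument order for ends_with() and a regex-then-items form for contains_match() (both guesses, so the functions below carry a hypothetical `_sketch` suffix), such predicates could look like:

```bash
#!/usr/bin/env bash
# Hypothetical sketches; the real lib/sh/util.sh helpers may differ.

# ends_with_sketch SUFFIX STR: succeed if STR ends with SUFFIX.
# Quoting "$1" makes SUFFIX match literally rather than as a glob.
ends_with_sketch() { [[ "$2" == *"$1" ]]; }

# contains_match_sketch REGEX ITEM...: succeed if any ITEM matches REGEX.
contains_match_sketch() {
    local regex="$1" item
    shift
    for item in "$@"; do
        if [[ "$item" =~ $regex ]]; then
            return 0
        fi
    done
    return 1
}

ends_with_sketch .backup TNRS.backup && echo "TNRS.backup ends with .backup"
contains_match_sketch '^\.svn$' foo .svn bar && echo "found .svn"
```

Quoting the suffix keeps a caller-supplied `*` from acting as a wildcard, which matters when the argument comes from a file list.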
03/27/2014
- 11:13 PM Revision 12951: fix: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: run `up` on all machines, not just jupiter, because all must be up-to-date to avoid extraneous diffs
- 11:11 PM Revision 12950: bugfix: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: `svn up` on jupiter: need to use up alias because that adds --force
- 11:10 PM Revision 12949: bugfix: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: added `svn up` on jupiter: needs to be in main dir (~/bien), not ~/Dropbox/svn/
- 11:08 PM Revision 12948: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: added `svn up` on jupiter to avoid extraneous diffs when rsyncing
- 10:41 AM Revision 12947: planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx
- 10:32 AM Revision 12946: planning/workflow/bien3_architecture.pptx: stage I: clarified that the database input is intended to be a *normalized* input, and its corresponding output is intended to be *denormalized*
- 10:29 AM Revision 12945: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: stage I: clarified that the database input is intended to be a *normalized* input, and its corresponding output is intended to be *denormalized*
- 09:02 AM Revision 12944: bugfix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: _specimens_16_list_distinct_specimen_descriptions: should use DISTINCT
- 09:01 AM Revision 12943: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_16_list_distinct_specimen_descriptions
- 09:00 AM Revision 12942: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_16_list_distinct_specimen_descriptions
- 08:53 AM Revision 12941: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_15_list_distinct_locality_descriptions
- 08:48 AM Revision 12940: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_09_list_of_unique_verbatim_author_taxa_with_genus
- 08:47 AM Revision 12939: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_08_count_of_unique_verbatim_author_taxa_with_genus
- 08:36 AM Revision 12938: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_05_list_of_verbatim_species_excluding_author
- 08:35 AM Revision 12937: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_04_count_of_unique_verbatim_species_without_author
- 08:23 AM Revision 12936: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_03_list_of_verbatim_families
- 08:18 AM Revision 12935: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_02_count_of_unique_verbatim_families
- 08:06 AM Revision 12934: schemas/vegbien.ERD.mwb: regenerated exports
- 08:04 AM Revision 12933: schemas/vegbien.sql: public_validations: added _specimens_01_count_of_total_records_specimens_in_source_db
- 07:35 AM Revision 12932: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_01_count_of_total_records_specimens_in_source_db
- 07:34 AM Revision 12931: validation/aggregating/specimens/qualitative_validations_specimens.sql: added config statements for datasource and query planner
- 05:06 AM Revision 12930: web/links/index.htm: updated to Firefox bookmarks: Firefox: added instructions for enabling security.password_lifetime and making all tabs load when the browser is opened
- 04:43 AM Revision 12929: /README.TXT: Schema changes: manually apply schema changes to the live public schema: moved under "update mappings and staging table column names" because this is a necessary part of that step
- 04:43 AM Revision 12928: /README.TXT: Schema changes: manually apply schema changes to the live public schema: moved under "update mappings and staging table column names" because this is a necessary part of that step
- 04:40 AM Revision 12927: /README.TXT: Schema changes: changed "update staging table column names" to "update mappings and staging table column names"
- 04:13 AM Revision 12926: fix: validation/aggregating/specimens/qualitative_validations_specimens.sql: use pg_dump's formatting for COMMENT ON to facilitate diffing against a pg_dump export of the DDL statements
- 04:07 AM Revision 12925: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: removed DDL statements so that running the query file does not alter the database, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#remove-DDL-statements
- 04:01 AM Revision 12924: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to DB, which pg_dump-formats the views
- 03:57 AM Revision 12923: validation/**.sql: replaced CREATE OR REPLACE VIEW with CREATE VIEW to match pg_dump output for diffing
- 03:36 AM Revision 12922: added inputs/NY/validations*.sql*
- 03:34 AM Revision 12921: fix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: use pg_dump's formatting for COMMENT ON to facilitate diffing against a pg_dump export of the DDL statements
- 03:31 AM Revision 12920: bugfix: lib/common.Makefile: $(add*): need to wrap w/ $(wildcard) to prevent "targets don't exist" error, because svn 1.7 does not suppress this error even with --force
- 03:27 AM Revision 12919: bugfix: inputs/input.Makefile: add!: add* of $(svnFiles): need to ignore errors because svn 1.7 does not suppress the "targets don't exist" error even with --force
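Revisions 12919 and 12920 avoid svn 1.7's "targets don't exist" error by passing it only paths that exist, which is what GNU Make's `$(wildcard)` does. A hypothetical bash analogue of that filtering step follows; the actual fix lives in the Makefiles, and `existing_paths` is an illustrative name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print only the arguments that exist on disk,
# similar to how GNU Make's $(wildcard) drops nonexistent paths.
existing_paths() {
    local path
    for path in "$@"; do
        if [[ -e "$path" ]]; then
            printf '%s\n' "$path"
        fi
    done
}

# Example use in an svn working copy (bash 4+ for mapfile):
#   mapfile -t files < <(existing_paths "${candidates[@]}")
#   if (( ${#files[@]} )); then svn add --force "${files[@]}"; fi
```

Paths containing newlines would be split by `mapfile`; this assumes the file names involved do not contain them.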
03/26/2014
- 09:34 PM Revision 12918: fix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: decimalLatitude/decimalLongitude: need to cast to double precision for numeric comparisons
- 09:33 PM Revision 12917: fix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: CollectedDate: updated for refreshed NY data
- 09:30 PM Revision 12916: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: fixed typos in column aliases
- 09:23 PM Revision 12915: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: translated column names to VegCore, using `bin/in_place validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql env text=1 bin/repl inputs/NY/Ecatalog_all/map.csv` from the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
- 09:23 PM Revision 12914: fix: bin/repl: text mode (whether all patterns are plain text) should default to on, not off, if matching entire cells in a spreadsheet
- 07:16 PM Revision 12913: bugfix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: need to enclose additional mixed-case identifiers in "", using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
- 07:15 PM Revision 12912: bugfix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: need to enclose additional mixed-case identifiers in "", using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
- 06:09 PM Revision 12911: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql: abbreviated view names longer than 63 chars to prevent them from being truncated
- 06:07 PM Revision 12910: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: escape any ' inside '...' by doubling them
- 06:04 PM Revision 12909: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: translated SQL to Postgres
- 05:32 PM Revision 12908: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql: changed /* */ comments to COMMENT ON comments, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#prepend-CREATE-VIEW
- 04:58 PM Revision 12907: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql: removed no longer needed -- comments containing the query name, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#prepend-CREATE-VIEW
- 03:47 PM Revision 12906: validation/aggregating/specimens/qualitative_validations_specimens.sql: moved notes to comments to after the query
- 03:46 PM Revision 12905: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: moved notes to comments to after the query
- 03:44 PM Revision 12904: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: moved "Check" comments to after the query, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
- 03:22 PM Revision 12903: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed "Check: should return [#] rows" comments because these only apply to the NY results, not to all specimens datasources
- 03:16 PM Revision 12902: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: prepended CREATE VIEW, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#prepend-CREATE-VIEW and the same abbreviations as the output queries (validation/aggregating/specimens/qualitative_validations_specimens.sql)
- 03:01 PM Revision 12901: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: synced "Check" comments to output queries validation/aggregating/specimens/qualitative_validations_specimens.sql
- 02:49 PM Revision 12900: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: enclosed mixed-case identifiers in "" using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
- 02:37 PM Revision 12899: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: translated column names to VegCore, using `bin/in_place validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql env text=1 bin/repl inputs/NY/Ecatalog_all/map.csv` from the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
- 02:29 PM Revision 12898: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to use column names for refreshed NY data
- 02:17 PM Revision 12897: fix: bin/repl: don't consider uppercase SQL keywords to indicate that a word is in a sentence
- 12:02 AM Revision 12896: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: use our staging tables instead of the BIEN2 MySQL staging tables
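Revisions 12910 and 12911 rely on two PostgreSQL rules: a `'` inside a string literal is written as `''`, and identifiers longer than 63 bytes (the default NAMEDATALEN minus one) are truncated. A small bash sketch of both checks follows. The helper names are hypothetical and not part of the repository, and the length check counts characters, which equals bytes only for ASCII names:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for preparing hand-written SQL; not repository code.

# sql_quote STR: print STR as an SQL string literal, doubling any embedded '
# (e.g. O'Brien -> 'O''Brien').
sql_quote() {
    local q="'"   # a variable sidesteps version-dependent quote handling in ${var//pat/rep}
    printf '%s\n' "$q${1//$q/$q$q}$q"
}

# too_long_identifier NAME: succeed if NAME is longer than 63 characters,
# PostgreSQL's default identifier limit (NAMEDATALEN - 1); such names get
# truncated. Assumes ASCII names, where characters equal bytes.
too_long_identifier() {
    (( ${#1} > 63 ))
}

sql_quote "O'Brien"    # prints 'O''Brien'
```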
03/25/2014
- 11:52 PM Revision 12895: validation/aggregating/specimens/**.sql: removed trailing whitespace, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
- 11:39 PM Revision 12894: archived validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.sql
- 11:39 PM Revision 12893: added validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql, copied from qualitative_validations_source_db_NYBG.sql
- 11:33 PM Revision 12892: validation/aggregating/specimens/qualitative_validations_specimens.sql: added ; at end of `CREATE OR REPLACE VIEW` statements
- 04:18 AM Revision 12891: inputs/run: postprocess(): documented runtime on vegbiendev (1 h)
03/24/2014
- 06:22 PM Revision 12890: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed input-query-specific comments
- 06:21 PM Revision 12889: validation/aggregating/specimens/qualitative_validations_specimens.sql: reworded rowcount check comments to apply to the output queries
- 06:18 PM Revision 12888: validation/aggregating/specimens/qualitative_validations_specimens.sql: shortened view names to fit within the 63-char limit without truncation
- 05:45 PM Revision 12887: /README.TXT: `make inputs/{NVS,SALVIAS,TEAM}/test`: updated runtime (1 min)
- 05:35 PM Revision 12886: schemas/vegbien.sql: specimenreplicate.institution_id: renamed to duplicate_institutions_sourcelist_id, as decided in the conference calls (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2)
- 05:32 PM Revision 12885: inputs/run: postprocess(): updated runtime (25 min)
- 05:22 PM Revision 12884: fix: validation/aggregating/specimens/qualitative_validations_specimens.sql: changed "Full inner join" to "Full outer join" because a FULL JOIN is a type of outer join, not inner join
- 05:04 PM Revision 12883: /README.TXT: calls to `inputs/run postprocess`: direct user to refer to inputs/run for this, so the runtime doesn't have to be updated in multiple places
- 05:02 PM Revision 12882: inputs/run: postprocess(): updated runtime (20 min)
- 05:01 PM Revision 12881: /README.TXT: Schema changes: added steps to update staging table column names on the local machine and vegbiendev
- 04:50 PM Revision 12880: fix: schemas/VegCore/mk_derived: added `EOF` at end to avoid (benign) "here-document delimited by end-of-file" warnings on Linux
- 01:49 AM Revision 12879: mappings/VegCore.htm: regenerated from wiki: renamed specimenHolderInstitutions to specimen_duplicate_institutions, as decided in the 2014-03-13 conference call (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2). note that most schema changes (such as this one) involve mappings changes, which are handled automatically by `inputs/run postprocess; yes|make inputs/{NVS,SALVIAS,TEAM}/test`.
- 01:43 AM Revision 12878: bugfix: lib/runscripts/table.run: schema/make calls: need to use `make schema` instead because old-style datasources don't have a top-level runscript (the absence of this identifies them as old-style so inputs/input.Makefile works correctly)
- 01:21 AM Revision 12877: /README.TXT: Maintenance: VegCore data dictionary: `make inputs/{NVS,SALVIAS,TEAM}/test`: recorded runtime (30 s)
- 01:17 AM Revision 12876: /README.TXT: Maintenance: VegCore data dictionary: `make inputs/{NVS,SALVIAS,TEAM}/test`: prepended `time` to enable obtaining the runtime
- 01:11 AM Revision 12875: /README.TXT: Maintenance: VegCore data dictionary: `inputs/run postprocess`: updated runtime (20 min)
- 12:45 AM Revision 12874: fix: schemas/util.sql: trim(): by default, cascadingly drop dependent columns so that they don't prevent trim() from succeeding. note that this requires the dependent columns to then be manually re-created.
03/23/2014
03/22/2014
- 06:20 AM Revision 12872: bugfix: lib/sh/util.sh: **DON'T** do `shopt -s lastpipe` because this causes a segfault on Linux in stderr_matches(). (it also isn't supported on Mac.) use @PIPESTATUS instead. note that we do not currently need lastpipe, since we use @PIPESTATUS (which actually provides more functionality for our purposes).
- 06:02 AM Revision 12871: fix: lib/sh/util.sh: echo_func(): file/line #: display with regular color because the lighter color actually draws attention *to* rather than *away from* the faded text
- 05:59 AM Revision 12870: lib/sh/util.sh: added plain()
- 05:56 AM Revision 12869: inputs/XAL/Specimen/test.xml.ref: updated for sample data.csv, which contains the columns as a CSV. this fixes a bug where a map.csv must be used on a table that contains the same set of columns (ie. not one with no columns if there are any mappings).
- 05:50 AM Revision 12868: bugfix: lib/sql_io.py: put_table(): is_literals: `return sql.value(cur)`: need to use sql.value_or_none() instead to support multi-row functions, such as _split() used in specimens data
- 05:06 AM Revision 12867: fix: inputs/input.Makefile: don't treat *.xml as data files since these are not currently supported
- 04:55 AM Revision 12866: lib/runscripts/util.run: on_exit(): documented that users can also override gateway()/fallback() to perform other commands (or no commands) after the script is read
- 04:53 AM Revision 12865: bugfix: lib/sh/db.sh: pg_table_exists(): need ! to negate boolean result
- 04:44 AM Revision 12864: fix: lib/runscripts/table.run: table_make_install(): need to inform the user when it skips installing a table, because this is often unexpected
- 04:43 AM Revision 12863: fix: lib/runscripts/util.run: run_args_cmd(): need to indent the output of the target that it's running
- 04:15 AM Revision 12862: lib/runscripts/table.run: removed no longer used datasrc_make_install()
- 04:07 AM Revision 12861: fix: lib/sh/util.sh: fade(): use medium gray instead of light gray because it fades on white *and* black backgrounds
- 03:54 AM Revision 12860: lib/sh/util.sh: echo_func(): fade the file/line # to avoid distracting from the function call in the default log output
- 03:51 AM Revision 12859: lib/sh/util.sh: added fade()
- 03:37 AM Revision 12858: lib/sh/util.sh: highlight_msg(): renamed to highlight_log_msg() to clarify that this contains log++-specific functionality
- 03:35 AM Revision 12857: lib/sh/util.sh: moved terminal formatting commands to own section
- 03:34 AM Revision 12856: lib/sh/util.sh: highlight_msg(): moved formatting code into separate format() function
- 03:21 AM Revision 12855: lib/sh/util.sh: dp(): renamed to ps() to correspond with pv/pf
- 03:19 AM Revision 12854: lib/sh/make.sh: echo_target: use `log-- echo_func`, which now puts the target name first but also provides much-needed indentation
- 03:16 AM Revision 12853: lib/sh/util.sh: echo_func(): put file/line # *after* function call instead of before so the function name is listed first
- 03:13 AM Revision 12852: lib/sh/util.sh: echo_func(): usage: removed no longer used/implemented minor=1 switch. use log++ instead.
- 03:07 AM Revision 12851: lib/runscripts/datasrc_dir.run: import(): use new schema/make, schema/rm
- 02:59 AM Revision 12850: lib/runscripts/table.run: load_data(): use the much simpler `schema/make` run target, rather than outsourcing to the legacy Makefile via the convoluted datasrc_make_install()/table_make_install()
- 02:26 AM Revision 12849: lib/runscripts/datasrc_dir.run: added schema/rm(), schema/make()
- 02:19 AM Revision 12848: lib/sh/util.sh: ignore_err_msg(): usage: added $ignore_e param from stderr_matches()
- 02:14 AM Revision 12847: lib/runscripts/table.run: psql: always include ; at end of statement
- 01:39 AM Revision 12846: fix: lib/sh/db.sh: pg_cmd(): hide PGPASSWORD at the normal verbosity so that the value of it doesn't appear in any log files
- 01:08 AM Revision 12845: lib/sh/util.sh: log_hint(): renamed to log_err_hint() for clarity, because this applies only to hints for errors
- 01:06 AM Revision 12844: bugfix: lib/sh/util.sh: log_hint!(): use log_err instead of log_info because hints as used here are attached to (possibly benign) errors. for other uses, use mk_hint().
- 01:00 AM Revision 12843: fix: lib/sh/util.sh: highlight_msg(): don't ' '-pad already-formatted text
- 12:57 AM Revision 12842: lib/sh/util.sh: manual terminal escape sequences: use highlight_msg() instead
- 12:53 AM Revision 12841: lib/sh/util.sh: highlight_msg(): auto-add padding around text if there is a background
- 12:51 AM Revision 12840: lib/sh/util.sh: highlight_msg(): use $format itself as the $highlight boolean
- 12:48 AM Revision 12839: lib/sh/util.sh: highlight_msg(): split apart the testing of $format and can_highlight_msg
- 12:39 AM Revision 12838: lib/sh/util.sh: added has_bg()
- 12:28 AM Revision 12837: bugfix: lib/sh/util.sh: highlight_msg(): need to reset any existing formatting before applying new formatting
- 12:25 AM Revision 12836: lib/sh/util.sh: added mk_hint() and use it in log_hint!()
- 12:16 AM Revision 12835: lib/sh/util.sh: bg_cmd(): also log the command being run
- 12:07 AM Revision 12834: fix: lib/sh/util.sh: need `function` before functions that have an alias with the same name
- 12:04 AM Revision 12833: lib/sh/util.sh: log!(): use new log:()
- 12:00 AM Revision 12832: lib/sh/util.sh: added log:(), which sets an explicit log_level. this also simplifies log+().
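Revision 12872 replaces `shopt -s lastpipe` with `${PIPESTATUS[@]}`. Below is a minimal sketch of reading per-stage exit statuses, plus the related `set -o pipefail` from revision 12793, which makes a failing early stage fail the whole pipeline. The commands are illustrative and are not taken from stderr_matches():

```bash
#!/usr/bin/env bash
# Illustrative commands only: per-stage exit statuses via PIPESTATUS.

# Stage 1 prints an error line and exits 2; stage 2 (grep) finds a match.
( printf 'ERROR: relation does not exist\n'; exit 2 ) | grep ERROR >/dev/null
statuses=("${PIPESTATUS[@]}")   # copy immediately: the next command resets it
echo "stage1=${statuses[0]} stage2=${statuses[1]}"   # prints "stage1=2 stage2=0"

# With pipefail, the pipeline's own status reflects the failing stage.
set -o pipefail
if ( exit 2 ) | cat; then
    pipeline_status=0
else
    pipeline_status=$?           # 2 (it would be 0 without pipefail)
fi
set +o pipefail
echo "pipeline_status=$pipeline_status"
```

PIPESTATUS keeps every stage's status, so a caller can tell a failing producer from a non-matching filter, which pipefail alone cannot do.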
03/21/2014
- 11:55 PM Revision 12831: lib/sh/util.sh: log+(): set log_level before PS4 so that the PS4 expr doesn't also need to add to log_level
- 11:51 PM Revision 12830: lib/sh/util.sh: removed no longer needed log+ alias (which had been renamed from clog+)
- 11:48 PM Revision 12829: lib/sh/util.sh: clog*: renamed to log* for clarity (possible now that log* is no longer used for function-local log_level setting)
- 11:44 PM Revision 12828: *{.sh,run}: local setting of log_level: use log_local instead of relying on the log* aliases, so that these aliases can instead be used for wrapping commands (the more common use case)
- 11:40 PM Revision 12827: bugfix: lib/sh/util.sh: verbosity_compat alias: need to use `declare verbosity="$verbosity"` instead of `declare verbosity`, which would just clear $verbosity
- 11:38 PM Revision 12826: bugfix: lib/sh/util.sh: verbosity_min alias: need to use `declare verbosity="$verbosity"` instead of log_local now that verbosity is not one of the vars changed by log++
- 11:30 PM Revision 12825: lib/sh/util.sh: log+(): use easier-to-understand log_local instead of prefix-assignments to limit assignments to the invoked command
- 11:30 PM Revision 12824: lib/sh/util.sh: log+(): use easier-to-understand log_local instead of prefix-assignments to limit assignments to the invoked command
- 10:57 PM Revision 12823: *{.sh,run}: use clog* instead of "log*"
- 10:45 PM Revision 12822: bugfix: lib/sh/util.sh: log+(): removed spurious ; between setting of PS4 and log_level, which was causing erratic mismatches between PS4 and log_level. (the ; caused $PS4 to be set in the *caller* when invoked via one of the clog* aliases, rather than being passed as a command-specific env var.)
- 10:30 PM Revision 12821: lib/sh/util.sh: $verbosity: stay constant at what the user set it to instead of changing in tandem with $log_level, to facilitate debugging verbosity/log_level-related issues
- 10:11 PM Revision 12820: lib/sh/util.sh: log+(): usage: use aliases instead of ""-ed function names
- 06:58 PM Revision 12819: added schemas/VegCore.ERD.pdf symlink for easy access
- 06:50 PM Revision 12818: lib/sh/util.sh: log_err(): use red background for better visibility of errors, in the same way that lib/exc.py print_ex() does for column-based import
- 06:44 PM Revision 12817: bugfix: lib/sh/util.sh: removed echo_func in functions used by log++, to avoid spurious highlighted output
- 06:40 PM Revision 12816: lib/sh/util.sh: added missing clog+ alias
- 06:35 PM Revision 12815: bugfix: lib/sh/util.sh: log_hint(): use the standard log_fd and log_info() format, not err_fd and log_err() format, for hint messages
- 06:27 PM Revision 12814: fix: lib/sh/util.sh: log_msg!(): indent each line, not just the first
- 06:26 PM Revision 12813: lib/sh/util.sh: added split_lines()
- 06:05 PM Revision 12812: lib/sh/util.sh: log(): factored out helper function log_msg!()
- 06:00 PM Revision 12811: fix: lib/sh/util.sh: highlight_msg(): bold instead of underlining because the underlining interferes with the readability of the commands
- 05:57 PM Revision 12810: lib/sh/util.sh: highlight_msg(): allow turning off formatting w/ empty $format
- 05:53 PM Revision 12809: fix: lib/sh/util.sh: log_err() calls: removed manual highlighting
- 05:51 PM Revision 12808: lib/sh/util.sh: log_err(): highlight all error messages using highlight_msg()'s new $format
- 05:45 PM Revision 12807: lib/sh/util.sh: highlight_msg(): support custom format
- 05:35 PM Revision 12806: lib/sh/db.sh: pg_*_exists(): log the DB statements to check this at a higher log_level so that they don't clutter up the log output
- 05:25 PM Revision 12805: lib/sh/util.sh: log(): highlight log_level 1 messages to stand out against other output, for easier debugging
- 04:31 PM Revision 12804: *{.sh,run}: stderr_matches() wrapper calls: removed no longer needed prep_try/rethrow
- 04:12 PM Revision 12803: bugfix: catch(): also need to support $1='' because this is now a use case of ignore_e()
- 04:02 PM Revision 12802: bugfix: lib/sh/util.sh: ignore_err_msg(): also need to ignore false exit status on no match
- 03:49 PM Revision 12801: lib/sh/util.sh: stderr_matches(): moved prep_try/rethrow into the function itself so that callers don't have to wrap this function in a complex sequence of prep_try/rethrow statements
- 03:42 PM Revision 12800: *{.sh,run}: stderr_matches() wrapper calls: removed no longer needed prep_try/rethrow
- 03:42 PM Revision 12799: lib/sh/util.sh: stderr_matches(): moved prep_try/rethrow into the function itself so that callers don't have to wrap this function in a complex sequence of prep_try/rethrow statements
- 03:25 PM Revision 12798: lib/sh/util.sh: added rethrow_exit alias
- 03:10 PM Revision 12797: fix: lib/sh/db.sh: pg_table_exists(): use stderr_matches() rather than just the exit status. this also avoids highlighting the benign error.
- 03:00 PM Revision 12796: fix: lib/sh/db.sh: pg_table_exists(): use stderr_matches() rather than just the exit status. this also avoids highlighting the benign error.
- 02:16 AM Revision 12795: fix: inputs/input.Makefile: removed no longer used special handling of XML inputs, support for which was never added to the Makefile. (bin/map, however, does support importing an XML file into a database.) this fixes a bug in XAL, which used to abort with an error but now just imports an empty table.
- 12:34 AM Revision 12794: fix: inputs/input.Makefile: %/install: don't ignore errors if table does not exist, to ensure a proper errexit. this is now possible because every dir that this target is being run on should be a data dir. (Source/ used to be a metadata-only dir.)
- 12:31 AM Revision 12793: bugfix: inputs/input.Makefile: $(cleanup): need `set -o pipefail`
- 12:02 AM Revision 12792: inputs/VegBank/run: `rm=1 import()`: updated runtime (1 h)
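Revisions 12822 and 12827 turn on two bash scoping rules. First, `VAR=value cmd` prefix assignments apply only to that command's environment, while a stray `;` turns the first one into an ordinary shell assignment in the caller (not exported unless the variable already was). Second, inside a function, `declare name` with no value creates a new, unset local that shadows the global. An illustrative sketch with hypothetical variable names, assuming bash's default behavior (`localvar_inherit` off):

```bash
#!/usr/bin/env bash
# Illustrative only: prefix-assignment and declare scoping in bash.
# A, B, and verbosity are hypothetical names.

unset A B
# Both assignments are prefixes, so they reach only the child's environment.
A=1 B=2 bash -c 'echo "child: A=$A B=$B"'    # child: A=1 B=2
echo "caller: A=${A-unset} B=${B-unset}"     # caller: A=unset B=unset

# A stray ';' makes A=1 a plain (unexported) assignment in the caller.
A=1; B=2 bash -c 'echo "child: A=$A B=$B"'   # child: A= B=2
echo "caller: A=${A-unset} B=${B-unset}"     # caller: A=1 B=unset

verbosity=3
# `declare` in a function creates a local; without a value it starts unset.
shadow_unset() { declare verbosity; echo "[${verbosity-}]"; }
shadow_copy()  { declare verbosity="$verbosity"; echo "[$verbosity]"; }
shadow_unset   # prints "[]"
shadow_copy    # prints "[3]"
```

`declare verbosity="$verbosity"` works because the right-hand side is expanded before the local is created, so it copies the global value.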
03/20/2014
- 11:54 PM Revision 12791: inputs/VegBank/taxon_observation.**/test.xml.ref: updated inserted row count
- 11:54 PM Revision 12790: inputs/VegBank/projectcontributor_/test.xml.ref: updated inserted row count
- 10:53 PM Revision 12789: bugfix: schemas/util.sql: is_constant(util.col_ref): updated to include standard newline at beginning of comment (applies to newly-imported staging tables)
- 10:44 PM Revision 12788: bugfix: inputs/VegBank/import_order.txt: added missing project, needed to trigger the staging table renaming for the project table
- 10:42 PM Revision 12787: inputs/VegBank/run: documented `rm=1 import()` runtime (>1.5 h)
- 10:40 PM Revision 12786: inputs/VegBank/run: documented `datasrc_make sql/install` runtime (25 min)
- 08:27 PM Revision 12785: inputs/MO/Specimen/test.xml.ref: updated, which adds dateCollected mappings
- 08:20 PM Revision 12784: inputs/WIN/Specimen/test.xml.ref: updated to map.csv, which has eventDate->dateCollected
- 08:13 PM Revision 12783: inputs/VegBank/plantconcept_/create.sql: updated runtime (25 min, ~same)
- 08:08 PM Revision 12782: lib/sh/make.sh: begin_target: echo all targets to facilitate debugging without needing the verbose stack trace mode
- 08:06 PM Revision 12781: bugfix: lib/sh/make.sh: echo_target: don't include filename/line #, since this is not for the stack trace mode
- 07:59 PM Revision 12780: lib/sh/make.sh: added echo_target
- 07:58 PM Revision 12779: *{.sh,run}: use new begin_target instead of `echo_func; set_make_vars`
- 07:51 PM Revision 12778: lib/runscripts/util.run: runscript template: added sample make target, using new make target template
- 07:48 PM Revision 12777: lib/sh/make.sh: added make target template
- 07:47 PM Revision 12776: inputs/VegBank/plot/postprocess.sql: remove institutions that we have direct data for: CVS: updated runtime (same)
- 07:41 PM Revision 12775: lib/sh/make.sh: added begin_target alias
- 07:17 PM Revision 12774: lib/runscripts/datasrc_dir.run: documented how to reinstall staging tables (`rm=1 .../run import`)
- 07:13 PM Revision 12773: bugfix: *{.sh,run}: stderr_matches() wrapper callers: use the required wrapper caller usage, which now includes rethrow and prep_try
- 07:09 PM Revision 12772: bugfix: lib/sh/util.sh: rethrow*: only `return` if $e is actually nonzero, because rethrow is now being used as a catch-all in situations where there might not be an error
- 07:06 PM Revision 12771: lib/sh/util.sh: prep_try: initialize $e to 0 to simplify error-handling coding
- 06:56 PM Revision 12770: stderr_matches(): wrapper caller usage: added alternative usage when using `||`
- 06:50 PM Revision 12769: lib/sh/util.sh: stderr_matches(): wrapper caller usage: documented usage for a negated condition (i.e. prefixed w/ !)
- 06:48 PM Revision 12768: lib/sh/util.sh: stderr_matches(): usage: split into wrapper usage and wrapper caller usage for clarity
- 06:45 PM Revision 12767: fix: *{.sh,run}: stderr_matches() wrappers: usage: added `rethrow`
- 06:45 PM Revision 12766: fix: *{.sh,run}: stderr_matches() wrappers: usage: added `rethrow`
- 06:43 PM Revision 12765: fix: lib/sh/util.sh: stderr_matches(): usage: `rethrow` must be called right after stderr_matches(), to avoid running other commands if there is an error
- 06:40 PM Revision 12764: fix: lib/sh/util.sh: stderr_matches(): when using $ignore_e, also set benign_error=1 to suppress the highlighting of the error
- 06:36 PM Revision 12763: bugfix: lib/sh/db.sh: pg_schema_exists(): need to ignore benign error exit status from the "cannot create temporary relation in non-temporary schema" error
- 06:34 PM Revision 12762: lib/sh/util.sh: stderr_matches(): support ignoring any benign error exit status associated with the error message being tested for
- 06:18 PM Revision 12761: lib/sh/util.sh: stderr_matches(): usage: documented where any ignore_e statement would go
- 05:37 PM Revision 12760: bugfix: lib/sh/util.sh: stderr_matches(): can't use `try` because this clears the exit status, which is needed for @PIPESTATUS to work. to support this, also need to avoid errexiting since @PIPESTATUS will be used instead.
- 01:25 AM Task #878 (New): fix crow's foot notation in ERD
- * when there is an open circle on the straight end, also put an open circle on the crow's foot end, so that the outgo...
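Revisions 12760 through 12773 describe how stderr_matches() reads the command's exit status from PIPESTATUS instead of going through `try`, how `$ignore_e` marks the exit status of a known, benign error message as ignorable, and how prep_try now initializes `$e` to 0. The bash sketch that follows is a hypothetical reconstruction of that idea for illustration only. `stderr_matches_sketch` and its exact behavior are assumptions, not the code in lib/sh/util.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: stderr_matches_sketch is NOT the real lib/sh/util.sh
# stderr_matches(); it only illustrates the PIPESTATUS/ignore_e idea.

# Usage: [ignore_e=N] stderr_matches_sketch PATTERN CMD [ARGS...]
# Runs CMD, piping only its stderr into grep. Sets the global $e to CMD's own
# exit status and returns 0 if the stderr matched PATTERN, nonzero otherwise.
stderr_matches_sketch() {
	local pattern="$1"; shift
	e=0  # start from "no error", as prep_try now does
	# 2>&1 >/dev/null: send stderr into the pipe and discard stdout
	"$@" 2>&1 >/dev/null | grep -E -- "$pattern" >/dev/null
	# copy PIPESTATUS immediately; any other command would overwrite it
	local status=("${PIPESTATUS[@]}")
	e="${status[0]}"
	local matched="${status[1]}"
	# a matched error whose exit status equals $ignore_e is benign
	if [[ "$matched" -eq 0 && -n "${ignore_e:-}" && "$e" -eq "$ignore_e" ]]; then
		e=0
	fi
	return "$matched"
}

# Example: treat exit status 3 from this particular message as benign
if ignore_e=3 stderr_matches_sketch 'cannot create temporary relation' \
	sh -c 'echo "ERROR: cannot create temporary relation in non-temporary schema" >&2; exit 3'
then
	echo "matched, e=$e"
fi
```

Reading the statuses from PIPESTATUS keeps the command's own exit status available even though the pipeline as a whole reports grep's status.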
03/18/2014
- 06:18 PM Revision 12759: lib/sh/util.sh: added dp(), which debug-prints a message
- 05:47 PM Revision 12758: bugfix: inputs/VegBank/plot/postprocess.sql: use CVS.plot_ instead because that has the renamed staging table columns, and is compatible with auto-renaming of the SQL script columns
- 05:41 PM Revision 12757: inputs/CVS/plot_/postprocess.sql: add unique constraint on locationName (analogous to the unique constraint in plot), for use by inputs/VegBank/plot/postprocess.sql in removing inter-datasource duplication
- 05:26 PM Revision 12756: fix: schemas/util.sql: explain2notice_msg(): don't include EXPLAIN output for simple, single-value queries, to avoid cluttering up the log output
- 05:22 PM Revision 12755: schemas/util.sql: added fold_explain_msg()
- 05:22 PM Revision 12754: bugfix: bin/repl: only use excluded_prefix_re/excluded_suffix_re in text mode (used in renaming columns in SQL scripts), to prevent the special coding for column renames from also affecting regular regexp/word replacements
- 05:10 PM Revision 12753: inputs/VegBank/taxon_observation.**/test.xml.ref: updated inserted row count
- 05:34 AM Revision 12752: inputs/run: postprocess(): documented runtime (30 min)
- 05:16 AM Revision 12751: bugfix: inputs/input.Makefile: %/postprocess.sql: don't perform replacements using map.csv, because map.csv is not idempotent. this functionality was only there to facilitate switching to new-style import, which is now largely done. (the remaining datasources NVS, SALVIAS, TEAM contain only 1 postprocess.sql: inputs/SALVIAS/projects/postprocess.sql (`st inputs/{NVS,SALVIAS,TEAM}/*/postprocess.sql`).)
- 04:59 AM Revision 12750: bugfix: bin/repl: text mode: also don't match if it's part of a '-'-separated identifier
- 04:57 AM Revision 12749: bugfix: bin/repl: text mode: also don't match if it's a word in a sentence
- 04:42 AM Revision 12748: bugfix: bin/repl: text mode: turned off the suffix matching, because there are cases where a mapping adds a suffix which would cause the same replacement to be performed repeatedly
- 04:33 AM Revision 12747: inputs/input.Makefile: %/postprocess.sql: *always* run this, not just if the associated map spreadsheets change, to avoid needing to `touch` them to cause %/postprocess.sql to run
- 04:25 AM Revision 12746: bin/repl: text mode: exclude prefixes that should not cause replacement, to avoid doubling leading *
- 04:24 AM Revision 12745: fix: inputs/*/*/postprocess.sql: un-doubled *
- 04:06 AM Revision 12744: bugfix: inputs/input.Makefile: %/postprocess.sql: also need to apply renames from mappings/VegCore.thesaurus.csv, as these have been applied to map.csv
- 04:04 AM Revision 12743: bugfix: lib/runscripts/table.run: custom_postprocess(): need to apply renames to SQL statements in postprocess.sql before it can be run
- 04:03 AM Revision 12742: bin/repl: text mode: also match w/ suffix (e.g. _verbatim)
- 03:10 AM Task #577 (Rejected): use views instead of map spreadsheets to store the datasource mappings
- the staging table columns are now renamed instead of creating a view that maps the columns
- 03:07 AM Task #584 (Resolved): enable running all the import steps from one runscript
- 02:59 AM Revision 12741: bugfix: /README.TXT: Maintenance: VegCore data dictionary: apply new data dict mappings: need to use postprocess rather than import runscript target, so that the command also works on an svn checkout without the flat files (the flat files are not needed for the staging table renaming)