bugfix: inputs/NY/validations.sql: _specimens_07_list_of_verbatim_subspecific_taxa_with_author: updated filter condition to match output query
inputs/NY/run: `make inputs/NY/validate`: updated runtime (8 min, with added queries)
fix: inputs/NY/Ecatalog_all/map.csv, postprocess.sql: remapped substrate, vegetation to locationRemarks
bugfix: lib/runscripts/import.run: all(): also need to propagate $rm to import()
bugfix: inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13*: also need to include coordinate pairs which have one of their coordinates NULL, by using OR instead of AND
bugfix: inputs/NY/validations.sql: _specimens_13b_list_of_all_decimal_lat_long: matched column types to output query
bugfix: inputs/NY/validations.sql: _specimens_13a_list_of_all_verbatim_lat_long: matched column types to output query
inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: added breakdowns _specimens_13a_list_of_all_verbatim_lat_long, _specimens_13b_list_of_all_decimal_lat_long to help troubleshoot the diff
fix: inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: count lat/longs together instead of separately, because the DISTINCT is by coordinate pair, not individual coordinate value (which wouldn't make much sense)
bugfix: schemas/vegbien.sql: rm_output_queries(): need to account for the fact that util.truncated_prefixed_name_regexp() returns a whole-string regexp. this drops support for removing output queries with a particular group prefix, which we no longer use.
bugfix: schemas/vegbien.sql: rm_output_queries(): need to include relations whose names were truncated, as well
fix: schemas/vegbien.sql: public_validations schema comment: to remove a validations query so its columns can be changed: use rm_output_queries() rather than rm_query_view() because that also removes input queries
bugfix: schemas/util.sql: is_castable(): need to pass NULL through, for proper NULL propagation
fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: use new is_castable(), which is much more accurate than Brad's custom regexp for determining if something is numeric
inputs/NY/validations.-.util.sql: added util.is_castable() wrapper
schemas/util.sql: added is_castable()
schemas/util.sql: added try_cast()
schemas/util.sql: added util.cast(), which allows casting to an arbitrary type without eval()
bugfix: schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: DISTINCT: added coordsaccuracy_m
bugfix: schemas/vegbien.sql: coordinates_unique: added coordsaccuracy_m
fix: schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to DISTINCT the values that are being counted, because the coordinates_unique unique constraint includes other columns as well, so there may be multiple instances of each lat/long
bugfix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to include both lat and long in the value to DISTINCT on
fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to DISTINCT the values that are being counted, because they are merged by the coordinates_unique unique constraint in the import
validation/aggregating/pipeline/aggregating_validations_pipeline.odg: diff tables: integrated row labels into table
validation/aggregating/pipeline/aggregating_validations_pipeline.odg: diff tables: added line for different rows (vs. missing/extra)
inputs/NY/run: `make inputs/NY/validate`: documented slow queries: _specimens_12_distinct_collector_name_collect_num_date_w_count
inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented slow queries (_plots_06a_list_of_stems). these may need to have their query plans rechecked.
inputs/NY/run, inputs/SALVIAS/run_: `make inputs/.../validate`: updated runtime (+2 min)
fix: inputs/NY/validations.sql: specimens*_of_unique_verbatim_author_taxa_with_genus: use scientificName rather than the concatenated ranks, because that is what is imported to taxonlabel.taxonomicname
validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to inputs/NY/validations.sql
validation/aggregating/specimens/qualitative_validations_specimens.sql: updated to DB
fix: schemas/vegbien.sql: specimens*_of_unique_verb_subsp_taxa_with_author: include only names with subspecies (filtering by taxonverbatim.subspecies rather than taxonlabel.taxonomicname)
bugfix: /README.TXT: Full database import: to import just a subset of the datasources: array env var needs to be set after opening the `screen` shell because array vars are apparently not inherited by the `screen` shell
/README.TXT: Full database import: to import just a subset of the datasources: added step to set custom import name
/README.TXT: Full database import: added instructions for importing just a subset of the datasources
bugfix: lib/sh/util.sh: local_array/export_array: do need -a because that it's an array is apparently not autodetected by the () on Mac
mappings/VegCore-VegBIEN.csv: mapped subspecies to new taxonverbatim.subspecies for easier access by validations queries
bugfix: web/.phpPgAdmin/.htaccess: work around phpPgAdmin bug that causes page to be ignored when not logged in
fix: inputs/test_taxonomic_names/Taxon/map.csv: scientificName: remapped to scientificName instead of taxonName as this does include the author for some names
fix: inputs/NY/Ecatalog_all/map.csv: ScientificName: remapped to scientificName instead of taxonName as this does include the author
fix: inputs/NY/validations.sql: specimens*_of_unique_verb_subsp_taxa_with_author: use taxonName instead of concatenating the ranks, as that corresponds to what we use as the concatenated taxonomic name
bugfix: inputs/NY/validations.sql: specimens*_of_verbatim_subspecific_taxa_with_author: need `subspecies IS NOT NULL` filter
bugfix: inputs/NY/validations.sql: _specimens_07_list_of_verbatim_subspecific_taxa_with_author: need to include subspecies (as _specimens_06_count_of_unique_verb_subsp_taxa_with_author does)
web/.phpPgAdmin/.htaccess: extract path components 1st->last: documented that can't use subject param for this because that goes to the last selected tab, not the default (leftmost) tab
bugfix: inputs/NY/validations.sql: specimens*_of_species_binomials: removed incorrect `subspecies IS NOT NULL` filter (this should be on *_of_unique_verb_subsp_taxa_with_author instead)
schemas/vegbien.sql: taxonverbatim: added subspecies, as decided in the conference call (wiki.vegpath.org/2014-04-10_conference_call#VegBIEN-schema-2)
fix: schemas/vegbien.sql: plots* with duplicated rows: removed duplicated rows
schemas/vegbien.sql: specimens*: ran through pipeline
removed old version validation/aggregating/plots/SALVIAS/bien3_validations_salvias_db_original.sql. use validation/aggregating/plots/SALVIAS/_archive/bien3_validations_salvias_db_original.sql instead.
schemas/vegbien.sql: _specimens_16_list_distinct_specimen_descriptions: re-ran through pipeline after removing duplicated rows
schemas/vegbien.sql: rm_output_queries(): also support removing just a particular output query
bugfix: schemas/util.sql: remake_diff_table(): need to rm_freq() type_table, because left/right_table don't have freq yet
schemas/util.sql: auto_rm_freq(): use new rm_freq()
schemas/util.sql: added rm_freq(regclass[])
fix: inputs/NY/validations.sql: _specimens_16_list_distinct_specimen_descriptions: removed duplicated rows using DISTINCT
schemas/vegbien.sql: _specimens_11_list_of_three_standard_political_divisions: ran through pipeline
fix: schemas/vegbien.sql: _specimens_11_list_of_three_standard_political_divisions: use same column names as input query
schemas/util.sql: remake_diff_table(): result table comment: documented how to display NULL values that are extra or missing
schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: ran through pipeline
fix: schemas/vegbien.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: also need to convert to text in GROUP BY/ORDER BY
bugfix: inputs/NY/validations.sql: _specimens_03_list_of_verbatim_families: use family as specified in query description, not as implemented
_license/UCSB/LICENSE.TXT: use (c) verbatim from the e-mail, not as displayed as © by Thunderbird
bugfix: schemas/vegbien.sql, inputs/NY/validations.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: cast this to text rather than date because some values for this field are not valid dates and will throw an error if cast to date
fix: inputs/NY/validations.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: matched type to output query
validation/aggregating/pipeline/aggregating_validations_pipeline.odg: show that the staging table(s) are denormalized before running the input queries on them. clarified that what is compared are the input and output query results, not the queries themselves.
schemas/vegbien.sql: _specimens_10_count_number_of_records_by_institution: ran through pipeline
validation/aggregating/specimens/qualitative_validations_specimens.sql: removed `public.` prefix to avoid cluttering up the SQL
bugfix: schemas/vegbien.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_10_count_number_of_records_by_institution: need to dereference specimenreplicate.duplicate_institutions_sourcelist_id to the corresponding sourcelist.name
schemas/vegbien.sql: public_validations._specimens_*: added comments from validation/aggregating/specimens/qualitative_validations_specimens.sql
validation/aggregating/specimens/qualitative_validations_specimens.sql: synced to schemas/vegbien.sql so that it can be diffed with it to sync qualitative_validations_specimens.sql to the DB
lib/sql_gen.py: map_expr(): documented that unlike bin/repl SQL identifier handling, this does simplify the resulting expression
lib/sql_gen.py: map_expr(): documented that this is a special case of bin/repl SQL identifier handling which does not handle entire source files
bin/repl: match as whole-word text (like SQL identifier): documented that this is a generalization of lib/sql_gen.py map_expr() to work on entire source files
bin/repl, lib/sql_gen.py Expression transforming: documented that this can also be done in Postgres with expression substitution (wiki.vegpath.org/Postgres_queries#expression-substitution)
fix: inputs/U/Specimen/map.csv: Genus: remapped to taxonName because this field is actually mislabeled in the original column names
validation/aggregating/pipeline/validations_on_sparse_datasources.odg: not applicable "✓": increased font size so the size of the character matches the surrounding text
validation/aggregating/pipeline/validations_on_sparse_datasources.odg: removed = lines for each input query, because they clutter up the diagram and the "same, so don't need to rewrite" message now shows this as well
validation/aggregating/pipeline/validations_on_sparse_datasources.odg: added the denormalized VegCore schema approach for comparison, as requested by Mark
schemas/vegbien.sql: remake_diff_tables(schema text): removed bien2_traits runtime because this applies only to one datasource. the bien2_traits runtime is now documented in inputs/bien2_traits/run.
inputs/NY/run: `make inputs/NY/validate`: updated runtime (6.5 min). this increases as more queries are able to run successfully.
schemas/vegbien.sql: public_validations: schema comment: documented how to run the validations. this information is also in the usage comment for public_validations.remake_diff_table(), but is copied here for easy reference.
inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented runtime (5 min)
inputs/bien2_traits/run: documented `make inputs/bien2_traits/validate` runtime (9 min)
schemas/vegbien.sql: public_validations: specimens queries: added autogenerated ~type tables
inputs/NY/run: `make inputs/NY/validate`: updated runtime (5 min)
validation/aggregating/specimens/qualitative_validations_specimens.sql: removed DDL statements, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#remove-DDL-statements
schemas/vegbien.sql: public_validations: added specimens queries to pipeline
validation/aggregating/specimens/qualitative_validations_specimens.sql: parameterize queries by datasource
validation/aggregating/**.sql output queries: use `SET join_collapse_limit = 1;` to match public_validations.rematerialize_out_view()
fix: schemas/vegbien.sql: public_validations.rematerialize_out_view(text, regclass): run with join_collapse_limit = 1 to fix query planner issues. this option has been tested on the queries that do not yet use the standard join sequence (plots #11,12,13,14,16,17,18), and all of these queries also work fine with join_collapse_limit = 1. (the standard join sequence is used to ensure both correctness of the query and compatibility with join_collapse_limit = 1, but in some cases is not needed for join_collapse_limit.)
validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: turn off join_collapse_limit instead of enable_mergejoin/enable_hashjoin, because join_collapse_limit is something that we will eventually want to turn off for all queries, which would avoid this query needing special handling. (on the other hand, enable_mergejoin/enable_hashjoin may be necessary for some queries and we probably won't turn them off for all queries.)
bugfix: lib/runscripts/table.run: table_make_install(): need to ignore skip_table() errexit
lib/sh/util.sh: import_vars: documented that vars already set will not be overwritten
inputs/NY/run: documented `make inputs/NY/validate` runtime (2 min, currently for the input queries)
added inputs/Madidi/_src/ to match wiki steps in wiki.vegpath.org/Adding_a_flat-file_datasource
added validation/aggregating/pipeline/validations_on_sparse_datasources.odg
planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx