
# Date Author Comment
13149 04/16/2014 06:41 PM Aaron Marcuse-Kubitza

fix: inputs/NY/Ecatalog_all/map.csv, postprocess.sql: remapped substrate, vegetation to locationRemarks

13148 04/16/2014 06:35 PM Aaron Marcuse-Kubitza

bugfix: lib/runscripts/import.run: all(): also need to propagate $rm to import()

13147 04/16/2014 04:24 PM Aaron Marcuse-Kubitza

bugfix: inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13*: also need to include coordinate pairs which have one of their coordinates NULL, by using OR instead of AND
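
The AND-versus-OR distinction in this fix can be illustrated with a small sketch. It is not the actual validations query: the table and column names (specimen, lat, lon) are hypothetical, and SQLite via Python stands in for the project's PostgreSQL schema.

```python
import sqlite3

# Hypothetical stand-in table; the real queries run against VegBIEN in PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?)",
    [(40.8, -73.9), (40.8, None), (None, -73.9), (None, None)],
)

# AND keeps only the pairs where both coordinates are present.
both = conn.execute(
    "SELECT count(*) FROM specimen WHERE lat IS NOT NULL AND lon IS NOT NULL"
).fetchone()[0]

# OR also keeps the pairs where just one of the coordinates is NULL.
either = conn.execute(
    "SELECT count(*) FROM specimen WHERE lat IS NOT NULL OR lon IS NOT NULL"
).fetchone()[0]

print(both, either)
```

With this sample data the OR filter keeps the two half-populated pairs that the AND filter drops, while the fully NULL row is excluded either way.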

13146 04/16/2014 04:15 PM Aaron Marcuse-Kubitza

bugfix: inputs/NY/validations.sql: _specimens_13b_list_of_all_decimal_lat_long: matched column types to output query

13145 04/16/2014 04:14 PM Aaron Marcuse-Kubitza

bugfix: inputs/NY/validations.sql: _specimens_13a_list_of_all_verbatim_lat_long: matched column types to output query

13144 04/16/2014 03:13 PM Aaron Marcuse-Kubitza

inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: added breakdowns _specimens_13a_list_of_all_verbatim_lat_long, _specimens_13b_list_of_all_decimal_lat_long to help troubleshoot the diff

13143 04/16/2014 02:04 PM Aaron Marcuse-Kubitza

fix: inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: count lat/longs together instead of separately, because the DISTINCT is by coordinate pair, not individual coordinate value (which wouldn't make much sense)
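
A minimal sketch of why counting lat/longs together differs from counting them separately. The table and column names (specimen, lat, lon) are hypothetical, and SQLite via Python is used here only as a stand-in for PostgreSQL.

```python
import sqlite3

# Hypothetical stand-in table with one repeated coordinate pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?)",
    [(10.0, 20.0), (10.0, 30.0), (10.0, 20.0), (15.0, 20.0)],
)

# Counting each coordinate separately: distinct latitudes, distinct longitudes.
separate = conn.execute(
    "SELECT count(DISTINCT lat), count(DISTINCT lon) FROM specimen"
).fetchone()

# Counting together: one row per distinct (lat, lon) pair.
pairs = conn.execute(
    "SELECT count(*) FROM (SELECT DISTINCT lat, lon FROM specimen) AS p"
).fetchone()[0]

print(separate, pairs)
```

Neither of the two separate counts matches the number of distinct pairs, which is the quantity a DISTINCT over coordinate pairs actually yields.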

13142 04/15/2014 08:12 PM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql: rm_output_queries(): need to account for the fact that util.truncated_prefixed_name_regexp() returns a whole-string regexp. this drops support for removing output queries with a particular group prefix, which we no longer use.

13141 04/15/2014 07:59 PM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql: rm_output_queries(): need to include relations whose names were truncated, as well

13140 04/15/2014 07:14 PM Aaron Marcuse-Kubitza

fix: schemas/vegbien.sql: public_validations schema comment: to remove a validations query so its columns can be changed: use rm_output_queries() rather than rm_query_view() because that also removes input queries

13139 04/15/2014 07:00 PM Aaron Marcuse-Kubitza

bugfix: schemas/util.sql: is_castable(): need to pass NULL through, for proper NULL propagation
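
The NULL pass-through described here can be sketched as follows. This is an illustrative Python analogue, not the PostgreSQL function in schemas/util.sql; the signature and the use of a Python callable as the target type are assumptions made for the example.

```python
from typing import Any, Callable, Optional


def is_castable(value: Any, caster: Callable[[Any], Any]) -> Optional[bool]:
    """Report whether caster(value) succeeds.

    Hypothetical analogue of util.is_castable(): None (standing in for
    SQL NULL) is passed through unchanged instead of being reported as
    castable or not castable.
    """
    if value is None:
        return None  # NULL in, NULL out
    try:
        caster(value)
    except (TypeError, ValueError):
        return False
    return True


print(is_castable("1e5", float))    # valid numeric text in exponent form
print(is_castable("12abc", float))  # not numeric
print(is_castable(None, float))     # NULL propagates
```

Attempting the cast itself, rather than pattern-matching the text, means that forms such as exponent notation are accepted without having to be enumerated in a regexp.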

13138 04/15/2014 06:52 PM Aaron Marcuse-Kubitza

fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: use new is_castable(), which is much more accurate than Brad's custom regexp for determining if something is numeric

13137 04/15/2014 06:29 PM Aaron Marcuse-Kubitza

inputs/NY/validations.-.util.sql: added util.is_castable() wrapper

13136 04/15/2014 06:12 PM Aaron Marcuse-Kubitza

schemas/util.sql: added is_castable()

13135 04/15/2014 06:10 PM Aaron Marcuse-Kubitza

schemas/util.sql: added try_cast()

13134 04/15/2014 05:51 PM Aaron Marcuse-Kubitza

schemas/util.sql: added util.cast(), which allows casting to an arbitrary type without eval()

13133 04/14/2014 05:04 PM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: DISTINCT: added coordsaccuracy_m

13132 04/14/2014 05:02 PM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql: coordinates_unique: added coordsaccuracy_m

13131 04/14/2014 04:56 PM Aaron Marcuse-Kubitza

fix: schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to DISTINCT the values that are being counted, because the coordinates_unique unique constraint includes other columns as well, so there may be multiple instances of each lat/long

13130 04/14/2014 04:51 PM Aaron Marcuse-Kubitza

bugfix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to include both lat and long in the value to DISTINCT on

13129 04/14/2014 04:48 PM Aaron Marcuse-Kubitza

fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to DISTINCT the values that are being counted, because they are merged by the coordinates_unique unique constraint in the import

13128 04/14/2014 04:24 PM Aaron Marcuse-Kubitza

validation/aggregating/pipeline/aggregating_validations_pipeline.odg: diff tables: integrated row labels into table

13127 04/14/2014 04:04 PM Aaron Marcuse-Kubitza

validation/aggregating/pipeline/aggregating_validations_pipeline.odg: diff tables: added line for different rows (vs. missing/extra)

13126 04/14/2014 03:58 PM Aaron Marcuse-Kubitza

inputs/NY/run: `make inputs/NY/validate`: documented slow queries: _specimens_12_distinct_collector_name_collect_num_date_w_count

13125 04/14/2014 03:23 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented slow queries (_plots_06a_list_of_stems). these may need to have their query plans rechecked.

13124 04/14/2014 03:22 PM Aaron Marcuse-Kubitza

inputs/NY/run, inputs/SALVIAS/run_: `make inputs/.../validate`: updated runtime (+2 min)

13123 04/10/2014 04:06 PM Aaron Marcuse-Kubitza

fix: inputs/NY/validations.sql: specimens*_of_unique_verbatim_author_taxa_with_genus: use scientificName rather than the concatenated ranks, because that is what is imported to taxonlabel.taxonomicname

13122 04/10/2014 03:52 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to inputs/NY/validations.sql

13121 04/10/2014 03:50 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: updated to DB

13120 04/10/2014 03:41 PM Aaron Marcuse-Kubitza

fix: schemas/vegbien.sql: specimens*_of_unique_verb_subsp_taxa_with_author: include only names with subspecies (filtering by taxonverbatim.subspecies rather than taxonlabel.taxonomicname)

13119 04/10/2014 03:13 PM Aaron Marcuse-Kubitza

bugfix: /README.TXT: Full database import: to import just a subset of the datasources: array env var needs to be set after opening the `screen` shell because array vars are apparently not inherited by the `screen` shell

13118 04/10/2014 02:42 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: to import just a subset of the datasources: added step to set custom import name

13117 04/10/2014 02:41 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: added instructions for importing just a subset of the datasources

13116 04/10/2014 02:38 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/util.sh: local_array/export_array: -a is needed after all, because on Mac the () apparently does not cause the variable to be autodetected as an array

13115 04/10/2014 02:24 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: mapped subspecies to new taxonverbatim.subspecies for easier access by validations queries

13114 04/10/2014 02:05 PM Aaron Marcuse-Kubitza

bugfix: web/.phpPgAdmin/.htaccess: work around phpPgAdmin bug that causes page to be ignored when not logged in

13113 04/10/2014 01:25 PM Aaron Marcuse-Kubitza

fix: inputs/test_taxonomic_names/Taxon/map.csv: scientificName: remapped to scientificName instead of taxonName as this does include the author for some names

13112 04/10/2014 01:25 PM Aaron Marcuse-Kubitza

fix: inputs/NY/Ecatalog_all/map.csv: ScientificName: remapped to scientificName instead of taxonName as this does include the author

13111 04/10/2014 01:17 PM Aaron Marcuse-Kubitza

fix: inputs/NY/validations.sql: specimens*_of_unique_verb_subsp_taxa_with_author: use taxonName instead of concatenating the ranks, as that corresponds to what we use as the concatenated taxonomic name

13110 04/10/2014 12:59 PM Aaron Marcuse-Kubitza

bugfix: inputs/NY/validations.sql: specimens*_of_verbatim_subspecific_taxa_with_author: need `subspecies IS NOT NULL` filter

13109 04/10/2014 12:57 PM Aaron Marcuse-Kubitza

bugfix: inputs/NY/validations.sql: _specimens_07_list_of_verbatim_subspecific_taxa_with_author: need to include subspecies (as _specimens_06_count_of_unique_verb_subsp_taxa_with_author does)

13108 04/10/2014 12:35 PM Aaron Marcuse-Kubitza

web/.phpPgAdmin/.htaccess: extract path components 1st->last: documented that can't use subject param for this because that goes to the last selected tab, not the default (leftmost) tab

13107 04/10/2014 12:03 PM Aaron Marcuse-Kubitza

bugfix: inputs/NY/validations.sql: specimens*_of_species_binomials: removed incorrect `subspecies IS NOT NULL` filter (this should be on *_of_unique_verb_subsp_taxa_with_author instead)

13106 04/10/2014 11:41 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonverbatim: added subspecies, as decided in the conference call (wiki.vegpath.org/2014-04-10_conference_call#VegBIEN-schema-2)

13105 04/10/2014 06:54 AM Aaron Marcuse-Kubitza

fix: schemas/vegbien.sql: plots* with duplicated rows: removed duplicated rows

13104 04/10/2014 06:45 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: specimens*: ran through pipeline

13103 04/10/2014 06:38 AM Aaron Marcuse-Kubitza

removed old version validation/aggregating/plots/SALVIAS/bien3_validations_salvias_db_original.sql. use validation/aggregating/plots/SALVIAS/_archive/bien3_validations_salvias_db_original.sql instead.

13102 04/10/2014 06:19 AM Aaron Marcuse-Kubitza

validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to inputs/NY/validations.sql

13101 04/10/2014 06:17 AM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: updated to DB

13100 04/10/2014 06:07 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: _specimens_16_list_distinct_specimen_descriptions: re-ran through pipeline after removing duplicated rows

13099 04/10/2014 06:02 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: rm_output_queries(): also support removing just a particular output query

13098 04/10/2014 05:26 AM Aaron Marcuse-Kubitza

bugfix: schemas/util.sql: remake_diff_table(): need to rm_freq() type_table, because left/right_table don't have freq yet

13097 04/10/2014 05:18 AM Aaron Marcuse-Kubitza

schemas/util.sql: auto_rm_freq(): use new rm_freq()

13096 04/10/2014 05:17 AM Aaron Marcuse-Kubitza

schemas/util.sql: added rm_freq(regclass[])

13095 04/10/2014 03:45 AM Aaron Marcuse-Kubitza

fix: inputs/NY/validations.sql: _specimens_16_list_distinct_specimen_descriptions: removed duplicated rows using DISTINCT

13094 04/10/2014 03:33 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: _specimens_11_list_of_three_standard_political_divisions: ran through pipeline

13093 04/10/2014 03:31 AM Aaron Marcuse-Kubitza

fix: schemas/vegbien.sql: _specimens_11_list_of_three_standard_political_divisions: use same column names as input query

13092 04/10/2014 03:10 AM Aaron Marcuse-Kubitza

schemas/util.sql: remake_diff_table(): result table comment: documented how to display NULL values that are extra or missing

13091 04/10/2014 02:40 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: ran through pipeline

13090 04/10/2014 02:38 AM Aaron Marcuse-Kubitza

fix: schemas/vegbien.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: also need to convert to text in GROUP BY/ORDER BY

13089 04/10/2014 02:34 AM Aaron Marcuse-Kubitza

bugfix: inputs/NY/validations.sql: _specimens_03_list_of_verbatim_families: use family as specified in query description, not as implemented

13088 04/10/2014 02:32 AM Aaron Marcuse-Kubitza

_license/UCSB/LICENSE.TXT: use (c) verbatim from the e-mail, not as displayed as © by Thunderbird

13087 04/10/2014 02:07 AM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql, inputs/NY/validations.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: cast this to text rather than date because some values for this field are not valid dates and will throw an error if cast to date

13086 04/09/2014 08:19 PM Aaron Marcuse-Kubitza

fix: inputs/NY/validations.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: matched type to output query

13085 04/09/2014 06:23 PM Aaron Marcuse-Kubitza

validation/aggregating/pipeline/aggregating_validations_pipeline.odg: show that the staging table(s) are denormalized before running the input queries on them. clarified that what is compared are the input and output query results, not the queries themselves.

13084 04/09/2014 02:55 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: _specimens_10_count_number_of_records_by_institution: ran through pipeline

13083 04/09/2014 02:48 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: removed `public.` prefix to avoid cluttering up the SQL

13082 04/09/2014 02:46 PM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_10_count_number_of_records_by_institution: need to dereference specimenreplicate.duplicate_institutions_sourcelist_id to the corresponding sourcelist.name

13081 04/09/2014 02:40 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: public_validations._specimens_*: added comments from validation/aggregating/specimens/qualitative_validations_specimens.sql

13080 04/09/2014 02:25 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: synced to schemas/vegbien.sql so that it can be diffed with it to sync qualitative_validations_specimens.sql to the DB

13079 04/09/2014 02:55 AM Aaron Marcuse-Kubitza

lib/sql_gen.py: map_expr(): documented that unlike bin/repl SQL identifier handling, this does simplify the resulting expression

13078 04/09/2014 02:54 AM Aaron Marcuse-Kubitza

lib/sql_gen.py: map_expr(): documented that this is a special case of bin/repl SQL identifier handling which does not handle entire source files

13077 04/09/2014 02:52 AM Aaron Marcuse-Kubitza

bin/repl: match as whole-word text (like SQL identifier): documented that this is a generalization of lib/sql_gen.py map_expr() to work on entire source files

13076 04/09/2014 02:50 AM Aaron Marcuse-Kubitza

bin/repl, lib/sql_gen.py Expression transforming: documented that this can also be done in Postgres with expression substitution (wiki.vegpath.org/Postgres_queries#expression-substitution)

13075 04/08/2014 03:49 PM Aaron Marcuse-Kubitza

fix: inputs/U/Specimen/map.csv: Genus: remapped to taxonName because this field is actually mislabeled in the original column names

13074 04/08/2014 02:55 PM Aaron Marcuse-Kubitza

validation/aggregating/pipeline/validations_on_sparse_datasources.odg: not applicable "✓": increased font size so the size of the character matches the surrounding text

13073 04/08/2014 02:52 PM Aaron Marcuse-Kubitza

validation/aggregating/pipeline/validations_on_sparse_datasources.odg: removed = lines for each input query, because they clutter up the diagram and the "same, so don't need to rewrite" message now shows this as well

13072 04/08/2014 02:50 PM Aaron Marcuse-Kubitza

validation/aggregating/pipeline/validations_on_sparse_datasources.odg: added the denormalized VegCore schema approach for comparison, as requested by Mark

13071 04/08/2014 01:52 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: remake_diff_tables(schema text): removed bien2_traits runtime because this applies only to one datasource. the bien2_traits runtime is now documented in inputs/bien2_traits/run.

13070 04/08/2014 01:40 PM Aaron Marcuse-Kubitza

inputs/NY/run: `make inputs/NY/validate`: updated runtime (6.5 min). this increases as more queries are able to run successfully.

13069 04/08/2014 01:38 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: public_validations: schema comment: documented how to run the validations. this information is also in the usage comment for public_validations.remake_diff_table(), but is copied here for easy reference.

13068 04/08/2014 01:19 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented runtime (5 min)

13067 04/08/2014 12:49 PM Aaron Marcuse-Kubitza

inputs/bien2_traits/run: documented `make inputs/bien2_traits/validate` runtime (9 min)

13066 04/07/2014 06:21 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: public_validations: specimens queries: added autogenerated ~type tables

13065 04/07/2014 06:19 PM Aaron Marcuse-Kubitza

inputs/NY/run: `make inputs/NY/validate`: updated runtime (5 min)

13064 04/07/2014 06:09 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: removed DDL statements, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#remove-DDL-statements

13063 04/07/2014 06:07 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: public_validations: added specimens queries to pipeline

13062 04/07/2014 05:51 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: parameterize queries by datasource

13061 04/07/2014 05:35 PM Aaron Marcuse-Kubitza

validation/aggregating/**.sql output queries: use `SET join_collapse_limit = 1;` to match public_validations.rematerialize_out_view()

13060 04/07/2014 05:17 PM Aaron Marcuse-Kubitza

fix: schemas/vegbien.sql: public_validations.rematerialize_out_view(text, regclass): run with join_collapse_limit = 1 to fix query planner issues. this option has been tested on the queries that do not yet use the standard join sequence (plots #11,12,13,14,16,17,18), and all of these queries also work fine with join_collapse_limit = 1. (the standard join sequence is used to ensure both correctness of the query and compatibility with join_collapse_limit = 1, but in some cases is not needed for join_collapse_limit.)

13059 04/07/2014 04:35 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: turn off join_collapse_limit instead of enable_mergejoin/enable_hashjoin, because join_collapse_limit is something that we will eventually want to turn off for all queries, which would avoid this query needing special handling. (on the other hand, enable_mergejoin/enable_hashjoin may be necessary for some queries and we probably won't turn them off for all queries.)

13058 04/07/2014 01:43 PM Aaron Marcuse-Kubitza

bugfix: lib/runscripts/table.run: table_make_install(): need to ignore skip_table() errexit

13057 04/07/2014 10:39 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: import_vars: documented that vars already set will not be overwritten

13056 04/07/2014 09:47 AM Aaron Marcuse-Kubitza

inputs/NY/run: documented `make inputs/NY/validate` runtime (2 min, currently for the input queries)

13055 04/04/2014 06:13 PM Aaron Marcuse-Kubitza

added inputs/Madidi/_src/ to match wiki steps in wiki.vegpath.org/Adding_a_flat-file_datasource

13054 04/03/2014 07:31 PM Aaron Marcuse-Kubitza

added validation/aggregating/pipeline/validations_on_sparse_datasources.odg

13053 04/03/2014 04:13 PM Aaron Marcuse-Kubitza

planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx

13052 04/03/2014 04:09 PM Aaron Marcuse-Kubitza

planning/workflow/bien3_architecture.pptx: stage I: made all datasources the same height so that the denormalized VegCore schema boxes would all look exactly the same. widened the denormalized VegCore schema boxes to make it visually clear that they have more columns than the staging tables denormalized together

13051 04/03/2014 03:40 PM Aaron Marcuse-Kubitza

planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx

13050 04/03/2014 03:39 PM Aaron Marcuse-Kubitza

planning/workflow/bien3_architecture.pptx: updated to reflect decisions made in the 2014-04-03 conference call (wiki.vegpath.org/2014-04-03_conference_call#import-process-2)