# Date Author Comment
13086 04/09/2014 08:19 PM Aaron Marcuse-Kubitza

fix: inputs/NY/validations.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: matched type to output query

13085 04/09/2014 06:23 PM Aaron Marcuse-Kubitza

validation/aggregating/pipeline/aggregating_validations_pipeline.odg: show that the staging table(s) are denormalized before running the input queries on them. clarified that what is compared are the input and output query results, not the queries themselves.

13084 04/09/2014 02:55 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: _specimens_10_count_number_of_records_by_institution: ran through pipeline

13083 04/09/2014 02:48 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: removed `public.` prefix to avoid cluttering up the SQL

13082 04/09/2014 02:46 PM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_10_count_number_of_records_by_institution: need to dereference specimenreplicate.duplicate_institutions_sourcelist_id to the corresponding sourcelist.name

13081 04/09/2014 02:40 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: public_validations._specimens_*: added comments from validation/aggregating/specimens/qualitative_validations_specimens.sql

13080 04/09/2014 02:25 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: synced to schemas/vegbien.sql so that it can be diffed with it to sync qualitative_validations_specimens.sql to the DB

13079 04/09/2014 02:55 AM Aaron Marcuse-Kubitza

lib/sql_gen.py: map_expr(): documented that unlike bin/repl SQL identifier handling, this does simplify the resulting expression

13078 04/09/2014 02:54 AM Aaron Marcuse-Kubitza

lib/sql_gen.py: map_expr(): documented that this is a special case of bin/repl SQL identifier handling which does not handle entire source files

13077 04/09/2014 02:52 AM Aaron Marcuse-Kubitza

bin/repl: match as whole-word text (like SQL identifier): documented that this is a generalization of lib/sql_gen.py map_expr() to work on entire source files

13076 04/09/2014 02:50 AM Aaron Marcuse-Kubitza

bin/repl, lib/sql_gen.py Expression transforming: documented that this can also be done in Postgres with expression substitution (wiki.vegpath.org/Postgres_queries#expression-substitution)

13075 04/08/2014 03:49 PM Aaron Marcuse-Kubitza

fix: inputs/U/Specimen/map.csv: Genus: remapped to taxonName because this field is actually mislabeled in the original column names

13074 04/08/2014 02:55 PM Aaron Marcuse-Kubitza

validation/aggregating/pipeline/validations_on_sparse_datasources.odg: not applicable "✓": increased font size so the size of the character matches the surrounding text

13073 04/08/2014 02:52 PM Aaron Marcuse-Kubitza

validation/aggregating/pipeline/validations_on_sparse_datasources.odg: removed = lines for each input query, because they clutter up the diagram and the "same, so don't need to rewrite" message now shows this as well

13072 04/08/2014 02:50 PM Aaron Marcuse-Kubitza

validation/aggregating/pipeline/validations_on_sparse_datasources.odg: added the denormalized VegCore schema approach for comparison, as requested by Mark

13071 04/08/2014 01:52 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: remake_diff_tables(schema text): removed bien2_traits runtime because this applies only to one datasource. the bien2_traits runtime is now documented in inputs/bien2_traits/run.

13070 04/08/2014 01:40 PM Aaron Marcuse-Kubitza

inputs/NY/run: `make inputs/NY/validate`: updated runtime (6.5 min). this increases as more queries are able to run successfully.

13069 04/08/2014 01:38 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: public_validations: schema comment: documented how to run the validations. this information is also in the usage comment for public_validations.remake_diff_table(), but is copied here for easy reference.

13068 04/08/2014 01:19 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented runtime (5 min)

13067 04/08/2014 12:49 PM Aaron Marcuse-Kubitza

inputs/bien2_traits/run: documented `make inputs/bien2_traits/validate` runtime (9 min)

13066 04/07/2014 06:21 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: public_validations: specimens queries: added autogenerated ~type tables

13065 04/07/2014 06:19 PM Aaron Marcuse-Kubitza

inputs/NY/run: `make inputs/NY/validate`: updated runtime (5 min)

13064 04/07/2014 06:09 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: removed DDL statements, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#remove-DDL-statements

13063 04/07/2014 06:07 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: public_validations: added specimens queries to pipeline

13062 04/07/2014 05:51 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: parameterized queries by datasource

13061 04/07/2014 05:35 PM Aaron Marcuse-Kubitza

validation/aggregating/**.sql output queries: use `SET join_collapse_limit = 1;` to match public_validations.rematerialize_out_view()

13060 04/07/2014 05:17 PM Aaron Marcuse-Kubitza

fix: schemas/vegbien.sql: public_validations.rematerialize_out_view(text, regclass): run with join_collapse_limit = 1 to fix query planner issues. this option has been tested on the queries that do not yet use the standard join sequence (plots #11,12,13,14,16,17,18), and all of these queries also work fine with join_collapse_limit = 1. (the standard join sequence is used to ensure both correctness of the query and compatibility with join_collapse_limit = 1, but in some cases is not needed for join_collapse_limit.)

13059 04/07/2014 04:35 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: turn off join_collapse_limit instead of enable_mergejoin/enable_hashjoin, because join_collapse_limit is something that we will eventually want to turn off for all queries, which would avoid this query needing special handling. (on the other hand, enable_mergejoin/enable_hashjoin may be necessary for some queries and we probably won't turn them off for all queries.)

13058 04/07/2014 01:43 PM Aaron Marcuse-Kubitza

bugfix: lib/runscripts/table.run: table_make_install(): need to ignore skip_table() errexit

13057 04/07/2014 10:39 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: import_vars: documented that vars already set will not be overwritten

13056 04/07/2014 09:47 AM Aaron Marcuse-Kubitza

inputs/NY/run: documented `make inputs/NY/validate` runtime (2 min, currently for the input queries)

13055 04/04/2014 06:13 PM Aaron Marcuse-Kubitza

added inputs/Madidi/_src/ to match wiki steps in wiki.vegpath.org/Adding_a_flat-file_datasource

13054 04/03/2014 07:31 PM Aaron Marcuse-Kubitza

added validation/aggregating/pipeline/validations_on_sparse_datasources.odg

13053 04/03/2014 04:13 PM Aaron Marcuse-Kubitza

planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx

13052 04/03/2014 04:09 PM Aaron Marcuse-Kubitza

planning/workflow/bien3_architecture.pptx: stage I: made all datasources the same height so that the denormalized VegCore schema boxes would all look exactly the same. widened the denormalized VegCore schema boxes to make it visually clear that they have more columns than the staging tables denormalized together

13051 04/03/2014 03:40 PM Aaron Marcuse-Kubitza

planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx

13050 04/03/2014 03:39 PM Aaron Marcuse-Kubitza

planning/workflow/bien3_architecture.pptx: updated to reflect decisions made in the 2014-04-03 conference call (wiki.vegpath.org/2014-04-03_conference_call#import-process-2)

13049 04/03/2014 08:53 AM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_14_count_of_all_invalid_verbatim_lat_long

13048 04/03/2014 08:35 AM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_12_distinct_collector_name_collect_num_date_w_count

13047 04/03/2014 08:04 AM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: fixed whitespace

13046 04/03/2014 07:32 AM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: removed trailing whitespace

13045 04/03/2014 07:31 AM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_13_count_of_all_verbatim_and_decimal_lat_long

13044 04/02/2014 05:55 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_11_list_of_three_standard_political_divisions

13043 04/02/2014 05:36 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: *_of_species_binomials: switched back to the old queries that use the split-apart ranks instead of the concatenated taxon name. note that these will not work on all specimens datasources, but now that #6,7 were selected to use the concatenated taxon name, this isn't a problem.

13042 04/02/2014 05:21 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_species_binomials: renamed columns to species_binomial to reflect reverted query name

13041 04/02/2014 05:16 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_verbatim_species_excluding_author: renamed to *_species_binomials for clarity

13040 04/02/2014 05:14 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _specimens_04_count_of_unique_verbatim_species_with_author, _specimens_05_list_of_unique_verbatim_species_with_author: switched back to original names because #6,7 now do the same thing as #4,5, so we should include the differing result set of #4,5 for datasources that provide it

13039 04/02/2014 05:01 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_10_count_number_of_records_by_institution

13038 04/02/2014 04:38 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: use taxon_name*_with_author everywhere instead of custom column names, for consistency

13037 04/02/2014 04:09 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_verbatim_subspecific_taxa_without_author, etc.: renamed to *_with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide

13036 04/02/2014 04:03 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_06_count_of_unique_verb_subsp_taxa_without_author, _specimens_07_list_of_verbatim_subspecific_taxa_without_author

13035 04/02/2014 03:54 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_verbatim_species_without_author, etc.: renamed to *_with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide

13034 04/02/2014 03:14 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: removed extra ; at ends of queries

13033 04/02/2014 03:13 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: use the concatenated taxon name instead of concatenating the ranks, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)

13032 04/02/2014 03:05 PM Aaron Marcuse-Kubitza

validation/aggregating/specimens/qualitative_validations_specimens.sql: use the concatenated taxon name instead of concatenating the ranks, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)

13031 04/02/2014 11:17 AM Aaron Marcuse-Kubitza

/README.TXT: Full database import: disk space: added high-water mark of 1.8 TB @11:15:05

13030 04/02/2014 10:56 AM Aaron Marcuse-Kubitza

/README.TXT: Full database import: added steps to figure out which datasource tables were not successfully imported due to disk space errors

13029 04/02/2014 10:45 AM Aaron Marcuse-Kubitza

fix: /README.TXT: Full database import: moved verification of exit statuses before verification of DB contents because there is no point in verifying the DB if the datasources didn't finish importing

13028 04/02/2014 09:01 AM Aaron Marcuse-Kubitza

/README.TXT: Full database import: disk space: documented that the entire disk again gets used long after the beginning of the import, when only a few datasources are running (i.e., it definitely seems to be a recent bug in Postgres, and not a latent problem)

13027 04/01/2014 05:40 PM Aaron Marcuse-Kubitza

/README.TXT: Maintenance: added task to regularly re-run full-database import so that bugs in it don't pile up. it needs to be kept in working order so that it works when it's needed.

13026 04/01/2014 04:24 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: added steps to manually reimport the applicable datasources if there are errors due to exceeding available disk space

13025 04/01/2014 04:13 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: removed extra `ssh -t vegbiendev.nceas.ucsb.edu` before "upload logs", because the previous steps also occur on vegbiendev

13024 04/01/2014 04:04 PM Aaron Marcuse-Kubitza

/README.TXT: Notes on system stability: added recommendation to maintain a snapshot copy of the VM as it was at the last successful import, for fallback use if a system upgrade breaks anything. system upgrades on the snapshot VM should be disabled completely, and because this will also disable security fixes, the snapshot VM should be disconnected from the internet and all networking interfaces. (this is an unfortunate consequence of modern OSes being written in non-memory-safe languages such as C and C++.)

13023 04/01/2014 03:43 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: disk space: documented that a higher high-water mark occurs later in the import, so the disk usage issue remains a problem after the very beginning

13022 04/01/2014 03:37 PM Aaron Marcuse-Kubitza

fix: /README.TXT: Full database import: disk space: increased the minimum free space recommendation to 1 TB, because analysis of the disk usage during the beginning of the import shows that actually close to the entire amount is being used. however, this problem is normally undetectable unless the disk space is specifically checked, because it only manifests itself if the available disk space is exceeded completely.

13021 04/01/2014 02:04 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: documented that the beginning of the import should be scheduled at a time when the DB will not be needed for other uses, because vegbiendev will be slow for the first few hours of the import due to the import using all the available cores

13020 04/01/2014 01:36 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: documented that CPU load warning e-mails can safely be ignored. they happen because the parallel imports use all the available cores.

13019 04/01/2014 01:31 PM Aaron Marcuse-Kubitza

fix: lib/common.Makefile: $(nice): use an increment of +10 instead of +5 because +5 still leaves the shell sluggish

13018 04/01/2014 01:29 PM Aaron Marcuse-Kubitza

lib/common.Makefile: added $(nice) and use it everywhere its definition is used

13017 04/01/2014 01:14 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: exiting `screen`: clarified that you must use `exit`, as Ctrl+D gets disabled to prevent accidental exits

13016 04/01/2014 12:47 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: added step to restart Postgres to free up any disk space used by temp tables from the last import (this is apparently not automatically reclaimed)

13015 04/01/2014 12:45 PM Aaron Marcuse-Kubitza

/Makefile: postgres_restart-Linux: documented that the manual running of the command is needed because for some reason, pg_ctl does not work when run inside make

13014 04/01/2014 12:43 PM Aaron Marcuse-Kubitza

fix: /Makefile: postgres_restart-Linux: added pause after telling the user the command to run

13013 04/01/2014 12:42 PM Aaron Marcuse-Kubitza

/Makefile: $(postgresReload-*): use postgres_restart for the postgres-restarting step

13012 04/01/2014 12:30 PM Aaron Marcuse-Kubitza

bugfix: /Makefile: postgres_restart: added separate Linux version that deals with Linux-specific issues (as in $(postgresReload-Linux))

13011 04/01/2014 12:15 PM Aaron Marcuse-Kubitza

/Makefile: added postgres_restart, since this is often invoked separately from the entire postgres_reload target

13010 04/01/2014 11:40 AM Aaron Marcuse-Kubitza

/README.TXT: Full database import: disk space: increased minimum requirement to 500GB (~200GB extra), as the import may use significant additional space for temp tables

13009 04/01/2014 11:37 AM Aaron Marcuse-Kubitza

/README.TXT: Full database import: documented that env vars set before invoking `screen` will be inherited by it, so these steps will work even if they come before `screen`

13008 04/01/2014 11:26 AM Aaron Marcuse-Kubitza

backups/TNRS.backup.md5: updated

13007 04/01/2014 11:23 AM Aaron Marcuse-Kubitza

/README.TXT: Full database import: added steps to set a custom version, if the auto-assigned one would cause a collision with the last import

13006 04/01/2014 11:08 AM Aaron Marcuse-Kubitza

/README.TXT: Full database import: `unset version`: documented that this is needed because it may have been set in the outer shell

13005 03/30/2014 07:54 PM Aaron Marcuse-Kubitza

fix: lib/sql_io.py: put_table(): don't warn if can't create pkey, because this just indicates that a set-returning function was used. this should get rid of the last of the confusing benign warnings in the test output.

13004 03/30/2014 07:53 PM Aaron Marcuse-Kubitza

fix: lib/sql.py: flatten(): don't warn if can't create pkey, because this just indicates that a set-returning function was used

13003 03/30/2014 07:52 PM Aaron Marcuse-Kubitza

lib/sql.py: run_query_into(): added add_pkey_warn param to support turning off "could not create unique index" warnings, which are sometimes benign (e.g. when using set-returning functions with column-based import)

13002 03/30/2014 06:52 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: disk space: updated schema size (315GB)

13001 03/30/2014 06:45 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: removed `up` on jupiter because this is done as part of "do steps under Maintenance > 'to synchronize vegbiendev, ...'"

13000 03/30/2014 06:44 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: moved "do steps under Maintenance > 'to synchronize vegbiendev, ...'" outside of "On local machine" because these steps don't only take place on the local machine

12999 03/30/2014 06:41 PM Aaron Marcuse-Kubitza

/README.TXT: use `up` instead of `svn up --force` for consistency

12998 03/30/2014 06:40 PM Aaron Marcuse-Kubitza

fix: /README.TXT: always use `up` instead of `svn up` since this includes --force

12997 03/30/2014 06:39 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: removed unneeded `ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk` at beginning since this is performed again the first time it's needed

12996 03/30/2014 06:38 PM Aaron Marcuse-Kubitza

fix: /README.TXT: Full database import: removed erroneous line that resulted from a search-and-replace of connection commands in r12396. (it used to read "Follow the steps under Connecting to vegbiendev above, using jupiter instead". this step is now performed on the line below it.)

12995 03/30/2014 06:31 PM Aaron Marcuse-Kubitza

bin/make_analytical_db: removed remake_diff_tables() because this is now done for each datasource in inputs/input.Makefile

12994 03/30/2014 06:28 PM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql: schemas/vegbien.sql(): need to util.use_schema(schema_anchor) before initializing vars that use own-schema functions

12993 03/30/2014 06:12 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: validate: redirect the output to the log, as for other import-related operations

12992 03/30/2014 06:08 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: import: validate at the end of the import

12991 03/30/2014 06:02 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: added new-style aggregating validations (`validate` target)

12990 03/30/2014 06:02 PM Aaron Marcuse-Kubitza

bin/make_analytical_db: removed no longer needed "${public}_validations" schema qualifier, now that it is in the search_path

12989 03/30/2014 06:00 PM Aaron Marcuse-Kubitza

fix: bin/vegbien_dest: added public_validations

12988 03/30/2014 05:41 PM Aaron Marcuse-Kubitza

added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt, which is useful to see what fields will be available when we switch to the new GBIF export format

12987 03/30/2014 05:39 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: removed end_try_subshell, which now does the same thing as end_try