# Date Author Comment
10944 09/12/2013 06:43 PM Aaron Marcuse-Kubitza

inputs/VegBank/: prepended the table name to each column name to prevent column collisions, using the steps at http://wiki.vegpath.org/Left-joining_a_datasource
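The collision this avoids can be sketched as follows. This is only an illustration: `prefix_columns()` and the `.` separator are hypothetical, and the actual naming scheme is whatever the linked wiki steps prescribe.

```python
from typing import List


def prefix_columns(table: str, columns: List[str], sep: str = ".") -> List[str]:
    """Prepend the table name to each column name.

    Hypothetical helper: the separator is an assumption, not taken from
    the project.
    """
    return [f"{table}{sep}{col}" for col in columns]


# Two tables that both have an "id" column would collide once left-joined
# into a single flat table; prefixing keeps the joined column names unique.
plot_cols = prefix_columns("plot", ["id", "name"])
project_cols = prefix_columns("project", ["id", "name"])
joined_header = plot_cols + project_cols
```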

10943 09/12/2013 06:17 PM Aaron Marcuse-Kubitza

inputs/VegBank/: switched to new-style import, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource

10942 09/12/2013 06:13 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/stemlocation_/map.csv: put columns in table order, which is needed by new-style import

10941 09/12/2013 05:57 PM Aaron Marcuse-Kubitza

inputs/VegBank/stemlocation_/: translated one-to-many mappings to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns

10940 09/12/2013 05:49 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/taxonobservation_/map.csv: put columns in table order, which is needed by new-style import

10939 09/12/2013 05:26 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/plot_/postprocess.sql: coordinateUncertaintyInMeters: need to use GREATEST instead of _alt() to handle cases where the coordinate uncertainty is greater than the fuzzing uncertainty, where you wouldn't want to just use the smaller fuzzing uncertainty
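A minimal sketch of the difference, assuming `_alt()` behaves like a first-non-NULL chooser (the log does not define it, so `first_non_null()` below is a hypothetical stand-in). `greatest()` mimics PostgreSQL's `GREATEST`, which ignores NULLs and returns NULL only when every argument is NULL. The numeric values are made up for illustration.

```python
from typing import Optional


def first_non_null(*values: Optional[float]) -> Optional[float]:
    """Hypothetical stand-in for an _alt()-style helper: first non-None value."""
    for value in values:
        if value is not None:
            return value
    return None


def greatest(*values: Optional[float]) -> Optional[float]:
    """Mimic PostgreSQL GREATEST: skip NULLs, None only if all are None."""
    present = [value for value in values if value is not None]
    return max(present) if present else None


# Illustrative values: fuzzing uncertainty 100 m, recorded uncertainty 5000 m.
fuzzing_m, recorded_m = 100.0, 5000.0
understated = first_non_null(fuzzing_m, recorded_m)  # picks the smaller fuzzing value
correct = greatest(fuzzing_m, recorded_m)            # keeps the larger uncertainty
```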

10938 09/12/2013 05:20 PM Aaron Marcuse-Kubitza

inputs/VegBank/plot_/: translated multi-column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns

10937 09/12/2013 05:11 PM Aaron Marcuse-Kubitza

inputs/VegBank/plot_/postprocess.sql: map_*() derived cols: updated runtime

10936 09/12/2013 05:10 PM Aaron Marcuse-Kubitza

inputs/VegBank/plot_/: translated single-column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns

10935 09/12/2013 04:36 PM Aaron Marcuse-Kubitza

inputs/VegBank/stemcount_/: translated multi-column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns

10934 09/12/2013 04:31 PM Aaron Marcuse-Kubitza

inputs/VegBank/stemlocation_/: translated multi-column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns

10933 09/12/2013 04:30 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/postprocess.sql: scientificName: recorded runtime (15 s)

10932 09/12/2013 04:15 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/: translated multi-column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns

10931 09/12/2013 04:14 PM Aaron Marcuse-Kubitza

inputs/VegBank/taxonobservation_/: translated multi-column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns

10930 09/12/2013 03:37 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/postprocess.sql: use much simpler LEFT JOINs instead of nested RIGHT JOINs, which required lots of () to get them to happen in the right order. note that the columns are now provided in reverse instead of forwards path order, but this is still much clearer than the nested mess of RIGHT JOINs. this approach can also be used to simplify VegBank's joins.

10929 09/12/2013 03:34 PM Aaron Marcuse-Kubitza

bugfix: lib/runscripts/view.run: remake_VegBIEN_mappings(): also need to remake header.csv, not just map.csv as for tables, because view columns may change when the view is regenerated

10928 09/12/2013 02:42 PM Aaron Marcuse-Kubitza

schemas/VegCore/VegCore.ERD.mwb: specimen: changed definition to "something collected from a plant" rather than just "a physical part of a plant", to support using this table for identifying pictures and descriptions of a plant (as DwC does)

10927 09/12/2013 02:28 PM Aaron Marcuse-Kubitza

schemas/VegCore/VegCore.ERD.mwb: regenerated exports and updated image map

10926 09/12/2013 02:24 PM Aaron Marcuse-Kubitza

schemas/VegCore/VegCore.ERD.mwb: reobservable_presence: allow it to be vouchered by any reobservable element (including a tagged individual), not just a specimen

10925 09/12/2013 02:01 PM Aaron Marcuse-Kubitza

schemas/VegCore/VegCore.ERD.mwb: specimen.defining_data: clarified that the observations in this are actually a subset of individual_observation.traits (specifically, the subset that can be used to make a taxonomic redetermination). information in this field should therefore always also be stored in individual_observation.traits.

10924 09/12/2013 01:54 PM Aaron Marcuse-Kubitza

schemas/VegCore/VegCore.ERD.mwb: specimen: added specimen_unique_in_individual_observation unique constraint, analogous to specimen_unique_in_individual

10923 09/12/2013 01:34 PM Aaron Marcuse-Kubitza

schemas/VegCore/VegCore.ERD.mwb: regenerated exports and updated image map

10922 09/12/2013 01:29 PM Aaron Marcuse-Kubitza

schemas/VegCore/VegCore.ERD.mwb: specimen: added defining_data, which for a digital-only specimen, stores the information that comprises the specimen. note that a taxon_presence without a physical voucher can still qualify as reobservable if a detailed description of it is provided in this field, to make taxonomic redeterminations on. for datasources like VegBank, which incorrectly allow multiple taxon_determinations for any type of taxon_observation, their taxonomic redeterminations would actually be considered invalid if made on a purely taxon_presence observation (i.e. just a taxon name) without a detailed description that could be used to make a redetermination. this is different than the scrubbing of a taxon name, which relates a taxon name to another taxon name, rather than a taxon_observation to a completely different taxon name.

10921 09/12/2013 12:35 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/util.sh: set_fds(): don't add surrounding quotes to empty redirect dest

10920 09/12/2013 12:31 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/util.sh: set_fds(): need to check if redirect is empty before escaping it with `printf %q`, which may add surrounding quotes to an empty string
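The same pitfall exists in Python's `shlex.quote()`, which, like bash's `printf %q`, turns an empty string into a pair of quotes. The sketch below is a Python analogue of the check described here, not the code in lib/sh/util.sh; `escape_redirect_dest()` is a hypothetical name.

```python
import shlex


def escape_redirect_dest(dest: str) -> str:
    """Shell-escape a redirect destination, leaving an empty one empty.

    Hypothetical analogue of the set_fds() fix: test for emptiness before
    escaping, because quoting "" yields the two-character string "''".
    """
    if dest == "":
        return ""
    return shlex.quote(dest)


naive_empty = shlex.quote("")              # quotes appear around nothing
checked_empty = escape_redirect_dest("")   # stays empty
escaped_file = escape_redirect_dest("my file.txt")
```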

10919 09/12/2013 11:40 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: attribution and conditions of use: documented that Brad/Brian/Bob should work on this, as decided in the conference call (wiki.vegpath.org/2013-09-12_conference_call#data-provider-metadata)

10918 09/12/2013 05:47 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: reformatted to fit all rows and all per-week columns on one page

10917 09/12/2013 05:30 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: streamline process of mapping and adding a new datasource: added subtask to create interactive scripts for each import step

10916 09/12/2013 05:15 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: improve and complete data provider metadata: moved to end because this can also be added manually to the source table, and does not have to be in place before running column-based import

10915 09/12/2013 05:09 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: flatten the datasources to a common schema: added subtask to left-join unvalidated datasources since they need the flattening in order to validate them properly

10914 09/12/2013 04:21 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: rebalanced dots

10913 09/12/2013 04:15 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: moved items marked later to separate section at bottom

10912 09/12/2013 04:13 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: moved revisions to schema under datasource validations because schema changes are largely driven by the validation problems uncovered

10911 09/12/2013 04:12 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: split tasks into weeks

10910 09/12/2013 03:47 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: updated for progress

10909 09/12/2013 03:35 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: split months into (currently identical) weeks

10908 09/12/2013 03:19 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: added During month of label above months

10907 09/12/2013 03:09 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: switched to portrait mode to better fit the new format, which hides columns for past months

10906 09/12/2013 03:05 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: hid crossed out rows to show just the remaining tasks

10905 09/12/2013 03:03 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: crossed out avoid DB restructuring when ingesting a new datasource, because FIA (which is flattened before import) does properly support optional subplots and diamond linking of subplots to parent plot events, which were necessary to ingest an arbitrary flattened plots datasource

10904 09/12/2013 02:55 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: crossed out fully-completed tasks. rebalanced dots.

10903 09/12/2013 02:46 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: moved switching to new-style import to top of streamline process of mapping and adding a new datasource because this puts all the datasource adding steps (except filling in the mappings) into one rerunnable script

10902 09/12/2013 02:36 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: hid columns for past months so that the current and future months are right next to each task

10901 09/12/2013 02:31 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: moved streamline process of mapping and adding a new datasource before documentation testing because this will assist the documentation tester in running the import process

10900 09/12/2013 02:26 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: moved geoscrubbing re-run under add any missing columns because this is needed to fully populate the geoscrubbing columns

10899 09/12/2013 02:20 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: added documentation testing, usability testing priority tasks (wiki.vegpath.org/Priority_tasks). lowercased tasks for consistency with the wiki and to avoid needing to sentence case new subtasks.

10898 09/12/2013 01:53 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: moved Flatten the datasources to a common schema under Datasource validations because the query left-joining the tables is needed for validation, and it is much easier to validate datasources when there is only one input table to validate

10897 09/11/2013 02:52 PM Aaron Marcuse-Kubitza

added derived/biengeo/Geovalidation_and_geoscrubbing_update.presentation.url

10896 09/09/2013 06:12 PM Aaron Marcuse-Kubitza

added BIEN2/traits_observation_counts.xls

10895 09/09/2013 05:44 PM Aaron Marcuse-Kubitza

/README.TXT: Single datasource import: removed rescrub step because this is not needed by the current TNRS process

10894 09/09/2013 02:04 PM Aaron Marcuse-Kubitza

web/links/index.htm: updated to Firefox bookmarks. MySQL: added steps to add a user if you are not root but have sudo access.

10893 09/07/2013 08:19 PM Aaron Marcuse-Kubitza

BIEN2/country_species/: svn:ignore the .tsv exports

10892 09/07/2013 08:19 PM Aaron Marcuse-Kubitza

BIEN2/country_species/run: documented runtime (1 min)

10891 09/07/2013 08:15 PM Aaron Marcuse-Kubitza

added BIEN2/country_species/run, which exports each BIEN2 country's species list

10890 09/07/2013 08:14 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/util.sh: set_fds(): need to escape redirect destinations which are files, because they may contain special shell characters

10889 09/07/2013 08:10 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: added rm_prefix()

10888 09/07/2013 07:11 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: mysql_cmd(): added caller usage with connection/login opts

10887 09/07/2013 07:08 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: mysql(), mysql_export(): usage: added database=...

10886 09/07/2013 12:30 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: Data provider validations: renamed to Datasource validations to clarify that this is a validation of the datasources, but not necessarily by the data providers

10885 09/05/2013 07:19 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: added Running individual steps separately label for the section that is not part of the main import, but is useful if the import is aborted part of the way through

10884 09/05/2013 05:02 PM Aaron Marcuse-Kubitza

/README.TXT: moved Single datasource import, Datasource setup to top since these are the most important howtos

10883 09/05/2013 04:14 PM Aaron Marcuse-Kubitza

bugfix: schemas/Makefile: enclose schema names in "" so that they won't be lowercased

10882 09/05/2013 03:56 PM Aaron Marcuse-Kubitza

bugfix: schemas/Makefile, lib/common.Makefile: enclose schema names in "" so that they won't be lowercased
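PostgreSQL folds unquoted identifiers to lowercase and preserves the case of double-quoted ones, which is why the quoting matters for mixed-case schema names. A small sketch of that rule follows; both function names are hypothetical and this is not the Makefile code.

```python
def quote_identifier(name: str) -> str:
    """Wrap a name in double quotes, doubling any embedded quote characters."""
    return '"' + name.replace('"', '""') + '"'


def fold_identifier(ident: str) -> str:
    """Approximate how PostgreSQL resolves an identifier as written in SQL.

    Quoted identifiers keep their case; unquoted ones are lowercased.
    """
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        return ident[1:-1].replace('""', '"')
    return ident.lower()


unquoted = fold_identifier("VegCore")                  # case is lost
quoted = fold_identifier(quote_identifier("VegCore"))  # case is preserved
```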

10881 09/05/2013 03:26 PM Aaron Marcuse-Kubitza

/run: geoscrub_input/make(): updated runtime (20 s)

10880 09/05/2013 01:31 PM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: Data provider validations (spot-checking): moved ahead of Individual datasource refresh as decided in conference call

10879 09/05/2013 01:29 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_plot: added aggregateOrganismObservationID from analytical_stem

10878 09/05/2013 08:38 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: updated for progress

10877 09/05/2013 08:37 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: Data provider validations: added subtask for Aggregated validations (counts)

10876 09/05/2013 01:17 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: analytical DB: updated rowcount

10875 09/05/2013 01:14 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: updated import times

10874 09/05/2013 01:01 AM Aaron Marcuse-Kubitza

inputs/input.Makefile: reimport: don't remove the existing import first, because it will instead be removed by the publish step. this ensures there is always one complete copy of the datasource in the DB.

10873 09/05/2013 01:00 AM Aaron Marcuse-Kubitza

added backups/vegbien.r10848.backup.md5

10872 09/05/2013 12:59 AM Aaron Marcuse-Kubitza

backups/TNRS.backup.md5: updated

10871 09/05/2013 12:11 AM Aaron Marcuse-Kubitza

bugfix: bin/import_all: use reimport_scrub instead of import_scrub so that the temp suffix of the datasource name is removed

10870 09/05/2013 12:02 AM Aaron Marcuse-Kubitza

inputs/input.Makefile: reimport: use import_publish instead of import so that the reimport replaces the previous import

10869 09/04/2013 11:59 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: added import_publish, which removes the temp suffix when the import is done

10868 09/04/2013 11:48 PM Aaron Marcuse-Kubitza

bugfix: bin/after_import: run backups/fix_perms right after the backup files are created to make them private

10867 09/04/2013 11:32 PM Aaron Marcuse-Kubitza

bugfix: backups/fix_perms: just make the backups themselves private, since the other files are in svn, and their permissions should match their accessibility through Redmine

10866 09/04/2013 11:06 PM Aaron Marcuse-Kubitza

inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix

10865 09/04/2013 05:27 PM Aaron Marcuse-Kubitza

bugfix: bin/make_analytical_db: `/run export_`: don't take input from the terminal, because this causes rm to prompt the user (from a background task) about overwriting the previous export

10864 09/04/2013 05:26 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: Publish the new import: added runtime (1 min)

10863 09/04/2013 03:00 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: $(map2db): import to datasrc.new instead of plain datasrc, so that the current import of the datasrc is not overwritten

10862 09/04/2013 02:59 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: added publish (`make inputs/src/publish`)

10861 09/04/2013 02:55 PM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql: source: removed testing row that had gotten in during `make schemas/remake`

10860 09/04/2013 02:43 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: added %/publish (`make inputs/src/src.version/publish`)

10859 09/04/2013 02:32 PM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql: datasource_publish(): need to remove the current live datasource instead of the datasource to publish. note that datasource_rename() does not currently generate an error if the specified datasource doesn't exist.

10858 09/04/2013 02:27 PM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql: datasource_publish(): run it in a nested transaction so that there is always one published copy of the datasource. (note that a nested transaction is not automatically created for each function, http://stackoverflow.com/questions/6274457/set-isolation-level-for-postgresql-stored-procedures?In_PG_your_procedures_aren%27t_separate_transactions#answer-6283201 .)
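The idea can be sketched with SQLite savepoints standing in for a PostgreSQL nested transaction. Everything here is an assumption made for illustration: the `source(shortname, ...)` table layout, the `.new` staging suffix handling, the `publish()` helper, and a connection opened with `isolation_level=None`. It is not the body of datasource_publish().

```python
import sqlite3


def publish(conn: sqlite3.Connection, name: str) -> None:
    """Replace the live datasource `name` with the staged `name + ".new"`.

    The delete and rename happen inside one savepoint, so a failure
    restores the previously published copy instead of leaving none.
    """
    staged = name + ".new"
    conn.execute("SAVEPOINT publish")
    try:
        # Remove the current live copy, then promote the staged one.
        conn.execute("DELETE FROM source WHERE shortname = ?", (name,))
        cur = conn.execute(
            "UPDATE source SET shortname = ? WHERE shortname = ?",
            (name, staged),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no staged datasource named {staged!r}")
    except Exception:
        # Undo the delete so the old live copy is still present.
        conn.execute("ROLLBACK TO publish")
        raise
    finally:
        conn.execute("RELEASE publish")
```

Unlike the datasource_rename() described in r10859, this sketch raises an error when the staged datasource does not exist, which is what triggers the rollback.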

10857 09/04/2013 01:57 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: added datasource_publish()

10856 09/04/2013 01:53 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: added datasource_rename()

10855 09/04/2013 01:51 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: added rm_version_suffix()

10854 09/04/2013 01:28 PM Aaron Marcuse-Kubitza

bin/map: allow user to override the source env var, which is used as the source.shortname value in the DB

10853 09/04/2013 09:43 AM Aaron Marcuse-Kubitza

exports/: svn:ignore *.zip

10852 09/04/2013 09:42 AM Aaron Marcuse-Kubitza

inputs/WIN/Specimen/unmapped_terms.csv: updated

10851 09/04/2013 09:37 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: updated import times

10850 08/31/2013 07:47 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: time to wait for the import to finish: updated to time in inputs/import.stats.xls

10849 08/31/2013 07:44 PM Aaron Marcuse-Kubitza

bugfix: bin/import_all: `rm inputs/.TNRS/tnrs/tnrs.make.lock`: need to use `"rm"` instead of `rm` so that we don't use any rm alias the user might have in their shell (import_all is run in the calling shell so that the jobs are owned by the calling shell)

10848 08/31/2013 07:36 PM Aaron Marcuse-Kubitza

bugfix: mappings/VegCore-VegBIEN.csv: don't map datasetURL to source.url for taxa-only data (this mapping should only occur for Source tables)

10847 08/31/2013 07:27 PM Aaron Marcuse-Kubitza

bin/import_all: added step to remove any leftover TNRS lockfile (previously done manually)

10846 08/31/2013 06:46 PM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: updated for progress

10845 08/31/2013 06:32 PM Aaron Marcuse-Kubitza

bugfix: lib/sql_io.py: put_table(): Getting output table pkeys of existing/inserted rows: need to include the index cond in the join condition here, too (using var join_custom_cond), so that an index scan can be used instead of a much slower full-table sort