inputs/IRMNG/map.csv: updated to scrubbed output names from */map.csv (/map.csv does not currently get scrubbed)
bugfix: inputs/IRMNG/species_homonyms/header.csv, map.csv: reset input columns to DSV (delim-separated values) header. they had gotten changed to the output names in running map.csv with remake=1, causing it to be remade from the (renamed) staging tables.
inputs/input.Makefile: $(_svnFilesGlob): added *Makefile
/README.TXT: `make inputs/{upload,download}`: first run with test=1 to see what the diffs will be
added inputs/IRMNG/, including runscripts to download the names. this is now the 2nd datasource after GBIF to use runscripts, and the 3rd after FIA/GBIF to use new-style import.
inputs/input.Makefile: $(_svnFilesGlob): added *run (runscripts)
lib/runscripts/table.run: import(): also run remake_VegBIEN_mappings() to accept the test output. this function was previously unused, but was left in for future use when lib/import.sh was translated to lib/runscripts/table.run (it was used in its import.sh form in inputs/FIA/occurrence_all/import).
bugfix: lib/runscripts/table.run: remake_VegBIEN_mappings(): need to change to $top_dir before running `rm header.csv map.csv`
lib/sh/util.sh: added in_top_dir()
lib/runscripts/table.run: remake_VegBIEN_mappings(): only remake header.csv, map.csv if this target is being run directly, to avoid needing to remake them every time. for tables that are views, this instead requires them to be explicitly remade when the view columns change.
bugfix: lib/runscripts/subdir.run: subdir_make(): only remake if $remake has been explicitly propagated to subdir_make() by using self_make
lib/sh/make.sh: added deferred_check_target_exists alias and use it in check_fake_target_exists
added lib/sh/web.sh with curl wrapper
lib/sh/make.sh: added check_wildcard_target_exists alias
lib/sh/util.sh: added wildcard1 alias
lib/sh/util.sh: added echo1()
lib/runscripts/table.run: load_data(): first make sure schema is installed
lib/runscripts/table.run: added datasrc_make_install()
table_make_install(): take $install_log as an overridable kw param to support install logs in different locations
lib/runscripts/table.run: load_data(): split noclobber functionality into separate table_make_install() function, which can be used by other install-related targets
added schemas/VegBIEN/taxonomy/higherPlantGroup.xlsx.src.txt with Brad's description of how the names were chosen
added schemas/VegBIEN/taxonomy/higherPlantGroup.xlsx
schemas/VegBIEN/planning/taxonomy/: moved non-VegBIEN-specific resources to planning/resources/taxonomy/. this includes Brad's all-important Nomenclature_excerpt.ppt with the Latin taxonomic hierarchy suffixes on slide 5.
bugfix: schemas/vegbien.sql: taxon_trait_view: use the TNRS-scrubbed name from ScrubbedTaxon when available
schemas/vegbien.sql: split geoscrub_input_view's new-row-only filtering into separate view geoscrub_input_new, so that the full geoscrub_input rows are still available. the reduction in geoscrub_input from eliminating the already-scrubbed rows was only 280,000 (5076500 - 4799173) out of a possible 1.7 million (1707970), so it makes sense to just run geoscrubbing on the full input. (the lower-than-expected reduction is most likely due to rows from pre-refresh data being present in the original geoscrub_output table, which have been replaced by different, post-refresh input rows.)
added exports/_archive/
mappings/VegCore-VegBIEN.csv: genus->taxonlabel.taxonomicname: use new _filter_genus() (see r9882)
backups/TNRS.backup.md5: updated
bin/make_analytical_db: use new mk_table() instead of TRUNCATE/INSERT
bin/make_analytical_db: added mk_table() and use it in mk_analytical_table()
schemas/vegbien.sql: higher_plant_group_nodes: ferns and allies: added Lycopodiophyta node, as requested by Brad in the conference call (wiki.vegpath.org/2013-06-13_conference_call)
schemas/vegbien.sql: geoscrub_input_view: exclude rows that have already been geoscrubbed, by anti-joining on geoscrub_output
inputs/.geoscrub/geoscrub_output/postprocess.sql: set decimallatitude, decimallongitude types to double precision to facilitate joining with other double precision values
inputs/.geoscrub/geoscrub_output/postprocess.sql: coords index: added rest of input columns so this can be used to check the existence of a result by input. added runtime (55 s). use idempotent create_if_not_exists().
bugfix: schemas/vegbien.sql: higher_plant_group_nodes: removed ferns and allies nodes Anthocerotophyta, Marchantiophyta, Bryophyta, which were incorrectly said to be part of this clade in the BIEN2 analytical DB overview (/planning/workflow/validation/BIEN2_Analytical_DB_overview.docx > p. 13 bottom > last ¶). see http://wiki.vegpath.org/2013-06-13_conference_call#fix-higher_plant_group_nodes-mapping .
bugfix: /Makefile: postgres-Linux: phpPgAdmin: added steps to configure it for Apache 2.4
/run: geoscrub_input/make(): documented runtime (40 s)
bin/make_analytical_db: added `/run export_` to make the geoscrub_input CSV export
inputs/.TNRS/schema.sql: tnrs_populate_fields(): removed no longer needed casts of *_score to double precision
inputs/.TNRS/schema.sql: tnrs: *_score: changed type to double precision because these fields are always floats. this also avoids the need to manually cast them to double precision each time they are used.
lib/tnrs.py: HTTP requests: rewrapped lines
lib/tnrs.py: updated HTTP requests to match current web app
bugfix: lib/tnrs.py: download_request_template: changed dirty to true (to match the current web app), which is apparently needed to apply the source_sorting setting to the downloaded TSV in addition to the GUI results
lib/tnrs.py: retrieval_request_template: turned source_sorting back off, because it causes any match from the first source to always be used, even if it has a lower match score than the match from the other source. (Brad confirms that this should be off.) I think we had this on originally to ensure that only Tropicos results were used when available, rather than USDA when it was a better match. * note that due to a bug in the web app, this change will not actually be effective, because the source_sorting option is only applied to the GUI results, not the downloaded TSV. *
inputs/.TNRS/schema.sql: tnrs: Name_number: changed type to integer so it would sort numerically
inputs/.TNRS/schema.sql: added pkey on Time_submitted, Name_number
inputs/.TNRS/schema.sql: changed Name_submitted pkey to a unique constraint to allow adding a pkey on Time_submitted, Name_number instead
inputs/.TNRS/schema.sql: Time_submitted, Name_number: added NOT NULL constraints so that they can be used in a unique constraint
lib/tnrs.py: submission_request_template: include GCC in addition to Tropicos, because it provides more synonyms than Tropicos for Asteraceae, and the accepted names still match the Tropicos backbone (https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/2013-06-13_conference_call#include-GCC-when-running-TNRS)
inputs/.TNRS/tnrs/tnrs.make: removed no longer needed end time, now that the total runtime is printed
inputs/.TNRS/tnrs/tnrs.make: print the total runtime using `time`
inputs/.TNRS/tnrs/tnrs.make: include the end time in addition to the start time so that the total runtime can be calculated
lib/sh/util.sh: command-specific alternate stdin/stdout/stderr: choice of 40/41/42: added mnemonic that 4 looks like A for Alternate
bugfix: /README.TXT: Full database import: added step to remove any leftover TNRS lockfile. usually, the PID in it would not exist, but sometimes it now refers to a different, active process which blocks tnrs.make.
schemas/vegbien.sql: allow public_ to view lookup tables (cultivated_family_locations, higher_plant_group_nodes)
added backups/TNRS.backup.md5, vegbien.r9459.backup.md5
bugfix: lib/sh/local.sh: sync_upload(): need to use --no-group to prevent the group from being reset to aaronmk upon download from jupiter (which uses group aaronmk instead of bien). use ./fix_perms to set the group of all files to bien. also use --no-owner in case running as root.
lib/sh/sync.sh: removed sync_download(). use swap=1 sync_upload() instead.
lib/sh/sync.sh: removed download(). use swap=1 upload, or swap=1 upload_caller, instead. this avoids having separate upload()/download() pairs for every caller of upload(), because you can instead just set swap=1.
bugfix: lib/sh/sync.sh: upload(): don't kw_params $swap because this unexports it, preventing put from seeing it. instead, use echo_vars to print it.
added bin/sync_upload, a wrapper around sync_upload()
bugfix: lib/sh/sync.sh: upload(): only add `--exclude="**"` if there are --includes. this enables running upload() without paths to upload all files.
lib/sh/sync.sh: upload(): support passing -- options to put, which will not be run through the path->--include processing
bugfix: lib/sh/sync.sh: upload(): added missing `local args=()` initializer
/README.TXT: Full database import: On local machine: added step to do steps under Maintenance > "to synchronize vegbiendev, jupiter, and your local machine", which is needed in addition to `make inputs/upload` since that doesn't handle overwrites or deletions
/README.TXT: Maintenance: to synchronize vegbiendev, jupiter, and your local machine: added warning that you should pay careful attention to all files that will be deleted or overwritten (as the three machines are often out of sync)
added inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.data.0.preamble.sql
/README.TXT: Full database import: make inputs/{upload,download}: run them first with `test=1` to see what the changes will be
/README.TXT: Full database import: `svn up`: use --force to avoid errors about existing files
mappings/VegCore-VegBIEN.csv: genus->taxonlabel.taxonomicname: filter out genera that contain numbers (using new _filter_genus()), which break TNRS and prevent it from matching any other parts of the name. later, these genera can instead be moved to the end of the name, where TNRS will correctly match them as Unmatched_terms.
bugfix: inputs/VegBIEN/: added _no_import to disable import for this folder, since this is actually just an entry in web/datasources/ with VegPath redirection links, rather than an input to the import process. this fixes "schema "VegBIEN" does not exist" errors generated in `make test`.
inputs/input.Makefile: $(dontImport): also support putting a _no_import file at the top level in the datasource to exclude the entire datasource
bugfix: lib/sh/local.sh: removed make() override, which is no longer needed now that its operations are performed by verbosity_compat(), and which caused errors by setting $verbosity to the invalid value ""
bugfix: lib/sh/util.sh: verbosity_compat(): always use default verbosity (`unset verbosity`) when verbosity == 1, regardless of whether the caller has set $verbosity to the special value "", because $verbosity is supposed to be an integer field and "" is not supported by most functions that use $verbosity. in cases where a util.sh script is invoked, it will set $verbosity back to the default value 1, so this will function as before for util.sh scripts and fix $verbosity for scripts that use a different verbosity system.
added inputs/GBIF/raw_occurrence_record_plants/table.tsv.md5
inputs/GBIF/raw_occurrence_record_plants/test.xml.ref: regenerated. updated for new staging table input columns, which are now the same as the output columns.
bugfix: inputs/input.Makefile: %/VegBIEN.csv: use header from map.csv instead of the new columns, so that source.shortname is set to GBIF instead of VegCore
inputs/input.Makefile: %/VegBIEN.csv: when a runscript is available, instead map the output columns of map.csv to VegBIEN, because the columns have been renamed in the staging table
inputs/GBIF/raw_occurrence_record_plants/VegBIEN.csv: regenerated, which adds row_num input col
lib/sh/util.sh: echo_func(): check can_log at beginning of function, so that the resource-intensive func_loc (which calls `readlink -f`) does not need to be called if echo_cmd would not log anything at the current verbosity
lib/sh/util.sh: echo_func(): removed no longer used $minor flag. use `clog++... echo_func` instead.
lib/sh/util.sh: verbosity_compat(): don't make $verbosity a local var of this function, because then the changes will not be visible to the caller
bugfix: bin/make: use verbosity_compat because some make-invoked commands (e.g. bin/map) don't support verbosity=""
lib/sh/util.sh: command(): command__exec(): use verbosity_compat to support commands that don't support verbosity=""
lib/sh/util.sh: added verbosity_compat(), for use with commands that don't support verbosity=""
bugfix: lib/sh/local.sh: make(): when invoking overridden func, need make__make_sh
bugfix: lib/sh/util.sh: self, self_sys aliases: need to remove any func_override suffix __* from the FUNCNAME
bugfix: inputs/GBIF/import_order.txt, run: updated raw_occurrence_record/ to raw_occurrence_record_plants/
inputs/FIA/occurrence_all/test.xml.ref: update inserted row count
bugfix: bin/make: include local.sh so that its default verbosity-setting make() override will be used
lib/sh/local.sh: added make() override, which uses the default verbosity (i.e. verbosity="") when verbosity == 1. scripts that use lib/sql.py (which uses $verbosity) have different default verbosities, and this default should not be overridden by an env var, unless a higher verbosity has been set.
lib/sh/local.sh: added missing include of make.sh, used by root_make()
schemas/vegbien.sql: added _filter_genus()
inputs/GBIF/raw_occurrence_record_plants/run: import() runtime: specified that this does not include table.tsv.gz/make()
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: # duplicates: added revision #
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: documented that there are 4.5 million duplicates (59,998,354 rows before - 55,417,646 rows after = 4,580,708)
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: added rerun time (~0 thanks to index, so no problem doing the DELETE each time postprocess.sql is run)
*{.sh,run}: use simpler .rel() instead of `. "$(dirname "${BASH_SOURCE[0]}")"/...` for relative includes
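
note on the relative-include entry above: a minimal sketch of how such a helper could work, assuming bash. source_rel() is a hypothetical stand-in, not the project's actual .rel(), whose implementation is not shown in this log. the demo directory and file names (lib/vars.sh, main.sh) are likewise made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT the project's real .rel()): source a file whose
# path is given relative to the script that calls this function, rather
# than relative to the current working directory.
source_rel() {
    # BASH_SOURCE[1] is the file containing the call to this function;
    # fall back to "." if it is unset or empty.
    local caller_dir
    caller_dir="$(dirname -- "${BASH_SOURCE[1]:-.}")"
    . "$caller_dir/$1"
}

# Demo: build a throwaway script tree, then run it from a different
# working directory to show that the include still resolves.
demo_dir="$(mktemp -d)"
mkdir "$demo_dir/lib"
echo 'greeting="hello from lib"' > "$demo_dir/lib/vars.sh"
{
    declare -f source_rel          # copy the helper into the demo script
    echo 'source_rel lib/vars.sh'  # path is relative to main.sh, not to cwd
    echo 'echo "$greeting"'
} > "$demo_dir/main.sh"

result="$(cd / && bash "$demo_dir/main.sh")"
echo "$result"
rm -r "$demo_dir"
```

the caller only writes the relative path, and the include keeps working when the script is invoked from another directory, which is the same property the longer `dirname` idiom provides.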