bugfix: bin/import_all: don't disable errexit, because doing so prevents the program from being Ctrl-C'd. Disabling it is no longer needed now that README.TXT instructs users to run bin/import_all in a subshell.
bin/import_all: removed functionality now provided by util.run
bin/import_all: converted to a runscript so it can use runscript functionality
bin/import_all: hidden_srcs(): removed `by_col=1` because these should be done in the same mode as the main datasources
bugfix: bin/with_all, import_all: don't disown processes, because they should be auto-killed if the shell is killed (disown was only needed before we used screen)
bin/import_all: delete_logs(): documented that `trap EXIT` doesn't run until shell exit
bin/import_all: delete_logs(): print when this happens, so it can be verified that it's happening properly
bugfix: bin/import_all: need to run delete_logs manually because `trap EXIT` doesn't run until bg cmds done
bin/import_all: delete_logs: moved testing of whether to delete logs to delete_logs() so that delete_logs() can be run regardless of the $delete_logs setting
bugfix: bin/import_all: delete_logs(): also need to match log filenames when n=""
bugfix: bin/import_all: now that always using log files to fix output clutter, need to delete created logs if logging is turned off
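The delete_logs() entries above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `$log_dir`, the `*.log` pattern, and the `sleep` stand-in job are assumptions, and where the real script calls delete_logs is not stated in these entries.

```shell
#!/usr/bin/env bash
# Sketch only: $log_dir, the *.log pattern, and the sleep job are assumptions.
log_dir="$(mktemp -d)"
delete_logs=1   # set to the empty string to keep the logs

delete_logs() {
    # The setting is tested inside the function, so callers can
    # invoke delete_logs unconditionally.
    if [ -n "$delete_logs" ]; then
        echo "deleting logs in $log_dir"   # print so the cleanup can be verified
        "rm" -f "$log_dir"/*.log
    fi
}

sleep 1 >"$log_dir/job.log" 2>&1 &   # stand-in for a backgrounded import
wait                                 # block until the background job is done
delete_logs                          # called explicitly instead of via `trap ... EXIT`
```

Calling the function directly keeps the cleanup independent of when the shell itself exits.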
bugfix: bin/import_all: don't errexit if a background process is Ctrl-C'd
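One way to keep errexit from firing when a background job is interrupted is to guard the `wait`. The sketch below is an assumption about the general technique, not the fix actually used in bin/import_all; `sleep` stands in for an import job and SIGTERM stands in for Ctrl-C.

```shell
#!/usr/bin/env bash
set -o errexit      # abort on any unguarded failing command

sleep 30 &          # stand-in for a backgrounded import
pid=$!
kill -TERM "$pid"   # simulate the job being interrupted (assumption: TERM stands in for Ctrl-C)

status=0
# A command on the left of `||` is exempt from errexit, so a nonzero
# status from `wait` is recorded instead of aborting the script.
wait "$pid" || status=$?
echo "background job ended with status $status; script continues"
```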
bugfix: bin/import_all: "was run without initial '.'" test: don't exit nonzero, because this will close the subshell
bugfix: bin/import_all: ensure that this is run in a subshell, which is needed so errexits don't close the terminal window
bin/import_all: documented that this must be run in a subshell (obtained by running `$0`)
bugfix: bin/import_all: need to always use log files for background processes
fix: bin/import_all: Source/import: don't use by_col=1 for this because it's slower for small #s of rows. by_col mode is no longer needed for metadata-only tables because these tables now have a single empty row so that they also work in row-based mode.
fix: bin/import_all: hidden srcs: use with_all for this to avoid needing to list every source, and to display the backgrounded command with the variables substituted
bin/import_all: TNRS, geoscrub: integrated into the list of metadata sources
bin/import_all: TNRS, geoscrub: use import rather than publish because the non-imported tables have now been excluded
fix: bin/import_all: updated for new metadata datasource names (see issue #940)
moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. Instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
bin/import_all: don't import NCBI because the lookup table is now prepopulated as part of the schema
bugfix: bin/import_all: run in errexit mode, so that if the user cancels reinstalling of the import schema, the script will then abort instead of continuing and using the wrong schema
bugfix: bin/import_all: restore the working dir when main() is done, in case it started as something other than the root dir
bugfix: bin/import_all: fix $ when .-included without args (which causes bash to put the wrong values in $ instead of leaving it empty)
bin/import_all: `make schemas/$version/install`: reinstall instead to allow re-running the import to the same custom schema (e.g. 2013-10-18.Brian_Enquist.Canadensys)
bin/import_all: `make schemas/$version/install`: ignore errors if schema exists, to support running with -e
bugfix: bin/import_all: removing inputs/.TNRS/tnrs/tnrs.make.lock: use `"rm" -f` instead of plain "rm" to avoid having an error exit status, which will abort the script if run with the -e flag (as runscripts are)
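The effect of `"rm" -f` under errexit can be sketched as below. The lockfile path is a hypothetical placeholder, not the real tnrs.make.lock path.

```shell
#!/usr/bin/env bash
set -o errexit   # runscripts run with -e, per the entry above

# Hypothetical lockfile path, for illustration only
lockfile="${TMPDIR:-/tmp}/example.$$.lock"

# With plain `rm`, a missing file gives a nonzero status, which aborts the
# script under errexit. `-f` treats a missing file as success. Quoting the
# command name ("rm") also skips any `rm` alias defined in the calling shell.
"rm" -f "$lockfile"
removed=1
echo "lockfile absent; script continues"
```

Alias expansion only applies to an unquoted first word, which is why the quotes matter when the script is included into an interactive shell.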
bin/*_all: *_main(): renamed to just main() because it does not matter that other shell-includes' main() methods will clobber this, because it is only executed once
bugfix: bin/import_all: Source tables: use .../import instead of import_temp because import_temp is only needed when importing all tables, to prevent the temp suffix from being removed yet
bugfix: bin/import_all: need to publish datasources that won't be published by `make .../import`, so that the per-datasource import XPaths that refer to TNRS/geoscrub will link up with the TNRS/geoscrub source entry instead of creating a new entry without the metadata (because the entry with the metadata was named TNRS.new/geoscrub.new)
bin/import_all: removed no longer needed import of geoscrub data, because analytical_stem_view is now joined to the geoscrub_output table directly, instead of using the imported canon_place entries
bin/with_all: $all: renamed to $hidden_srcs for clarity, since this now just adds the hidden (.*) datasources, rather than always using all datasources
bin/import_all: usage: documented that this can now be run with a custom datasources list (each of the form inputs/src/)
bin/import_all: use just import_scrub, not reimport_scrub, because import_scrub now automatically publishes the datasource's import (i.e. removes the temp suffix)
bugfix: bin/import_all: use reimport_scrub instead of import_scrub so that the temp suffix of the datasource name is removed
bugfix: bin/import_all: `rm inputs/.TNRS/tnrs/tnrs.make.lock`: need to use `"rm"` instead of `rm` so that we don't use any rm alias the user might have in their shell (import_all is run in the calling shell so that the jobs are owned by the calling shell)
bin/import_all: added step to remove any leftover TNRS lockfile (previously done manually)
bin/import_all: use new bin/after_import
bin/import_all: with_all import_scrub: documented that this step uses $by_col, so that users know to include by_col=1 when running this step separately
bin/import_all: use column-based import (by_col=1) by default, instead of requiring the user to explicitly specify it. instead turn it off explicitly (by_col=) for row-based import.
bin/import_all: don't set $dump_opts until running the backup command that uses it, so that the user can run this backup command separately just by copying the line out of the script (without worrying about env vars that need to be set, other than $version which is visible outside the script)
Moved wait on tnrs.make lock from import_all to make_analytical_db, so that running make_analytical_db for a one-time import also waits on the lock
import_all: after_import(): Added wait on tnrs.make's lockfile to ensure that all background scrubbing processes are complete before creating the analytical DB
import_all: Moved `waitpid $jobs` into after_import()
import_all: Output the PIDs of the import_scrub and after_import processes, so those processes can be managed without shell job control. This is useful if the connection is lost to the remote shell running the import, which prevents using job control on the import processes.
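Printing the PID of a background process might look like the sketch below. `sleep` is a stand-in for the import_scrub and after_import processes, and the message text is an assumption.

```shell
#!/usr/bin/env bash
sleep 1 &   # stand-in for a backgrounded import step
pid=$!      # $! holds the PID of the most recent background job
echo "started job with PID $pid"

# The process can now be checked or signalled by PID rather than by
# job number (%1), which also works from a different shell session.
kill -0 "$pid" 2>/dev/null && echo "job $pid is still running"
wait "$pid"   # only the shell that started the job can wait on it
```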
import_all: Use new import_scrub (input.Makefile) instead of import, which avoids needing to start background processes for tnrs-remake and scrub-remake
input.Makefile: $(import?): Renamed $public_import option to $full_import because it applies to any import of all datasources, not just a public import on vegbiendev
import_all: Run disown_all after background processes have been created, so that they will not be aborted if the shell exits (e.g. due to a broken connection). Note that with_all processes are automatically disowned as they are created, but other processes, such as after_import, were not.
import_all: Removed no longer needed TNRS import, which has been replaced by scrub.make (which adds TNRS taxondeterminations after the import instead of creating taxonlabel links before it)
inputs/.TNRS/: Changed tnrs+accepted to a view (defined in schema.sql) so accepted names would automatically be populated as they are parsed by TNRS, rather than needing to run `make inputs/.TNRS/tnrs+accepted/reinstall` to populate them
import_all: Reinstall tnrs+accepted, for eventual use by unscrubbed_taxondetermination_view
import_all: Directly import just the TNRS tables that should be imported, because some TNRS tables are included in import_order.txt so that they are part of the automated testing, but should not be imported at the same time as tnrs_accepted/tnrs_other
import_all: Made temporary vars local, so they wouldn't affect the calling shell
import_all: Make $dump_opts, $public_import local vars, so they will be automatically unset if the script is aborted
import_all: Make $import_source a local var, so it will be automatically unset if the script is aborted
import_all: Added command to add scrubbed taxondeterminations
import_all: Start tnrs-remake after starting the inputs, so that for subset imports (e.g. n=2), there will already be names to scrub when tnrs-remake starts up and it won't enter pause mode to wait for new rows (the pause is calibrated for full imports, and is too long for subset imports)
import_all: Run import with $public_import set in order to exclude excluded datasources
import_all: `make backups/vegbien.$version.backup/test`: Documented that this uses $dump_opts. $dump_opts must be manually set when running this command outside of import_all.
import_all: Allow caller to override $dump_opts
pg_dump_vegbien: Renamed $opts env var to $dump_opts to avoid conflicting with other commands' vars of the same name
make_analytical_db: Automatically call export_analytical_db when finished
import_all: after_import(): Added `make backups/vegbien.$version.backup/test`
import_all: after_import(): Added `make backups/TNRS.backup-remake`
import_all: after_import(): Added export_analytical_db
import_all: Run the import directly into a new, already-versioned public schema. This removes the need to manually rename the schema after import, and allows the backup commands to use the stored $version shell variable to refer to the last import.
import_all: Run all imports (not just the main datasources' import) with $import_source turned off, so that the Source tables will not be imported a second time when the datasource's main tables are imported. Note that it's not necessary to wait for asynchronous commands after the jobs for the main import are started (so that $import_source is not unset until after they are started), because with_all does not return until all jobs are started and have noted the $import_source setting in effect in the shell environment.
import_all: Source tables import: Fixed bug where the $all option to with_all was not used, which is needed to also include special datasources starting with "."
import_all: Fixed bug where only the first asynchronous command started before the main import was waited for, instead of all of them
import_all: Import all Source tables before the herbaria list, so that any custom metadata will override the info in the herbaria list
import_all: Added import of inputs/.herbaria/ before the main import
import_all: Change to the main directory that make targets are run from. Use relative paths to bin/ commands, which is possible now that the current dir is set.
import_all: Create a background process that waits until the import is done and then runs make_analytical_db
import_all: Documented that `wait %1` waits for asynchronous commands
import_all: After starting geoscrub import in the background, wait for make commands to scroll by before starting NCBI import
import_all: Removed explicit by_col=1 from datasources that don't require it for proper import. (It will still be set if the user provides it on the command line.)
import_all: Added geoscrub import, which can happen concurrently with NCBI/TNRS but must come before the main datasources for the matched places to link up properly
import_all: Documented that TNRS import must come after NCBI for cross links to be made
Calls to `make inputs/.TNRS/cleanup`: Do `make inputs/.TNRS/tnrs_accepted/reinstall; make inputs/.TNRS/tnrs_other/reinstall` instead to use new split TNRS tables
import_all: Pass command-line args (such as make vars) to all commands, not just with_all, so that a custom public schema is properly used by all commands
import_all: Also import the NCBI tree of life, before the TNRS names
import_all: Added commands to import TNRS names so the user doesn't have to do this manually
tnrs_db: Made wait option default to off to facilitate running tnrs_db by itself, rather than as part of an import
README.TXT: Data import: import_all: Don't run with & because this prevents the created jobs from being owned by the calling shell. Instead, import the TNRS names as a separate backgrounded step and wait for it to finish before starting import_all. Removed TNRS import steps from import_all since these are now invoked separately.
import_all: Use new dedicated cleanup make target to clean up TNRS.tnrs
import_all: Clean up any new TNRS.tnrs entries before importing the TNRS data
import_all: Start the tnrs daemon using `make inputs/.TNRS/tnrs/tnrs-remake &`
import_all: Added import of .TNRS datasource, which happens synchronously before other datasources are imported
import_all: Pass any args, such as vars, through to with_all
Scripts that are meant to be run in the calling shell: Fixed bug where running the script inside another script would make the script think it was being run as a program, and abort with a usage error
Scripts that are meant to be run in the calling shell: Fixed bug where running the script as a program (without initial ".") wouldn't be able to call return in something that was not a function. Converted all code to a <script_name>_main method so that return would work properly again. Converted all variables to local variables.
import_all: Use new with_all. Use ${BASH_SOURCE[0]} for $self and $self for $0.
import_all: Print usage message if it was run without the initial "."
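The pattern described in the last few entries (a main function, local variables, and a check for the initial ".") can be sketched as follows. `example_main`, `is_sourced`, and the messages are hypothetical names for illustration, not the script's actual code.

```shell
#!/usr/bin/env bash
# Sketch only: example_main, is_sourced, and the messages are hypothetical.

# Succeeds when the file path $1 differs from the program name $2,
# i.e. when the file was included with "." rather than run directly.
is_sourced() { [ "$1" != "$2" ]; }

example_main() {
    local self="${BASH_SOURCE[0]}"   # this file's path, even when sourced
    if ! is_sourced "$self" "$0"; then
        echo "Usage: . $self"
        return                       # legal here because we are inside a function
    fi
    local tmp="scratch"              # local vars do not leak into the calling shell
    echo "running in the calling shell"
}

example_main "$@"
```

Comparing `${BASH_SOURCE[0]}` with `$0` also covers the case where the file is included from another script, since `$0` is then the outer script's name rather than this file's.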
Renamed import-all to import_all to match convention of using underscores
import-all: Fixed to display the datasource name in the job name instead of 'make ${input}import &'
import-all: disown each new import process to ignore SIGHUP
Added import-all to import all inputs at once