/trunk/bin/import_all - Changes - BIEN 3 - NCEAS Projects

root/trunk/bin/import_all @ 14905

svn:executable: *

#	Date	Author	Comment
14905	10/26/2014 04:58 PM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: don't disable errexit because this prevents the program from being Ctrl-C'd. this functionality is no longer needed now that the README.TXT instructs to run bin/import_all in a subshell.
14904	10/26/2014 04:56 PM	Aaron Marcuse-Kubitza	bin/import_all: removed functionality now provided by util.run
14903	10/26/2014 04:56 PM	Aaron Marcuse-Kubitza	bin/import_all: converted to a runscript so it can use runscript functionality
14099	07/17/2014 09:05 AM	Aaron Marcuse-Kubitza	bin/import_all: hidden_srcs(): removed `by_col=1` because these should be done in the same mode as the main datasources
14088	07/16/2014 03:31 PM	Aaron Marcuse-Kubitza	bugfix: bin/with_all, import_all: don't disown processes because they should be auto-killed if the shell is (disown was only needed before we used screen)
14073	07/15/2014 04:49 PM	Aaron Marcuse-Kubitza	bin/import_all: delete_logs(): documented that `trap EXIT` doesn't run until shell exit
14072	07/15/2014 04:48 PM	Aaron Marcuse-Kubitza	bin/import_all: delete_logs(): print when this happens, so it can be verified that it's happening properly
14071	07/15/2014 04:32 PM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: need to run delete_logs manually because `trap EXIT` doesn't run until bg cmds done
14070	07/15/2014 04:28 PM	Aaron Marcuse-Kubitza	bin/import_all: delete_logs: moved testing of whether to delete logs to delete_logs() so that delete_logs() can be run regardless of the $delete_logs setting
14069	07/15/2014 03:58 PM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: delete_logs(): also need to match log filenames when n=""
13985	07/11/2014 09:13 AM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: now that always using log files to fix output clutter, need to delete created logs if logging is turned off
13984	07/11/2014 08:45 AM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: don't errexit if a background process is Ctrl-C'd
13983	07/11/2014 08:41 AM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: was run without initial "." test: don't exit nonzero because this will close the subshell
13982	07/11/2014 08:38 AM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: ensure that this is run in a subshell, which is needed so errexits don't close the terminal window
13981	07/11/2014 08:32 AM	Aaron Marcuse-Kubitza	bin/import_all: documented that this must be run in a subshell (obtained by running `$0`)
13980	07/11/2014 08:25 AM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: need to always use log files for background processes
13979	07/11/2014 08:12 AM	Aaron Marcuse-Kubitza	fix: bin/import_all: Source/import: don't use by_col=1 for this because it's slower for small #s of rows. by_col mode is no longer needed for metadata-only tables because these tables now have a single empty row so that they also work in row-based mode.
13978	07/11/2014 08:06 AM	Aaron Marcuse-Kubitza	fix: bin/import_all: hidden srcs: use with_all for this to avoid needing to list every source, and to display the backgrounded command with the variables substituted
13977	07/11/2014 07:40 AM	Aaron Marcuse-Kubitza	bin/import_all: TNRS, geoscrub: integrated into the list of metadata sources
13976	07/11/2014 07:39 AM	Aaron Marcuse-Kubitza	bin/import_all: TNRS, geoscrub: use import rather than publish because the non-imported tables have now been excluded
13974	07/10/2014 07:25 PM	Aaron Marcuse-Kubitza	fix: bin/import_all: updated for new metadata datasource names (see issue #940)
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
11839	12/05/2013 08:37 AM	Aaron Marcuse-Kubitza	bin/import_all: don't import NCBI because the lookup table is now prepopulated as part of the schema
11823	12/04/2013 07:26 PM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: run in errexit mode, so that if the user cancels reinstalling of the import schema, the script will then abort instead of continuing and using the wrong schema
11430	10/24/2013 04:03 PM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: restore the working dir when main() is done, in case it started as something other than the root dir
11422	10/24/2013 01:10 PM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: fix $@ when .-included without args (which causes bash to put the wrong values in $@ instead of leaving it empty)
11421	10/24/2013 01:09 PM	Aaron Marcuse-Kubitza	bin/import_all: `make schemas/$version/install`: reinstall instead to allow re-running the import to the same custom schema (e.g. 2013-10-18.Brian_Enquist.Canadensys)
11420	10/24/2013 01:07 PM	Aaron Marcuse-Kubitza	bin/import_all: `make schemas/$version/install`: ignore errors if schema exists, to support running with -e
11419	10/23/2013 11:10 PM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: removing inputs/.TNRS/tnrs/tnrs.make.lock: use `"rm" -f` instead of plain "rm" to avoid having an error exit status, which will abort the script if run with the -e flag (as runscripts are)
11416	10/23/2013 10:34 PM	Aaron Marcuse-Kubitza	bin/*_all: *_main(): renamed to just main() because it does not matter that other shell-includes' main() methods will clobber this, because it is only executed once
11415	10/23/2013 10:29 PM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: Source tables: use .../import instead of import_temp because import_temp is only needed when importing all tables, to prevent the temp suffix from being removed yet
11393	10/20/2013 05:21 PM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: need to publish datasources that won't be published by `make .../import`, so that the per-datasource import XPaths that refer to TNRS/geoscrub will link up with the TNRS/geoscrub source entry instead of creating a new entry without the metadata (because the entry with the metadata was named TNRS.new/geoscrub.new)
11390	10/20/2013 04:55 PM	Aaron Marcuse-Kubitza	bin/import_all: removed no longer needed import of geoscrub data, because analytical_stem_view is now joined to the geoscrub_output table directly, instead of using the imported canon_place entries
11374	10/19/2013 06:56 PM	Aaron Marcuse-Kubitza	bin/with_all: $all: renamed to $hidden_srcs for clarity, since this now just adds the hidden (.*) datasources, rather than always using all datasources
11371	10/19/2013 02:15 PM	Aaron Marcuse-Kubitza	bin/import_all: usage: documented that this can now be run with a custom datasources list (each of the form inputs/src/)
11286	10/17/2013 04:44 PM	Aaron Marcuse-Kubitza	bin/import_all: use just import_scrub, not reimport_scrub, because import_scrub now automatically publishes the datasource's import (i.e. removes the temp suffix)
10871	09/05/2013 12:11 AM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: use reimport_scrub instead of import_scrub so that the temp suffix of the datasource name is removed
10849	08/31/2013 07:44 PM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: `rm inputs/.TNRS/tnrs/tnrs.make.lock`: need to use `"rm"` instead of `rm` so that we don't use any rm alias the user might have in their shell (import_all is run in the calling shell so that the jobs are owned by the calling shell)
10847	08/31/2013 07:27 PM	Aaron Marcuse-Kubitza	bin/import_all: added step to remove any leftover TNRS lockfile (previously done manually)
10586	08/03/2013 09:14 PM	Aaron Marcuse-Kubitza	bin/import_all: use new bin/after_import
10580	08/03/2013 12:25 AM	Aaron Marcuse-Kubitza	bin/import_all: with_all import_scrub: documented that this step uses $by_col, so that users know to include by_col=1 when running this step separately
10579	08/03/2013 12:24 AM	Aaron Marcuse-Kubitza	bin/import_all: use column-based import (by_col=1) by default, instead of requiring the user to explicitly specify it. instead turn it off explicitly (by_col=) for row-based import.
10576	08/02/2013 11:55 PM	Aaron Marcuse-Kubitza	bin/import_all: don't set $dump_opts until running the backup command that uses it, so that the user can run this backup command separately just by copying the line out of the script (without worrying about env vars that need to be set, other than $version which is visible outside the script)
7618	02/20/2013 08:58 AM	Aaron Marcuse-Kubitza	Moved wait on tnrs.make lock from import_all to make_analytical_db, so that running make_analytical_db for a one-time import also waits on the lock
7419	02/02/2013 11:28 AM	Aaron Marcuse-Kubitza	import_all: after_import(): Added wait on tnrs.make's lockfile to ensure that all background scrubbing processes are complete before creating the analytical DB
7418	02/02/2013 11:18 AM	Aaron Marcuse-Kubitza	import_all: Moved `waitpid $jobs` into after_import()
7276	01/18/2013 03:25 AM	Aaron Marcuse-Kubitza	import_all: Output the PIDs of the import_scrub and after_import processes, so those processes can be managed without shell job control. This is useful if the connection is lost to the remote shell running the import, which prevents using job control on the import processes.
7267	01/16/2013 02:51 PM	Aaron Marcuse-Kubitza	import_all: Use new import_scrub (input.Makefile) instead of import, which avoids needing to start background processes for tnrs-remake and scrub-remake
7245	01/16/2013 07:56 AM	Aaron Marcuse-Kubitza	input.Makefile: $(import?): Renamed $public_import option to $full_import because it applies to any import of all datasources, not just a public import on vegbiendev
7228	01/15/2013 10:42 PM	Aaron Marcuse-Kubitza	import_all: Run disown_all after background processes have been created, so that they will not be aborted if the shell exits (e.g. due to a broken connection). Note that with_all processes are automatically disowned as they are created, but other processes, such as after_import, were not.
7163	01/11/2013 02:07 AM	Aaron Marcuse-Kubitza	import_all: Removed no longer needed TNRS import, which has been replaced by scrub.make (which adds TNRS taxondeterminations after the import instead of creating taxonlabel links before it)
7132	01/09/2013 09:13 AM	Aaron Marcuse-Kubitza	inputs/.TNRS/: Changed tnrs+accepted to a view (defined in schema.sql) so accepted names would automatically be populated as they are parsed by TNRS, rather than needing to run `make inputs/.TNRS/tnrs+accepted/reinstall` to populate them
7127	01/09/2013 02:23 AM	Aaron Marcuse-Kubitza	import_all: Reinstall tnrs+accepted, for eventual use by unscrubbed_taxondetermination_view
7125	01/09/2013 02:02 AM	Aaron Marcuse-Kubitza	import_all: Directly import just the TNRS tables that should be imported, because some TNRS tables are included in import_order.txt so that they are part of the automated testing, but should not be imported at the same time as tnrs_accepted/tnrs_other
7121	01/08/2013 10:19 PM	Aaron Marcuse-Kubitza	import_all: Made temporary vars local, so they wouldn't affect the calling shell
7103	01/07/2013 06:39 PM	Aaron Marcuse-Kubitza	import_all: Make $dump_opts, $public_import local vars, so they will be automatically unset if the script is aborted
7095	01/07/2013 05:00 PM	Aaron Marcuse-Kubitza	import_all: Make $import_source a local var, so it will be automatically unset if the script is aborted
7089	01/07/2013 04:10 PM	Aaron Marcuse-Kubitza	import_all: Added command to add scrubbed taxondeterminations
7087	01/07/2013 04:08 PM	Aaron Marcuse-Kubitza	import_all: Start tnrs-remake after starting the inputs, so that for subset imports (e.g. n=2), there will already be names to scrub when tnrs-remake starts up and it won't enter pause mode to wait for new rows (the pause is calibrated for full imports, and is too long for subset imports)
7048	01/04/2013 05:25 PM	Aaron Marcuse-Kubitza	import_all: Run import with $public_import set in order to exclude excluded datasources
7038	01/03/2013 02:31 AM	Aaron Marcuse-Kubitza	import_all: `make backups/vegbien.$version.backup/test`: Documented that this uses $dump_opts. $dump_opts must be manually set when running this command outside of import_all.
7023	12/21/2012 03:34 PM	Aaron Marcuse-Kubitza	import_all: Allow caller to override $dump_opts
7022	12/21/2012 03:33 PM	Aaron Marcuse-Kubitza	pg_dump_vegbien: Renamed $opts env var to $dump_opts to avoid conflicting with other commands' vars of the same name
6981	12/20/2012 10:45 AM	Aaron Marcuse-Kubitza	make_analytical_db: Automatically call export_analytical_db when finished
6977	12/20/2012 10:09 AM	Aaron Marcuse-Kubitza	import_all: after_import(): Added `make backups/vegbien.$version.backup/test`
6960	12/19/2012 01:49 PM	Aaron Marcuse-Kubitza	import_all: after_import(): Added `make backups/TNRS.backup-remake`
6958	12/19/2012 01:42 PM	Aaron Marcuse-Kubitza	import_all: after_import(): Added export_analytical_db
6946	12/19/2012 12:30 PM	Aaron Marcuse-Kubitza	import_all: Run the import directly into a new, already-versioned public schema. This removes the need to manually rename the schema after import, and allows the backup commands to use the stored $version shell variable to refer to the last import.
6897	12/18/2012 09:41 PM	Aaron Marcuse-Kubitza	import_all: Run all imports (not just the main datasources' import) with $import_source turned off, so that the Source tables will not be imported a second time when the datasource's main tables are imported. Note that it's not necessary to wait for asynchronous commands after the jobs for the main import are started (so that $import_source is not unset until after they are started), because with_all does not return until all jobs are started and have noted the $import_source setting in effect in the shell environment.
6896	12/18/2012 09:32 PM	Aaron Marcuse-Kubitza	import_all: Source tables import: Fixed bug where need to use $all option to with_all to also include special datasources starting with "."
6594	12/04/2012 09:52 PM	Aaron Marcuse-Kubitza	import_all: Fixed bug where need to wait for all asynchronous commands started before the main import, not just the first
6593	12/04/2012 09:51 PM	Aaron Marcuse-Kubitza	import_all: Import all Source tables before the herbaria list, so that any custom metadata will override the info in the herbaria list
6382	11/24/2012 03:33 AM	Aaron Marcuse-Kubitza	import_all: Added import of inputs/.herbaria/ before the main import
6211	11/15/2012 07:45 PM	Aaron Marcuse-Kubitza	import_all: Change to main directory make targets are run from. Use relative paths to bin/ commands, which is possible now that the current dir is set.
6210	11/15/2012 07:41 PM	Aaron Marcuse-Kubitza	import_all: Create a background process that waits until the import is done and then runs make_analytical_db
6208	11/15/2012 06:52 PM	Aaron Marcuse-Kubitza	import_all: Documented that `wait %1` waits for asynchronous commands
5959	11/01/2012 10:52 AM	Aaron Marcuse-Kubitza	import_all: After starting geoscrub import in the background, wait for make commands to scroll by before starting NCBI import
5957	11/01/2012 10:22 AM	Aaron Marcuse-Kubitza	import_all: Removed explicit by_col=1 from datasources that don't require it for proper import. (It will still be set if the user provides it on the command line.)
5944	11/01/2012 09:01 AM	Aaron Marcuse-Kubitza	import_all: Added geoscrub import, which can happen concurrently with NCBI/TNRS but must come before the main datasources for the matched places to link up properly
5943	11/01/2012 08:59 AM	Aaron Marcuse-Kubitza	import_all: Documented that TNRS import must come after NCBI for cross links to be made
5917	11/01/2012 05:15 AM	Aaron Marcuse-Kubitza	Calls to `make inputs/.TNRS/cleanup`: Do `make inputs/.TNRS/tnrs_accepted/reinstall; make inputs/.TNRS/tnrs_other/reinstall` instead to use new split TNRS tables
5836	10/30/2012 03:29 AM	Aaron Marcuse-Kubitza	import_all: Pass command-line args (such as make vars) to all commands, not just with_all, so that a custom public schema is properly used by all commands
5503	10/15/2012 08:22 AM	Aaron Marcuse-Kubitza	import_all: Also import the NCBI tree of life, before the TNRS names
5318	10/08/2012 09:58 PM	Aaron Marcuse-Kubitza	import_all: Added commands to import TNRS names so the user doesn't have to do this manually
5214	10/03/2012 01:11 PM	Aaron Marcuse-Kubitza	tnrs_db: Made wait option default to off to facilitate running tnrs_db by itself, rather than as part of an import
5206	10/03/2012 08:57 AM	Aaron Marcuse-Kubitza	README.TXT: Data import: import_all: Don't run with & because this prevents the created jobs from being owned by the calling shell. Instead, import the TNRS names as a separate backgrounded step and wait for it to finish before starting import_all. Removed TNRS import steps from import_all since these are now invoked separately.
5172	10/02/2012 10:35 PM	Aaron Marcuse-Kubitza	import_all: Use new dedicated cleanup make target to clean up TNRS.tnrs
5111	09/28/2012 11:42 AM	Aaron Marcuse-Kubitza	import_all: Clean up any new TNRS.tnrs entries before importing the TNRS data
5081	09/27/2012 11:28 AM	Aaron Marcuse-Kubitza	import_all: Start the tnrs daemon using `make inputs/.TNRS/tnrs/tnrs-remake &`
5055	09/27/2012 07:10 AM	Aaron Marcuse-Kubitza	import_all: Added import of .TNRS datasource, which happens synchronously before other datasources are imported
5039	09/27/2012 03:37 AM	Aaron Marcuse-Kubitza	import_all: Pass any args, such as vars, through to with_all
1953	04/23/2012 07:00 PM	Aaron Marcuse-Kubitza	Scripts that are meant to be run in the calling shell: Fixed bug where running the script inside another script would make the script think it was being run as a program, and abort with a usage error
1952	04/23/2012 06:56 PM	Aaron Marcuse-Kubitza	Scripts that are meant to be run in the calling shell: Fixed bug where running the script as a program (without initial ".") wouldn't be able to call return in something that was not a function. Converted all code to a <script_name>_main method so that return would work properly again. Converted all variables to local variables.
1948	04/23/2012 05:36 PM	Aaron Marcuse-Kubitza	import_all: Use new with_all. Use ${BASH_SOURCE[0]} for $self and $self for $0.
1551	03/22/2012 05:33 PM	Aaron Marcuse-Kubitza	import_all: Print Usage message if was run without initial "."
1550	03/22/2012 04:52 PM	Aaron Marcuse-Kubitza	Renamed import-all to import_all to match convention of using underscores
1547	03/22/2012 04:33 PM	Aaron Marcuse-Kubitza	import-all: Fixed to display the datasource name in the job name instead of 'make ${input}import &'
1546	03/20/2012 11:13 PM	Aaron Marcuse-Kubitza	import-all: disown each new import process to ignore SIGHUP
1541	03/20/2012 10:38 PM	Aaron Marcuse-Kubitza	Added import-all to import all inputs at once
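
Revisions 10849 and 11419 above describe removing the leftover TNRS lockfile with `"rm" -f` rather than plain `rm`. The following is a minimal sketch of that idiom, not the actual code from bin/import_all: the function name `remove_lock` and the temporary file are hypothetical, chosen only for illustration. It assumes a shell where errexit is on, as the log says is the case for runscripts.

```shell
#!/usr/bin/env bash
set -o errexit  # abort on any command that exits nonzero

# Hypothetical helper (not taken from bin/import_all): delete a lockfile.
remove_lock() {
    local lock="$1"
    # Quoting the command name suppresses alias expansion, so a user alias
    # such as rm='rm -i' is not used when the script is sourced.
    # -f makes rm exit 0 even if the file is missing, so errexit does not fire.
    "rm" -f "$lock"
}

lock="$(mktemp)"     # stand-in for a lockfile; the path is an assumption
remove_lock "$lock"  # file exists: it is removed
remove_lock "$lock"  # file is already gone: rm -f still succeeds
echo "lock removed"
```

Both calls succeed, so the script reaches the final `echo` even though the second call targets a file that no longer exists.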
