schemas/vegbien.sql: stemobservation: Renamed stemobservation_unique_accessioncode to stemobservation_unique_within_plantobservation and also apply it to NULL sourceaccessioncodes, so that a plantobservation can have a single stemobservation for its single stem's traits without needing a separate sourceaccessioncode for it
schemas/vegbien.sql: aggregateoccurrence: Removed redundant aggregateoccurrence_unique_accessioncode unique constraint, which has been replaced by aggregateoccurrence_unique_within_taxonoccurrence
schemas/vegbien.sql: plantnamescope: Added CHECK constraint to ensure that at least one key column is specified (an empty plantnamescope doesn't make sense; use NULL instead)
schemas/vegbien.ERD.mwb: Synced with schema
ch_root: Don't require both the input and output mappings to contain their respective new roots, since sometimes only one or the other root is being subset. This will occur, for example, in mappings that are flat on the input but normalized on the output, such as VegCSV.
VegBIEN: Reversed aggregateoccurrence<->plantobservation relationship to point from plantobservation->aggregateoccurrence, so plantobservation could be scoped by aggregateoccurrence in the same way as all other core tables are scoped by their parent tables. This reversed direction was an anomaly due to the need to have a trigger auto-set aggregateoccurrence.count to 1 when there was an associated plantobservation. This was most easily accomplished on the aggregateoccurrence table itself, but required the reversed relationship. The trigger has now been reimplemented on plantobservation, which externally updates aggregateoccurrence.count.
input.Makefile: Testing: diffing test outputs: Ignore changes in whitespace, due to e.g. different indent levels. This facilitates accepting tests when an element has been nested inside another element (or unnested), by showing only the opening and closing tags of the new outer element.
dicts.py: DictProxy: Fixed bug where default value for inner param needed to be created in the constructor, or else every default instance would use and modify the same dictionary
db_xml.py: put(): wrap_e(): Call augment_error() to add the current node to the error message
db_xml.py: put(): Raise an error if there are multiple fields with the same name, instead of silently overwriting the first with the second. This generally indicates the need to use `:[@merge=1]` on the fields in question.
dicts.py: Added OnceOnlyDict and helper exception KeyExistsError
dicts.py: DictProxy: Added default value for inner param to facilitate creating empty wrapped dicts
bin/map: out_is_db: row-based mode: Debug-log the processed XML tree produced by xml_func.process()
sql_io.py: put_table(): Fixed bug where Missing mapping for NOT NULL column errors should actually be warnings because sometimes the mappings include extra tables which aren't used by the dataset
schemas/vegbien.sql: aggregateoccurrence: Added UNIQUE INDEX that makes an aggregateoccurrence unique within a taxonoccurrence. When the sourceaccessioncode isn't specified (as for individual organisms data, where this goes in plantobservation and taxonoccurrence), this ensures a 1:1 relationship between aggregateoccurrence and taxonoccurrence.
schemas/vegbien.sql: taxonoccurrence: Added UNIQUE INDEX that makes a taxonoccurrence unique within a locationevent. When the sourceaccessioncode isn't specified (as for specimens data), this ensures a 1:1 relationship between taxonoccurrence and locationevent.
mappings/VegX-VegBIEN.stems.csv: binomial (full) plantname: Also mapped to an alternative for taxonoccurrence.sourceaccessioncode, for aggregate plots data that distinguishes taxonoccurrences only by plantname (such as CVS)
exc.py: e_msg(): Fixed bug where exceptions with nothing in e.args (such as StopIteration) caused a failed assertion. Fixed bug where exceptions with multiple values in e.args (such as certain IOErrors) caused a failed assertion.
sql.py: flatten(): Documented that shouldn't cache query because the temp table will usually be truncated after use
sql_gen.py: merge_not_null(): For clarity, use to_text() to represent NULL as the string 'NULL' instead of as the null sentinel for the column's type
sql_gen.py: Added to_text() and helper value null_as_str
mappings/VegX-VegBIEN.stems.csv: plantobservation: sourceaccessioncode, authorplantcode: Removed no longer needed mapping to specimenreplicate.sourceaccessioncode, since specimenreplicate for plots data is now identified by its plantobservation fkey, without needing its own sourceaccessioncode
sql_io.py: put_table(): ignore_cond(): Fixed bug where if is_literals, need to return NULL, instead of trying to filter invalid rows out of a nonexistant input table
mappings/VegX-VegBIEN.stems.csv: Replaced "/}" (with unnecessary "/") with "}"
mappings/VegX-VegBIEN.stems.csv: Replaced doubled "/"s with single "/"
backups/Makefile: Added synchronization of backups with vegbiendev. Added downloading backups to After a new import steps.
lib/common.Makefile: rsync: $(remote): Fixed bug where the inputs/ dir was hardcoded, when the remote dir name needed to be determined dynamically based on the Makefile dir
backups/Makefile: Refactored to include lib/common.Makefile
inputs/Makefile: Added download-logs to download import logs onto local machine and added it to the "After a new import" steps
Moved generally useful targets and vars from inputs/Makefile to lib/common.Makefile and lib/forwarding.Makefile
bin/map: Don't create unneeded /_ignore/inLabel element containing the datasource name because sql_io.put_table() now autopopulates the datasource_id
schemas/functions.sql, py_functions.sql: Removed no longer needed relational functions, since sql_io.put_table() supports regular SQL functions
inputs/Madidi/maps/VegX.plots.csv: Mapped all mappable columns
mappings/VegX-VegBIEN.stems.csv: elevation, elevationrange: Added _rangeStart/_rangeEnd filter
sql_io.py: Wrapping mapping in a sql_gen.ColDict: Documented that sql_gen.ColDict sanitizes both keys and values passed into it
sql_gen.py: ColDict: Documented that anything that isn't a column is wrapped in a NamedCol
README.TXT: Datasource setup: Accepting the test cases: Added instructions for what to do if you get errors
bin/map: Fixed bug where needed to use sql.function_exists() to determine if something is a relational (now SQL) function, including in row-based mode, since that now uses sql_io.put_table(), which requires this. The bug fix relies on the new xml_func.process() feature that preserves unknown relational functions in case they are built-in functions rather than SQL functions.
xml_func.py: process(): In row-based mode, when trying to evaluate function using DB, preserve unknown funcs because these might be built-in functions of db_xml.put(). The sql.DoesNotExistException should be raised again when db_xml.put() is run and it verifies whether the function is built-in or not (e.g. _simplifyPath is now built-in, for column-based support). See db_xml.put_special_funcs for built-in functions.
db_xml.py: put(): Fixed bug where strings starting with "$" were interpreted as input columns in row-based mode (this should only apply to column-based mode). Explicitly store whether in row-based mode in is_literals var (similar to is_literals in sql_io.put_table()).
sql_io.py: put_table(): unrecoverable errors: Returning default value: is_literals: Remove column rename from default value so it doesn't get treated as a column by db_xml.put() (which is handled differently from a literal value)
db_xml.py: put(): put_(): Removed no longer needed in_row_ct_ref param, which is only used by put_table(). Rewrapped function body.
sql_io.py: put_table(): ignore(): literals: Only replace invalid literal with NULL or remove row if that column actually contains the invalid value in question. This handles the case where all columns are being ignore()d because the specific column couldn't be identified, and this was not the invalid column.
mappings/VegX-VegBIEN.stems.csv: plot: Mapped note
mappings/VegX-VegBIEN.stems.csv: plot: Added landform mapping
schemas/vegbank.ERD.pdf: Auto-repaired with Adobe Reader so that the repair message doesn't pop up whenever it's opened
schemas: Added vegbank.ERD.pdf so the VegBank ERD is easily accessible when mapping
mappings/VegX-VegBIEN.stems.csv: project: Mapped sourceaccessioncode. This entailed adding a distinguishing suffix to the projectname input mapping.
mappings/DwC2-VegBIEN.specimens.csv, VegX-VegBIEN.stems.csv: Removed all manual mappings to datasource_id now that datasource_id is auto-populated, both on the VegBIEN output side and the DwC/VegX input side. This should greatly simplify many of the mappings!
db_xml.py: put(): Don't suppress exceptions thrown by sql_io.put_table() by passing them to on_error(), because some exceptions indicate unrecoverable database connection problems such as a broken connection, which should abort the import
db_xml.py: put(): Support datasets with no rows, where root.firstChild == None. Documented that to use an entire XML document, you need to pass root.firstChild rather than root.
inputs/import.stats.xls: Updated with stats from latest import. Note that the import now includes CVS.
README.TXT: Documented that the PostgreSQL server should be restarted after installing system updates that may affect it, to avoid spurious errors that crash the import but go away upon reimport
Regenerated vegbien.ERD exports
schemas/vegbien.ERD.mwb: Fixed lines
bin/map: Call sys.stdout.flush() after every call to sys.stdout.write() to avoid interleaved stdout/stderr output due to stdout buffering
schemas/vegbien.sql: *_unique_datasource UNIQUE INDEXes: Removed COALESCE from datasource_id and datasource_id IS NOT NULL filter, because datasource_id is now always NOT NULL
schemas/filter_ERD.csv: Removed AUTO_INCREMENT because that is not added to any other tables
Regenerated schemas/vegbien.my.sql
schemas/vegbien.sql: specimenreplicate: Inherit datasource_id from taxonoccurrence instead of defining it independently
xml_func.py: Removed no longer needed local XML functions that have been translated to SQL functions
input.Makefile: Testing: Removed VegBIEN.%.xml test because the import.%.xml test output includes the template tree that it's inserting, so there is no need to generate the XML tree in a separate test. This will also remove the need to maintain local XML functions that have already been translated to DB functions for the sole purpose of this automated test.
schemas/vegbien.sql: Made datasource_id required on every table that has it, to trigger the automatic population of it by sql_io.put_table()'s col_defaults
Moved importing of col_defaults from db_xml.put_table() to bin/map, so that it also happens in row-based mode. Note that this causes a DB entry for the datasource to always be created, even if the datasource has no mappings or no rows.
Use new exc.reraise() where exc.raise_() was used, so that the stack trace is preserved when the exception is rethrown
exc.py: reraise(): Take optional exception argument so it can be invoked in the same way as raise_(). Interestingly, this missing parameter does not produce the usual "...() takes no arguments (1 given)" error when the function is called inside an except block.
exc.py: Added reraise()
db_xml.py: put(): Inserting node: Wrap sql_io.put_table() call in catch-all exception handler that calls on_error_() (wrapper for error handler provided by caller) and returns None. This both adds additional debugging info to the exception (in on_error_()) and allows recovery from arbitrary exceptions that happen in sql_io.put_table(), so that an exception does not abort the import.
exc.py: get_e_tracebacks_str(): Use the current system traceback if the exception doesn't contain its own traceback(s)
schemas/vegbien.sql: specimenreplicate: Added locationevent fkey, since fkeys are not inherited from parent tables
schemas/vegbien.sql: Added datasource_id fkey constraints to all tables that needed it
bin/map: out_is_db: Use col_defaults in row-based mode as well
db_xml.py: Renamed put_table_special_funcs to put_special_funcs because it is now used by put() as well
db_xml.py: Moved put() before the functions that use it
db_xml.py: Renamed _put_table_part() to put(), replacing the existing put() whose functionality it now performs
db_xml.py: _put_table_part(): Reordered params to match put(), so that it can eventually be substituted for it
db_xml.py: _put_table_part(): Allow being invoked directly by adding defaults for parameters
db_xml.py: put(): Use _put_table_part(). This will ensure that all the put-related functionality is in one place, rather than duplicated.
db_xml.py: _put_table_part(): Append the node to errors handled with on_error()
sql_io.py: Added own SyntaxError class to replace built-in SyntaxError because it stringifies to only the first line
input.Makefile: Testing: Removed $(via).%.xml tests because they require the via format (DwC/VegX) to be XML, but we want to flatten VegX into a DwC-like set of CSV column names
Removed inputs/NY/test/VegX.specimens.xml.ref because NY is not mapped via VegX
input.Makefile: Testing: Renamed import.*.out tests to end in .xml because they now contain XML import trees for validation, and this extension turns on XML syntax highlighting in a text editor
bin/map: out_is_db: Output the put template to stdout so it will be validated in the automated testing
xml_func.py: process(): If local XML function can't be found, just replace with last param instead of returning an error. This allows DB-only functions to be ignored in XML output mode.
sql_gen.py: ColDict.__setitem__(): Fixed bug where None value should not be replaced with column default value if column has no underlying table
sql.py: DbConn.col_info(): If column does not exist, raise sql_gen.NoUnderlyingTableException
sql_io.py: put_table(): In log messages, use `.to_str(db)` instead of repr() where possible to use the SQL syntax of the DB driver
sql_io.py: put_table(): ignore(): Replacing invalid value with NULL in nullable column: Corrected log message to "Replacing invalid value ... with NULL in column ..." because the rows with that value are not ignored in that case
sql.py: run_query(): InvalidValueException: Parse any exception ending in "out of range", not just "field value out of range", in order to support errors that the timezone is out of range
schemas/py_functions.sql: _dateRange*(): Made functions STRICT because they return NULL on NULL input
sql_io.py: put(): Use a simple case of put_table(), which now supports everything put() needs. This will enable all row-based and column-based processing to be maintained in the same function, put_table(), and avoids the need to reimplement any column-based functionality (like SQL functions) in put().
xml_dom.py: NodeTextEntryIter: Allow empty values through as None, and instead filter them out in TextEntryOnlyIter using new helper function non_empty(). This allows XML functions to decide for themselves whether empty values should be filtered out, because process() will now no longer automatically remove them. This will enable process() to work with SQL functions, which must not have empty values filtered out because this will remove required, but nullable, arguments.
xml_func.py: Use conv_items() in every XML function that needs empty (NULL) entries removed, so that they are not dependent on what process() does to the items
sql_io.py: put_table(): ignore(): Support invalid literals in addition to invalid column values. This also allows put_table() to fully support being called by put().
xml_func.py: process(): In row-based mode, if function is not explicitly a relational function but does not exist as a local XML function, treat it as a relational function. This will help in merging sql_io.put() and put_table(), since put() did not support SQL functions but put_table() does, and this ensures that a SQL function is always used if the local XML function has been removed in favor of it.
sql_io.py: put_table(): Removed into param to set a custom into table name because put_table() now has all the info it needs to generate this name automatically, and callers are no longer providing it