To Do
Issue tracker
Meetings
- 2010-11-09 meeting
- 2011-10-13 conference call
- 2011-10-24 to 28 working group
- 2011 working group Fr BIEN Implementation
- 2011 working group Fr Summary
- 2011 working group Mo BIEN workflow
- 2011 working group Mo technical challenges
- 2011 working group Th BIEN Components
- 2011 working group Th BIEN Implementation
- 2011 working group Th Summary of subgroups
- 2011 working group Th Use cases
- 2011 working group Th VegBank conference call
- 2011 working group Tu BIEN database
- 2011 working group Tu iPToL-BIEN Phylogenetics
- 2011 working group We BIEN tools
- 2011 working group We Use cases
- 2011-12-01 conference call
- 2011-12-08 conference call
- 2011-12-15 conference call
- 2012-01-05 conference call
- 2012-01-11 NCEAS meeting
- 2012-01-12 conference call
- 2012-01-19 conference call
- 2012-02-03 conference call
- 2012-02-10 conference call
- 2012-02-17 conference call
- 2012-02-24 conference call
- 2012-03-02 conference call
- 2012-03-09 conference call
- 2012-03-16 conference call
- 2012-03-23 conference call
- 2012-04-02 conference call on VegX modifications
- 2012-04-09 conference call
- 2012-04-20 conference call
- 2012-04-27 conference call
- 2012-05-04 conference call
- 2012-06-01 conference call
- 2012-07-26 conference call
- 2012-08-03 conference call
- 2012-08-17 conference call
- 2012-08-24 conference call
- 2012-08-30 small VegCSV conference call
- 2012-09-07 conference call
- 2012-09-13 VegCSV conference call
- 2012-09-24 conference call
- 2012-10-03 conference call
- 2012-10-19 conference call
- 2012-11-02 conference call
- 2012-11-09 conference call
- 2012-11-14 conference call on data provider metadata
- 2012-11-16 conference call
- 2012-11-26 to 30 working group
- 2012-12-07 conference call
- 2012-12-14 conference call
- 2013-01-04 conference call
- 2013-01-11 conference call
- 2013-01-18 conference call
- 2013-01-24 conference call
- 2013-01-31 conference call
- 2013-02-07 conference call
- 2013-02-14 conference call
- 2013-02-21 conference call
- 2013-02-28 conference call
- 2013-03-06 conference call with Brad
- 2013-03-07 conference call
- 2013-03-14 conference call (canceled)
- 2013-03-21 conference call
- 2013-03-28 conference call
- 2013-04-04 conference call (canceled)
- 2013-04-11 conference call (canceled)
- 2013-04-19 conference call
- 2013-04-24 conference call
- 2013-05-02 conference call
- 2013-05-09 conference call
- 2013-05-16 conference call
- 2013-05-24 conference call
- 2013-05-30 conference call
- 2013-06-06 conference call
- 2013-06-13 conference call
- 2013-06-20 conference call
- 2013-06-27 conference call
- 2013-07-02 conference call
- 2013-07-03 separate conference call
- 2013-07-11 conference call
- 2013-07-19 conference call (canceled)
- 2013-07-25 conference call
- 2013-08-01 conference call
- 2013-08-08 conference call (canceled)
- 2013-08-16 conference call (canceled)
- 2013-08-22 conference call
- 2013-08-29 conference call strategy discussion
- 2013-09-05 conference call
- 2013-09-12 conference call
- 2013-09-19 conference call
- 2013-09-19 to 10-17 conference calls (summary)
- 2013-09-26 conference call
- 2013-10-03 conference call
- 2013-10-10 conference call
- 2013-10-17 conference call
- 2013-10-25 conference call
- 2013-10-31 conference call
- 2013-11-07 conference call
- 2013-11-14 conference call
- 2013-11-21 conference call
- 2013-11-28 conference call (canceled--holiday)
- 2013-12-05 conference call
- 2013-12-12 conference call
- 2013-12-17 planning conference call
- 2013-12-19 conference call
- 2013-12-26 conference call (canceled--holiday)
- 2014-01-02 conference call (canceled--holiday)
- 2014-01-09 conference call
- 2014-01-13 planning conference call
- 2014-01-16 conference call
- 2014-01-23 conference call
- 2014-01-30 conference call
- 2014-02-06 conference call
- 2014-02-13 conference call
- 2014-02-20 conference call
- 2014-02-24 working group
- 2014-02-27 conference call
- 2014-03-06 conference call
- 2014-03-13 conference call
- 2014-03-18 schema changes conference call with Brad
- 2014-03-20 conference call (canceled)
- 2014-03-27 conference call
- 2014-04-03 conference call
- 2014-04-10 conference call
- 2014-04-17 conference call
- 2014-04-23 conference call (canceled)
- 2014-05-01 conference call
- 2014-05-08 conference call
- 2014-05-15 conference call
- 2014-05-22 conference call (canceled)
- 2014-05-29 conference call
- 2014-06-05 conference call
- 2014-06-06 separate conference call on data dictionary
- 2014-06-12 conference call on data dictionary
- 2014-06-19 conference call
- 2014-06-26 conference call
- 2014-07-03 conference call
- 2014-07-10 conference call
- 2014-07-17 conference call
- 2014-07-24 conference call (canceled)
- 2014-07-31 conference call (canceled)
- 2014-08-07 conference call
- 2014-08-14 conference calls
- 2014-08-21 conference call
- 2014-08-28 conference call
- 2014-09-04 conference call
- 2014-09-11 conference call
- 2014-09-18 conference call (canceled)
- 2014-09-25 conference call (canceled)
- 2014-10-03 conference call on CVS issues
- 2014-10-09 conference call (canceled)
- 2014-10-16 conference call
- 2014-10-24 conference call on sPlot
- 2014-10-30 conference call
VegBIEN schema
- scope specimenreplicate by collectionnumber when no catalognumber present
- individualCount should be 1 for specimens
- taxondetermination: Add constraint trigger to make sure exactly one (not zero) taxondeterminations per taxonoccurrence is always marked current
- {commname,commstatus}.source_id should be scoping
- store verbatim date
- form scientificNameWithMorphospecies differently for specimens and plots
- use scientificName for specimens
- remove no longer used centerlatitude/centerlongitude? the lat/long go in locationdetermination
- Change taxonrank's forma value to form to match TCS?
- move plantobservation scoping fields to taxonoccurrence, because these tables are 1:1
- support raw location name in its own field, distinct from locationNarrative
- partial indexes should be full where possible, so that they can be used to query the database
- specimenreplicate: require catalognumber_dwc in check constraint, even if plantobservation_id provided
- first need to ensure plots data doesn't use any specimenreplicate fields except for that
- Store times in a binary format in VegBIEN
- add locationdetermination notes on how lat/long converted from input data, if any
- Move plantobservation.stemcount to aggregateoccurrence, for cases where number of stems is known, but not which stems go to each individual
- Normalize fields ending in numbers
- e.g. growthFormType
- add specimens and traits capability
- compare to CTFS
- remove subproviders from provider_count who don't have any rows in VegBIEN
- make taxonoccurrence.locationevent_id NOT NULL: instead, nullable only when sourceaccessioncode is specified
- requires running all the tables' automated tests in one transaction (or in commit mode), so that the existing parent tables can be connected to
- make taxonoccurrence.locationevent_id nullable only when sourceaccessioncode is specified
- (Note that this will not trigger a DuplicateKeyException, because the NullValueException will be triggered first, so the existing import process can't yet do this.)
- but better to require a locationevent, and look up the taxonoccurrence by its sourceaccessioncode if the locationevent_id is NULL
- store full name of person instead of/in addition to parsed first/last name
- fuzzing from access level
- make different hierarchical levels for DwC taxonRank and infraspecificEpithet
- add project.parentProject_id?
denormalized VegCore
- canon: support two terms having the same simplified form, which will be disambiguated using ? like in redmine_synonyms' output
- mark terms sourced from VegX
- *slopeAspect, etc.: add units
- add native
VegCSV
- reorganize VegCSV vs. VegX into a table with two columns
VegPath
- web/main/: Handle symlinked dirs in .htaccess files that contain self-referential paths, e.g. VegBIEN/.htaccess > don't redirect subdir paths
VegX schema
See VegX schema
Mappings
- for TNRS, map the Unmatched_terms (morphospeciesSuffix) to NULL if a Specific_epithet_matched was provided
- populate all datasources' import_order.txt
- adding a subdir auto-adds it to import_order.txt
- map ND to NULL (e.g. in REMIB.Specimen.accession_number, locality)
- handle taxonomic names that are actually comments, like "NO SPECIES ON PLOT"
- map TEAM site placename metadata
- map CVS.taxonObservation_ growthForm fields
- validate CVS (after VegBank problems have been fixed)
- Translate ranks to taxonrank enum values
- especially needed for NCBI.higher_taxa.rank: SELECT DISTINCT rank FROM "NCBI".nodes
- store whether a source is top-level
- analytical_stem TNRS names: Merge name containing just a family and family field when combining, so family is not duplicated
- this occurs when Name_matched_rank = family
- Set taxonomicStatus on higher taxa
- Place cf/aff in taxonlabel and populate with TNRS.Annotations
- dataGeneralizations is confidentialityStatus
- don't copy collectiondate to locationevent if mapping a specific TaxonOccurrence
- move datasources' custom mappings (along with the comments) to mappings/VegCore.thesaurus.csv
- migrate mappings so that collectionnumber is used for authorSpecimenCode instead of catalognumber_dwc
- resolve SALVIAS SourceVoucher/coll_number/Ind ambiguity: coll_number should really be recordNumber, but that's currently Ind
- check that SALVIAS SourceVoucher/coll_number is globally unique, since it is being used as such in indirect vouchers
- _eq(): compare values case-insensitively
- this will support SALVIAS DetType "Indirect" matching "indirect": select "DetType", count(*) from "SALVIAS".organisms group by "DetType"
- map DwC 1.21 terms to official DwC
- correctly support looking up a plantobservation using just its sourceaccessioncode (not also its aggregateoccurrence_id)
- possibly by making aggregateoccurrence_id nullable when sourceaccessioncode is specified
- the mappings currently work around this by also providing a taxonoccurrence whenever a plantobservation is needed
- import all tables in same public schema, without rolling back after each test, so that stemobservations will link up with existing plantobservations
- handle "day is out of range for month" errors by replacing the day with 15 (mid month)
- need to parse the date into parts first
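The day-replacement step above could be sketched as follows. This is a minimal illustration, not code from the VegBIEN repository; the function name and the choice of `calendar.monthrange` for validation are assumptions.

```python
import calendar

def clamp_day(year, month, day):
    """If the day is out of range for the given month (e.g. Feb 31),
    replace it with 15 (mid-month), as proposed above; otherwise keep it.
    Hypothetical helper, not part of the existing import code."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in month
    return day if 1 <= day <= last_day else 15
```

This presumes the verbatim date has already been parsed into year/month/day parts, which is exactly why the parent item notes that parsing must happen first.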
- remove main maps' mappings comments that only relate to a specific datasource
- map minimumElevationInMeters to elevation/_avg/max, filtered by _rangeEnd
- filter dateCollected->collectiondate mapping with _dateRangeStart?
- is it valid to have a collection date that's a range? do any datasources have this?
- figure out which BIEN2 datasources from viewFullOccurrence.DataSource (SurveyType = 'Specimen') are in VegBIEN
- convert unit suffixes in verbatim fields
- Handle invalid lat/long (99.9, 999.9, etc.) in all core maps (currently just done for value 0 in DwC)
- Handle date ranges in all date fields (esp. DwC)
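The invalid-coordinate check above could be sketched like this. The helper name and the exact sentinel list are assumptions taken from the values mentioned in the item (0, 99.9, 999.9), not from the actual core maps.

```python
def is_placeholder_coord(value):
    """Return True for coordinate values that are placeholder sentinels
    (0, +/-99.9, +/-999.9) rather than real lat/longs.
    Hypothetical helper; the sentinel set is an assumption."""
    return abs(float(value)) in (0.0, 99.9, 999.9)
```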
- Unescape \% in e.g. ACAD ID16551
- Change the long DwC column name to just the DwC label in the datasource mappings
- Parse time fields into standard format
- Fix eventDate/verbatimEventDate mappings so they correspond to TDWG
- append YMD dates using " " so that if full date is in one field, it will be parsed correctly
- but need to handle empty YMD fields: maybe check for full date in one field as special case
- for examples, see vegbien "ARIZ"."specimens.errors"
- Casting to timestamps: add UTC timezone if no existing timezone
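The UTC-defaulting cast above could look like the following sketch (illustrative only; the function name is an assumption and the real casts happen in SQL, not Python):

```python
from datetime import datetime, timezone

def default_to_utc(ts):
    """If the parsed timestamp is naive (no timezone), assume UTC;
    if it already carries a timezone, leave it unchanged.
    Hypothetical helper mirroring the proposed cast behavior."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts
```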
- join together min/max elevation values before splitting them apart so that any range in the min field will automatically be parsed as a range
- constrain all child tables with a default unique index that makes them 1:1 with their immediate parent
- core tables should have this already
- to support col-based _map, add _dict built-in function that puts args into a dict, which becomes a PostgreSQL *hstore*
- make _if a built-in function, which vertically subsets the rows according to the given filter
- would likely require handling then and else in separate _if statements using new XML function _not
- built-in function could just handle passing the parent fkey through to the then element, and then a relational function with a must-be-true check constraint on cond could do the subsetting
- Map fields with no join mapping:
make missing_mappings
- associatedMedia
- associatedSequences
- basisOfRecord
- bibliographicCitation
- coordinatePrecision
- countryCode
- datasetName
- day
- dynamicProperties
- endDayOfYear
- eventRemarks
- eventTime
- geodeticDatum
- georeferenceProtocol
- georeferenceRemarks
- georeferenceSources
- georeferenceVerificationStatus
- higherGeography: Datasources with it always also have place names divided out by rank
- identificationRemarks
- interpretationType
- island: Not used
- islandGroup: Not used
- language
- lifeStage
- locationRemarks
- modified
- month
- municipality
- occurrenceRemarks
- otherCatalogNumbers
- ownerInstitutionCode
- preparations
- relatedResourceID
- relationshipOfResource
- reproductiveCondition
- rightsHolder
- startDayOfYear: Datasources with it always also have month
- subgenus
- type
- typeStatus
- verbatimDepth
- verbatimSRS
- year
- Map fields with no input mapping
make missing_mappings
cat unmapped_terms.csv
- Convert degree-minutes-seconds to decimal degrees
- check that each table has the needed unique index(es), including ones we don't (yet) map to
- only needed for ones we map to
- use date only in datasources that need the extra parsing provided by dateutil: _._date(date) has been removed
- make method unique within the datasource or locationevent instead of globally unique
- map infraspecificEpithet to the field indicated by taxonRank: not applicable because we are using a hierarchical schema for epithets and the analytical DB does not contain infraspecificEpithet
- Map DwC day (aka julianDay): Datasources with it always also have month
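The degree-minutes-seconds conversion mentioned above is standard arithmetic; a minimal sketch (function name and hemisphere convention are assumptions, not existing mapping code):

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere='N'):
    """Convert degrees/minutes/seconds to decimal degrees.
    Negative for the southern/western hemispheres or a negative
    degrees value. Hypothetical helper for illustration."""
    decimal = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    if hemisphere in ('S', 'W') or degrees < 0:
        decimal = -decimal
    return decimal
```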
Fixes
- add validation to PostgreSQL util.set_col_names() to check that the column being renamed is the correct column. This is necessary to prevent errors when the map.csv columns don't correspond 1:1 to the staging table columns (e.g. in the case of one input column mapping to multiple outputs, or a data refresh causing column names to change).
- sql.py run_query(): savepoint-level down before running parse_exception() so that you don't get "current transaction is aborted, commands ignored until end of transaction block" errors
- e.g. happens when running verbosity=4 make scrub on the test_taxonomic_names (generated with inputs/test_taxonomic_names/test_scrub) as of r9756
- support UTF-8Y input files (e.g. MO refresh)
- check whether threatened field is still populated correctly after switch to new TNRS import method
- import_all's after_import() should ensure tnrs.make is continuously unlocked for at least a minute before trying to acquire the lock, to allow other waiting processes to acquire it first
- sql_io.put_table(): each col_default should only be evaluated once, and replaced with its value
- because col_defaults are sometimes copied, the copies would need to be updated, too
- input.Makefile: %/install should set pipefail when teeing output to log so errors cause make to stop
- db.col_info() and related functions should use the search_path
- all functions that take an errors_table should accept a None value for it
- sql_io.put_table(): ensure input and output columns match up
- use function to do each insert incrementally and return the input pkey along with the output pkey from INSERT RETURNING
- change "Missing mapping for NOT NULL column" warnings to errors
- first need to remove empty parent tables in xml_func.simplify() so they don't generate this warning
- when two paths map to the same place, and a node contains two text elements, need a user-friendly error to indicate this
- currently, error is AttributeError: Text instance has no attribute 'tagName'
- this happens if two paths are identical except one has _alt at the end, because _alt will only be autoappended to the one without it
- sql_gen.map_expr(): Don't replace a quoted identifier where it is preceded by double quotes (indicating embedded double quotes)
- sql.py run_query(): Parse error messages' value strings containing embedded quotes
- sql_io.put_table(): ignore(): handle cols that have been wrapped in func calls (casts, etc.)
- When setting the value of a text element, raise an error if that element already contains child elements (and vice versa)
- Fix duplicate elimination for tables that have nullable columns in their unique constraints
SELECT conname, attname FROM pg_constraint JOIN pg_attribute ON attrelid = conrelid AND attnum = ANY (conkey) WHERE conname like '%_unique' and not attnotnull ORDER BY conname, attnum
- sql_io.put_table(): ensure_cond(): Handle case where is_literals is False but some of the columns in the condition are literals, not input columns
- Only replace IDs (*ID) with abbrs, so that plantname in /*_id/*/plantname doesn't get abbreviated
- set up read-only DB user for people to use to browse the DB
- add fki indexes on all fkey source columns
- TNRS-scrub the names in taxon_trait_view using the new ScrubbedTaxon view
- fix race condition in scrubbing daemons' lockfile algorithm, which frequently allows 2-3 scrub.make instances to process the same set of rows at once
- figure out what causes the "could not create unique index ... key is duplicated" errors and whether this is repeatable or random: occurs when one input row matches multiple output rows, due to different imports using different unique indexes of the same table
- see inputs/REMIB/Specimen/logs/2012-09-21-16-37-57.log.sql, inputs/VegBank/stemcount/logs/2012-09-21-17-56-19.log.sql
- appears to be related to index conditions, where not all rows satisfy the condition
- Deal with missing plantnames error in SALVIAS organisms import (see plotObservations.PlotObsID = 145483): hasn't been a problem in a while
Features
- rename all README.TXT to _README.TXT so they sort at the top of the folder
- change plain-text wiki code blocks to language blocks, now that language blocks no longer display with line numbers
- db_xml.put(): Add runtime _if optimization like for _alt
- recluster tables periodically on pkey to facilitate joins and updates by pkey
- sql_gen.simplify_expr(): Support identifiers with embedded ()
- move _alt optimization that just returns the first arg if it's non-NULL to xml_func.simplify() (after tagging the XML tree with the nullability of each node)
- staging tables and derived temp tables: apply a NOT NULL constraint to every column that will accept it
- tnrs_db: Lock TNRS.tnrs for writing to ensure that no two instances of tnrs_db are performing TNRS requests simultaneously (which would overburden and crash TNRS)
- sql_io.put_table(): Try import first with no rows in input table, so input table only needs to be generated if there are no unrecoverable errors in the zero-row run
- sql_io.put_table(): Support doing lookups of existing records without requiring a DuplicateKeyException, to support cases where one of the duplicate key columns is NOT NULL and not provided in the current hierarchical level
- Remove id="-1" from import templates
- Add separating line between each datasource in verbose make output
- escape XML tag names
- filename sorting supports negative numbers
- sql_io.py: put_table(): don't generate output pkeys table if the caller doesn't need it
- support NULL in all SQL function params with a default value, and use coalesce() to apply the default value
- _dateRangeStart/_dateRangeEnd autodetect the range and date part separators
- currently, only dates containing " " (space) are supported
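The separator autodetection for _dateRangeStart/_dateRangeEnd could be sketched as below. The candidate separator list (beyond the space currently supported) is an assumption, and the function name is hypothetical; a real implementation would also have to avoid splitting on the "-" inside ISO dates, which is why the candidates here are tried longest-first.

```python
def split_date_range(value):
    """Split a verbatim date range into (start, end) on the first
    matching separator; a non-range value is returned as both ends.
    Separator candidates are illustrative assumptions."""
    for sep in (' - ', '--', ' to ', ' '):  # longest/most specific first
        if sep in value:
            start, _, end = value.partition(sep)
            return start.strip(), end.strip()
    return value, value
```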
- highlight/pretty-print UserWarnings to make them visible like exceptions
- should allow them to be used with error_stats
- sql_io.put_table(): Allow col_defaults to contain output table column names, in the same way as default
- join: Add option to print "No input mapping" error even if there is a comment on the mapping
- support CSVs whose quotes are escaped with \"
- Print summary stats before exiting if user sends SIGINT, SIGTERM, etc. to map
- Print command to restart import where it left off if user sends SIGINT, SIGTERM, etc. to map
- Restart import where it left off if user sends SIGHUP to map
- join: Support "bare" join column labels without a root, which should be treated as compatible with any root
- Mark autogenerated maps as such with a comment so that the user doesn't accidentally edit them
- this helps detect unwanted diffs
- or don't keep them in version control (but then need to have all make dependencies on the machine where the code is checked out)
- Handle seasons in dates
- Handle unknown characters in dates (fuzzy option to parse()?)
- Don't require a {} XPath expression to be preceded by an element to attach the other_branches to
- Filter log files to allow comparison using diff
- use debug2redmine?
- Compare filtered 2012-08-03 and 2012-08-01 import log files using WinMerge diff to ensure that they do the same thing (with different XML trees)
- Escape names of everything being inserted into the DB from a make target
- This will help prevent SQL injection attacks when VegBIEN becomes public
- Set ON DELETE fkey behavior for nullable fields to SET NULL instead of CASCADE
- Warn if there's an index missing on a column used in a WHERE clause
- need to support indexes on multiple columns
- In XPaths, make / following -> optional
- automate collision elimination of column names in cat_csv
- see README.TXT section: "For every file with an error 'column "..." specified more than once'"
- sql.py: index_pkey(): recover if pkey exists
- Error message: multiple primary keys for table "<table>" are not allowed
- see if there's a way to get exception detail info in SQLERRM (probably not, but would be useful for errors tables)
- db_xml.put_table(): don't subset table if less than partition_size and getting all rows
- CREATE TABLE AS is fast (<1s), but the subsequent ANALYZE is comparatively slow (8s) (vegbiendev:/home/bien/inputs/Madidi/import/organisms.2012-07-27-22-54-00.log.sql)
- use EXPLAIN's row count
- Garbage collect target records created for a source record to point to, where the source record is never inserted because of an error
- but sometimes, only the target record is needed and the source record just happens to be part of the output mapping
- use param names in info_schema to order params when SQL functions with named params not supported (on old versions of Postgres)
- not all PL/Python exceptions should be translated to data_exception, because some should be handled by the import mechanism (e.g. not-null constraint errors)
- Make .last_cleanup targets silently (with -s)
- Only run tests on inputs whose maps have changed according to svn st
- but run on all inputs if the schema has changed
- Remove verbose make output when checking whether external files are up to date (especially in make test)
- Parallelize import so it uses all 4 cores (less priority with col-based import, but still useful): using column-based import instead
- Splitting sourcelist.name to sourcename.name should also split on ,
- sql_io.put_table(): use left anti-join to remove existing rows before trying to insert new rows, in order to avoid creating holes in the indexes when the duplicate inserts are rolled back
- For derived maps installation, redirect stderr to the install log file
- make all map tools (join, etc.) case-insensitive
- eliminates the need for case-sensitive/insensitive mappings
- Reconnect to database if connection lost: hasn't been an issue in a long time; might have only been a problem for MySQL inputs, which are now CSV exports
- Would be fixed by handling the error in run_query() and disconnecting
Refactorings
- have local machine and vegbiendev back up separately to jupiter, rather than synchronizing via jupiter, which introduces unnecessary complexity in the local machine/vegbiendev synchronization process
- this would avoid the need for many of the .rsync_filter/.rsync_ignore files, and the separate commands for syncing different parts of the directory tree
- inputs/input.Makefile $(svnFilesGlob): move unversioned files into separate subdirs with svn:ignore *, to avoid needing to explicitly add every versioned file by running make inputs/<datasrc>/add
- Auto-detect the CSV's NULL value and store this in the CSV dialect, for use by csv.reader
- Make TsvReader use the dialect's NULL value
- Use the CSV dialect's NULL value
- remove dependency on $(bin)/join
- remove no longer used prefixes code
- Make xml_dom.NodeEntryIter return a namedtuple
- Use raise instead of raise e where possible to preserve whole stack trace
- aggregate SQL functions: use array param to support arbitrary # of args
- _name is part of this, because it simplifies when it has only one arg
- sql_gen.py: to_str() renamed items (NamedCol, etc.) using hint param that defines whether to include the AS "..." renaming or just the value
- Move all SQL query-generating functions to sql_gen.py
- Change all val parameters to value to standardize named parameters
- Split util into multiple libs
- sql_io.py put_table(): generate in_tables from the cols in the mapping param
- sql.py mk_select(): use new conditions syntax
- once everything that uses conds uses new syntax, remove: elif isinstance(conds, dict): conds = conds.items()
- In map, factor out WrapIter/ListDict code into a common function
- Move map code that doesn't relate to command line invocation to separate lib file
- Handle parsing and getting of metadata in xpath.py parse() and get()
- For readability, use :: instead of CAST() in PostgreSQL queries (but retain CAST() usage for other DB engines)
- Makefiles: move after-line comments before the line when the comment isn't indented
- sql.py run_query(): don't remove PL/Python prefix
- but first, row-based import would need to parse errors using wrapper functions, because it doesn't use errors tables
- use operator classes which compare NULL literally instead of COALESCE() in indexes
- db_xml.put_special_funcs _simplifyPath(): don't need to xpath.parse(next)?
- In common.Makefile, change the default $src_server (sync server) from vegbiendev to jupiter
Suggestions
- Look into using Sybase Powerbuilder or IBM Enterprise Vision to map data
- The TACC (Texas Advanced Computing Center) people might have individual licenses they could let us use
- "For a next plot type I would suggest TurboVeg" (e-mail from Bob Peet on 2011-12-01)
Working group output
- either modify loading scripts to use VegBIEN or create BIEN 2 -> VegBIEN loading script
- analytical database
- version controlled
- validation
BIEN
- deficiencies in existing data
- time component
- taxonomy versioning
- versioned DB backups
- taxon traits table
- good data entry tool
- UI and tools for porting data to and from VegX/VegCSV
- use cases: Brad to request from BIEN members; will compile for Aaron.
- each use case will consist of:
- analysis for which data was used (publication or in prep)
- raw data sample
- summary of manipulations needed to make data useable
- shortcomings of data, challenges during data compilation/preparation
Mapping
- talk to Nick Spencer about mapping engine
- will be made publicly available online
- engine reusable for VegBIEN mapping: no: VegX-specific and just maps to VegX top-level tables, not nested XML paths
Databanks
- contact(s) for RAINFOR
- access to databanks' internal databases rather than just their source data
- CTFS schema
- login for SALVIAS: clone on nimoy in salvias_plots MySQL database