Activity
From 11/15/2011 to 12/14/2011
12/14/2011
- 05:31 PM Revision 170: xpath.py: Moved abbr expansion code to separate function
- 04:41 PM Revision 169: test/map: Process all tables for a given DB (.sh) input
- 03:26 PM Revision 168: Removed /'s from DB input mappings
- 02:29 PM Task #299: Mapping from NVS to VegX and VegBIEN
- NVS data from Nick Spencer is on nimoy in @/home/bien_shared/raw_data/nvs/@
- 02:06 PM Task #299 (Resolved): Mapping from NVS to VegX and VegBIEN
- Will require finding someone at NVS willing to work with Aaron on mappings and validations
- 02:25 PM Task #300: TurboVeg data
- TurboVeg info from Bob Peet is on nimoy in @/home/bien_shared/raw_data/turboveg/@
- 02:07 PM Task #300 (New): TurboVeg data
- with commitment by someone familiar with DB to work with Aaron on (a) evaluating mappings, and (b) developing validat...
- 02:07 PM Task #301 (Resolved): RAINFOR data
- with commitment by someone familiar with DB to work with Aaron on (a) evaluating mappings, and (b) developing validat...
- 02:06 PM Task #298 (New): Try to find source of DwCA (DwC Archives) data
- hoping GBIF will be willing to work with us on this. Possibly approach Remsen directly
- 02:05 PM Task #297 (Resolved): Request new data dump of specimen data from GBIF, this time in DwC format
- 02:03 PM Task #296 (Resolved): Direct mapping from native salvias_plots MySQL database to VegBIEN
- 01:59 PM Task #289: look for formal mapping mechanism
- Got NVS mapping tool from Nick Spencer, which is on nimoy in @/home/bien_shared/raw_data/nvs/VegX/@
- 01:54 PM Task #286: CSV-XML-database mapping script
- Added support for database and XML inputs
- 01:46 PM Revision 167: map: Use row's index instead of pkey as ID in XML output
- 01:45 PM Revision 166: test/map: Compare via-VegX output to direct output
- 01:13 PM Revision 165: xpath.py: Changed order that main and other branches are processed in so it is consistent with the order the branches are specified in the XPath
- 12:03 PM Revision 164: map: Handle metadata in order with regular mappings
- 11:33 AM Revision 163: Accepted VegBank test output for new CSV mapping order
- 11:26 AM Revision 162: map: Changed CSV input to process mappings in the order they are in the spreadsheet, rather than the order of the CSV columns
12/13/2011
- 05:18 PM Revision 161: map: Added support for XML input
- 05:17 PM Revision 160: Accepted new test output for sorted SALVIAS_db-VegBank mapping
- 05:07 PM Revision 159: mappings to VegBank: Sorted by output column to help VegX-VegBank conversion put elements in the same order as source-VegBank
- 04:57 PM Revision 158: join_all_vegbank: Updated to sort output maps
- 04:57 PM Revision 157: Added script to sort a spreadsheet
- 03:51 PM Revision 156: xpath.py: Allowed empty names in XPaths
- 03:48 PM Revision 155: xpath.py: Added automatic conversion of strings to paths where needed.
- 03:00 PM Revision 154: xpath.py: Added caching of parsed XPaths. Added automatic conversion of strings to paths where needed.
- 02:59 PM Revision 153: Added __str__() method to XML nodes
- 02:58 PM Revision 152: Fixed VegX-VegBank mapping syntax error
- 02:17 PM Revision 151: Added faded beginning of string in Parser syntax errors
- 02:01 PM Revision 150: Updated mappings Makefile
- 02:00 PM Revision 149: Added Makefiles for scripts and test
- 01:54 PM Revision 148: Added mappings Makefile
- 12:28 PM Revision 147: Added human-readable SALVIAS_db mappings
- 12:28 PM Revision 146: db_xml.get(): Pass limit through to SQL query
- 12:11 PM Task #295 (Resolved): provide benchmark queries for NYBG data
- Brad Boyle provided NYBG queries, which are on the wiki under [[NYBG tests]]
- 12:10 PM Task #290: benchmark tests for database loading
- Brad Boyle provided NYBG queries, which are on the wiki under [[NYBG tests]]
- 11:57 AM Task #291: list of milestones
- updated to do list
- 11:56 AM Task #291: list of milestones
- Brad created a timeline, which is on the wiki under [[December 8 2011 WebEx meeting]].
- 11:48 AM Revision 145: Regenerated human-readable mappings
12/12/2011
- 05:39 PM Revision 144: Fixed documentation for xml_funcs
- 05:38 PM Revision 143: Refactored xml_dom.set_value() to avoid needing a doc parameter for the XML document
- 05:35 PM Revision 142: xpath.py: Refactored xml_func.py to avoid needing a doc parameter for the XML document
- 05:30 PM Revision 141: xpath.py: Refactored to avoid needing a doc parameter for the XML document
- 04:41 PM Revision 140: Fixed DB input to ignore NULL values
- 04:27 PM Revision 139: xml_dom.py: Changed all uses of name_of(node) to node.tagName
- 04:23 PM Revision 138: Made XML node names case-sensitive
- 04:20 PM Revision 137: mappings to VegBank: Fixed incorrect mappings found after disabling heuristic search for missing fields
- 03:49 PM Revision 136: test/map: Ignore diff exit status
- 03:34 PM Revision 135: map: Implemented DB input support for querying a single table
12/09/2011
- 05:36 PM Revision 134: Added SALVIAS_db test accepted output
- 05:35 PM Revision 133: map: Continued to add DB input support
- 04:54 PM Revision 132: test/map: Echo command used to import db config
- 04:02 PM Revision 131: Added support for multiple database engines. Changed SALVIAS_db input to use user-entered password.
- 01:58 PM Revision 130: map: Allow db config vars to be optional. SALVIAS_db test: Changed to use salvias_plots and XPath mapping syntax.
- 01:32 PM Revision 129: Renamed SALVIAS_db test input to use organisms table
- 01:29 PM Revision 128: Re-committed accepted_outputs
- 01:23 PM Revision 127: Renamed test/map output to remove CSV/DB indicator because that is now specified in the datasource name
- 01:18 PM Revision 126: map: Started adding database get by XPath functionality
12/08/2011
- 06:48 PM Revision 125: format_for_review: Fixed bug where Comments column would be reformatted in addition to mappings columns
- 05:38 PM Task #285: CSV to XML mappings for NYBG, SALVIAS
- To make it easier to review the mappings, I created human-readable versions "in Subversion":https://projects.nceas.uc...
- 05:35 PM Revision 124: Regenerated human-readable mappings
- 05:15 PM Revision 123: Added human-readable versions of mappings and scripts to generate them
- 05:14 PM Revision 122: VegX-VegBank mapping: Removed a duplicated mapping
- 05:13 PM Revision 121: NYBG-VegX mapping: Added conference call feedback
- 03:12 PM Task #295 (Resolved): provide benchmark queries for NYBG data
- 03:11 PM Task #294 (Resolved): find plot data source provider to work with Aaron
- 01:48 PM Revision 120: Added Comments column with Brad's and Aaron's comments to mapping spreadsheets
- 01:07 PM Task #290: benchmark tests for database loading
- Brad Boyle provided SALVIAS queries, which are on the wiki under [[SALVIAS tests]]
12/07/2011
- 05:01 PM Revision 119: Added stub for SALVIAS database test
- 05:00 PM Revision 118: test/map: Added support for database input
- 04:14 PM Revision 117: Preparing map to input from DB
- 04:05 PM Task #288: VegX-VegBank mapping
- If you would like to browse the @vegbank@ database on nimoy, you can now use "phpPgAdmin":http://bien.nceas.ucsb.edu/...
- 03:47 PM Task #293: mapping inversion script
- updated transformations
- 01:43 PM Task #293: mapping inversion script
- suggested name and location: @svn/scripts/util/invert_map@
- 01:41 PM Task #293 (New): mapping inversion script
- A Python script to invert a mapping spreadsheet. This will be useful for mapping VegBank to VegX, so that we can just...
- 03:32 PM Revision 116: Started preparing map to input from DB
- 02:18 PM Task #289: look for formal mapping mechanism
- *"RDF SPARQL":http://en.wikipedia.org/wiki/SPARQL:*
* SELECT-style queries for RDF data
* uses concise Turtle syn...
- 02:01 PM Task #289: look for formal mapping mechanism
- *"IBM Clio":http://www.almaden.ibm.com/cs/projects/criollo/:*
* "Clio then also interprets these mappings to const...
- 01:23 PM Task #289: look for formal mapping mechanism
- updated to do list
- 01:11 PM Task #289: look for formal mapping mechanism
- *"Bourret's XML-ER mapping":http://rpbourret.com/:*
* *summary: his various mapping methods are already used by Ve...
- 12:45 PM Task #289: look for formal mapping mechanism
- *XQuery:*
* "XQuery Tutorial":http://www.w3schools.com/xquery/default.asp
** XQuery iterates over XML documents stor...
- 12:38 PM Task #289: look for formal mapping mechanism
- *Altova XMLSpy's graphical generation of XPaths:*
* *summary: XMLSpy and Oxygen XML both have Copy XPath commands (O...
- 01:27 PM Revision 115: xml_func.py: Added optimization to first check if function name starts with _ before looking it up in the table
- 12:24 PM Revision 114: Added _alt functions for mappings to VegBank authorPlotCode
- 12:17 PM Revision 113: xml_func.py: Added _alt function to choose between alternative values and used it for the collector plantName mapping
- 11:54 AM Revision 112: VegX-VegBank mapping: Added mapping from taxonName/Simple (NYBG ScientificName) to collector plantName so that collector plantName will always have a value
- 11:27 AM Revision 111: xml_func.py: Added support for decimal years (with day as the fraction)
- 11:16 AM Revision 110: test/map: Added echoing of commands run
12/06/2011
- 04:34 PM Task #288: VegX-VegBank mapping
- A mostly working VegX-VegBank mapping is now available in svn at https://projects.nceas.ucsb.edu/nce...
- 04:31 PM Task #287 (Resolved): XML to database conversion script (merged into CSV-XML-database mapping script)
- 04:28 PM Task #285: CSV to XML mappings for NYBG, SALVIAS
- Working NYBG-VegBank mappings are now available in svn at https://projects.nceas.ucsb.edu/nceas/projects/bien/reposit...
- 04:26 PM Task #286: CSV-XML-database mapping script
- We are now able to import NYBG data directly into VegBank, using the map2vegbank script on nimoy at @/home/bien_share...
- 04:19 PM Revision 109: Added psql_vegbank to easily access vegbank db from the command line
- 04:07 PM Revision 108: Ignore OpenOffice lock files in mappings
- 04:05 PM Revision 107: Added SALVIAS data CSVs and accepted test output
- 03:52 PM Revision 106: test/map: Expanded to include all input CSVs in test/input
- 03:31 PM Revision 105: Removed unneeded joins dir
- 03:30 PM Revision 104: Moved VegBank mapping joins to main mappings dir so they would have similar paths for the upcoming all-sources tester
- 03:11 PM Revision 103: Moved test scripts and files from util to test
- 02:50 PM Revision 102: xml_func.py: Added _namepart function for extracting parts of names
- 02:11 PM Revision 101: Finished NYBG mapping to VegBank!
- 02:04 PM Revision 100: test_map: Added debug option to print VegBank XML instead of importing it into the database
- 01:34 PM Revision 99: xpath.py: Created is_positive() function
- 01:28 PM Revision 98: Further refinements to mappings to support database constraints
- 01:27 PM Revision 97: xpath.py: Added support for negative attribute assertions with !
- 10:54 AM Revision 96: Changed mappings to use keys vs. attrs
- 10:53 AM Revision 95: xpath.py: Fixed creation of attrs so it happens even when node already exists
- 09:59 AM Revision 94: xpath.py: Added concept of keys vs attrs in XPath elem
12/05/2011
- 05:25 PM Revision 93: Started filling in required values for VegBank fields in mappings. Will need to refactor to move these to metadata for the datasources.
- 05:24 PM Revision 92: Now allow empty rows. Added support for select statement limit.
- 04:17 PM Revision 91: Added support for quoted values in XPaths
- 04:02 PM Revision 90: Fixed name XML function. Fixed accept_test_output.
- 03:59 PM Revision 89: Added support for name XML function. Added error handling for empty rows.
- 03:28 PM Revision 88: Made it easier to accept test output
- 03:18 PM Revision 87: Added NYBG stemCount metadata
- 03:11 PM Revision 86: Added xml_func.py to process mappings whose output needs postprocessing
- 01:53 PM Revision 85: Changed VegBank mappings to use XML functions (not implemented yet) to calculate averages and ranges
- 01:25 PM Revision 84: Added support for mapping datasource metadata
- 12:53 PM Revision 83: Changed for loops to use enumerate() where the index is also needed
- 12:50 PM Revision 82: Moved XPath prep code (setting ID, value) to xpath.py
- 12:14 PM Task #292: VegBank metadata query mechanism
- (Moved to issue description)
- 12:13 PM Task #292 (New): VegBank metadata query mechanism
- For data discovery of VegBank schema.
Mike Lee's suggestion: (e-mail on 2011-11-9)
I'm wondering if you all tal...
- 12:09 PM Task #289: look for formal mapping mechanism
- Mike Lee's explanation of the VegBank XML serialization format: (e-mail on 2011-12-2)
My recollection is that our in...
12/02/2011
- 05:27 PM Revision 81: xpath.py: Added deepcopy() before setting value of other branches to traverse
- 05:12 PM Revision 80: NYSpecimenDataAmericas.test.xml: Updated for new NYBG-VegX.organisms.csv
- 05:11 PM Revision 79: NYBG-VegX.organisms.csv: Changed voucher (primary key) column to be UniqueNYInternalRecordNumber because CatalogNumber contained an empty value
- 05:10 PM Revision 78: xpath.py: Added basic support for split paths
- 04:30 PM Revision 77: Merged xml_xpath.py into xpath.py in preparation for changing the XPath parse tree to be the XML DOM tree itself
- 03:58 PM Revision 76: Refactored xpath.parse() to use a nested function instead of a class extending Parser
- 03:04 PM Revision 75: map: Fixed mislocated import for Parser.SyntaxException
- 02:21 PM Revision 74: Removed SALVIAS voucher_string mapping per conference call discussion
- 02:16 PM Revision 73: map: Fixed bugs to enable mapping straight from CSV to a database. Still need a way to set plot.authorPlotCode for specimens data.
- 12:05 PM Revision 72: Fixed ch_map_root to support subpaths which follow the root by -> rather than /. Changed spreadsheet syntax to have : between label and root.
12/01/2011
- 04:44 PM Task #291: list of milestones
- I put the tasks from today's conference call into the "Redmine issue tracker":https://projects.nceas.ucsb.edu/nceas/p...
- 04:35 PM Task #291 (Resolved): list of milestones
- *Conference call:*
* -need list of milestones for the next 6-12 months-
* -*add conference call tasks to Redmine ...
- 04:34 PM Task #290 (Resolved): benchmark tests for database loading
- *Conference call:*
* *develop benchmark tests to check that datasource data was inserted correctly into VegBank*
...
- 04:32 PM Task #289 (Resolved): look for formal mapping mechanism
- *Conference call:*
* look into VegBranch's way of capturing mappings and metadata
* -look into Altova XMLSpy's gr...
- 04:29 PM Task #285: CSV to XML mappings for NYBG, SALVIAS
- *Conference call:*
* Ignore SALVIAS @voucher_string@ because it is sometimes missing collector's name
* Ignore SALVI...
- 10:50 AM Task #285: CSV to XML mappings for NYBG, SALVIAS
- The latest mapping spreadsheets for datasources->VegX and VegX->VegBank are now available in svn at https://projects....
- 01:55 PM Revision 71: Updated extract_plot_map to use new name for VegX-VegBank mapping and re-ran it and join_all_vegbank
- 01:51 PM Revision 70: Finished VegX-VegBank mapping and created VegBank joins of mappings to VegX
- 11:53 AM Revision 69: Finished ch_map_root (renamed from submap)
- 10:53 AM Task #286: CSV-XML-database mapping script
- I've merged data2xml and xml2db into one script called map, which can be run on nimoy at /home/bien_shared/svn/script...
- 10:52 AM Task #288: VegX-VegBank mapping
- The mapping spreadsheet for VegX->VegBank is now available in svn at https://projects.nceas.ucsb.edu/nceas/projects/b...
11/30/2011
- 05:36 PM Revision 68: Added submap and extract_plot_map to extract plot subpaths from VegX-VegBank.csv
- 04:56 PM Revision 67: Moved env usage string creation to opts.py. Changed db config var names to use in/out instead of from/to.
- 04:24 PM Revision 66: Keep *.test.xml out of version control
- 04:22 PM Revision 65: Moved options-processing code to opts.py: Added opts.py
- 04:21 PM Revision 64: Moved options-processing code to opts.py
- 04:04 PM Revision 63: test_map: Compares generated XML to correct version
- 03:55 PM Revision 62: Fixed xml_xpath.get() last_only optimization to handle attrs correctly. Turned off stack traces for errors intended for the user to see.
- 02:32 PM Revision 61: Changed mappings to place prefix common to all XPaths in the column header
- 01:40 PM Task #288 (Resolved): VegX-VegBank mapping
- CSV spreadsheet mapping VegX to VegBank
- 01:37 PM Task #287: XML to database conversion script (merged into CSV-XML-database mapping script)
- Merged into CSV-XML-database mapping script
- 01:35 PM Task #286: CSV-XML-database mapping script
- Merged in XML to database conversion script
- 01:31 PM Revision 60: simplify_xpath: Made it case-insensitive
- 01:25 PM Revision 59: map: Added support for custom fkeys to parent in db XML trees. Removed extraneous csv reader/writer config because Excel format is default. Improved documentation.
11/29/2011
- 05:36 PM Revision 58: map: Added stub for database input
- 05:33 PM Revision 57: map: Added more stubs for XML-XML mapping
- 05:15 PM Revision 56: Started adding XML-XML mapping support to map
- 04:43 PM Revision 55: Split off xpath.py XML functionality into xml_xpath.py
- 04:28 PM Revision 54: map: Using SystemExit for usage errors to avoid stack trace
- 04:22 PM Revision 53: Merged data2xml and xml2db into map
- 03:03 PM Revision 52: Removed trailing whitespace from VegX-VegBank.csv map
- 02:59 PM Revision 51: Created join_maps to join two 2-column map spreadsheets
- 02:11 PM Revision 50: Renamed mappings to be compatible with Redmine allowed characters in attachment filenames
- 01:59 PM Revision 49: Added refactored mappings and changed data2xml to use the new 2-column format
- 01:25 PM Revision 48: Refactored db_xml.py's db insertion function to avoid extra nested functions
- 01:06 PM Revision 47: Added README.TXT
- 01:02 PM Revision 46: Renamed modules to remove _util
- 12:47 PM Revision 45: Added svn:ignore for *.pyc
- 12:42 PM Revision 44: Renamed xml2db_ and data2xml_ to remove _
- 12:42 PM Revision 43: Moved scripts to main directory and associated files to util
- 12:31 PM Revision 42: Moved Python modules to shared lib folder
11/28/2011
- 05:32 PM Revision 41: xml2db: Started refactoring xml2db() to support getting as well as inserting data
- 05:29 PM Revision 40: xml2db: Started refactoring xml2db() to support getting as well as inserting data
- 05:05 PM Revision 39: xml2db: Changed to return ID (pkey) of inserted record and use this returned value as parent_id instead of getting the parent_id from the parent XML node
- 03:16 PM Revision 38: data2xml: Added syntax for split paths, which map to multiple leaves
- 01:52 PM Revision 37: xml2db: Improved empty_db to use TRUNCATE instead of DROP DATABASE. Added xml2vegbank to automatically set db env vars.
- 01:51 PM Revision 36: data2xml: Improved syntax for XPath lookahead assertions. Changed XML printing to print multiple text nodes on separate lines.
- 12:15 PM Revision 35: Moved vegbank_example_ver1.0.2.xml to xml2db, where it should have been
11/23/2011
- 05:39 PM Task #286: CSV-XML-database mapping script
- I updated the data2xml script (demo at nimoy:/home/bien_shared/svn/scripts/data2xml/test) to support the pointer form...
- 05:37 PM Task #285: CSV to XML mappings for NYBG, SALVIAS
- I made a number of changes to the NYBG and SALVIAS mappings to support VegX's concept of pointers between objects.
- 05:22 PM Revision 34: data2xml: Small correction to NYBG mapping
- 04:58 PM Revision 33: data2xml: Created simplify_xpath script to remove duplication from XPath expressions
- 04:15 PM Revision 32: data2xml: Added support for * abbrs for backward (child-to-parent) pointers
- 02:52 PM Revision 31: In data2xml, fixed determination of which nesting level to put IDs on
- 02:45 PM Revision 30: Simplified expansion of * abbrs
- 02:23 PM Revision 29: Removed no longer necessary strip() from node value getter
- 02:22 PM Revision 28: Added patch for xml.dom.minidom.Element.writexml to avoid adding extra whitespace around text nodes
- 12:45 PM Revision 27: Added pointer field name abbreviations to data2xml and NYBG mappings
11/22/2011
- 04:35 PM Revision 26: In data2xml, fixed pointer handling to deal with pointer targets that are themselves pointers
- 04:01 PM Revision 25: In data2xml, added shortcut for lookahead assertion using ! symbol
- 02:32 PM Revision 24: In data2xml, fixed backward (child-to-parent) pointer handling to get and set attribute values properly
- 01:52 PM Revision 23: In data2xml, fixed xpath.get() to do last_only optimization properly for pointer targets
- 01:32 PM Revision 22: In data2xml, added support for XPath pointers
11/21/2011
- 05:48 PM Revision 21: Merged data2xml XPath functionality into xpath.py. Merged data2xml xml_dom.py and xml2db xml_util.py into identical xml_util.py for each script.
- 04:50 PM Revision 20: Added empty_db script to reset the vegbank database after running xml2db/test in commit mode
- 02:20 PM Task #287: XML to database conversion script (merged into CSV-XML-database mapping script)
- The script is on nimoy at @/home/bien_shared/svn/scripts/xml2db@. You can run the demo at @/home/bien_shared/svn/scri...
- 02:01 PM Revision 19: Changed xml2db and vegbank db to be owned by new user vegbank
11/18/2011
- 05:38 PM Revision 18: Changed xml2db and data2xml to help standardize mapping to different XML formats
- 02:48 PM Revision 17: Added DROP DATABASE and CREATE DATABASE to vegbank.sql
- 12:52 PM Revision 16: Changed xml2db to use primarily node contents to determine whether a node is a field or a child table
11/17/2011
- 04:42 PM Revision 15: Changed xml2db to use the first column in a table as its primary key
- 04:08 PM Revision 14: Changed xml2db to avoid inserting duplicate rows
- 03:31 PM Revision 13: Initial version of xml2db. Doesn't yet handle all duplicate rows correctly.
- 11:29 AM Revision 12: Removed .pyc files
- 10:29 AM Revision 11: Added BIEN 3 scripts
- 10:08 AM Task #287 (Resolved): XML to database conversion script (merged into CSV-XML-database mapping script)
- Python script to import an XML file into a PostgreSQL database
- 10:07 AM Task #286 (New): CSV-XML-database mapping script
- Python script to map CSV, XML, and database datasources to each other, using a map spreadsheet when needed
- 10:06 AM Task #285 (Resolved): CSV to XML mappings for NYBG, SALVIAS
- CSV spreadsheets mapping CSV file columns to XPath expressions for the NYBG and SALVIAS datasources