XSLT

E-mail from Aaron, 2012-01-13:

I was thinking about how to easily support many input tables (such as in CTFS or VegBank) without needing to create a separate mapping spreadsheet for each database table or to import each table in the correct order (i.e. plots, then organisms, then stems). I think the best approach is to use something like push-based XSLT, but for relational rather than XML input, as described in the articles _Querying relational databases through XSLT_ and _Translating XSLT Programs to Efficient SQL Queries_.
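As a rough sketch, a push-based stylesheet lets the input structure drive the mapping: templates fire as the processor encounters each node, rather than the stylesheet explicitly looping over each table. The element names below (database, plot, organism, stem, and the VegX-like output elements) are placeholders for illustration, not actual schema names:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- The input drives processing: apply-templates pushes each node
           to whichever template matches it. -->
      <xsl:template match="/database">
        <vegX>
          <xsl:apply-templates select="plot"/>
        </vegX>
      </xsl:template>

      <xsl:template match="plot">
        <plotObservation>
          <plotName><xsl:value-of select="name"/></plotName>
          <!-- Child rows (this plot's organisms) are pushed to their own template. -->
          <xsl:apply-templates select="organism"/>
        </plotObservation>
      </xsl:template>

      <xsl:template match="organism">
        <taxonObservation>
          <taxonName><xsl:value-of select="taxon"/></taxonName>
          <xsl:apply-templates select="stem"/>
        </taxonObservation>
      </xsl:template>

      <xsl:template match="stem">
        <stemObservation>
          <dbh><xsl:value-of select="dbh"/></dbh>
        </stemObservation>
      </xsl:template>

    </xsl:stylesheet>

With relational input, the same stylesheet would apply unchanged, except that each apply-templates would be answered from the database rather than from an XML document.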

The tool would view a relational database as XML in the way that VegBank XML does. It would need to translate the XML nodes to database tables and the XPath lookups to SQL queries, using algorithms like those described in the articles. The current mapping tool already views a relational database as XML on the output side, but on the input side it only handles a single table at a time, or an XML file that's structured as logical rows (organisms).
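For example (with made-up table and column names), two input tables plot(plot_id, name) and organism(organism_id, plot_id, taxon) could be exposed as the virtual XML document below, with child rows nested under their parent via the foreign key; an XPath lookup against that view would then be answered by generating the corresponding SQL join rather than by materializing the XML:

    <database>
      <plot>
        <plot_id>1</plot_id>
        <name>P001</name>
        <!-- organism rows nested under their plot via organism.plot_id -->
        <organism>
          <organism_id>10</organism_id>
          <taxon>Quercus alba</taxon>
        </organism>
      </plot>
    </database>

    <!-- An XPath lookup such as
           /database/plot[name='P001']/organism/taxon
         would translate to roughly:
           SELECT organism.taxon
           FROM plot JOIN organism ON organism.plot_id = plot.plot_id
           WHERE plot.name = 'P001';
    -->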

There are several descriptions/algorithms for how a relational XSLT tool would work, but I couldn't find an actual implementation. Oracle XML comes close, but (if I understand correctly) it's designed for database fields that contain XML, rather than viewing the database itself as XML. I propose, essentially, that if we can't find an existing XSLT tool that handles relational data inputs, we create one ourselves. This would be a general-purpose tool that we can use for mapping any hierarchical data source to VegX or a database.

It would probably be best to first test out push-based XSLT on the VegX->VegBIEN mapping. If it looks promising as a mapping format, we can develop an XSLT tool to handle relational inputs.

Does this sound like a good plan? If so, what priority should it have relative to our other data imports, schema refactorings, etc.? This may be a precondition for importing the CTFS data if that data is highly normalized, and it would also help with some of the SALVIAS import issues we've been having.