# Date Author Comment
4263 08/28/2012 04:01 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: %/install: Also create header.csv so that there is a CSV header that the map spreadsheets can be autogenerated from

4262 08/28/2012 02:22 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: %/install: Add row_num column to derived staging tables so they will have a pkey

4261 08/28/2012 02:21 PM Aaron Marcuse-Kubitza

sql.py: pkey(): Use pkey_col constant if this column exists, to allow using a row_num column as the pkey even when it is placed at the end of the table (due to being added after the table was created)
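A minimal sketch of the column choice described in r4261, not the actual sql.py code. The constant value `row_num` and the fallback to the first column are assumptions inferred from the log, and `pkey_name()` is a hypothetical helper that takes a list of column names instead of querying the database.

```python
# Assumed value of the pkey_col constant; the log only names the constant.
PKEY_COL = 'row_num'

def pkey_name(columns):
    """Return the name of the column to treat as the pkey.

    Prefers PKEY_COL wherever it sits in the table, so a row_num column
    appended after table creation is still picked up. Falling back to the
    first column is an assumption about the previous behavior.
    """
    if PKEY_COL in columns:
        return PKEY_COL
    return columns[0]

# row_num was added last, but is still chosen as the pkey
late_pkey = pkey_name(['plot', 'species', 'row_num'])
# no row_num column: fall back to the first column
default_pkey = pkey_name(['id', 'plot', 'species'])
```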

4260 08/28/2012 01:59 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: %/install: Support alternative generation of a staging table by joining together other staging tables in a create.sql file

4259 08/28/2012 01:57 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: %/install: Don't create a row_num column when the table is a joined table because it collides during joins

4258 08/28/2012 01:49 PM Aaron Marcuse-Kubitza

csv2db: Made input_cmd optional when errors_table_only is on, because the CSV header is not needed to create the errors table

4257 08/28/2012 01:47 PM Aaron Marcuse-Kubitza

csv2db: Added has_row_num param to disable creating a row_num column

4256 08/28/2012 12:44 PM Aaron Marcuse-Kubitza

input.Makefile: Existing maps discovery: $(allTables): When prepending unsorted (joined) tables, save them in $(joinedTables) for later use in determining which tables should have a row_num column

4255 08/28/2012 12:27 PM Aaron Marcuse-Kubitza

README.TXT: Fixed indent

4254 08/28/2012 12:04 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables installation: Install all tables, not just those present in import_order.txt. This will later allow staging tables to be derived by joining together other staging tables, which themselves are not imported but still need to be installed.

4253 08/28/2012 11:53 AM Aaron Marcuse-Kubitza

input.Makefile: Existing maps discovery: $(tables): Prepend unsorted tables (those that are not present in import_order.txt)

4252 08/28/2012 11:04 AM Aaron Marcuse-Kubitza

input.Makefile: Renamed "...-%" targets to "%/..." so they are more logically associated with a specific subdir

4251 08/28/2012 10:54 AM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added Madidi terms that don't exist in other datasources

4250 08/28/2012 10:47 AM Aaron Marcuse-Kubitza

inputs/Madidi/0.plots/map.csv: Added [Veg+] to root to enable auto-mapping

4249 08/28/2012 10:35 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

4248 08/27/2012 10:47 PM Aaron Marcuse-Kubitza

inputs/SALVIAS*/1.organisms/map.csv: Map directly to locationID, plotName instead of parentLocationID, parentPlotName because these terms now map correctly to the parent location when a subplot column exists

4247 08/27/2012 10:43 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: plotName -> /location/authorlocationcode mapping: When subplot is provided, remove this mapping using _if ... _exists instead of _alt so that a NULL subplot value will not cause the parent plot's name to be used for the subplot name

4246 08/27/2012 10:34 PM Aaron Marcuse-Kubitza

input.Makefile: Testing: $(runTest): Remove outputs of successful tests to reduce clutter

4245 08/27/2012 10:32 PM Aaron Marcuse-Kubitza

input.Makefile: Testing: %/test.staging.xml: Don't create test.staging.xml at all for non-flat-file inputs, because it is not needed (diff does not run in this case)

4244 08/27/2012 10:23 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Fixed bug where "if subplot" conditions would evaluate to true only if the subplot was NOT NULL, when they should actually evaluate to true if the datasource specified any subplot column, nullable or not

4243 08/27/2012 10:14 PM Aaron Marcuse-Kubitza

xml_func.py: simplify(): Removed no longer needed hardcoded _if simplifying code now that there is an _if() simplifying function

4242 08/27/2012 10:10 PM Aaron Marcuse-Kubitza

db_xml.py: input_col_prefix: Use value of xml_func.var_name_prefix, which is now the place where this value is configured

4241 08/27/2012 10:09 PM Aaron Marcuse-Kubitza

db_xml.py: Moved input_col_prefix above the put() function that uses it

4240 08/27/2012 10:09 PM Aaron Marcuse-Kubitza

xml_func.py: Added _if() simplifying function

4239 08/27/2012 10:07 PM Aaron Marcuse-Kubitza

xml_func.py: Added is_var_name() and is_var()

4238 08/27/2012 10:06 PM Aaron Marcuse-Kubitza

xml_dom.py: Added NodeEntryIter

4237 08/27/2012 09:33 PM Aaron Marcuse-Kubitza

xml_func.py: Added _exists()

4236 08/27/2012 09:30 PM Aaron Marcuse-Kubitza

xml_func.py: simplify(): Added support for custom simplifying functions, which are not hard-coded in simplify()

4235 08/27/2012 09:19 PM Aaron Marcuse-Kubitza

xml_dom.py: replace_with_text(): Use new bool2str() so that False causes the node to be removed instead of replaced with the empty string
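The effect of r4235 can be illustrated with the standard library's `xml.dom.minidom`. This is a sketch under stated assumptions, not the project's xml_dom.py: the return values of `bool2str()` (`'1'` for True, `None` for False) and the body of `replace_with_text()` are guesses based only on the behavior the log describes.

```python
from xml.dom import minidom

def bool2str(val):
    """Assumed mapping: True -> '1', False -> None (meaning: remove the node)."""
    return '1' if val else None

def replace_with_text(node, new):
    """Replace node with a text node, or remove it when the value is None."""
    if isinstance(new, bool):
        new = bool2str(new)
    parent = node.parentNode
    if new is None:
        parent.removeChild(node)  # False: the node disappears entirely
    else:
        text = node.ownerDocument.createTextNode(new)
        parent.replaceChild(text, node)

doc = minidom.parseString('<row><a/><b/></row>')
a, b = doc.documentElement.childNodes
replace_with_text(a, True)   # <a/> becomes the text "1"
replace_with_text(b, False)  # <b/> is removed, not replaced with ""
result = doc.documentElement.toxml()
```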

4234 08/27/2012 09:18 PM Aaron Marcuse-Kubitza

xml_dom.py: Added bool2str()

4233 08/27/2012 08:56 PM Aaron Marcuse-Kubitza

inputs/SALVIAS*/1.organisms/map.csv: Mapped subplot, Line to new subplot VegCore term

4232 08/27/2012 08:54 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped subplot, which involved replacing an _if with _alt to both remove plotName as the authorlocationcode and use subplot instead when subplot is specified

4231 08/27/2012 08:47 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: locationID, plotName: Redirect to /location/parent_id/location/* if subplot field is specified

4230 08/27/2012 08:42 PM Aaron Marcuse-Kubitza

xml_func.py: simplify(): Also remove _if statements with only a condition. This is a required transformation, because such _if statements can't be handled by functions._if() due to there being no argument to provide the anyelement type.

4229 08/27/2012 08:06 PM Aaron Marcuse-Kubitza

xml_func.py: simplify(): Added pruning optimization that removes empty children. Empty children are created when some mappings don't apply to the current datasource.

4228 08/27/2012 07:58 PM Aaron Marcuse-Kubitza

xml_func.py: simplify(): Only generate children list if node is a function

4227 08/27/2012 07:33 PM Aaron Marcuse-Kubitza

xml_func.py: simplify(): Refactored to support processing nodes that are not functions. Changed var names for clarity.

4226 08/27/2012 06:55 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: _simplifyPath() calls: Removed no longer needed `require` arg, and removed no longer needed table suffix from `next` arg

4225 08/27/2012 06:51 PM Aaron Marcuse-Kubitza

db_xml.py: put(): _simplifyPath() built-in function: Removed `require` param, which is not used by this _simplifyPath() implementation because the database constraints handle this

4224 08/27/2012 05:56 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added subplot

4223 08/27/2012 05:30 PM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: Also add empty import_order.txt

4222 08/27/2012 05:30 PM Aaron Marcuse-Kubitza

lib/common.Makefile: SVN: Added $(addFile)

4221 08/27/2012 05:26 PM Aaron Marcuse-Kubitza

input.Makefile: SVN: add: Don't automatically add a Specimen subdir, because some plots datasources don't have that table

4220 08/27/2012 05:23 PM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Adding input data: Added step to add <table> to inputs/<datasrc>/import_order.txt

4219 08/27/2012 04:48 PM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Changed "<name>" to "<datasrc>" to distinguish it more clearly from "<table>", which is also a name

4218 08/27/2012 04:45 PM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Adding input data: Changed steps to use new %/add command to add table's subdir

4217 08/27/2012 04:36 PM Aaron Marcuse-Kubitza

input.Makefile: SVN: Added %/add to add a new table subdir. add: Changed default subdir name to Specimen to match suggested table names at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV#Suggested-table-names>. Use new %/add to add it.

4216 08/27/2012 04:18 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

4215 08/24/2012 07:56 PM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Replaced fixed table names with link to VegCSV suggested table names

4214 08/24/2012 07:43 PM Aaron Marcuse-Kubitza

input.Makefile: $(srcsOnly): Include only files ending in one of the data extensions: csv tsv txt xml. This allows the data provider to include other documentation files, such as SQL export queries, in the table subdirs.
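The filter in r4214 lives in a Makefile; the following Python sketch only illustrates the same extension-based selection. `srcs_only()` and `DATA_EXTS` are hypothetical names, and case-sensitive matching is an assumption.

```python
import os

# The four data extensions listed in the log entry
DATA_EXTS = {'csv', 'tsv', 'txt', 'xml'}

def srcs_only(filenames):
    """Keep only files whose extension is one of the data extensions."""
    return [
        name for name in filenames
        if os.path.splitext(name)[1].lstrip('.') in DATA_EXTS
    ]

# Documentation files such as SQL export queries are skipped
kept = srcs_only(['plots.csv', 'export.sql', 'stems.tsv', 'README'])
```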

4213 08/24/2012 07:24 PM Aaron Marcuse-Kubitza

bin/map: Documented that it is duplicate-column safe (supports multiple columns of the same name)

4212 08/24/2012 07:10 PM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: Obtaining CSVs: Documented that when exporting relational databases to CSVs, you MUST ensure that embedded quotes are escaped by doubling them, not by preceding them with a "\" as is the default in phpMyAdmin
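To illustrate the quoting convention required by r4212, here is a short sketch using Python's standard `csv` module, which doubles embedded quotes by default. The sample row is made up for illustration and is not taken from any datasource.

```python
import csv
import io

def write_rows(rows):
    """Serialize rows to CSV text; embedded quotes are doubled by default."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator='\n').writerows(rows)
    return buf.getvalue()

def read_rows(text):
    """Parse CSV text that escapes embedded quotes by doubling them."""
    return list(csv.reader(io.StringIO(text)))

rows = [['id', 'comment'], ['1', 'labelled "cf." in the field']]
text = write_rows(rows)       # the quote-containing field is written with "" pairs
round_trip = read_rows(text)  # parses back to the original values
```

An export that escapes quotes with a preceding "\" would not round-trip through this default reader, which is why the export settings need to be changed at the source.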

4211 08/24/2012 07:00 PM Aaron Marcuse-Kubitza

csvs.py: delims: Added ";", which is phpMyAdmin's default CSV delimiter

4210 08/24/2012 06:50 PM Aaron Marcuse-Kubitza

sql_io.py: null_strs: Added 'NULL', which is used by phpMyAdmin as the default "Replace NULL with" value for CSV exports

4209 08/24/2012 06:48 PM Aaron Marcuse-Kubitza

sql_io.py: cleanup_table(): Refactored to use for loop with array constant, so that additional NULL-equivalent strings can easily be added
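The refactoring in r4209 can be pictured as a loop over a list constant. The real cleanup_table() operates on database tables; this in-memory sketch is only an analogy. Of the list entries, only 'NULL' is confirmed by the log (r4210); the empty string and `\N` are assumed for illustration, and `clean_value()` is a hypothetical helper.

```python
# NULL-equivalent strings; only 'NULL' is confirmed by the log, the rest are assumed
NULL_STRS = ['', r'\N', 'NULL']

def clean_value(value):
    """Trim a string value and map NULL-equivalent strings to None."""
    if value is None:
        return None
    stripped = value.strip()
    for null_str in NULL_STRS:  # adding a new marker only means extending the list
        if stripped == null_str:
            return None
    return stripped

cleaned = [clean_value(v) for v in ['Quercus', ' NULL ', '', None, r'\N']]
```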

4208 08/24/2012 06:30 PM Aaron Marcuse-Kubitza

mappings/roots/: Merged roots for different tables into one mappings/root.sh for Veg+, which handles all tables' mappings to VegBIEN

4207 08/24/2012 04:31 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): When ignoring all rows for an iteration, return literal NULL value instead of column of NULLs as an optimization for callers using that iteration's pkeys

4206 08/24/2012 12:20 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

4205 08/23/2012 05:32 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Primary taxondetermination: Removed [role=identifier] because the role of the entity making the determination is unknown. Added [!isoriginal] filter to those mappings to ensure that primary taxondetermination XPaths map to a different taxondetermination than the [isoriginal=true] determination when both are present.

4204 08/23/2012 05:24 PM Aaron Marcuse-Kubitza

inputs/SALVIAS*/1.organisms/map.csv: Remapped cfaff to identificationQualifier, because it was previously mapped to the same taxondetermination as the Orig* terms but does not have a corresponding Orig prefix to indicate that it should apply to the original determination instead of the primary TNRS one

4203 08/23/2012 05:19 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Removed no longer used computer.* taxonomic terms

4202 08/23/2012 05:19 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Removed no longer used computer.* taxonomic terms

4201 08/23/2012 05:18 PM Aaron Marcuse-Kubitza

inputs: Regenerated VegBIEN.csv for several datasources, which had apparently not gotten regenerated when make was run after the taxonRank mapping addition

4200 08/23/2012 05:00 PM Aaron Marcuse-Kubitza

backups/: svn:ignore: Also ignore .*, which includes temp files generated by rsync

4199 08/23/2012 04:58 PM Aaron Marcuse-Kubitza

xml_func.py: simplify(): Also consider _name() to be an aggregate function

4198 08/23/2012 04:57 PM Aaron Marcuse-Kubitza

xml_func.py: simplify(): Also consider _name() to be an aggregate function

4197 08/23/2012 04:49 PM Aaron Marcuse-Kubitza

inputs/SALVIAS*/1.organisms/map.csv: Removed computer.* prefix from primary (TNRS) taxondetermination, so it would map to the main taxondetermination in VegBIEN

4196 08/23/2012 04:46 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped taxonRank analogously to computer.taxonRank

4195 08/23/2012 04:34 PM Aaron Marcuse-Kubitza

inputs/SALVIAS*/1.organisms/map.csv: Remapped OrigFamily/OrigGenus/OrigSpecies to new verbatim* taxonomic names. Also remapped cfaff to verbatimIdentificationQualifier, because it was previously mapped to the same taxondetermination as the Orig* terms, but this will later need to be remapped to identificationQualifier (not in this commit because that is a separate change). Note that the switch to the verbatim* taxonomic names removes a concatenated binomial that was part of the previous mappings, which put OrigGenus and OrigSpecies together into one scientificName.

4194 08/23/2012 03:34 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped verbatimScientificName to taxonoccurrence.authortaxoncode as an alternative to scientificName

4193 08/23/2012 03:12 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: Mapped verbatim* taxonomic terms

4192 08/23/2012 03:10 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added verbatimIdentificationQualifier

4191 08/23/2012 03:07 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added verbatimScientificName

4190 08/23/2012 03:06 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxondetermination: taxondetermination_unique unique index: Added isoriginal so an "original" determination in the same row (as found in SALVIAS) will be seen as distinct from the scrubbed determination, even if they are to the same plant name

4189 08/23/2012 02:57 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: taxonomic terms: Removed ":[isoriginal=true]" because there may be multiple determinations for an organism (either in separate rows or, for SALVIAS, in separate columns), and not all will be the original determination

4188 08/23/2012 02:43 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxondetermination.role: Default to 'unknown' so that the field is optional

4187 08/23/2012 02:41 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: role enum: Added 'unknown' value

4186 08/23/2012 02:20 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: Added verbatim* taxonomic terms

4185 08/23/2012 02:12 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

4184 08/22/2012 04:56 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

4183 08/22/2012 04:31 PM Aaron Marcuse-Kubitza

inputs: Regenerated maps for changes to bin/union, which removes empty mappings. Added /_alt suffix where needed.

4182 08/22/2012 03:23 PM Aaron Marcuse-Kubitza

inputs: Move src subdir into main dir, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCSV_subfolders#Move-src-subdir-into-main-dir>

4181 08/22/2012 02:02 PM Aaron Marcuse-Kubitza

input.Makefile: $(tables): Allow datasource to specify custom import order in src/import_order.txt

4180 08/22/2012 01:29 PM Aaron Marcuse-Kubitza

mappings/Veg+.terms.csv: growthForm: Documented source of standard terms

4179 08/22/2012 10:21 AM Aaron Marcuse-Kubitza

inputs/SALVIAS*/src/1.organisms/map.csv: Removed no longer applicable comments, which related to mappings that were in effect long ago

4178 08/22/2012 10:09 AM Aaron Marcuse-Kubitza

inputs/SALVIAS/src/2.stems/map.csv: Added comments from corresponding SALVIAS-CSV organisms columns

4177 08/22/2012 09:54 AM Aaron Marcuse-Kubitza

inputs/SALVIAS*/src/1.organisms/map.csv: Habit: Mapped to new Veg+ habit term

4176 08/22/2012 09:53 AM Aaron Marcuse-Kubitza

inputs/SALVIAS*/src/1.organisms/map.csv: Habit: Don't filter out values not part of the provided terms list, because such values should be flagged as invalid in the error maps rather than silently discarded. This also ensures that any valid values which are not part of the provided terms list are kept.

4175 08/22/2012 09:45 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: habit: Map to new verbatimGrowthForm since this field is not necessarily standardized

4174 08/22/2012 09:42 AM Aaron Marcuse-Kubitza

mappings/Makefile: Veg+.cs-VegBIEN.csv: Join new Veg+-VegCore.to_self.csv (self-join), instead of Veg+-VegCore.csv, to VegCore-VegBIEN.csv, to support two-level chains of mappings in Veg+-VegCore.csv

4173 08/22/2012 09:40 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: /_alt pass through mappings: Removed comment because the two-level mapping propagates it to all fields ending in /_alt, even though it doesn't apply to them, causing the main VegBIEN map and several datasources' maps to change unnecessarily. Also, the comment is not completely accurate because /_alt pass throughs are now used primarily to support idempotent self-joins of Veg+-VegCore.csv.

4172 08/22/2012 09:21 AM Aaron Marcuse-Kubitza

union: Don't eliminate duplicate rows based on matches between map_0's output column and map_1's input column, because union is now being used for self-joins and it is legitimate for a term to appear as both an input and an output

4171 08/22/2012 09:10 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): MissingCastException: Use strings.repr_no_u() instead of strings.urepr() in order to remove the u in u'...' for Unicode strings

4170 08/21/2012 09:48 AM Aaron Marcuse-Kubitza

README.TXT: After a new import: Updated commands for new subdirs layout

4169 08/21/2012 09:42 AM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

4168 08/21/2012 09:34 AM Aaron Marcuse-Kubitza

mappings: Added autogen Veg+-VegCore.to_self.csv, which is Veg+-VegCore.csv joined to itself, and use it as an intermediate map to join to VegCore-VegBIEN.csv. This provides support for two-level chains of mappings in Veg+-VegCore.csv.

4167 08/21/2012 09:31 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Changed output root to Veg+, to allow mappings/Veg+-VegCore.csv to be joined with itself idempotently, for supporting multi-level chains of mappings

4166 08/21/2012 09:27 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: Add pass through /_alt mapping for all terms in this map that are merged with _alt, to allow datasource to define custom mappings that don't pass through the default mapping. This also allows mappings/Veg+-VegCore.csv to be joined with itself idempotently, to support multi-level chains of mappings.

4165 08/21/2012 09:19 AM Aaron Marcuse-Kubitza

mappings/Veg+-VegCore.csv: authorPlantCode: Added _alt suffix to create the correct priority

4164 08/21/2012 09:13 AM Aaron Marcuse-Kubitza

union: Exclude empty rows from the output, so that empty mappings from map_0 aren't included when map_1 contains a non-empty mapping for the same term. Note that this causes "No non-empty join mapping" warnings to turn into "No join mapping".