# Date Author Comment
1341 03/09/2012 06:43 PM Aaron Marcuse-Kubitza

Added xpath_func.py for XPath "function" elements that transform their subpaths

1340 03/09/2012 06:23 PM Aaron Marcuse-Kubitza

VegBIEN mappings: Removed no longer needed taxondetermination.determinationtype values, because they can be determined from the new role closed list

1339 03/09/2012 06:19 PM Aaron Marcuse-Kubitza

filter_ERD.csv: Removed no longer needed references to role

1338 03/09/2012 06:18 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

1337 03/09/2012 06:17 PM Aaron Marcuse-Kubitza

VegBIEN: Changed role table to a closed list

1336 03/09/2012 06:14 PM Aaron Marcuse-Kubitza

PostgreSQL-MySQL.csv: custom types: Consider everything except a set of accepted types to be a custom type

1335 03/09/2012 05:40 PM Aaron Marcuse-Kubitza

VegBIEN: taxonrank enum: Made values lowercase to match case convention in other enums

1334 03/09/2012 05:33 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

1333 03/09/2012 05:32 PM Aaron Marcuse-Kubitza

vegbien.sql: Renamed plantconceptscope to plantnamescope because it's now attached to plantname

1332 03/09/2012 05:26 PM Aaron Marcuse-Kubitza

vegbien.sql: Moved parent_id from plantconcept to plantname, since plantnames themselves are unique according to their parent taxons (a species under one genus is not the same as a species under another genus)

1331 03/09/2012 05:03 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

1330 03/09/2012 04:59 PM Aaron Marcuse-Kubitza

vegbien.ERD.mwb: Fixed lines

1329 03/09/2012 04:57 PM Aaron Marcuse-Kubitza

vegbien.sql: Moved scope_id from plantconcept to plantname, since plantnames themselves are scoped, not just the plantconcepts that use them (e.g. "sp. 1" has different meanings in different scopes, so it should not be shared between scopes). plantname: Added accessioncode.

1328 03/09/2012 04:38 PM Aaron Marcuse-Kubitza

vegbien.sql: Moved plantconcept parent_id from plantstatus to plantconcept. plantconcept: Removed datasource-specific fields to make it globally unique (one plantconcept for each assigned parent taxon of a plantname, of which there will usually be just one)

1327 03/09/2012 04:22 PM Aaron Marcuse-Kubitza

vegbien.sql: plantname: Removed datasource-specific fields to make this a globally-unique table (the datasource-specific fields belong in plantconcept)

1326 03/09/2012 04:16 PM Aaron Marcuse-Kubitza

Added inputs/UArizona/verify

1325 03/09/2012 04:15 PM Aaron Marcuse-Kubitza

mappings/verify.specimens.sql: Updated for schema changes

1324 03/09/2012 04:06 PM Aaron Marcuse-Kubitza

vegbien.sql: placerank enum: Added "village"

1323 03/09/2012 04:00 PM Aaron Marcuse-Kubitza

VegBIEN mappings: lat/long locationdetermination: Removed [!namedplace_id] key so that it's merged into the namedplace locationdetermination

1322 03/09/2012 03:54 PM Aaron Marcuse-Kubitza

VegBIEN mappings: Changed namedplace mappings to use new nested format for storing place containment relationships

1321 03/09/2012 03:44 PM Aaron Marcuse-Kubitza

xml_func.py: Added _simplifyPath

1320 03/09/2012 03:25 PM Aaron Marcuse-Kubitza

xpath.py: Added get_1()

1319 03/09/2012 02:50 PM Aaron Marcuse-Kubitza

vegbien.sql: namedplace: Removed parent_id from unique constraint because some data might be missing intervening links (e.g. state for a county, country), but the place (e.g. county) should still be attached to the existing place of the same name and rank (which will hopefully already have the correct parent_id link)

1318 03/09/2012 02:46 PM Aaron Marcuse-Kubitza

vegbien.sql: namedplace: Made rank required

1317 03/09/2012 02:33 PM Aaron Marcuse-Kubitza

vegbien.sql: namedplace: Removed no longer needed placesystem, which has been replaced by rank closed list

1316 03/09/2012 02:30 PM Aaron Marcuse-Kubitza

VegBIEN mappings: Map namedplaces using new rank field

1315 03/09/2012 02:25 PM Aaron Marcuse-Kubitza

vegbien.sql: namedplace: Added rank. Do duplicate elimination using rank and parent_id instead of placesystem

1314 03/09/2012 02:20 PM Aaron Marcuse-Kubitza

vegbien.sql: placerank: Standardized names to DwC/GML

1313 03/09/2012 01:06 PM Aaron Marcuse-Kubitza

vegbien.sql: Added placerank enum

1312 03/09/2012 12:35 PM Aaron Marcuse-Kubitza

vegbien.sql: namedplace: Removed VegBank internal fields and datasource scoping fields (namedplaces are globally unique). Added parent_id to point to containing namedplace.

1311 03/09/2012 12:21 PM Aaron Marcuse-Kubitza

xml_func.py: Added _dateRangePart with partial implementation (only works on strings with no range)

1310 03/09/2012 12:20 PM Aaron Marcuse-Kubitza

DwC mappings: Moved date _date filter outside _alt so it would run only on the string that was actually chosen, and not produce date format errors when a pre-parsed year/month/day is already available

1309 03/08/2012 06:30 PM Aaron Marcuse-Kubitza

xml_func.py: _date: Map date with only empty fields to NULL (occurs when all fields were e.g. 0 and were filtered to NULL by _nullIf)
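
The all-empty rule above, together with the earlier requirement that year/month/day dates carry a year (r1292), might be sketched as follows. This is an illustration only, not the code in xml_func.py: the function name `date_from_parts` is hypothetical, and defaulting a missing month or day to 1 is an assumption.

```python
import datetime

def date_from_parts(year=None, month=None, day=None):
    """Build a date from optional parts (hypothetical helper, not the real _date).

    All parts empty -> None (NULL); year missing while other parts are
    present -> error; a missing month or day is assumed to default to 1.
    """
    if year is None and month is None and day is None:
        return None  # nothing to build a date from
    if year is None:
        raise ValueError("year is required when month or day is given")
    month = 1 if month is None else int(month)
    day = 1 if day is None else int(day)
    return datetime.date(int(year), month, day)

print(date_from_parts())         # None
print(date_from_parts(2012, 3))  # 2012-03-01
```

Values such as 0 are assumed to have already been turned into None by an upstream filter (the role _nullIf plays in the mappings).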

1308 03/08/2012 06:00 PM Aaron Marcuse-Kubitza

xml_func.py: _date: Removed mapping year/month/day of 0 to NULL because that is now handled on a case-by-case basis in the mappings

1307 03/08/2012 05:58 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Map year/month/day of 0 to NULL

1306 03/08/2012 05:13 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/maps/VegX.organisms.csv: Habit: Fixed syntax error in growthForm map

1305 03/08/2012 05:11 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/maps/VegX.organisms.csv: Habit: Removed input values from growthForm map that Brad said were invalid

1304 03/08/2012 05:10 PM Aaron Marcuse-Kubitza

xml_func.py: _map: Added option to make map a closed list
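
A minimal sketch of what a closed-list option for a value map could look like. The function name `map_value`, its `closed` parameter, and the sample values are hypothetical. Returning None for an unmapped value is also an assumption: the log does not say whether the real _map drops such values or reports an error.

```python
def map_value(mapping, value, closed=False):
    """Translate value through mapping (hypothetical helper).

    Open list (default): values without an entry pass through unchanged.
    Closed list: values without an entry become None (NULL).
    """
    if value in mapping:
        return mapping[value]
    return None if closed else value

growth_forms = {"T": "tree", "S": "shrub"}  # made-up sample map

print(map_value(growth_forms, "T"))               # tree
print(map_value(growth_forms, "X"))               # X
print(map_value(growth_forms, "X", closed=True))  # None
```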

1303 03/08/2012 04:56 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Fixed waterdepth mappings to use _avg

1302 03/06/2012 06:48 PM Aaron Marcuse-Kubitza

mappings/verify.specimens.sql: Use ORDER BY ... NULLS FIRST to match MySQL

1301 03/06/2012 06:42 PM Aaron Marcuse-Kubitza

input.Makefile: verify: Time the verification since it can take a long time

1300 03/06/2012 06:34 PM Aaron Marcuse-Kubitza

specimens verification: Added duplicate catalog numbers test

1299 03/06/2012 06:27 PM Aaron Marcuse-Kubitza

map: On nimoy, use bien2_staging unless otherwise specified

1298 03/06/2012 06:21 PM Aaron Marcuse-Kubitza

specimens verification: Added # counties test

1297 03/06/2012 05:34 PM Aaron Marcuse-Kubitza

specimens verification: Added collection codes and # catalog numbers tests

1296 03/06/2012 05:33 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/maps/VegX.organisms.csv: Mapped custom Habit values not listed in the SALVIAS data dictionary

1295 03/06/2012 05:32 PM Aaron Marcuse-Kubitza

strings.py: Added unicode_reader for later use in handling Unicode characters in map spreadsheets

1294 03/06/2012 03:45 PM Aaron Marcuse-Kubitza

xpath.py: Removed unnecessary copy.deepcopy()'s and instead changed set_value() and set_id() to make copies of any elements they change. This should result in up to a 17% speed increase in the import, because deepcopy() was taking a lot of time. Added documentation to set_value() and set_id() that caller must make a shallow copy of the path to prevent modifications from propagating to other copies of the path. (Previously, a deep copy was needed, but there was no comment specifying this.)
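
The copy-on-change approach described above can be sketched as follows. The `Elem` class and this `set_value()` are simplified stand-ins, not the real xpath.py structures: the caller makes a shallow copy of the path, and the setter copies only the element it is about to modify.

```python
import copy

class Elem:
    """Simplified stand-in for an XPath element (hypothetical)."""
    def __init__(self, name, value=None):
        self.name = name
        self.value = value

def set_value(path, value):
    """Set the value of the path's last element.

    The caller must pass a shallow copy of the path. The list is modified
    in place, but the changed element is copied first, so other paths that
    share the original element are unaffected.
    """
    path[-1] = copy.copy(path[-1])
    path[-1].value = value

original = [Elem("plantname"), Elem("rank")]
derived = original[:]  # shallow copy: new list, same element objects
set_value(derived, "species")

print(original[-1].value)         # None
print(derived[-1].value)          # species
print(derived[0] is original[0])  # True: untouched elements stay shared
```

Only the elements that actually change are duplicated, which is why this avoids the cost of copying the whole path with copy.deepcopy().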

1293 03/06/2012 03:40 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.organisms.csv: Removed unneeded lookahead assertions from stemtag mappings. They relied on a bug ("feature"?) in the XPath engine that made the value of the lookahead assertion's path the same as the value of the main path, even though the value is set after the path is parsed.

1292 03/06/2012 02:45 PM Aaron Marcuse-Kubitza

xml_func.py: _date: For year/month/day dates, require the year (it would not make sense to default to a particular year)

1291 03/06/2012 01:29 PM Aaron Marcuse-Kubitza

inputs/UArizona: Added test outputs

1290 03/06/2012 01:28 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Fixed to allow datasource to define custom date mappings that don't pass through the default date mapping

1289 03/05/2012 05:31 PM Aaron Marcuse-Kubitza

input.Makefile: Generate maps/src.join.*.csv, which can be used to determine which DwC fields for a particular dataset do not yet have a join mapping to VegBIEN

1288 03/05/2012 05:26 PM Aaron Marcuse-Kubitza

Makefile: Fixed subdir remake target to work for nested subdirs as well

1287 03/05/2012 04:51 PM Aaron Marcuse-Kubitza

inputs/UArizona: Renamed maps/src.csv to maps/src.specimens.csv because there will be one for each input table

1286 03/05/2012 04:41 PM Aaron Marcuse-Kubitza

inputs/UArizona: Added maps/src.csv with columns from source data

1285 03/05/2012 04:40 PM Aaron Marcuse-Kubitza

Added autogen mappings/DwC-VegBIEN.specimens.no_empty.csv, which will be used for determining which DwC fields for a particular dataset do not yet have a join mapping to VegBIEN

1284 03/05/2012 04:35 PM Aaron Marcuse-Kubitza

Added remove_empty to remove empty mappings in a map spreadsheet
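
A rough sketch of such a filter, assuming a map spreadsheet whose rows are (input, output, comments) and treating a mapping as empty when its output cell is blank. The column layout and the function signature are assumptions, not taken from the actual script.

```python
import csv
import io

def remove_empty(rows, out_col=1):
    """Yield only the rows whose output cell (assumed column 1) is non-blank."""
    for row in rows:
        if len(row) > out_col and row[out_col].strip() != "":
            yield row

# Made-up sample: the second data row has no output mapping.
src = "input,output,comments\ncounty,/location/county,\nfoo,,unmapped\n"
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(
    remove_empty(csv.reader(io.StringIO(src))))
print(out.getvalue())
```

The header row survives here only because its output cell is non-blank.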

1283 03/05/2012 04:35 PM Aaron Marcuse-Kubitza

join: Don't raise "No join mapping" error for empty mappings because you only want the error for empty mappings for your particular dataset, which requires more information (namely, the subset of the mappings used by your dataset, some of which will not be in the mappings if standard fields have been subtracted out)

1282 03/05/2012 04:10 PM Aaron Marcuse-Kubitza

join: Fixed bug in "No join mapping" error generation where rows with no existing comments column would cause an IndexError

1281 03/05/2012 04:09 PM Aaron Marcuse-Kubitza

util.py: Added list_set() and list_setdefault()
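
The log gives only the names of these helpers. One plausible reading, assumed here, is that they are list analogues of item assignment and dict.setdefault() that pad a too-short list instead of raising IndexError (the failure fixed in r1282 for rows with no comments column). The signatures below are guesses for illustration.

```python
def list_setdefault(list_, idx, default=None):
    """Return list_[idx], first padding the list with default if needed.

    Assumes idx is non-negative.
    """
    if idx >= len(list_):
        list_.extend([default] * (idx + 1 - len(list_)))
    return list_[idx]

def list_set(list_, idx, value, default=None):
    """Set list_[idx] = value, padding any gap with default."""
    list_setdefault(list_, idx, default)
    list_[idx] = value

row = ["input", "output"]  # row with no comments column yet
list_set(row, 2, "No join mapping", default="")
print(row)  # ['input', 'output', 'No join mapping']
```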

1280 03/05/2012 03:44 PM Aaron Marcuse-Kubitza

inputs/UArizona/maps/DwC.specimens.csv: Merge FieldNotes and Remarks

1279 03/05/2012 03:35 PM Aaron Marcuse-Kubitza

inputs/UArizona/maps/DwC.specimens.csv: Finished mappings

1278 03/05/2012 03:08 PM Aaron Marcuse-Kubitza

inputs/UArizona/maps/DwC.specimens.csv: Removed fields already present in DwC mappings

1277 03/05/2012 03:05 PM Aaron Marcuse-Kubitza

inputs/NYBG-CSV/maps/DwC.specimens.csv: Removed mappings already present in case-insensitive DwC2 mapping

1276 03/05/2012 03:03 PM Aaron Marcuse-Kubitza

inputs/NYBG/maps/DwC.specimens.csv: Removed mappings already present in case-insensitive DwC2 mapping

1275 03/05/2012 02:48 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Removed fields already present in DwC2.ci-VegBIEN.specimens.csv

1274 03/05/2012 02:38 PM Aaron Marcuse-Kubitza

Makefiles: Moved remake into main Makefile. Fixed remake to run `make all` in a new make so that cache of existing files is reset. Have main remake run clean and then all instead of forwarding remake to subdirs, so that everything is cleaned before everything is remade.

1273 03/05/2012 02:21 PM Aaron Marcuse-Kubitza

input.Makefile: maps: maps/$(via).%.full.csv: Fixed bug where $(selfMap) would be ignored if it had not yet been made

1272 03/05/2012 02:02 PM Aaron Marcuse-Kubitza

mappings/Makefile: Reorganized into DwC and VegX sections

1271 03/05/2012 02:02 PM Aaron Marcuse-Kubitza

Added autogenerated mappings/DwC2.ci-VegBIEN.specimens.csv. Use it to include DwC2 fields with first letter uppercased in the full DwC mapping, so that datasources that use DwC2 terms with a different case can still use the DwC2 mapping.

1270 03/05/2012 01:57 PM Aaron Marcuse-Kubitza

Added autogenerated mappings/DwC2.ci-VegBIEN.specimens.csv. Use it to include DwC2 fields with first letter uppercased in the full DwC mapping, so that datasources that use DwC2 terms with a different case can still use the DwC2 mapping.

1269 03/05/2012 01:54 PM Aaron Marcuse-Kubitza

inputs/UArizona/maps/DwC.specimens.csv: Mapped CollectedDate to eventDate/_alt/2 even though it's not used because other datasources might copy these mappings and want it already filled in

1268 03/05/2012 01:52 PM Aaron Marcuse-Kubitza

Added ucase_first to uppercase the first character of columns in a spreadsheet
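
A sketch of the idea, assuming the tool rewrites the cells of one spreadsheet column. The choice of column 0 and the helper names are assumptions, and the actual script may instead operate on header names.

```python
import csv
import io

def ucase_first(s):
    """Uppercase only the first character, leaving the rest untouched."""
    return s[:1].upper() + s[1:]

def ucase_first_col(rows, col=0):
    """Yield rows with the first character of column `col` uppercased."""
    for row in rows:
        row = list(row)
        if len(row) > col:
            row[col] = ucase_first(row[col])
        yield row

src = "eventDate,date\ncatalogNumber,catalognumber\n"  # made-up sample
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(
    ucase_first_col(csv.reader(io.StringIO(src))))
print(out.getvalue())
```

str.capitalize() is deliberately not used: it lowercases the rest of the string, so "eventDate" would become "Eventdate" rather than "EventDate".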

1267 03/05/2012 01:21 PM Aaron Marcuse-Kubitza

Added inputs/UArizona/maps/DwC.specimens.csv autogen maps

1266 03/05/2012 01:20 PM Aaron Marcuse-Kubitza

inputs/UArizona/maps/DwC.specimens.csv: Mapped more fields

1265 03/05/2012 01:14 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Remove date -> date/_alt/2 mappings because they prevent the original DwC2 date field from being mapped to without an extra /_alt/2 appended

1264 03/05/2012 01:10 PM Aaron Marcuse-Kubitza

xml_func.py: Use new dates.strtotime(). When component date parts specified, year defaults to dates.epoch.year.

1263 03/05/2012 01:09 PM Aaron Marcuse-Kubitza

dates.py: Added strtotime() to wrap dateutil.parser.parse() with its default parameter set to the epoch, so that e.g. months with the day missing default to day 1 instead of the current day of the month

1262 03/05/2012 12:38 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Map eventDate,dateIdentified using /_alt/2 and year/month/day using /_alt/1 so that inputs with both a date and date parts will select between the two

1261 03/05/2012 11:43 AM Aaron Marcuse-Kubitza

input.Makefile: Added comment that self map must be made first if it's needed for maps/$(via).%.full.csv

1260 03/05/2012 11:40 AM Aaron Marcuse-Kubitza

Makefiles: Use .SECONDARY with no prerequisites instead of setting a .PRECIOUS for each intermediate, to simplify turning off automatic deletion of intermediate files

1259 03/05/2012 11:23 AM Aaron Marcuse-Kubitza

inputs/UArizona: Added initial maps/DwC.specimens.csv

1258 03/05/2012 11:10 AM Aaron Marcuse-Kubitza

DwC mappings: Map datasource name via institutionID to avoid conflicting with existing institutionCode fields that many DwC data sources have

1257 03/05/2012 10:57 AM Aaron Marcuse-Kubitza

input.Makefile: Don't profile by default because it appears to slow things down significantly on long imports

1256 03/05/2012 10:56 AM Aaron Marcuse-Kubitza

Added inputs/UArizona/maps

1255 03/03/2012 05:56 PM Aaron Marcuse-Kubitza

Makefile: python-Linux: Added python-profiler

1254 03/03/2012 05:44 PM Aaron Marcuse-Kubitza

specimens verification: Added # binomials test

1253 03/03/2012 05:35 PM Aaron Marcuse-Kubitza

vegbien.sql: specimenreplicate: Removed specimenreplicate_unique_collectionnumber index because the collectionnumber (NYBG FieldNumber) is not always unique within a collector, even though it should be. Changed specimenreplicate_unique_catalognumber to only operate on rows with no sourceaccessioncode (of which there are 8 in NYBG).

1252 03/03/2012 05:09 PM Aaron Marcuse-Kubitza

mappings/verify.specimens.sql: # species test: Fixed to join separately on taxondeterminations for genus and species. # genera test: Removed no longer needed join on party.

1251 03/03/2012 05:04 PM Aaron Marcuse-Kubitza

vegbien.sql: specimenreplicate: Added fki index on taxonoccurrence_id

1250 03/03/2012 04:25 PM Aaron Marcuse-Kubitza

vegbien.sql: plantname: Added index on rank to speed up specimens verifications, where the query planner insists on joining from plantname to specimenreplicate instead of the other way around (which takes much longer without the index)

1249 03/03/2012 03:33 PM Aaron Marcuse-Kubitza

mappings/verify.*: Use nested SELECT instead of JOIN on party to get datasource_id, so that party will not be joined on after other joins have already occurred (which slows things down)

1248 03/03/2012 03:26 PM Aaron Marcuse-Kubitza

vegbien.sql: party: Changed party_unique_name to ignore NULL values and the organizationname (a first(+middle)+last name is considered unique)

1247 03/03/2012 03:15 PM Aaron Marcuse-Kubitza

vegbien.sql: party: Added party_unique_organizationname constraint

1246 03/03/2012 02:11 PM Aaron Marcuse-Kubitza

Specimens verification: Added # genera and # species

1245 03/03/2012 01:50 PM Aaron Marcuse-Kubitza

input.Makefile: verify: Create target dir if it doesn't exist

1244 03/03/2012 01:42 PM Aaron Marcuse-Kubitza

inputs/NYBG: Added verify/specimens.ref.sql

1243 03/03/2012 01:41 PM Aaron Marcuse-Kubitza

Added mappings/verify.specimens.sql

1242 03/03/2012 01:41 PM Aaron Marcuse-Kubitza

Added inputs/NYBG-CSV/verify/