Changes - BIEN 3 - NCEAS Projects

root @ 1326

#	Date	Author	Comment
1326	03/09/2012 04:16 PM	Aaron Marcuse-Kubitza	Added inputs/UArizona/verify
1325	03/09/2012 04:15 PM	Aaron Marcuse-Kubitza	mappings/verify.specimens.sql: Updated for schema changes
1324	03/09/2012 04:06 PM	Aaron Marcuse-Kubitza	vegbien.sql: placerank enum: Added "village"
1323	03/09/2012 04:00 PM	Aaron Marcuse-Kubitza	VegBIEN mappings: lat/long locationdetermination: Removed [!namedplace_id] key so that it's merged into the namedplace locationdetermination
1322	03/09/2012 03:54 PM	Aaron Marcuse-Kubitza	VegBIEN mappings: Changed namedplace mappings to use new nested format for storing place containment relationships
1321	03/09/2012 03:44 PM	Aaron Marcuse-Kubitza	xml_func.py: Added _simplifyPath
1320	03/09/2012 03:25 PM	Aaron Marcuse-Kubitza	xpath.py: Added get_1()
1319	03/09/2012 02:50 PM	Aaron Marcuse-Kubitza	vegbien.sql: namedplace: Removed parent_id from unique constraint because some data might be missing intervening links (e.g. state for a county, country), but the place (e.g. county) should still be attached to the existing place of the same name and rank (which will hopefully already have the correct parent_id link)
1318	03/09/2012 02:46 PM	Aaron Marcuse-Kubitza	vegbien.sql: namedplace: Made rank required
1317	03/09/2012 02:33 PM	Aaron Marcuse-Kubitza	vegbien.sql: namedplace: Removed no longer needed placesystem, which has been replaced by rank closed list
1316	03/09/2012 02:30 PM	Aaron Marcuse-Kubitza	VegBIEN mappings: Map namedplaces using new rank field
1315	03/09/2012 02:25 PM	Aaron Marcuse-Kubitza	vegbien.sql: namedplace: Added rank. Do duplicate elimination using rank and parent_id instead of placesystem
1314	03/09/2012 02:20 PM	Aaron Marcuse-Kubitza	vegbien.sql: placerank: Standardized names to DwC/GML
1313	03/09/2012 01:06 PM	Aaron Marcuse-Kubitza	vegbien.sql: Added placerank enum
1312	03/09/2012 12:35 PM	Aaron Marcuse-Kubitza	vegbien.sql: namedplace: Removed VegBank internal fields and datasource scoping fields (namedplaces are globally unique). Added parent_id to point to containing namedplace.
1311	03/09/2012 12:21 PM	Aaron Marcuse-Kubitza	xml_func.py: Added _dateRangePart with partial implementation (only works on strings with no range)
1310	03/09/2012 12:20 PM	Aaron Marcuse-Kubitza	DwC mappings: Moved date _date filter outside _alt so it would run only on the string that was actually chosen, and not produce date format errors when a pre-parsed year/month/day is already available
1309	03/08/2012 06:30 PM	Aaron Marcuse-Kubitza	xml_func.py: _date: Map date with only empty fields to NULL (occurs when all fields were e.g. 0 and were filtered to NULL by _nullIf)
1308	03/08/2012 06:00 PM	Aaron Marcuse-Kubitza	xml_func.py: _date: Removed mapping year/month/day of 0 to NULL because that is now handled on a case-by-case basis in the mappings
1307	03/08/2012 05:58 PM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Map year/month/day of 0 to NULL
1306	03/08/2012 05:13 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/maps/VegX.organisms.csv: Habit: Fixed syntax error in growthForm map
1305	03/08/2012 05:11 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/maps/VegX.organisms.csv: Habit: Removed input values from growthForm map that Brad said were invalid
1304	03/08/2012 05:10 PM	Aaron Marcuse-Kubitza	xml_func.py: _map: Added option to make map a closed list
1303	03/08/2012 04:56 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Fixed waterdepth mappings to use _avg
1302	03/06/2012 06:48 PM	Aaron Marcuse-Kubitza	mappings/verify.specimens.sql: Use ORDER BY ... NULLS FIRST to match MySQL
1301	03/06/2012 06:42 PM	Aaron Marcuse-Kubitza	input.Makefile: verify: Time the verification since it can take a long time
1300	03/06/2012 06:34 PM	Aaron Marcuse-Kubitza	specimens verification: Added duplicate catalog numbers test
1299	03/06/2012 06:27 PM	Aaron Marcuse-Kubitza	map: On nimoy, use bien2_staging unless otherwise specified
1298	03/06/2012 06:21 PM	Aaron Marcuse-Kubitza	specimens verification: Added # counties test
1297	03/06/2012 05:34 PM	Aaron Marcuse-Kubitza	specimens verification: Added collection codes and # catalog numbers tests
1296	03/06/2012 05:33 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS/maps/VegX.organisms.csv: Mapped custom Habit values not listed in the SALVIAS data dictionary
1295	03/06/2012 05:32 PM	Aaron Marcuse-Kubitza	strings.py: Added unicode_reader for later use in handling Unicode characters in map spreadsheets
1294	03/06/2012 03:45 PM	Aaron Marcuse-Kubitza	xpath.py: Removed unnecessary copy.deepcopy()'s and instead changed set_value() and set_id() to make copies of any elements they change. This should result in up to a 17% speed increase in the import, because deepcopy() was taking a lot of time. Added documentation to set_value() and set_id() that caller must make a shallow copy of the path to prevent modifications from propagating to other copies of the path. (Previously, a deep copy was needed, but there was no comment specifying this.)
1293	03/06/2012 03:40 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.organisms.csv: Removed unneeded lookahead assertions from stemtag mappings. They relied on a bug ("feature"?) in the XPath engine that made the value of the lookahead assertion's path the same as the value of the main path, even though the value is set after the path is parsed.
1292	03/06/2012 02:45 PM	Aaron Marcuse-Kubitza	xml_func.py: _date: For year/month/day dates, require the year (it would not make sense to default to a particular year)
1291	03/06/2012 01:29 PM	Aaron Marcuse-Kubitza	inputs/UArizona: Added test outputs
1290	03/06/2012 01:28 PM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Fixed to allow datasource to define custom date mappings that don't pass through the default date mapping
1289	03/05/2012 05:31 PM	Aaron Marcuse-Kubitza	input.Makefile: Generate maps/src.join.*.csv, which can be used to determine which DwC fields for a particular dataset do not yet have a join mapping to VegBIEN
1288	03/05/2012 05:26 PM	Aaron Marcuse-Kubitza	Makefile: Fixed subdir remake target to work for nested subdirs as well
1287	03/05/2012 04:51 PM	Aaron Marcuse-Kubitza	inputs/UArizona: Renamed maps/src.csv to maps/src.specimens.csv because there will be one for each input table
1286	03/05/2012 04:41 PM	Aaron Marcuse-Kubitza	inputs/UArizona: Added maps/src.csv with columns from source data
1285	03/05/2012 04:40 PM	Aaron Marcuse-Kubitza	Added autogen mappings/DwC-VegBIEN.specimens.no_empty.csv, which will be used for determining which DwC fields for a particular dataset do not yet have a join mapping to VegBIEN
1284	03/05/2012 04:35 PM	Aaron Marcuse-Kubitza	Added remove_empty to remove empty mappings in a map spreadsheet
1283	03/05/2012 04:35 PM	Aaron Marcuse-Kubitza	join: Don't raise "No join mapping" error for empty mappings because you only want the error for empty mappings for your particular dataset, which requires more information (namely, the subset of the mappings used by your dataset, some of which will not be in the mappings if standard fields have been subtracted out)
1282	03/05/2012 04:10 PM	Aaron Marcuse-Kubitza	join: Fixed bug in "No join mapping" error generation where rows with no existing comments column would cause an IndexError
1281	03/05/2012 04:09 PM	Aaron Marcuse-Kubitza	util.py: Added list_set() and list_setdefault()
1280	03/05/2012 03:44 PM	Aaron Marcuse-Kubitza	inputs/UArizona/maps/DwC.specimens.csv: Merge FieldNotes and Remarks
1279	03/05/2012 03:35 PM	Aaron Marcuse-Kubitza	inputs/UArizona/maps/DwC.specimens.csv: Finished mappings
1278	03/05/2012 03:08 PM	Aaron Marcuse-Kubitza	inputs/UArizona/maps/DwC.specimens.csv: Removed fields already present in DwC mappings
1277	03/05/2012 03:05 PM	Aaron Marcuse-Kubitza	inputs/NYBG-CSV/maps/DwC.specimens.csv: Removed mappings already present in case-insensitive DwC2 mapping
1276	03/05/2012 03:03 PM	Aaron Marcuse-Kubitza	inputs/NYBG/maps/DwC.specimens.csv: Removed mappings already present in case-insensitive DwC2 mapping
1275	03/05/2012 02:48 PM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Removed fields already present in DwC2.ci-VegBIEN.specimens.csv
1274	03/05/2012 02:38 PM	Aaron Marcuse-Kubitza	Makefiles: Moved remake into main Makefile. Fixed remake to run `make all` in a new make so that cache of existing files is reset. Have main remake run clean and then all instead of forwarding remake to subdirs, so that everything is cleaned before everything is remade.
1273	03/05/2012 02:21 PM	Aaron Marcuse-Kubitza	input.Makefile: maps: maps/$(via).%.full.csv: Fixed bug where $(selfMap) would be ignored if it had not yet been made
1272	03/05/2012 02:02 PM	Aaron Marcuse-Kubitza	mappings/Makefile: Reorganized into DwC and VegX sections
1271	03/05/2012 02:02 PM	Aaron Marcuse-Kubitza	Added autogenerated mappings/DwC2.ci-VegBIEN.specimens.csv. Use it to include DwC2 fields with first letter uppercased in the full DwC mapping, so that datasources that use DwC2 terms with a different case can still use the DwC2 mapping.
1270	03/05/2012 01:57 PM	Aaron Marcuse-Kubitza	Added autogenerated mappings/DwC2.ci-VegBIEN.specimens.csv. Use it to include DwC2 fields with first letter uppercased in the full DwC mapping, so that datasources that use DwC2 terms with a different case can still use the DwC2 mapping.
1269	03/05/2012 01:54 PM	Aaron Marcuse-Kubitza	inputs/UArizona/maps/DwC.specimens.csv: Mapped CollectedDate to eventDate/_alt/2 even though it's not used because other datasources might copy these mappings and want it already filled in
1268	03/05/2012 01:52 PM	Aaron Marcuse-Kubitza	Added ucase_first to uppercase the first character of columns in a spreadsheet
1267	03/05/2012 01:21 PM	Aaron Marcuse-Kubitza	Added inputs/UArizona/maps/DwC.specimens.csv autogen maps
1266	03/05/2012 01:20 PM	Aaron Marcuse-Kubitza	inputs/UArizona/maps/DwC.specimens.csv: Mapped more fields
1265	03/05/2012 01:14 PM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Remove date -> date/_alt/2 mappings because they prevent the original DwC2 date field from being mapped to without an extra /_alt/2 appended
1264	03/05/2012 01:10 PM	Aaron Marcuse-Kubitza	xml_func.py: Use new dates.strtotime(). When component date parts specified, year defaults to dates.epoch.year.
1263	03/05/2012 01:09 PM	Aaron Marcuse-Kubitza	dates.py: Added strtotime() to wrap dateutil.parser.parse() with default defaulting to epoch, so that e.g. months with day missing default to day 1 instead of the current day of the month
1262	03/05/2012 12:38 PM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Map eventDate,dateIdentified using /_alt/2 and year/month/day using /_alt/1 so that inputs with both a date and date parts will select between the two
1261	03/05/2012 11:43 AM	Aaron Marcuse-Kubitza	input.Makefile: Added comment that self map must be made first if it's needed for maps/$(via).%.full.csv
1260	03/05/2012 11:40 AM	Aaron Marcuse-Kubitza	Makefiles: Use .SECONDARY with no prerequisites instead of setting a .PRECIOUS for each intermediate, to simplify turning off automatic deletion of intermediate files
1259	03/05/2012 11:23 AM	Aaron Marcuse-Kubitza	inputs/UArizona: Added initial maps/DwC.specimens.csv
1258	03/05/2012 11:10 AM	Aaron Marcuse-Kubitza	DwC mappings: Map datasource name via institutionID to avoid conflicting with existing institutionCode fields that many DwC data sources have
1257	03/05/2012 10:57 AM	Aaron Marcuse-Kubitza	input.Makefile: Don't profile by default because it appears to slow things down significantly on long imports
1256	03/05/2012 10:56 AM	Aaron Marcuse-Kubitza	Added inputs/UArizona/maps
1255	03/03/2012 05:56 PM	Aaron Marcuse-Kubitza	Makefile: python-Linux: Added python-profiler
1254	03/03/2012 05:44 PM	Aaron Marcuse-Kubitza	specimens verification: Added # binomials test
1253	03/03/2012 05:35 PM	Aaron Marcuse-Kubitza	vegbien.sql: specimenreplicate: Removed specimenreplicate_unique_collectionnumber index because the collectionnumber (NYBG FieldNumber) is not always unique within a collector, even though it should be. Changed specimenreplicate_unique_catalognumber to only operate on rows with no sourceaccessioncode (of which there are 8 in NYBG).
1252	03/03/2012 05:09 PM	Aaron Marcuse-Kubitza	mappings/verify.specimens.sql: # species test: Fixed to join separately on taxondeterminations for genus and species. # genera test: Removed no longer needed join on party.
1251	03/03/2012 05:04 PM	Aaron Marcuse-Kubitza	vegbien.sql: specimenreplicate: Added fki index on taxonoccurrence_id
1250	03/03/2012 04:25 PM	Aaron Marcuse-Kubitza	vegbien.sql: plantname: Added index on rank to speed up specimens verifications, where the query planner insists on joining from plantname to specimenreplicate instead of the other way around (which takes much longer without the index)
1249	03/03/2012 03:33 PM	Aaron Marcuse-Kubitza	mappings/verify.*: Use nested SELECT instead of JOIN on party to get datasource_id, so that party will not be joined on after other joins have already occurred (which slows things down)
1248	03/03/2012 03:26 PM	Aaron Marcuse-Kubitza	vegbien.sql: party: Changed party_unique_name to ignore NULL values and the organizationname (a first(+middle)+last name is considered unique)
1247	03/03/2012 03:15 PM	Aaron Marcuse-Kubitza	vegbien.sql: party: Added party_unique_organizationname constraint
1246	03/03/2012 02:11 PM	Aaron Marcuse-Kubitza	Specimens verification: Added # genera and # species
1245	03/03/2012 01:50 PM	Aaron Marcuse-Kubitza	input.Makefile: verify: Create target dir if it doesn't exist
1244	03/03/2012 01:42 PM	Aaron Marcuse-Kubitza	inputs/NYBG: Added verify/specimens.ref.sql
1243	03/03/2012 01:41 PM	Aaron Marcuse-Kubitza	Added mappings/verify.specimens.sql
1242	03/03/2012 01:41 PM	Aaron Marcuse-Kubitza	Added inputs/NYBG-CSV/verify/
1241	03/03/2012 01:40 PM	Aaron Marcuse-Kubitza	Makefile: Print done message after verify
1240	03/03/2012 01:29 PM	Aaron Marcuse-Kubitza	VegX-VegBIEN mapping: Use new lookup-only element syntax to ensure that stemtag 1 is not created if it doesn't exist when stemtag 2 tries to set its iscurrent status to false. This should fix the 136 "NullValueException: columns: tag" errors in the SALVIAS organisms import.
1239	03/03/2012 01:27 PM	Aaron Marcuse-Kubitza	xpath.py: get(): Added support for lookup-only elements which are not created if they don't exist
1238	03/03/2012 01:25 PM	Aaron Marcuse-Kubitza	xpath.py: parse(): Added support for lookup-only elements which are not created if they don't exist
1237	03/03/2012 01:15 PM	Aaron Marcuse-Kubitza	VegX-VegBIEN mapping: Map stemtags using [] instead of :[] for attrs that are really keys
1236	03/02/2012 07:54 PM	Aaron Marcuse-Kubitza	Regenerated vegbien.ERD exports
1235	03/02/2012 07:52 PM	Aaron Marcuse-Kubitza	VegX-VegBIEN mapping: Handle user-defined field voucherType (SALVIAS DetType) by mapping specimenreplicates for voucherTypes other than direct via voucher
1234	03/02/2012 06:58 PM	Aaron Marcuse-Kubitza	xml_func.py: Added _if and _eq. Added cast() to throw SyntaxException if can't cast and use it in conv_items(). _merge: Check types of input using conv_items(strings.ustr, items).
1233	03/02/2012 06:53 PM	Aaron Marcuse-Kubitza	util.py: Added all_not_none() and bool2str()
1232	03/02/2012 06:52 PM	Aaron Marcuse-Kubitza	strings.py: Added ustr() (like built-in str() but converts to unicode object)
1231	03/02/2012 05:32 PM	Aaron Marcuse-Kubitza	PostgreSQL-MySQL.csv: Fixed bug in removal of casts of default values, which treated NOT NULL as part of the datatype
1230	03/02/2012 05:30 PM	Aaron Marcuse-Kubitza	VegBIEN: soilobs: Added default value for horizon. Adjusted mappings to remove now-unnecessary horizon value.
1229	03/02/2012 05:26 PM	Aaron Marcuse-Kubitza	repl: Removed automatic case-insensitivity because Python apparently only supports turning on case-insensitivity via (?i) but not off via (?-i) (as Java does)
1228	03/02/2012 05:09 PM	Aaron Marcuse-Kubitza	VegBIEN: soilobs: Removed soil* prefix from fields
1227	03/02/2012 05:05 PM	Aaron Marcuse-Kubitza	VegX-VegBIEN mapping: Map to new soilobs fields
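
Revision 1294 describes dropping copy.deepcopy() in favor of having set_value() and set_id() copy only the elements they change, with the caller passing in a shallow copy of the path. The sketch below illustrates that idea under stated assumptions: a path is taken to be a plain list of element objects, and both `PathElem` and this `set_value()` signature are hypothetical stand-ins, not the actual xpath.py code.

```python
import copy


class PathElem:
    """Hypothetical stand-in for one step of a parsed path."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value


def set_value(path, value):
    """Set the value of the last element, copying only that element.

    The caller must pass a shallow copy of the path (e.g. path[:]);
    otherwise the replaced element would show up in the original list.
    """
    last = copy.copy(path[-1])  # copy just the element being changed
    last.value = value
    path[-1] = last
    return path


original = [PathElem('plantobservation'), PathElem('stemtag')]
changed = set_value(original[:], '42')  # shallow copy, no deepcopy()
```

Here `original` is left unmodified, and the elements that were not changed are shared between the two lists rather than duplicated, which is where the saving over a deep copy would come from.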
