Activity
From 07/14/2012 to 08/12/2012
08/10/2012
- 10:29 PM Revision 3960: schemas: Renamed vegbien.ERD.pdf to vegbien.ERD.1_pg.pdf since it's not the primary PDF that should be used, due to its slow load time
- 10:26 PM Revision 3959: Regenerated vegbien.ERD exports
- 10:23 PM Revision 3958: schemas/vegbien.sql: specimenreplicate: specimenreplicate_plantobservation_1_to_1: Only apply when sourceaccessioncode and catalognumber_dwc are NULL, in order to support multiple specimenreplicates for one plant in plots data. specimenreplicate_unique_catalognumber: Added plantobservation_id, so that catalognumber_dwc (a sort of authorSpecimenCode for plots data) only needs to be unique within a plant. Eventually, we will want to migrate the mappings so that collectionnumber is used for this purpose instead.
- 10:16 PM Revision 3957: schemas/vegbien.sql: specimenreplicate: Made plantobservation_id optional again, since indirect vouchers do create specimenreplicates without a parent plantobservation. schemas/vegbien.ERD.mwb: Fixed lines.
- 10:02 PM Revision 3956: schemas/vegbien.sql: specimenreplicate: Made plantobservation_id required, since that is now the parent table fkey
- 10:00 PM Revision 3955: schemas/vegbien.ERD.mwb: Fixed lines
- 09:51 PM Revision 3954: schemas/vegbien.ERD.mwb: Adjusted lines. Adjusted position of locationdetermination to put location directly next to locationevent. Expanded location to fill newly-available space.
- 09:37 PM Revision 3953: schemas/vegbien.sql: locationevent: Renamed authorlocationcode to authoreventcode to be consistent with the table name. Note that for our current datasources, the plot = the plot event, so the authoreventcode and authorlocationcode/authorPlotCode will be the same.
- 09:22 PM Revision 3952: mappings/VegCSV-VegBIEN.specimens.csv: Changed VegCSV term fieldNumber (from DwC) to recordNumber to be consistent with the TDWG meaning of fieldNumber, which defines it as the author code for the *event*, not the organism (what VegBIEN calls the authorlocationcode and VegBank calls the authorObsCode)
- 08:47 PM Revision 3951: mappings/VegCSV-VegBIEN.specimens.csv: Comments: Removed no longer applicable comments about XPath syntax added to affect sort order
- 08:35 PM Revision 3950: mappings/VegCSV-VegBIEN.specimens.csv: height: Removed mapping to plantobservation.overallheight, since the height is a stem field rather than a plant field. Note that a height in the *organisms* table will be mapped to the height in a single stemobservation for that plant, with NULL sourceaccessioncode and authorstemcode. Note also that this change is possible because no mapped datasource yet provides a valid overallheight with multiple stems or that differs from its single stem's height. (Although SALVIAS sometimes provides both a stem height and an organism height, either the two heights are the same or the organism height is invalid. See <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Some-organisms-have-one-stem-but-different-heights-in-the-organisms-and-stems-tables>.)
- 06:56 PM Revision 3949: mappings/DwC2-VegBIEN.specimens.csv: establishmentMeans: Removed obsolete mapping to growthform, since growthforms and cultivated/native information are no longer merged into one field in VegBIEN (which they were when this mapping was created)
- 06:18 PM Revision 3948: mappings/VegCSV-VegBIEN.specimens.csv: decimalLatitude/decimalLongitude: Added _nullIf suffix for mergeability with VegCSV-VegBIEN.specimens.csv
- 06:10 PM Revision 3947: mappings/VegCSV-VegBIEN.specimens.csv: coordinateUncertaintyInMeters: Added _noCV suffix for mergeability with VegCSV-VegBIEN.specimens.csv
- 06:00 PM Revision 3946: mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Added _if wrapper for mergeability with VegCSV-VegBIEN.specimens.csv
- 05:52 PM Revision 3945: mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber direct voucher _if statement: Changed @name to "if *indirect* voucher", so that it's logically consistent with the else branch following it. It was previously "if *direct* voucher" because the _if statement only contained a case for direct vouchers, and the else branch was being used in place of a _not() function.
- 05:38 PM Revision 3944: mappings/roots: plots roots: Default to using VegCSV instead of VegX for new plots datasources
- 05:35 PM Revision 3943: mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber _if statements: Changed @names to more descriptive comments. This also prevents the @name from looking confusingly like the condition of the _if statement, which is actually supplied through the cond param and is usually located in a separate mapping.
- 05:20 PM Revision 3942: mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Split _if apart into separate _ifs for the indirect and direct voucher cases. Moved direct voucher _if inwards so it is just wrapping catalognumber_dwc itself. This will enable this mapping to be used for specimens data, which is always considered a direct voucher and will always have this _if return true. Also moved indirect voucher _if inwards in the same way, so that a future SQL function implementation of _if only needs to concern itself with returning one value or another, not with handling entire XML subtrees. Note that if the indirect voucher _if returns false, NOT NULL and CHECK constraint violations will cause the intervening voucher and specimenreplicate elements to be deleted, thus having the same effect. Use new @name syntax for distinguishing _if statements.
- 05:02 PM Revision 3941: mappings: Removed no longer used for_review/VegBIEN-DwC2.specimens.csv
- 04:49 PM Revision 3940: xml_func.py: _if(): Changed documentation about name param for distinguishing separate _if statements to use @name attribute instead, so that the XML/SQL function mechanism doesn't have to deal with code that's solely for XPath merging
- 04:09 PM Revision 3939: Regenerated vegbien.ERD exports
- 04:08 PM Revision 3938: schemas/vegbien.ERD.mwb: Fixed lines
- 03:57 PM Revision 3937: schemas/filter_ERD.csv: Removed no longer applicable specimenreplicate inheritance filters
- 03:50 PM Revision 3936: inputs/import.stats.xls: Updated with stats from latest import. Note that the import now includes additional date parsing on all date fields, which adds 1/2-1 hour to the import time. Eventually, we will want to translate _date() to PL/pgSQL and only use extra date processing if PostgreSQL's cast to timestamp doesn't work, which should greatly reduce this time.
08/09/2012
- 05:37 PM Revision 3935: Regenerated vegbien.ERD exports
- 05:35 PM Revision 3934: schemas/vegbien.sql: Removed inheritance link between specimenreplicate and taxonoccurrence, which is not needed now that specimenreplicate is mapped via plantobservation. mappings/DwC2-VegBIEN.specimens.csv: As part of this change, moved mappings to specimenreplicate fields inherited from taxonoccurrence to go directly to taxonoccurrence.
- 05:15 PM Revision 3933: Regenerated vegbien.ERD exports
- 05:14 PM Revision 3932: schemas/vegbien.ERD.mwb: Synced with schema
- 05:13 PM Revision 3931: mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Default to mapping via plantobservation rather than via voucher when no voucherType is specified, in order to be consistent with the specimens data mapping for catalogNumber
- 03:31 PM Revision 3930: Regenerated mappings/for_review/VegX-VegCSV.stems.csv. Note that running `make mappings/` did not change mappings/VegX-VegCSV.stems.csv, because all changes were deletions of lines.
- 03:29 PM Revision 3929: mappings/VegX-VegCSV.stems.csv: Removed no longer used user-defined terms (simpleUserdefined). Note that CTFS does use user-defined terms, but these are all defined in its own map spreadsheet.
- 03:24 PM Revision 3928: mappings: Removed no longer needed VegX-VegBIEN mappings
- 03:23 PM Revision 3927: mappings/Makefile: Made VegCSV-VegBIEN.specimens.csv a non-derived map, since the VegX-VegCSV mapping is no longer used. This causes automatic creation of a for_review file.
- 03:21 PM Revision 3926: plots inputs: Removed maps/.VegX.*.csv.last_cleanup
- 03:13 PM Revision 3925: plots inputs: Remapped all VegX via maps to VegCSV. See steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegX-%3EVegCSV>.
- 02:45 PM Revision 3924: join: Added map_1_core_only option that uses only columns 0 and 1 of map_1. This is useful for one-time refactoring joins where the Source column, mappings comments, etc. shouldn't be part of the datasource's via map (although they will be part of the autogenerated VegBIEN map)
- 02:33 PM Revision 3923: join: Use opts.env_usage() for usage message
- 02:04 PM Revision 3922: mappings: Made VegCSV-VegBIEN.{plots,organisms,stems}.csv symlinks to VegCSV-VegBIEN.specimens.csv
- 01:46 PM Revision 3921: mappings/Makefile: VegCSV-VegBIEN.specimens.csv: Commented out combining with DwC2-VegBIEN mappings, because merging DwC and VegX/VegCSV into one map is a lower priority than replacing all datasource VegX mappings with VegCSV (which does not require the merging but does require XPaths that don't collide, which is not yet the case)
- 01:40 PM Revision 3920: lib/xml_func.py: _if(): Made then param optional, so that user can just map to the else branch as a shortcut for logically inverting the condition. (Note that a _not() XML function does not exist yet, so this is also a workaround.)
- 01:29 PM Revision 3919: VegBIEN mappings: Wrapped dates in _date() and _dateRangeStart()/_dateRangeEnd(), to assist in importing date and date range values that PostgreSQL cannot parse. This will increase the import time, but hopefully also decrease the # of invalid values in the errors tables. (These functions can later be optimized to reduce the impact on import time.)
- 01:25 PM Revision 3918: sql_io.py: put_table(): is_literals: is_function: Fixed bug where function call needed to be recreated in each iteration of the main loop, because the arguments to the function, which are based on mapping, may change as the result of error handling replacing invalid values with NULL
- 01:13 PM Revision 3917: sql_io.py: put_table(): is_literals: Fixed bug where sql.select() that calls the function needed to be run recoverably, to auto-rollback errors. Made sql.select() cacheable because SQL functions are immutable, so it should be idempotent.
- 01:03 PM Revision 3916: mappings/DwC2-VegBIEN.specimens.csv: Remapped taxonRemarks to taxondetermination.notes because http://rs.tdwg.org/dwc/terms/#taxonRemarks indicates that these notes are "about the taxon", not the specimen/plant in general
- 12:56 PM Revision 3915: mappings/DwC2-VegBIEN.specimens.csv: Remapped eventDate to new aggregateoccurrence.collectiondate, which is a more accurate place than locationevent.obsstartdate/obsenddate because the date refers to a specific specimen. This also makes eventDate compatible with plots data.
- 12:44 PM Revision 3914: mappings/DwC2-VegBIEN.specimens.csv: Moved sex user-defined mapping to plantobservation because it's a property of the plant rather than the specimen, and so that it can also apply to plots data
- 12:31 PM Revision 3913: mappings: Remapped specimenreplicate.description to new aggregateoccurrence.notes because the notes don't necessarily refer specifically to the specimen, especially for plots data
- 12:31 PM Revision 3912: mappings: Remapped specimenreplicate.description to new aggregateoccurrence.notes because the notes don't necessarily refer specifically to the specimen, especially for plots data
- 12:21 PM Revision 3911: schemas/vegbien.sql: aggregateoccurrence: Added notes, to serve the purpose that specimenreplicate.description previously did. specimenreplicate.description is not appropriate for plots data, and often not appropriate even for specimens data, which uses fieldNotes as a general notes field rather than a description of the specimen.
- 12:07 PM Revision 3910: schemas/vegbien.sql: aggregateoccurrence: Reordered linecover so it's near cover instead of at the end
- 12:02 PM Revision 3909: schemas/vegbien.sql: Moved collectiondate from specimenreplicate to aggregateoccurrence because it's actually the SALVIAS census_date, which is the date the plant was sampled, rather than the DwC eventDate, which is the date the specimen was collected
- 11:56 AM Revision 3908: mappings/DwC2-VegBIEN.specimens.csv: Mapped specimenreplicate via plantobservation for consistency with plots data. (This change is required for VegCSV table merging to work properly.) This is also a more accurate way of representing the data, because a specimen in fact comes from a plant, and it's natural to place the plant-related data (measurements, etc.) in the plantobservation table.
- 11:42 AM Revision 3907: mappings/DwC2-VegBIEN.specimens.csv: Mapped specimenreplicate via plantobservation for consistency with plots data. (This change is required for VegCSV table merging to work properly.) This is also a more accurate way of representing the data, because a specimen in fact comes from a plant, and it's natural to place the plant-related data (measurements, etc.) in the plantobservation table.
- 10:41 AM Revision 3906: mappings/VegX-VegCSV.stems.csv: Remapped stem notes to new stemNotes term, and mapped new organism notes VegX XPath to now-available DwC fieldNotes
- 10:30 AM Revision 3905: inputs/SALVIAS/maps/VegX.organisms.csv: Map organism notes to different place than stem notes, because these are separate fields
- 10:09 AM Revision 3904: mappings/Makefile: VegCSV-VegBIEN.specimens.csv: Temporarily sort by input column rather than output column, to assist in finding terms that map to different places in the DwC- and VegX-VegBIEN mappings
- 10:02 AM Revision 3903: mappings/Makefile: VegCSV-VegBIEN.specimens.csv: Use new all option to union, in order to manually review inputs which appear in both maps but map to different places
- 10:01 AM Revision 3902: union: Added full flag to turn off merging mappings that are in both maps, in order to review inputs which appear in both maps but map to different places
- 09:57 AM Revision 3901: mappings/Makefile: Merged .VegX-VegCSV.stems.csv.last_cleanup into .%.last_cleanup, since VegX-VegCSV.stems.csv now uses the same cleanup operations as the other non-derived maps. Note that this automatically creates a file in for_review for VegX-VegCSV.stems.csv, which is currently identical to it.
- 09:52 AM Revision 3900: mappings/Makefile: .%.last_cleanup: Removed simplify_xpath because non-derived maps will now have VegX XPaths in their Source column URLs, which should not be modified
- 09:50 AM Revision 3899: mappings/Makefile: VegX-VegCSV.stems.csv: Removed autogeneration command because once file has been generated, regeneration is no longer needed
- 09:42 AM Revision 3898: mappings/Makefile: Fixed bug where VegX-VegCSV.stems.csv needed to be removed from $(vegcsvMaps) so it wouldn't be deleted on `make clean`
- 08:53 AM Revision 3897: mappings/VegX-VegCSV.stems.csv: Source: Put URLs in the order their terms appear in the VegCSV term name
- 08:38 AM Revision 3896: mappings/VegX-VegCSV.stems.csv: Comments: Changed "Table name" to "Table" to be concise
- 08:37 AM Revision 3895: mappings/VegX-VegCSV.stems.csv: Mapped VegX community fields
- 08:28 AM Revision 3894: mappings/VegX-VegCSV.stems.csv: Mapped VegX cover-related fields
- 08:26 AM Revision 3893: mappings/VegX-VegCSV.stems.csv: Changed authorPlantCode to the associated DwC term fieldNumber
- 08:04 AM Revision 3892: mappings/VegX-VegCSV.stems.csv: Changed locationNarrative to the associated DwC term locality
- 08:00 AM Revision 3891: mappings/VegX-VegCSV.stems.csv: Changed collectedDate to the associated DwC term eventDate
- 07:54 AM Revision 3890: mappings/VegX-VegCSV.stems.csv: Added plot prefix to eventStartDate/eventEndDate to distinguish it from the DwC eventDate, which is the date the *specimen* was collected
- 07:40 AM Revision 3889: mappings/VegX-VegCSV.stems.csv: Order within table: Updated order #s for salvias_plots terms that got changed to SALVIAS data dictionary terms
- 07:33 AM Revision 3888: mappings/VegX-VegCSV.stems.csv: Changed collector name parts to the associated DwC term recordedBy
- 07:11 AM Revision 3887: mappings/VegX-VegCSV.stems.csv: Mapped SALVIAS voucher type
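
The _if() behavior described in revisions 3920 and 3801 (both `cond` and `then` optional, with a missing `cond` treated as False) can be illustrated with a minimal sketch. This is not the code in lib/xml_func.py, which operates on XML nodes. The function name `xml_if` and the use of a plain dict of already-evaluated params are assumptions made only for illustration.

```python
def xml_if(params):
    """Hypothetical sketch of the _if() semantics on a plain dict.

    Assumptions (not taken from xml_func.py):
    - params holds already-evaluated values under 'cond', 'then', 'else'
    - a missing or None cond is treated as False
    - a missing branch yields None
    """
    cond = bool(params.get('cond'))
    branch = 'then' if cond else 'else'
    return params.get(branch)


# Mapping only to the else branch logically inverts the condition,
# which stands in for a _not() function.
is_direct_voucher = False
print(xml_if({'cond': is_direct_voucher, 'else': 'via voucher'}))

# No cond mapped at all: treated as False, so the else branch is used.
print(xml_if({'else': 'fallback'}))

# Both branches present.
print(xml_if({'cond': True, 'then': 'x', 'else': 'y'}))
```

Returning None for an unmapped branch mirrors the idea that the enclosing element simply receives no value in that case.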
08/08/2012
- 11:09 PM Revision 3886: mappings/VegX-VegCSV.stems.csv: Mapped collector name parts
- 11:00 PM Revision 3885: mappings/VegX-VegCSV.stems.csv: Table names ("." prefixes) merged into name where possible, for consistency. The "computer" taxonomic elements have not been merged because the field part should exactly match the corresponding DwC term.
- 10:53 PM Revision 3884: mappings/VegX-VegCSV.stems.csv: Order within table: If Source has multiple URLs, ensure each source has its own order
- 10:44 PM Revision 3883: mappings/VegX-VegCSV.stems.csv: Order within table: Separate orders of multiple elements with "," instead of ";", for consistency with the Source column
- 10:42 PM Revision 3882: mappings/VegX-VegCSV.stems.csv: Changed authorPlotCode terms to a variation of VegX's plotName, for standardization with VegX
- 10:37 PM Revision 3881: mappings/VegX-VegCSV.stems.csv: Changed uniqueIDs with table names to the table name + "ID", for standardization
- 10:26 PM Revision 3880: mappings/VegX-VegCSV.stems.csv: Changed terms with table names to DwC terms where possible
- 10:19 PM Revision 3879: mappings/VegX-VegCSV.stems.csv: Removed comments about alternate names, as these will be included in a separate "VegCSV-alt" mapping to "VegCSV-core" terms
- 10:17 PM Revision 3878: mappings/VegX-VegCSV.stems.csv: Clarified comments about the inclusion of the table name
- 10:12 PM Revision 3877: mappings/VegX-VegCSV.stems.csv: Mapped plotObservation user-defined terms
- 09:59 PM Revision 3876: mappings/VegX-VegCSV.stems.csv: Mapped VegX plotObservation fields
- 09:40 PM Revision 3875: mappings/VegX-VegCSV.stems.csv: Corrected sources of DwC terms to point to the actual DwC term, where needed. eventDate parts: Added source for VegBank field used as named suffix.
- 09:35 PM Revision 3874: mappings/VegX-VegCSV.stems.csv: Corrected sources of VegX names to point to the actual VegX field name, where needed
- 09:28 PM Revision 3873: mappings/VegX-VegCSV.stems.csv: Mapped SALVIAS stem tags
- 09:22 PM Revision 3872: mappings/VegX-VegCSV.stems.csv: Corrected parent plot-only mappings by prefixing "parentPlot."
- 09:18 PM Revision 3871: mappings/VegX-VegCSV.stems.csv: Mapped VegX //plot/plotName
- 09:14 PM Revision 3870: mappings/VegX-VegCSV.stems.csv: Mapped VegX //plot/plotUniqueIdentifier
- 09:00 PM Revision 3869: mappings/VegX-VegCSV.stems.csv: Source SALVIAS terms from the SALVIAS data dictionary when possible, to provide an automatic link to the description of the term. Having these direct links will also assist in creating a data dictionary for VegCSV and eventually VegBIEN (using mappings/VegCSV-VegBIEN.specimens.csv). Note that many SALVIAS terms exist only in the live database, as they are not part of the export format documented in the data dictionary.
- 08:31 PM Revision 3868: mappings/VegX-VegCSV.stems.csv: Source VegBank terms directly from the appropriate VegBank data dictionary page, to provide an automatic link to the description of the term. Having these direct links will also assist in creating a data dictionary for VegCSV and eventually VegBIEN (using mappings/VegCSV-VegBIEN.specimens.csv).
- 08:18 PM Revision 3867: mappings/VegX-VegCSV.stems.csv: Mapped VegX relativePlotPosition terms
- 08:02 PM Revision 3866: maps with Order column: Renamed Order column to Order within table for clarity
- 08:00 PM Revision 3865: maps with Order column: Renamed Order column to Order within table for clarity
- 07:57 PM Revision 3864: maps with Source column: Added original column name to source URLs, so that source name is completely specified. For official DwC terms, this also allows linking directly to the term. Fixed nimoy phpMyAdmin links so that going to the link in a browser would take you straight there after login.
- 06:53 PM Revision 3863: mappings/VegX-VegCSV.stems.csv: Corrected SALVIAS stem diameter terms to place original name (before expansion for clarity) in the Comments column instead of appending it to the source URL, because the source URL should point just to the table the term is in. The actual term is identified directly by its order # and indirectly by the name of the VegCSV term, which should be similar (if not, the original term should be listed in the comments).
- 06:46 PM Revision 3862: mappings/VegX-VegCSV.stems.csv: Mapped SALVIAS stem diameter terms
- 06:35 PM Revision 3861: mappings/VegX-VegCSV.stems.csv: Mapped VegX project terms
- 06:29 PM Revision 3860: mappings/VegX-VegCSV.stems.csv: VegX plot terms: Added order
- 06:25 PM Revision 3859: mappings/VegX-VegCSV.stems.csv: Mapped non-user-defined height XPath
- 06:23 PM Revision 3858: mappings/VegX-VegCSV.stems.csv: Changed source of height to VegX, because there is a VegX height field
- 06:20 PM Revision 3857: mappings/VegX-VegCSV.stems.csv: Mapped VegX plot terms except unique keys
- 06:11 PM Revision 3856: mappings/VegX-VegCSV.stems.csv: Mapped remaining sourceAccessionCode user-defined terms to <VegX-table>.uniqueID
- 06:06 PM Revision 3855: mappings/VegX-VegCSV.stems.csv: Corrected sources of VegX names to point to the appropriate element in veg.xsd, rather than the appropriate type, because the names we used actually came from veg.xsd's top-level elements rather than from the type names
- 05:57 PM Revision 3854: mappings/VegX-VegCSV.stems.csv: Changed plantObservation.sourceAccessionCode to individualOrganismObservation.uniqueID, to be consistent with VegX names. (*source*AccessionCode only applies to an aggregate DB that preserves info from its inputs. accessionCode made less sense, because this field is for the datasource's primary key, which it may or may not consider an accession code.)
- 05:39 PM Revision 3853: mappings/VegX-VegCSV.stems.csv: Mapped aggregateOrganismObservation terms
- 05:36 PM Revision 3852: mappings/VegX-VegCSV.stems.csv: Changed base back to baseSaturation to distinguish this pH-related concept from other meanings of base, and to match VegBank
- 05:26 PM Revision 3851: mappings/DwC2-VegBIEN.specimens.csv: Removed no longer applicable comments, which were from the very first NY/SALVIAS->VegX/VegBank mapping and had been preserved by the map spreadsheet transformation scripts. Note that many comments have been left, because they either provide explanatory information or because we never reached a decision on the questions posed (such as many of Brad's "OMIT" comments).
- 05:18 PM Revision 3850: mappings/VegX-VegCSV.stems.csv: Removed no longer applicable comments, which were from the very first NY/SALVIAS->VegX/VegBank mapping and had been preserved by the map spreadsheet transformation scripts
- 05:15 PM Revision 3849: mappings/VegX-VegCSV.stems.csv: Mapped individualOrganismObservation user-defined terms
- 04:09 PM Revision 3848: Regenerated vegbien.ERD exports
- 04:02 PM Revision 3847: schemas/vegbien.ERD.mwb: Added link to VegBIEN schema wiki page
- 03:46 PM Revision 3846: inputs/import.stats.xls: Updated with stats from latest import
- 03:40 PM Revision 3845: README.TXT: After a new import: Added steps to check inputs' error counts and only continue with deleting previous imports, etc. if there were few or no errors. Added step to record the import times.
08/07/2012
- 09:45 AM Revision 3844: mappings/VegX-VegCSV.stems.csv: Mapped VegBank and SALVIAS abioticObservation terms
- 09:08 AM Revision 3843: mappings/VegX-VegCSV.stems.csv: Resolved ambiguous terms that appeared twice on the output side
- 08:52 AM Revision 3842: mappings/VegX-VegCSV.stems.csv: Mapped VegX abioticObservation terms
- 08:36 AM Revision 3841: mappings/VegX-VegCSV.stems.csv: Mapped standard DwC terms
- 08:13 AM Revision 3840: mappings/DwC2-VegBIEN.specimens.csv, DwC1-DwC2.specimens.csv: Sources: Replaced DwC with http://rs.tdwg.org/dwc/terms/, because DwC terms can come from many places but the DwC source referred specifically to this web page
- 08:06 AM Revision 3839: mappings/DwC1-DwC2.specimens.csv: Corrected mapping for previousCatalogNumber
- 08:00 AM Revision 3838: mappings/DwC1-DwC2.specimens.csv: Added source of datasources' custom terms
- 07:51 AM Revision 3837: mappings/DwC1-DwC2.specimens.csv: Added source of DwC 1.2 (http://digir.net/schema/conceptual/darwin/2003/1.0/darwin2.xsd), aka DwC Classic, terms
- 07:43 AM Revision 3836: mappings/DwC1-DwC2.specimens.csv: Added source of custom NY staging table terms in nimoy.bien2_staging.nybg_raw
- 07:27 AM Revision 3835: mappings/DwC1-DwC2.specimens.csv: Added source of DwC 1.21 (http://digir.net/schema/conceptual/darwin/manis/1.21/darwin2.xsd) terms
- 07:02 AM Revision 3834: mappings/DwC2-VegBIEN.specimens.csv, DwC1-DwC2.specimens.csv: Sources: Replaced DwC with http://rs.tdwg.org/dwc/terms/, because DwC terms can come from many places but the DwC source referred specifically to this web page
- 06:51 AM Revision 3833: mappings/DwC1-DwC2.specimens.csv: Added source of remappings of DwC terms with /_alt added
- 06:46 AM Revision 3832: mappings/DwC1-DwC2.specimens.csv: Added source of DwC terms with namespace removed
- 06:32 AM Revision 3831: mappings/VegX-VegCSV.stems.csv: Added "computer." before taxonomic terms whose VegX mapping used the "computer" role. (This is useful for datasources that supply separate determinations in the same row, such as SALVIAS.)
- 06:23 AM Revision 3830: mappings/DwC2-VegBIEN.specimens.csv: Added Source column containing "DwC" for every field with an entry in the Order column, so that the source of the term can be tracked once we start combining DwC and VegCSV
- 06:07 AM Revision 3829: inputs/SALVIAS*/maps/VegX.organisms.csv: Fixed missing join mappings for stemobservation-related fields
- 05:56 AM Revision 3828: mappings/DwC2-VegBIEN.specimens.csv: Repopulated Order values for the few rows that had lost it in the process of copying and pasting mappings
- 05:49 AM Revision 3827: mappings/DwC2-VegBIEN.specimens.csv: Added Source column containing "DwC" for every field with an entry in the Order column, so that the source of the term can be tracked once we start combining DwC and VegCSV
- 05:38 AM Revision 3826: mappings/Makefile: VegX-VegCSV.stems.csv: Clean up when edited using sort_map
- 05:27 AM Revision 3825: Added mappings/VegCSV-VegBIEN.specimens.csv, which is generated from VegX-VegCSV.stems.csv
- 05:19 AM Revision 3824: mappings/for_review: svn:ignore OpenOffice.org lock files
- 05:14 AM Revision 3823: Added mappings/VegX-VegCSV.stems.csv. The initial version is autogenerated by joining the simplified VegBIEN XPaths of related maps.
- 05:05 AM Revision 3822: join: Support discarding multiple outputs if they should be considered ambiguous
- 04:40 AM Revision 3821: input.Makefile: Maps validation: $(missingMappingsCmd): Support non-DwC mappings by matching entire line containing mapping, not just word characters. Remove any XML function so that merging of non-empty join mappings still works properly.
- 03:35 AM Revision 3820: mappings/Makefile: Use new invert
- 03:35 AM Revision 3819: Added invert
- 03:31 AM Revision 3818: mappings/Makefile: for_review/VegBIEN-DwC2.specimens.csv: Include all comments column(s), not just the first
- 03:27 AM Revision 3817: cols: Removed special handling of '+' because list_subset() now handles this col_num value itself, by appending the rest of the columns. Support intermixing int and '+' columns, by using new format.str2int_passthru().
- 03:23 AM Revision 3816: util.py: list_subset(): Made an index of '+' append the rest of the list
- 03:21 AM Revision 3815: format.py: Added str2int_passthru()
- 03:16 AM Revision 3814: cols: Changed value for all columns to '+' so that it wouldn't need to be shell-escaped as '*' was
- 01:42 AM Revision 3813: review: Remove keys except last. This should increase the number of matches between human-readable VegBIEN XPaths of VegX and DwC2.
- 01:39 AM Revision 3812: mappings/DwC2-VegBIEN.specimens.csv: Use :[] instead of [] for all XML functions, so that the XML function args will get removed by review
- 01:18 AM Revision 3811: review: Remove XML functions. This should increase the number of matches between human-readable VegBIEN XPaths of VegX and DwC2.
- 12:34 AM Revision 3810: mappings/Makefile: human-readable maps in for_review: Simplify just the output column so that the input column can be programmatically linked back to the original input names/XPaths
- 12:26 AM Revision 3809: mappings/Makefile: Removed no longer used $(chRoot), $(cpReview)
- 12:23 AM Revision 3808: Removed the human-readable mappings mappings/for_review/VegX-VegBIEN.plots.csv, VegX-VegBIEN.organisms.csv because these are now duplicates of VegX-VegBIEN.stems.csv
- 12:20 AM Revision 3807: review: Support limiting the XPath simplifying to custom columns, rather than always the first two
- 12:12 AM Revision 3806: review: Usage message: Fixed typo
- 12:10 AM Revision 3805: Added mappings/for_review/VegBIEN-DwC2.specimens.csv, generated by inverting for_review/DwC2-VegBIEN.specimens.csv. This will be used to help translate VegX->VegCSV.
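
The '+' column handling from revisions 3814 to 3817 (str2int_passthru() plus a list_subset() that appends the rest of the list) might look roughly like the sketch below. The function names come from the log entries, but the bodies are assumptions, not the code in format.py or util.py. In particular, "the rest of the list" is assumed here to mean everything after the highest index selected so far.

```python
def str2int_passthru(value):
    """Convert value to an int if possible; otherwise return it unchanged."""
    try:
        return int(value)
    except ValueError:
        return value


def list_subset(items, idxs):
    """Select items by index; an index of '+' appends the rest of the list.

    Assumption: "the rest" is everything after the highest index
    that has been selected explicitly so far.
    """
    subset = []
    next_idx = 0  # first index not yet covered by an explicit selection
    for idx in idxs:
        if idx == '+':
            subset.extend(items[next_idx:])
        else:
            subset.append(items[idx])
            next_idx = max(next_idx, idx + 1)
    return subset


# Command-line style arguments: int columns intermixed with '+'.
row = ['input', 'output', 'comments', 'source']
idxs = [str2int_passthru(arg) for arg in ['1', '0', '+']]
print(list_subset(row, idxs))
```

Using '+' rather than '*' as the marker means the argument does not need shell escaping, which is the motivation given in revision 3814.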
08/06/2012
- 11:44 PM Revision 3804: mappings: Made VegX-VegBIEN.organisms.csv, VegX-VegBIEN.plots.csv symlinks to VegX-VegBIEN.stems.csv instead of building them in the Makefile by copying VegX-VegBIEN.stems.csv, since these files are now always the same
- 09:29 PM Revision 3803: mappings/VegX-VegBIEN.stems.csv: _if that maps to specimenreplicate via plantobservation or voucher: Refactored to map right-hand side of _eq in the left-hand side mapping, rather than in all then/else mappings. Distinguish this _if statement from others using new name param.
- 09:16 PM Revision 3802: xml_func.py: _if(): Documented that can add `name` param to distinguish separate _if statements
- 09:08 PM Revision 3801: xml_func.py: _if(): Made cond optional. When it's not specified or None, it is treated as False. This supports cases where all elements of the condition are required but not mapped to.
- 08:50 PM Revision 3800: mappings/VegX-VegBIEN.stems.csv: _if that maps to specimenreplicate via plantobservation or voucher: Refactored to map voucherType directly into _if/cond/_eq/left rather than mapping it to a temporary _ignore location and retrieving it with _ref
- 08:47 PM Revision 3799: xml_func.py: Removed no longer used _simplifyPath(), which is now a built-in function of db_xml.put()
- 08:36 PM Revision 3798: xml_func.py: _eq(): Documented that '' (empty node) is returned if a value was not mapped to, not if a value was None, since None arguments are no longer removed by process() (now XML functions do this manually with conv_items())
- 08:19 PM Revision 3797: xml_func.py: _ref(): Only display "XPath reference target missing" warning if target node does not exist, not if it exists but is empty
- 08:17 PM Revision 3796: xpath.py: get(): reference expansion: Use get_1() and check for None result instead of using get(), which returns multiple nodes when we just want the first
- 07:39 PM Revision 3795: mappings/VegX-VegBIEN.stems.csv: Reversed XPaths so that they start with location instead of plantobservation
- 07:30 PM Revision 3794: lib/common.Makefile: Added $(cp)
- 05:58 PM Revision 3793: mappings/Makefile: Include lib/common.Makefile
- 05:57 PM Revision 3792: lib/common.Makefile: Added $(CP)
- 05:36 PM Revision 3791: inputs/import.stats.xls: Updated with stats from latest import
08/03/2012
- 09:59 PM Revision 3790: mappings/VegX-VegBIEN.stems.csv: Reversed input XPaths so that they start with plot instead of individualOrganismObservation as stem
- 09:57 PM Revision 3789: inputs/CTFS: Disabled maps because CTFS is not yet compatible with reversed XPaths, but the effort required to make it compatible is not worth including in the current commit. We lose only 2 test rows of test VegX data by doing this, since the full CTFS VegX files were never able to be imported.
- 08:31 PM Revision 3788: ch_root, ch_root_via: Documented that these are usually *not* idempotent operations
- 07:42 PM Revision 3787: mappings/VegX-VegBIEN.stems.csv: input (VegX) root: Removed tcs namespace URL to simplify the XPath reversing process. It isn't needed now that we don't generate intermediate XML documents in the automated tests (because intermediate formats are no longer required to be XML schemas).
- 07:16 PM Revision 3786: mappings/DwC2-VegBIEN.specimens.csv: Reversed XPaths so that they start with location instead of specimenreplicate
- 07:00 PM Revision 3785: README.TXT: WinMerge setup: Documented how to get to Compare Options page
- 06:59 PM Revision 3784: README.TXT: WinMerge setup: Added step to set Whitespace to Ignore change
- 06:55 PM Revision 3783: README.TXT: Moved WinMerge setup to separate section. Changed Moved block detection link to the Configuration page.
- 06:32 PM Revision 3782: mappings/VegX-VegBIEN.stems.csv: Expanded {} expressions using expand_braces, so that each distinct output for the same input is on its own line, improving readability. This will also help enable search-and-replace reversing of XPaths for the re-rooting to location.
- 06:17 PM Revision 3781: mappings/VegX-VegBIEN.stems.csv: VegX XPaths: Expanded {} expressions using expand_braces, so that later use of expand_braces on the file would not affect the VegX output mappings of the inputs' via maps (VegX.organisms.csv, etc.)
- 05:54 PM Revision 3780: mappings/DwC2-VegBIEN.specimens.csv: Expanded {} expressions using expand_braces, so that each distinct output for the same input is on its own line, improving readability. This will also help enable search-and-replace reversing of XPaths for the re-rooting to location.
- 05:52 PM Revision 3779: README.TXT: Accepting test cases: Documented that when refactoring mappings, it's helpful to use WinMerge to detect moved lines
- 05:14 PM Revision 3778: expand_braces: Fixed bug where needed to get next line from stdin in raw mode, so that \ won't be parsed as an escape char
- 04:59 PM Revision 3777: join: Fixed bug where when an input mapped to multiple outputs, the joined row for each output needed to be output separately using writer.writerow()
- 03:52 PM Revision 3776: sort_map: Remove duplicates resulting from multiple outputs for the same input. mappings/Makefile: $(mkSelfMap): Removed uniq now that sort_map does this.
- 03:24 PM Revision 3775: mappings/Makefile: $(mkSelfMap): Run uniq on the output to remove duplicates resulting from multiple outputs for the same input
- 03:10 PM Revision 3774: expand_braces: Also expand XPaths containing [], with up to one level of nesting (which is the most we currently use), because many {} XPaths do in fact contain []. Debug-print intermediate values when env var expand_braces_debug is true. Added usage message.
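Note on the expand_braces revisions above (3774, 3780-3782): the idea of putting each distinct output of a {} expression on its own line can be sketched as follows. This is a minimal Python illustration, not the project's expand_braces tool. The function name and regex are assumptions made for the example, and it deliberately ignores nested braces and commas inside [] predicates, which the revisions above treat as special cases.

```python
import re

# Matches a {...} group that contains no nested braces.
_BRACES = re.compile(r"\{([^{}]*)\}")


def expand_braces(path):
    """Return every string obtained by picking one alternative per {} group.

    Hypothetical sketch: nested braces and commas inside [] are not handled.
    """
    match = _BRACES.search(path)
    if match is None:
        return [path]  # nothing left to expand
    head, tail = path[:match.start()], path[match.end():]
    results = []
    for alternative in match.group(1).split(","):
        # Recurse so that later {} groups in the same path are expanded too.
        results.extend(expand_braces(head + alternative + tail))
    return results


if __name__ == "__main__":
    # Each expanded XPath is printed on its own line.
    for line in expand_braces("/location/{authorlocationcode,sourceaccessioncode}"):
        print(line)
```

Once every alternative is on its own line, a plain search-and-replace can rewrite the XPath prefixes, which is the motivation given in Revisions 3780 and 3782.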
08/02/2012
- 11:13 PM Revision 3773: expand_braces: Fixed bug where ./{ and brackets with commas inside {} are unparseable, and should not be expanded
- 11:05 PM Revision 3772: expand_braces: Fixed bug where `head -1` seemed to read more lines than just the first, causing EOF to be returned after the first line, by using `read` instead. Support data containing \r (such as Excel-dialect CSVs) by removing it. Fixed bug where ./{...} was not being properly escaped.
- 10:08 PM Revision 3771: Added expand_braces
- 09:12 PM Revision 3770: mappings: location: Removed centerlatitude/centerlongitude mappings because the lat/long should be in only one place: the locationdetermination. It is up to the database querier to decide which locationdetermination(s) to use as the coordinates for a plot/specimen.
- 08:54 PM Revision 3769: bin/map: input is CSV: Removed unused map_ var
- 08:50 PM Revision 3768: bin/map: Documented that it's multi-safe (supports an input appearing multiple times)
- 08:39 PM Revision 3767: subtract: Documented that it's multi-safe (supports an input appearing multiple times)
- 08:32 PM Revision 3766: join: Made it multi-safe (supports an input appearing multiple times)
- 08:30 PM Revision 3765: lib/common.Makefile: Added empty clean target to make sure `make clean` always works
- 08:03 PM Revision 3764: root Makefile, input.Makefile: Maps validation: Treat missing join mappings differently from missing non-empty join mappings, because they indicate mapping to an invalid location, which is a bug. Factored maps validation code out into new lib/mappings.Makefile.
- 08:00 PM Revision 3763: lib/common.Makefile: Added vars for chars not allowed in make targets. Added functions/vars to replace "_" with " ".
- 07:38 PM Revision 3762: root Makefile: Include lib/common.Makefile
- 07:37 PM Revision 3761: input.Makefile: Include lib/common.Makefile
- 06:48 PM Revision 3760: intersect: Documented that it's multi-safe (supports an input appearing multiple times)
- 06:42 PM Revision 3759: union: Documented that it's multi-safe (supports an input appearing multiple times)
- 06:00 PM Revision 3758: mappings/DwC2-VegBIEN.specimens.csv: Moved shared /specimenreplicate root to mappings in preparation for reversing the XPaths so that parent table paths (such as location) don't contain a prefix for child tables (specimenreplicate, locationevent, etc.). This reversing will avoid the need to "ch_root" the child table map to obtain maps for parent tables with the prefixes removed, allowing all hierarchical levels to use the same map spreadsheet.
- 05:53 PM Revision 3757: ch_root: Support column headers without a root, for non-hierarchical formats such as DwC
- 05:45 PM Revision 3756: lib/common.Makefile: rsync: Time the rsync operation
- 05:29 PM Revision 3755: in_place: Wrap EXIT handler in shell function so that "-escaping can easily be used on the temp file path
- 05:26 PM Revision 3754: in_place: Documented that doesn't update file on error
- 05:23 PM Revision 3753: DwC mappings: Removed ':/list/' root (full version: '::[@xmlns:dcterms=http://purl.org/dc/terms/]/list/') from map spreadsheets to simplify the boilerplate in each file. Since intermediate DwC XML files no longer need to be produced for automated tests, these roots are not needed.
- 04:46 PM Revision 3752: inputs/import.stats.xls: Updated with stats from latest import
- 04:40 PM Revision 3751: inputs/import.stats.xls: Moved independent-import data to separate tab so that it wouldn't get moved to the side whenever a new column of simultaneous-import data is inserted. It is also no longer updated, because all column-based imports are now done simultaneously.
- 04:32 PM Revision 3750: Use strings.ustr() or strings.urepr() everywhere that columns are stringified, in order to support column names with non-ASCII characters (such as in the Madidi data)
- 04:16 PM Revision 3749: strings.py: concat(): Convert args to raw (non-Unicode) strings first, so that multi-byte Unicode sequences are considered by # of bytes instead of # of chars. This is necessary because PostgreSQL truncates identifiers by # of bytes instead of # of chars, so that identifiers will actually be less than 63 chars long when some chars were multi-byte.
- 04:11 PM Revision 3748: strings.py: ustr(): Call __str__() method manually like urepr() to avoid Unicode errors when the returned string is non-ASCII
- 03:54 PM Revision 3747: strings.py: Added urepr() and use it in repr_no_u(), to better support repr() return values with non-ASCII characters. Avoiding repr() also provides a more complete stack trace in the case of such errors.
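Note on Revision 3749 (strings.py: concat()): PostgreSQL limits identifiers to 63 bytes by default, not 63 characters, so length checks have to be done on the encoded form. The sketch below shows one way to do that in Python 3. It is not the project's concat(); the function name, the constant, and the choice of UTF-8 are assumptions made for the example.

```python
MAX_IDENTIFIER_BYTES = 63  # PostgreSQL's default identifier limit (NAMEDATALEN - 1)


def concat_within_limit(base, suffix, max_bytes=MAX_IDENTIFIER_BYTES):
    """Append suffix to base, shortening base so the UTF-8 result fits max_bytes.

    Hypothetical helper. If suffix alone exceeds max_bytes, the result will
    still be too long; this sketch does not handle that case.
    """
    suffix_bytes = suffix.encode("utf-8")
    room = max(max_bytes - len(suffix_bytes), 0)
    # Cut by bytes, then drop any partial multi-byte character left at the end.
    base_bytes = base.encode("utf-8")[:room]
    return base_bytes.decode("utf-8", errors="ignore") + suffix


if __name__ == "__main__":
    name = concat_within_limit("\u00e9" * 40, "_idx")
    # The character count is well under 63 because each accented char is 2 bytes.
    print(len(name), len(name.encode("utf-8")))
```

Counting bytes explains why a name with multi-byte characters ends up with fewer than 63 characters after truncation.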
08/01/2012
- 11:37 AM Revision 3746: schemas/vegbien.sql: plantobservation: plantobservation_aggregateoccurrence_count_1() trigger: Don't raise an error if existing count was >1, because there are in fact datasets (notably SALVIAS) where input records for individual stems may themselves contain aggregate data (such as plant and stem counts). For this data, we have an anomalous condition where an aggregateoccurrence has count >1 but contains one plantobservation, due to the plant/stem count being included in the first stem's record. (See <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Data-interpretation-issues> for more info on this problem.) Note that our desired 1:1 relationship between aggregateoccurrence and plantobservation is still guaranteed by a constraint, but the anomalous data may still cause irregularities later on in the analysis.
- 10:55 AM Revision 3745: sql_io.py: put_table(): Ignoring all rows on unrecoverable errors: Also support the case where has_joins == True, by setting it to False so that the no-joins case is effectively used
- 10:32 AM Revision 3744: inputs/import.stats.xls: Moved Simultaneously above Independently because that is how we are now running the imports
- 10:21 AM Revision 3743: Regenerated vegbien.ERD exports
- 09:50 AM Revision 3742: schemas/vegbien.sql: *_1_to_1 and *_unique_within_* unique indexes with a `WHERE sourceaccessioncode IS NULL` filter: Added IS NULL filters for other unique keys, so that these fallback indexes would only be used if there was no (or no other) way to uniquely identify their tables. For *_1_to_1 unique indexes, this is the case for specimens data.
- 09:48 AM Revision 3741: schemas/vegbien.sql: *_1_to_1 and *_unique_within_* unique indexes with a `WHERE sourceaccessioncode IS NULL` filter: Added IS NULL filters for other unique keys, so that these fallback indexes would only be used if there was no (or no other) way to uniquely identify their tables. For *_1_to_1 unique indexes, this is the case for specimens data.
- 09:41 AM Revision 3740: schemas/vegbien.sql: stemobservation: Replaced stemobservation_unique_code unique constraint with stemobservation_unique_within_plantobservation unique index that uses COALESCE() and WHERE ... IS NOT NULL appropriately, to work with sql_gen's use of COALESCE() indexes and (for the renaming) to better reflect what it does
- 09:36 AM Revision 3739: schemas/vegbien.ERD.mwb: Synced with schema
- 09:30 AM Revision 3738: schemas/vegbien.sql: *_1_to_1 and *_unique_within_* unique indexes intended to operate only when sourceaccessioncode is NULL: Changed to use `sourceaccessioncode IS NULL` WHERE condition instead of COALESCE(sourceaccessioncode, ...) element, since the sourceaccessioncode is not actually needed for the uniquification (it is already globally unique within the datasource if it's not NULL; this just covers the case where it is NULL)
- 09:23 AM Revision 3737: schemas/vegbien.sql: *_unique_within_* unique indexes used for 1:1 relationships: Renamed to *_*_1_to_1 to better reflect what they do
- 09:21 AM Revision 3736: schemas/vegbien.sql: *_unique_within_* unique indexes used for 1:1 relationships: Renamed to *_*_1_to_1 to better reflect what they do
- 08:58 AM Revision 3735: schemas/vegbien.sql: plantobservation: Corrected plantobservation_aggregateoccurrence_id_1_to_1's name to plantobservation_aggregateoccurrence_1_to_1 because it's 1:1 with aggregateoccurrence, not aggregateoccurrence_id. Made it a unique index for consistency with our general method of expressing unique constraints on potentially nullable columns.
- 08:54 AM Revision 3734: schemas/vegbien.sql: specimenreplicate: Renamed specimenreplicate_unique_plantobservation to specimenreplicate_plantobservation_1_to_1 to better reflect what it does
- 08:50 AM Revision 3733: schemas/vegbien.sql: locationevent unique indexes: Renamed to *_unique_within_* to better reflect what they do
- 08:34 AM Revision 3732: schemas/vegbien.sql: location: Removed redundant location_unique_sourceaccessioncode unique constraint, which has been replaced by location_unique_within_datasource
- 08:31 AM Revision 3731: schemas/vegbien.sql: Reset foreign key constraint names to autogenerated defaults for consistency
- 08:27 AM Revision 3730: schemas/vegbien.sql: Renamed *_unique_datasource unique indexes to *_unique_within_datasource to better reflect what they do
- 08:25 AM Revision 3729: schemas/vegbien.sql: locationevent: Renamed locationevent_unique_accessioncode to locationevent_unique_within_location to better reflect what it does
- 08:22 AM Revision 3728: schemas/vegbien.sql: specimenreplicate: Renamed specimenreplicate_unique_accessioncode to specimenreplicate_unique_within_datasource to better reflect what it does
- 08:11 AM Revision 3727: schemas/vegbien.sql: stemobservation: Renamed stemobservation_unique_accessioncode to stemobservation_unique_within_plantobservation and also apply it to NULL sourceaccessioncodes, so that a plantobservation can have a single stemobservation for its single stem's traits without needing a separate sourceaccessioncode for it
- 08:02 AM Revision 3726: schemas/vegbien.sql: aggregateoccurrence: Removed redundant aggregateoccurrence_unique_accessioncode unique constraint, which has been replaced by aggregateoccurrence_unique_within_taxonoccurrence
- 07:43 AM Revision 3725: schemas/vegbien.sql: plantnamescope: Added CHECK constraint to ensure that at least one key column is specified (an empty plantnamescope doesn't make sense; use NULL instead)
- 07:32 AM Revision 3724: schemas/vegbien.ERD.mwb: Synced with schema
- 07:23 AM Revision 3723: ch_root: Don't require both the input and output mappings to contain their respective new roots, since sometimes only one or the other root is being subset. This will occur, for example, in mappings that are flat on the input but normalized on the output, such as VegCSV.
- 07:06 AM Revision 3722: VegBIEN: Reversed aggregateoccurrence<->plantobservation relationship to point from plantobservation->aggregateoccurrence, so plantobservation could be scoped by aggregateoccurrence in the same way as all other core tables are scoped by their parent tables. The previous direction (aggregateoccurrence->plantobservation) was an anomaly due to the need to have a trigger auto-set aggregateoccurrence.count to 1 when there was an associated plantobservation. This was most easily accomplished on the aggregateoccurrence table itself, but required the reversed relationship. The trigger has now been reimplemented on plantobservation, which externally updates aggregateoccurrence.count.
- 06:53 AM Revision 3721: input.Makefile: Testing: diffing test outputs: Ignore changes in whitespace, due to e.g. different indent levels. This facilitates accepting tests when an element has been nested inside another element (or unnested), by showing only the opening and closing tags of the new outer element.
- 06:42 AM Revision 3720: dicts.py: DictProxy: Fixed bug where default value for inner param needed to be created in the constructor, or else every default instance would use and modify the same dictionary
- 06:26 AM Revision 3719: db_xml.py: put(): wrap_e(): Call augment_error() to add the current node to the error message
- 06:14 AM Revision 3718: db_xml.py: put(): Raise an error if there are multiple fields with the same name, instead of silently overwriting the first with the second. This generally indicates the need to use `:[@merge=1]` on the fields in question.
- 06:11 AM Revision 3717: dicts.py: Added OnceOnlyDict and helper exception KeyExistsError
- 06:10 AM Revision 3716: dicts.py: DictProxy: Added default value for inner param to facilitate creating empty wrapped dicts
- 05:48 AM Revision 3715: bin/map: out_is_db: row-based mode: Debug-log the processed XML tree produced by xml_func.process()
- 05:16 AM Revision 3714: sql_io.py: put_table(): Fixed bug where Missing mapping for NOT NULL column errors should actually be warnings because sometimes the mappings include extra tables which aren't used by the dataset
- 05:12 AM Revision 3713: sql_io.py: put_table(): Fixed bug where Missing mapping for NOT NULL column errors should actually be warnings because sometimes the mappings include extra tables which aren't used by the dataset
- 03:18 AM Revision 3712: schemas/vegbien.sql: aggregateoccurrence: Added UNIQUE INDEX that makes an aggregateoccurrence unique within a taxonoccurrence. When the sourceaccessioncode isn't specified (as for individual organisms data, where this goes in plantobservation and taxonoccurrence), this ensures a 1:1 relationship between aggregateoccurrence and taxonoccurrence.
- 03:08 AM Revision 3711: schemas/vegbien.sql: taxonoccurrence: Added UNIQUE INDEX that makes a taxonoccurrence unique within a locationevent. When the sourceaccessioncode isn't specified (as for specimens data), this ensures a 1:1 relationship between taxonoccurrence and locationevent.
- 03:05 AM Revision 3710: mappings/VegX-VegBIEN.stems.csv: binomial (full) plantname: Also mapped to an alternative for taxonoccurrence.sourceaccessioncode, for aggregate plots data that distinguishes taxonoccurrences only by plantname (such as CVS)
- 02:23 AM Revision 3709: exc.py: e_msg(): Fixed bug where exceptions with nothing in e.args (such as StopIteration) caused a failed assertion. Fixed bug where exceptions with multiple values in e.args (such as certain IOErrors) caused a failed assertion.
- 01:27 AM Revision 3708: sql.py: flatten(): Documented that shouldn't cache query because the temp table will usually be truncated after use
- 01:05 AM Revision 3707: sql_gen.py: merge_not_null(): For clarity, use to_text() to represent NULL as the string 'NULL' instead of as the null sentinel for the column's type
- 01:02 AM Revision 3706: sql_gen.py: Added to_text() and helper value null_as_str
- 12:52 AM Revision 3705: mappings/VegX-VegBIEN.stems.csv: plantobservation: sourceaccessioncode, authorplantcode: Removed no longer needed mapping to specimenreplicate.sourceaccessioncode, since specimenreplicate for plots data is now identified by its plantobservation fkey, without needing its own sourceaccessioncode
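Note on Revisions 3716 and 3720 (dicts.py: DictProxy): the bug fixed in 3720 is the usual Python pitfall where a mutable default argument is evaluated once and then shared by every call. The sketch below shows the pattern of creating the default in the constructor. It is a simplified stand-in, not the project's DictProxy; only the class name and the inner parameter come from the log, the rest is assumed for illustration.

```python
class DictProxy:
    """Hypothetical stand-in: wraps an inner dict and forwards item access."""

    def __init__(self, inner=None):
        # Create a fresh dict per instance. An `inner={}` default would be
        # evaluated once and shared by every instance that omits the argument.
        if inner is None:
            inner = {}
        self.inner = inner

    def __getitem__(self, key):
        return self.inner[key]

    def __setitem__(self, key, value):
        self.inner[key] = value


if __name__ == "__main__":
    a = DictProxy()
    b = DictProxy()
    a["x"] = 1
    # b is unaffected because each instance owns its own inner dict.
    print("x" in b.inner)
```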
07/31/2012
- 10:41 PM Revision 3704: sql_io.py: put_table(): ignore_cond(): Fixed bug where if is_literals, need to return NULL, instead of trying to filter invalid rows out of a nonexistent input table
- 09:57 PM Revision 3703: mappings/VegX-VegBIEN.stems.csv: Replaced "/}" (with unnecessary "/") with "}"
- 09:51 PM Revision 3702: mappings/VegX-VegBIEN.stems.csv: Replaced doubled "/"s with single "/"
- 09:05 PM Revision 3701: backups/Makefile: Added synchronization of backups with vegbiendev. Added downloading backups to After a new import steps.
- 09:04 PM Revision 3700: lib/common.Makefile: rsync: $(remote): Fixed bug where the inputs/ dir was hardcoded, when the remote dir name needed to be determined dynamically based on the Makefile dir
- 08:54 PM Revision 3699: backups/Makefile: Refactored to include lib/common.Makefile
- 08:46 PM Revision 3698: inputs/Makefile: Added download-logs to download import logs onto local machine and added it to the "After a new import" steps
- 08:36 PM Revision 3697: Moved generally useful targets and vars from inputs/Makefile to lib/common.Makefile and lib/forwarding.Makefile
- 08:04 PM Revision 3696: bin/map: Don't create unneeded /_ignore/inLabel element containing the datasource name because sql_io.put_table() now autopopulates the datasource_id
- 07:57 PM Revision 3695: schemas/functions.sql, py_functions.sql: Removed no longer needed relational functions, since sql_io.put_table() supports regular SQL functions
07/30/2012
- 08:31 PM Revision 3694: inputs/Madidi/maps/VegX.plots.csv: Mapped all mappable columns
- 08:28 PM Revision 3693: mappings/VegX-VegBIEN.stems.csv: elevation, elevationrange: Added _rangeStart/_rangeEnd filter
- 08:19 PM Revision 3692: sql_io.py: Wrapping mapping in a sql_gen.ColDict: Documented that sql_gen.ColDict sanitizes both keys and values passed into it
- 08:18 PM Revision 3691: sql_gen.py: ColDict: Documented that anything that isn't a column is wrapped in a NamedCol
- 08:04 PM Revision 3690: README.TXT: Datasource setup: Accepting the test cases: Added instructions for what to do if you get errors
- 06:09 PM Revision 3689: bin/map: Fixed bug where needed to use sql.function_exists() to determine if something is a relational (now SQL) function, including in row-based mode, since that now uses sql_io.put_table(), which requires this. The bug fix relies on the new xml_func.process() feature that preserves unknown relational functions in case they are built-in functions rather than SQL functions.
- 06:04 PM Revision 3688: xml_func.py: process(): In row-based mode, when trying to evaluate function using DB, preserve unknown funcs because these might be built-in functions of db_xml.put(). The sql.DoesNotExistException should be raised again when db_xml.put() is run and it verifies whether the function is built-in or not (e.g. _simplifyPath is now built-in, for column-based support). See db_xml.put_special_funcs for built-in functions.
- 05:59 PM Revision 3687: db_xml.py: put(): Fixed bug where strings starting with "$" were interpreted as input columns in row-based mode (this should only apply to column-based mode). Explicitly store whether in row-based mode in is_literals var (similar to is_literals in sql_io.put_table()).
- 05:54 PM Revision 3686: sql_io.py: put_table(): unrecoverable errors: Returning default value: is_literals: Remove column rename from default value so it doesn't get treated as a column by db_xml.put() (which is handled differently from a literal value)
- 03:53 PM Revision 3685: db_xml.py: put(): put_(): Removed no longer needed in_row_ct_ref param, which is only used by put_table(). Rewrapped function body.
- 03:46 PM Revision 3684: sql_io.py: put_table(): ignore(): literals: Only replace invalid literal with NULL or remove row if that column actually contains the invalid value in question. This handles the case where all columns are being ignore()d because the specific column couldn't be identified, and this was not the invalid column.
- 03:02 PM Revision 3683: mappings/VegX-VegBIEN.stems.csv: plot: Mapped note
- 02:32 PM Revision 3682: mappings/VegX-VegBIEN.stems.csv: plot: Added landform mapping
- 02:24 PM Revision 3681: schemas/vegbank.ERD.pdf: Auto-repaired with Adobe Reader so that the repair message doesn't pop up whenever it's opened
- 02:22 PM Revision 3680: schemas: Added vegbank.ERD.pdf so the VegBank ERD is easily accessible when mapping
- 01:51 PM Revision 3679: mappings/VegX-VegBIEN.stems.csv: project: Mapped sourceaccessioncode. This entailed adding a distinguishing suffix to the projectname input mapping.
- 01:31 PM Revision 3678: mappings/DwC2-VegBIEN.specimens.csv, VegX-VegBIEN.stems.csv: Removed all manual mappings to datasource_id now that datasource_id is auto-populated, both on the VegBIEN output side and the DwC/VegX input side. This should greatly simplify many of the mappings!
- 12:11 PM Revision 3677: db_xml.py: put(): Don't suppress exceptions thrown by sql_io.put_table() by passing them to on_error(), because some exceptions indicate unrecoverable database connection problems such as a broken connection, which should abort the import
- 11:52 AM Revision 3676: db_xml.py: put(): Support datasets with no rows, where root.firstChild == None. Documented that to use an entire XML document, you need to pass root.firstChild rather than root.
- 11:31 AM Revision 3675: inputs/import.stats.xls: Updated with stats from latest import. Note that the import now includes CVS.
- 11:23 AM Revision 3674: README.TXT: Documented that the PostgreSQL server should be restarted after installing system updates that may affect it, to avoid spurious errors that crash the import but go away upon reimport
07/27/2012
- 11:12 PM Revision 3673: Regenerated vegbien.ERD exports
- 11:10 PM Revision 3672: schemas/vegbien.ERD.mwb: Fixed lines
- 11:08 PM Revision 3671: schemas/vegbien.ERD.mwb: Synced with schema
- 10:51 PM Revision 3670: bin/map: Call sys.stdout.flush() after every call to sys.stdout.write() to avoid interleaved stdout/stderr output due to stdout buffering
- 10:48 PM Revision 3669: bin/map: Call sys.stdout.flush() after every call to sys.stdout.write() to avoid interleaved stdout/stderr output due to stdout buffering
- 10:13 PM Revision 3668: schemas/vegbien.sql: *_unique_datasource UNIQUE INDEXes: Removed COALESCE() from datasource_id and datasource_id IS NOT NULL filter, because datasource_id is now always NOT NULL
- 10:07 PM Revision 3667: schemas/filter_ERD.csv: Removed AUTO_INCREMENT because that is not added to any other tables
- 10:05 PM Revision 3666: Regenerated schemas/vegbien.my.sql
- 10:04 PM Revision 3665: schemas/vegbien.sql: specimenreplicate: Inherit datasource_id from taxonoccurrence instead of defining it independently
- 09:56 PM Revision 3664: xml_func.py: Removed no longer needed local XML functions that have been translated to SQL functions
- 09:52 PM Revision 3663: input.Makefile: Testing: Removed VegBIEN.%.xml test because the import.%.xml test output includes the template tree that it's inserting, so there is no need to generate the XML tree in a separate test. This will also remove the need to maintain local XML functions that have already been translated to DB functions for the sole purpose of this automated test.
- 09:40 PM Revision 3662: schemas/vegbien.sql: Made datasource_id required on every table that has it, to trigger the automatic population of it by sql_io.put_table()'s col_defaults
- 09:38 PM Revision 3661: Moved importing of col_defaults from db_xml.put_table() to bin/map, so that it also happens in row-based mode. Note that this causes a DB entry for the datasource to always be created, even if the datasource has no mappings or no rows.
- 09:13 PM Revision 3660: Use new exc.reraise() where exc.raise_() was used, so that the stack trace is preserved when the exception is rethrown
- 09:11 PM Revision 3659: exc.py: reraise(): Take optional exception argument so it can be invoked in the same way as raise_(). Interestingly, this missing parameter does not produce the usual "...() takes no arguments (1 given)" error when the function is called inside an except block.
- 09:04 PM Revision 3658: exc.py: Added reraise()
- 09:02 PM Revision 3657: db_xml.py: put(): Inserting node: Wrap sql_io.put_table() call in catch-all exception handler that calls on_error_() (wrapper for error handler provided by caller) and returns None. This both adds additional debugging info to the exception (in on_error_()) and allows recovery from arbitrary exceptions that happen in sql_io.put_table(), so that an exception does not abort the import.
- 08:50 PM Revision 3656: exc.py: get_e_tracebacks_str(): Use the current system traceback if the exception doesn't contain its own traceback(s)
- 08:35 PM Revision 3655: schemas/vegbien.sql: specimenreplicate: Added locationevent fkey, since fkeys are not inherited from parent tables
- 08:30 PM Revision 3654: schemas/vegbien.sql: Added datasource_id fkey constraints to all tables that needed it
- 08:21 PM Revision 3653: bin/map: out_is_db: Use col_defaults in row-based mode as well
- 08:02 PM Revision 3652: db_xml.py: Renamed put_table_special_funcs to put_special_funcs because it is now used by put() as well
- 08:00 PM Revision 3651: db_xml.py: Moved put() before the functions that use it
- 07:58 PM Revision 3650: db_xml.py: Renamed _put_table_part() to put(), replacing the existing put() whose functionality it now performs
- 07:52 PM Revision 3649: db_xml.py: _put_table_part(): Reordered params to match put(), so that it can eventually be substituted for it
- 07:44 PM Revision 3648: db_xml.py: _put_table_part(): Allow being invoked directly by adding defaults for parameters
- 07:41 PM Revision 3647: db_xml.py: put(): Use _put_table_part(). This will ensure that all the put-related functionality is in one place, rather than duplicated.
- 07:30 PM Revision 3646: db_xml.py: _put_table_part(): Append the node to errors handled with on_error()
- 07:29 PM Revision 3645: sql_io.py: Added own SyntaxError class to replace built-in SyntaxError because it stringifies to only the first line
- 06:46 PM Revision 3644: input.Makefile: Testing: Removed $(via).%.xml tests because they require the via format (DwC/VegX) to be XML, but we want to flatten VegX into a DwC-like set of CSV column names
- 06:45 PM Revision 3643: Removed inputs/NY/test/VegX.specimens.xml.ref because NY is not mapped via VegX
- 06:31 PM Revision 3642: input.Makefile: Testing: Renamed import.*.out tests to end in .xml because they now contain XML import trees for validation, and this extension turns on XML syntax highlighting in a text editor
- 06:03 PM Revision 3641: bin/map: out_is_db: Output the put template to stdout so it will be validated in the automated testing
- 05:41 PM Revision 3640: xml_func.py: process(): If local XML function can't be found, just replace with last param instead of returning an error. This allows DB-only functions to be ignored in XML output mode.
- 05:32 PM Revision 3639: sql_gen.py: ColDict.__setitem__(): Fixed bug where None value should not be replaced with column default value if column has no underlying table
- 05:27 PM Revision 3638: sql.py: DbConn.col_info(): If column does not exist, raise sql_gen.NoUnderlyingTableException
- 04:58 PM Revision 3637: sql_io.py: put_table(): In log messages, use `.to_str(db)` instead of repr() where possible to use the SQL syntax of the DB driver
- 04:51 PM Revision 3636: sql_io.py: put_table(): ignore(): Replacing invalid value with NULL in nullable column: Corrected log message to "Replacing invalid value ... with NULL in column ..." because the rows with that value are not ignored in that case
- 04:47 PM Revision 3635: sql.py: run_query(): InvalidValueException: Parse any exception ending in "out of range", not just "field value out of range", in order to support errors that the timezone is out of range
- 04:35 PM Revision 3634: schemas/py_functions.sql: _dateRange*(): Made functions STRICT because they return NULL on NULL input
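Note on Revisions 3669 and 3670 (bin/map: stdout flushing): when stdout is buffered and stderr is not, log lines written to the two streams can appear out of order in a combined log. Flushing after each write keeps them in sequence. The helper below is a minimal sketch of that approach; the function name and the optional stream parameter are assumptions for the example and are not taken from bin/map.

```python
import sys


def write_flushed(text, stream=None):
    """Write text to stream (default: sys.stdout) and flush immediately."""
    if stream is None:
        stream = sys.stdout  # looked up at call time so redirection is honored
    stream.write(text)
    stream.flush()


if __name__ == "__main__":
    write_flushed("row 1 imported\n")
    # With stdout flushed above, this stderr line cannot appear before it
    # in a combined log.
    sys.stderr.write("warning: example message\n")
```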
07/26/2012
- 09:53 PM Revision 3633: sql_io.py: put(): Use a simple case of put_table(), which now supports everything put() needs. This will enable all row-based and column-based processing to be maintained in the same function, put_table(), and avoids the need to reimplement any column-based functionality (like SQL functions) in put().
- 09:51 PM Revision 3632: xml_dom.py: NodeTextEntryIter: Allow empty values through as None, and instead filter them out in TextEntryOnlyIter using new helper function non_empty(). This allows XML functions to decide for themselves whether empty values should be filtered out, because process() will now no longer automatically remove them. This will enable process() to work with SQL functions, which *must not* have empty values filtered out because this will remove required, but nullable, arguments.
- 09:45 PM Revision 3631: xml_func.py: Use conv_items() in every XML function that needs empty (NULL) entries removed, so that they are not dependent on what process() does to the items
- 09:43 PM Revision 3630: sql_io.py: put_table(): ignore(): Support invalid literals in addition to invalid column values. This also allows put_table() to fully support being called by put().
- 08:55 PM Revision 3629: xml_func.py: process(): In row-based mode, if function is not explicitly a relational function but does not exist as a local XML function, treat it as a relational function. This will help in merging sql_io.put() and put_table(), since put() did not support SQL functions but put_table() does, and this ensures that a SQL function is always used if the local XML function has been removed in favor of it.
- 08:37 PM Revision 3628: sql_io.py: put_table(): Removed into param to set a custom into table name because put_table() now has all the info it needs to generate this name automatically, and callers are no longer providing it
- 07:56 PM Revision 3627: bin/map: by_col: db_xml.put_table() call: Use new col_defaults param to automatically set datasource_id to the in_label (datasource name)
- 07:46 PM Revision 3626: xpath.py: path2xml(): Skip to tree created inside root, since that is how callers want to use the returned node
- 07:45 PM Revision 3625: db_xml.py: put_table(): Import col_defaults to translate nodes to pkeys
- 07:44 PM Revision 3624: db_xml.py: _put_table_part(): Support no in_table, for iterations with only literal values
- 07:27 PM Revision 3623: sql_io.py: put_table(): is_literals: When ignoring all rows, return default value instead of always None
- 06:35 PM Revision 3622: db_xml.py: put_table(): Removed parent_ids_loc and next params since these are only used in the recursion
- 06:17 PM Revision 3621: db_xml.py: put_table(): Split into an outer function that sets up the database environment and subsets in_table, and a (recursive) inner function that imports the data
- 05:55 PM Revision 3620: db_xml.py: put_table(): Subsetting and partitioning in_table: Documented that it's OK to do this even if the table is already the right size because it takes <1 sec
- 05:43 PM Revision 3619: sql_io.py: put_table(): Use is_function where caller-provided is_func was used, since is_function determines whether something is a function based on whether it actually exists as a SQL function instead of just whether its name starts with "_". Removed now-unneeded is_func param.
- 05:36 PM Revision 3618: sql_io.py: put_table(): Added col_defaults param and use it if there's a missing mapping for a NOT NULL column. This requires callers passing arguments by position to add an empty value for this parameter.
- 04:48 PM Revision 3617: bin/map: by_col: Only clear errors table if doing full re-import starting from row 0, not if restarting import at a later row
- 04:47 PM Revision 3616: input.Makefile: Import to VegBIEN: Fixed bug where `&>>` was used to append stdout and stderr to the log file, but is not supported on Mac OS X. Replaced with `&>` (overwrite instead of append) because log file is unique by date/time the import runs, so there won't be an existing log file that would be overwritten.
- 04:34 PM Revision 3615: schemas/vegbien.sql: Added datasource_id to all tables with a sourceaccessioncode (and corresponding *_unique_datasource constraint on these columns) so they can be directly looked up using just the input table's own fkey to parent. This will enable loading hierarchical (plots) data without "breadcrumbs", a huge benefit! Also added sourceaccessioncode wherever there was a datasource_id, to standardize on these names as being the columns that link directly to the input table rows.
- 01:15 PM Revision 3614: README.TXT: Datasource setup: Installing the staging tables: View the logs: Fixed bug in tail syntax to also work on Linux
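Illustration for Revisions 3631 and 3632 above: a minimal sketch of why empty values are now passed through as None and filtered only by the functions that want them removed. This is not the code in xml_dom.py or xml_func.py. The name non_empty() comes from the log, but its signature here is an assumption, and join_words(), call_args() and the sample entries are hypothetical.

```python
def non_empty(items):
    """Return only the (name, value) pairs whose value is not None."""
    return [(name, value) for name, value in items if value is not None]


def join_words(items):
    """XML-style function: empty entries carry no meaning, so drop them."""
    return ' '.join(value for _, value in non_empty(items))


def call_args(items):
    """SQL-style function call: every argument is kept, even if NULL."""
    return dict(items)


# Entries as they might arrive once empty values are passed through as None
entries = [('genus', 'Quercus'), ('authority', None), ('species', 'alba')]
print(join_words(entries))   # the None entry is filtered out
print(call_args(entries))    # the None entry is kept as a NULL argument
```

Filtering inside each function, rather than once up front, is what lets a required but nullable argument reach a SQL function.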
07/25/2012
- 11:04 PM Revision 3613: Added inputs/Madidi/ with empty mappings
- 11:01 PM Revision 3612: README.TXT: Datasource setup: Populating the src/ subdir with input data: Added step to make sure each header in multiple part files for a table is EXACTLY the same
- 10:56 PM Revision 3611: README.TXT: Datasource setup: Installing the staging tables: Added steps to deal with colliding column names in the flat file headers. Added command to view the logs.
- 10:53 PM Revision 3610: csv2db: log(): sys.stderr.write(): Run strings.to_raw_str() on message to handle Unicode chars
- 10:52 PM Revision 3609: csv2db: Run strings.to_unicode() on column names to handle Unicode chars
- 10:36 PM Revision 3608: csv2db: esc_name(): Use db.esc_name()
- 09:25 PM Revision 3607: Added inputs/BIEN2.datasources.xlsx (formerly bien_data_sources.xlsx in nimoy:/home/bien/raw_data/)
- 09:06 PM Revision 3606: exc.py: e_msg(): Added assertions to check that e.args is compatible with this function
- 08:59 PM Revision 3605: exc.py: Use new e_str() where its definition was used
- 08:54 PM Revision 3604: exc.py: Use new Unicode-safe e_msg() instead of strings.ustr() on exceptions
- 08:47 PM Revision 3603: exc.py: e_msg(): Run strings.ustr() on the returned string so it will be appendable to other Unicode strings
- 08:43 PM Revision 3602: exc.py: Added e_msg(), e_str() (from SQL py_functions._date())
- 02:06 PM Revision 3601: db_xml.py: put_table(): Adding fkey to parent: Fixed bug where should only add parent_ids_loc table to list of tables not to truncate if it's a column, because it is sometimes just a pkey value when that iteration contained only literals
- 01:56 PM Revision 3600: inputs/import.stats.xls: Updated with stats from latest import
- 01:42 PM Revision 3599: inputs/import.stats.xls: Corrected date of last import
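Illustration for Revision 3612 above: one way the "headers must be EXACTLY the same" check could be automated for multi-part files. This is a sketch, not part of the repository. The function names, the name-to-stream layout and the sample file names are all hypothetical.

```python
import csv
import io


def read_header(stream):
    """Return the first CSV row of a text stream, or [] if it is empty."""
    return next(csv.reader(stream), [])


def mismatched_parts(parts):
    """Return names of parts whose header differs from the first part's.

    parts maps a part name to an open text stream (hypothetical layout).
    The comparison is exact: case, spacing and column order all matter.
    """
    names = sorted(parts)
    if not names:
        return []
    reference = read_header(parts[names[0]])
    return [name for name in names[1:]
            if read_header(parts[name]) != reference]


parts = {
    'plots.1.csv': io.StringIO('plot,species,count\nA,Quercus alba,3\n'),
    'plots.2.csv': io.StringIO('plot,species,count\nB,Acer rubrum,1\n'),
    'plots.3.csv': io.StringIO('plot,Species,count\nC,Pinus taeda,2\n'),
}
print(mismatched_parts(parts))
```

In this sample the third part differs only in the capitalization of one column name, which is enough to count as a mismatch.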
07/24/2012
- 09:52 AM Revision 3598: sql_gen.py: plpythonu_error_handler: Fixed bug where PL/Python exceptions could not be filtered by strings after the first line, because only the "message" portion of the exception is available in SQLERRM
- 09:35 AM Revision 3597: schemas/py_functions.sql: _date(): YMD parsing: Fixed bug where exception for ValueError needed to be stored in local var so its message could be parsed
- 09:33 AM Revision 3596: sql_gen.py: plpythonu_error_handler: Always raise PL/Python exceptions as data_exception so they go in the errors table, instead of aborting the iteration
- 09:16 AM Revision 3595: sql_gen.py: plpythonu_error_handler: Fixed bug where not all PL/Python exceptions start with "PL/Python: " (e.g. on PostgreSQL 9.1 on vegbiendev), so the PL/Python prefix must be optional. Refactored to put IF clause for non-PL/Python exception at end for a more logical ordering of the conditions.
- 08:41 AM Revision 3594: Added inputs/CVS/
- 08:40 AM Revision 3593: README.TXT: Datasource setup: Added steps to place the relevant files under version control
- 08:31 AM Revision 3592: README.TXT: Datasource setup: Accepting the test cases: Don't auto-accept the initial tests because there could be bugs in the initial mappings that would be revealed upon inspecting the test output
- 08:14 AM Revision 3591: sql_gen.py: plpythonu_error_handler: Added section comment before handler block, so that it's clear in the (very long) wrapper function definition what the block is doing
- 07:59 AM Revision 3590: input.Makefile: Documentation: import/steps.by_col.sql: Added -s to make to avoid echoing make commands to the log file
- 07:46 AM Revision 3589: README.TXT: Moved Reinstall all datasources at once to Schema changes and renamed it to Reinstall staging tables to reflect that it is only necessary when the staging table format is changed
- 07:43 AM Revision 3588: README.TXT: Datasource setup: Updating vegbiendev: Added step to also install the staging tables on vegbiendev
- 07:42 AM Revision 3587: README.TXT: Datasource setup: Moved Install the staging tables before Map each table's columns because the install can run in the background while you're mapping. It must, however, come after Auto-create the map spreadsheets because it uses the filenames of the created maps to determine which staging tables to create.
- 07:40 AM Revision 3586: README.TXT: Datasource setup: Adding a new datasource: Changed <short_name> to <name> to match usage elsewhere. Documented that it may not contain spaces, and should be abbreviated.
- 07:33 AM Revision 3585: README.TXT: Datasource setup: Added steps to update vegbiendev
- 07:31 AM Revision 3584: inputs/Makefile: Input data: Added upload target
- 07:21 AM Revision 3583: README.TXT: Datasource setup: Added steps to accept the test cases and commit
- 07:18 AM Revision 3582: README.TXT: Datasource setup: Added step to install the staging tables
- 07:18 AM Revision 3581: bin/map: in_is_xml: doc2rows(): "Root not found in input" warning: Changed "error" to "warning" to match the type of error condition signaled
- 07:15 AM Revision 3580: bin/map: map_rows(): out_is_db: Changed `id_node != None` assertion to a warning because this is a normal circumstance in the base case where there are no mappings
- 07:13 AM Revision 3579: input.Makefile: Testing: Added test/accept-all
- 07:11 AM Revision 3578: csv2db: COPY FROM: Fixed %-injection bug where column names' %s were not escaped prior to cursor.mogrify(), by changing the code to use inline db.esc_value() instead
- 06:37 AM Revision 3577: bin/map: in_is_xml: doc2rows(): "Root not found in input" error: Changed SystemExit to a warning because this is a normal circumstance in the base case where the input XML file contains no rows
- 06:12 AM Revision 3576: README.TXT: Datasource setup: Documented how to map each table's columns
- 05:57 AM Revision 3575: README.TXT: Datasource setup: Changed "Auto-create the src column spreadsheets" to "Auto-create map spreadsheets" and updated command to bootstrap all maps, including newly-autogeneratable via maps
- 05:50 AM Revision 3574: input.Makefile: Maps building: maps/$(via).%.csv: Auto-create by copying the src map if it doesn't exist. Existing maps discovery: Look up via format in src maps' roots if no via map already exists.
- 05:46 AM Revision 3573: src_map: Fixed bug where non-header rows needed to be materialized with empty fields for each column in the header
- 04:27 AM Revision 3572: input.Makefile: Maps building: Via maps cleanup: Match maps/$(via).%.csv with pattern instead of $(viaMaps) var so that a non-existing via map will have the recipe run, too. When auto-creating via maps is later added, this will be required.
- 04:07 AM Revision 3571: inputs/*/maps/src.*.csv: Regenerated using new src_map output format
- 04:06 AM Revision 3570: parallelproc.py: MultiProducerPool: Removed warning if not using parallel processing because this also gets generated when it's explicitly turned off, which is currently the case and clutters up stderr when testing
- 03:57 AM Revision 3569: src_map: Also add columns for the output mappings and comments, so that the src map can be directly copied for use as the via map (DwC.specimens.csv, etc.). The output mapping column name must be provided by the caller, which input.Makefile maps/src.%.csv provides using the new mappings roots.
- 03:52 AM Revision 3568: Added mappings/roots for use in creating src maps
- 03:41 AM Revision 3567: input.Makefile: Maps building: maps/src.%.csv: Clean up by passing through `$(bin)/cols '*'` whenever it's changed. This ensures that the CSV dialect is always consistently Python's Excel dialect. (Note that this dialect actually uses \r\n as the line ending. The \n line endings were from src maps generated by a previous version of bin/src_map.)
- 03:28 AM Revision 3566: input.Makefile: Maps building: maps/$(via).%.full.csv: Removed alternate rule when $(srcMap) doesn't exist, because this effect is actually achieved by the no-prereqs rule for maps/src.%.csv, which causes make to think it exists when matching pattern rules even if its recipe doesn't actually create it
- 03:23 AM Revision 3565: input.Makefile: Maps building: maps/$(via).%.full.csv: Added alternate rule when $(srcMap) doesn't exist
- 03:21 AM Revision 3564: inputs/CTFS/maps/: Removed unneeded src.organisms.csv since there is a way to deal with it not existing in input.Makefile
- 03:18 AM Revision 3563: inputs/CTFS/maps/: Removed unneeded .VegX.plots.csv.last_cleanup
- 02:13 AM Revision 3562: inputs/*/maps/src.*.csv: Standardized line endings to \n
- 01:56 AM Revision 3561: input.Makefile: Maps building: maps/$(via).%.full.csv: Added the src map as a prerequisite so it would be rebuilt when the src map changes. This is possible now that every datasource has at least an empty src map. (An empty src map is now treated the same way as a non-existing one.)
- 01:52 AM Revision 3560: inputs/*/maps/src.*.csv: Removed extraneous quotes around fields, which are added by Excel but not by Python
- 01:49 AM Revision 3559: inputs/*/maps/src.*.csv: Removed extraneous quotes around fields, which are added by Excel but not by Python
- 01:41 AM Revision 3558: inputs/CTFS: Added empty maps/src.organisms.csv so that every table of every datasource has a src map
- 12:18 AM Revision 3557: README.TXT: Datasource setup: Documented how to populate the src/ subdir with input data
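Illustration for Revision 3578 above: a sketch of the general %-injection problem when a column name containing % ends up in a statement that is later passed through %-style parameter substitution. Plain Python string formatting stands in for the driver here, and the column name, file path and helper names are hypothetical. Note that the revision itself fixed the bug differently, by inlining values escaped with db.esc_value(); doubling the % is shown only as the generic alternative.

```python
def escape_percent(fragment):
    """Double every % so %-style parameter substitution leaves it alone."""
    return fragment.replace('%', '%%')


def substitute(template, params):
    """Stand-in for a driver's %-style substitution (plain Python formatting)."""
    return template % params


col = '"pct%"'                          # hypothetical column name containing %
params = {'file': "'/tmp/in.csv'"}      # hypothetical, already-quoted value

unsafe = 'COPY t (' + col + ') FROM %(file)s'
safe = 'COPY t (' + escape_percent(col) + ') FROM %(file)s'

try:
    substitute(unsafe, params)
except (TypeError, ValueError) as e:
    print('unescaped column name breaks substitution:', e)

print(substitute(safe, params))
```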
07/23/2012
- 10:52 PM Revision 3556: Added inputs/CVS/
- 10:28 PM Revision 3555: sql_gen.py: plpythonu_error_handler: Translate specific Python exception types to PostgreSQL error codes (ValueError -> data_exception) instead of assuming everything is a data_exception. When removing the PL/Python prefix, preserve the Python exception class in a DETAIL message. Support non-PL/Python internal_errors by re-raising them.
- 10:25 PM Revision 3554: sql_gen.py: Added reraise_exc
- 10:21 PM Revision 3553: schemas/py_functions.sql: _date(): Raise (or pass through) ValueErrors directly instead of wrapping them in FormatExceptions, to simplify the code. This will also enable later translation of ValueErrors to data_exceptions. When year is required and missing, output a parsable 'null value in column year violates not-null constraint' error.
- 09:48 PM Revision 3552: sql_io.py: put_table(): log_exc(): Handle infinite loops from repeated exceptions by removing all rows, instead of just aborting with a failed assertion
- 09:36 PM Revision 3551: sql_io.py: put_table(): is_function: Fixed bug where special case for unrecoverable errors needed to avoid creating an empty output pkeys table because function mode defines the returned pkeys table separately
- 09:08 PM Revision 3550: sql_io.py: put_table(): is_function: Factored defining the error handling wrapper function out of the main loop because it only needs to run once. Don't log "Trying to insert new rows" in function mode because it's inaccurate.
- 07:14 PM Revision 3549: sql_gen.py: Exceptions: Added suppress_exc and use it in ExcHandler.to_str()
- 06:53 PM Revision 3548: README.TXT: Backups: After a new import: Added step to delete previous imports so they won't bloat the full DB backup. (Note that these imports have already been backed up, and only the most recent import needs to be live in the DB.)
- 06:48 PM Revision 3547: README.TXT: Backups: Documented what to do after a new import
- 06:39 PM Revision 3546: backups/Makefile: Full DB: Added vegbien.backup/all to run both test and rotate
- 06:24 PM Revision 3545: README.TXT: Renamed Maintenance section to Backups for clarity
- 06:19 PM Revision 3544: backups/Makefile: %.sql: When testing, turn it off so make won't skip `%.sql: %` in favor of it
- 06:07 PM Revision 3543: backups/Makefile: Split %.backup and %.sql into separate targets for clarity
- 05:56 PM Revision 3542: inputs/import.stats.xls: Updated with stats from latest import. Note that this import adds data provider feedback for SQL functions as well as additional date processing using _date().
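Illustration for Revision 3555 above: the actual handler is generated PL/pgSQL, so the following Python is only a sketch of the translation logic it describes. The message layout ("PL/Python: " prefix, then the exception class, then the text) is an assumption based on the log entries, and translate() and CONDITION_BY_EXC are hypothetical names. data_exception and internal_error are standard PostgreSQL condition names.

```python
import re

# Assumed mapping from Python exception class to PostgreSQL condition name
CONDITION_BY_EXC = {'ValueError': 'data_exception'}

# The "PL/Python: " prefix is treated as optional (see Revision 3595)
PLPYTHON_MSG = re.compile(r'^(?:PL/Python: )?(\w+): (.*)$', re.DOTALL)


def translate(sqlerrm):
    """Return (condition, message, detail) for an error message string."""
    match = PLPYTHON_MSG.match(sqlerrm)
    if match is None or match.group(1) not in CONDITION_BY_EXC:
        # Not a recognized Python exception: pass through as internal_error
        return ('internal_error', sqlerrm, None)
    exc_class, message = match.groups()
    # Keep the Python exception class so it can go in a DETAIL message
    return (CONDITION_BY_EXC[exc_class], message, exc_class)


print(translate('PL/Python: ValueError: invalid date'))
print(translate('division by zero'))
```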
07/20/2012
- 07:10 AM Revision 3541: schemas/py_functions.sql: _date(): Re-enabled now that exceptions thrown are properly handled. FormatException: Support raising parsable data_exceptions when provided with the value that was invalid. Date parsing mode: Return date as the value in FormatException so it can be filtered out automatically by column-based import.
- 07:06 AM Revision 3540: sql_io.py: put_table(): is_function: Creating error handling wrapper function: Fixed bug where needed to cast NULL returned in error handler to appropriate type, because it's contained within a SELECT query which does not do implicit casts from type unknown
- 07:03 AM Revision 3539: sql_gen.py: Cast: Support types which are Code objects
- 06:05 AM Revision 3538: sql_io.py: func_wrapper_exception_handler(): Use new sql_gen.merge_not_null() to try to ensure that NULL values are not folded (which would cause the concatenated values not to match up with the concatenated column names). Note that this adds a dependency on the db object, which callers must now provide.
- 06:03 AM Revision 3537: sql_gen.py: Added merge_not_null()
- 06:03 AM Revision 3536: sql_gen.py: Added try_mk_not_null()
- 05:54 AM Revision 3535: sql_gen.py: Renamed ArrayJoin to ArrayMerge to avoid confusion with Join (a SQL construct)
- 05:46 AM Revision 3534: sql_io.py: put_table(): is_function: Creating error handling wrapper function: Set srcs on row_var so that the column type and nullability info of row_var's columns can be retrieved for use with sql_gen.ensure_not_null()
- 05:38 AM Revision 3533: sql_gen.py: RowExcIgnore.to_str(): Compare self.row_var to global const row_var using == to allow caller to provide a copy of row_var with the underlying table set appropriately
- 05:35 AM Revision 3532: sql_gen.py: underlying_table(): Support derived tables and row vars by obtaining the underlying table from the srcs
- 05:25 AM Revision 3531: sql_io.py: put_table(): Setting pkeys of missing rows: Fixed bug where also needed to do this when is_function if an empty pkeys table was created (due to an error that could not be localized to a row)
- 05:16 AM Revision 3530: sql_io.py: put_table(): After main loop: If is_literals, return immediately to avoid needing to test for is_literals in all the code that follows (which only applies to the normal case)
- 04:43 AM Revision 3529: sql_gen.py: RowExcIgnore: If a custom row_var is used, require it to already be defined. This also allows sql_io.ExcToErrorsTable to place the column var definition in the outer DECLARE, eliminating the extra DECLARE block.
- 04:30 AM Revision 3528: sql_io.py: put_table(): is_function: Creating error handling wrapper function: Use new sql_gen.row_var
- 04:28 AM Revision 3527: sql_gen.py: RowExcIgnore: Created global constant for default row_var for callers to use
- 04:24 AM Revision 3526: sql_gen.py: RowExcIgnore.to_str(): Moved SQL comment explaining the use of an EXCEPTION block for each individual row to Python code to avoid cluttering the logged SQL code
- 04:19 AM Revision 3525: sql_io.py: put_table(): is_function: Creating error handling wrapper function: Handle errors using new func_wrapper_exception_handler(), which saves any data_exceptions in the errors table in addition to handling PL/Python errors
- 04:13 AM Revision 3524: sql_io.py: Added func_wrapper_exception_handler()
- 04:10 AM Revision 3523: sql_gen.py: Added ArrayJoin
- 04:10 AM Revision 3522: sql_gen.py: Added Array and to_Array()
- 02:47 AM Revision 3521: sql_gen.py: Added List and inherit from it in Tuple
- 02:45 AM Revision 3520: sql_gen.py: Renamed Tuple to Row and List to Tuple to more accurately reflect the datatype generated by each class (a Tuple being merely a grouping of values)
- 02:43 AM Revision 3519: sql_gen.py: Moved Composite types to Literal values section as a subsection, since Composite types was really about just the input syntaxes for these types
- 02:32 AM Revision 3518: sql_gen.py: Replaced srcs_str() with cross_join_srcs() which more correctly combines the srcs of each column using a Cartesian product. Eventually, the entire tree of srcs will need to be preserved instead of flattened in order to properly attribute errors to a specific column or set of columns.
- 02:03 AM Revision 3517: sql_gen.py: srcs_str(): Fixed bug where needed to filter out columns with no srcs so that there aren't empty elements in the ","-separated list
- 02:00 AM Revision 3516: sql_gen.py: Added has_srcs()
- 01:44 AM Revision 3515: sql_gen.py: Added NestedExcHandler
- 01:44 AM Revision 3514: sql_gen.py: Added srcs_str()
- 01:43 AM Revision 3513: sql_gen.py: as_Col(): Support non-Code, non-string inputs by making them Literals
- 01:42 AM Revision 3512: sql_gen.py: Added is_col() and use it in is_table_col()
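Illustration for Revisions 3517 and 3518 above: a sketch of combining each column's srcs with a Cartesian product, skipping columns that have no srcs. This is not the code in sql_gen.py; the input shape (one list of source names per column) and the ","-joined output are assumptions.

```python
import itertools


def cross_join_srcs(cols_srcs):
    """Combine per-column source names using a Cartesian product.

    cols_srcs is a list with one list of source names per column.
    Columns without srcs are skipped so no empty elements are produced.
    """
    with_srcs = [srcs for srcs in cols_srcs if srcs]
    if not with_srcs:
        return []
    return [','.join(combo) for combo in itertools.product(*with_srcs)]


# Two candidate srcs for the first column, none for the second, one for the third
print(cross_join_srcs([['day', 'date'], [], ['year']]))
```

Each output element names one combination of input columns, which is the flattened form the log notes will eventually need to be replaced by the full tree of srcs.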
07/19/2012
- 11:54 PM Revision 3511: sql_io.py: ExcToErrorsTable: Require users to explicitly specify an expression for the value that caused the error, instead of assuming that a variable named "value" already exists. This allows a value expression to be computed only if needed for error handling.
- 11:22 PM Revision 3510: sql_gen.py: Moved __repr__() from ExcHandler to BaseExcHandler
- 11:21 PM Revision 3509: sql_gen.py: Added BaseExcHandler and inherit from it in ExcHandlers
- 10:58 PM Revision 3508: sql_io.py: cast(): Determining if will be saving errors: Don't add extra check if isinstance(col, sql_gen.Col) because the special case for sql_gen.Literal handles supported non-columns
- 10:56 PM Revision 3507: sql_io.py: data_exception_handler(): Removed no longer needed db param
- 10:47 PM Revision 3506: sql_io.py: Added ExcToErrorsTable, which separates out the errors table inserting code from the exception handling code. data_exception_handler(): Refactored to use new sql_gen.data_exception_handler() and ExcToErrorsTable.
- 10:43 PM Revision 3505: sql_gen.py: Added data_exception_handler
- 10:08 PM Revision 3504: sql_io.py: data_exception_handler(): Refactored to use new sql_gen.ExcToWarning when not using an errors table
- 10:03 PM Revision 3503: sql_gen.py: Added ExcToWarning
- 10:02 PM Revision 3502: schemas/vegbien.sql: taxondetermination: taxondetermination_taxonoccurrence_id_fkey(): Fixed bug where string containing a \-escape needed an "E" prefix
- 09:42 PM Revision 3501: sql_io.py: data_exception_handler(): Require the caller to provide a statement to return a default value in case of error, rather than assuming the caller can accept a return value of NULL
- 09:27 PM Revision 3500: sql_io.py: data_exception_handler(): Refactored to use new sql.define_func()
- 09:20 PM Revision 3499: sql_io.py: put_table(): is_function: Calling function on input rows: Convert PL/Python exceptions (internal_errors) to data_exceptions using sql_gen.plpythonu_error_handler and an error handling wrapper function
- 09:10 PM Revision 3498: debug2redmine.csv: EXPLAIN comments: Fixed bug where needed to also match whitespace at beginning of line (indent)
- 09:07 PM Revision 3497: Use sql_gen.ReturnQuery where RETURN QUERY was previously manually prepended
- 09:05 PM Revision 3496: sql_gen.py: Added ReturnQuery
- 08:48 PM Revision 3495: sql.py: define_func(): Fixed bug where next_version() needed to have module name removed since it's in the same module
- 08:47 PM Revision 3494: sql.py: mk_select(): Added explain param to turn off automatically running EXPLAIN on the created query. This is useful for SELECT statements which use local variables in PL/pgSQL functions.
- 08:44 PM Revision 3493: sql_gen.py: with_table(): Only set the table if the passed-in value is a Col or FunctionCall
- 08:41 PM Revision 3492: sql_gen.py: Added Tuple
- 08:41 PM Revision 3491: sql_gen.py: Added List and use it in Values.to_str()
- 08:14 PM Revision 3490: sql.py: Added define_func()
- 07:07 PM Revision 3489: Use sql_gen.SetOf where SETOF was previously manually prepended
- 07:06 PM Revision 3488: sql_gen.py: Added SetOf
- 07:06 PM Revision 3487: sql_gen.py: FunctionDef: Support return_types which are Code objects
- 06:55 PM Revision 3486: Use sql_gen.ColType where %TYPE was previously manually appended
- 06:54 PM Revision 3485: sql_gen.py: Added ColType
- 06:47 PM Revision 3484: Use sql_gen.RowType where %ROWTYPE was previously manually appended
- 06:45 PM Revision 3483: sql_gen.py: Added RowType
- 06:45 PM Revision 3482: sql_gen.py: RowExcIgnore: Accept row types which are Code objects
- 06:42 PM Revision 3481: sql_gen.py: TypedCol: Accept types which are Code objects
- 06:34 PM Revision 3480: sql_io.py: data_exception_handler(): Documented that the invalid value must be in a local variable of type text
- 06:33 PM Revision 3479: sql_io.py: data_exception_handler(): Documented that the invalid value must be in a local variable of type text
- 06:32 PM Revision 3478: sql_io.py: put_table(): is_function: Creating empty pkeys table so its row type can be used: Don't do this if is_literals because special error handling does not apply to that
- 06:13 PM Revision 3477: sql_io.py: put_table(): is_function: Create empty pkeys table before calling function on all rows so its row type can later be used in an error handling wrapper function
- 05:33 PM Revision 3476: input.Makefile: Staging tables: import/install-%: Run csv2db with a nice increment of +5 to avoid interfering with the user's other processes
- 05:28 PM Revision 3475: root map: Run bin/map with a nice increment of +5 to avoid interfering with the user's other processes
- 05:24 PM Revision 3474: sql_io.py: put_table(): Handle psycopg2.extensions.TransactionRollbackError by retrying the last query
- 05:00 PM Revision 3473: sql_io.py: Creating an empty output pkeys table: Assert that there are no join columns, so that the input pkeys table will be created correctly for the empty output pkeys table
- 04:53 PM Revision 3472: sql_io.py: put_table(): Creating an empty output pkeys table: Added "output" to clarify that the created table contains just the output pkeys, and must be joined with the input pkeys table
- 04:39 PM Revision 3471: sql_gen.py: FunctionDef: Renamed args to params
- 04:35 PM Revision 3470: sql_gen.py: FunctionDef: Accept parameters as FunctionParam objects instead of strings
- 04:32 PM Revision 3469: sql_gen.py: Added FunctionParam
- 04:04 PM Revision 3468: sql_gen.py: Added plpythonu_error_handler
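Illustration for Revision 3474 above: a sketch of retrying a query after a transaction rollback error. To stay self-contained it defines a stand-in exception class instead of importing psycopg2, and run_with_retry(), flaky_query() and the max_tries limit are hypothetical; the log does not say whether put_table() bounds its retries.

```python
class TransactionRollbackError(Exception):
    """Stand-in for psycopg2.extensions.TransactionRollbackError."""


def run_with_retry(run_query, max_tries=3):
    """Call run_query(), retrying when the transaction is rolled back."""
    for attempt in range(1, max_tries + 1):
        try:
            return run_query()
        except TransactionRollbackError:
            if attempt == max_tries:
                raise  # give up and let the caller see the error


calls = []


def flaky_query():
    """Hypothetical query that is rolled back on its first two attempts."""
    calls.append(1)
    if len(calls) < 3:
        raise TransactionRollbackError('deadlock detected')
    return 'inserted'


print(run_with_retry(flaky_query), 'after', len(calls), 'tries')
```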
07/18/2012
- 11:06 PM Revision 3467: Autogenerated SQL code: Use new strings.indent() where needed
- 11:05 PM Revision 3466: strings.py: Added indent()
- 10:50 PM Revision 3465: sql_io.py: data_exception_handler(): Refactored to use sql_gen.RowExcIgnore
- 10:31 PM Revision 3464: sql_io.py: cast(): Refactored to use sql_gen.FunctionDef
- 10:28 PM Revision 3463: sql_gen.py: ExcHandler: Removed extra newline after handler
- 10:27 PM Revision 3462: sql.py: mk_insert_select(): ignore: RETURN QUERY statement: Added back missing newline after ';'
- 10:23 PM Revision 3461: sql_gen.py: FunctionDef: Added support for parameters
- 10:03 PM Revision 3460: sql_io.py: cast(): Just use the first word of the type in the function name to help avoid name collisions. Note that type name collisions that may be introduced by this change are not a problem because the function name is versioned. (The caching mechanism prevents versioning when the function has the same name and definition as an already-defined function.)
- 09:59 PM Revision 3459: sql_io.py: Added data_exception_handler() and use it in cast()
- 09:58 PM Revision 3458: sql_gen.py: ExcHandler.to_str(): Removed extra newline after body
- 08:41 PM Revision 3457: sql_gen.py: ExcHandler: Added __repr__() since it's not a Code object
- 08:17 PM Revision 3456: sql_gen.py: FunctionDef: Support custom function modifiers
- 08:04 PM Revision 3455: sql_gen.py: RowExcIgnore: Changed exc param to exc_handler to allow user to specify handler code for the exception
- 08:01 PM Revision 3454: sql_gen.py: Added ExcHandler, unique_violation_handler
- 07:53 PM Revision 3453: sql_gen.py: RowExcIgnore: Don't automatically add 'RETURN QUERY' before the with_row code or ';' after it
- 07:02 PM Revision 3452: sql_gen.py: RowExcIgnore: Allow user to specify a custom row var name
- 06:55 PM Revision 3451: sql_gen.py: FunctionDef: Don't automatically add 'SETOF ' before the return type
- 06:44 PM Revision 3450: sql.py: mk_insert_select(): embeddable: Use new sql_gen.RowExcIgnore
- 06:44 PM Revision 3449: sql_gen.py: Added RowExcIgnore
- 06:12 PM Revision 3448: sql_gen.py: FunctionDef: Determine the lang from the body's Code object instead of receiving it as a parameter
- 06:10 PM Revision 3447: sql_gen.py: as_Code(): Fixed bug where needed to handle inputs that are already Code objects
- 05:52 PM Revision 3446: sql_gen.py: Code: Added lang instance var
- 05:49 PM Revision 3445: sql_gen.py: Fixed bug where Code subclasses needed to call Code.__init__() in their __init__() function. BasicObject: Fixed bug where __init__() expected a value param, when in fact the value param is something added by certain subclasses.
- 05:31 PM Revision 3444: sql_gen.py: FunctionDef: body param: Support Code inputs in addition to strings
- 05:14 PM Revision 3443: sql.py: mk_insert_select(): embeddable: Use new sql_gen.FunctionDef
- 05:13 PM Revision 3442: sql_gen.py: Added FunctionDef
- 03:49 PM Revision 3441: README.TXT: Schema changes: Documented how to reinstall errors tables
- 03:46 PM Revision 3440: csv2db: Creating errors table: Only drop existing errors table in errors_table_only mode, so that errors tables are not unintentionally deleted when `make inputs/install` is run. This helps to make `make install` idempotent.
- 03:40 PM Revision 3439: README.TXT: Maintenance: Full DB: Changed commands to autorotate the created backup and then test and restore a rotated backup
- 03:31 PM Revision 3438: backups/Makefile: Added %.backup/rotate
- 02:58 PM Revision 3437: backups/Makefile: Rearranged sections so that backup targets, which apply to both Archived imports and Full DB, are at the top of a common Backups section
- 02:54 PM Revision 3436: inputs/import.stats.xls: Updated with stats from latest import
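Illustration for Revisions 3466 and 3467 above: a sketch of what an indent() helper for autogenerated SQL might look like. The signature (level and tab parameters) and the choice to leave blank lines untouched are assumptions, not the behavior of strings.indent() in the repository.

```python
def indent(text, level=1, tab='    '):
    """Prefix every non-blank line of text with level copies of tab."""
    prefix = tab * level
    return ''.join(prefix + line if line.strip() else line
                   for line in text.splitlines(True))


body = 'BEGIN\n  RETURN NULL;\nEND;\n'
print(indent(body), end='')
```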
07/17/2012
- 11:09 PM Revision 3435: schemas/py_functions.sql: Disabled _date() because it does not yet output errors in a format parsable by the import process, and the import process does not yet trap errors produced by SQL functions
- 11:00 PM Revision 3434: sql_io.py: put_table(): Determining if can use optimization for only literal values: Fixed bug where needed initial value for reduce()
- 10:52 PM Revision 3433: sql_io.py: put_table(): Needing >= one column for INSERT SELECT: Fixed bug where can't add pkey column if calling a function instead of outputting to a table
- 10:36 PM Revision 3432: sql_io.py: put_table(): Optimization for only literal values: Also support an empty in_tables list, for use by put()
- 10:20 PM Revision 3431: sql_io.py: put_table(): Added optimization for only literal values, which does the same operations as put() but with the additional error handling of put_table()
- 10:17 PM Revision 3430: pg_dump_vegbien: Don't use SET SESSION AUTHORIZATION because it doesn't work with the py_functions schema (it requires PL/Python functions to be created as user postgres and then the owner changed to bien, which SET SESSION AUTHORIZATION won't do)
- 08:53 PM Revision 3429: sql_gen.py: Added is_literal() and use it where isinstance(..., Literal) is used
- 08:44 PM Revision 3428: db_xml.py: put_table(): Divide fields into input columns and literal values: Translate values: Allow literal values other than strings or None (from the XML parsing), because sql_io.put_table() is getting an optimization for iterations containing only literal values, which just returns the pkey of the single row for these values (which is usually an integer) instead of a temp table with the same value in each row
- 08:28 PM Revision 3427: bin/map: by_col: Stripping XML functions not in the DB: Remove DB functions based on whether a plain SQL function of that name exists, rather than whether a relational function (i.e. a table) of that name exists. This will allow column-based import to use plain SQL functions that don't have a corresponding relational function.
- 08:23 PM Revision 3426: db_xml.py: Don't remove any explicit pkey because the output table may be a SQL function, which does not have a pkey. This feature only existed to support importing VegBank XML exports, which we don't use (and which would be incompatible with the schema anyway).
- 08:19 PM Revision 3425: sql.py: function_exists(): Fixed bug where select() needed to be run with auto-rollback in case it raised an exception
- 08:08 PM Revision 3424: xml_func.py: process(): Changed rel_funcs param to a callback is_rel_func, so that caller can specify any dynamic function to determine if a name is a relational function rather than having to list out all known relational functions
- 07:54 PM Revision 3423: sql.py: function_exists(): Use simpler cast to regproc instead of query of information_schema.routines to determine if function exists. When the schema is not specified, this also limits the schemas checked to the search_path instead of the whole DB.
- 07:51 PM Revision 3422: schemas/functions.sql, py_functions.sql: Renamed trigger functions to avoid collisions with plain SQL functions of the same name but different signatures, so that the plain SQL functions can be uniquely identified by their name without also requiring their signature
- 07:39 PM Revision 3421: sql.py: mk_select(): In queries without a FROM clause, don't order by pkey
- 07:15 PM Revision 3420: sql.py: mk_select(): Support queries without a FROM clause
- 07:03 PM Revision 3419: sql.py: Added DoesNotExistException and parse it in run_query()
- 06:46 PM Revision 3418: sql_io.py: put_table(): Removed no longer used conds var (invalid rows are removed from the in_table using sql.delete() instead of being filtered out in the main select)
- 06:43 PM Revision 3417: sql_io.py: put_table(): Removed no longer used distinct_on var (sql.distinct_table() handles filtering the join_cols)
- 06:24 PM Revision 3416: schemas/py_functions.sql: _date(): Just run str() on the returned datetime because it will usually be converted to a PostgreSQL timestamp anyway, so excluding the time from the string isn't necessary
- 06:15 PM Revision 3415: schemas/py_functions.sql: Added _date()
- 06:14 PM Revision 3414: sql.py: run_query(): Exception parsing: Remove PL/Python prefix from exception message so that the regexps can match at the beginning of the message
- 05:50 PM Revision 3413: sql_io.py: put_table(): Handle sql.InvalidValueExceptions by filtering the value out of all input columns. This will be useful for SQL functions that raise exceptions.
- 04:49 PM Revision 3412: schemas/vegbien.sql: namedplace, plantname: *_unique UNIQUE INDEX: Reordered columns to put rank after parent_id and plantname so that these columns, which are usually input table columns, can be used in a merge join index scan, while rank, which is usually a literal value, can be applied as an index filter condition after the merge join
- 04:42 PM Revision 3411: sql.py: distinct_table(): Removed literal values from UNIQUE INDEXes because the query planner did not seem to use them to do a merge join
- 04:01 PM Revision 3410: README.TXT: Maintenance: Full DB: Documented how to test full DB backup
- 03:47 PM Revision 3409: backups/Makefile: Added %.backup/test
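
Revisions 3423 and 3425 above describe checking for a function with a cast to regproc and running that check with auto-rollback. The following is a minimal sketch of that idea, not the actual sql.py code: the `cur` argument (any DB-API cursor on a PostgreSQL connection), the `db_error` parameter, and the savepoint name are assumptions made for illustration.

```python
def function_exists(cur, name, db_error=Exception):
    """Return True if a function called `name` can be resolved via regproc.

    Hypothetical helper: `cur` is a DB-API cursor supplied by the caller,
    and `db_error` is the driver's error class (e.g. psycopg2.Error).
    """
    # A savepoint lets a failed cast be undone without aborting the
    # caller's enclosing transaction.
    cur.execute('SAVEPOINT function_exists')
    try:
        # The cast raises an error when no function of that name is found.
        # Caveat: it also raises when the name is overloaded, so this
        # sketch assumes function names are unique.
        cur.execute('SELECT %s::regproc', (name,))
    except db_error:
        cur.execute('ROLLBACK TO SAVEPOINT function_exists')
        return False
    else:
        cur.execute('RELEASE SAVEPOINT function_exists')
        return True
```

Because an unqualified name in a regproc cast is resolved through the search_path, the check covers only the schemas on that path, which matches the behavior noted in revision 3423.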
07/16/2012
- 08:32 PM Revision 3408: README.TXT: Documented maintenance of full DB (back up/restore)
- 08:23 PM Revision 3407: backups/Makefile: Full DB backups: Added vegbien.backup
- 08:22 PM Revision 3406: pg_dump_vegbien: If first arg is "all", dump entire DB. Require a first arg so that Usage message will be displayed if run with no args.
- 08:03 PM Revision 3405: Always output Usage messages to stderr and word-wrap them using `fold -s`
- 07:37 PM Revision 3404: backups/Makefile: Factored backup command into $(backup) for later use by full DB backups. Made Backups, Archived imports sections subsections of Archived imports so Full DB backups can have its own section.
- 07:16 PM Revision 3403: backups/Makefile: Fixed bug where $(SHELL) needed to be set to bash so that $'\n' would be interpreted correctly
- 07:06 PM Revision 3402: backups/Makefile: Fixed bug where *.sql files needed to be restored using psql because pg_restore only supports "non-plain-text formats"
- 06:35 PM Revision 3401: pg_dump_vegbien: For consistency with setting the --schema option, use `set -- "$@" args...` to append options to $@ which are then passed to pg_dump, instead of specifying several variables which are then included in the pg_dump command
- 06:26 PM Revision 3400: pg_dump_vegbien: Pass command line options directly to pg_dump after parsing out any schema name
- 06:19 PM Revision 3399: backups/Makefile: Don't log stderr or run the command verbosely and instead just output the command and run time to the terminal. This matches what we do for pg_dump, which works better because it just prints the useful information when it's done running.
- 05:28 PM Revision 3398: backups/Makefile: Remove log files after successful restore/extraction because they are only useful for tail -f when the restore operation is running in the background
- 05:14 PM Revision 3397: pg_dump_vegbien: Save owners when saving data (for full export)
- 05:03 PM Revision 3396: pg_dump_vegbien: Use SET SESSION AUTHORIZATION to ensure that owners are always recorded in the same format. This will help make plain text backups comparable using diff.
- 04:39 PM Revision 3395: backups/Makefile: Backups: Fixed bug where `%.sql: %` needed to come before %.sql with no prerequisites to be matched first
- 04:14 PM Revision 3394: Moved archived imports and make targets to maintain them to new backups dir
- 04:08 PM Revision 3393: Moved archived imports and make targets to maintain them to new backups dir
- 03:29 PM Revision 3392: Added psql_script_vegbien
- 12:54 PM Revision 3391: root Makefile: VegBIEN DB: Schemas: Added schemas/%.sql to extract a compressed custom-format backup to plain SQL
- 12:33 PM Revision 3390: root Makefile: VegBIEN DB: Schemas: Added schemas/%.backup/uninstall so that a schema can be removed by its backup file name (with extension) as well as its name
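
Revision 3402 above notes that *.sql files must be restored with psql because pg_restore only supports non-plain-text formats. The rule can be sketched in Python as follows. This is an illustration only, not the Makefile's actual recipe: the function name `restore_command` and the choice to dispatch on the file extension are assumptions.

```python
import os

def restore_command(path, dbname):
    """Build the argv for restoring a backup file (hypothetical helper).

    Plain-text SQL dumps go through psql; anything else is assumed to be
    an archive-format dump that pg_restore can read.
    """
    ext = os.path.splitext(path)[1]
    if ext == '.sql':
        # pg_restore rejects plain-text dumps, so feed the script to psql
        return ['psql', '--dbname', dbname, '--file', path]
    return ['pg_restore', '--dbname', dbname, path]
```

The returned list can be passed to `subprocess.run()` by a caller that wants to execute the restore.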