Activity

From 01/16/2013 to 02/14/2013

02/14/2013

11:49 AM Revision 7555: Added inputs/HVAA/
Aaron Marcuse-Kubitza
11:14 AM Revision 7554: Added inputs/ARIZ/_archive
Aaron Marcuse-Kubitza
11:13 AM Revision 7553: inputs/ARIZ/: Removed previous data now that it has been refreshed
Aaron Marcuse-Kubitza
11:08 AM Revision 7552: inputs/ARIZ/: Mapped refresh
Aaron Marcuse-Kubitza
11:04 AM Task #566 (New): automatically adjust staging tables for easier mapping
* remove empty columns
* mark columns with data in every row as NOT NULL
Aaron Marcuse-Kubitza
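The two adjustments proposed in Task #566 could be sketched as follows. This is only an illustration, not the project's implementation: the function names are hypothetical, rows are assumed to be dictionaries keyed by column name, and treating blank strings as empty is an assumption.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def is_empty(value: Optional[str]) -> bool:
    # Assumption: NULLs and blank strings both count as "no data".
    return value is None or value.strip() == ""


def classify_columns(rows: List[Row]) -> Tuple[List[str], List[str]]:
    """Return (empty_columns, not_null_columns) for a staging table."""
    if not rows:
        return [], []
    columns = list(rows[0].keys())
    empty = [c for c in columns if all(is_empty(r.get(c)) for r in rows)]
    full = [c for c in columns if not any(is_empty(r.get(c)) for r in rows)]
    return empty, full


def adjustment_sql(table: str, rows: List[Row]) -> List[str]:
    """Build the ALTER TABLE statements implied by the two bullet points."""
    empty, full = classify_columns(rows)
    stmts = [f'ALTER TABLE "{table}" DROP COLUMN "{c}";' for c in empty]
    stmts += [
        f'ALTER TABLE "{table}" ALTER COLUMN "{c}" SET NOT NULL;' for c in full
    ]
    return stmts
```

For a table whose "notes" column is always blank and whose "id" column is always populated, adjustment_sql() would yield one DROP COLUMN statement and one SET NOT NULL statement.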
09:48 AM Revision 7551: Added inputs/ARIZ/import_order.txt
Aaron Marcuse-Kubitza
09:22 AM Revision 7550: Added inputs/NY/_archive/
Aaron Marcuse-Kubitza
09:20 AM Revision 7549: inputs/NY/: Removed tables from previous extract
Aaron Marcuse-Kubitza
08:59 AM Revision 7548: inputs/NY/: Mapped refresh
Aaron Marcuse-Kubitza
08:58 AM Revision 7547: inputs/*/*/VegBIEN.csv: Regenerated from mappings/VegCore-VegBIEN.csv
Aaron Marcuse-Kubitza
08:52 AM Revision 7546: Added inputs/NY/import_order.txt
Aaron Marcuse-Kubitza
07:20 AM Task #386 (Resolved): load Canadensys data
Aaron Marcuse-Kubitza
07:19 AM Task #470 (Resolved): source terms from old versions of DwC to the DwC history page
Aaron Marcuse-Kubitza
07:18 AM Task #472 (Rejected): replace accessioncodes with datasource_id+sourceaccessioncode
Accessioncode fields have been removed instead. Globally unique ID fields (#561) will eventually serve the purpose th...
Aaron Marcuse-Kubitza
07:06 AM Task #565 (Resolved): partition the TaxonDetermination table by row into the different types of determinations
* this makes it easy to horizontally join the different types of determinations for a row
** it avoids the need for ...
Aaron Marcuse-Kubitza
06:37 AM Task #564 (New): make all VegBIEN column names globally unique
* this enables creating a table to contain the results of a join, without needing to resolve column name collisions
...
Aaron Marcuse-Kubitza
06:33 AM Task #563 (New): refactor VegBIEN to use VegCore terms
* add key VegCore tables such as Occurrence and Record
* note that many VegCore tables have an inheritance relations...
Aaron Marcuse-Kubitza
05:17 AM Task #562 (New): flatten the mappings
* put the destination table at the beginning of the mapping, rather than nesting it within a hierarchy of tables it h...
Aaron Marcuse-Kubitza
04:47 AM Task #561 (New): make VegBIEN ID fields plain-text instead of numeric
h3. Rationale
* this makes it possible to append data from multiple sources without having pkey collisions
h3. ...
Aaron Marcuse-Kubitza
04:12 AM Task #560 (New): move VegCore data dictionary to a phpPgAdmin-accessible database
* term details go in column comments, with Redmine formatting translated to HTML
* each synonym becomes a parameter ...
Aaron Marcuse-Kubitza
02:51 AM Revision 7545: inputs/ARIZ/: Added SQL export for refresh
Aaron Marcuse-Kubitza
02:33 AM Revision 7544: my2pg.data: Translate indefinite (zero) months which have a definite day. This is unusual, but does appear in some data such as the ARIZ DB.
Aaron Marcuse-Kubitza
02:28 AM Revision 7543: my2pg.data: Translate indefinite dates (dates with 0 as the month or day)
Aaron Marcuse-Kubitza
02:23 AM Revision 7542: my2pg: Use my2pg.data to perform data-only replacements, instead of duplicating them in both my2pg and my2pg.data
Aaron Marcuse-Kubitza
02:01 AM Revision 7541: my2pg: named UNIQUE KEYs: Comment out the name because PostgreSQL requires it to be globally unique, but MySQL only requires it to be unique within the table
Aaron Marcuse-Kubitza
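The named UNIQUE KEY translation might look roughly like the sketch below. my2pg itself is not shown in this log, so the function name, the regular expression, and the /*...*/ comment style are all assumptions; the sketch only illustrates keeping the constraint while commenting out its name.

```python
import re

# Matches the start of a named MySQL unique key, e.g. UNIQUE KEY `name` (
_UNIQUE_KEY = re.compile(r"UNIQUE KEY\s+`(?P<name>[^`]+)`\s*\(")


def translate_unique_key(line: str) -> str:
    """Keep the UNIQUE constraint but comment out its MySQL-scoped name."""
    return _UNIQUE_KEY.sub(lambda m: f"UNIQUE /*{m.group('name')}*/ (", line)
```

Lines that do not contain a named UNIQUE KEY pass through unchanged.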
01:53 AM Revision 7540: my2pg: Translate UNIQUE KEYs instead of removing them
Aaron Marcuse-Kubitza
01:49 AM Revision 7539: my2pg*: Removed KEYs: Comment out the definition rather than removing it
Aaron Marcuse-Kubitza
01:45 AM Revision 7538: my2pg*: Remove FOREIGN KEYs because MySQL does not dump tables in dependency order, which prevents PostgreSQL from creating tables whose fkeys refer to a later table
Aaron Marcuse-Kubitza
01:33 AM Revision 7537: my2pg*: Replacing invalid table elements to remove them: Use a dummy CHECK constraint instead of a boolean field to avoid adding fields to the table. The elements can't always simply be removed because sed can't remove the trailing comma of the previous element, and removing the following comma doesn't work for the last element in the table.
Aaron Marcuse-Kubitza
12:11 AM Revision 7536: my2pg*: Replace '0000-00-00 00:00:00' with '-infinity'
Aaron Marcuse-Kubitza
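A minimal sketch of the zero-datetime replacement, assuming it is a plain string substitution applied to each line of the dump (the function name is hypothetical). PostgreSQL accepts '-infinity' as a special timestamp value, whereas it rejects MySQL's all-zero datetime.

```python
ZERO_DATETIME = "'0000-00-00 00:00:00'"


def translate_zero_datetime(line: str) -> str:
    """Replace MySQL's all-zero datetime literal with PostgreSQL's '-infinity'."""
    return line.replace(ZERO_DATETIME, "'-infinity'")
```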
12:04 AM Revision 7535: my2pg: Replace datetime with timestamp
Aaron Marcuse-Kubitza

02/13/2013

11:59 PM Revision 7534: my2pg: Remove COLLATE field attribute
Aaron Marcuse-Kubitza
11:56 PM Revision 7533: lib/MySQL.*.sql.make: Documented that $server user/host are for ssh, not the DB
Aaron Marcuse-Kubitza
11:55 PM Revision 7532: lib/MySQL.*.sql.make: Documented that $server can also contain a username (which will be used by ssh)
Aaron Marcuse-Kubitza
11:51 PM Revision 7531: my2pg_export: Use the --quick option to facilitate exporting large tables (it avoids retrieving all rows before outputting any of them)
Aaron Marcuse-Kubitza
11:00 PM Revision 7530: README.TXT: Datasource setup: Added instructions for MS Access databases
Aaron Marcuse-Kubitza
10:43 PM Revision 7529: README.TXT: Datasource setup: MySQL inputs: Added instruction to skip the Add input data for each table section
Aaron Marcuse-Kubitza
10:40 PM Revision 7528: inputs/NY/: Added SQL export for refresh
Aaron Marcuse-Kubitza

02/12/2013

01:08 PM Revision 7527: mappings/VegCore.htm: Regenerated from wiki. Brad's new DwC ID terms spreadsheet has now been added, and a number of the ID terms clarified, disambiguated, and recategorized. In particular, institutionCode has now been split into the custodialInstitutions and collectingInstitution, to differentiate between which institution has the specimen vs. stamped the specimen. This distinction is important because the catalogNumber, stamped on the specimen, is only unique within the collectingInstitution. Most datasources don't unambiguously specify which institution their institutionCode is referring to, so it has been assumed to be custodialInstitutions unless a data dictionary says otherwise (as is the case for UNCC). In addition, a MatchedTaxonDetermination table has been added with the *_matched fields from TNRS.
Aaron Marcuse-Kubitza
12:15 PM Revision 7526: inputs/CVS/observation_/map.csv: baseSaturation: Resolved ambiguous term
Aaron Marcuse-Kubitza
12:09 PM Revision 7525: mappings/Makefile: VegCore.vocab.csv: Ignore leading ? when sorting so that ambiguous terms sort alphabetically with other terms. This prevents terms from moving from their previous location when they become ambiguous.
Aaron Marcuse-Kubitza
12:07 PM Revision 7524: Added sort_ci to sort a spreadsheet, ignoring leading punctuation
Aaron Marcuse-Kubitza
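The sorting rule behind sort_ci can be illustrated with a short sketch. It sorts a plain list of terms rather than a spreadsheet, and it assumes that "ci" means case-insensitive; the helper name sort_key is hypothetical.

```python
import string
from typing import List


def sort_key(term: str) -> str:
    # Ignore leading punctuation (such as the ? on ambiguous terms) and case.
    return term.lstrip(string.punctuation).lower()


def sort_ci(terms: List[str]) -> List[str]:
    """Sort terms so that a leading ? does not move a term out of place."""
    return sorted(terms, key=sort_key)
```

With this key, a term keeps its alphabetical position whether or not it carries a ? prefix, which is the stability that revision 7525 relies on.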
12:05 PM Revision 7523: mappings/VegCore.vocab.csv: Changed line endings to \r\n in preparation for having a Python script run on it (which changes the line endings)
Aaron Marcuse-Kubitza
11:47 AM Revision 7522: mappings/Makefile: VegCore.vocab.csv: Added back ambiguous terms, so that the vocabulary contains all terms defined by VegCore, regardless of whether they are ambiguous or unambiguous terms
Aaron Marcuse-Kubitza
11:44 AM Revision 7521: mappings/Makefile: VegCore.vocab.csv: Added back synonyms, so that the vocabulary contains all terms defined by VegCore, regardless of whether they are synonyms or primary terms. This also prevents VegCore.vocab.csv from losing entries when terms are renamed, which made it difficult to verify that no terms were lost when refactoring.
Aaron Marcuse-Kubitza
05:50 AM Revision 7520: inputs/MO/Specimen/postprocess.sql: Remove frameshifted rows by detecting InstitutionCodes without any letters
Aaron Marcuse-Kubitza
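The frameshift check could be approximated as below. The actual change is in postprocess.sql, so this is only an illustration of the detection rule: an InstitutionCode that contains no letters suggests that the row's values slid into the wrong columns. Leaving rows with a missing code untouched is an assumption, as are the function names.

```python
import re
from typing import Dict, List, Optional

_HAS_LETTER = re.compile(r"[A-Za-z]")


def is_frameshifted(row: Dict[str, Optional[str]]) -> bool:
    """True if InstitutionCode is present but contains no letters."""
    code = row.get("InstitutionCode")
    if not code:
        # Assumption: a missing code is not evidence of a frameshift.
        return False
    return _HAS_LETTER.search(code) is None


def drop_frameshifted(rows: List[Dict[str, Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Keep only the rows whose InstitutionCode looks like a real code."""
    return [row for row in rows if not is_frameshifted(row)]
```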
04:59 AM Revision 7519: inputs/ARIZ/Specimen/map.csv: CollectorNumber/FieldNumber: Use /_first to map these identical fields to the same location
Aaron Marcuse-Kubitza
04:54 AM Revision 7518: inputs/ARIZ/Specimen/map.csv: Fixed bug where the column names for InstitutionCode and CollectionCode were reversed in the source data
Aaron Marcuse-Kubitza
04:14 AM Revision 7517: inputs/*/Specimen/map.csv for Canadensys sources: Remapped institutionID to UNUSED
Aaron Marcuse-Kubitza

02/09/2013

07:45 AM Revision 7516: mappings/VegCore.htm: Regenerated from wiki. The original*, accepted*, and verbatim* Taxon fields have now been moved to separate OriginalTaxonDetermination, AcceptedTaxonDetermination, and TaxonVerbatim tables.
Aaron Marcuse-Kubitza
06:52 AM Revision 7515: mappings/VegCore.htm: Regenerated from wiki
Aaron Marcuse-Kubitza
06:34 AM Revision 7514: mappings/VegCore.htm: Regenerated from wiki
Aaron Marcuse-Kubitza
04:08 AM Revision 7513: README.TXT: Maintenance: VegCore data dictionary: Replaced VegCore.*.csv with VegCore.htm because now that VegCore.*.csv are sorted alphabetically, they generally don't change when VegCore.htm changes
Aaron Marcuse-Kubitza
04:04 AM Revision 7512: mappings/VegCore.*.csv: Regenerated from wiki. A plain text label is now used for Replace with, which fixes a bug where the PRIVATE permalink pointed to its Replace with in realLatitude instead of its definition.
Aaron Marcuse-Kubitza
03:55 AM Revision 7511: redmine_synonyms: Support plain text labels other than Alternative, such as Replace with
Aaron Marcuse-Kubitza
03:13 AM Revision 7510: mappings/VegCore.*.csv: Regenerated from wiki. Alternatives now contain the "Alternative" label as plain text rather than as an image title, thus avoiding an HTML anchor conflict with the definition and allowing ambiguous terms to be placed before their alternatives as well as after.
Aaron Marcuse-Kubitza
03:11 AM Revision 7509: README.TXT: Maintenance: VegCore data dictionary: Updated VegCore.csv filename to VegCore.*.csv
Aaron Marcuse-Kubitza
02:57 AM Revision 7508: redmine_synonyms: Support alternatives which contain the "Alternative" label as plain text rather than as an image title. This is done to include the "Alternative" label in the HTML anchor and thus prevent the anchor from conflicting with the actual definition of the alternative (which would otherwise have the same anchor text). This allows ambiguous terms to be placed before their alternatives as well as after, because there won't be anchor conflicts that need to be resolved with careful ordering.
Aaron Marcuse-Kubitza
02:48 AM Revision 7507: mappings/VegCore.csv: Regenerated from wiki. Taxon terms with prefixes for other TaxonDeterminations now indicate the analogous term in an "analogous to" label next to the term
Aaron Marcuse-Kubitza
02:47 AM Revision 7506: mappings/VegCore.csv: Regenerated from wiki. Taxon terms with prefixes for other TaxonDeterminations now indicate the analogous term in an "analogous to" label next to the term
Aaron Marcuse-Kubitza
02:47 AM Revision 7505: mappings/VegCore.csv: Regenerated from wiki. Taxon terms with prefixes for other TaxonDeterminations now indicate the analogous term in an "analogous to" label next to the term
Aaron Marcuse-Kubitza

02/07/2013

01:57 PM Revision 7504: mappings/VegCore-VegBIEN.csv: datasourceRecordID: Fixed bug where also need to add datasourceRecordID next to occurrenceID for an institutionCode remap switch
Aaron Marcuse-Kubitza
01:57 PM Revision 7503: inputs/bien_web/observation/test.xml.ref: Regenerated
Aaron Marcuse-Kubitza
01:48 PM Revision 7502: inputs/import.stats.xls: Updated import times using the import_times bugfix for times longer than a day
Aaron Marcuse-Kubitza
01:45 PM Revision 7501: import_times: times(): Fixed bug where need to match whitespace in times, in order to match times with days
Aaron Marcuse-Kubitza
12:00 PM Revision 7500: inputs/*/Specimen/map.csv: Remapped ID to datasourceRecordID
Aaron Marcuse-Kubitza
11:55 AM Revision 7499: mappings/VegCore-VegBIEN.csv: Mapped datasourceRecordID
Aaron Marcuse-Kubitza
11:51 AM Revision 7498: inputs/import.stats.xls: Updated import times
Aaron Marcuse-Kubitza
08:38 AM Revision 7497: inputs/FIA/_src/_README.TXT: Documented that the refresh is missing some PLT_CN values present in the original version
Aaron Marcuse-Kubitza
08:33 AM Revision 7496: inputs/FIA/import_order.txt: Reverted back to using FIA_COND_unique instead of COND_unique because the PLT_CN IDs in the refresh don't match the PLT_CN IDs in the original version, making COND_unique and Organism incompatible
Aaron Marcuse-Kubitza
08:27 AM Revision 7495: inputs/FIA/import_order.txt: Removed FIA_COND_unique, which is superseded by COND_unique
Aaron Marcuse-Kubitza
08:26 AM Revision 7494: inputs/FIA/import_order.txt: Fixed bug where need to import COND_unique before Organism because the plot entries need to be created before they can be linked to by organisms
Aaron Marcuse-Kubitza
07:25 AM Revision 7493: redmine_synonyms: sed pattern: Match <h# directly at the beginning of the line rather than after ^.*, which greatly speeds up the pattern matching because the first character is a literal character. (If <h# were not located at the left margin, the ^.* would unfortunately still be needed because the beginning of the line needs to be matched in order to be removed by the replacement operation.)
Aaron Marcuse-Kubitza
07:22 AM Revision 7492: mappings/VegCore.csv: Regenerated from wiki. Alternatives are now able to use h3 instead of h4 (which had display problems). realLatitude/Longitude no longer needs the ? prefix to have its replacement (PRIVATE) interpreted as an alternative, and thus is properly able to be included in the vocabulary.
Aaron Marcuse-Kubitza
07:16 AM Revision 7491: mappings/Makefile: VegCore.vocab.csv: Use the term's type label instead of its header level to determine if it's a synonym or alternative. This allows header levels to be chosen for presentational reasons rather than being constrained by being parsable.
Aaron Marcuse-Kubitza
07:05 AM Revision 7490: redmine_synonyms: Don't require ambiguous terms to start with ?, because the ambiguous term for an alternative can be identified simply by choosing the last term that didn't have a type label (previously, this would have been the last term that wasn't h3 or h4)
Aaron Marcuse-Kubitza
07:01 AM Revision 7489: redmine_synonyms: Use the term's type label instead of its header level to determine if it's a synonym or alternative. This allows header levels to be chosen for presentational reasons rather than being constrained by being parsable.
Aaron Marcuse-Kubitza
06:26 AM Revision 7488: mappings/VegCore.csv: Regenerated from wiki. The data dictionary has been reformatted to be much more vertically compact, by placing the term type (Synonym, Alternative, etc.) and sources (From:) on the same line as the term. Note that globalUniqueIdentifier_SpeciesLink has been removed from the vocabulary because a definition entry has been added for it (when this entry is missing, the term is incorrectly identified as a primary term).
Aaron Marcuse-Kubitza
06:21 AM Revision 7487: mappings/Makefile, redmine_synonyms: Updated for new VegCore data dictionary format, which prefixes the term type (Synonym, Alternative, etc.) to the term instead of including it as a section label. This ensures that the term type of a non-primary term is shown next to the term when it is visited via a permalink, which causes the term header to appear at the top of the screen and obscures the section header containing the type.
Aaron Marcuse-Kubitza
06:00 AM Revision 7486: mappings/Makefile: VegCore.thesaurus.csv: removal of tables: ignore errors if grep found no match
Aaron Marcuse-Kubitza
02:06 AM Revision 7485: Renamed mappings/VegCore.csv to VegCore.vocab.csv and Veg+-VegCore.csv to VegCore.thesaurus.csv for clarity
Aaron Marcuse-Kubitza
02:03 AM Revision 7484: mappings/Makefile, input.Makefile: Renamed $(dict) to $(thesaurus) because Veg+-VegCore.csv is actually a thesaurus, not a dictionary
Aaron Marcuse-Kubitza
01:57 AM Revision 7483: mappings/Makefile: Replaced occurrences of VegCore.csv with $(vocab) and Veg+-VegCore.csv with $(dict)
Aaron Marcuse-Kubitza

02/06/2013

07:34 PM Revision 7482: README.TXT: Maintenance: VegCore data dictionary: When moving terms, check that no terms were lost: Updated steps now that VegCore.csv and Veg+-VegCore.csv are sorted by name, so that a comparison of added/deleted counts is not necessary and a simple `svn di` can be used
Aaron Marcuse-Kubitza
07:33 PM Revision 7481: mappings/Makefile: Veg+-VegCore.csv: Sort terms by name so that reordering terms in the VegCore data dictionary does not cause Veg+-VegCore.csv to change. This makes it much easier to identify synonyms and ambiguous terms that were accidentally deleted during a data dictionary refactoring. (Note that these are no longer included in VegCore.csv, so this is required in addition to sorting VegCore.csv by name.)
Aaron Marcuse-Kubitza
07:26 PM Revision 7480: mappings/Makefile: VegCore.csv: Sort terms by name so that reordering terms in the VegCore data dictionary does not cause VegCore.csv to change. This makes it much easier to identify terms that were accidentally deleted during a data dictionary refactoring.
Aaron Marcuse-Kubitza

02/05/2013

06:19 PM Revision 7479: mappings/VegCore.csv: Regenerated from wiki. This adds cf_aff.
Aaron Marcuse-Kubitza
06:18 PM Revision 7478: mappings/Makefile: VegCore.csv: Filter out namespaces by matching only terms whose header links within the data dictionary
Aaron Marcuse-Kubitza
06:08 PM Revision 7477: mappings/VegCore.csv: Regenerated from wiki. This causes TNRS's Annotations (cf/aff) to be mapped into VegBIEN.
Aaron Marcuse-Kubitza
06:05 PM Revision 7476: mappings/VegCore-VegBIEN.csv: matched*Fit_fraction: Remapped to taxonconfidence instead of taxonfit
Aaron Marcuse-Kubitza
05:56 PM Revision 7475: mappings/Makefile: VegCore.csv: Fixed bug where need to remove duplicates, which are no longer supported by canon, by removing alternatives of ambiguous terms when these occur separately from their definitions
Aaron Marcuse-Kubitza
05:29 PM Revision 7474: mappings/Makefile: VegCore.csv: Removed synonyms and ambiguous terms, since the canonicalization of them is handled by Veg+-VegCore.csv. This also reduces the time it takes canon to build the in-memory Python dict of replacements, which scales to all inputs and should speed up the build/test command.
Aaron Marcuse-Kubitza
05:22 PM Revision 7473: mappings/Makefile: VegCore.csv: Removed synonyms, since the canonicalization of them is handled by Veg+-VegCore.csv
Aaron Marcuse-Kubitza
05:10 PM Revision 7472: mappings/Makefile: VegCore.csv: Match terms by header # instead of matching all anchors, in order to include the leading ? before an ambiguous term
Aaron Marcuse-Kubitza
04:42 PM Revision 7471: mappings/Makefile: Veg+-VegCore.csv: Generate dynamically from VegCore.htm, which allows the VegCore thesaurus to be automatically kept up to date. More importantly, it allows terms in all map spreadsheets to be updated simultaneously when a term is renamed (e.g. by replacing a term with one of its synonyms).
Aaron Marcuse-Kubitza
04:40 PM Revision 7470: mappings/VegX-VegCore.csv: Applied term renamings from the new dynamically generated Veg+-VegCore.csv. Updates to VegCore term names that have occurred since the data dictionary was created are now able to take effect, which involves remapping several fields.
Aaron Marcuse-Kubitza
04:32 PM Revision 7469: mappings/VegCore-VegBIEN.csv, inputs/*/*/map.csv: Applied term renamings from the new dynamically generated Veg+-VegCore.csv, which reflects the current state of the data dictionary. (Permanently switching to the new Veg+-VegCore.csv will be a separate change.) Updates to VegCore term names that have occurred since the data dictionary was created are now able to take effect, which involves remapping and inferring units on several fields.
Aaron Marcuse-Kubitza
04:27 PM Revision 7468: mappings/VegCore-VegBIEN.csv: Mapped basalDiameter_in
Aaron Marcuse-Kubitza
04:15 PM Revision 7467: mappings/VegCore-VegBIEN.csv: Mapped diameterBreastHeightGentry_cm, basalDiameter_cm, precipitation_mm
Aaron Marcuse-Kubitza
04:14 PM Revision 7466: schemas/vegbien.sql: Added _mm_to_m()
Aaron Marcuse-Kubitza
03:56 PM Revision 7465: mappings/Makefile: Veg+-VegCore.csv: Fixed bugs where also need to filter out ambiguous tables, but shouldn't filter out acronyms (which are regular fields)
Aaron Marcuse-Kubitza
03:40 PM Revision 7464: mappings/VegCore-VegBIEN.csv: locationID->location.sourceaccessioncode: Removed restriction that this mapping can't occur if geovalidation information is present. The locationID is no longer mapped to the place.sourceaccessioncode, so this filter is not necessary.
Aaron Marcuse-Kubitza
03:38 PM Revision 7463: mappings/VegCore.csv: Regenerated from wiki
Aaron Marcuse-Kubitza
03:19 PM Revision 7462: mappings/Makefile: Veg+-VegCore.csv: Fixed bug where need to filter out table names to avoid applying table replacements to fields which have the same name as a table
Aaron Marcuse-Kubitza
03:03 PM Revision 7461: inputs/Madidi/map.csv: Fixed bug where needed to remove duplicate input names, now that translate doesn't allow them
Aaron Marcuse-Kubitza
01:59 PM Revision 7460: mappings/Makefile: VegX-VegCore.csv: Sort by the input column instead of the output column to keep the sort order stable across VegCore term renames
Aaron Marcuse-Kubitza
01:46 PM Revision 7459: mappings/Makefile: Veg+-VegCore.csv: Before running collapse_multimap, canonicalize alternatives of ambiguous terms using unambiguous mappings. This ensures that the alternatives lists contain only canonical VegCore terms rather than synonyms.
Aaron Marcuse-Kubitza
01:43 PM Revision 7458: mappings/VegCore.csv: Regenerated from wiki. All synonyms are now hyperlinked, allowing them to be matched by redmine_synonyms.
Aaron Marcuse-Kubitza
01:31 PM Revision 7457: mappings/Veg+-VegCore.csv: Removed Sources, Definition columns because source information is now in the VegCore data dictionary
Aaron Marcuse-Kubitza
01:25 PM Revision 7456: mappings/VegCore.csv: Regenerated from wiki. Ambiguous terms newly available to redmine_synonyms due to the bugfix now have multiple alternatives.
Aaron Marcuse-Kubitza
01:25 PM Revision 7455: redmine_synonyms: Ambiguous terms: Fixed bug where need to use header # instead of term name to determine whether a term is an alternative, because some alternatives (e.g. verbatimElevation) don't follow the units-suffix naming convention.
Aaron Marcuse-Kubitza
12:58 PM Revision 7454: mappings/VegCore.csv: Regenerated from wiki. All ambiguous terms now have multiple alternatives, preventing them from being automapped to a single alternative without prompting the user for confirmation
Aaron Marcuse-Kubitza
12:50 PM Revision 7453: mappings/Makefile: Veg+-VegCore.csv: translate: Fixed bug where need to run on $@ instead of $<
Aaron Marcuse-Kubitza
12:49 PM Revision 7452: mappings/VegCore.csv: Regenerated from wiki. All ambiguous terms now have multiple alternatives, preventing them from being automapped to a single alternative without prompting the user for confirmation
Aaron Marcuse-Kubitza
12:22 PM Revision 7451: mappings/VegCore.csv: Regenerated from wiki. All mappings/Veg+-VegCore.csv terms are now added as synonyms or separate terms.
Aaron Marcuse-Kubitza
10:26 AM Revision 7450: mappings/VegCore.csv: Regenerated from wiki. Most ambiguous terms are now split into alternatives, and most mappings/Veg+-VegCore.csv terms are now added as synonyms.
Aaron Marcuse-Kubitza
06:12 AM Revision 7449: canon: Raise an error if two input terms map to the same simplified string
Aaron Marcuse-Kubitza
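The collision check added to canon might work along these lines. The simplification rule shown here (lowercase, then drop everything except letters and digits) is an assumption, as are the function names; the log only states that two input terms mapping to the same simplified string now raise an error.

```python
import re
from typing import Dict, Iterable


def simplify(term: str) -> str:
    # Hypothetical simplification: lowercase and keep only letters and digits.
    return re.sub(r"[^a-z0-9]", "", term.lower())


def build_canon_map(terms: Iterable[str]) -> Dict[str, str]:
    """Map each simplified string to its term, rejecting collisions."""
    canon: Dict[str, str] = {}
    for term in terms:
        key = simplify(term)
        if key in canon and canon[key] != term:
            raise ValueError(
                f"{term!r} and {canon[key]!r} both simplify to {key!r}"
            )
        canon[key] = term
    return canon
```

Under this rule, catalog_number and catalogNumber would collide, so the map is rejected instead of silently keeping whichever term came last.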
04:34 AM Revision 7448: translate: Changed dictionary to thesaurus, since the map used actually has synonyms rather than definitions
Aaron Marcuse-Kubitza
04:31 AM Revision 7447: mappings/Makefile: Veg+-VegCore.csv: Translate the thesaurus's output terms using itself in order to map a synonym of an ambiguous term directly to its alternatives list rather than only to the ambiguous term itself
Aaron Marcuse-Kubitza
04:26 AM Revision 7446: mappings/Makefile: Veg+-VegCore.csv: Run collapse_multimap on the generated map so that all alternatives are included, rather than just the first alternative, when translate maps an ambiguous term
Aaron Marcuse-Kubitza
04:25 AM Revision 7445: redmine_synonyms: Fixed bug where need to output a CSV rather than TSV to be usable by other programs that use map spreadsheets
Aaron Marcuse-Kubitza
04:23 AM Revision 7444: Added collapse_multimap, which collapses multimap entries in a spreadsheet dictionary
Aaron Marcuse-Kubitza
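Collapsing a multimap can be sketched as grouping the rows by input term and joining their outputs. The sketch works on (input, output) pairs rather than on a CSV file, uses the "," separator described in revision 7443, and the example term names are illustrative only.

```python
from typing import Dict, Iterable, List, Tuple


def collapse_multimap(
    pairs: Iterable[Tuple[str, str]], sep: str = ","
) -> List[Tuple[str, str]]:
    """Merge rows that share an input term into one row listing every output."""
    collapsed: Dict[str, List[str]] = {}
    for key, value in pairs:
        values = collapsed.setdefault(key, [])
        if value not in values:  # skip exact duplicates
            values.append(value)
    # dicts preserve insertion order, so first-seen order of inputs is kept
    return [(key, sep.join(values)) for key, values in collapsed.items()]
```

An ambiguous term that appears on two rows then ends up on a single row that lists both alternatives, rather than only the first.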
03:45 AM Revision 7443: mappings/Veg+-VegCore.csv: Separate alternatives of ambiguous terms with , instead of ", " for easier machine-parsability
Aaron Marcuse-Kubitza
03:31 AM Revision 7442: redmine_synonyms: Added support for ambiguous terms, which unlike the synonyms format nests the term (the alternative) under the synonym (the ambiguous term) rather than the synonym under the term. Note that ambiguous terms must also be prefixed with ? to differentiate them from composites (e.g. recordedBy_givenName), which use the same _-based naming convention.
Aaron Marcuse-Kubitza
03:08 AM Revision 7441: mappings/VegCore.csv: Regenerated from wiki
Aaron Marcuse-Kubitza
02:49 AM Revision 7440: mappings/VegCore.csv: Regenerated from wiki
Aaron Marcuse-Kubitza
02:22 AM Revision 7439: schemas/vegbien.sql: analytical_stem_view: Renamed scientificNameWithMorphospecies to taxonNameWithMorphospecies because it does not contain the scientific name author, as required by DwC scientificName <http://rs.tdwg.org/dwc/terms/#scientificName>
Aaron Marcuse-Kubitza
01:56 AM Revision 7438: mappings/Makefile: VegCore.tables.csv: Exclude ambiguous table names, which should not be part of the tables summary (as neither are table synonyms)
Aaron Marcuse-Kubitza
01:51 AM Revision 7437: input.Makefile: $(translate?): Merged with $(translate), which is not used independently
Aaron Marcuse-Kubitza
01:50 AM Revision 7436: input.Makefile: Use new translate_ci instead of translate
Aaron Marcuse-Kubitza
01:47 AM Revision 7435: mappings/Makefile: Use new translate_ci instead of translate
Aaron Marcuse-Kubitza
01:39 AM Revision 7434: Added translate_ci
Aaron Marcuse-Kubitza

02/04/2013

11:03 PM Revision 7433: mappings/VegCore-VegBIEN.csv: institutionCode list->sourcename mapping: _split(): Also match ; as a separator, and match separators with or without a following space
Aaron Marcuse-Kubitza
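The separator rule described for _split() can be expressed as a single regular expression. This Python sketch is only an analogue of the mapping function: it assumes that the comma was the separator already supported, and split_list is a hypothetical name.

```python
import re
from typing import List

# A comma or semicolon, optionally followed by a single space.
_SEPARATOR = re.compile(r"[,;] ?")


def split_list(value: str) -> List[str]:
    """Split an institutionCode list on , or ; with or without a space."""
    return [part for part in _SEPARATOR.split(value) if part]
```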

02/02/2013

05:39 PM Revision 7432: mappings/Makefile: Added target to create Veg+-VegCore.csv from VegCore.htm, initially commented out until all the synonyms in the existing Veg+-VegCore.csv are added to the VegCore data dictionary <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore_data_dictionary>
Aaron Marcuse-Kubitza
05:38 PM Revision 7431: Added redmine_synonyms, which translates a Redmine HTML page to a thesaurus
Aaron Marcuse-Kubitza
04:37 PM Revision 7430: lockfile: Linux: Documented why newgrp and recursive invocation of lockfile are needed
Aaron Marcuse-Kubitza
04:33 PM Revision 7429: lockfile: Linux: Fixed bug where need to change primary group of the dotlockfile process to the group of the dir to contain the lockfile, because dotlockfile otherwise reports a "permission denied" error (even though the directory is actually writable, dotlockfile thinks it isn't). Running dotlockfile with a different primary group is complicated because newgrp, the command that does this, does not pass arguments to the new process, so they must instead be passed via environment variables and a recursive invocation of lockfile (with the $inner recursion flag set). Additionally, exec cannot be used to propagate the PPID (needed by dotlockfile) because newgrp creates a new process rather than using exec, so it must be manually entered into the lockfile after dotlockfile runs.
Aaron Marcuse-Kubitza
02:41 PM Revision 7428: lockfile: Linux: Fixed bug where need to lower retry count to avoid overflowing the retries variable
Aaron Marcuse-Kubitza
02:37 PM Revision 7427: lockfile: Linux: Added workaround for bug in dotlockfile where using -1 to retry indefinitely doesn't work, so need to use large integer instead
Aaron Marcuse-Kubitza
01:49 PM Revision 7426: lockfile: Linux: Use bin/dotlockfile instead of the system's dotlockfile, because the system's dotlockfile is SETGID mail, which prevents it from creating lockfiles in a directory owned by the bien user and group when being run by the login user
Aaron Marcuse-Kubitza
01:38 PM Revision 7425: bin/: svn:ignore: Added dotlockfile, which is copied from the system during installation
Aaron Marcuse-Kubitza
01:30 PM Revision 7424: bin/: svn:ignore: Removed no longer applicable test_output
Aaron Marcuse-Kubitza
01:26 PM Revision 7423: root Makefile: misc-Linux: Added command to copy dotlockfile to the bin/ dir, so that it can be used without being SETGID mail, which would prevent it from creating lockfiles in a directory owned by the bien user and group when being run by the user
Aaron Marcuse-Kubitza
01:24 PM Revision 7422: root Makefile: core: Added misc-* to install other dependencies
Aaron Marcuse-Kubitza
11:56 AM Revision 7421: schemas/vegbien.sql: analytical_stem_view: scientificNameWithMorphospecies: Removed no longer needed canon_taxonverbatim.family alternative, since the family will be included in the canon_taxonlabel.taxonomicname by the mappings
Aaron Marcuse-Kubitza
11:49 AM Revision 7420: schemas/vegbien.sql: analytical_stem_view: scientificNameWithMorphospecies: Fixed bug where need to use canon_*taxonlabel*.taxonomicname instead of canon_taxonverbatim.taxonomicname as one of the alternatives because only canon_taxonlabel.taxonomicname is guaranteed to be populated by the mappings, while canon_taxonverbatim.taxonomicname will only be populated if the datasource explicitly specifies that field. This distinction is only meaningful for data without a TNRS match, as TNRS supplies canon_taxonverbatim.taxonomicname.
Aaron Marcuse-Kubitza
11:28 AM Revision 7419: import_all: after_import(): Added wait on tnrs.make's lockfile to ensure that all background scrubbing processes are complete before creating the analytical DB
Aaron Marcuse-Kubitza
11:18 AM Revision 7418: import_all: Moved `waitpid $jobs` into after_import()
Aaron Marcuse-Kubitza

02/01/2013

04:57 PM Revision 7417: schemas/vegbien.ERD.mwb: Fixed table sizes
Aaron Marcuse-Kubitza
04:51 PM Revision 7416: schemas/vegbien.ERD.mwb: Regenerated exports
Aaron Marcuse-Kubitza
04:34 PM Revision 7415: schemas/vegbien.sql: removed all accessioncode fields, as VegBIEN does not use them
Aaron Marcuse-Kubitza
03:10 PM Revision 7414: Added inputs/FIA/_src/FIADB_version4.accdb and FIADB_version4.sql (created from it using Access To PostgreSQL and the additional transformations at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Tools#MS-Access-database-MDB>)
Aaron Marcuse-Kubitza

01/31/2013

08:20 PM Revision 7413: Added inputs/FIA/COND_unique/, generated from new FIA data
Aaron Marcuse-Kubitza
08:05 PM Revision 7412: inputs/FIA/FIA_COND_unique/create.sql: Fixed bug where need to remove `CREATE TABLE :table AS` at beginning because that is added by the make target
Aaron Marcuse-Kubitza
08:03 PM Revision 7411: inputs/FIA/geoscrub.~.clean_up.sql: Moved creation of FIA_COND_unique to FIA_COND_unique/create.sql
Aaron Marcuse-Kubitza
07:40 PM Revision 7410: README.TXT: Full database import: Updated time until import_all returns control to the shell to account for the TNRS names now being imported concurrently with the inputs rather than before them
Aaron Marcuse-Kubitza
07:31 PM Revision 7409: mappings/VegCore-VegBIEN.csv: Also include morphospecies in the accepted taxondetermination's taxonverbatim, so that it can easily be retrieved by the analytical DB views
Aaron Marcuse-Kubitza
07:15 PM Revision 7408: schemas/vegbien.sql: analytical_stem_view: scientificNameWithMorphospecies: Fixed bug where need to use the taxonName or scientificName when the name components are not provided, as is the case when there is no scrubbed taxondetermination (because TNRS returns no match)
Aaron Marcuse-Kubitza
06:08 PM Revision 7407: mappings/VegCore.csv: Regenerated from wiki. This adds Brad's DwC ID terms and their definitions in <https://projects.nceas.ucsb.edu/nceas/attachments/download/621/vegbien_identifier_examples.xlsx>.
Aaron Marcuse-Kubitza
05:06 PM Revision 7406: schemas/vegbien.ERD.mwb: Regenerated exports
Aaron Marcuse-Kubitza
04:04 PM Revision 7405: join: Added support for direct mappings to VegBIEN by passing through outputs that start with / (indicating an XPath rather than a term)
Aaron Marcuse-Kubitza
04:01 PM Revision 7404: mappings/VegCore.csv: Regenerated from wiki
Aaron Marcuse-Kubitza
11:38 AM Revision 7403: schemas/vegbien.sql: analytical_stem_view: Added family_matched, taxonName_matched, scientificNameAuthorship_matched
Aaron Marcuse-Kubitza
11:02 AM Revision 7402: schemas/vegbien.sql: analytical_stem_view: Added family_verbatim, scientificName_verbatim, scientificNameAuthorship_verbatim from datasource taxondetermination
Aaron Marcuse-Kubitza
10:57 AM Revision 7401: mappings/VegCore.csv: Regenerated from wiki
Aaron Marcuse-Kubitza
10:30 AM Revision 7400: schemas/vegbien.sql: analytical_stem_view: Fixed bug where need to use identifiedBy and dateIdentified from the *datasource* taxondetermination rather than the canonical taxondetermination (whichever taxondetermination is most scrubbed)
Aaron Marcuse-Kubitza
10:23 AM Revision 7399: schemas/vegbien.sql: taxondetermination: taxondetermination_set_iscurrent(): is_datasource_current: Fixed bug where need to filter out determinationtypes for matched/accepted determinations, which are not datasource determinations
Aaron Marcuse-Kubitza
10:19 AM Revision 7398: schemas/vegbien.sql: taxondetermination: taxondetermination_set_iscurrent(): Fixed bug where need to also set existing datasource_current taxondetermination's is_datasource_current to false
Aaron Marcuse-Kubitza
08:52 AM Revision 7397: xml_dom.py: replace_with_text(): Added support for all scalar (non-Node) types, which will be stringified using strings.ustr()
Aaron Marcuse-Kubitza
03:52 AM Revision 7396: schemas/functions.sql: Added _fix_date()
Aaron Marcuse-Kubitza
02:49 AM Revision 7395: sql_io.py: put_table(): Documented that much of the complexity of the normalizing algorithm is due to PostgreSQL not having a native command for insert/on duplicate select
Aaron Marcuse-Kubitza
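Note on r7395: the "insert/on duplicate select" pattern can be sketched as follows. This is a minimal illustration, not the put_table() implementation: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the `taxon` table, its columns, and `insert_or_select()` are hypothetical names chosen for the example.

```python
import sqlite3

def insert_or_select(conn, name):
    """Return the id of the row with this name, inserting it if it is new."""
    try:
        cur = conn.execute("INSERT INTO taxon (name) VALUES (?)", (name,))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # The unique constraint rejected the insert, so select the existing row
        row = conn.execute(
            "SELECT id FROM taxon WHERE name = ?", (name,)
        ).fetchone()
        return row[0]

# Hypothetical table with a unique constraint on the lookup column
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
first = insert_or_select(conn, "Quercus alba")
again = insert_or_select(conn, "Quercus alba")  # duplicate: same id comes back
other = insert_or_select(conn, "Acer rubrum")
```

Because the fallback select looks the row up by the values that were supplied, a trigger that fills in a unique-constrained field behind the scenes would make that lookup miss, which is consistent with the limitation documented in r7392.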
02:24 AM Revision 7394: sql_io.py: put_table(): Corrected "insert/if not exists get" to "insert/on duplicate select"
Aaron Marcuse-Kubitza
01:52 AM Revision 7393: sql_io.py: put_table(): Removed no longer applicable requirement that it be run at the beginning of a transaction, which was only required when the output table was locked during the function call
Aaron Marcuse-Kubitza
01:48 AM Revision 7392: sql_io.py: put_table(): Documented that the function's insert/if not exists get algorithm does not support database triggers that populate fields covered by a unique constraint
Aaron Marcuse-Kubitza
01:42 AM Revision 7391: inputs/FIA/_src/_README.TXT: Documented that FIA does not provide data for some states, e.g. HI
Aaron Marcuse-Kubitza

01/30/2013

10:48 PM Revision 7390: config/: Set svn:ignore to exclude *password files
Aaron Marcuse-Kubitza
10:41 PM Revision 7389: Removing config/bien_read_password from version control
Aaron Marcuse-Kubitza
10:30 PM Revision 7388: Removing config/bien_password from version control
Aaron Marcuse-Kubitza

01/29/2013

03:26 PM Revision 7387: inputs/FIA/: Added refreshed data (not yet mapped)
Aaron Marcuse-Kubitza
03:15 PM Revision 7386: input.Makefile: Existing maps discovery: $(exts): Also match uppercase versions of extensions
Aaron Marcuse-Kubitza
03:12 PM Revision 7385: lib/common.Makefile: Added $(ucase) and $(ci)
Aaron Marcuse-Kubitza
01:56 PM Revision 7384: inputs/FIA/_src/Makefile: Table bundling: $(tableCsvs): Fixed bug where need to replace % with $* in $(csvPattern)
Aaron Marcuse-Kubitza
01:15 PM Revision 7383: inputs/FIA/_src/Makefile: Table bundling: Fixed bug where need to remove trailing slashes from dirs that will match a target pattern
Aaron Marcuse-Kubitza
01:09 PM Revision 7382: inputs/FIA/_src/Makefile: Added Table bundling targets to regroup CSVs by tables
Aaron Marcuse-Kubitza
01:09 PM Revision 7381: lib/common.Makefile: Added $(mkdir)
Aaron Marcuse-Kubitza
11:02 AM Revision 7380: Added inputs/FIA/_src/_README.TXT with Bob's comments
Aaron Marcuse-Kubitza
11:02 AM Revision 7379: input.Makefile: SVN: $(_svnFilesGlob): Added README.TXT
Aaron Marcuse-Kubitza
10:33 AM Revision 7378: mappings/VegCore.csv: Regenerated from wiki. Synonym lists have now been translated to sections to create a web page anchor for each synonym, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore_refactoring#Index-synonyms-as-web-page-anchors>. This enables searching for VegCore synonyms in the data dictionary as well as terms, and makes it possible to swap a term and a synonym while still keeping both as indexed anchors.
Aaron Marcuse-Kubitza
06:19 AM Revision 7377: mappings/VegCore.csv: Regenerated from wiki. All uncategorized terms have now been moved to tables.
Aaron Marcuse-Kubitza
06:19 AM Revision 7376: README.TXT: Maintenance: VegCore data dictionary: Added steps to check that no terms were lost when moving terms
Aaron Marcuse-Kubitza

01/28/2013

05:13 PM Revision 7375: inputs/import.stats.xls: Updated import times
Aaron Marcuse-Kubitza
05:12 PM Revision 7374: mappings/VegCore.csv: Regenerated from wiki
Aaron Marcuse-Kubitza

01/25/2013

03:54 PM Revision 7373: schemas/vegbien.sql: analytical_stem_view: coordinates: Only use county_centroids coordinates when datasource coordinates are not provided, not also when datasource coordinates aren't geovalid. This also fixes a bug where (NULL) county_centroids coordinates were used for non-geovalid coordinates even when there was no county_centroids match, rather than including the non-geovalid coordinates.
Aaron Marcuse-Kubitza
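Note on r7373: the coordinate selection rule described above can be sketched in Python as below. This is an illustrative model only, not the view's SQL; `Coords`, `choose_coordinates()`, and the two source labels are assumptions made for the example.

```python
from typing import NamedTuple, Optional, Tuple

class Coords(NamedTuple):
    latitude: float
    longitude: float

def choose_coordinates(
    datasource: Optional[Coords], centroid: Optional[Coords]
) -> Tuple[Optional[Coords], Optional[str]]:
    """Return (coordinates, source label) following the r7373 rule."""
    if datasource is not None:
        # Datasource coordinates are kept even when they are not geovalid
        return datasource, "source data"
    if centroid is not None:
        # Fall back to the county centroid only when nothing was provided
        return centroid, "county centroid"
    return None, None

# A non-geovalid point (latitude out of range) is still passed through
bad_point = Coords(123.0, -70.0)
centroid = Coords(35.2, -80.8)
kept = choose_coordinates(bad_point, centroid)
fallback = choose_coordinates(None, centroid)
nothing = choose_coordinates(None, None)
```

Geovalidity deliberately does not appear in the function: under this rule, only the absence of datasource coordinates triggers the fallback.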
03:34 PM Revision 7372: mappings/VegCore.csv: Regenerated from wiki
Aaron Marcuse-Kubitza
11:27 AM Revision 7371: schemas/vegbien.sql: taxondetermination: Added is_datasource_current, which is autopopulated to the most recent *datasource-provided* taxondetermination
Aaron Marcuse-Kubitza
11:07 AM Revision 7370: schemas/vegbien.sql: taxondetermination: Added taxondetermination_single_accepted_determination unique index to facilitate joining on the accepted determination
Aaron Marcuse-Kubitza
11:05 AM Revision 7369: schemas/vegbien.sql: taxondetermination: Added taxondetermination_single_matched_determination unique index to facilitate joining on the matched determination
Aaron Marcuse-Kubitza
10:32 AM Revision 7368: schemas/vegbien.sql: taxondetermination: Removed notespublic, notesmgt, which are not used by VegBIEN
Aaron Marcuse-Kubitza
09:30 AM Revision 7367: schemas/vegbien.sql: taxon_trait_view: scientificName: Use taxonverbatim.taxonname when taxonlabel/taxonverbatim.taxonomicname are not provided, to accommodate TNRS names. This is part of the workaround for the bug where the taxonlabel's taxonomicname (concatenated taxonomicname) is occasionally not populated.
Aaron Marcuse-Kubitza
09:10 AM Revision 7366: schemas/vegbien.sql: taxon_trait_view: Added workaround for bug where the taxonlabel's taxonomicname (concatenated taxonomicname) is occasionally not populated due to a taxonlabel constraint violation, by using the taxonverbatim's taxonomicname instead in these cases. This bug, which appeared in the r7317 import, is so far not reproducible (tested on Mac OS X), so its cause is unknown, but may be caused by a bug in functions._merge_prefix(), which is run on the taxonlabel's taxonomicname but not the taxonverbatim's taxonomicname.
Aaron Marcuse-Kubitza

01/24/2013

09:51 PM Revision 7365: schemas/vegbien.sql: analytical_stem_view: Added dateIdentified, identificationRemarks per Brad's request (https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Spot-checking#E-mail-on-2013-1-16)
Aaron Marcuse-Kubitza
09:40 PM Revision 7364: inputs/FIA/_src/Makefile: Added extraction targets to extract zip archives
Aaron Marcuse-Kubitza
09:07 PM Revision 7363: inputs/FIA/_src/download: Use new Makefile, which uses make logic to determine if a file needs to be downloaded
Aaron Marcuse-Kubitza
09:05 PM Revision 7362: Added inputs/FIA/_src/Makefile, with targets to download each zip archive
Aaron Marcuse-Kubitza
08:00 PM Revision 7361: schemas/vegbien.sql: analytical_stem_view: derived terms: Added _bien suffix per Brad's request (https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Spot-checking#Brad-Boyles-comments)
Aaron Marcuse-Kubitza
03:22 PM Revision 7360: Added inputs/FIA/_src/FIADB_version4.accdb.url
Aaron Marcuse-Kubitza
03:18 PM Revision 7359: inputs/FIA/_src/download: Only run wget on files that don't yet exist
Aaron Marcuse-Kubitza
03:16 PM Revision 7358: inputs/FIA/_src/download: Run wget in same directory as script to ensure files get downloaded there
Aaron Marcuse-Kubitza
03:06 PM Revision 7357: inputs/FIA/_src/download: Set svn:executable
Aaron Marcuse-Kubitza
03:04 PM Revision 7356: Added inputs/FIA/_src/download to download archives of CSVs for each state
Aaron Marcuse-Kubitza
03:03 PM Revision 7355: to_do/timeline.2013.xls: Updated with changes during conference call
Aaron Marcuse-Kubitza
09:46 AM Revision 7354: schemas/vegbien.sql: taxon_trait_view: Renamed datasource_taxonverbatim to taxonverbatim because there is now only one taxonverbatim
Aaron Marcuse-Kubitza
09:31 AM Revision 7353: schemas/vegbien.sql: taxon_trait_view: Moved the taxondetermination.iscurrent filter to the join condition to allow using the taxondetermination_single_current_determination index
Aaron Marcuse-Kubitza
09:24 AM Revision 7352: schemas/vegbien.sql: taxon_trait_view: Join only on the primary taxonlabel, not the accepted taxonlabel, because the scrubbed name is now available directly via the taxonlabel attached to the scrubbed taxondetermination
Aaron Marcuse-Kubitza
09:11 AM Revision 7351: schemas/vegbien.sql: analytical_stem_view: Added locality
Aaron Marcuse-Kubitza
08:18 AM Revision 7350: inputs/UNCC/Specimen/map.csv: accession: Remapped to catalogNumber per Bob's corrections
Aaron Marcuse-Kubitza

01/23/2013

10:31 PM Revision 7349: schemas/vegbien.ERD.mwb: Regenerated exports
Aaron Marcuse-Kubitza
10:25 PM Revision 7348: mappings/VegCore.csv: Regenerated from wiki
Aaron Marcuse-Kubitza
10:01 PM Revision 7347: README.TXT: Schema changes: Added instructions to run the appropriate sync function when changing the analytical views
Aaron Marcuse-Kubitza
09:56 PM Revision 7346: schemas/vegbien.sql: analytical_stem_view: Added georeferenceProtocol, which is set to 'county centroid' when county centroid coordinates are used
Aaron Marcuse-Kubitza
08:12 PM Revision 7345: make_analytical_db: Don't run export_analytical_db if the SQL script exits with an error
Aaron Marcuse-Kubitza
08:04 PM Revision 7344: README.TXT: Full database import: record the import times in inputs/import.stats.xls: Added `export version=<version>` because import_times may be run in a shell different from the one that the import was run in
Aaron Marcuse-Kubitza
08:03 PM Revision 7343: inputs/import.stats.xls: Updated import times
Aaron Marcuse-Kubitza

01/22/2013

07:43 PM Revision 7342: schemas/vegbien.sql: taxonverbatim: taxonverbatim_unique: Added morphoname for cases when there is just a morphoname, and to distinguish taxonverbatims with the same taxonlabel but different morphonames
Aaron Marcuse-Kubitza
07:43 PM Revision 7341: schemas/vegbien.sql: stemobservation: Added stemobservation_non_empty CHECK constraint to prevent creating an empty stemobservation for plantobservation rows without stem *data* but with stem *mappings*
Aaron Marcuse-Kubitza
07:36 PM Revision 7340: schemas/vegbien.sql: stemobservation: Added stemobservation_non_empty CHECK constraint to prevent creating an empty stemobservation for plantobservation rows without stem *data* but with stem *mappings*
Aaron Marcuse-Kubitza
07:34 PM Revision 7339: schemas/vegbien.sql: stemobservation: Added stemobservation_non_empty CHECK constraint to prevent creating an empty stemobservation for plantobservation rows without stem *data* but with stem *mappings*
Aaron Marcuse-Kubitza
07:16 PM Revision 7338: schemas/vegbien.sql: taxonverbatim: taxonverbatim_unique: Added morphoname for cases when there is just a morphoname, and to distinguish taxonverbatims with the same taxonlabel but different morphonames
Aaron Marcuse-Kubitza
07:11 PM Revision 7337: schemas/vegbien.sql: taxonverbatim: Allow taxonlabel_id to be NULL when morphoname is provided
Aaron Marcuse-Kubitza
07:09 PM Revision 7336: schemas/vegbien.sql: taxonverbatim: Allow taxonlabel_id to be NULL when morphoname is provided
Aaron Marcuse-Kubitza
07:04 PM Revision 7335: schemas/vegbien.sql: taxonverbatim: Added source_id to allow creating taxonverbatims without a (scoping) taxonlabel
Aaron Marcuse-Kubitza
05:34 PM Revision 7334: schemas/vegbien.sql: analytical_stem_view: Removed speciesBinomialWithMorphospecies now that it's duplicated by scientificNameWithMorphospecies
Aaron Marcuse-Kubitza
05:28 PM Revision 7333: schemas/vegbien.sql: analytical_stem_view: scientificNameWithMorphospecies: Create it using the speciesBinomialWithMorphospecies formula, per Brad's request at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Spot-checking#2013-1-18>
Aaron Marcuse-Kubitza
05:05 PM Revision 7332: schemas/vegbien.sql: analytical_stem_view: Added coordinateSource to indicate whether coordinates are from county_centroids (georeferencing) or the source data
Aaron Marcuse-Kubitza
05:00 PM Revision 7331: schemas/vegbien.sql: Added coordinatesource enum
Aaron Marcuse-Kubitza
04:50 PM Revision 7330: mappings/VegCore.csv: Regenerated from wiki
Aaron Marcuse-Kubitza
04:34 PM Revision 7329: schemas/vegbien.sql: analytical_stem_view: coordinates: Also use the county_centroids coordinates when the datasource coordinates are not geovalid. (Note that canon_place.geovalid will be NULL, i.e. not true, when the datasource coordinates are NULL.)
Aaron Marcuse-Kubitza
04:28 PM Revision 7328: schemas/vegbien.sql: scientificName: Set to taxonverbatim.taxonname instead per Brad's changes at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Spot-checking#2013-1-18>. Renamed to taxonName since this now doesn't include the author, which is part of DwC's scientificName field.
Aaron Marcuse-Kubitza
03:55 PM Revision 7327: schemas/vegbien.sql: sync_analytical_stem_to_view(): Support running the function when dependent views do not exist. This allows using the sync function when changing column names of the analytical_stem_view, which sometimes requires manually dropping and re-creating the analytical_aggregate_view.
Aaron Marcuse-Kubitza
02:49 PM Revision 7326: backups/Makefile: %.md5/test: Added comment to run with `make -s` to avoid echoing make commands
Aaron Marcuse-Kubitza
02:42 PM Revision 7325: README.TXT: Full database import: Added steps to scrub unscrubbed taxondeterminations (if they are not scrubbed automatically)
Aaron Marcuse-Kubitza
02:06 PM Revision 7324: inputs/.geoscrub/_src/README.TXT: Added e-mails from Jim about how the county_centroids data was generated
Aaron Marcuse-Kubitza
01:18 PM Revision 7323: schemas/vegbien.sql: analytical_stem_view: coordinates: Use new county_centroids coordinates and uncertainty when the datasource's coordinates are not available
Aaron Marcuse-Kubitza
01:10 PM Revision 7322: Added inputs/.geoscrub/county_centroids/ from Jim
Aaron Marcuse-Kubitza
01:09 PM Revision 7321: inputs/.geoscrub/import_order.txt: Added geoscrub_output
Aaron Marcuse-Kubitza
12:24 PM Revision 7320: inputs/import.stats.xls: Updated import times
Aaron Marcuse-Kubitza
12:19 PM Revision 7319: README.TXT: Full database import: In PostgreSQL: Added step to check that there are TNRS taxondeterminations
Aaron Marcuse-Kubitza
12:17 PM Revision 7318: README.TXT: Full database import: In PostgreSQL: Added step to check that unscrubbed_taxondetermination_view returns no rows
Aaron Marcuse-Kubitza

01/18/2013

02:37 PM Revision 7317: Added inputs/newWorld/newWorldCountries/_no_import
Aaron Marcuse-Kubitza
02:33 PM Revision 7316: to_do/timeline.2013.xls: Updated with Brad's modifications
Aaron Marcuse-Kubitza
02:18 PM Revision 7315: Added inputs/FIA/_src/FIA_summary.b-e.00079.pdf from Bob
Aaron Marcuse-Kubitza
02:07 PM Revision 7314: Added inputs/.herbaria/_archive/
Aaron Marcuse-Kubitza
01:02 PM Revision 7313: inputs/.herbaria/: Removed no longer needed geoscrub.*.sql, which has been replaced with bien3_adb.*.sql
Aaron Marcuse-Kubitza
01:00 PM Revision 7312: inputs/.herbaria/: Removed no longer needed herbaria/. Use ih/ instead.
Aaron Marcuse-Kubitza
12:58 PM Revision 7311: Added inputs/.herbaria/ih/ and corresponding bien3_adb MySQL export
Aaron Marcuse-Kubitza
12:43 PM Revision 7310: mappings/VegCore-VegBIEN.csv: Don't create NCBI crosslinks for the matched taxonomic name. These crosslinks are no longer needed now that TNRS provides a separate accepted name on which crosslinks can be made.
Aaron Marcuse-Kubitza
12:32 PM Revision 7309: schemas/vegbien.sql: unscrubbed_taxondetermination_view: Include the accepted name's row next to the matched name's row instead of merging the two together into one TNRS row, to allow including separate taxondeterminations for the matched and accepted names. Added Max_score from TNRS.tnrs.
Aaron Marcuse-Kubitza
12:25 PM Revision 7308: schemas/vegbien.sql: taxondetermination_set_iscurrent(): Added new determinationtype accepted to sort order
Aaron Marcuse-Kubitza
12:01 PM Revision 7307: mappings/VegCore-VegBIEN.csv: Mapped accepted* taxonomic name, now to separate accepted taxondetermination
Aaron Marcuse-Kubitza
11:35 AM Revision 7306: mappings/VegCore.csv: Regenerated from wiki
Aaron Marcuse-Kubitza
11:20 AM Revision 7305: schemas/vegbien.sql: taxondetermination_set_iscurrent(): Changed TNRS determinationtype from computer to matched, to allow for a separate accepted determinationtype
Aaron Marcuse-Kubitza
10:57 AM Revision 7304: schemas/vegbien.sql: taxonlabel: Removed creationdate, which duplicates taxondetermination.determinationdate
Aaron Marcuse-Kubitza
10:08 AM Revision 7303: schemas/vegbien.sql: analytical_stem_view: isNewWorld: Removed no longer needed COALESCE() to false, because newWorldCountries now uses false where applicable instead of NULL. This also ensures that isNewWorld will be NULL if there is no country name to test, which was not the case in the previous workaround.
Aaron Marcuse-Kubitza
10:02 AM Revision 7302: Added inputs/newWorld/newWorldCountries/ with postprocess.sql that sets isNewWorld to false wherever it's NULL. (The input table only marks New World countries as true, but doesn't mark non-New World countries as false.)
Aaron Marcuse-Kubitza
09:50 AM Revision 7301: schemas/vegbien.sql: analytical_stem_view: isNewWorld: Fixed bug where need to COALESCE() "newWorldCountries"."isNewWorld" to false, because it is only set to a boolean for countries that are New World
Aaron Marcuse-Kubitza
09:19 AM Revision 7300: README.TXT: Full database import: freeing disk space: Updated import schema size, which is smaller due to the removed CTFS staging tables, removed duplicate rows, and possibly fewer index holes
Aaron Marcuse-Kubitza
08:56 AM Revision 7299: README.TXT: Full database import: After running `make schemas/$version/publish`, added `unset version` to make sure future version-dependent commands use the public schema
Aaron Marcuse-Kubitza
08:50 AM Revision 7298: schemas/vegbien.sql: taxon_trait_view: Fixed bug where measurementUnit needed to be set to trait.units, not name
Aaron Marcuse-Kubitza
08:42 AM Revision 7297: schemas/vegbien.sql: provider_count_view: Don't set default values for sourcetype/observationtype, because the appropriate values are now set for all top-level inputs and these defaults are not applicable for data owners not in geoscrub.herbaria
Aaron Marcuse-Kubitza
08:41 AM Revision 7296: inputs/bien2_traits/Source/map.csv: Mapped observationType
Aaron Marcuse-Kubitza
08:27 AM Revision 7295: schemas/vegbien.sql: taxondetermination: Removed taxondetermination_computer_min_fit CHECK constraint, whose functionality is now duplicated by unscrubbed_taxondetermination_view's Max_score filter condition. The score threshold value should only be maintained in one place, namely unscrubbed_taxondetermination_view.
Aaron Marcuse-Kubitza
08:23 AM Revision 7294: schemas/vegbien.sql: unscrubbed_taxondetermination_view: Fixed bug where need to filter out any names that will be rejected by taxondetermination's constraints, because otherwise, these names will stay in unscrubbed_taxondetermination_view and be repeatedly reimported
Aaron Marcuse-Kubitza
07:38 AM Revision 7293: inputs/.TNRS/schema.sql: tnrs: Added Max_score column for use in filtering out names that will be rejected by taxondetermination's constraints
Aaron Marcuse-Kubitza
07:22 AM Revision 7292: inputs/.TNRS/schema.sql: Renamed tnrs_populate_accepted_scientific_name() trigger to tnrs_populate_derived_fields() to accommodate additional derived fields
Aaron Marcuse-Kubitza
07:14 AM Revision 7291: tnrs_db: Support multiple appended columns in the tnrs table
Aaron Marcuse-Kubitza
07:13 AM Revision 7290: csvs.py: ColInsertFilter: Support adding multiple, consecutive columns
Aaron Marcuse-Kubitza
06:30 AM Revision 7289: schemas/functions.sql: _max(), _min(): Put $n params all on one line to match other aggregating functions
Aaron Marcuse-Kubitza
06:28 AM Revision 7288: schemas/functions.sql: _max(), _min(): Use PostgreSQL built-in functions GREATEST(), LEAST() instead of a query with aggregating functions
Aaron Marcuse-Kubitza
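Note on r7288: PostgreSQL's GREATEST() and LEAST() ignore NULL arguments and return NULL only when every argument is NULL. A small Python model of that behavior, with `greatest()` and `least()` as hypothetical helper names and None standing in for NULL:

```python
def greatest(*values):
    """Largest non-None value, or None if every value is None."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None

def least(*values):
    """Smallest non-None value, or None if every value is None."""
    non_null = [v for v in values if v is not None]
    return min(non_null) if non_null else None

mixed_max = greatest(3, None, 7)
mixed_min = least(3, None, 7)
all_null = greatest(None, None)
```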
06:02 AM Revision 7287: README.TXT: Added Single datasource import section with commands to import/reimport/scrub just a datasource rather than the full DB
Aaron Marcuse-Kubitza
05:54 AM Revision 7286: schemas/vegbien.sql: taxondetermination: taxondetermination_set_iscurrent_on_delete() trigger: Fixed bug where need to suppress any foreign key exception, which occurs during a cascading delete because the associated taxonoccurrence has already been deleted, preventing any other taxondeterminations of that taxonoccurrence from being updated
Aaron Marcuse-Kubitza
05:35 AM Revision 7285: input.Makefile: Taxonomic scrubbing: Added reimport_scrub
Aaron Marcuse-Kubitza
05:34 AM Revision 7284: input.Makefile: Import to VegBIEN: Added reimport
Aaron Marcuse-Kubitza
05:28 AM Revision 7283: input.Makefile: Taxonomic scrubbing: Added rescrub
Aaron Marcuse-Kubitza
05:21 AM Revision 7282: input.Makefile: Taxonomic scrubbing: Added scrub target and use it in import_scrub
Aaron Marcuse-Kubitza
05:18 AM Revision 7281: input.Makefile: Import to VegBIEN: Moved import, rm to top of section since they are top-level targets and don't depend on the variables defined for %/import
Aaron Marcuse-Kubitza
05:17 AM Revision 7280: input.Makefile: Moved rm to Import to VegBIEN section
Aaron Marcuse-Kubitza
05:16 AM Revision 7279: input.Makefile: Moved taxonomic scrubbing targets to separate Taxonomic scrubbing section
Aaron Marcuse-Kubitza
04:43 AM Revision 7278: inputs/import.stats.xls: Updated import times
Aaron Marcuse-Kubitza
03:34 AM Revision 7277: schemas/vegbien.sql: provider_count_view: Include only sources with at least one row. Currently (as of r7023), all entries in BIEN2's geoscrub.herbaria are also in VegBIEN, so the filter is not yet necessary, but switching to bien3_adb.ih could create source entries without data rows which should be excluded from the providers list.
Aaron Marcuse-Kubitza
03:25 AM Revision 7276: import_all: Output the PIDs of the import_scrub and after_import processes, so those processes can be managed without shell job control. This is useful if the connection is lost to the remote shell running the import, which prevents using job control on the import processes.
Aaron Marcuse-Kubitza
01:23 AM Revision 7275: input.Makefile: Import to VegBIEN: import_scrub: Run `make scrub` in the background, to allow the import to continue with the next table rather than having to wait for the current table to be scrubbed
Aaron Marcuse-Kubitza
12:53 AM Revision 7274: inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Moved waitself call to top of script
Aaron Marcuse-Kubitza
12:52 AM Revision 7273: inputs/import.stats.xls: Updated import times
Aaron Marcuse-Kubitza
12:24 AM Revision 7272: inputs/import.stats.xls: Added Postprocessing section for use with the next import
Aaron Marcuse-Kubitza
12:05 AM Revision 7271: inputs/import.stats.xls: Updated import times. Total does not yet include postprocessing.
Aaron Marcuse-Kubitza

01/17/2013

11:29 PM Revision 7270: import_times: Add blank line before "Postprocessing logs" to separate it from the input logs
Aaron Marcuse-Kubitza
11:28 PM Revision 7269: import_times: Separate out the postprocessing logs (e.g. public.unscrubbed_taxondetermination_view), as the import times in these logs are not aggregated together (each input has its own run of the postprocessing script)
Aaron Marcuse-Kubitza

01/16/2013

02:55 PM Revision 7268: root Makefile: Datasources: import: Use new import_scrub instead of import (input.Makefile)
Aaron Marcuse-Kubitza
02:51 PM Revision 7267: import_all: Use new import_scrub (input.Makefile) instead of import, which avoids needing to start background processes for tnrs-remake and scrub-remake
Aaron Marcuse-Kubitza
02:50 PM Revision 7266: inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Fixed bug where need to use tnrs.make's lockfile instead because can't be importing while tnrs.make is scrubbing. tnrs.make leaves tnrs in an incomplete state while running because the accepted names are parsed *after* their matched names. Using a separate lockfile would cause some accepted names to be missing.
Aaron Marcuse-Kubitza
02:27 PM Revision 7265: input.Makefile: Import to VegBIEN: Added import_scrub, which runs `make scrub` after the import
Aaron Marcuse-Kubitza
02:26 PM Revision 7264: root Makefile: Datasources: Added scrub, which runs tnrs-remake and scrub-remake
Aaron Marcuse-Kubitza
02:18 PM Revision 7263: inputs/.TNRS/*/*.make: Only allow one instance of the script to be running at any time, by using new waitself
Aaron Marcuse-Kubitza
02:15 PM Revision 7262: waitpid, lockfile: Changed $interval default to 5s to work with smaller imports, where less waiting is needed
Aaron Marcuse-Kubitza
02:14 PM Revision 7261: Added waitself
Aaron Marcuse-Kubitza
02:11 PM Revision 7260: bin/lockfile: Include the PID in the lockfile to avoid the need to manually remove lockfiles. On Mac, this requires using shlock instead of lockfile.
Aaron Marcuse-Kubitza
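Note on r7260: storing the owner's PID in the lockfile lets a later process detect a stale lock on its own. The sketch below shows the general idea in Python for POSIX systems; it is not the bin/lockfile script, and `pid_alive()` and `acquire()` are hypothetical names.

```python
import os

def pid_alive(pid):
    """Return True if a process with this PID currently exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def acquire(path):
    """Try to take the lock at path; return True on success."""
    while True:
        try:
            # O_EXCL makes creation fail if the lockfile already exists
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                with open(path) as f:
                    owner = int(f.read().strip())
            except (FileNotFoundError, ValueError):
                return False  # lock vanished or is half-written; retry later
            if pid_alive(owner):
                return False  # held by a running process
            # Stale lock left by a dead process: remove it and try again
            try:
                os.remove(path)
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
```

The stale-lock removal here is not race-free if two waiters notice the dead owner at the same moment, which is one reason to prefer a dedicated tool such as the ones the commit mentions.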
01:35 PM Revision 7259: Added bin/lockfile
Aaron Marcuse-Kubitza
01:34 PM Revision 7258: Added pid2name
Aaron Marcuse-Kubitza
01:33 PM Revision 7257: Added name2pids
Aaron Marcuse-Kubitza
01:33 PM Revision 7256: waitpid: Use `ps` instead of /proc to also work on Mac
Aaron Marcuse-Kubitza
01:07 PM Revision 7255: inputs/.TNRS/tnrs/tnrs.make: Fixed bug where need special handling to support being run as a .make script
Aaron Marcuse-Kubitza
11:59 AM Revision 7254: inputs/.geoscrub/_src/README.TXT: Added dates for e-mails from Jim
Aaron Marcuse-Kubitza
11:57 AM Revision 7253: inputs/.geoscrub/_src/README.TXT: Added e-mail from Jim about repository with scripts to generate the geoscrub_output table
Aaron Marcuse-Kubitza
11:02 AM Revision 7252: schemas/vegbien.sql: unscrubbed_taxondetermination_view: Fixed bug where need to use tnrs_accepted.Name_submitted IS NOT NULL rather than tnrs_accepted.* IS NOT NULL, because tnrs_accepted.* (which plain tnrs_accepted gets changed to by PostgreSQL) checks *each field* of the tnrs_accepted tuple rather than checking if the tuple itself is NULL
Aaron Marcuse-Kubitza
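Note on r7252: for a row-valued expression, PostgreSQL's IS NOT NULL is true only when every field is non-null, and IS NULL is true only when every field is null, so a partly filled row satisfies neither test. A Python model of those semantics, with tuples standing in for rows, None for NULL, and hypothetical function names:

```python
def row_is_null(row):
    """Model of `row IS NULL`: the row is null or all of its fields are."""
    return row is None or all(field is None for field in row)

def row_is_not_null(row):
    """Model of `row IS NOT NULL`: the row and all of its fields are non-null."""
    return row is not None and all(field is not None for field in row)

# A joined row that matched but has one empty field (second field is made up)
partial = ("Poa annua", None)
neither = (row_is_null(partial), row_is_not_null(partial))

# Testing a single column that is always filled on a match avoids the problem
name_submitted = partial[0]
matched = name_submitted is not None
```

This is why checking one reliably populated column, as the commit does with Name_submitted, is a safer test for "the outer join found a row".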
10:23 AM Revision 7251: inputs/.TNRS/schema.sql: Added back tnrs+accepted view, which is useful for debugging the import of the TNRS results
Aaron Marcuse-Kubitza
09:21 AM Revision 7250: inputs/REMIB/Specimen/postprocess.sql: Added back ARIZ, NY because some REMIB specimens for these datasources are not yet in the datasources themselves
Aaron Marcuse-Kubitza
08:43 AM Revision 7249: Added inputs/REMIB/Specimen/postprocess.sql to remove institutions that we have direct data for
Aaron Marcuse-Kubitza
08:43 AM Revision 7248: Placed inputs/REMIB/_archive/ under version control
Aaron Marcuse-Kubitza
08:23 AM Revision 7247: Added inputs/SpeciesLink/Specimen/postprocess.sql to remove institutions that we have direct data for
Aaron Marcuse-Kubitza
08:21 AM Revision 7246: Placed inputs/SpeciesLink/_archive/ under version control
Aaron Marcuse-Kubitza
07:56 AM Revision 7245: input.Makefile: $(import?): Renamed $public_import option to $full_import because it applies to any import of all datasources, not just a public import on vegbiendev
Aaron Marcuse-Kubitza
07:23 AM Revision 7244: schemas/vegbien.sql: analytical_stem_view: Changed `WHERE COALESCE(taxondetermination.iscurrent, true)` to a join condition to enable using the taxondetermination_single_current_determination index, which produces the filtered rows directly. Note that this index will not be used for full-database imports, because the query planner uses hash joins everywhere instead of nested loops.
Aaron Marcuse-Kubitza
06:47 AM Revision 7243: db_xml.py: put_table(): Fixed bug where for views, shouldn't advance start (OFFSET clause) after each chunk, because views are typically dynamic and will contain a new set of rows after the first set is imported
Aaron Marcuse-Kubitza
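Note on r7243: the OFFSET problem can be reproduced with a small Python simulation. This is a sketch under assumptions, not the db_xml.py code: the "view" is modeled as a list from which rows disappear once imported, and `import_chunks()` is a hypothetical name.

```python
def import_chunks(pending, chunk_size, advance_offset):
    """Import rows in chunks from a list that shrinks as rows are imported."""
    imported = []
    offset = 0
    while True:
        chunk = pending[offset:offset + chunk_size]
        if not chunk:
            break
        imported.extend(chunk)
        # Imported rows drop out of the source, as they would from a dynamic view
        for row in chunk:
            pending.remove(row)
        if advance_offset:
            offset += chunk_size
    return imported

# Advancing the offset skips the rows that slid into the vacated positions
skipped_some = import_chunks([1, 2, 3, 4, 5, 6], 2, advance_offset=True)
# Keeping the offset at 0 picks up every row
got_all = import_chunks([1, 2, 3, 4, 5, 6], 2, advance_offset=False)
```

With the offset advancing, rows 3 and 4 are never read in this run; holding the offset fixed processes all six.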
06:41 AM Revision 7242: sql.py: Added view_exists()
Aaron Marcuse-Kubitza
06:16 AM Revision 7241: inputs/.TNRS/schema.sql: Removed no longer used tnrs_canon. unscrubbed_taxondetermination_view uses its definition directly instead.
Aaron Marcuse-Kubitza
06:14 AM Revision 7240: schemas/vegbien.sql: unscrubbed_taxondetermination_view: Added comment from tnrs_canon
Aaron Marcuse-Kubitza
06:12 AM Revision 7239: schemas/vegbien.sql: unscrubbed_taxondetermination_view: Added comment from tnrs_canon
Aaron Marcuse-Kubitza
06:09 AM Revision 7238: schemas/vegbien.sql: unscrubbed_taxondetermination_view: Do the tnrs_canon joins manually instead of using tnrs_canon, to allow PostgreSQL to use a nested loop join on just the needed tnrs rows instead of a hash self-join of all tnrs rows. The query planner is not yet advanced enough to automatically integrate the select on the view into the top-level joins list, which would make this change automatically.
Aaron Marcuse-Kubitza
05:52 AM Revision 7237: inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: rowsAdded(): Look at last 100 rows instead of last 10, because rows are added to the log file each time the script waits and the Inserted # new rows message must be in the tailed rows
Aaron Marcuse-Kubitza
05:48 AM Revision 7236: inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: rowsAdded(): Fixed bug where need to test if log file exists before using it in tail, because if tail fails and causes rowsAdded to return false, this error exit status will be indistinguishable from false for no rows added and the script will keep going
Aaron Marcuse-Kubitza
05:40 AM Revision 7235: inputs/.TNRS/public.unscrubbed_taxondetermination_view/scrub.make: Fixed bug where need special handling to support being run as a .make script
Aaron Marcuse-Kubitza
03:35 AM Revision 7234: input.Makefile: Editing import: Added unscrub to remove TNRS taxondeterminations
Aaron Marcuse-Kubitza
03:34 AM Revision 7233: psql_script_vegbien: Added no_query_results option to hide results of calls to void functions
Aaron Marcuse-Kubitza
03:33 AM Revision 7232: schemas/vegbien.sql: Added delete_scrubbed_taxondeterminations()
Aaron Marcuse-Kubitza
01:43 AM Revision 7231: root Makefile: python-Darwin: Added instructions to install dateutil for Python 3 as well as Python 2, for use in PL/Python functions
Aaron Marcuse-Kubitza
01:42 AM Revision 7230: root Makefile: python-Darwin: Added note that Python 2 comes preinstalled
Aaron Marcuse-Kubitza
01:15 AM Revision 7229: Added inputs/GBIF/Specimen/postprocess.sql to remove institutions that we have direct data for
Aaron Marcuse-Kubitza
 
