CVS validation

1 critical issue, 2 non-critical issues, 15 feature requests

critical issues

  1. #3: "It was good to see communityConcept.name__@VegX__.communityDet@vegbiendev.nceas.ucsb.edu populated, but some critical associated fields were not present, including the Community Code and the fit and confidence values. Community code is very important and is needed for both VegBank and CVS. At a minimum we need the CommunityCode from CVS, and preferably Fit and confidence."

non-critical issues

  1. #8: "Cover values are given as the midpoint of a range with no indication of the range of the bin? A cover value of 0.505 seems very precise, but it is really the bin #2 in the CVS scale corresponding to 0.1-1% cover. This needs to be indicated in some way."
  2. #7: "Do we really want to discard all soil data?"

priority feature requests

  1. validate whether confidential data are removed properly: "For lack of an easy solution I did not check whether confidential data are being kept confidential. This is something Mike Lee could check with a more complete download when he returns from DownUnder in middle September."
  2. map the projectContributor table
  3. add project name to validation view (it is already in the normalized DB)
  4. fix CVS embargoes (issues for the December CSV release have been fixed)
  5. populate area from observation.taxonObservationArea when Plot.area is not specified
  6. some stems are duplicated

feature requests

  1. (Bob, Mike Lee) map STRATUM_ID as subplot ID
    - the subplot ID is STRATUM_ID for those stratum entries that have stratumType.stratumName = 'module'
  2. include STRATUM_ID in validation view (only needed once subplot ID mapped to it)
  3. add a column that omits the family from the taxon name
  4. store the original plant name from taxonObservation.authorPlantName
  5. (Bob) map natural keys instead of numeric IDs
  6. (from VegBank) (Mike Lee) do not use taxoninterpretations with interpretationtype = "simplification for analysis" as the current interpretation
  7. (from VegBank) (Mike Lee: syntax issue) rename the input scientificName to scientificName_verbatim
  8. (from VegBank) (Mike Lee: needed if doing concept-based taxonomy; Bob: would hate to lose that info if we have it, but OK not to validate this) map plantconcept.reference_id to accordingTo
  9. (from VegBank) (Mike Lee; Bob: OK with validating just the most recent interpretation for each taxonobservation; Brad: might have to live with validating just one taxoninterpretation per taxonobservation in the interest of moving on) include all taxonInterpretations in validation extract

completed

  1. #2: "the fields georeferenceProtocol__@DwC__@vegbiendev.nceas.ucsb.edu & “geovalid_bien” [...] were blank for all records and I suspect they relate to geovalidation, in which case they should not be blank." this is issue #950
  2. #5: "For every record “identifiedBy__@DwC__@vegbiendev.nceas.ucsb.edu” has the value “Robert Peet” and “dateIdentified__@DwC__@vegbiendev.nceas.ucsb.edu” has a value of “10/1/2008”"
  3. #9: "The admittedly weird modules and subplots of the CVS protocol are very badly handled [...] We can simplify by ignoring subplot data."
  4. #4: "Morphospecies have non-alphabetical characters scrubbed out. For example Hypericum [graveolens + mitchellianum] is rendered as Hypericum [graveolens mitchellianum]. Those extra characters in the morphospecies need to be retained."
  5. #1: "locality__@DwC__@vegbiendev.nceas.ucsb.edu: In some cases (eg 040-04-0144) this is actually presenting “Location Narrative”, and not “Author Location” as needed, whereas in other cases where Location narrative is not populated then we see the Author location data as needed. This should be strictly “Author Location”."
  6. #6: "I see many data lines duplicated for no obvious reason" this is issue #948
  7. "There were some fields where I did not recognize the names, but they appeared blank. Not sure what to make of these: For example, what are occurrenceID__@DwC__@vegbiendev.nceas.ucsb.edu and recordedBy__@DwC__@vegbiendev.nceas.ucsb.edu" occurrenceID is only for specimens, but recordedBy should definitely be populated if there is a collector
  8. remove duplication of communities in list: e.g. row 1035 sort_col 230881 is {CEGL003760,CEGL003760}
  9. project name is needed for attribution
  10. add project_name
  11. some taxa appear to be mismapped or mistranslated by TNRS
  12. use author location in addition to locationNarrative as the locality description
  13. individualCount should be NULL instead of defaulting to 1 when not specified
  14. map VegBank taxoninterpretation.currentinterpretation -> taxondetermination.iscurrent
  15. map coordinateUncertaintyInMeters (fixed itself)
  16. add slopeAspect, slopeGradient to denormalized view
  17. omit authorplantname because it is not specific to the taxoninterpretation row (this is in a separate taxoninterpretation for the original determination instead)
  18. map taxoninterpretation.party_id to identifiedBy
  19. add stemCount to denormalized view (we actually call this individualCount, because for us, stemCount is the count of stems within the individual, not the size class)
  20. 3 fields that I'd include normally were missing in VegBIEN. Perhaps not absolutely CRITICAL, but certainly best-practice to include, especially locationAccuracy
  21. multiple taxonInterpretations (always problematic) aren't being ported [they are to VegBIEN, just not to the validation view]
  22. taxonInterpretation record that is chosen for each taxonObservation also appears not to be correct
  23. Community classification is missing
  24. The person collecting the plot is not shown; this would be in observationContributor.
  25. State is wrong, not Wyoming, but Tennessee
  26. County is incorrect (not Powell, but Orange)
  27. CoordinateUncertaintyInMeters is missing
  28. Species occur in various strata, but the strata are not indicated

2014-10-x

→ Bob's document revised after the conference call

2014-10-3 conference call on CVS issues

upcoming

  • Bob will touch base with Aaron next Wednesday after the traits data refresh is done

availability

  • Bob is here next week, but will be gone after 10/14
  • Brian E will be at McGill University

to do for Mike Lee

  • send corrected CVS data by end of next week (10/10)
    • fix identifiedBy/dateIdentified (issue #5 below)
    • don't include subplots (issue #9 below)

to do for Bob/Mike Lee

  • talk to Brad about TNRS bug (issue #4 below)

to do for Aaron

  1. send Mike Lee a link to the VegBIEN analytical DB schema
  2. on 10/13 (late in the day), provide extract of whatever has been done
    • Bob will be gone after 10/14 and won't be available to review extracts (but Mike Lee will be available to do this)
  3. once everything on the list is done, provide another extract [superseded: the non-critical issues have been postponed, so we will just provide an extract when the critical issues are fixed]

decisions

priorities

  1. essential: #1,10
    high: #9,4,6,3
  2. #8
  3. #7

?: #2,5

issue priorities

using the issue #s from Bob's revised document below

  1. has to be done; high priority; Brian E: a top priority; Bob: #1 on priorities list
  2. defer to Aaron how important it is
  3. a #2 priority; high priority (originally low priority)
    • _update from Bob:_
      > Does this mean it is now a high priority to populate the CEGL community codes?
      
      ==RKP:  I thought CEGL was already a high priority.  It should be.
      
  4. a #1 priority; assigned to Brad/Bob; they will see if the problem is in Brad's code [it is]
  5. Mike Lee will fix
  6. done
  7. do this last; #3 priority
  8. level 2 priority; Brian E: a top priority
  9. #1 priority; Bob: highest priority; Mike Lee will fix
  10. essential; #1 priority

issue clarifications

  1. 2 separate issues:
    1. critical: author_location is sometimes missing from the locality
    2. location_narrative is included in the locality; it was first thought that it should be omitted as too detailed/not useful for finding the site, but the location_narrative actually contains useful directions to the site that can help in finding it again, so we should include it
  2. applies to other datasources as well
  3. 2 tasks:
    1. add community_ID to analytical DB
    2. add community_determination's confidence, fit to VegBIEN
  4. ask Brad whether TNRS strips non-alpha chars [it does]
    • "It appears our best solution is for you to routinely convert any occurrence of + in a name string to @ prior to submission to the TNRS, and then convert back to + after TNRS. The @ sign is sufficiently rare in name strings that we do not anticipate problems." (e-mail from Bob)
    • to rescrub names containing +:
      1. delete names containing + from TNRS cache:
        DELETE FROM "TNRS".taxon_match WHERE "*Name_submitted" IN (
        SELECT "*Name_submitted" FROM "TNRS".taxon_match WHERE "*Name_submitted" LIKE '%+%'
        );
        Query returned successfully: 2374 rows affected, 33913 ms execution time.
        795 names removed: "Took 0:01:12.420972 sec/795 name(s) = 91.1 ms/name" 
        
      2. run TNRS:
        make scrub & # runtime: 9 min ("8m38.097s")
        
  5. reload CVS after Mike Lee fixes this in the input data
    • e-mails from Mike Lee:
      I am in the process of transferring to Aaron via WeTransfer the CVS database with [...] dates fixed on the taxonInterpretation (determinations) table.
      
      > Was the identifiedBy also fixed?
      
      Yes, everything is fixed in the CVS database as best we can get it.  There is no longer a huge list of Bob interpreting everything on the same date.  There are some intermediate updates where we don't have the exact person and date of update, but the original interpretations have been reset to match our best understanding.
      
    • there are now a wide variety of identifiedBy and dateIdentified values:
      SELECT DISTINCT "identifiedBy" FROM "CVS"."taxonObservation_" ORDER BY "identifiedBy";
      
      SELECT DISTINCT "dateIdentified" FROM "CVS"."taxonObservation_" ORDER BY "dateIdentified";
      
  6. rerun the export to verify that the duplication is gone
  7. at end of priority list
  8. add plot-level cover scale, cover scale system to analytical DB
    • cover values should be stored as a range
    • tabled for now: no conclusion as to how to represent this in the schema
    • e-mail from Bob:
      > do you still want me to map the coverMethod and coverIndex tables in VegBank and CVS?
      
      This is certainly desirable, but not high priority.
      
  9. Mike Lee will provide refresh that excludes subplots
    • "I am in the process of transferring to Aaron via WeTransfer the CVS database with subplots completely removed (and stems aggregated to the plot level)" (e-mail from Mike Lee)
    • previously, the subplots were stored as strata: "CVS Modules: A stratum method that treats modules [subplots] as strata for ease of database management (rather than subplots)" (CVS.stratumMethod[stratumMethodName=CVS Modules]{stratumMethodDescription})
      all the stratumTypes that correspond to the CVS Modules stratumMethod have now been removed.
  10. the plot name is supposedly a required field in the VegBank input data [VegBank.plot.locationName is NOT NULL], so this is a bug in our scripts
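
A minimal Python sketch of the + to @ round trip described in clarification 4, assuming the substitution is applied to each name string immediately before submission to TNRS and reversed on the returned name. The function names `encode_for_tnrs` and `decode_from_tnrs` are hypothetical and are not part of the existing import scripts. Rejecting names that already contain @ is an added safeguard, not something the e-mail asks for.

```python
def encode_for_tnrs(name: str) -> str:
    """Replace '+' with '@' so the character is not stripped by TNRS.

    Assumption: '@' never occurs in an input name. If it does, decoding
    would be ambiguous, so this sketch refuses the name instead.
    """
    if "@" in name:
        raise ValueError(f"name already contains '@': {name!r}")
    return name.replace("+", "@")


def decode_from_tnrs(name: str) -> str:
    """Restore '+' in a name that was encoded with encode_for_tnrs()."""
    return name.replace("@", "+")


if __name__ == "__main__":
    original = "Hypericum [graveolens + mitchellianum]"
    submitted = encode_for_tnrs(original)
    print(submitted)
    print(decode_from_tnrs(submitted))
```

Names that were scrubbed before this substitution existed would still need to be removed from the TNRS cache and rescrubbed, as in the steps listed under clarification 4.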

Bob's revised document:

Issues with CVS plot data.
  1. locality__@DwC__@vegbiendev.nceas.ucsb.edu: In some cases (eg 040-04-0144) this is actually presenting “Location Narrative”, and not “Author Location” as needed, whereas in other cases where Location narrative is not populated then we see the Author location data as needed. This should be strictly “Author Location”. [Seems easy to fix]
  2. I do not know the definitions of the fields georeferenceProtocol__@DwC__@vegbiendev.nceas.ucsb.edu & “geovalid_bien”. However, they were blank for all records and I suspect they relate to geovalidation, in which case they should not be blank. [Seems simple, but not sure. Need a report from Aaron. I think this is a problem with other datasets as well.]
  3. It was good to see communityConcept.name__@VegX__.communityDet@vegbiendev.nceas.ucsb.edu populated, but some critical associated fields were not present, including the Community Code and the fit and confidence values. Community code is very important and is needed for both VegBank and CVS. [At a minimum we need the CommunityCode from CVS, and preferably Fit and confidence. In VegBank Fit and confidence are in the CommunityInterpretation table. Not sure if CommunityCode is to be found in VegBank – need to ask Michael.] duplicat
  4. Morphospecies have non-alphabetical characters scrubbed out. For example Hypericum [graveolens + mitchellianum] is rendered as Hypericum [graveolens mitchellianum]. Those extra characters in the morphospecies need to be retained. This may be a problem with all morphospecies in BIEN. [Seems easy]
  5. For every record “identifiedBy__@DwC__@vegbiendev.nceas.ucsb.edu” has the value “Robert Peet” and “dateIdentified__@DwC__@vegbiendev.nceas.ucsb.edu” has a value of “10/1/2008”. I think currently you are using the person who contributed the dataset (Peet) and the date of the contribution of the dataset (2008). I need clearer definitions of these two fields, but it appears they refer to the person who Identified the plant and on which date. These fields do have a home in CVS and we need to point you to these. If for some reason the field is blank the default should be the person who collected the plot and the date of the collection. [Seems easy]
  6. I see many data lines duplicated for no obvious reason [?Symptom of deeper problem?]
  7. Do we really want to discard all soil data? [I prefer to retain, though this is a bit awkward in the VegBank model. Discuss.]
  8. Cover values are given as the midpoint of a range with no indication of the range of the bin? A cover value of 0.505 seems very precise, but it is really the bin #2 in the CVS scale corresponding to 0.1-1% cover. This needs to be indicated in some way. It may be a problem with all the cover values in BIEN. [Need to discuss.]
  9. The admittedly weird modules and subplots of the CVS protocol are very badly handled. In the current version it is not uncommon for a species to have on the order of 15 records recorded for a single plot. Some of these have different numbers of individuals and some have different cover values. Unfortunately, all are recorded as having the areas of the full plot, but a module is usually smaller than a plot, say .01 rather than .1 ha. The area associated with the species and cover or count can refer to the module or the full plot. In addition there can be separate records for different size classes of trees with no indication of either the size or the area associated with the record. This we need to discuss so that you understand the structure of the data and whether it all needs to be retained.
    [This is very complicated. We can simplify by ignoring subplot data. We will need to walk through this with Aaron, Michael and myself, and possibly Brad and Brian]
  10. [This is new. I observe that some, but not all, plots in VegBank dump do not have the plot name populated (plotName__@VegX__.plot@vegbiendev.nceas.ucsb.edu). I have not seen this problem in CVS, but then I have seen only a small dump of data]
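
One possible way to address issue 8 is to carry the bin limits alongside each cover value instead of only a single number. The Python sketch below is illustrative only: the `CoverClass` type and `describe` helper are hypothetical names, no schema decision has been made, and only bin #2 (0.1-1% cover) is taken from the text above. The remaining bins of the CVS scale are deliberately not filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverClass:
    """A cover-scale bin stored as an explicit range (hypothetical type)."""
    code: int
    lower_pct: float
    upper_pct: float

    def contains(self, value_pct: float) -> bool:
        """True if a reported cover value falls inside this bin."""
        return self.lower_pct <= value_pct <= self.upper_pct


# Only this bin is stated in the issue text; other bins are not assumed here.
CVS_BIN_2 = CoverClass(code=2, lower_pct=0.1, upper_pct=1.0)


def describe(value_pct: float, cover_class: CoverClass) -> str:
    """Report a cover value together with the bin it stands for."""
    return (f"{value_pct}% (class {cover_class.code}: "
            f"{cover_class.lower_pct}-{cover_class.upper_pct}%)")


if __name__ == "__main__":
    print(describe(0.505, CVS_BIN_2))
```

Keeping the limits next to the value lets a reader see that 0.505 stands for a whole bin, not a measurement made to three decimal places.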

Bob's document, as later prioritized by Bob:

Here is a rehash of issues with CVS. Several of these are likely to apply to Vegbank as well. I view the first 5 as critical and 6-9 as important. [+emphasis]

If you wish me to call you in the morning soon, let me know when you are available and which numbers to try. My cell might work here = 919-368-4971

Best,
Bob

Issues with CVS plot data.
  1. locality__@DwC__@vegbiendev.nceas.ucsb.edu: In some cases (eg 040-04-0144) this is actually presenting “Location Narrative”, and not “Author Location” as needed, whereas in other cases where Location narrative is not populated then we see the Author location data as needed. This should be strictly “Author Location”.
  2. The admittedly weird modules and subplots of the CVS protocol are very badly handled. In the current version it is not uncommon for a species to have on the order of 15 records recorded for a single plot. Some of these have different numbers of individuals and some have different cover values. Unfortunately, all are recorded as having the areas of the full plot, but a module is usually smaller than a plot, say .01 rather than .1 ha. The area associated with the species and cover or count can refer to the module or the full plot. In addition there can be separate records for different size classes of trees with no indication of either the size or the area associated with the record. This we need to discuss so that you understand the structure of the data and whether it all needs to be retained.
  3. Morphospecies have non-alphabetical characters scrubbed out. For example Hypericum [graveolens + mitchellianum] is rendered as Hypericum [graveolens mitchellianum]. Those extra characters in the morphospecies need to be retained. This may be a problem with all morphospecies in BIEN.
  4. It was good to see communityConcept.name__@VegX__.communityDet@vegbiendev.nceas.ucsb.edu populated, but some critical associated fields were not present, including the Community Code and the fit and confidence values. Community code is very important and is needed for both VegBank and CVS.
  5. For every record “identifiedBy__@DwC__@vegbiendev.nceas.ucsb.edu” has the value “Robert Peet” and “dateIdentified__@DwC__@vegbiendev.nceas.ucsb.edu” has a value of “10/1/2008”. I think currently you are using the person who contributed the dataset (Peet) and the date of the contribution of the dataset (2008). I need clearer definitions of these two fields, but it appears they refer to the person who Identified the plant and on which date. These fields do have a home in CVS and we need to point you to these. If for some reason the field is blank the default should be the person who collected the plot and the date of the collection
  6. I see many data lines duplicated for no obvious reason
  7. Do we really want to discard all soil data?
  8. Cover values are given as the midpoint of a range with no indication of the range of the bin? A cover value of 0.505 seems very precise, but it is really the bin #2 in the CVS scale corresponding to 0.1-1% cover. This needs to be indicated in some way. It may be a problem with all the cover values in BIEN.
  9. I do not know the definitions of the fields georeferenceProtocol__@DwC__@vegbiendev.nceas.ucsb.edu & “geovalid_bien”. However, they were blank for all records and I suspect they relate to geovalidation, in which case they should not be blank.

2013-12-17

Bob's conference call feedback

OK with no more validation extracts [for VegBank/CVS]

extract

*CVS.2013-12-17.9_plots.xls*
(input and output data are in separate tabs. refer to the VegCore data dictionary for column definitions.)

subset import command

time yes|(export log= version=CVS_VegBIEN; make schemas/$version/reinstall; make inputs/CVS/{Source,'^taxon_observation.**.sample'}/import_temp by_col=1 n=; make inputs/CVS/{observationContributor_,observation_community}/import_temp by_col=1 n=; make inputs/CVS/scrub; make inputs/CVS/publish; bin/make_analytical_db) # runtime: 1.5 min ("1m23.726s") @starscream; 3.5 min ("3m31.441s") @vegbiendev

query

SET search_path TO "CVS_VegBIEN"; -- needed for locationevent__contributors(), locationevent__communities() for now
SELECT *
FROM      "CVS"."^taxon_observation.**.sample" 
LEFT JOIN "CVS_VegBIEN".analytical_plot ON
    analytical_plot."datasource"                     = 'CVS'
AND analytical_plot."taxonOccurrenceID"              = "^taxon_observation.**.sample"."taxonOccurrenceID"::text
AND analytical_plot."aggregateOrganismObservationID" = "^taxon_observation.**.sample"."aggregateOrganismObservationID" 
ORDER BY "^taxon_observation.**.sample"."locationName", "^taxon_observation.**.sample"."identificationID", "^taxon_observation.**.sample"."aggregateOrganismObservationID" 

2013-12-4

Mike Lee's e-mail feedback on attribution: [Bien-db] Primary plot data providers and projects in BIEN3

If the VegBank[+CVS] project name could be captured, that would be ideal. The person responsible for the data is not always clear in VegBank, but it should be either in the observationContributor table and/or the projectContributor table.

So we will need to map the projectContributor table as well? Is this also true for CVS?

For VegBank, that would be great. It is not currently populated in CVS, so that won't provide any additional information at present. [but it is populated, with 119 rows]

Bob's e-mail feedback on attribution: [Bien-db] Primary plot data providers and projects in BIEN3

whether the project PI is a critical attribution field, or whether the plot collectors are sufficient attribution for December (I think the plot collectors list would usually include the project PI)

Project is essential.

Mike Lee's e-mail feedback: [Bien-db] CVS validation extract

CVS uses strata to store subplots. These "strata" are "subplots" which are horizontally rather than vertically (or life-form) delineated. This is much easier when managing data, as it avoids duplication of geocoordinates, plot contributors, etc.

is it the case that the subplot ID is STRATUM_ID for those stratum entries that have stratumType.stratumName = 'module'? Do you store relative plot coordinates for subplots?

There is a particular method associated with modules and the module is the only stratum type for it.

There are no relative coordinates.

I have finished the CVS validation. There are four errors, two very small, one indeterminate (is it really an error?), the other only affects one point, but it is more serious.

AREA (minor)

The first is that the area is not propagating to BIEN when it is not filled out at all locations in CVS. Plot.area is presumably used for area, and for complicated reasons this is not always filled in. If it is missing, observation.taxonObservationArea should be used. [Is taxonObservationArea always populated? If so, we could just use it instead. (Merging this with Plot.area may be tricky because they are in different tables.)] This was missing for two plots. If this is problematic, we could populate it on the CVS side and send you a new database, but that sounds more challenging overall to me, let me know if that is not the case. [That may be easiest at this point.]
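
The fallback rule requested here can be sketched in Python as follows. This is only an illustration of the rule, assuming both values have already been fetched for the same plot observation; `resolve_area` is a hypothetical helper, not an existing function in the import scripts.

```python
from typing import Optional


def resolve_area(plot_area: Optional[float],
                 taxon_observation_area: Optional[float]) -> Optional[float]:
    """Prefer Plot.area; fall back to observation.taxonObservationArea.

    Returns None only when neither value is specified.
    """
    if plot_area is not None:
        return plot_area
    return taxon_observation_area


if __name__ == "__main__":
    print(resolve_area(None, 1000.0))   # Plot.area missing: use the fallback
    print(resolve_area(400.0, 1000.0))  # Plot.area present: keep it
```

In SQL the same rule would typically be a COALESCE over the two columns, once the two tables have been joined on the observation.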

[FEATURE REQUEST] COMMUNITIES (minor)

Communities are duplicated 3 times on each row, listing the name three times separated by commas.
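
A small Python sketch of the de-duplication, assuming the community codes for a row are available as a list before they are joined into one field. The helper name `dedupe_communities` is hypothetical; the example value is the {CEGL003760,CEGL003760} case noted in the completed list.

```python
from typing import List


def dedupe_communities(codes: List[str]) -> List[str]:
    """Drop repeated community codes while keeping first-seen order."""
    # dict preserves insertion order, so this removes repeats stably.
    return list(dict.fromkeys(codes))


if __name__ == "__main__":
    print(dedupe_communities(["CEGL003760", "CEGL003760"]))
```

This only hides the symptom in the output list; if the repeats come from a join that multiplies rows, the join itself would still need to be checked.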

DUPLICATED STEMS (indeterminate)

The indeterminate issue is that there are 5 records that are duplicated, but I can't be sure whether this comes from the denormalization or whether they are in fact duplicated. All the stems on 001-04-0226 for Acer rubrum L. are duplicated, no others are.

MAPPING TAXA (more serious, or at least misunderstood by me)

The more serious issue is the mapping of taxa. Of the 243 records submitted from CVS, 19 have errors in translating the species names. Part of this may be that I don't understand how it is happening in vegBien, or which fields to check.

[FEATURE REQUEST] VegBIEN is showing names concatenating family and genus like "Anacardiaceae Toxicodendron" when it encounters presumably a name that it doesn't recognize. Just the genus name would be preferable in my opinion when you get to Nyssa sylvatica {swamp variety}, Viola [blanda + incognita], or Rubus sect. Dewberry. [We prepend the family because it is needed by TNRS to resolve homonyms. We can add a column that omits the family, if you like (however, I would not see this as critical to have for the December deadline).]

Others are legitimate species that had this happen: Toxicodendron radicans var. radicans, Eupatorium steelei, or Dulichium arundinaceum var. arundinaceum. In these cases, the name matched goes down to genus or family. It seems to me records like "Celtis laevigata Willd. var. reticulata (Torr.) L. Benson" could go to species or certainly genus, but it is now being mapped to family. [I don't see that name in the *CVS extract*. Is that from VegBank instead?]
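
For the family-prefixed names described above, a column that omits the family could be derived roughly as in the Python sketch below. It rests on a loud assumption: that the prepended family always ends in "aceae", which is not true of every accepted family name, so a real implementation would more likely use the family value that was prepended in the first place. `strip_family_prefix` is a hypothetical helper.

```python
def strip_family_prefix(name: str) -> str:
    """Remove a leading family name from a 'Family Genus ...' string.

    Assumption: the family is the first word and ends in 'aceae'.
    Names without such a prefix are returned unchanged.
    """
    parts = name.split(maxsplit=1)
    if len(parts) == 2 and parts[0].lower().endswith("aceae"):
        return parts[1]
    return name


if __name__ == "__main__":
    print(strip_family_prefix("Anacardiaceae Toxicodendron"))
    print(strip_family_prefix("Toxicodendron radicans var. radicans"))
```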

[FEATURE REQUEST] My preference would be that the original name for each plant be captured and stored (the name in taxonObservation.authorPlantName) just as it is. Whatever name we call this isn't critical, but capturing it is. I don't see these names anywhere in the VegBIEN output, just the matched and filtered names.

The full list of mismatches is *attached* as a tab-delimited text file. Please let me know if any of this is unclear.

extract

*CVS.2013-12-4.7_plots.xls*
(input and output data are in separate tabs. refer to the VegCore data dictionary for column definitions.)

subset import command

time (export log= version=CVS_VegBIEN; make schemas/$version/reinstall; make inputs/CVS/{Source,'^taxon_observation.**.sample'}/import_temp by_col=1 n=; make inputs/CVS/{observationContributor_,observation_community}/import_temp by_col=1 n=; make inputs/CVS/scrub; make inputs/CVS/publish; bin/make_analytical_db) # runtime: 1.5 min ("1m23.726s") @starscream; 3 min ("3m8.031s") @vegbiendev

query

SET search_path TO "CVS_VegBIEN"; -- needed for locationevent__contributors(), locationevent__communities() for now
SELECT *
FROM      "CVS"."^taxon_observation.**.sample" 
LEFT JOIN "CVS_VegBIEN".analytical_plot ON
    analytical_plot."datasource"                     = 'CVS'
AND analytical_plot."taxonOccurrenceID"              = "^taxon_observation.**.sample"."taxonOccurrenceID"::text
AND analytical_plot."aggregateOrganismObservationID" = "^taxon_observation.**.sample"."aggregateOrganismObservationID" 
WHERE "^taxon_observation.**.sample"."locationName" IN ('005-02-0301', '041-09-0577', '088-08-1204', '114-01-0043', '001-04-0226', '067-ANGE-6', '052-02-0804')
ORDER BY "^taxon_observation.**.sample"."locationName", "^taxon_observation.**.sample"."identificationID", "^taxon_observation.**.sample"."aggregateOrganismObservationID" 

2013-11-26

Bob's e-mail feedback: [Bien-db] CVS validation extract

here are some preliminary thoughts.

  1. I am worried about the number of fields populated only by a meaningless foreign key.
    [The problem is that the VegBank tables themselves are connected only by the numeric IDs, so in many cases we need to include these in VegBIEN in order for separately-imported tables to match up. Creating a full natural key for every table would require pulling the parent table's natural key value for the corresponding numeric ID, which is non-trivial. (This would likely happen in an autopopulation trigger.)]
  2. I am worried that plot area is not consistently captured.
    [The plots that are missing this in VegBIEN are also missing it in the staging tables, so I think this is actually a problem in the source data: that field wasn't specified there.]
  3. I am uncertain as to how nested plots are being handled.
    [They are included as strata instead of plots, since this is how CVS provides them (see Mike Lee's e-mail below)]
  4. I do not see the observers listed
    [That is a known issue that has not yet been fixed (this first round omitted some fixes in order to send it out before Thanksgiving).]
  5. I do not see the author location recorded
    [That's true, it looks like we should be using that instead of locationNarrative as the locality description (unlike VegBank, which uses locationNarrative).]
  6. Soil data is inconsistently filled in
    [KNOWN FEATURE REQUEST: Are you referring to the input tab, which contains the staging tables? We haven't yet mapped soil data through to the validation view (although some of it should be in normalized VegBIEN).]
  7. I do not see the assignments to community types
    [As above, that is a known issue that has not yet been fixed.]

plots to include

from Mike Lee on 2013-11-21 ([...] needed by next Tuesday):

005-02-0301
041-09-0577
088-08-1204
114-01-0043
001-04-0226
067-ANGE-6
052-02-0804

from Mike Lee on 2013-12-16:

I made a mistake in my original list of plots I sent you to validate. We have none that have confidentiality in them. Could we please add (authorPlotCodes):

003-03-0077
and
067-DAVY-19
to the validation list?

This way we will be sure to validate that the correct lat/long fields are being used.

extract

*CVS.2013-11-26.7_plots.xls*
(input and output data are in separate tabs. refer to the VegCore data dictionary for column definitions.)

subset import command

time (export log= version=CVS_VegBIEN; make schemas/$version/reinstall; make inputs/CVS/{Source,'^taxon_observation.**.sample'}/import_temp by_col=1 n=; echo rem: make inputs/CVS/{observationcontributor_,observation__community}/import_temp by_col=1 n=; make inputs/CVS/scrub; make inputs/CVS/publish; bin/make_analytical_db) # runtime: 1.5 min ("1m23.333s") @starscream; 4 min ("4m6.181s") @vegbiendev

query

SET search_path TO "CVS_VegBIEN"; -- needed for locationevent__contributors(), locationevent__communities() for now
SELECT *
FROM      "CVS"."^taxon_observation.**.sample" 
LEFT JOIN "CVS_VegBIEN".analytical_plot ON
    analytical_plot."datasource"                     = 'CVS'
AND analytical_plot."taxonOccurrenceID"              = "^taxon_observation.**.sample"."taxonOccurrenceID"::text
AND analytical_plot."aggregateOrganismObservationID" = "^taxon_observation.**.sample"."aggregateOrganismObservationID" 
WHERE "^taxon_observation.**.sample"."locationName" IN ('005-02-0301', '041-09-0577', '088-08-1204', '114-01-0043', '001-04-0226', '067-ANGE-6', '052-02-0804')
ORDER BY "^taxon_observation.**.sample"."identificationID", "^taxon_observation.**.sample"."aggregateOrganismObservationID"