VegCore refactoring

Merge the Order within table column with the Source URL

In mappings/VegCore.csv, mappings/Veg+-VegCore.csv:

  1. Open in Excel/LibreOffice
  2. Copy the last two columns to a text editor (which decodes the CSV to TSV)
  3. Search for regexp ^(?:\S+\t(?!\S*$)|(?:\S*, ){1}\S*\t(?!(?:\S*, ){1}\S*$)|(?:\S*, ){2}\S*\t(?!(?:\S*, ){2}\S*$)|(?:\S*, ){3}\S*\t(?!(?:\S*, ){3}\S*$)|(?:\S*, ){4}\S*\t(?!(?:\S*, ){4}\S*$))
    which matches rows with a mismatched (or empty) sort order field
    1. Fix each match's sort order field so there is one entry for every URL
  4. Replace regexp ^((?:!\S*, )*)([^\s!]+?(?:#[^\s!]*?|(?=, )))([\w:#-]*(?:, .*?)?\t)((?=.)\d*)(?:, )?
    with $1!$2($4)$3
    repeatedly, until no replacements are made
  5. Search for regexp terms/#(?!\()
    1. Check that no matches are terms (categories are OK)
  6. Replace text ! with nothing
  7. Replace text () with nothing
  8. Replace regexp (?<=(?<!salvias_data_dictionary\.html)[#/]\()(?=\d{1}\))
    with 0
  9. Replace regexp (?<=http://rs\.tdwg\.org/dwc/terms/#\()(?=\d{2}\))
    with 0
  10. In Excel, delete the selection. This ensures that pasting over the previous text replaces it, including when the new field is empty.
  11. Paste the text editor text at the beginning of the selection.
  12. Check that the Order within table column is empty
  13. Save
  14. make mappings/
  15. Check diffs
  16. Commit
  17. Delete the Order within table column
  18. Save
  19. make mappings/
  20. Check diffs
  21. Commit

Scope DwC sort order by category

In mappings/VegCore.csv, mappings/Veg+-VegCore.csv:

  1. Open in Excel/LibreOffice
  2. Copy the last column to a text editor (which decodes the CSV to TSV)
  3. Replace regexp (\(0:Record-level\)\()(\d{3})(?=\))
    with BeanShell snippet _1 + (Integer.parseInt(_2) - 0)
  4. Replace regexp (\(1:Occurrence\)\()(\d{3})(?=\))
    with BeanShell snippet _1 + (Integer.parseInt(_2) - 19)
  5. Replace regexp (\(2:Event\)\()(\d{3})(?=\))
    with BeanShell snippet _1 + (Integer.parseInt(_2) - 41)
  6. Replace regexp (\(3:dcterms:Location\)\()(\d{3})(?=\))
    with BeanShell snippet _1 + (Integer.parseInt(_2) - 56)
  7. Replace regexp (\(4:GeologicalContext\)\()(\d{3})(?=\))
    with BeanShell snippet _1 + (Integer.parseInt(_2) - 100)
  8. Replace regexp (\(5:Identification\)\()(\d{3})(?=\))
    with BeanShell snippet _1 + (Integer.parseInt(_2) - 118)
  9. Replace regexp (\(6:Taxon\)\()(\d{3})(?=\))
    with BeanShell snippet _1 + (Integer.parseInt(_2) - 126)
  10. Replace regexp (\(7:ResourceRelationship\)\()(\d{3})(?=\))
    with BeanShell snippet _1 + (Integer.parseInt(_2) - 159)
  11. Replace regexp (\(8:MeasurementOrFact\)\()(\d{3})(?=\))
    with BeanShell snippet _1 + (Integer.parseInt(_2) - 166)
  12. Replace regexp (terms/#\(\d+:.*?\))\((?=\d\))
    with $1(0
  13. Save
  14. make mappings/
  15. Check diffs
  16. Commit
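
Steps 3 to 12 above can also be sketched as one script. This is a minimal Python sketch, not the BeanShell workflow itself: it assumes the copied column text is already in a string, and the names OFFSETS and scope_sort_order are hypothetical. The offsets are the values subtracted in steps 3 to 11. Python writes backreferences as \1 where the replacements on this page use $1.

```python
import re

# Category label -> offset subtracted from the 3-digit global sort order
# (values taken from steps 3 to 11 above).
OFFSETS = {
    "0:Record-level": 0,
    "1:Occurrence": 19,
    "2:Event": 41,
    "3:dcterms:Location": 56,
    "4:GeologicalContext": 100,
    "5:Identification": 118,
    "6:Taxon": 126,
    "7:ResourceRelationship": 159,
    "8:MeasurementOrFact": 166,
}


def scope_sort_order(text):
    """Rewrite global DwC sort orders as per-category sort orders."""
    for category, offset in OFFSETS.items():
        pattern = re.compile(
            r"(\(" + re.escape(category) + r"\)\()(\d{3})(?=\))"
        )

        def subtract(match, offset=offset):
            # Same arithmetic as the BeanShell snippet:
            # _1 + (Integer.parseInt(_2) - offset)
            return match.group(1) + str(int(match.group(2)) - offset)

        text = pattern.sub(subtract, text)
    # Step 12: left-pad single-digit results with 0.
    return re.sub(r"(terms/#\(\d+:.*?\))\((?=\d\))", r"\1(0", text)


print(scope_sort_order("http://rs.tdwg.org/dwc/terms/#(1:Occurrence)(025)"))
# prints http://rs.tdwg.org/dwc/terms/#(1:Occurrence)(06)
```

Saving, running make mappings/, checking diffs, and committing remain manual, as in steps 13 to 16.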

Indent fields inside tables

Edit VegCore:

  1. Copy contents to a text editor
  2. Replace regexp ^(p|h[2-9])(\(*\. )
    with $1(($2
  3. For each h1. section, unindent the paragraphs by ((
  4. Replace regexp ^(?:p|h[3-9])(?=\(*\. )
    with $0(
  5. In sections at the top and bottom of the page, remove the indents
  6. Copy contents back
  7. Save
  8. Regenerate the VegCore terms list
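
The two regexp replacements above (steps 2 and 4) can be sketched in Python as follows. This is only an illustration under assumptions: the function name indent_fields is hypothetical, the page text is assumed to be in a string, and the manual unindenting in steps 3 and 5 is not reproduced. Python writes $1 as \1 and $0 as \g<0>, and needs re.M so that ^ matches at every line start.

```python
import re


def indent_fields(text):
    """Apply the step 2 and step 4 replacements to Textile source."""
    # Step 2: add "((" to every p and h2-h9 block tag.
    text = re.sub(r"^(p|h[2-9])(\(*\. )", r"\1((\2", text, flags=re.M)
    # Step 4: add one more "(" to every p and h3-h9 block tag,
    # so fields end up one level deeper than their h2 term.
    text = re.sub(r"^(?:p|h[3-9])(?=\(*\. )", r"\g<0>(", text, flags=re.M)
    return text


sample = "h1. Plot\n\nh2. plotName\n\np. Name of the plot"
print(indent_fields(sample))
```

On this made-up sample, the h1. line is left alone, the h2. line becomes h2((. and the p. line becomes p(((.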

Change single, identically-named sources to synonyms

Edit VegCore:

  1. Copy contents to a text editor
  2. Replace regexp
    (h2\(\(\. \[\[VegCore#(.*?)\|\2\]\](?:
    
    p\(*\. .*)*
    
    p\(*\. \*)Sources(:\* "DwC):\2(":http://rs\.tdwg\.org/dwc/terms/#\2)$
    
    with $1Used by$3$4
  3. Copy contents back
  4. Save
  5. Regenerate the VegCore terms list

Change dcterms sources to synonyms

Edit VegCore:

  1. Copy contents to a text editor
  2. Replace regexp
    \*Sources:\* "DwC:(dcterms:(\w+))":http://rs\.tdwg\.org/dwc/terms/#dcterms:\2
    
    with
    *Synonyms:*
    
    h3(((((((. dcterms_$2
    
    p(((((((((. From: "DwC":http://rs.tdwg.org/dwc/terms/#$1
    
  3. Copy contents back
  4. Save
  5. Regenerate the VegCore terms list
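
As a cross-check of step 2 above, here is a minimal Python sketch of the same replacement. It assumes the page text is in a string and that the replacement consists of the three lines shown in step 2 separated by blank lines, with no trailing blank line; the name sources_to_synonyms is hypothetical.

```python
import re

PATTERN = re.compile(
    r'\*Sources:\* "DwC:(dcterms:(\w+))":'
    r"http://rs\.tdwg\.org/dwc/terms/#dcterms:\2"
)


def sources_to_synonyms(text):
    """Turn a DwC dcterms source into a Synonyms heading plus a From line."""

    def repl(match):
        # Group 1 is the full "dcterms:name", group 2 is just "name".
        return "\n\n".join([
            "*Synonyms:*",
            "h3(((((((. dcterms_" + match.group(2),
            'p(((((((((. From: "DwC":http://rs.tdwg.org/dwc/terms/#'
            + match.group(1),
        ])

    return PATTERN.sub(repl, text)


line = 'p(((((. *Sources:* "DwC:dcterms:type":http://rs.tdwg.org/dwc/terms/#dcterms:type'
print(sources_to_synonyms(line))
```

A replacement function is used instead of a template string so that the literal parentheses and quotes need no escaping.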

Change Sources to specific label

Edit VegCore:

  1. Copy contents to a text editor
  2. For each occurrence of Sources:
    1. Replace it with
      
      p(((((. *<label>:* 
      
      where <label> is one of Synonyms, Related, or Named like
  3. Copy contents back
  4. Save
  5. Regenerate the VegCore terms list

Change single, identically-named synonyms to Froms

Edit VegCore:

  1. Copy contents to a text editor
  2. Replace regexp
    (h2\(*\. \[\[VegCore#(.*?)\|\??\2\]\](?:
    
    p\(*\. .*)*)((
    
    p\(*\. )Synonyms(:)) ("\w+):\2(":\w+://\S+), (.*)$
    
    with
    $1$4From$5 $6$7$3 $8
    
  3. Replace regexp
    (h2\(*\. \[\[VegCore#(.*?)\|\??\2\]\](?:
    
    p\(*\. .*)*
    
    p\(*\. )Synonyms(: "\w+):\2(":\w+://\S+)$
    
    with $1From$3$4
  4. Copy contents back
  5. Save
  6. Regenerate the VegCore terms list

Index synonyms as web page anchors

Edit VegCore:

  1. Copy contents to a text editor
  2. Replace regexp ^(p(\(*)\. Synonyms: )(?:(.+), )?("[^"]*):([^"]*?)(":.*?)$
    with
    $1$3
    
    h3$2((. [[VegCore#$5|$5]]
    
    p$2(((((. From: $4$6
    

    repeatedly, until no replacements are made
  3. Search for Synonyms and for each match:
    1. Merge synonyms of the same name
    2. If there is a synonym with the same name as the term, merge it with the main term
  4. Remove trailing whitespace after Synonyms:
  5. Copy contents back
  6. Save
  7. Regenerate the VegCore terms list

Hyperlink all term names

Edit VegCore:

  1. Copy contents to a text editor
  2. Replace regexp
    ^(h\d\(+\. )((?!VegCore$)\w+)$
    
    with
    $1[[VegCore#$2|$2]]
    
  3. Copy contents back
  4. Save
  5. Regenerate the VegCore terms list
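
The replacement in step 2 above can be tried out in Python before editing the page. This is a sketch under assumptions: the function name hyperlink_terms is hypothetical and the sample headings are made up. The pattern itself is the one from step 2; only the replacement syntax changes ($1 becomes \1).

```python
import re


def hyperlink_terms(text):
    """Wrap bare one-word headings in [[VegCore#term|term]] links."""
    return re.sub(
        r"^(h\d\(+\. )((?!VegCore$)\w+)$",
        r"\1[[VegCore#\2|\2]]",
        text,
        flags=re.M,  # ^ and $ must match at every line, not just the ends
    )


sample = "h2((. locality\n\nh2((. VegCore\n\np(((. Some text"
print(hyperlink_terms(sample))
```

Only the first heading in the sample is linked: the (?!VegCore$) lookahead skips a heading that is exactly VegCore, and paragraphs do not match h\d.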

Change Related terms to VegCore terms

Edit VegCore:

  1. Copy contents to a text editor
  2. Replace regexp
    ([\w ]+: (?:.*, )?)"(?:DwC|TCS):([a-z]\w*)":\S*\2\b
    
    with
    $1[[VegCore#$2|$2]]
    

    repeatedly, until no replacements are made
  3. Search for Related:"
    and for each match:
    1. Make sure each term is a VegCore term or synonym
  4. Copy contents back
  5. Save
  6. Regenerate the VegCore terms list
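
"Repeatedly, until no replacements are made" in step 2 above matters because each pass rewrites only one reference per line: the first group consumes everything from the label up to the last quoted reference. A minimal Python sketch of that loop follows. The names replace_until_stable and RELATED are hypothetical, and the sample line, including the example.org URL, is made up for illustration.

```python
import re

# Pattern from step 2; Python writes $1 and $2 as \1 and \2.
RELATED = re.compile(
    r'([\w ]+: (?:.*, )?)"(?:DwC|TCS):([a-z]\w*)":\S*\2\b'
)


def replace_until_stable(pattern, repl, text):
    """Re-run a substitution until a pass makes no replacements.

    Assumes the replacement never creates new matches; otherwise
    this loop would not terminate.
    """
    while True:
        text, count = pattern.subn(repl, text)
        if count == 0:
            return text


line = (
    'p(((((. Related: "DwC:locality":http://rs.tdwg.org/dwc/terms/#locality, '
    '"TCS:plot":http://example.org/#plot'
)
print(replace_until_stable(RELATED, r"\1[[VegCore#\2|\2]]", line))
```

On the sample line, the first pass converts the last reference (plot) and the second pass converts locality; the third pass finds nothing and the loop stops.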

Group terms that only differ in their units

Edit VegCore:

  1. Copy contents to a text editor
  2. Replace regexp
    ^h2\(\(\. \[\[VegCore#((\w+)_[a-z0-9]+)\|\1\]\](((?:
    
    (?:p|h3)\(*\. .*)*?)(?:
    
    p\(*\. Comments: .*)?)
    
    h2\(\(\. \[\[VegCore#(\2_[a-z0-9]+)\|\5\]\]\3?((?:
    
    (?:p|h3)\(*\. (?!Comments).*)*)(?:
    
    p\(*\. Comments: .*)?
    
    with
    h2((. [[VegCore#$2|?$2]]
    
    p(((((. _Requires units_$6$4
    
    p(((((. Unit alternatives:
    
    h4(((((((. [[VegCore#$1|$1]]
    
    h4(((((((. [[VegCore#$5|$5]]
    
  3. In the SoilObservation section:
    1. Replace regexp
      ^p\(*\. Related: "SALVIAS:.*\n\n
      
      with ""
    2. Replace text Related
      with Synonyms
  4. Copy contents back
  5. Save
  6. Regenerate the VegCore terms list

Reformat synonyms/alternatives

Edit VegCore:

  1. Copy contents to a text editor
  2. Replace regexp
    (^p\(*\. (.*?)s?:(?=
    
    (h[34]))
    (?:
    (?:\3|h[34]|p).*
    )*
    \3\(*?)\(\(\((\. )(?=\[)(?:(.*
    
    p)?\(\()?
    
    with
    $1$4!{padding: 0 0.3em 3pt 0.65em; font-weight: normal;}_($2:)! $5
    
    repeatedly, until no replacements are made
  3. Make sure there are no occurrences of regexp
    h3\(*\. (?=\[)
    
  4. Replace regexp
    ^p\(*\. (.*):
    
    (?=h[34])
    
    with ""
  5. Copy contents back
  6. Save
  7. Replace regexp
    ^(h\d\(*\. .*)(
    (?:
    p.*(?:
    (?![hp]).*)*
    )*
    p\(*\. From: )(\(?)(?:([^",()\n]+)|"(\w+)(?:\([^)]*\))?([^"]*)"(:\S*)(?<![,)])|(,))(\)?) ?
    
    with
    $1 !{font-size: small;}_($3$4$5$6$8$9)!$7$2
    
    repeatedly, until no replacements are made
  8. Replace regexp
    ^p\(*\. From: \n\n
    
    with ""
  9. Make sure there are no occurrences of text _(()!
  10. Make sure there are no occurrences of text _())!
  11. Replace text ]] from !{font-size
  12. Replace text _(,)!
    with _(|)!
  13. Replace regexp
    (\()\[\[(CTFS)(#\w+)\]\](\)!)
    
    with
    $1$2$4:https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/CTFS$3
    
  14. Manually fix other occurrences of _([[
    using the formula above
  15. Copy contents back
  16. Save
  17. Replace regexp
    (?<=\])( !.*)?
    
    p\(*\. _(\S+(?: \S+)?)_$
    
    with
     !{padding-left: 1em; font-size: small; font-weight: normal; font-style: italic;}_($2)!$1
    
  18. Replace regexp
    ^(h3.*)font-size: small; 
    
    with
    $1
    
    repeatedly, until no replacements are made
  19. Copy contents back
  20. Save
  21. Regenerate the VegCore downloads

Add links to the ambiguous terms for alternatives

Edit VegCore:

  1. Copy contents to a text editor
  2. Replace regexp
    Synonym:\)! \[\[VegCore#(([^\W_]+)_(?!DMS)[A-Z]\w+)\|\1\]\].*
    
    with
    $0 !{padding-left: 1em; font-weight: normal;}_(alternative of)! !_(?$2)!:https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore_data_dictionary#$2
    
  3. Copy contents back
  4. Save

Move related terms onto the same line for terms without their own From

Edit VegCore's Taxon section:

  1. Copy contents to a text editor
  2. Replace regexp
    ^(h2.*)
    
    p\(*\. Related: \[\[VegCore#(\w+)\|\2\]\]$
    
    with
    $1 !{padding-left: 1em; font-size: small; font-weight: normal;}_(analogous to)! !{font-size: small;}_($2)!:https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore_data_dictionary#$2
    
  3. Copy contents back
  4. Save

Wikify all references to VegCore terms

Edit VegCore:

  1. Copy contents to a text editor
  2. Replace regexp
    (?<![\[#|/])\b(?!VegCore)(?:[A-Z]+|[A-Z]?[a-z]+(?:[A-Z][a-z]+)+)[\w]+(?<!ions)(?=s?\b)
    
    with
    [[VegCore#$0|$0]]
    
  3. Copy contents back
  4. Save

Include just the entity's name in citations, with other reference info in the URL

Edit VegCore:

  1. Copy contents to a text editor
  2. Replace regexp
    e-mail from ("[^"]+":mailto:\S+) on ([\d-]+)
    
    with
    $1/$2
    
  3. Replace regexp
    !\{font-size: small;\}_\(e-mail from\)! (!\{font-size: small;\}_\([^)]+\)!:mailto:\S+) !\{font-size: small;\}_\(on ([\d-]+)\)!
    
    with
    $1/$2
    
  4. Copy contents back
  5. Save
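
The effect of step 2 above on a plain citation can be illustrated with a short Python sketch. The function name cite_by_name and the sample citation (name, address, and date) are hypothetical; the pattern is the one from step 2, with $1/$2 written as \1/\2.

```python
import re


def cite_by_name(text):
    """Keep only the linked name, moving the date into the mailto URL."""
    return re.sub(
        r'e-mail from ("[^"]+":mailto:\S+) on ([\d-]+)',
        r"\1/\2",
        text,
    )


print(cite_by_name('e-mail from "Jane Doe":mailto:jane@example.org on 2013-1-15'))
# prints "Jane Doe":mailto:jane@example.org/2013-1-15
```

The date survives as a path suffix on the mailto: URL, so the rendered citation shows only the name.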