VegCore refactoring
- Table of contents
- VegCore refactoring
- Merge the Order within table column with the Source URL
- Scope DwC sort order by category
- Indent fields inside tables
- Change single, identically-named sources to synonyms
- Change dcterms sources to synonyms
- Change Sources to specific label
- Change single, identically-named synonyms to Froms
- Index synonyms as web page anchors
- Hyperlink all term names
- Change Related terms to VegCore terms
- Group terms that only differ in their units
- Reformat synonyms/alternatives
- Add links to the ambiguous terms for alternatives
- Move related terms onto the same line for terms without their own from
- Wikify all references to VegCore terms
- Include just the entity's name in citations, with other reference info in the URL
Merge the Order within table column with the Source URL
In mappings/VegCore.csv, mappings/Veg+-VegCore.csv:
- Open in Excel/LibreOffice
- Copy the last two columns to a text editor (which decodes the CSV to TSV)
- Search for regexp
^(?:\S+\t(?!\S*$)|(?:\S*, ){1}\S*\t(?!(?:\S*, ){1}\S*$)|(?:\S*, ){2}\S*\t(?!(?:\S*, ){2}\S*$)|(?:\S*, ){3}\S*\t(?!(?:\S*, ){3}\S*$)|(?:\S*, ){4}\S*\t(?!(?:\S*, ){4}\S*$))
which matches rows with a mismatched (or empty) sort order field
- Fix each match's sort order field so there is one entry for every URL
- Replace regexp
^((?:!\S*, )*)([^\s!]+?(?:#[^\s!]*?|(?=, )))([\w:#-]*(?:, .*?)?\t)((?=.)\d*)(?:, )?
with
$1!$2($4)$3
repeatedly, until no replacements are made
- Search for regexp
terms/#(?!\()
- Check that no matches are terms (categories are OK)
- Replace text
!
with nothing
- Replace text
()
with nothing
- Replace regexp
(?<=(?<!salvias_data_dictionary\.html)[#/]\()(?=\d{1}\))
with
0
- Replace regexp
(?<=http://rs\.tdwg\.org/dwc/terms/#\()(?=\d{2}\))
with
0
- In Excel, delete the selection. This ensures that pasting over the previous text replaces it, including when the new field is empty.
- Paste the text editor text at the beginning of the selection.
- Check that the Order within table column is empty
- Save
- Run make mappings/
- Check diffs
- Commit
- Delete the Order within table column
- Save
- Run make mappings/
- Check diffs
- Commit
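The DwC zero-padding replacement in the steps above can be tried outside the text editor. The following is a minimal Python sketch, not part of the original procedure: the helper name pad_dwc_order and the sample URLs are assumptions made for illustration. It inserts a leading 0 before a two-digit sort order inside a DwC term URL and leaves every other value untouched.

```python
import re

# Same pattern as the DwC padding step: an empty match right after
# "http://rs.tdwg.org/dwc/terms/#(" when exactly two digits and ")" follow.
DWC_TWO_DIGITS = re.compile(
    r"(?<=http://rs\.tdwg\.org/dwc/terms/#\()(?=\d{2}\))"
)


def pad_dwc_order(text):
    """Insert a leading 0 before two-digit DwC sort orders (hypothetical helper)."""
    return DWC_TWO_DIGITS.sub("0", text)


if __name__ == "__main__":
    print(pad_dwc_order("http://rs.tdwg.org/dwc/terms/#(42)locality"))
```

Because the pattern matches an empty position, the replacement only inserts text, so running it a second time on already padded values makes no further changes.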
Scope DwC sort order by category
In mappings/VegCore.csv, mappings/Veg+-VegCore.csv:
- Open in Excel/LibreOffice
- Copy the last column to a text editor (which decodes the CSV to TSV)
- Replace regexp
(\(0:Record-level\)\()(\d{3})(?=\))
with BeanShell snippet
_1 + (Integer.parseInt(_2) - 0)
- Replace regexp
(\(1:Occurrence\)\()(\d{3})(?=\))
with BeanShell snippet
_1 + (Integer.parseInt(_2) - 19)
- Replace regexp
(\(2:Event\)\()(\d{3})(?=\))
with BeanShell snippet
_1 + (Integer.parseInt(_2) - 41)
- Replace regexp
(\(3:dcterms:Location\)\()(\d{3})(?=\))
with BeanShell snippet
_1 + (Integer.parseInt(_2) - 56)
- Replace regexp
(\(4:GeologicalContext\)\()(\d{3})(?=\))
with BeanShell snippet
_1 + (Integer.parseInt(_2) - 100)
- Replace regexp
(\(5:Identification\)\()(\d{3})(?=\))
with BeanShell snippet
_1 + (Integer.parseInt(_2) - 118)
- Replace regexp
(\(6:Taxon\)\()(\d{3})(?=\))
with BeanShell snippet
_1 + (Integer.parseInt(_2) - 126)
- Replace regexp
(\(7:ResourceRelationship\)\()(\d{3})(?=\))
with BeanShell snippet
_1 + (Integer.parseInt(_2) - 159)
- Replace regexp
(\(8:MeasurementOrFact\)\()(\d{3})(?=\))
with BeanShell snippet
_1 + (Integer.parseInt(_2) - 166)
- Replace regexp
(terms/#\(\d+:.*?\))\((?=\d\))
with
$1(0
- Save
- Run make mappings/
- Check diffs
- Commit
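The nine BeanShell replacements above all do the same arithmetic: subtract a per-category offset from the three-digit global sort order. The Python sketch below restates that idea in one loop. It is an illustration, not the procedure the author used; the function name scope_sort_order and the sample URLs are assumptions, while the category labels and offsets are copied from the steps above.

```python
import re

# Offsets copied from the BeanShell snippets in the steps above.
OFFSETS = {
    "0:Record-level": 0,
    "1:Occurrence": 19,
    "2:Event": 41,
    "3:dcterms:Location": 56,
    "4:GeologicalContext": 100,
    "5:Identification": 118,
    "6:Taxon": 126,
    "7:ResourceRelationship": 159,
    "8:MeasurementOrFact": 166,
}


def scope_sort_order(text):
    """Renumber DwC sort orders relative to their category (hypothetical helper)."""
    for category, offset in OFFSETS.items():
        pattern = re.compile(
            r"(\(" + re.escape(category) + r"\)\()(\d{3})(?=\))"
        )
        # Equivalent of: _1 + (Integer.parseInt(_2) - offset)
        text = pattern.sub(
            lambda m, o=offset: m.group(1) + str(int(m.group(2)) - o), text
        )
    # Final step: pad single-digit results to two digits.
    return re.sub(r"(terms/#\(\d+:.*?\))\((?=\d\))", r"\1(0", text)


if __name__ == "__main__":
    print(scope_sort_order("http://rs.tdwg.org/dwc/terms/#(1:Occurrence)(020)x"))
```

As in the BeanShell version, the subtraction drops leading zeros, which is why the last replacement pads single-digit results back to two digits.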
Indent fields inside tables
Edit VegCore:
- Copy contents to a text editor
- Replace regexp
^(p|h[2-9])(\(*\. )
with
$1(($2
- For each h1. section, unindent the paragraphs by ((
- Replace regexp
^(?:p|h[3-9])(?=\(*\. )
with
$0(
- In sections at the top and bottom of the page, remove the indents
- Copy contents back
- Save
- Regenerate the VegCore terms list
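The two regexp replacements in this section can be sketched in Python as follows. This is an illustration under stated assumptions: the function name indent_fields and the sample text are made up, the editor's $1 and $0 become \1 and \g<0>, and the manual steps (unindenting paragraphs in each h1. section and removing indents at the top and bottom of the page) are not reproduced.

```python
import re


def indent_fields(text):
    """Apply the two automatic indent steps (hypothetical helper)."""
    # Step 1: add two levels of indent to paragraphs and h2-h9 headings.
    text = re.sub(r"^(p|h[2-9])(\(*\. )", r"\1((\2", text, flags=re.MULTILINE)
    # Step 2: add one more level to paragraphs and h3-h9 headings.
    text = re.sub(
        r"^(?:p|h[3-9])(?=\(*\. )", r"\g<0>(", text, flags=re.MULTILINE
    )
    return text


if __name__ == "__main__":
    sample = "h1. Section\n\nh2. plotName\n\np. Name of the plot"
    print(indent_fields(sample))
```

With the sample above, the h1. line is left alone, the h2. heading gains two indent levels, and the paragraph gains three.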
Change single, identically-named sources to synonyms
Edit VegCore:
- Copy contents to a text editor
- Replace regexp
(h2\(\(\. \[\[VegCore#(.*?)\|\2\]\](?: p\(*\. .*)* p\(*\. \*)Sources(:\* "DwC):\2(":http://rs\.tdwg\.org/dwc/terms/#\2)$
with
$1Used by$3$4
- Copy contents back
- Save
- Regenerate the VegCore terms list
Change dcterms sources to synonyms
Edit VegCore:
- Copy contents to a text editor
- Replace regexp
\*Sources:\* "DwC:(dcterms:(\w+))":http://rs\.tdwg\.org/dwc/terms/#dcterms:\2
with
*Synonyms:* h3(((((((. dcterms_$2 p(((((((((. From: "DwC":http://rs.tdwg.org/dwc/terms/#$1
- Copy contents back
- Save
- Regenerate the VegCore terms list
Change Sources to specific label
Edit VegCore:
- Copy contents to a text editor
- For each occurrence of Sources:
- Replace it with
p(((((. *<label>:*
where <label> is one of Synonyms, Related, or Named like
- Copy contents back
- Save
- Regenerate the VegCore terms list
Change single, identically-named synonyms to Froms
Edit VegCore:
- Copy contents to a text editor
- Replace regexp
(h2\(*\. \[\[VegCore#(.*?)\|\??\2\]\](?: p\(*\. .*)*)(( p\(*\. )Synonyms(:)) ("\w+):\2(":\w+://\S+), (.*)$
with
$1$4From$5 $6$7$3 $8
- Replace regexp
(h2\(*\. \[\[VegCore#(.*?)\|\??\2\]\](?: p\(*\. .*)* p\(*\. )Synonyms(: "\w+):\2(":\w+://\S+)$
with
$1From$3$4
- Copy contents back
- Save
- Regenerate the VegCore terms list
Index synonyms as web page anchors
Edit VegCore:
- Copy contents to a text editor
- Replace regexp
^(p(\(*)\. Synonyms: )(?:(.+), )?("[^"]*):([^"]*?)(":.*?)$
with
$1$3 h3$2((. [[VegCore#$5|$5]] p$2(((((. From: $4$6
repeatedly, until no replacements are made
- Search for Synonyms and for each match:
- Merge synonyms of the same name
- If there is a synonym with the same name as the term, merge it with the main term
- Remove trailing whitespace after Synonyms:
- Copy contents back
- Save
- Regenerate the VegCore terms list
Hyperlink all term names
Edit VegCore:
- Copy contents to a text editor
- Replace regexp
^(h\d\(+\. )((?!VegCore$)\w+)$
with
$1[[VegCore#$2|$2]]
- Copy contents back
- Save
- Regenerate the VegCore terms list
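As a rough illustration of the replacement above, the following Python sketch applies the same pattern to a small sample. The function name hyperlink_term_names and the sample headings are assumptions; the regexp is the one from the step above, with $1 and $2 written as \1 and \2.

```python
import re

# Indented headings whose text is a single word other than "VegCore".
HEADING_TERM = re.compile(r"^(h\d\(+\. )((?!VegCore$)\w+)$", re.MULTILINE)


def hyperlink_term_names(text):
    """Turn bare heading terms into VegCore wiki links (hypothetical helper)."""
    return HEADING_TERM.sub(r"\1[[VegCore#\2|\2]]", text)


if __name__ == "__main__":
    sample = "h2((. plotName\n\np(((((. Name of the plot\n\nh2((. VegCore"
    print(hyperlink_term_names(sample))
```

Headings that are already links, contain spaces, or carry no indent do not match, so the replacement is safe to run more than once.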
Change Related terms to VegCore terms
Edit VegCore:
- Copy contents to a text editor
- Replace regexp
([\w ]+: (?:.*, )?)"(?:DwC|TCS):([a-z]\w*)":\S*\2\b
with
$1[[VegCore#$2|$2]]
repeatedly, until no replacements are made
- Search for Related:" and for each match:
- Make sure each term is a VegCore term or synonym
- Copy contents back
- Save
- Regenerate the VegCore terms list
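The "repeatedly, until no replacements are made" instruction matters here because each pass converts only the last unconverted reference on a line. A minimal Python sketch of that loop follows; the function name link_related_terms and the sample line are assumptions, and the pattern is copied from the step above.

```python
import re

RELATED_REF = re.compile(
    r'([\w ]+: (?:.*, )?)"(?:DwC|TCS):([a-z]\w*)":\S*\2\b'
)


def link_related_terms(text):
    """Replace DwC/TCS references with VegCore links until stable (hypothetical helper)."""
    while True:
        updated = RELATED_REF.sub(r"\1[[VegCore#\2|\2]]", text)
        if updated == text:  # no replacements were made in this pass
            return text
        text = updated


if __name__ == "__main__":
    line = (
        'p(((((. Related: "DwC:locality":http://rs.tdwg.org/dwc/terms/#locality, '
        '"DwC:county":http://rs.tdwg.org/dwc/terms/#county'
    )
    print(link_related_terms(line))
```

In this sample the first pass rewrites the county reference and the second pass rewrites the locality reference; a third pass finds nothing and ends the loop.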
Group terms that only differ in their units
Edit VegCore:
- Copy contents to a text editor
- Replace regexp
^h2\(\(\. \[\[VegCore#((\w+)_[a-z0-9]+)\|\1\]\](((?: (?:p|h3)\(*\. .*)*?)(?: p\(*\. Comments: .*)?) h2\(\(\. \[\[VegCore#(\2_[a-z0-9]+)\|\5\]\]\3?((?: (?:p|h3)\(*\. (?!Comments).*)*)(?: p\(*\. Comments: .*)?
with
h2((. [[VegCore#$2|?$2]] p(((((. _Requires units_$6$4 p(((((. Unit alternatives: h4(((((((. [[VegCore#$1|$1]] h4(((((((. [[VegCore#$5|$5]]
- In the SoilObservation section:
- Replace regexp
^p\(*\. Related: "SALVIAS:.*\n\n
with
""
- Replace text
Related
with
Synonyms
- Copy contents back
- Save
- Regenerate the VegCore terms list
Reformat synonyms/alternatives
Edit VegCore:
- Copy contents to a text editor
- Replace regexp
(^p\(*\. (.*?)s?:(?= (h[34])) (?: (?:\3|h[34]|p).* )* \3\(*?)\(\(\((\. )(?=\[)(?:(.* p)?\(\()?
with
$1$4!{padding: 0 0.3em 3pt 0.65em; font-weight: normal;}_($2:)! $5
repeatedly, until no replacements are made
- Make sure there are no occurrences of regexp
h3\(*\. (?=\[)
- Replace regexp
^p\(*\. (.*): (?=h[34])
with
""
- Copy contents back
- Save
- Replace regexp
^(h\d\(*\. .*)( (?: p.*(?: (?![hp]).*)* )* p\(*\. From: )(\(?)(?:([^",()\n]+)|"(\w+)(?:\([^)]*\))?([^"]*)"(:\S*)(?<![,)])|(,))(\)?) ?
with
$1 !{font-size: small;}_($3$4$5$6$8$9)!$7$2
repeatedly, until no replacements are made
- Replace regexp
^p\(*\. From: \n\n
with
""
- Make sure there are no occurrences of text
_(()!
- Make sure there are no occurrences of text
_())!
- Replace text
]] !{font-size
- Replace text
_(,)!
with
_(|)!
- Replace regexp
(\()\[\[(CTFS)(#\w+)\]\](\)!)
with
$1$2$4:https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/CTFS$3
- Manually fix other occurrences of _([[ using the formula above
- Copy contents back
- Save
- Replace regexp
(?<=\])( !.*)? p\(*\. _(\S+(?: \S+)?)_$
with
!{padding-left: 1em; font-size: small; font-weight: normal; font-style: italic;}_($2)!$1
- Replace regexp
^(h3.*)font-size: small;
with
$1
repeatedly, until no replacements are made
- Copy contents back
- Save
- Regenerate the VegCore downloads
Add links to the ambiguous terms for alternatives
Edit VegCore:
- Copy contents to a text editor
- Replace regexp
Synonym:\)! \[\[VegCore#(([^\W_]+)_(?!DMS)[A-Z]\w+)\|\1\]\].*
with
$0 !{padding-left: 1em; font-weight: normal;}_(alternative of)! !_(?$2)!:https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore_data_dictionary#$2
- Copy contents back
- Save
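A small Python sketch of the replacement above may help when checking which synonym lines it touches. The function name link_ambiguous_term and the term names height_Feet and latitude_DMS are illustrative assumptions, not entries confirmed to be in VegCore; the editor's $0 and $2 become \g<0> and \2.

```python
import re

# Synonym lines whose term has the form <base>_<Suffix>, excluding _DMS suffixes.
ALTERNATIVE = re.compile(
    r"Synonym:\)! \[\[VegCore#(([^\W_]+)_(?!DMS)[A-Z]\w+)\|\1\]\].*"
)

REPLACEMENT = (
    r"\g<0> !{padding-left: 1em; font-weight: normal;}_(alternative of)! "
    r"!_(?\2)!:https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/"
    r"VegCore_data_dictionary#\2"
)


def link_ambiguous_term(text):
    """Append an 'alternative of' link to matching synonym lines (hypothetical helper)."""
    return ALTERNATIVE.sub(REPLACEMENT, text)


if __name__ == "__main__":
    print(link_ambiguous_term("_(Synonym:)! [[VegCore#height_Feet|height_Feet]]"))
```

The appended link points at the base name before the underscore (height in the sample), which is the ambiguous term the alternative belongs to.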
Move related terms onto the same line for terms without their own from
Edit VegCore's Taxon section:
- Copy contents to a text editor
- Replace regexp
^(h2.*) p\(*\. Related: \[\[VegCore#(\w+)\|\2\]\]$
with
$1 !{padding-left: 1em; font-size: small; font-weight: normal;}_(analogous to)! !{font-size: small;}_($2)!:https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegCore_data_dictionary#$2
- Copy contents back
- Save
Wikify all references to VegCore terms
Edit VegCore:
- Copy contents to a text editor
- Replace regexp
(?<![\[#|/])\b(?!VegCore)(?:[A-Z]+|[A-Z]?[a-z]+(?:[A-Z][a-z]+)+)[\w]+(?<!ions)(?=s?\b)
with
[[VegCore#$0|$0]]
- Copy contents back
- Save
Include just the entity's name in citations, with other reference info in the URL
Edit VegCore:
- Copy contents to a text editor
- Replace regexp
e-mail from ("[^"]+":mailto:\S+) on ([\d-]+)
with
$1/$2
- Replace regexp
!\{font-size: small;\}_\(e-mail from\)! (!\{font-size: small;\}_\([^)]+\)!:mailto:\S+) !\{font-size: small;\}_\(on ([\d-]+)\)!
with
$1/$2
- Copy contents back
- Save
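The first replacement in this section can be sketched in Python as below. The function name shorten_email_citation, the correspondent name, and the address are made up for illustration; the pattern is the one from the step above, with $1/$2 written as \1/\2.

```python
import re

EMAIL_CITATION = re.compile(r'e-mail from ("[^"]+":mailto:\S+) on ([\d-]+)')


def shorten_email_citation(text):
    """Keep only the linked name and move the date into the URL (hypothetical helper)."""
    return EMAIL_CITATION.sub(r"\1/\2", text)


if __name__ == "__main__":
    cited = 'Per e-mail from "Jane Doe":mailto:jane@example.org on 2013-05-01.'
    print(shorten_email_citation(cited))
```

The visible link text stays the person's name, while the date is appended to the mailto target after a slash, which is what the section title means by keeping other reference info in the URL.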