# Date Author Comment
1784 04/03/2012 05:45 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Added id -> occurrenceID mapping

1783 04/03/2012 05:43 PM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV/maps/VegX.%.full.csv: Regenerated using new src maps

1782 04/03/2012 05:41 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Added mappings from dcterms elements without namespace to with namespace

1781 04/03/2012 05:40 PM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV: Built maps/src.%.csv

1780 04/03/2012 05:24 PM Aaron Marcuse-Kubitza

Added inputs/ACAD/maps/src.specimens.csv

1779 04/03/2012 05:23 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Autogen src maps with known table names. Sources: $(withCatSrcs): Fixed bug where substitution pattern did not contain %.

1778 04/03/2012 05:22 PM Aaron Marcuse-Kubitza

Added src_map to make a source map spreadsheet from a CSV header

1777 04/03/2012 04:32 PM Aaron Marcuse-Kubitza

input.Makefile: Split Maps section into "Existing maps discovery" and "Maps building" sections. Sources: Added cat, cat-% to cat out sources.

1776 04/03/2012 04:17 PM Aaron Marcuse-Kubitza

input.Makefile: Factored out sources-related code to new Sources section

1775 04/03/2012 04:08 PM Aaron Marcuse-Kubitza

input.Makefile: $(srcMaps): Removed `$(filter-out maps/src.join.%.csv,...)` because maps/src.join.%.csv are no longer created

1774 04/03/2012 03:47 PM Aaron Marcuse-Kubitza

README.TXT: Schema changes: Split updating graphical ERD exports into a separate section. Update graphical ERD exports: Added schemas/vegbien.ERD.core.pdf.

1773 04/03/2012 03:42 PM Aaron Marcuse-Kubitza

README.TXT: Added Datasource setup section with instructions to add a new datasource

1772 04/03/2012 03:38 PM Aaron Marcuse-Kubitza

Added inputs/ACAD

1771 04/03/2012 03:37 PM Aaron Marcuse-Kubitza

input.Makefile: Only setSvnIgnore the input dir, since it already exists and doesn't need to be added (inputs/Makefile adds it)

1770 04/03/2012 03:23 PM Aaron Marcuse-Kubitza

inputs/*/maps/DwC.specimens.csv: Removed extraneous XML meta info from DwC column root, since it now just needs to be present in the core via map mappings/DwC-VegBIEN.specimens.csv

1769 04/03/2012 03:22 PM Aaron Marcuse-Kubitza

union: Use new maps.merge_headers() to write properly combined header

1768 04/03/2012 03:21 PM Aaron Marcuse-Kubitza

maps.py: join_combinable(): Fixed roots_combinable() to run on col names instead of roots, which were passed in. merge_mappings(): Factored out mapping column combining into merge_mapping_cols(), which handles an optional prefer param as well to take the header_num env var. Added merge_headers().

1767 04/03/2012 03:17 PM Aaron Marcuse-Kubitza

util.py: Added sort_by_len(), shortest(), longest()
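
The three helpers named above can be sketched as follows. The signatures (a list argument for sort_by_len(), variadic arguments for shortest() and longest()) and the tie-breaking behavior are assumptions for illustration, not taken from util.py.

```python
def sort_by_len(items):
    """Return a new list of items sorted shortest-first (stable for ties)."""
    return sorted(items, key=len)


def shortest(*items):
    """Return the shortest argument; the first one wins among ties."""
    return sort_by_len(items)[0]


def longest(*items):
    """Return the longest argument; the last one wins among ties."""
    return sort_by_len(items)[-1]


print(shortest('abc', 'a', 'ab'))  # a
print(longest('abc', 'a', 'ab'))   # abc
```

Building both on a single stable sort keeps tie-breaking predictable; the built-ins min(items, key=len) and max(items, key=len) would work as well, but max() returns the first of several equally long items rather than the last.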

1766 04/03/2012 02:12 PM Aaron Marcuse-Kubitza

join: Use new maps.join_combinable() to check if column names match

1765 04/03/2012 02:11 PM Aaron Marcuse-Kubitza

maps.py: Added cols_combinable() and use it in combinable(). Added join_combinable() and associated helper functions. Added documentation labels to each section.

1764 04/03/2012 01:13 PM Aaron Marcuse-Kubitza

xml_parse.py: ConsecXmlInputStream: Removed read() because that's now defined in streams.FilterStream

1763 04/03/2012 01:11 PM Aaron Marcuse-Kubitza

xml_parse.py: parse_next(): Strip control characters from input stream because they mess up the parser

1762 04/03/2012 01:10 PM Aaron Marcuse-Kubitza

streams.py: FilterStream: Forward all reads to readline()
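
A minimal sketch of a filter stream that forwards every read to readline(), so a subclass only has to override readline() to filter all input. The class layout and the example subclass UpperStream are hypothetical; only the name FilterStream and the forwarding idea come from the log.

```python
import io


class FilterStream:
    """Wraps a text stream so that subclasses can transform it line by line.

    Every read, including read(), goes through readline().
    """

    def __init__(self, stream):
        self.stream = stream

    def readline(self):
        return self.stream.readline()

    def read(self, n=-1):
        # Forwarded to readline(): returns at most one (filtered) line,
        # regardless of n, so callers must keep reading until '' (EOF).
        return self.readline()

    def __iter__(self):
        # Call readline() repeatedly until it returns '' (EOF)
        return iter(self.readline, '')


class UpperStream(FilterStream):
    """Hypothetical example filter that upper-cases each line."""

    def readline(self):
        return super().readline().upper()


s = UpperStream(io.StringIO('a\nb\n'))
print(repr(s.read()))  # 'A\n'
print(list(s))         # ['B\n']
```

The trade-off of this design is that read(n) may return more or fewer than n characters, so it suits consumers that loop until EOF; consumers that rely on read(n) honoring n may not tolerate it.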

1761 04/03/2012 01:08 PM Aaron Marcuse-Kubitza

strings.py: Added is_ctrl() and strip_ctrl()
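
One way to implement these helpers uses the Unicode "Cc" (control) category. Keeping tab, newline and carriage return is an assumption made here because XML permits those characters and line-based stream code depends on newlines; the real strings.py may draw the line differently.

```python
import unicodedata

# Assumption: these control characters are legal in XML and are kept.
_ALLOWED_CTRL = frozenset('\t\n\r')


def is_ctrl(char):
    """True if char is a control character (category Cc) not in the allowed set."""
    return unicodedata.category(char) == 'Cc' and char not in _ALLOWED_CTRL


def strip_ctrl(text):
    """Remove disallowed control characters from text."""
    return ''.join(c for c in text if not is_ctrl(c))


print(repr(strip_ctrl('ab\x00c\x1b\n')))  # 'abc\n'
```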

1760 04/03/2012 08:34 AM Aaron Marcuse-Kubitza

xml_parse.py: parse_next(): On parser error, advance to next XML document since the rest of the current document is corrupted

1759 04/03/2012 08:33 AM Aaron Marcuse-Kubitza

streams.py: Added consume(). Added documentation labels to each section.
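
The log does not say what consume() takes or returns. Assuming it reads and discards input up to an end marker, which is what skipping the rest of a corrupted XML document (r1760) would need, a line-based sketch could look like this; the end_str parameter is hypothetical.

```python
import io


def consume(in_, end_str=None):
    """Read and discard lines from in_ up to and including the first line
    containing end_str, or until EOF if end_str is None or never found."""
    while True:
        line = in_.readline()
        if line == '':  # EOF
            return
        if end_str is not None and end_str in line:
            return


stream = io.StringIO('<doc>broken\n</doc>\n<doc>ok</doc>\n')
consume(stream, '</doc>')
print(repr(stream.readline()))  # '<doc>ok</doc>\n'
```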

1758 04/03/2012 08:23 AM Aaron Marcuse-Kubitza

bin/map: For XML inputs, wrap sys.stdin in a LineCountStream and use the new xml_parse.docs_iter() on_error() handler to add the input line # to XML parsing exceptions

1757 04/03/2012 08:21 AM Aaron Marcuse-Kubitza

xml_parse.py: Added on_error() handler to parse_next() (passed through by docs_iter()), so that the caller can add useful info like the input line # to the exception message, and decide whether to suppress the exception rather than re-raise it
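
The on_error() pattern described in r1757 and r1758 can be sketched with a line-counting wrapper and a handler that either annotates or suppresses parse errors. To keep it short, this assumes one XML document per input line; LineCountStream is reimplemented here, and parse_lines() is a hypothetical stand-in for parse_next()/docs_iter().

```python
import io
import xml.dom.minidom
from xml.parsers.expat import ExpatError


class LineCountStream:
    """Hypothetical wrapper that counts the lines read through it."""

    def __init__(self, stream):
        self.stream = stream
        self.line_num = 0

    def readline(self):
        line = self.stream.readline()
        if line:
            self.line_num += 1
        return line


def parse_lines(stream, on_error=None):
    """Yield one parsed XML document per input line (assumed layout).

    On a parse error, on_error(e) is called if given: it may raise
    (for example, with extra context) or return normally to skip the
    bad document. Without a handler, the original error propagates.
    """
    for line in iter(stream.readline, ''):
        try:
            doc = xml.dom.minidom.parseString(line)
        except ExpatError as e:
            if on_error is None:
                raise
            on_error(e)
            continue
        yield doc


stream = LineCountStream(io.StringIO('<a/>\n<b>\n<c/>\n'))
errors = []


def on_error(e):
    # stream.line_num is the number of the line that failed to parse;
    # returning normally suppresses the error and skips that document.
    errors.append(f'input line #{stream.line_num}: {e}')


tags = [d.documentElement.tagName for d in parse_lines(stream, on_error)]
print(tags)    # ['a', 'c']
print(errors)
```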

1756 04/03/2012 07:19 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field identificationLabel2 to identificationLabel. Distinguish what are now two identificationLabel fields of the same name by tagging each one with [@id=2] or [@id=1]. inputs/SALVIAS-CSV/maps/VegX.organisms.csv: Merge tag1/stem_tag1 and tag2/stem_tag2 using _alt, since they are never set to different values when both are not NULL (although sometimes just one or just the other is not NULL).

1755 04/02/2012 05:37 PM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field tag2 to identificationLabel2 to reflect that it will become a second instance of identificationLabel

1754 04/02/2012 05:31 PM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Re-mapped individualOrganismObservation user-defined field lineCover to already existing volumeCanopy

1753 04/02/2012 05:29 PM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Re-mapped individualOrganismObservation user-defined field cover to already existing attribute.coverPercent

1752 04/02/2012 05:13 PM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Re-mapped individualOrganismObservation user-defined field count to already existing aggregateOrganismObservation.aggregateValue

1751 04/02/2012 04:44 PM Aaron Marcuse-Kubitza

vegbien.ERD.mwb: Fixed lines

1750 04/02/2012 01:50 PM Aaron Marcuse-Kubitza

README.TXT: Documented that `make reinstall_db` will delete your VegBIEN DB

1749 04/02/2012 01:48 PM Aaron Marcuse-Kubitza

README.TXT: Documented that `make empty_db` will delete your VegBIEN DB

1748 04/02/2012 01:44 PM Aaron Marcuse-Kubitza

root Makefile: empty_db: Confirm deletion just like for rm_db. rm_db: put $(confirmRmDb) on a separate line and move the $(error) call to the main $(confirm) macro since you always want to abort make if the user cancels (not just not run that command).

1747 04/02/2012 01:34 PM Aaron Marcuse-Kubitza

root Makefile: rm_db: If user cancels, abort in case target was reinstall_db to prevent installing

1746 04/02/2012 01:28 PM Aaron Marcuse-Kubitza

root Makefile: core, rm_core: Fixed bug where no longer existing prerequisites postgres_user, rm_postgres_user were not removed

1745 04/02/2012 01:25 PM Aaron Marcuse-Kubitza

root Makefile: rm_db: Confirm deletion with user. Merged postgres_user, rm_postgres_user into db, rm_db so that deletion confirmation applies to user deletion as well (which would indirectly cause the DB to be deleted).

1744 04/02/2012 01:04 PM Aaron Marcuse-Kubitza

README.TXT: Testing: Updated to add missing mappings

1743 04/02/2012 01:03 PM Aaron Marcuse-Kubitza

root Makefile: test-all: Added missing_mappings

1742 04/02/2012 01:00 PM Aaron Marcuse-Kubitza

Moved maps validation targets from main Makefile to input.Makefile. main Makefile: maps validation: Summarize the output of the inputs' maps validations.

1741 04/02/2012 12:22 PM Aaron Marcuse-Kubitza

Makefile: Also find missing input mappings, in addition to missing join mappings

1740 04/02/2012 12:21 PM Aaron Marcuse-Kubitza

join: Also produce warnings for no input mapping (if no comment explaining why no input mapping), in addition to no join mapping

1739 04/02/2012 12:21 PM Aaron Marcuse-Kubitza

join: Also produce warnings for no input mapping (if no comment explaining why no input mapping), in addition to no join mapping

1738 04/02/2012 12:20 PM Aaron Marcuse-Kubitza

inputs/NY/maps/DwC.specimens.csv: Documented why there is no input mapping for key

1737 04/02/2012 11:29 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined fields stem* to remove the stem* prefix to be consistent with VegBIEN

1736 04/02/2012 11:23 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation/plotObservation user-defined fields sourceaccessioncode to sourceAccessionCode to be consistent with VegX case sensitivity

1735 04/02/2012 11:19 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field interceptCm to lineCover to be consistent with VegBIEN

1734 04/02/2012 11:18 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field individualCode to authorPlantCode to be consistent with VegBIEN

1733 04/02/2012 11:17 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field htFirstBranchM to heightFirstBranch to be consistent with VegBIEN

1732 04/02/2012 11:15 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field coverPercent to cover to be consistent with VegBIEN

1731 04/02/2012 11:12 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field siltPercent to silt to be consistent with VegBIEN

1730 04/02/2012 11:11 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field sandPercent to sand to be consistent with VegBIEN

1729 04/02/2012 11:10 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field pottasium to potassium to be consistent with VegBIEN

1728 04/02/2012 11:08 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field organicPercent to organic to be consistent with VegBIEN

1727 04/02/2012 11:07 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field clayPercent to clay to be consistent with VegBIEN

1726 04/02/2012 11:06 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field cationCap to cationExchangeCapacity to be consistent with VegBIEN

1725 04/02/2012 11:02 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Renamed plotObservation user-defined field precipMm to precipitation to be consistent with VegBIEN

1724 04/02/2012 10:56 AM Aaron Marcuse-Kubitza

VegX-VegBIEN.organisms.csv: Changed plotObservation user-defined field plotMethodology to /simpleUserdefined[name=method]/*ID/method/name

1723 04/02/2012 09:47 AM Aaron Marcuse-Kubitza

schemas/postgresql.nimoy.conf: Increased default_statistics_target to the PostgreSQL 8.4 default value to improve query execution plans

1722 04/02/2012 09:43 AM Aaron Marcuse-Kubitza

Added schemas/postgresql.Mac.conf (for tuning developers' local testing DBs)

1721 04/02/2012 09:42 AM Aaron Marcuse-Kubitza

schemas/postgresql*.conf: Increased checkpoint_segments and checkpoint_completion_target so that checkpoints (performance intensive) are written less often and load-balanced better

1720 04/02/2012 08:35 AM Aaron Marcuse-Kubitza

xml_dom.py: Don't print whitespace from parsed XML document when pretty-printing XML. minidom modifications section: Added subsection labels for the class each modification applies to.

1719 04/02/2012 08:20 AM Aaron Marcuse-Kubitza

Parser.py: Renamed SyntaxException to SyntaxError because it's an unexpected condition that should exit the program, a.k.a. an error

1718 04/02/2012 08:05 AM Aaron Marcuse-Kubitza

bin/map: process_rows(): When iterating over each row, only retrieve the next row if the end (limit of # of rows) has not been reached. This prevents the next row from being fetched, possibly causing an entire additional consecutive XML document to be parsed, if the limit has already been reached. This is primarily useful for XML inputs with a ".0.top" segment prepended before the other documents, which contains just the first two nodes for fast parsing of this smaller XML document when only the first two nodes are needed for testing. Without this fix, the ".0.top" segment would have needed to contain the first three nodes instead.
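
The principle of checking the row limit before fetching the next row is the same one itertools.islice() follows: it stops as soon as the count is reached, without pulling one more item from the source. In this sketch, rows() and process_rows() are hypothetical stand-ins that record how many rows were actually produced.

```python
import itertools

fetched = []


def rows():
    """Hypothetical (possibly expensive) row source that records each fetch."""
    for i in itertools.count():
        fetched.append(i)
        yield i


def process_rows(row_iter, limit=None):
    """Process at most limit rows without fetching row limit + 1.

    islice() checks the count before calling next() on the source,
    so no extra row is pulled once the limit has been reached.
    """
    return [row * 10 for row in itertools.islice(row_iter, limit)]


print(process_rows(rows(), 2))  # [0, 10]
print(fetched)                  # [0, 1]
```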

1717 04/02/2012 07:55 AM Aaron Marcuse-Kubitza

inputs/XAL: Accepted initial test outputs

1716 04/02/2012 07:54 AM Aaron Marcuse-Kubitza

inputs/XAL: Added maps

1715 04/02/2012 07:52 AM Aaron Marcuse-Kubitza

bin/map: Extended consecutive XML document support to direct-XML inputs (without a map spreadsheet). Factored out consecutive XML document row-iteration code into helper method get_rows() which does the iters.flatten() and itertools.imap() calls.

1714 04/02/2012 07:37 AM Aaron Marcuse-Kubitza

bin/map: Fixed bug in iteration over consecutive XML documents where only the first element of the first document was processed. Use of iters.flatten() and itertools.imap() fixes this problem so that the consecutive XML documents are regarded as a continuous stream of rows.
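
Treating consecutive documents as one continuous stream of rows amounts to mapping each document to its rows and then flattening the result. The signature of iters.flatten() is not shown in the log, so itertools.chain.from_iterable() stands in for it here, and Python 3's lazy built-in map() replaces itertools.imap(); get_rows() and the list-based documents are hypothetical.

```python
import itertools


def get_rows(docs, doc_rows):
    """Return one continuous, lazy stream of rows across all documents.

    map() applies doc_rows (document -> iterable of its rows) lazily, and
    chain.from_iterable() flattens the per-document row iterables, so
    rows from the second document follow the first's without a break.
    """
    return itertools.chain.from_iterable(map(doc_rows, docs))


# Hypothetical documents, each modeled as a list of row elements
docs = [['r1', 'r2'], ['r3'], ['r4', 'r5']]
print(list(get_rows(docs, iter)))  # ['r1', 'r2', 'r3', 'r4', 'r5']
```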

1713 04/02/2012 07:16 AM Aaron Marcuse-Kubitza

bin/map: Use new xml_parse.docs_iter() to iterate over each consecutive XML document in stdin

1712 04/02/2012 07:16 AM Aaron Marcuse-Kubitza

xml_parse.py: Added support for parsing consecutive XML documents in a stream
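
A simple way to split a stream of concatenated XML documents is to start a new document at each XML declaration. This sketch assumes every document begins with `<?xml` at the start of a line (r1692 makes digir_client end each document with a newline, which fits that assumption); the real docs_iter() may use a different boundary test.

```python
import io
import xml.dom.minidom


def docs_iter(stream):
    """Yield each XML document from a stream of concatenated documents.

    Assumes every document begins with an XML declaration ('<?xml')
    at the start of a line.
    """
    buf = []
    for line in stream:
        if line.startswith('<?xml') and buf:
            yield xml.dom.minidom.parseString(''.join(buf))
            buf = []
        buf.append(line)
    if buf:
        yield xml.dom.minidom.parseString(''.join(buf))


stream = io.StringIO(
    '<?xml version="1.0"?>\n<resp><rec/></resp>\n'
    '<?xml version="1.0"?>\n<resp><rec/><rec/></resp>\n'
)
counts = [len(d.getElementsByTagName('rec')) for d in docs_iter(stream)]
print(counts)  # [1, 2]
```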

1711 04/02/2012 07:01 AM Aaron Marcuse-Kubitza

Added iters.py

1710 03/29/2012 10:33 PM Aaron Marcuse-Kubitza

streams.py: Added FilterStream. Changed TracedStream to use FilterStream.

1709 03/29/2012 10:24 PM Aaron Marcuse-Kubitza

Moved parse_str() from xml_dom.py to xml_parse.py

1708 03/29/2012 10:24 PM Aaron Marcuse-Kubitza

Added xml_parse.py

1707 03/29/2012 10:21 PM Aaron Marcuse-Kubitza

streams.py: CaptureStream: Ignore start_str when recording and end_str when not recording

1706 03/29/2012 10:13 PM Aaron Marcuse-Kubitza

streams.py: CaptureStream: Get each match as a separate array elem instead of concatenated together
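
The CaptureStream behavior from r1706 and r1707 (one array element per match; start_str ignored while recording and end_str ignored while not recording) can be sketched as follows. Capturing whole lines rather than exact substrings is a simplifying assumption, and the constructor signature is hypothetical.

```python
import io


class CaptureStream:
    """Passes lines through while recording each span that starts at a line
    containing start_str and ends at a line containing end_str.

    Each captured span becomes a separate element of self.matches.
    start_str is ignored while recording; end_str is ignored while not.
    """

    def __init__(self, stream, start_str, end_str):
        self.stream = stream
        self.start_str = start_str
        self.end_str = end_str
        self.recording = False
        self.matches = []

    def readline(self):
        line = self.stream.readline()
        if not self.recording and self.start_str in line:
            self.recording = True
            self.matches.append('')  # begin a new, separate match
        if self.recording:
            self.matches[-1] += line
            if self.end_str in line:
                self.recording = False
        return line


s = CaptureStream(io.StringIO('junk</a>\n<a>1</a>\n<a>2\n</a>\n'), '<a>', '</a>')
for _ in iter(s.readline, ''):
    pass  # consume the stream; lines pass through unchanged
print(s.matches)  # ['<a>1</a>\n', '<a>2\n</a>\n']
```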

1705 03/29/2012 09:59 PM Aaron Marcuse-Kubitza

ch_root, repl, map: Use new maps.col_info() instead of parsing col name manually. This allows maps with prefixes containing ":" to be supported, without the ":" being misinterpreted as the label-root separator.

1704 03/29/2012 09:57 PM Aaron Marcuse-Kubitza

maps.py: Added col_info() to get label, root, prefixes from col_name. Added col_formats() for use by combinable(). Use new col_formats() in combinable(). Removed no longer needed col_label().
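
The point of col_info() (see r1705) is to parse a column name with a pattern instead of splitting on the first ":", so prefixes that contain ":" are not mistaken for the label-root separator. The column-name format used below, `label[prefix1,prefix2]:root` with the prefix list and root both optional, is an assumption for illustration.

```python
import re

# Assumed column-name format: label[prefix1,prefix2]:root
# where the "[...]" prefix list and ":root" are both optional.
_COL_RE = re.compile(r'^([^\[:]*)(?:\[([^\]]*)\])?(?::(.*))?$')


def col_info(col_name):
    """Split a map column name into (label, root, prefixes)."""
    match = _COL_RE.match(col_name)
    if match is None:
        raise ValueError(f'malformed column name: {col_name!r}')
    label, prefixes, root = match.groups()
    return label, root, prefixes.split(',') if prefixes else []


name = 'VegX[dc:,dwc:]:/plotObservation'
print(name.split(':', 1)[0])  # VegX[dc  (naive split breaks on the prefix)
print(col_info(name))         # ('VegX', '/plotObservation', ['dc:', 'dwc:'])
```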

1703 03/29/2012 09:55 PM Aaron Marcuse-Kubitza

input.Makefile: Use with_cat instead of with_cat_csv for XML sources

1702 03/29/2012 09:54 PM Aaron Marcuse-Kubitza

Renamed inputs/XAL/src/digir.xml.make to digir.specimens.xml.make so it would generate an output file with the proper table name

1701 03/29/2012 08:53 PM Aaron Marcuse-Kubitza

bin/map: Support concatenated XML documents for XML inputs

1700 03/29/2012 08:46 PM Aaron Marcuse-Kubitza

bin/map: Merged XML inputs with and without a map into the in_is_xml section

1699 03/29/2012 08:33 PM Aaron Marcuse-Kubitza

digir_client: Output profiling information

1698 03/29/2012 08:21 PM Aaron Marcuse-Kubitza

Added inputs/XAL/src/digir.xml.make

1697 03/29/2012 08:21 PM Aaron Marcuse-Kubitza

digir_client: Import http to take advantage of httplib modifications to deal with IncompleteRead errors

1696 03/29/2012 08:20 PM Aaron Marcuse-Kubitza

Added http.py with httplib modifications to deal with IncompleteRead errors

1695 03/29/2012 07:46 PM Aaron Marcuse-Kubitza

digir_client: Fixed bug where chunk size was being adjusted even if count == None (indicating no determinable last chunk), causing a type mismatch between None and the integer total

1694 03/29/2012 07:28 PM Aaron Marcuse-Kubitza

input.Makefile: Removed no longer needed "ifneq ($(wildcard test/),)" guard around Testing section because all inputs now have a test subdir

1693 03/29/2012 07:25 PM Aaron Marcuse-Kubitza

Added inputs/XAL

1692 03/29/2012 07:22 PM Aaron Marcuse-Kubitza

digir_client: Made chunk_size a configurable env var. Removed schema env var because schema is always the same for DiGIR (can be different for TAPIR). Make sure output ends in a newline so that consecutive XML documents are on different lines.

1691 03/29/2012 07:13 PM Aaron Marcuse-Kubitza

digir_client: Fixed bug where chunk_size records would always be retrieved even in the last chunk, which ignored any manual count the user might have set via the "n" option

1690 03/29/2012 07:07 PM Aaron Marcuse-Kubitza

digir_client: Repeatedly retrieve data in chunks. Provide match count. Added section comments.
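
The chunking logic described in r1690, r1691 and r1695 can be sketched as below: fetch in chunks of chunk_size, shrink the last chunk to honor the server's match count and any user-set "n" limit, and skip that adjustment entirely when no limit is known. The fetch(start, count) callback and the function's parameters are hypothetical; the real client speaks the DiGIR protocol.

```python
def fetch_in_chunks(fetch, chunk_size, total=None, n=None):
    """Retrieve records in chunks of at most chunk_size.

    fetch(start, count) is a hypothetical callback returning a list of
    records. total is the server's match count (None if unknown) and n an
    optional user-set maximum. The last chunk is shrunk only when a limit
    is known, so None is never compared with an integer.
    """
    limit = total
    if n is not None:
        limit = n if limit is None else min(limit, n)

    records = []
    start = 0
    while limit is None or start < limit:
        count = chunk_size
        if limit is not None:
            count = min(count, limit - start)  # shrink the last chunk
        chunk = fetch(start, count)
        records.extend(chunk)
        if len(chunk) < count:  # server ran out of records
            break
        start += count
    return records


data = list(range(10))
print(fetch_in_chunks(lambda s, c: data[s:s + c], chunk_size=4, n=6))
# [0, 1, 2, 3, 4, 5]
```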

1689 03/29/2012 06:52 PM Aaron Marcuse-Kubitza

xpath.py: Added get_value() to run get_1() and return the value of any result node

1688 03/29/2012 06:51 PM Aaron Marcuse-Kubitza

xml_dom.py: Added parse_str()

1687 03/29/2012 06:13 PM Aaron Marcuse-Kubitza

digir_client: Use new streams.copy() to copy returned data to stdout

1686 03/29/2012 06:13 PM Aaron Marcuse-Kubitza

streams.py: Added copy(). Added section comment for traced streams.
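
A copy() helper like this one usually reads fixed-size chunks until EOF. The chunk_size parameter and the returned character count are assumptions; the standard library's shutil.copyfileobj(fsrc, fdst, length) does the same job without returning a count.

```python
import io


def copy(from_, to, chunk_size=8192):
    """Copy everything from one file-like object to another, chunk by chunk.

    Returns the number of characters (or bytes) copied.
    """
    copied = 0
    while True:
        chunk = from_.read(chunk_size)
        if not chunk:  # EOF
            return copied
        to.write(chunk)
        copied += len(chunk)


src, dst = io.StringIO('hello world'), io.StringIO()
print(copy(src, dst, chunk_size=4))  # 11
print(dst.getvalue())                # hello world
```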

1685 03/29/2012 06:06 PM Aaron Marcuse-Kubitza

digir_client: Label debugging output