Project

General

Profile

Statistics
| Revision:

# Date Author Comment
8220 03/27/2013 11:43 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/import: Removed no longer applicable comment that directional joins are needed for PostgreSQL query planner to avoid slow sorts

8219 03/27/2013 11:40 PM Aaron Marcuse-Kubitza

inputs/FIA/TREE/import: Reclustered table by TREE.parent path index, to facilitate path-order joins

8218 03/27/2013 11:39 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/import: Changed all RIGHT JOINs to inner joins so that tables would be joined in path order (i.e. general->specific). This optimizes the incremental joins so that the small tables are joined to each other before being joined to the large tables, rather than each row of the large tables being looked up in the small tables. This effect may not be noticeable for small LIMIT values, but would become apparent for large LIMIT values, such as the 1-million-row partitions used by db_xml.put_table() for column-based import. Note that inner joins used to cause the query planner to produce incorrect results containing slow sorts, but now this appears to no longer be an issue, perhaps because the result is not sorted by the TREE.ID index (which is not in the same order as the path indexes *.unique, *.parent).

8217 03/27/2013 10:46 PM Aaron Marcuse-Kubitza

inputs/FIA/occurrence_all/import: Removed trailing whitespace

8216 03/27/2013 10:30 PM Aaron Marcuse-Kubitza

Removed unused inputs/FIA/COND_unique/. Use COND instead.

8215 03/27/2013 09:52 PM Aaron Marcuse-Kubitza

inputs/FIA/import: Use `set -o errexit` instead of putting ` || exit` after each command

8214 03/27/2013 09:52 PM Aaron Marcuse-Kubitza

lib/import.sh: map_table(): Removed unneeded () around psql. This also fixes a bug where an error exit status from psql would not have aborted the script because `set -o errexit` does not apply to commands enclosed in (). For () you need to use ` || exit` instead (or ` || return` inside a function).

8213 03/27/2013 09:42 PM Aaron Marcuse-Kubitza

lib/import.sh: Use `set -o errexit` so any command that exits with an error aborts the script. Note that a command's exit status can still be ignored using ` || true`. Removed no longer needed ` || return` in functions.

8212 03/27/2013 09:40 PM Aaron Marcuse-Kubitza

schemas/util.sql: Renamed rename_if_exists() to try_create() because it can be used to create a column in any way, not just by renaming another column

8211 03/27/2013 09:33 PM Aaron Marcuse-Kubitza

lib/import.sh: functions: abort if a command encounters an error

8210 03/27/2013 09:17 PM Aaron Marcuse-Kubitza

schemas/VegCore/mk_derived: Added cultivated from oldGrowth

8209 03/27/2013 09:16 PM Aaron Marcuse-Kubitza

schemas/util.sql: Added try_mk_derived_col()

8208 03/27/2013 08:35 PM Aaron Marcuse-Kubitza

inputs/FIA/*/import: Run mk_derived after postprocessing commands

8207 03/27/2013 08:28 PM Aaron Marcuse-Kubitza

inputs/FIA/import_order.txt: Added occurrence_all/

8206 03/27/2013 08:23 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: subplotID,subplot -> location.sourceaccessioncode: Fixed bug where need /_first to handle the case where both subplotID and subplot are provided

8205 03/27/2013 08:15 PM Aaron Marcuse-Kubitza

Added inputs/FIA/map.csv, which maps shared columns to VegCore

8204 03/27/2013 08:12 PM Aaron Marcuse-Kubitza

inputs/FIA/FIA_COND_unique/test.xml.ref: Updated now that PLOT, CONDID have been mapped

8203 03/27/2013 08:12 PM Aaron Marcuse-Kubitza

inputs/FIA/*/map.csv for pre-refresh tables: Added back * before unmapped column names

8202 03/27/2013 08:03 PM Aaron Marcuse-Kubitza

lib/csvs.py: stream_info(): Fixed bug where headers with multiline columns were not supported because only the first line (not the first multiline row) is sniffed for the dialect

8201 03/27/2013 06:56 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: %/header.csv: Fixed bug where newlines inside column names were incorrectly formatted by psql's table header formatting, by using COPY TO STDOUT instead

8200 03/27/2013 05:28 PM Aaron Marcuse-Kubitza

schemas/util.sql: Added do_optionally_ignore()

8199 03/27/2013 04:28 PM Aaron Marcuse-Kubitza

schemas/util.sql: Renamed rename_if_exists() to try_create() because it can be used to create a column in any way, not just by renaming another column

8198 03/27/2013 04:12 PM Aaron Marcuse-Kubitza

lib/import.sh: Added mk_derived(). Added mk_derived to usage template.

8197 03/27/2013 04:11 PM Aaron Marcuse-Kubitza

Added schemas/VegCore/mk_derived, which will be run in the import scripts

8196 03/27/2013 04:09 PM Aaron Marcuse-Kubitza

lib/import.sh: psql(): Set psql vars :schema, :table, :table_str for use by the psql commands

8195 03/27/2013 03:22 PM Aaron Marcuse-Kubitza

lib/import.sh: Export $schema, $table so they are available to programs invoked within an import script, which should not reset these vars if they include import.sh

8194 03/27/2013 03:20 PM Aaron Marcuse-Kubitza

lib/import.sh: Only set $table, $schema if they don't already exist

8193 03/27/2013 03:11 PM Aaron Marcuse-Kubitza

lib/import.sh: Added $root_dir and use it in $bin_dir

8192 03/27/2013 03:11 PM Aaron Marcuse-Kubitza

inputs/FIA/*/import: Use new mk_*_col()

8191 03/27/2013 02:50 PM Aaron Marcuse-Kubitza

schemas/*functions.sql: Renamed to *util.sql because now that these schemas are used by the new-style import scripts, there can be more than just functions in them

8190 03/27/2013 02:43 PM Aaron Marcuse-Kubitza

schemas/util.sql: Added mk_const_col()

8189 03/27/2013 02:37 PM Aaron Marcuse-Kubitza

schemas/util.sql: Added type_qual()

8188 03/27/2013 02:34 PM Aaron Marcuse-Kubitza

schemas/util.sql: mk_derived_col(): Added "idempotent" comment

8187 03/27/2013 02:23 PM Aaron Marcuse-Kubitza

schemas/util.sql: Added mk_derived_col()

8186 03/27/2013 02:22 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/import: oldGrowth: Updated expr column names

8185 03/27/2013 01:49 PM Aaron Marcuse-Kubitza

schemas/util.sql: Added typeof(text, regtype)

8184 03/27/2013 12:54 PM Aaron Marcuse-Kubitza

inputs/FIA/*/import: Removed util. before function names because util is in the search_path

8183 03/27/2013 12:43 PM Aaron Marcuse-Kubitza

schemas/*functions.sql: Renamed to *util.sql because now that these schemas are used by the new-style import scripts, there can be more than just functions in them

8182 03/25/2013 11:19 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added existing_cols()

8181 03/25/2013 11:12 PM Aaron Marcuse-Kubitza

schemas/functions.sql: col_type(): Fixed bug where a NULL col name crashed the undefined_column throw, because MESSAGE can't be NULL and the NULL name was nulling out the entire message

8180 03/25/2013 11:08 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added col_exists()

8179 03/25/2013 10:31 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/map.csv: Mapped SLOPE, ASPECT

8178 03/25/2013 10:23 PM Aaron Marcuse-Kubitza

web/main/.htaccess: remove linewraps (of the form table.path.vg/_-term) used to create a newline for Google spreadsheets

8177 03/25/2013 09:45 PM Aaron Marcuse-Kubitza

inputs/FIA/*/map.csv: Replaced . between table and column name with newline, so that table viewers like pgAdmin will display both the table and column name at the left edge of the header cell, rather than displaying only the table name because the column name doesn't fit. This fixes the problem of seeing a bunch of columns whose names all start with a table name, and not knowing what each of them is. It also preserves the ability to see at a glance which table a column is in, which helps in navigating wide tables. Removed * before unmapped terms, because whether a term is mapped is generally obvious from the table name itself.

8176 03/25/2013 09:01 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: %/.map.csv.last_cleanup: Run fix_line_endings after canon/translate to standardize Python's \r\n line endings back to \n. This prevents issues with mixed line endings because LibreOffice (and probably Excel) treat all cell-internal line endings as \n but row line endings as whatever the file had, while text editors like jEdit translate all line endings to whatever the autodetected line ending is. (This creates spurious line ending diffs when a map spreadsheet containing multiline cells is edited in a text editor.)

8175 03/25/2013 08:45 PM Aaron Marcuse-Kubitza

Added bin/fix_line_endings to standardize \r\n line endings to \n

8174 03/25/2013 08:12 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/import: Renamed COND.oldgrowth to VegCore name oldGrowth

8173 03/25/2013 07:52 PM Aaron Marcuse-Kubitza

inputs/FIA/*/map.csv: Ensured that joined columns are globally unique, so they don't map to an ambiguous VegCore term in the future

8172 03/25/2013 07:38 PM Aaron Marcuse-Kubitza

inputs/FIA/*/map.csv: Mapped terms to VegCore

8171 03/25/2013 07:22 PM Aaron Marcuse-Kubitza

schemas/functions.sql: col_type(): Include column name in error message

8170 03/25/2013 06:57 PM Aaron Marcuse-Kubitza

inputs/FIA/*/import: Updated column names to match map.csv

8169 03/25/2013 06:47 PM Aaron Marcuse-Kubitza

schemas/functions.sql: col_type(): Raise undefined_column exception if column does not exist, instead of silently returning NULL

8168 03/25/2013 06:34 PM Aaron Marcuse-Kubitza

inputs/FIA/import: Abort if any invoked script encounters an error

8167 03/25/2013 05:44 PM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: Updated for current progress

8166 03/25/2013 04:55 PM Aaron Marcuse-Kubitza

inputs/FIA/*/map.csv: Removed no longer needed leading . from joined fields (globally-unique terms), because functions.to_global_col_names() is not used anymore

8165 03/25/2013 04:46 PM Aaron Marcuse-Kubitza

Added inputs/FIA/occurrence_all/, which combines all the core tables in a denormalized view. Note that it is not necessary to materialize this view into a (large) denormalized table, because the unique indexes and left/right joins allow the rows to be denormalized on the fly.

8164 03/25/2013 04:36 PM Aaron Marcuse-Kubitza

inputs/FIA/*/import: Use map_table to set column names based on the contents of map.csv, instead of using functions.to_global_col_names() and functions.rename_if_exists(). Added map.csv for all tables.

8163 03/25/2013 03:19 PM Aaron Marcuse-Kubitza

inputs/FIA/: Changed postprocess.sql scripts to import scripts that can be run directly. Added top-level inputs/FIA/import to run all of them together.

8162 03/25/2013 03:05 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/postprocess.sql: Removed trailing whitespace

8161 03/25/2013 02:25 PM Aaron Marcuse-Kubitza

Added lib/import.sh, for use by new, simpler import scripts used by FIA. Note that for now, input.Makefile is still used to create map.csv.

8160 03/22/2013 11:13 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: Moved postprocess.sql from $(exportHeader) to %/install because that is not part of the $(exportHeader) functionality. Added %/header.csv and use it in $(exportHeader).

8159 03/22/2013 11:05 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: $(catSrcs): Fixed bug where need to use $(nonHeaderSrcs) instead of $(srcs) to exclude header.csv

8158 03/22/2013 08:07 PM Aaron Marcuse-Kubitza

schemas/functions.sql: map: Added additional columns that are present in the standard map spreadsheet format (filter, notes). These columns are necessary to make COPY FROM work, because it requires the # of columns to be the same in the input data and the output table.

8157 03/22/2013 07:39 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: Moved $(cleanup) from $(exportHeader) to %/install because this is not part of exportHeader's functionality

8156 03/22/2013 07:29 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: $(mkSrcMap): Use header.csv instead of the header of the CSVs, so that the column list in the map spreadsheet matches the actual DB table

8155 03/22/2013 07:18 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: %.sql/run: Change to the directory the file is located in, so that includes (\i) are relative to the file, rather than relative to whatever happens to be the current directory

8154 03/22/2013 07:15 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: %/install: Always generate a header.csv, even for CSV inputs with their own header. This will include the actual column names in the staging table, which may differ from their names in the CSVs (e.g. the addition of row_num). Note that header.csv is not included in the CSVs list itself, and will not override the header or dialect in them.

8153 03/22/2013 06:09 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added set_col_names()

8152 03/22/2013 06:08 PM Aaron Marcuse-Kubitza

schemas/functions.sql: rename_if_exists(): Also ignore duplicate_column exceptions, which are generated when a column is renamed to itself (as well as when two columns are renamed to the same place)

8151 03/22/2013 06:02 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added col_names(regclass), which unlike col_names(regtype) returns names in the order they are in the table

8150 03/22/2013 04:56 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added map_values()

8149 03/22/2013 02:50 PM Aaron Marcuse-Kubitza

schemas/functions.sql: map_get(): Fixed bug where can't use STRICT in EXECUTE INTO because there will sometimes be no match, causing a "query returned no rows" error

8148 03/22/2013 02:33 PM Aaron Marcuse-Kubitza

schemas/functions.sql: rename_cols(): Support any renames type with an -> operator

8147 03/22/2013 02:27 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added operator ->(regclass, text)

8146 03/22/2013 01:49 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added map_get()

8145 03/22/2013 01:38 PM Aaron Marcuse-Kubitza

schemas/functions.sql: table2hstore(): Made it STABLE instead of IMMUTABLE because the input table is not constant

8144 03/22/2013 01:36 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added table2hstore()

8143 03/22/2013 01:34 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added reset_map_table()

8142 03/22/2013 12:51 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added truncate()

8141 03/22/2013 12:28 PM Aaron Marcuse-Kubitza

schemas/functions.sql: mk_map_table(): Use the sql language instead of plpgsql because EXECUTE is not used directly, so plpgsql is not actually needed

8140 03/22/2013 12:25 PM Aaron Marcuse-Kubitza

schemas/functions.sql: mk_map_table(): Store map table schema in separate `map` table and extend it using LIKE, for easier maintainability of the map schema

8139 03/21/2013 08:27 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added mk_map_table()

8138 03/21/2013 07:54 AM Aaron Marcuse-Kubitza

schemas/functions.sql: ensure_prefix(): Made it IMMUTABLE instead of STABLE

8137 03/21/2013 07:37 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added rename_cols()

8136 03/21/2013 06:01 AM Aaron Marcuse-Kubitza

inputs/FIA/*/postprocess.sql: Avoid using :table, :table_str so that the commands in the script can also be run by pasting them into pgAdmin

8135 03/21/2013 02:13 AM Aaron Marcuse-Kubitza

README.TXT: Full database import: Manual steps to run TNRS/remake analytical DB: Added `export version=<version>` to ensure that the import is run into the correct schema. Since these instructions are for running commands separately from the rest of the import, it's important to first ensure that the import environment is set up properly.

8134 03/21/2013 02:08 AM Aaron Marcuse-Kubitza

schemas/vegbien.ERD.mwb: Added taxon_trait to ERD

8133 03/21/2013 02:04 AM Aaron Marcuse-Kubitza

schemas/vegbien.ERD.mwb: Regenerated exports

8132 03/21/2013 01:58 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed unused analytical_aggregate table, because analytical_stem provides much more detailed, higher-quality data, both in terms of the number or of rows and the number of columns. analytical_aggregate has also long been out of sync with the analytical DB schema, and it doesn't make sense to spend processing time in make_analytical_db to perform the DISTINCT ON if the table isn't being used. We may revisit analytical_aggregate later once we have ID fields for each entity in the DISTINCT ON and can avoid DISTINCTing on all analytical_aggregate columns.

8131 03/21/2013 01:53 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed unused analytical_aggregate table, because analytical_stem provides much more detailed, higher-quality data, both in terms of the number or of rows and the number of columns. analytical_aggregate has also long been out of sync with the analytical DB schema, and it doesn't make sense to spend processing time in make_analytical_db to perform the DISTINCT ON if the table isn't being used. We may revisit analytical_aggregate later once we have ID fields for each entity in the DISTINCT ON and can avoid DISTINCTing on all analytical_aggregate columns.

8130 03/21/2013 01:27 AM Aaron Marcuse-Kubitza

inputs/FIA/*/postprocess.sql: Added index on *.CN (autogen IDs)

8129 03/21/2013 01:25 AM Aaron Marcuse-Kubitza

README.TXT: Full database import: Added steps to use `screen` to allow recovering from a closed terminal window

8128 03/21/2013 01:08 AM Aaron Marcuse-Kubitza

inputs/FIA/TREE/postprocess.sql: TREE.unique index: Renamed to TREE.ID because this is on an autogenerated pkey rather than on domain values (for which a set of unique columns has not yet been found and may not exist)

8127 03/21/2013 01:03 AM Aaron Marcuse-Kubitza

inputs/FIA/REF_SPECIES/postprocess.sql: Matched SPECIES_SYMBOL to .SYMBOL. Added .SYMBOL_TYPE for use in joining to REF_PLANT_DICTIONARY.

8126 03/21/2013 12:41 AM Aaron Marcuse-Kubitza

Added inputs/FIA/REF_UNIT/postprocess.sql

8125 03/21/2013 12:36 AM Aaron Marcuse-Kubitza

Added inputs/FIA/REF_RESEARCH_STATION/postprocess.sql

8124 03/21/2013 12:17 AM Aaron Marcuse-Kubitza

Added inputs/FIA/COUNTY/postprocess.sql

8123 03/21/2013 12:02 AM Aaron Marcuse-Kubitza

Added inputs/FIA/REF_PLANT_DICTIONARY/postprocess.sql

8122 03/20/2013 05:16 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/postprocess.sql: Matched COND.HABTYPCD1, COND.HABTYPCD1_PUB_CD to REF_HABTYP_DESCRIPTION

8121 03/20/2013 05:07 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: Staging tables installation: $(exportHeader): Fixed bug where need to run postprocess.sql before exporting the header, because it can change the column names