Project

General

Profile

Statistics
| Revision:

# Date Author Comment
8163 03/25/2013 03:19 PM Aaron Marcuse-Kubitza

inputs/FIA/: Changed postprocess.sql scripts to import scripts that can be run directly. Added top-level inputs/FIA/import to run all of them together.

8162 03/25/2013 03:05 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/postprocess.sql: Removed trailing whitespace

8161 03/25/2013 02:25 PM Aaron Marcuse-Kubitza

Added lib/import.sh, for use by new, simpler import scripts used by FIA. Note that for now, input.Makefile is still used to create map.csv.

8160 03/22/2013 11:13 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: Moved postprocess.sql from $(exportHeader) to %/install because that is not part of the $(exportHeader) functionality. Added %/header.csv and use it in $(exportHeader).

8159 03/22/2013 11:05 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: $(catSrcs): Fixed bug where need to use $(nonHeaderSrcs) instead of $(srcs) to exclude header.csv

8158 03/22/2013 08:07 PM Aaron Marcuse-Kubitza

schemas/functions.sql: map: Added additional columns that are present in the standard map spreadsheet format (filter, notes). These columns are necessary to make COPY FROM work, because it requires the # of columns to be the same in the input data and the output table.

8157 03/22/2013 07:39 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: Moved $(cleanup) from $(exportHeader) to %/install because this is not part of exportHeader's functionality

8156 03/22/2013 07:29 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: $(mkSrcMap): Use header.csv instead of the header of the CSVs, so that the column list in the map spreadsheet matches the actual DB table

8155 03/22/2013 07:18 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: %.sql/run: Change to the directory the file is located in, so that includes (\i) are relative to the file, rather than relative to whatever happens to be the current directory

8154 03/22/2013 07:15 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: %/install: Always generate a header.csv, even for CSV inputs with their own header. This will include the actual column names in the staging table, which may differ from their names in the CSVs (e.g. the addition of row_num). Note that header.csv is not included in the CSVs list itself, and will not override the header or dialect in them.

8153 03/22/2013 06:09 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added set_col_names()

8152 03/22/2013 06:08 PM Aaron Marcuse-Kubitza

schemas/functions.sql: rename_if_exists(): Also ignore duplicate_column exceptions, which are generated when a column is renamed to itself (as well as when two columns are renamed to the same place)

8151 03/22/2013 06:02 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added col_names(regclass), which unlike col_names(regtype) returns names in the order they are in the table

8150 03/22/2013 04:56 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added map_values()

8149 03/22/2013 02:50 PM Aaron Marcuse-Kubitza

schemas/functions.sql: map_get(): Fixed bug where can't use STRICT in EXECUTE INTO because there will sometimes be no match, causing a "query returned no rows" error

8148 03/22/2013 02:33 PM Aaron Marcuse-Kubitza

schemas/functions.sql: rename_cols(): Support any renames type with an -> operator

8147 03/22/2013 02:27 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added operator ->(regclass, text)

8146 03/22/2013 01:49 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added map_get()

8145 03/22/2013 01:38 PM Aaron Marcuse-Kubitza

schemas/functions.sql: table2hstore(): Made it STABLE instead of IMMUTABLE because the input table is not constant

8144 03/22/2013 01:36 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added table2hstore()

8143 03/22/2013 01:34 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added reset_map_table()

8142 03/22/2013 12:51 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added truncate()

8141 03/22/2013 12:28 PM Aaron Marcuse-Kubitza

schemas/functions.sql: mk_map_table(): Use the sql language instead of plpgsql because EXECUTE is not used directly, so plpgsql is not actually needed

8140 03/22/2013 12:25 PM Aaron Marcuse-Kubitza

schemas/functions.sql: mk_map_table(): Store map table schema in separate `map` table and extend it using LIKE, for easier maintainability of the map schema

8139 03/21/2013 08:27 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added mk_map_table()

8138 03/21/2013 07:54 AM Aaron Marcuse-Kubitza

schemas/functions.sql: ensure_prefix(): Made it IMMUTABLE instead of STABLE

8137 03/21/2013 07:37 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added rename_cols()

8136 03/21/2013 06:01 AM Aaron Marcuse-Kubitza

inputs/FIA/*/postprocess.sql: Avoid using :table, :table_str so that the commands in the script can also be run by pasting them into pgAdmin

8135 03/21/2013 02:13 AM Aaron Marcuse-Kubitza

README.TXT: Full database import: Manual steps to run TNRS/remake analytical DB: Added `export version=<version>` to ensure that the import is run into the correct schema. Since these instructions are for running commands separately from the rest of the import, it's important to first ensure that the import environment is set up properly.

8134 03/21/2013 02:08 AM Aaron Marcuse-Kubitza

schemas/vegbien.ERD.mwb: Added taxon_trait to ERD

8133 03/21/2013 02:04 AM Aaron Marcuse-Kubitza

schemas/vegbien.ERD.mwb: Regenerated exports

8132 03/21/2013 01:58 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed unused analytical_aggregate table, because analytical_stem provides much more detailed, higher-quality data, both in terms of the number or of rows and the number of columns. analytical_aggregate has also long been out of sync with the analytical DB schema, and it doesn't make sense to spend processing time in make_analytical_db to perform the DISTINCT ON if the table isn't being used. We may revisit analytical_aggregate later once we have ID fields for each entity in the DISTINCT ON and can avoid DISTINCTing on all analytical_aggregate columns.

8131 03/21/2013 01:53 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Removed unused analytical_aggregate table, because analytical_stem provides much more detailed, higher-quality data, both in terms of the number or of rows and the number of columns. analytical_aggregate has also long been out of sync with the analytical DB schema, and it doesn't make sense to spend processing time in make_analytical_db to perform the DISTINCT ON if the table isn't being used. We may revisit analytical_aggregate later once we have ID fields for each entity in the DISTINCT ON and can avoid DISTINCTing on all analytical_aggregate columns.

8130 03/21/2013 01:27 AM Aaron Marcuse-Kubitza

inputs/FIA/*/postprocess.sql: Added index on *.CN (autogen IDs)

8129 03/21/2013 01:25 AM Aaron Marcuse-Kubitza

README.TXT: Full database import: Added steps to use `screen` to allow recovering from a closed terminal window

8128 03/21/2013 01:08 AM Aaron Marcuse-Kubitza

inputs/FIA/TREE/postprocess.sql: TREE.unique index: Renamed to TREE.ID because this is on an autogenerated pkey rather than on domain values (for which a set of unique columns has not yet been found and may not exist)

8127 03/21/2013 01:03 AM Aaron Marcuse-Kubitza

inputs/FIA/REF_SPECIES/postprocess.sql: Matched SPECIES_SYMBOL to .SYMBOL. Added .SYMBOL_TYPE for use in joining to REF_PLANT_DICTIONARY.

8126 03/21/2013 12:41 AM Aaron Marcuse-Kubitza

Added inputs/FIA/REF_UNIT/postprocess.sql

8125 03/21/2013 12:36 AM Aaron Marcuse-Kubitza

Added inputs/FIA/REF_RESEARCH_STATION/postprocess.sql

8124 03/21/2013 12:17 AM Aaron Marcuse-Kubitza

Added inputs/FIA/COUNTY/postprocess.sql

8123 03/21/2013 12:02 AM Aaron Marcuse-Kubitza

Added inputs/FIA/REF_PLANT_DICTIONARY/postprocess.sql

8122 03/20/2013 05:16 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/postprocess.sql: Matched COND.HABTYPCD1, COND.HABTYPCD1_PUB_CD to REF_HABTYP_DESCRIPTION

8121 03/20/2013 05:07 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: Staging tables installation: $(exportHeader): Fixed bug where need to run postprocess.sql before exporting the header, because it can change the column names

8120 03/20/2013 05:02 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: Staging tables installation: $(exportHeader): export the header before running $(cleanup), because the header is not affected by the data cleanup operations and thus can be generated right away, to allow mapping while the cleanup operations run

8119 03/20/2013 04:30 PM Aaron Marcuse-Kubitza

inputs/FIA/REF_HABTYP_DESCRIPTION/postprocess.sql: Prepare columns for joining with COND

8118 03/20/2013 03:23 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: Staging tables installation: $(exportHeader): Fixed bug where need to use psql_script_vegbien instead of the psql_verbose_vegbien used by $(psqlAsBien), to avoid echoing commands as part of the exported header

8117 03/20/2013 03:10 PM Aaron Marcuse-Kubitza

Added planning/workflow/(de)normalized_import.mappings.png

8116 03/20/2013 03:04 PM Aaron Marcuse-Kubitza

Added planning/workflow/denormalized_import.png, normalized_import.png

8115 03/20/2013 10:37 AM Aaron Marcuse-Kubitza

web/main/IH/: Added lowercase alias

8114 03/20/2013 10:32 AM Aaron Marcuse-Kubitza

Added web/main/IH/

8113 03/20/2013 10:12 AM Aaron Marcuse-Kubitza

inputs/input.Makefile: Staging tables installation: Added postprocess target, which runs all the postprocess.sql files

8112 03/20/2013 09:34 AM Aaron Marcuse-Kubitza

inputs/FIA/REF_SPECIES/postprocess.sql: Cast ID column to integer

8111 03/20/2013 08:52 AM Aaron Marcuse-Kubitza

inputs/FIA/*/postprocess.sql: Cluster tables by their *.unique index for faster joins

8110 03/20/2013 08:51 AM Aaron Marcuse-Kubitza

inputs/FIA/*/postprocess.sql: Cast ID columns to integer using new functions.set_col_types()

8109 03/20/2013 08:49 AM Aaron Marcuse-Kubitza

bin/psql_verbose_vegbien: Run with client_min_messages = NOTICE to display notices for debugging. This is supposed to be the default, but apparently isn't.

8108 03/20/2013 08:47 AM Aaron Marcuse-Kubitza

inputs/input.Makefile: BIEN commands: $(psqlAsBien): Use psql_verbose_vegbien instead of psql_script_vegbien so that timings and notices are displayed, which is useful for profiling and debugging

8107 03/20/2013 08:32 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added col_cast and set_col_types()

8106 03/20/2013 07:45 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added col_ref, col_type()

8105 03/20/2013 06:51 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added cluster_once()

8104 03/20/2013 06:36 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added cluster_index()

8103 03/20/2013 05:55 AM Aaron Marcuse-Kubitza

schemas/functions.sql: create_if_not_exists(): Also handle duplicate_column exceptions

8102 03/20/2013 05:54 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added rename_if_exists()

8101 03/20/2013 05:48 AM Aaron Marcuse-Kubitza

inputs/FIA/COND/postprocess.sql: Renamed oldgrowth to COND.oldgrowth so it wouldn't be renamed by to_global_col_names()

8100 03/20/2013 04:28 AM Aaron Marcuse-Kubitza

inputs/FIA/COND/postprocess.sql: Added oldgrowth column as part of the postprocessing instead of as part of the view that left joins the core tables together. This avoids needing to regenerate the oldgrowth field whenever the view is queried or materialized.

8099 03/20/2013 04:01 AM Aaron Marcuse-Kubitza

inputs/FIA/TREE/postprocess.sql: Added index on columns that join to parent tables

8098 03/20/2013 03:00 AM Aaron Marcuse-Kubitza

inputs/FIA/*/postprocess.sql: Removed table prefix from globally-unique columns that should be joined on

8097 03/20/2013 02:25 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Marked STRICT functions as such

8096 03/20/2013 02:22 AM Aaron Marcuse-Kubitza

schemas/functions.sql: col_global_names(): Treat any column name that contains . as already being globally unique, and don't prepend the table name. This allows renaming the table columns after running col_global_names(), without causing the table name to be re-prepended the next time col_global_names() is run.

8095 03/20/2013 02:09 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added contains()

8094 03/20/2013 02:07 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added create_if_not_exists()

8093 03/20/2013 01:28 AM Aaron Marcuse-Kubitza

inputs/FIA/*/postprocess.sql: Use functions.to_global_col_names() to ensure that all column names are globally unique. This makes it easy to join the tables together without worrying about column name collisions.

8092 03/20/2013 01:15 AM Aaron Marcuse-Kubitza

inputs/FIA/*/postprocess.sql: Use new functions.create_if_not_exists() to allow re-running postprocess.sql idempotently

8091 03/19/2013 11:48 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: Staging tables installation: %/install: Use new %.sql/run to run postprocess.sql

8090 03/19/2013 11:47 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: Staging tables installation: Added %.sql/run to run postprocess.sql, etc. separately from the install targets they are a part of

8089 03/19/2013 11:47 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: Staging tables installation: Added %.sql/run to run postprocess.sql, etc. separately from the install targets they are a part of

8088 03/19/2013 10:43 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added to_global_col_names()

8087 03/19/2013 10:22 PM Aaron Marcuse-Kubitza

schemas/functions.sql: col_global_names(): Use new functions.ensure_prefix() to only add the table name prefix if it doesn't already exist. This makes the function idempotent.

8086 03/19/2013 10:19 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added ensure_prefix()

8085 03/19/2013 10:17 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added has_prefix()

8084 03/19/2013 10:09 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added col_global_names()

8083 03/19/2013 09:59 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added name(regtype)

8082 03/19/2013 09:43 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added col_names()

8081 03/19/2013 09:27 PM Aaron Marcuse-Kubitza

root Makefile: Installation: Fixed bug where need to run schemas/public/install separately because schemas/install installs only the util schemas

8080 03/19/2013 09:26 PM Aaron Marcuse-Kubitza

root Makefile: Installation: install util schemas (temp functions py_functions) before inputs, so that inputs can use util functions in their postprocess.sql or create.sql scripts. (However, they must not use util functions in views or index functions, because these would be cascadingly deleted whenever the util schemas are reinstalled before an import.)

8079 03/19/2013 08:07 PM Aaron Marcuse-Kubitza

README.TXT: Single datasource import: Added by_col=1 to all commands

8078 03/19/2013 02:28 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: locationRemarks: Remapped to locationnarrative because location.notespublic is a boolean field

8077 03/19/2013 02:05 AM Aaron Marcuse-Kubitza

lib/sql_io.py: mk_errors_table(): Create a unique index on the MD5 of the value and error instead of on the values directly, because some strings are too long to index (e.g. row 2537268 of MO.Specimen causes an error "index row size 3032 exceeds maximum 2712 for index [...] Values larger than 1/3 of a buffer page cannot be indexed")

8076 03/19/2013 12:49 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times

8075 03/16/2013 02:16 PM Aaron Marcuse-Kubitza

bin/map: No mappings warning: Added explanation that this could also be due to no column name matches, and hint to check if you are importing the correct input table

8074 03/16/2013 01:45 PM Aaron Marcuse-Kubitza

inputs/MO/: Renamed Specimen.2/ -> now available Specimen/

8073 03/16/2013 01:42 PM Aaron Marcuse-Kubitza

inputs/MO/: Removed old import in Specimen/

8072 03/16/2013 01:33 PM Aaron Marcuse-Kubitza

Refreshed MO

8071 03/16/2013 12:44 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader.next(): Fixed bug where empty line needs to be separately returned as [], because csv.reader would interpret it as EOF since the line ending has already been removed

8070 03/16/2013 12:25 PM Aaron Marcuse-Kubitza

csvs.py: sniff(): TSVs: Turn off quoting because TSVs use \-escapes instead of quotes to escape delimeters, newlines, etc.

8069 03/16/2013 11:49 AM Aaron Marcuse-Kubitza

csvs.py: InputRewriter.readline(): Surround function in a try block that prints all exceptions, so that debugging information is available if an error occurs when this stream is used as input for psycopg's copy_expert() (COPY FROM)

8068 03/16/2013 06:56 AM Aaron Marcuse-Kubitza

Populated inputs/MO/import_order.txt

8067 03/16/2013 06:46 AM Aaron Marcuse-Kubitza

Refreshed SALVIAS

8066 03/16/2013 06:33 AM Aaron Marcuse-Kubitza

Added web/main/CTFS/

8065 03/16/2013 06:21 AM Aaron Marcuse-Kubitza

inputs/SALVIAS/: Regenerated salvias_*.schema.sql from the MySQL version, to take advantage of my2pg improvements. The placeholder *_index columns which take the place of MySQL's inline index definitions have now been replaced by no-op CHECK constraints, so that there are no longer lots of dummy *_index columns in the map spreadsheets.

8064 03/16/2013 05:52 AM Aaron Marcuse-Kubitza

Added web/main/Redmine/ alias to VegBIEN/Redmine/