/bin/filter_out_ci - Diff - BIEN 3 - NCEAS Projects

Revision 4648

Added by Aaron Marcuse-Kubitza almost 12 years ago

filter_out_ci: Filter header instead of passing it through, in order to properly support CSVs without a header, such as the unmapped_terms.csv and new_terms.csv files. For CSVs with a header, the header of the vocabulary should be removed before passing it to filter_out_ci.

     #!/usr/bin/env python
     # Finds spreadsheet rows where a column is not in a vocabulary.
+    # The vocabulary should not have a header. CSVs without a header are supported.
     # Case- and punctuation-insensitive.
     import csv
-...
         vocab = set()
         stream = open(vocab_path, 'rb')
         reader = csv.reader(stream)
-        reader.next() # skip header
         for row in reader: vocab.add(simplify(row[0]))
         stream.close()
         # Filter input
         reader = csv.reader(sys.stdin)
         writer = csv.writer(sys.stdout)
-        writer.writerow(reader.next()) # pass through header
         for row in reader:
             term = simplify(row[col_num])
             if term not in vocab: writer.writerow(row)
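
The behavior after this revision can be sketched roughly as follows. This is an illustrative Python 3 rewrite, not the script itself: the real `simplify` is not shown in the diff, so the version here is a hypothetical stand-in (lowercase, strip non-alphanumerics), and the `filter_out_ci` function and in-memory sample data are assumptions made only for the example.

```python
import csv
import io
import re


def simplify(term):
    # Hypothetical stand-in for the script's simplify(), which the diff elides:
    # lowercase the term and drop every non-alphanumeric character.
    return re.sub(r'[\W_]+', '', term.lower())


def filter_out_ci(vocab_rows, input_rows, col_num):
    """Yield input rows whose col_num value is not in the vocabulary.

    Neither stream gets header treatment: every vocabulary row is a term,
    and every input row (including a header, if present) is filtered.
    """
    vocab = {simplify(row[0]) for row in vocab_rows}
    for row in input_rows:
        if simplify(row[col_num]) not in vocab:
            yield row


# Headerless vocabulary and headerless input, as with unmapped_terms.csv.
vocab_csv = io.StringIO("Family\nGenus\n")
input_csv = io.StringIO("family,x\nscientific_name,y\nGE-NUS,z\n")

kept = list(filter_out_ci(csv.reader(vocab_csv), csv.reader(input_csv), 0))
for row in kept:
    print(",".join(row))
```

Because the first input row is no longer passed through unconditionally, a header row is kept only if its value in the chosen column is absent from the vocabulary.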
