Revision 4648

filter_out_ci: Filter header instead of passing it through, in order to properly support CSVs without a header, such as the unmapped_terms.csv and new_terms.csv files. For CSVs with a header, the header of the vocabulary should be removed before passing it to filter_out_ci.
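
As a minimal sketch of the preprocessing step described above, a vocabulary that does carry a header could have its first row stripped before being handed to filter_out_ci. The helper name `strip_header` and the sample data are hypothetical, and the sketch uses Python 3 rather than the Python 2 idioms of the script itself.

```python
import csv
import io

def strip_header(in_stream, out_stream):
    """Copy CSV rows from in_stream to out_stream, dropping the first row."""
    reader = csv.reader(in_stream)
    writer = csv.writer(out_stream, lineterminator='\n')
    next(reader, None)  # discard the header row; no error on empty input
    for row in reader:
        writer.writerow(row)

# Hypothetical sample vocabulary with a header row
src = io.StringIO("term\nQuercus alba\nAcer rubrum\n")
dst = io.StringIO()
strip_header(src, dst)
print(dst.getvalue(), end="")
```

The headerless result can then be used as the vocabulary file, so that the header text itself is not treated as a vocabulary term.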

filter_out_ci
@@ -1,5 +1,6 @@
 #!/usr/bin/env python
 # Finds spreadsheet rows where a column is not in a vocabulary.
+# The vocabulary should not have a header. CSVs without a header are supported.
 # Case- and punctuation-insensitive.
 
 import csv
@@ -18,14 +19,12 @@
     vocab = set()
     stream = open(vocab_path, 'rb')
     reader = csv.reader(stream)
-    reader.next() # skip header
     for row in reader: vocab.add(simplify(row[0]))
     stream.close()
     
     # Filter input
     reader = csv.reader(sys.stdin)
     writer = csv.writer(sys.stdout)
-    writer.writerow(reader.next()) # pass through header
     for row in reader:
         term = simplify(row[col_num])
         if term not in vocab: writer.writerow(row)
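
The behavior after this revision can be sketched in Python 3 as follows. This is an illustration under stated assumptions, not the script itself: `simplify` is not shown in the diff, so the version here (lowercasing and dropping non-alphanumeric characters) is only a guess at what "case- and punctuation-insensitive" means, and `filter_out` and the sample data are hypothetical.

```python
import csv
import io
import re

def simplify(term):
    # Assumed normalization: lowercase and drop non-alphanumeric characters.
    # The real simplify() is not shown in the diff.
    return re.sub(r'[^a-z0-9]', '', term.lower())

def filter_out(vocab_stream, in_stream, out_stream, col_num):
    """Write the input rows whose col_num value is not in the vocabulary.

    Every row of both CSVs is treated as data: no header is skipped
    in the vocabulary and none is passed through from the input.
    """
    # Blank lines are skipped here as a convenience of this sketch.
    vocab = set(simplify(row[0]) for row in csv.reader(vocab_stream) if row)
    writer = csv.writer(out_stream, lineterminator='\n')
    for row in csv.reader(in_stream):
        if simplify(row[col_num]) not in vocab:
            writer.writerow(row)

# Hypothetical headerless sample data
vocab_csv = io.StringIO("Acer rubrum\nQuercus alba\n")
input_csv = io.StringIO("1,acer rubrum.\n2,Pinus strobus\n3,QUERCUS-ALBA\n")
result = io.StringIO()
filter_out(vocab_csv, input_csv, result, col_num=1)
print(result.getvalue(), end="")  # only row 2 is not in the vocabulary
```

Because a header in the input is now filtered like any other row, it appears in the output only if its value in the chosen column is absent from the vocabulary.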
