Revision 4649

canon: Canonicalize the column header instead of passing it through, in order to properly support CSVs without a header

bin/canon
@@ -1,5 +1,6 @@
 #!/usr/bin/env python
 # Canonicalizes a spreadsheet column to a vocabulary.
+# The column header is also canonicalized. CSVs without a header are supported.
 # Unrecognized names are left untouched, permitting successive runs on different
 # vocabularies.
 # Case- and punctuation-insensitive.
@@ -20,14 +21,12 @@
     dict_ = {}
     stream = open(vocab_path, 'rb')
     reader = csv.reader(stream)
-    reader.next() # skip header
     for row in reader: dict_[simplify(row[0])] = row[0]
     stream.close()
     
     # Canonicalize input
     reader = csv.reader(sys.stdin)
     writer = csv.writer(sys.stdout)
-    writer.writerow(reader.next()) # pass through header
     for row in reader:
         term = simplify(row[col_num])
         try: row[col_num] = dict_[term]
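
The effect of this change can be illustrated with a small standalone sketch. It is not the script itself: it is written for Python 3 (the original uses the Python 2 `reader.next()` idiom), `simplify()` here is a hypothetical stand-in for the real helper, which this diff does not show, and `load_vocab()` and `canonicalize()` are names made up for the example. The assumption is that, with both header special cases removed, a header cell is looked up in the vocabulary like any other value, and a vocabulary header row acts as one more canonical term.

```python
import csv
import io
import re


def simplify(term):
    # Hypothetical stand-in for the script's simplify(): lowercase the term
    # and drop every non-alphanumeric character, so that matching ignores
    # case and punctuation.
    return re.sub(r'[\W_]+', '', term.lower())


def load_vocab(stream):
    # No header skip: every non-blank row, including a header row if the
    # vocabulary has one, becomes a canonical term.
    return {simplify(row[0]): row[0] for row in csv.reader(stream) if row}


def canonicalize(in_stream, out_stream, vocab, col_num):
    writer = csv.writer(out_stream, lineterminator='\n')
    for row in csv.reader(in_stream):
        # The first row gets no special treatment, so a header cell is
        # canonicalized like any other value. Unrecognized values are
        # left untouched.
        row[col_num] = vocab.get(simplify(row[col_num]), row[col_num])
        writer.writerow(row)


if __name__ == '__main__':
    vocab = load_vocab(io.StringIO('Scientific Name\nQuercus alba\n'))
    out = io.StringIO()
    data = 'scientific_name,count\nquercus  ALBA,3\nunknown sp.,1\n'
    canonicalize(io.StringIO(data), out, vocab, 0)
    print(out.getvalue(), end='')
```

Under these assumptions, the header `scientific_name` comes out as `Scientific Name` because the vocabulary's own header row supplies that spelling, and a file with no header at all loses nothing, since no row is consumed before the loop.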
