Revision 4649
Added by Aaron Marcuse-Kubitza over 12 years ago
| bin/canon | | |
|---|---|---|
| 1 | 1 | #!/usr/bin/env python |
| 2 | 2 | # Canonicalizes a spreadsheet column to a vocabulary. |
| | 3 | # The column header is also canonicalized. CSVs without a header are supported. |
| 3 | 4 | # Unrecognized names are left untouched, permitting successive runs on different |
| 4 | 5 | # vocabularies. |
| 5 | 6 | # Case- and punctuation-insensitive. |
| ... | ... | |
| 20 | 21 | dict_ = {} |
| 21 | 22 | stream = open(vocab_path, 'rb') |
| 22 | 23 | reader = csv.reader(stream) |
| 23 | | reader.next() # skip header |
| 24 | 24 | for row in reader: dict_[simplify(row[0])] = row[0] |
| 25 | 25 | stream.close() |
| 26 | 26 | |
| 27 | 27 | # Canonicalize input |
| 28 | 28 | reader = csv.reader(sys.stdin) |
| 29 | 29 | writer = csv.writer(sys.stdout) |
| 30 | | writer.writerow(reader.next()) # pass through header |
| 31 | 30 | for row in reader: |
| 32 | 31 | term = simplify(row[col_num]) |
| 33 | 32 | try: row[col_num] = dict_[term] |
canon: Canonicalize the column header instead of passing it through, in order to properly support CSVs without a header
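
For illustration, the behavior after this change can be sketched in Python 3 roughly as follows. This is a minimal sketch, not the script itself: the body of `simplify()` is not shown in the diff, so the version here is a hypothetical stand-in, and `load_vocab`, `canonicalize`, and the sample data are names and values invented for the example. The point it shows is that neither the vocabulary file nor the input has its first row treated specially, so a header is canonicalized like any other cell and a headerless CSV loses no data row.

```python
import csv
import io
import re


def simplify(term):
    # Hypothetical stand-in for the script's simplify(), whose body is not
    # shown in the diff: lowercase and drop everything but letters and digits.
    return re.sub(r'[^a-z0-9]', '', term.lower())


def load_vocab(stream):
    # Every row is a vocabulary entry, including the vocabulary's own header,
    # which is what lets a matching input header be canonicalized.
    return {simplify(row[0]): row[0] for row in csv.reader(stream) if row}


def canonicalize(in_stream, out_stream, vocab, col_num):
    writer = csv.writer(out_stream, lineterminator='\n')
    for row in csv.reader(in_stream):
        # No special case for the first row. Unrecognized terms are left
        # untouched so a later run can try a different vocabulary.
        if len(row) > col_num:
            row[col_num] = vocab.get(simplify(row[col_num]), row[col_num])
        writer.writerow(row)


# Made-up sample data for the sketch.
vocab = load_vocab(io.StringIO('Scientific Name\nQuercus alba\n'))
out = io.StringIO()
canonicalize(
    io.StringIO('scientific_name,count\nQUERCUS-ALBA,3\nunknown sp.,1\n'),
    out, vocab, 0)
result = out.getvalue()
```

With this sample, the header `scientific_name` and the value `QUERCUS-ALBA` are both mapped to their vocabulary spellings, while `unknown sp.` passes through unchanged.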