Diff - BIEN 3 - NCEAS Projects

Revision 4649

canon: Canonicalize the column header instead of passing it through, in order to properly support CSVs without a header

     #!/usr/bin/env python
     # Canonicalizes a spreadsheet column to a vocabulary.
     # The column header is also canonicalized. CSVs without a header are supported.
     # Unrecognized names are left untouched, permitting successive runs on different
     # vocabularies.
     # Case- and punctuation-insensitive.
-...
         dict_ = {}
         stream = open(vocab_path, 'rb')
         reader = csv.reader(stream)
         reader.next() # skip header
         for row in reader: dict_[simplify(row[0])] = row[0]
         stream.close()
         # Canonicalize input
         reader = csv.reader(sys.stdin)
         writer = csv.writer(sys.stdout)
-        writer.writerow(reader.next()) # pass through header
         for row in reader:
             term = simplify(row[col_num])
             try: row[col_num] = dict_[term]
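
For readers who want to try the behavior described in the commit message, here is a minimal Python 3 sketch. It is not the project's code. The `simplify()` body is elided in this diff, so the version below is a hypothetical stand-in that lowercases and strips non-alphanumerics to match the "case- and punctuation-insensitive" comment. The function names `load_vocab` and `canonicalize` are likewise illustrative assumptions. The point it shows is that the first input row gets no special treatment, so a header is looked up like any other cell and header-less CSVs work too.

```python
import csv
import io
import re


def simplify(term):
    # Hypothetical stand-in for the elided simplify(): lowercase the term
    # and drop everything that is not an ASCII letter or digit.
    return re.sub(r'[^a-z0-9]', '', term.lower())


def load_vocab(vocab_stream):
    # Map simplified term -> canonical spelling, using the first column
    # of the vocabulary CSV. The vocabulary's own header row is skipped.
    reader = csv.reader(vocab_stream)
    next(reader, None)
    return {simplify(row[0]): row[0] for row in reader if row}


def canonicalize(in_stream, out_stream, vocab, col_num):
    writer = csv.writer(out_stream, lineterminator='\n')
    for row in csv.reader(in_stream):
        # Every row, including a possible header, goes through the lookup.
        if col_num < len(row):
            canonical = vocab.get(simplify(row[col_num]))
            if canonical is not None:
                row[col_num] = canonical  # unrecognized names stay untouched
        writer.writerow(row)


# Usage with in-memory streams instead of a vocabulary file and stdin/stdout
vocab = load_vocab(io.StringIO("term\nScientific Name\nFamily\n"))
out = io.StringIO()
canonicalize(io.StringIO("scientific_name,n\nFAMILY,2\nunknown,3\n"),
             out, vocab, 0)
result = out.getvalue()
```

Because unrecognized cells are written back unchanged, the output of one run can be fed into another run with a different vocabulary, as the script's header comment describes.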
