*.dmp files are bcp-like dumps from the GenBank taxonomy database.

General information.
Field terminator is "\t|\t"
Row terminator is "\t|\n"
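
The two terminators above are enough to split a raw line into fields. The following is a minimal Python sketch, not part of the original dump tooling; the names split_row and read_dmp are hypothetical, and UTF-8 encoding is an assumption.

```python
from typing import Iterator, List

FIELD_TERMINATOR = "\t|\t"
ROW_TERMINATOR = "\t|\n"


def split_row(line: str) -> List[str]:
    """Split one raw .dmp line into its fields."""
    # Strip the row terminator first so the last field is not polluted by it.
    if line.endswith(ROW_TERMINATOR):
        line = line[: -len(ROW_TERMINATOR)]
    elif line.endswith("\t|"):
        # Tolerate a final line that lacks the trailing newline.
        line = line[:-2]
    return line.split(FIELD_TERMINATOR)


def read_dmp(path: str) -> Iterator[List[str]]:
    """Yield the fields of every row in a .dmp file (encoding is assumed)."""
    # newline="" keeps line endings untouched so the terminator check works.
    with open(path, encoding="utf-8", newline="") as handle:
        for line in handle:
            yield split_row(line)
```

Empty fields are preserved as empty strings, so the number of fields per row stays constant within a file.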

nodes.dmp file consists of taxonomy nodes. The description for each node includes the following
fields:
	tax_id -- node id in GenBank taxonomy database
	parent tax_id -- parent node id in GenBank taxonomy database
	rank -- rank of this node (superkingdom, kingdom, ...)
	embl code -- locus-name prefix; not unique
	division id -- see division.dmp file
	inherited div flag (1 or 0) -- 1 if node inherits division from parent
	genetic code id -- see gencode.dmp file
	inherited GC flag (1 or 0) -- 1 if node inherits genetic code from parent
	mitochondrial genetic code id -- see gencode.dmp file
	inherited MGC flag (1 or 0) -- 1 if node inherits mitochondrial gencode from parent
	GenBank hidden flag (1 or 0) -- 1 if name is suppressed in GenBank entry lineage
	hidden subtree root flag (1 or 0) -- 1 if this subtree has no sequence data yet
	comments -- free-text comments and citations
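
As an illustration of the field list above, the sketch below maps one already-split nodes.dmp row onto a typed record and walks parent links to build a lineage. It assumes the fields appear in the file in the order listed, and that the walk ends when a node is its own parent or its parent is missing; Node, parse_node and lineage are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class Node(NamedTuple):
    tax_id: int
    parent_tax_id: int
    rank: str
    embl_code: str
    division_id: int
    inherited_div: bool
    genetic_code_id: int
    inherited_gc: bool
    mito_genetic_code_id: int
    inherited_mgc: bool
    genbank_hidden: bool
    hidden_subtree_root: bool
    comments: str


def parse_node(fields: List[str]) -> Node:
    """Build a Node from the 13 fields of one nodes.dmp row."""
    if len(fields) < 13:
        raise ValueError("expected 13 fields, got %d" % len(fields))
    f = [value.strip() for value in fields]
    return Node(
        int(f[0]), int(f[1]), f[2], f[3],
        int(f[4]), f[5] == "1",    # division id, inherited div flag
        int(f[6]), f[7] == "1",    # genetic code id, inherited GC flag
        int(f[8]), f[9] == "1",    # mitochondrial code id, inherited MGC flag
        f[10] == "1", f[11] == "1", f[12],
    )


def lineage(tax_id: int, nodes: Dict[int, Node]) -> List[int]:
    """Return tax ids from the given node up to the topmost reachable node."""
    path: List[int] = []
    seen = set()
    current = tax_id
    # Stop on an unknown id or on a repeat (a self-parent or a cycle).
    while current in nodes and current not in seen:
        seen.add(current)
        path.append(current)
        current = nodes[current].parent_tax_id
    return path
```

The flags are stored as booleans so that callers do not have to compare against the strings "1" and "0".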

Taxonomy names file (names.dmp):
	tax_id -- the id of the node associated with this name
	name_txt -- the name itself
	unique name -- the unique variant of this name if the name is not unique
	name class -- (synonym, common name, ...)
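
Because one node can carry several names, a lookup keyed by tax_id and name class is often convenient. The sketch below is one possible way to build it from already-split rows; index_names is a hypothetical name, and preferring the unique name over name_txt when it is present is a design choice, not something the file format requires.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def index_names(rows: Iterable[Sequence[str]]) -> Dict[int, Dict[str, List[str]]]:
    """Group names.dmp rows as {tax_id: {name class: [names]}}."""
    index: Dict[int, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for tax_id, name_txt, unique_name, name_class in rows:
        # Use the unique variant when the file supplies one.
        label = unique_name if unique_name else name_txt
        index[int(tax_id)][name_class].append(label)
    return index
```

For example, index[tax_id]["synonym"] then lists every synonym recorded for that node.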

Divisions file (division.dmp):
	division id -- taxonomy database division id
	division cde -- GenBank division code (three characters), e.g. BCT, PLN, VRT, MAM, PRI...
	division name -- full division name, e.g. Bacteria, Mammals, Primates...
	comments

Genetic codes file (gencode.dmp):
	genetic code id -- GenBank genetic code id
	abbreviation -- genetic code name abbreviation
	name -- genetic code name
	cde -- translation table for this genetic code
	starts -- start codons for this genetic code

Deleted nodes file (delnodes.dmp):
	tax_id -- deleted node id

Merged nodes file (merged.dmp):
	old_tax_id -- id of the node that has been merged
	new_tax_id -- id of the node that is the result of the merge
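
A stored tax_id may have been merged or deleted since it was recorded. The sketch below shows one way to bring such an id up to date using mappings loaded from merged.dmp and delnodes.dmp. The name resolve_tax_id is hypothetical, and following merges repeatedly is a defensive assumption; the file may well only ever need a single hop.

```python
from typing import Dict, Optional, Set


def resolve_tax_id(tax_id: int,
                   merged: Dict[int, int],
                   deleted: Set[int]) -> Optional[int]:
    """Return the current id for tax_id, or None if the node was deleted."""
    seen = set()
    # Follow old_tax_id -> new_tax_id links; the seen set guards against loops.
    while tax_id in merged and tax_id not in seen:
        seen.add(tax_id)
        tax_id = merged[tax_id]
    return None if tax_id in deleted else tax_id
```

Ids that appear in neither file are returned unchanged.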

Citations file (citations.dmp):
	cit_id -- the unique id of citation
	cit_key -- citation key
	pubmed_id -- unique id in PubMed database (0 if not in PubMed)
	medline_id -- unique id in MedLine database (0 if not in MedLine)
	url -- URL associated with citation
	text -- any text (usually article name and authors).
		-- The following characters are escaped in this text by a backslash:
		-- newline (appears as "\n"),
		-- tab character ("\t"),
		-- double quotes ('\"'),
		-- backslash character ("\\").
	taxid_list -- list of node ids separated by a single space
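
The escaping rules for the text field can be reversed in a single left-to-right pass, which avoids misreading an escaped backslash followed by the letter n as a newline. The sketch below is illustrative only; unescape_text and parse_taxid_list are hypothetical names, and leaving unknown escape sequences untouched is an assumption.

```python
from typing import List

# Character after the backslash -> character it stands for.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape_text(text: str) -> str:
    """Undo the backslash escapes used in the citations.dmp text field."""
    out: List[str] = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Consume the escaped character together with its backslash.
            nxt = next(chars, "")
            out.append(_ESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def parse_taxid_list(field: str) -> List[int]:
    """Convert the space-separated taxid_list field into integers."""
    # split() without arguments also tolerates an empty field or stray spaces.
    return [int(token) for token in field.split()]
```

A chain of str.replace calls would be shorter, but the order of the replacements then matters; the single pass sidesteps that.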