The *.dmp files are bcp-like dumps of the GenBank taxonomy database.

General information.
Field terminator is "\t|\t"
Row terminator is "\t|\n"
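The two terminators above are enough to split any row of a *.dmp file into fields. A minimal sketch in Python follows; the names parse_dmp_line and read_dmp are hypothetical helpers, not part of any NCBI tool.

```python
from typing import Iterable, Iterator, List

FIELD_SEP = "\t|\t"  # field terminator
ROW_END = "\t|"      # row terminator "\t|\n" once the newline is stripped


def parse_dmp_line(line: str) -> List[str]:
    """Split one row of a *.dmp file into its fields (hypothetical helper)."""
    line = line.rstrip("\r\n")
    if line.endswith(ROW_END):
        line = line[: -len(ROW_END)]
    return line.split(FIELD_SEP)


def read_dmp(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the field list of every non-empty row."""
    for line in lines:
        if line.strip():
            yield parse_dmp_line(line)
```

Any iterable of lines works as input, for example an open file object: read_dmp(open("nodes.dmp", encoding="utf-8")). The encoding is an assumption; the text above does not state one.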

The nodes.dmp file consists of taxonomy nodes. The description of each node includes the following fields:
    tax_id -- node id in GenBank taxonomy database
    parent tax_id -- parent node id in GenBank taxonomy database
    rank -- rank of this node (superkingdom, kingdom, ...)
    embl code -- locus-name prefix; not unique
    division id -- see division.dmp file
    inherited div flag (1 or 0) -- 1 if node inherits division from parent
    genetic code id -- see gencode.dmp file
    inherited GC flag (1 or 0) -- 1 if node inherits genetic code from parent
    mitochondrial genetic code id -- see gencode.dmp file
    inherited MGC flag (1 or 0) -- 1 if node inherits mitochondrial gencode from parent
    GenBank hidden flag (1 or 0) -- 1 if name is suppressed in GenBank entry lineage
    hidden subtree root flag (1 or 0) -- 1 if this subtree has no sequence data yet
    comments -- free-text comments and citations
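Because every node carries its parent's id, the tree can be walked upward from any node. The sketch below loads a few nodes.dmp fields by position and computes a lineage. Node, load_nodes and lineage are hypothetical names, and the stop condition assumes the root node lists itself as its own parent, which the text above does not state.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Node:
    tax_id: int
    parent_tax_id: int
    rank: str
    division_id: int
    genetic_code_id: int


def split_row(line: str) -> List[str]:
    """Split one *.dmp row on the documented terminators."""
    line = line.rstrip("\r\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_nodes(lines: Iterable[str]) -> Dict[int, Node]:
    """Map tax_id to Node, using the field order listed above."""
    nodes: Dict[int, Node] = {}
    for line in lines:
        if not line.strip():
            continue
        f = split_row(line)
        # Positions: 0 tax_id, 1 parent tax_id, 2 rank,
        # 4 division id, 6 genetic code id.
        node = Node(int(f[0]), int(f[1]), f[2], int(f[4]), int(f[6]))
        nodes[node.tax_id] = node
    return nodes


def lineage(nodes: Dict[int, Node], tax_id: int) -> List[int]:
    """Return the tax_ids from the given node up to the root.

    Assumption: the root is its own parent. The seen set guards
    against malformed input that contains a cycle.
    """
    path = [tax_id]
    seen = {tax_id}
    parent = nodes[tax_id].parent_tax_id
    while parent not in seen:
        path.append(parent)
        seen.add(parent)
        parent = nodes[parent].parent_tax_id
    return path
```

A parent id that is missing from the mapping raises KeyError, which seems a reasonable way to surface a truncated dump.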

Taxonomy names file (names.dmp):
    tax_id -- the id of the node associated with this name
    name_txt -- the name itself
    unique name -- the unique variant of this name if the name is not unique
    name class -- (synonym, common name, ...)
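A node can have several rows in names.dmp, one per name, so a lookup usually filters on the name class. The sketch below builds a tax_id to name mapping for one class. load_names is a hypothetical helper, and the default class "scientific name" is an assumption, since the list above only gives synonym and common name as examples.

```python
from typing import Dict, Iterable, List


def split_row(line: str) -> List[str]:
    """Split one *.dmp row on the documented terminators."""
    line = line.rstrip("\r\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_names(lines: Iterable[str],
               name_class: str = "scientific name") -> Dict[int, str]:
    """Map tax_id to name_txt for rows of the requested name class.

    If a node has several names of that class, the last row wins.
    """
    names: Dict[int, str] = {}
    for line in lines:
        if not line.strip():
            continue
        tax_id, name_txt, _unique_name, cls = split_row(line)[:4]
        if cls == name_class:
            names[int(tax_id)] = name_txt
    return names
```

Passing name_class="synonym" or name_class="common name" selects the classes named in the field list instead.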

Divisions file (division.dmp):
    division id -- taxonomy database division id
    division cde -- GenBank division code (three characters), e.g. BCT, PLN, VRT, MAM, PRI...
    division name -- full name of the division
    comments

Genetic codes file (gencode.dmp):
    genetic code id -- GenBank genetic code id
    abbreviation -- genetic code name abbreviation
    name -- genetic code name
    cde -- translation table for this genetic code
    starts -- start codons for this genetic code
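The cde and starts fields can be turned into lookup tables. The sketch below rests on two assumptions that the text above does not state: both fields are 64-character strings with one character per codon, ordered with the bases T, C, A, G and the third position varying fastest, and a start codon is marked with "M" in the starts string. codon_table and start_codons are hypothetical names.

```python
from itertools import product
from typing import Dict, List

BASES = "TCAG"  # assumed base order of the 64-character strings


def _codons() -> List[str]:
    """All 64 codons in the assumed order: TTT, TTC, TTA, TTG, TCT, ..."""
    return ["".join(triplet) for triplet in product(BASES, repeat=3)]


def codon_table(cde: str) -> Dict[str, str]:
    """Map each codon to the one-letter amino acid given in cde."""
    if len(cde) != 64:
        raise ValueError("expected a 64-character translation table")
    return dict(zip(_codons(), cde))


def start_codons(starts: str) -> List[str]:
    """List the codons flagged with 'M' in the starts string."""
    if len(starts) != 64:
        raise ValueError("expected a 64-character starts string")
    return [codon for codon, flag in zip(_codons(), starts) if flag == "M"]
```

If the dump at hand uses a different layout for these two fields, only BASES and the "M" test need to change.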

Deleted nodes file (delnodes.dmp):
    tax_id -- deleted node id

Merged nodes file (merged.dmp):
    old_tax_id -- id of the node that has been merged
    new_tax_id -- id of the node that is the result of the merge
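Together, delnodes.dmp and merged.dmp let a stale tax_id be mapped to its current state. A minimal sketch follows; load_merged, load_deleted and resolve_tax_id are hypothetical names. Following a chain of merges is a defensive assumption, since the text above does not say whether a new_tax_id can itself appear as an old_tax_id.

```python
from typing import Dict, Iterable, List, Optional, Set


def split_row(line: str) -> List[str]:
    """Split one *.dmp row on the documented terminators."""
    line = line.rstrip("\r\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_merged(lines: Iterable[str]) -> Dict[int, int]:
    """Map old_tax_id to new_tax_id from merged.dmp rows."""
    merged: Dict[int, int] = {}
    for line in lines:
        if line.strip():
            old_id, new_id = split_row(line)[:2]
            merged[int(old_id)] = int(new_id)
    return merged


def load_deleted(lines: Iterable[str]) -> Set[int]:
    """Collect the deleted tax_ids from delnodes.dmp rows."""
    return {int(split_row(line)[0]) for line in lines if line.strip()}


def resolve_tax_id(tax_id: int, merged: Dict[int, int],
                   deleted: Set[int]) -> Optional[int]:
    """Return the current tax_id, or None if the node was deleted."""
    seen: Set[int] = set()
    while tax_id in merged and tax_id not in seen:
        seen.add(tax_id)  # guard against a cyclic mapping
        tax_id = merged[tax_id]
    return None if tax_id in deleted else tax_id
```

An id that appears in neither file is returned unchanged.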

Citations file (citations.dmp):
    cit_id -- the unique id of the citation
    cit_key -- citation key
    pubmed_id -- unique id in PubMed database (0 if not in PubMed)
    medline_id -- unique id in MedLine database (0 if not in MedLine)
    url -- URL associated with citation
    text -- any text (usually article name and authors).
         -- The following characters are escaped in this text by a backslash:
         -- newline (appears as "\n"),
         -- tab character ("\t"),
         -- double quotes ('\"'),
         -- backslash character ("\\").
    taxid_list -- list of node ids separated by a single space
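The escaping rules for the text field can be undone in a single pass, so that an escaped backslash followed by "n" is not mistaken for a newline. The sketch below parses one citations.dmp row. unescape_text and parse_citation are hypothetical names, and returning None for an id of 0 is a convenience chosen here, not something the format requires.

```python
import re
from typing import Dict, List, Optional

# Characters that may follow a backslash in the text field.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape_text(text: str) -> str:
    """Undo the backslash escapes listed above in one left-to-right pass."""
    return re.sub(r"\\(.)",
                  lambda m: _ESCAPES.get(m.group(1), m.group(0)),
                  text)


def _optional_id(value: str) -> Optional[int]:
    """Convert an id field, mapping 0 (not in the database) to None."""
    number = int(value)
    return number if number != 0 else None


def parse_citation(line: str) -> Dict[str, object]:
    """Parse one citations.dmp row into a dictionary keyed by field name."""
    line = line.rstrip("\r\n")
    if line.endswith("\t|"):
        line = line[:-2]
    (cit_id, cit_key, pubmed_id, medline_id,
     url, text, taxid_list) = line.split("\t|\t")[:7]
    node_ids: List[int] = [int(t) for t in taxid_list.split()]
    return {
        "cit_id": int(cit_id),
        "cit_key": cit_key,
        "pubmed_id": _optional_id(pubmed_id),
        "medline_id": _optional_id(medline_id),
        "url": url,
        "text": unescape_text(text),
        "taxid_list": node_ids,
    }
```

Splitting the row before unescaping matters: tabs inside the text are still escaped at that point, so they cannot be confused with field terminators.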