*.dmp files are bcp-like dumps from the GenBank taxonomy database.

General information.
Field terminator is "\t|\t"
Row terminator is "\t|\n"
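
The two terminators above are enough to split any row into fields. A minimal Python sketch, assuming the file is read line by line in text mode; the function name parse_dmp_line and the sample rows are illustrative and not part of the dump itself:

```python
FIELD_TERMINATOR = "\t|\t"
ROW_TERMINATOR = "\t|\n"


def parse_dmp_line(line):
    """Split one raw .dmp line into a list of field strings."""
    if line.endswith(ROW_TERMINATOR):
        # Normal case: drop the full row terminator.
        line = line[: -len(ROW_TERMINATOR)]
    else:
        # Tolerate a final line without a trailing newline.
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]
    return line.split(FIELD_TERMINATOR)


# Illustrative rows; an empty field stays in the result as "".
print(parse_dmp_line("1\t|\t1\t|\tno rank\t|\n"))
print(parse_dmp_line("0\t|\tBCT\t|\tBacteria\t|\t\t|\n"))
```

Stripping the row terminator before splitting keeps an empty trailing field (such as an empty comments column) from being lost or fused with the terminator.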

    
The nodes.dmp file consists of taxonomy nodes. The description of each node includes the following
fields:
	tax_id					-- node id in GenBank taxonomy database
	parent tax_id				-- parent node id in GenBank taxonomy database
	rank					-- rank of this node (superkingdom, kingdom, ...)
	embl code				-- locus-name prefix; not unique
	division id				-- see division.dmp file
	inherited div flag  (1 or 0)		-- 1 if node inherits division from parent
	genetic code id				-- see gencode.dmp file
	inherited GC  flag  (1 or 0)		-- 1 if node inherits genetic code from parent
	mitochondrial genetic code id		-- see gencode.dmp file
	inherited MGC flag  (1 or 0)		-- 1 if node inherits mitochondrial gencode from parent
	GenBank hidden flag (1 or 0)            -- 1 if name is suppressed in GenBank entry lineage
	hidden subtree root flag (1 or 0)       -- 1 if this subtree has no sequence data yet
	comments				-- free-text comments and citations
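
Because every node carries its parent's id, a lineage can be recovered by walking parent links upward. A minimal Python sketch, assuming the root node lists itself as its own parent; the helper names and the sample rows (made up, and truncated to the first three fields) are illustrative only:

```python
def split_row(line):
    # Drop the row terminator (tab, bar, newline), then split on tab-bar-tab.
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_nodes(lines):
    # Map tax_id -> (parent tax_id, rank); only the first three fields are used.
    nodes = {}
    for line in lines:
        fields = split_row(line)
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def lineage(tax_id, nodes):
    # Walk upward; stop at a self-parented root, an unknown id, or a cycle.
    path = []
    while tax_id in nodes and tax_id not in path:
        path.append(tax_id)
        tax_id = nodes[tax_id][0]
    return path


# Made-up rows; real rows carry all the fields listed above.
sample = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tsuperkingdom\t|\n",
    "100\t|\t10\t|\tkingdom\t|\n",
]
nodes = load_nodes(sample)
print(lineage(100, nodes))  # [100, 10, 1]
```

The `tax_id not in path` check doubles as the stop condition for the root and as a guard against malformed input containing a loop.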

    
Taxonomy names file (names.dmp):
	tax_id					-- the id of the node associated with this name
	name_txt				-- the name itself
	unique name				-- the unique variant of this name if the name is not unique
	name class				-- (synonym, common name, ...)
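
Since a node can have several names of different classes, a nested lookup keyed by tax_id and then by name class is one convenient way to hold this file in memory. A minimal Python sketch; the helper names and the sample names are invented for illustration:

```python
from collections import defaultdict


def split_row(line):
    # Drop the row terminator (tab, bar, newline), then split on tab-bar-tab.
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_names(lines):
    # Build tax_id -> name class -> list of names.
    names = defaultdict(lambda: defaultdict(list))
    for line in lines:
        tax_id, name_txt, unique_name, name_class = split_row(line)[:4]
        names[int(tax_id)][name_class].append(name_txt)
    return names


# Invented rows with an empty "unique name" field.
sample = [
    "100\t|\tExamplea\t|\t\t|\tsynonym\t|\n",
    "100\t|\texample bug\t|\t\t|\tcommon name\t|\n",
]
names = load_names(sample)
print(names[100]["common name"])  # ['example bug']
```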

    
Divisions file (division.dmp):
	division id				-- taxonomy database division id
	division cde				-- GenBank division code (three characters), e.g. BCT, PLN, VRT, MAM, PRI...
	division name				-- full name of the division, e.g. Bacteria
	comments

    
Genetic codes file (gencode.dmp):
	genetic code id				-- GenBank genetic code id
	abbreviation				-- genetic code name abbreviation
	name					-- genetic code name
	cde					-- translation table for this genetic code
	starts					-- start codons for this genetic code

    
Deleted nodes file (delnodes.dmp):
	tax_id					-- deleted node id

    
Merged nodes file (merged.dmp):
	old_tax_id                              -- id of the node that has been merged
	new_tax_id                              -- id of the node that is the result of the merge
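
Taken together, delnodes.dmp and merged.dmp allow an id from an older dump to be brought up to date. A minimal Python sketch, under the assumption that an id absent from both files is still current; the function names and sample ids are hypothetical, and following repeated merges is a precaution rather than documented behavior:

```python
def split_row(line):
    # Drop the row terminator (tab, bar, newline), then split on tab-bar-tab.
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_merged(lines):
    # old_tax_id -> new_tax_id
    merged = {}
    for line in lines:
        old_id, new_id = split_row(line)[:2]
        merged[int(old_id)] = int(new_id)
    return merged


def load_deleted(lines):
    # Set of deleted tax_ids.
    return {int(split_row(line)[0]) for line in lines}


def resolve(tax_id, merged, deleted):
    # Follow merges (guarding against loops); deleted ids resolve to None.
    seen = set()
    while tax_id in merged and tax_id not in seen:
        seen.add(tax_id)
        tax_id = merged[tax_id]
    return None if tax_id in deleted else tax_id


# Hypothetical ids for illustration.
merged = load_merged(["12\t|\t34\t|\n"])
deleted = load_deleted(["56\t|\n"])
print(resolve(12, merged, deleted))  # 34
print(resolve(56, merged, deleted))  # None
```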

    
Citations file (citations.dmp):
	cit_id					-- the unique id of the citation
	cit_key					-- citation key
	pubmed_id				-- unique id in PubMed database (0 if not in PubMed)
	medline_id				-- unique id in MedLine database (0 if not in MedLine)
	url					-- URL associated with the citation
	text					-- any text (usually article name and authors).
						-- The following characters are escaped in this text by a backslash:
						-- newline (appears as "\n"),
						-- tab character ("\t"),
						-- double quotes ('\"'),
						-- backslash character ("\\").
	taxid_list				-- list of node ids separated by a single space
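
The escaping rules for the text field can be reversed in a single left-to-right pass, so that an escaped backslash followed by "n" is not mistaken for a newline. A minimal Python sketch; the function names are illustrative, and leaving unrecognized escape sequences untouched is an assumption, since the description above only lists four:

```python
import re

# Escaped character -> literal character, per the four rules listed above.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape_text(text):
    # Replace each backslash pair in one pass; unknown pairs are kept as-is.
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), text)


def parse_taxid_list(field):
    # Node ids are separated by single spaces; an empty field gives [].
    return [int(tax_id) for tax_id in field.split()]


print(unescape_text(r"A \"quoted\" title\tand authors"))
print(parse_taxid_list("100 200 300"))  # [100, 200, 300]
```

Chained calls to str.replace would work for most inputs but can misread a sequence such as an escaped backslash directly followed by the letter n, which is why a single regular-expression pass is used here.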