README

Terms and License:

The DAVID Knowledgebase is FREE to download for non-commercial users for non-profit use. Downloading the DAVID Knowledgebase for commercial use, by non-profit or for-profit entities, requires the NIH license agreement and fees.

The DAVID Knowledgebase Site:

http://david.abcc.ncifcrf.gov/content.jsp?file=/knowledgebase/DAVID_knowledgebase.html

Mini-example-version of DAVID Knowledgebase:


Before moving to the actual files, which can be much larger, please study the mini-example-version of the database in order to understand its structure and usage. The mini-example-version has the same file structure as the actual version.
http://david.abcc.ncifcrf.gov/examples/

Applications:

  • For given genes, access the corresponding heterogeneous functional annotations, which cover over 50 categories from dozens of public databases, in a high-throughput manner.
  • For given gene identifiers, translate them to other types of gene identifiers that represent the same gene entries, in a high-throughput manner.
  • For given annotation terms, access the corresponding genes in a high-throughput manner.
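
As a rough illustration of the third application, the sketch below inverts gene-annotation pairs so that each annotation term points to its genes. The (DAVID ID, term) tuple layout, the function name genes_by_term and the second row are assumptions made for illustration; only the 2875235 pair is taken from Example 2 of this README.

```python
from collections import defaultdict


def genes_by_term(pairs):
    """Invert (david_id, term) pairs into {term: sorted list of DAVID IDs}."""
    index = defaultdict(set)
    for david_id, term in pairs:
        index[term].add(david_id)
    return {term: sorted(ids) for term, ids in index.items()}


# The first pair appears in Example 2 of this README; the second is made up.
pairs = [
    ("2875235", "TRANSCRIPTION, DNA-DEPENDENT"),
    ("0000001", "TRANSCRIPTION, DNA-DEPENDENT"),  # hypothetical DAVID ID
]
index = genes_by_term(pairs)
print(index["TRANSCRIPTION, DNA-DEPENDENT"])
```

Building the index once and then looking up many terms is usually cheaper than rescanning a large file for every term.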

Some Important Points of the DAVID Knowledgebase:
  • The DAVID Knowledgebase does not create or own any of the annotation content. Thus, the annotation content in the DAVID Knowledgebase is free to all users. The DAVID Team is not responsible for the accuracy of annotation content that comes from the original resources.
  • The DAVID Knowledgebase is an integrated database: it collects heterogeneous annotations from public data sources and integrates them into one centralized space. The DAVID Team is responsible only for integration problems, such as an annotation-gene assignment that is not consistent with the original data source.
  • DAVID Gene IDs are created with a unique single-linkage procedure. A DAVID Gene ID is a non-redundant gene cluster ID that holds many different types of gene identifiers for one single gene entry.
  • DAVID Gene IDs are used as the unique index IDs to link ALL types of gene identifiers and corresponding annotations throughout the DAVID Knowledgebase. Thus, the DAVID Gene ID, which is owned by the DAVID Team and subject to a license requirement (pending, not available yet) for for-profit uses, plays the central role in the integration.
  • All data, including gene identifiers and annotation content, are stored as simple pair-wise flat files. All the files are cross-linked by the DAVID Gene IDs. The file names are based on the original data sources, such as DAVID2ENTREZ_GENE_ID.txt or DAVID2GOTERM_MF_1.txt.
  • Each file contains all available content for all available species for its particular annotation category.
  • All text files are compressed into zip files. Users need a decompression program, such as WinZip, to unzip the files before using them. The files are operating-system independent, i.e. the unzipped files can be read in DOS, Windows or Unix/Linux environments with any text editor or viewer, such as MS Word, Notepad, EditPlus, more, vi, etc. Some files may be very large.
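
The pair-wise files can also be read programmatically without unzipping them by hand. The following is a minimal sketch, assuming that each line holds a DAVID ID and one value separated by a tab and that the text is UTF-8; neither detail is stated in this README, so please check the mini-example-version first. The function name load_pairs is hypothetical, and the tiny archive is built in memory only so that the sketch runs on its own.

```python
import io
import zipfile
from collections import defaultdict


def load_pairs(zip_bytes, member, sep="\t"):
    """Read a two-column flat file from a zip archive into {david_id: [values]}."""
    pairs = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as handle:
            for raw in io.TextIOWrapper(handle, encoding="utf-8"):
                david_id, found, value = raw.rstrip("\r\n").partition(sep)
                if found:  # skip blank lines and lines without the separator
                    pairs[david_id].append(value)
    return dict(pairs)


# Build a tiny in-memory archive; the single row is taken from Example 1.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("DAVID2ENTREZ_GENE_ID.txt", "2875235\t7536\n")

entrez = load_pairs(buffer.getvalue(), "DAVID2ENTREZ_GENE_ID.txt")
print(entrez)
```

A list is kept per DAVID ID because one gene entry may be paired with several values in the same file.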

File Organization and Structures for Downloads*

Layout: Main Category Folder, followed by its Database Files (indented) and any Special Comments.

Disease
    DAVID2GENETIC_ASSOCIATION_DB.txt
    DAVID2OMIM_PHENOTYPE.txt

Functional_Categories
    DAVID2COG_KOG_ONTOLOGY.txt
    DAVID2PIR_SEQ_FEATURE.txt
    DAVID2SP_COMMENT_TYPE.txt
    DAVID2SP_PIR_KEYWORDS.txt
    DAVID2UP_SEQ_FEATURE.txt

Gene_Tissue_Expression
    DAVID2CGAP_EST.txt
    DAVID2CGAP_SAGE.txt
    DAVID2GNP_MICROARRAY_GCRMA.txt
    DAVID2GNP_MICROARRAY_MAS5.txt
    DAVID2UNIGENE_EST_PROFILE.txt
    Comment: A gene-tissue pair means that the gene is highly expressed in that tissue.

General_Annotations
    DAVID2ALIAS_GENE_SYMBOL.txt
    DAVID2CHROMOSOME.txt
    DAVID2CYTOBAND.txt
    DAVID2GENE_NAME.txt
    DAVID2GENE_SYMBOL.txt
    DAVID2HOMOLOGOUS_GENE.txt
    ......

Literature
    DAVID2GENERIF_SUMMARY.txt
    DAVID2HIV_INTERACTION_PUBMED_ID.txt
    DAVID2PUBMED_ID.txt

Main_Accessions
    DAVID2AFFY_ID.txt
    DAVID2ENTREZ_GENE_ID.txt
    DAVID2GENPEPT_ACCESSION.txt
    DAVID2PIR_ACCESSION.txt
    DAVID2PIR_ID.txt
    DAVID2PIR_NREF_ID.txt
    DAVID2REFSEQ_GENOMIC.txt
    DAVID2REFSEQ_MRNA.txt
    DAVID2REFSEQ_PROTEIN.txt
    DAVID2REFSEQ_RNA.txt
    DAVID2UNIGENE.txt
    DAVID2UNIPROT_ACCESSION.txt
    DAVID2UNIPROT_ID.txt
    DAVID2UNIREF100_ID.txt
    Comment: These are the key files for mapping users' IDs to DAVID IDs, or to other types of public gene IDs.

Ontologies
    DAVID2GOTERM_BP_1.txt
    DAVID2GOTERM_BP_2.txt
    DAVID2GOTERM_BP_3.txt
    DAVID2GOTERM_BP_4.txt
    DAVID2GOTERM_BP_5.txt
    DAVID2GOTERM_BP_ALL.txt
    DAVID2GOTERM_CC_1.txt
    DAVID2GOTERM_CC_2.txt
    DAVID2GOTERM_CC_3.txt
    DAVID2GOTERM_CC_4.txt
    DAVID2GOTERM_CC_5.txt
    DAVID2GOTERM_CC_ALL.txt
    DAVID2GOTERM_MF_1.txt
    DAVID2GOTERM_MF_2.txt
    DAVID2GOTERM_MF_3.txt
    DAVID2GOTERM_MF_4.txt
    DAVID2GOTERM_MF_5.txt
    DAVID2GOTERM_MF_ALL.txt
    DAVID2PANTHER_TERM_BP.txt
    DAVID2PANTHER_TERM_MF.txt
    Comment: The "xxx_ALL" files contain all levels of GO terms. Therefore, the "xxx_1" to "xxx_5" files are subsets of the corresponding "xxx_ALL" files.

Other_Accessions
    DAVID2DICTYBASE_ID.txt
    DAVID2ECOGENE_ID.txt
    DAVID2FLYBASE_ID.txt
    DAVID2GENEDB_SPOMBE_ID.txt
    DAVID2GLYCOSUITEDB_ID.txt
    DAVID2HAMAP_ID.txt
    ..........

Pathways
    DAVID2BBID.txt
    DAVID2BIOCARTA.txt
    DAVID2EC_NUMBER.txt
    DAVID2KEGG_COMPOUND.txt
    DAVID2KEGG_PATHWAY.txt
    DAVID2KEGG_REACTION.txt
    DAVID2PANTHER_PATHWAY.txt

Protein_Domains
    DAVID2BLOCKS_ID.txt
    DAVID2COG_KOG_NAME.txt
    DAVID2INTERPRO_NAME.txt
    DAVID2PANTHER_FAMILY.txt
    DAVID2PANTHER_SUBFAMILY.txt
    DAVID2PDB_ID.txt
    DAVID2PFAM_NAME.txt
    DAVID2PIR_ALN.txt
    DAVID2PIR_HOMOLOGY_DOMAIN.txt
    DAVID2PIR_SUPERFAMILY_NAME.txt
    DAVID2PRINTS_NAME.txt
    DAVID2PRODOM_NAME.txt
    DAVID2PROSITE_NAME.txt
    DAVID2SCOP_ID.txt
    DAVID2SMART_NAME.txt
    DAVID2TIGRFAMS_NAME.txt

Protein_Interactions
    DAVID2BIND.txt
    DAVID2DIP.txt
    DAVID2HIV_INTERACTION.txt
    DAVID2HIV_INTERACTION_CATEGORY.txt
    DAVID2HPRD_INTERACTION.txt
    DAVID2MINT.txt
    DAVID2NCICB_CAPATHWAY.txt
    DAVID2REACTOME_INTERACTION.txt
    DAVID2TRANSFAC_ID.txt

Species
    DAVID2TAX.txt
    Comment: Gene species information.

Gene_Names_Symbols
    DAVID2GENE_NAME.txt
    DAVID2GENE_SYMBOL.txt
    Comment: Maps DAVID IDs to gene names or symbols.

*Note:
  1. Each database file represents a particular annotation source. The naming convention indicates the original source. For example, DAVID2BIND.txt means the BIND interaction database in DAVID.
  2. The database files are organized into 11 bigger categories (consistent with the interface organization of the DAVID Functional Annotation Tool) to facilitate quick access to the areas of users' interest.
  3. A gene-annotation pair in a file means that the particular gene is associated with the corresponding annotation term.
  4. You probably do not need to download all files. For example, if you have 1000 Affy IDs of interest and want to study their KEGG pathways, you only need to download three files: DAVID2AFFY_ID.txt, DAVID2KEGG_PATHWAY.txt and DAVID2GENE_NAME.txt.
  5. DAVID data files are species independent: each data file in the DAVID Knowledgebase contains all available content for all available species. If you are interested only in certain species, you can filter the files you need according to DAVID2TAX.txt, which contains the species information. Alternatively, you can use the files as they are and ignore the extra information for other species.
  6. The DAVID Web site provides a query interface. If you only need a small set of data, e.g. some annotations for 10 genes, all of the above information can be queried through the DAVID Functional Annotation Table, which is part of the DAVID Functional Annotation Tool.
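
The species filtering described in note 5 could look roughly like the sketch below. It assumes that the species file pairs each DAVID ID with an NCBI taxonomy ID (9606 for human, 10090 for mouse); this README does not describe the file's content, so treat that as a guess to be checked against the mini-example-version. The function name filter_by_species and all rows are made up for illustration.

```python
def filter_by_species(annotation_pairs, tax_pairs, tax_id):
    """Keep only annotation pairs whose DAVID ID belongs to the given taxonomy ID."""
    wanted = {david_id for david_id, tax in tax_pairs if tax == tax_id}
    return [(d, v) for d, v in annotation_pairs if d in wanted]


# Made-up rows for illustration; the real files may use a different layout.
tax_pairs = [("1000001", "9606"), ("1000002", "10090")]
kegg_pairs = [("1000001", "pathway A"), ("1000002", "pathway B")]

human_only = filter_by_species(kegg_pairs, tax_pairs, "9606")
print(human_only)
```

Collecting the wanted DAVID IDs into a set first keeps the filter to a single pass over each file.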


Example 1: Cross Mapping Gene IDs


Task: I have 1000 Affy IDs (35439_at, 679_at, ....). I would like to know the corresponding NCBI Entrez Gene IDs, UniProt Accessions, Gene Names and Gene Symbols.

Solution:

http://david.abcc.ncifcrf.gov/examples/

Step 1: 
  • Map Affy ID 35439_at to the corresponding DAVID ID using the file Main_Accessions/DAVID2AFFY_ID.txt. This gives the pair DAVID ID <- Affy ID: 2875235 <- 35439_at.
Step 2: 
  • Map DAVID ID 2875235 to the corresponding Entrez Gene ID using the file Main_Accessions/DAVID2ENTREZ_GENE_ID.txt. This gives the pair DAVID ID -> Entrez Gene ID: 2875235 -> 7536.
  • Map DAVID ID 2875235 to the corresponding UniProt Accession using the file Main_Accessions/DAVID2UNIPROT_ACCESSION.txt. This gives the pair DAVID ID -> UniProt Accession: 2875235 -> Q9UEI0.
  • Map DAVID ID 2875235 to the corresponding Gene Name using the file Gene_Names_Symbols/DAVID2GENE_NAME.txt. This gives the pair DAVID ID -> Gene Name: 2875235 -> transcription factor ZFM1.
  • Map DAVID ID 2875235 to the corresponding Gene Symbol using the file Gene_Names_Symbols/DAVID2GENE_SYMBOL.txt. This gives the pair DAVID ID -> Gene Symbol: 2875235 -> SF1.
  • At this point, 35439_at has been cross-referenced through the DAVID Knowledgebase to Entrez Gene 7536, UniProt Accession Q9UEI0, Gene Name "transcription factor ZFM1", and Gene Symbol "SF1".
Step 3:
  • Repeat Step 1 and Step 2 for the rest of the Affy IDs.
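
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each file is taken to be tab-delimited with the DAVID ID in the first column, and the file contents are held in a dictionary of strings rather than read from disk. The names parse_pairs, cross_map and FILES are hypothetical; the row values are the ones quoted in this example.

```python
def parse_pairs(text):
    """Split 'DAVID ID<TAB>value' lines into (david_id, value) tuples."""
    rows = []
    for line in text.splitlines():
        david_id, found, value = line.partition("\t")
        if found:
            rows.append((david_id, value))
    return rows


# One row per file, using the values quoted in Example 1 (layout is assumed).
FILES = {
    "DAVID2AFFY_ID.txt": "2875235\t35439_at\n",
    "DAVID2ENTREZ_GENE_ID.txt": "2875235\t7536\n",
    "DAVID2UNIPROT_ACCESSION.txt": "2875235\tQ9UEI0\n",
    "DAVID2GENE_NAME.txt": "2875235\ttranscription factor ZFM1\n",
    "DAVID2GENE_SYMBOL.txt": "2875235\tSF1\n",
}


def cross_map(affy_id, files):
    """Step 1: Affy ID -> DAVID IDs. Step 2: DAVID IDs -> every other ID type."""
    affy_rows = parse_pairs(files["DAVID2AFFY_ID.txt"])
    david_ids = {d for d, affy in affy_rows if affy == affy_id}
    result = {}
    for name, text in files.items():
        if name != "DAVID2AFFY_ID.txt":
            result[name] = [v for d, v in parse_pairs(text) if d in david_ids]
    return result


mapped = cross_map("35439_at", FILES)
print(mapped)
```

For 1000 Affy IDs it would be more efficient to parse each file once and reuse the parsed pairs, instead of calling cross_map once per ID.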

Example 2: Query annotation contents for a given gene


Task: I have Affy ID 35439_at. What are its associated Gene Ontology (GO) Biological Process (BP) terms at all levels?

Solution:

http://david.abcc.ncifcrf.gov/examples/

Step 1: Map Affy ID 35439_at to the corresponding DAVID ID using the file Main_Accessions/DAVID2AFFY_ID.txt. This gives the pair DAVID ID <- Affy ID: 2875235 <- 35439_at.
Step 2: Map DAVID ID 2875235 to the corresponding Gene Ontology terms using the file Ontologies/DAVID2GOTERM_BP_ALL.txt. This gives the pair DAVID ID -> GOTERM_BP_ALL: 2875235 -> "TRANSCRIPTION, DNA-DEPENDENT".
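
Because some files may be very large, Step 2 can be done by scanning the file line by line instead of loading it whole. The sketch below assumes a tab-delimited layout with the DAVID ID first, which this README does not confirm. The function name annotations_for is hypothetical and the second sample row is made up.

```python
import io


def annotations_for(david_id, handle, sep="\t"):
    """Yield every annotation value paired with david_id, reading line by line."""
    for raw in handle:
        key, found, value = raw.rstrip("\r\n").partition(sep)
        if found and key == david_id:
            yield value


# Stand-in for an open DAVID2GOTERM_BP_ALL.txt; the first row is from Example 2.
sample = io.StringIO(
    "2875235\tTRANSCRIPTION, DNA-DEPENDENT\n"
    "0000001\tHYPOTHETICAL TERM\n"  # made-up row
)
terms = list(annotations_for("2875235", sample))
print(terms)
```

With a real file, the StringIO stand-in would be replaced by an open file handle, for example the result of open() on the unzipped text file.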

Edited by DAVID Team on Jan. 2007
http://david.abcc.ncifcrf.gov