README

Terms and License

The DAVID Knowledgebase Site:

DAVID_knowledgebase

Applications:

Given a list of genes, retrieve the corresponding heterogeneous functional annotations, which cover over 50 categories from dozens of public databases, in a high-throughput manner.
Given a list of gene identifiers, translate them to other identifier types that represent the same gene entries, in a high-throughput manner.
Given a list of annotation terms, retrieve the corresponding genes in a high-throughput manner.

Some Important Points of the DAVID Knowledgebase:

The DAVID Knowledgebase does not create or own any of the annotation content, so the annotation content in the DAVID Knowledgebase is free to all users. The DAVID Team is not responsible for the accuracy of annotation content that comes from the original resources.
The DAVID Knowledgebase is an integrated database: it collects heterogeneous annotations from public data sources and integrates them into one centralized space. The DAVID Knowledgebase is responsible only for integration problems, such as an annotation-gene assignment that is inconsistent with the original data source.
DAVID Gene IDs are created with a unique single-linkage procedure. A DAVID Gene ID is a non-redundant gene cluster ID that holds the many different types of gene identifiers for a single gene entry.
DAVID Gene IDs are the unique index IDs that link ALL types of gene identifiers and their corresponding annotations throughout the DAVID Knowledgebase. The DAVID Gene ID is owned by the DAVID Team, is subject to a license requirement for for-profit use (pending, not available yet), and plays the central role in the integration.
All data, including gene identifiers and annotation content, are stored as simple pair-wise flat files. All files are cross-linked through the DAVID Gene IDs. File names are based on the original data sources, such as DAVID2ENTREZ_GENE_ID.txt or DAVID2GOTERM_MF_1.txt.
Each file contains all available content for all available species for its particular annotation category.
All text files are compressed as zip files. Users need a compression program, such as WinZip, to unzip the files before using them. The files are operating-system independent: the unzipped files can be read in DOS, Windows, or Unix/Linux environments with any text editor or viewer, such as MS Word, Notepad, EditPlus, more, or vi. Some files may be very large.
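
Because some files are very large, it can be more practical to stream them straight from the zip archive than to open them in an editor. The following is a minimal Python sketch, not an official DAVID tool. It assumes each line holds a DAVID Gene ID and one value separated by a tab, with the DAVID Gene ID first; this README does not state the delimiter or column order, so check a downloaded file before relying on it. The function name read_pairs is hypothetical.

```python
import csv
import io
import zipfile


def read_pairs(zip_path, member=None):
    """Yield (david_id, value) pairs from one zipped pair-wise flat file.

    Assumption: lines are tab-delimited with the DAVID Gene ID in column 1.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Use the first file in the archive unless a member name is given.
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE keeps quote characters in gene names as plain text.
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                # Skip blank or incomplete lines.
                if len(row) >= 2:
                    yield row[0], row[1]
```

The function is a generator, so a large file is read line by line instead of being loaded into memory all at once.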

File Organization and Structures for Downloads*

Main Category Folder	Database Files	Special Comments
Disease	DAVID2GENETIC_ASSOCIATION_DB.txt DAVID2OMIM_PHENOTYPE.txt
Functional_Categories	DAVID2COG_KOG_ONTOLOGY.txt DAVID2PIR_SEQ_FEATURE.txt DAVID2SP_COMMENT_TYPE.txt DAVID2SP_PIR_KEYWORDS.txt DAVID2UP_SEQ_FEATURE.txt
Gene_Tissue_Expression	DAVID2CGAP_EST.txt DAVID2CGAP_SAGE.txt DAVID2GNP_MICROARRAY_GCRMA.txt DAVID2GNP_MICROARRAY_MAS5.txt DAVID2UNIGENE_EST_PROFILE.txt	A gene-tissue pair means that the gene is highly expressed in that tissue.
General_Annotations	DAVID2ALIAS_GENE_SYMBOL.txt DAVID2CHROMOSOME.txt DAVID2CYTOBAND.txt DAVID2GENE_NAME.txt DAVID2GENE_SYMBOL.txt DAVID2HOMOLOGOUS_GENE.txt ......
Literature	DAVID2GENERIF_SUMMARY.txt DAVID2HIV_INTERACTION_PUBMED_ID.txt DAVID2PUBMED_ID.txt
Main_Accessions	DAVID2AFFY_ID.txt DAVID2ENTREZ_GENE_ID.txt DAVID2GENPEPT_ACCESSION.txt DAVID2PIR_ACCESSION.txt DAVID2PIR_ID.txt DAVID2PIR_NREF_ID.txt DAVID2REFSEQ_GENOMIC.txt DAVID2REFSEQ_MRNA.txt DAVID2REFSEQ_PROTEIN.txt DAVID2REFSEQ_RNA.txt DAVID2UNIGENE.txt DAVID2UNIPROT_ACCESSION.txt DAVID2UNIPROT_ID.txt DAVID2UNIREF100_ID.txt	These files are the key files to be used to map users' ID to DAVID IDs, or to other types of public gene IDs.
Ontologies	DAVID2GOTERM_BP_1.txt DAVID2GOTERM_BP_2.txt DAVID2GOTERM_BP_3.txt DAVID2GOTERM_BP_4.txt DAVID2GOTERM_BP_5.txt DAVID2GOTERM_BP_ALL.txt DAVID2GOTERM_CC_1.txt DAVID2GOTERM_CC_2.txt DAVID2GOTERM_CC_3.txt DAVID2GOTERM_CC_4.txt DAVID2GOTERM_CC_5.txt DAVID2GOTERM_CC_ALL.txt DAVID2GOTERM_MF_1.txt DAVID2GOTERM_MF_2.txt DAVID2GOTERM_MF_3.txt DAVID2GOTERM_MF_4.txt DAVID2GOTERM_MF_5.txt DAVID2GOTERM_MF_ALL.txt DAVID2PANTHER_TERM_BP.txt DAVID2PANTHER_TERM_MF.txt	The "xxx_ALL" files contain all levels of GO terms. Therefore, the "xxx_1" through "xxx_5" files are subsets of the "xxx_ALL" files.
Other_Accessions	DAVID2DICTYBASE_ID.txt DAVID2ECOGENE_ID.txt DAVID2FLYBASE_ID.txt DAVID2GENEDB_SPOMBE_ID.txt DAVID2GLYCOSUITEDB_ID.txt DAVID2HAMAP_ID.txt ..........
Pathways	DAVID2BBID.txt DAVID2BIOCARTA.txt DAVID2EC_NUMBER.txt DAVID2KEGG_COMPOUND.txt DAVID2KEGG_PATHWAY.txt DAVID2KEGG_REACTION.txt DAVID2PANTHER_PATHWAY.txt
Protein_Domains	DAVID2BLOCKS_ID.txt DAVID2COG_KOG_NAME.txt DAVID2INTERPRO_NAME.txt DAVID2PANTHER_FAMILY.txt DAVID2PANTHER_SUBFAMILY.txt DAVID2PDB_ID.txt DAVID2PFAM_NAME.txt DAVID2PIR_ALN.txt DAVID2PIR_HOMOLOGY_DOMAIN.txt DAVID2PIR_SUPERFAMILY_NAME.txt DAVID2PRINTS_NAME.txt DAVID2PRODOM_NAME.txt DAVID2PROSITE_NAME.txt DAVID2SCOP_ID.txt DAVID2SMART_NAME.txt DAVID2TIGRFAMS_NAME.txt
Protein_Interactions	DAVID2BIND.txt DAVID2DIP.txt DAVID2HIV_INTERACTION.txt DAVID2HIV_INTERACTION_CATEGORY.txt DAVID2HPRD_INTERACTION.txt DAVID2MINT.txt DAVID2NCICB_CAPATHWAY.txt DAVID2REACTOME_INTERACTION.txt DAVID2TRANSFAC_ID.txt
Species	DAVID2TAX.txt	Gene species information.
Gene_Names_Symbols	DAVID2GENE_NAME.txt DAVID2GENE_SYMBOL.txt	Maps DAVID IDs to gene names or symbols.

*Note:

Each database file represents a particular annotation source, which users should be able to identify from the naming convention. For example, DAVID2BIND.txt holds data from the BIND interaction database in DAVID.
The database files are organized into 11 larger categories (consistent with the interface organization of the DAVID Functional Annotation Tool) to give quick access to the areas users are interested in.
Each gene-annotation pair in a file means that the particular gene is associated with the corresponding annotation term.
You probably do not need to download all of the files. For example, if you have 1000 Affy IDs of interest and want to study their KEGG pathways, you only need to download three files: DAVID2AFFY_ID.txt, DAVID2KEGG_PATHWAY.txt, and DAVID2GENE_NAME.txt.
DAVID data files are species independent: each data file in the DAVID Knowledgebase contains all available content for all available species. If you are interested only in certain species, you can filter the files you need using DAVID2TAX.txt, which contains the species information. Alternatively, you can use the files as they are and ignore the extra information for other species.
The DAVID Web site also provides a query interface. If you need only a small set of data, e.g. some annotations for 10 genes, all of the above information can be queried through the DAVID Functional Annotation Table, which is part of the DAVID Functional Annotation Tool.
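
The species filtering mentioned in the notes above can be sketched in Python as follows. This is an illustration under assumptions, not a documented DAVID procedure: it assumes the species file pairs each DAVID Gene ID with a single species value, and that every file lists the DAVID Gene ID first. The function names and all sample rows are invented placeholders, not real DAVID data.

```python
def david_ids_for_species(tax_pairs, species):
    """Collect the DAVID Gene IDs whose species value equals `species`."""
    return {david_id for david_id, value in tax_pairs if value == species}


def filter_pairs(pairs, keep_ids):
    """Yield only the (david_id, value) pairs whose DAVID Gene ID is kept."""
    for david_id, value in pairs:
        if david_id in keep_ids:
            yield david_id, value


# Placeholder rows, invented for illustration only.
tax_pairs = [("1001", "species A"), ("1002", "species B")]
pathway_pairs = [
    ("1001", "pathway X"),
    ("1002", "pathway Y"),
    ("1001", "pathway Z"),
]

keep = david_ids_for_species(tax_pairs, "species A")
subset = list(filter_pairs(pathway_pairs, keep))
print(subset)
```

Building the set of DAVID Gene IDs once and reusing it means each annotation file is scanned only a single time, whatever its size.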

Example 1: Cross Mapping Gene IDs

Task: I have 1000 Affy IDs (35439_at, 679_at, ...). I would like to know the corresponding NCBI Entrez Gene IDs, UniProt accessions, gene names, and gene symbols.

Solution:

Step 1:

Map Affy ID 35439_at to the corresponding DAVID ID using the file Main_Accessions/DAVID2AFFY_ID.txt. This gives the DAVID ID <- Affy ID pair 2875235 <- 35439_at.

Step 2:

Map DAVID ID 2875235 to the corresponding Entrez Gene ID using the file Main_Accessions/DAVID2ENTREZ_GENE_ID.txt. This gives the DAVID ID -> Entrez Gene ID pair 2875235 -> 7536.
Map DAVID ID 2875235 to the corresponding UniProt accession using the file Main_Accessions/DAVID2UNIPROT_ACCESSION.txt. This gives the DAVID ID -> UniProt accession pair 2875235 -> Q9UEI0.
Map DAVID ID 2875235 to the corresponding gene name using the file Gene_Names_Symbols/DAVID2GENE_NAME.txt. This gives the DAVID ID -> gene name pair 2875235 -> transcription factor ZFM1.
Map DAVID ID 2875235 to the corresponding gene symbol using the file Gene_Names_Symbols/DAVID2GENE_SYMBOL.txt. This gives the DAVID ID -> gene symbol pair 2875235 -> SF1.
With the DAVID Knowledgebase, 35439_at is now cross-referenced to Entrez Gene ID 7536, UniProt accession Q9UEI0, gene name "transcription factor ZFM1", and gene symbol "SF1".

Step 3:

Repeat Step 1 and Step 2 for the rest of the Affy IDs.
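
For 1000 IDs, repeating these steps by hand is impractical, so the steps above can be scripted. The Python sketch below is one possible approach, not a tool provided by DAVID. It assumes each file line is tab-delimited with the DAVID ID in the first column, which this README does not specify. The function names and dictionary keys are hypothetical, and the sample rows are the values from the steps above; 679_at has no sample row here, so it maps to empty lists.

```python
from collections import defaultdict


def parse_pairs(lines):
    """Yield (david_id, value) from tab-delimited lines (assumed layout)."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2:
            yield fields[0], fields[1]


def index_by_david_id(pairs):
    """Map each DAVID ID to the list of values paired with it."""
    index = defaultdict(list)
    for david_id, value in pairs:
        index[david_id].append(value)
    return index


def index_by_value(pairs):
    """Map each value (for example an Affy ID) to its DAVID IDs."""
    index = defaultdict(list)
    for david_id, value in pairs:
        index[value].append(david_id)
    return index


def cross_map(affy_ids, affy_lines, target_lines):
    """Step 1: Affy ID -> DAVID ID. Step 2: DAVID ID -> each target type."""
    affy_to_david = index_by_value(parse_pairs(affy_lines))
    targets = {
        name: index_by_david_id(parse_pairs(lines))
        for name, lines in target_lines.items()
    }
    result = {}
    for affy_id in affy_ids:
        row = {name: [] for name in targets}
        for david_id in affy_to_david.get(affy_id, []):
            for name, index in targets.items():
                row[name].extend(index.get(david_id, []))
        result[affy_id] = row
    return result


# Sample rows taken from the worked example above.
affy_lines = ["2875235\t35439_at\n"]
target_lines = {
    "entrez_gene_id": ["2875235\t7536\n"],
    "uniprot_accession": ["2875235\tQ9UEI0\n"],
    "gene_name": ["2875235\ttranscription factor ZFM1\n"],
    "gene_symbol": ["2875235\tSF1\n"],
}

mapped = cross_map(["35439_at", "679_at"], affy_lines, target_lines)
print(mapped["35439_at"])
```

Each value is kept as a list because one DAVID ID can hold several identifiers of the same type. With real downloads, open file objects can be passed in place of the sample lists.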

Example 2: Query annotation contents for a given gene

Task: I have Affy ID 35439_at. What are its associated Gene Ontology (GO) Biological Process (BP) terms at all levels?

Solution:

Step 1: Map Affy ID 35439_at to the corresponding DAVID ID using the file Main_Accessions/DAVID2AFFY_ID.txt. This gives the DAVID ID <- Affy ID pair 2875235 <- 35439_at.
Step 2: Map DAVID ID 2875235 to the corresponding GO terms using the file Ontologies/DAVID2GOTERM_BP_ALL.txt. This gives the DAVID ID -> GOTERM_BP_ALL pair 2875235 -> "TRANSCRIPTION, DNA-DEPENDENT".
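
The two steps above, and the reverse lookup from an annotation term back to its genes (the third application listed at the top), can be sketched in Python as follows. This is a minimal illustration and not DAVID code: it assumes the file rows have already been parsed into (DAVID ID, value) tuples, and the function names are hypothetical. The sample rows are the values from this example.

```python
def terms_for_affy_id(affy_id, affy_pairs, term_pairs):
    """Step 1: find the DAVID IDs for the Affy ID. Step 2: collect their terms."""
    david_ids = {d for d, value in affy_pairs if value == affy_id}
    return sorted({term for d, term in term_pairs if d in david_ids})


def genes_for_term(term, term_pairs):
    """Reverse lookup: the DAVID IDs annotated with a given term."""
    return sorted({d for d, value in term_pairs if value == term})


# Rows from Main_Accessions/DAVID2AFFY_ID.txt and
# Ontologies/DAVID2GOTERM_BP_ALL.txt, as given in the example above.
affy_pairs = [("2875235", "35439_at")]
go_pairs = [("2875235", "TRANSCRIPTION, DNA-DEPENDENT")]

terms = terms_for_affy_id("35439_at", affy_pairs, go_pairs)
genes = genes_for_term("TRANSCRIPTION, DNA-DEPENDENT", go_pairs)
print(terms)
print(genes)
```

The same two functions should work with any other annotation file in place of DAVID2GOTERM_BP_ALL.txt, provided it follows the same pair-wise layout.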

Edited by DAVID Team on Feb. 2022

DAVID