Frequently Asked Questions

1. What is DAVID?
2. What tools does DAVID provide to analyze my gene lists?
3. What accession numbers and gene identifiers does DAVID accept?
4. What file formats can be uploaded/downloaded by DAVID?
5. Who can use DAVID?
6. Where does DAVID's knowledgebase come from and how current is it?
7. Who do I contact if I find an annotation error?
8. How are genes counted in DAVID Chart Report?
9. Why are there different levels for GO Annotation?
10. What does it mean to have an empty chart report?
11. How do I cite DAVID?
12. What is the purpose of the minimum number of hits and maximum p-value thresholds?
13. What is really going on behind the scenes when I choose, say, GO Level 1 compared to GO Level 5? What else is being done in GO Level 5 that is not in GO Level 1?
14. Are DomainCharts most beneficial for categorizing ESTs? How else can I take advantage of this module?
15. What journal articles have cited DAVID or EASE?
16. Not all of my genes are annotated!  Why?
17. What are the system requirements to run DAVID?
18. Do any other sites mirror DAVID applications?
19. How can my applications take advantage of DAVID functions?
20. What computing technologies are used in DAVID applications to enhance speed?
21. What is the quality of tissue expression data in DAVID 2006?
22. What are the choices of population backgrounds in DAVID 2006?
23. Does DAVID limit the maximum number of genes in a list?
24. What is the format requirement to submit a gene list to DAVID?
25. Which DAVID tool is more suitable to answer my questions?
26. Why does DAVID give empty results after I walk away for a while?


1.  What is DAVID?   DAVID 1.x was originally designed as a web-based functional annotation tool, particularly for gene-enrichment analysis, built on the DAVID knowledgebase, which in the 2003 version contained annotations and gene accessions linked by LocusLink IDs. Through continuous improvement, DAVID 2.x provides a large integrated annotation knowledgebase based on the newly developed "DAVID Gene Concept", a graph-theory, evidence-based method for agglomerating heterogeneous and widely distributed public databases. It also provides an enhanced set of bioinformatics tools, not limited to functional annotation, that systematically summarize the relevant biological patterns in a user-defined gene list, so that users can quickly understand the biological themes under study. As part of a continuing commitment to addressing the challenges of systems biology, DAVID will keep being upgraded, and more tools are under development.

2.  What tools does DAVID provide to analyze my gene list?  DAVID 2.x provides a large integrated knowledgebase collected from most common bioinformatics resources (see the content section for details). To leverage the knowledgebase, three sets of comprehensive tools have been developed: the Functional Annotation Tool, the Gene Accession Conversion Tool, and the NIAID Pathogen Genome Browser. The Functional Annotation Tool performs gene-enrichment analysis, pathway mapping, gene/term similarity search, graphic presentation, homologue matching, ID translation, etc. The Gene Accession Conversion Tool converts a list of gene IDs/accessions to another type of your choice, using the comprehensive gene ID mapping repository in DAVID 2.1; ambiguous or contaminating accessions in the list can also be quickly detected and resolved by users. In the Genome Browser, users can quickly search for or navigate to genes of interest, which can then be analyzed further by submitting them to the Functional Annotation Tool. In addition, new tools, such as the Pathway-Centric Microarray Analysis Tool and the Gene-Term Functional Map, are under development.

3.  What accession numbers and gene identifiers does DAVID accept?   DAVID accepts a wide range of gene accession/ID types. You can view all supported accession types in the drop-down selection menu on the gene list input page.

4.  What file formats can be uploaded/downloaded by DAVID?  Plain text (*.txt), tab-delimited files can be uploaded to DAVID. The first column of your file must contain the gene identifier, and the second column may contain an optional value (e.g., fold change, p-value, correlation, cluster number, experimental group, etc.). Remove column headings and save the file as a tab-delimited text file. To convert an Excel file to this format, choose File > Save As, then under "Save as type" choose Text (Tab delimited) (*.txt). To save your annotated gene list from your browser to your hard drive as an Excel file, simply choose File > Save As, type yourfilename.xls, and save it to your hard drive. You can then open this file in Microsoft Excel and perform typical Excel-type analysis.
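A minimal Python sketch of the two-column, header-free, tab-delimited layout described above, using only the standard csv module. The function names write_gene_file and read_gene_file are hypothetical helpers for illustration, not part of DAVID, and the sample probe IDs and values are made up.

```python
import csv
import io

def write_gene_file(rows, handle):
    """Write (identifier, optional value) rows as tab-delimited text with no header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for gene_id, value in rows:
        # Column 1 holds the gene identifier; column 2 holds the optional value.
        writer.writerow([gene_id] if value is None else [gene_id, value])

def read_gene_file(handle):
    """Parse a tab-delimited file into (identifier, value-or-None) pairs."""
    parsed = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or not fields[0].strip():
            continue  # skip blank lines
        value = fields[1].strip() if len(fields) > 1 and fields[1].strip() else None
        parsed.append((fields[0].strip(), value))
    return parsed

# Hypothetical rows: probe ID plus an optional fold change.
buffer = io.StringIO()
write_gene_file([("1000_at", "2.5"), ("1001_at", None), ("1002_at", "-1.8")], buffer)
print(repr(buffer.getvalue()))
buffer.seek(0)
print(read_gene_file(buffer))
```

Writing through the csv module rather than joining strings by hand keeps the tab delimiter and line endings consistent, which is the main thing an uploaded file has to get right.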

5.  Who can use DAVID?  DAVID is free to use for all users. Please see the license section for more details.

6.  Where does DAVID's knowledgebase come from and how current is it?   The DAVID 2.x knowledgebase is designed around the "DAVID Gene Concept", a graph-theory, evidence-based method for agglomerating species-specific gene/protein identifiers and their annotations from a variety of public genomic resources (e.g. NCBI, PIR, SWISS-PROT, GO, OMIM, PubMed, KEGG, BIOCARTA, Affymetrix, TIGR, Pfam, BIND, MINT, DIP, etc.). The DAVID Gene Concept method groups tens of millions of identifiers from over 65,000 species into 1.5 million unique protein/gene records. Grouping these identifiers allows a diverse array of functional and sequence annotations to be agglomerated, greatly enriching the biological information available for a given gene (e.g. gene sequence IDs, protein functional domains, gene ontology, pathways, disease associations, general gene descriptions, protein-protein interactions, literature, homologues, etc.). However, DAVID does not check the quality or accuracy of all original annotation data; if you find annotation errors, please contact the primary source of the annotation. For more details on content coverage and collection dates, including the last update, please refer to the content section.

7.  Who do I contact if I find an annotation error?  DAVID's purpose is to aggregate biological knowledge into an organized structure that allows the efficient dissemination of functional annotations across genome-scale datasets. DAVID does not guarantee the quality or accuracy of annotation data; if you find annotation errors, please contact the primary source of the annotation. If you believe the errors may be due to a systematic error in DAVID's methods, please contact us at the DAVID Bioinformatic Team.

8.  How are genes counted in DAVID Chart Report? In DAVID 2.x, all charting tools count the number of unique DAVID gene IDs corresponding to the user's input gene list. This means that if two or more of your identifiers represent alternatively spliced forms of the same gene, that gene is counted only once in the histograms. This counting method differs from DAVID 1.x, in which the user's gene identifiers themselves are counted.
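The difference between the two counting methods can be illustrated with a small Python sketch. The identifier-to-gene mapping below is hypothetical; in DAVID that mapping comes from the DAVID Gene Concept knowledgebase, not from a dictionary you supply, and count_unique_genes is an illustrative helper name.

```python
def count_unique_genes(input_ids, id_to_gene):
    """Count distinct gene records behind a list of user identifiers.

    Identifiers with no known gene mapping are ignored.
    """
    genes = {id_to_gene[i] for i in input_ids if i in id_to_gene}
    return len(genes)

# Hypothetical mapping: two probe IDs for splice variants of the same gene.
id_to_gene = {"probe_A1": "GENE_1", "probe_A2": "GENE_1", "probe_B": "GENE_2"}
user_list = ["probe_A1", "probe_A2", "probe_B"]

print(len(user_list))                             # DAVID 1.x style: counts identifiers
print(count_unique_genes(user_list, id_to_gene))  # DAVID 2.x style: counts unique genes
```

Here the list holds three identifiers but only two genes, so a 2.x-style chart would report 2.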

9.  Why are there different levels for GO Annotation?  The structured vocabulary created by the Gene Ontology Consortium is a pseudo-hierarchy, or directed acyclic graph (DAG). The different levels provided by GoCharts allow users to annotate lists of genes at different levels within the DAG. Level 1 represents the most general categories and provides the most coverage, whereas Level 5 provides more specific information and less coverage. Users may also annotate their gene lists with all annotations available at all levels; for some genes, these go deeper than Level 5. Additionally, users can choose to use only the most specific categories by selecting terminal nodes. Note that the Gene Ontology structure reflects the fact that proteins are frequently involved in numerous biological processes. Thus, a gene may be annotated with several categories and be counted in each annotation category by the charting tools.

10.  What does it mean to have an empty chart report?  An empty chart report means that no annotations passed the specified thresholds. It does not mean that no annotations exist.

11.  How do I cite DAVID?  Glynn Dennis Jr., Brad T. Sherman, Douglas A. Hosack, Jun Yang, Michael W. Baseler, H. Clifford Lane, Richard A. Lempicki.  DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biology 2003 4(5): P3.   Please refer to the license section for more details.

12. What is the purpose of the minimum number of hits and maximum p-value thresholds?  One way of looking at it is that the thresholds simply let you filter the results, e.g. "show me only categories with 3 or more genes." If you show all categories, including those with only one hit, then the charts can get very tall and busy; in other words, many non-specific results can show up. The current defaults are 2 hits and a p-value of 0.1, respectively.
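The filtering behavior can be sketched in Python as follows. CategoryResult and filter_chart are hypothetical names, the counts and p-values are invented, and treating both thresholds as inclusive (count >= minimum, p-value <= maximum) is an assumption about how the cut-offs are applied. The same sketch also shows why a chart can come back empty (see question 10): every category can fail the thresholds even though annotations exist.

```python
from dataclasses import dataclass

@dataclass
class CategoryResult:
    term: str
    count: int      # number of genes from the list annotated with this term
    p_value: float  # enrichment p-value for the term

def filter_chart(results, min_hits=2, max_p=0.1):
    """Keep categories meeting both thresholds (defaults mirror DAVID's 2 and 0.1).

    Assumption: both comparisons are inclusive.
    """
    return [r for r in results if r.count >= min_hits and r.p_value <= max_p]

# Hypothetical chart rows.
results = [
    CategoryResult("immune response", 12, 0.001),
    CategoryResult("signal transduction", 5, 0.3),
    CategoryResult("kinase activity", 1, 0.02),
]

print([r.term for r in filter_chart(results)])
print([r.term for r in filter_chart(results, min_hits=1, max_p=0.05)])
print(filter_chart(results, min_hits=20))  # stricter thresholds leave an empty chart
```

With the defaults only "immune response" survives; relaxing the hit count to 1 lets the single-gene "kinase activity" category back in, which is exactly the kind of low-support result the minimum is meant to hide.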

13. What is really going on behind the scenes when I choose, say, GO Level 1 compared to GO Level 5? What else is being done in GO Level 5 that is not in GO Level 1?  Refer to the manuscript describing DAVID (http://genomebiology.com/2003/4/5/P3) for a detailed description and figure. Briefly, Level 1 is a general description, whereas Level 5 is a more specific description. The GO vocabulary is a type of hierarchy, so for a given gene a term at Level 5 is a descendant of a term at Level 1. The specificity at Level 5 comes at a cost, though: list coverage decreases as you move toward the more specific levels of the hierarchy.

Example:

Level 1 physiological processes

Level 2 response to external stimulus

Level 3 response to biotic stimulus

Level 4 defense response

Level 5 immune response

In the listing above, you can see how the Level 5 term "immune response" can be considered a child of Level 4, which in turn is a child of Level 3, and so on up to Level 1. The cost of using the more informative terms at Level 5 is lower coverage of your gene list. In practice, Levels 2 and 3 have a good balance between specificity and coverage. Also, use the "all" or "terminal node" (most specific term available for a gene) options to learn more about the genes in your list.
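The trade-off between specificity and coverage can be sketched in Python using the example path above. This is a simplification, not how GoCharts is implemented: each term is given a single parent, whereas real GO terms can have several parents (GO is a DAG), and the gene names are hypothetical. A gene whose most specific annotation stops at Level 3 simply has no Level 5 term, which is why coverage drops at deeper levels.

```python
# Parent links for the example path above. Simplification: one parent per term;
# real GO terms can have several parents because GO is a DAG.
PARENT = {
    "immune response": "defense response",
    "defense response": "response to biotic stimulus",
    "response to biotic stimulus": "response to external stimulus",
    "response to external stimulus": "physiological processes",
    "physiological processes": None,  # Level 1 in this example
}

def path_to_top(term):
    """Return [Level 1 term, ..., term] by following parent links upward."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return list(reversed(path))

def term_at_level(term, level):
    """Return the ancestor of `term` at `level`, or None if `term` is less specific."""
    path = path_to_top(term)
    return path[level - 1] if level <= len(path) else None

# Hypothetical genes whose most specific annotations differ in depth.
annotations = {"gene_a": "immune response", "gene_b": "response to biotic stimulus"}
for level in (1, 3, 5):
    covered = {g: term_at_level(t, level) for g, t in annotations.items()}
    print(level, covered)
```

At Levels 1 and 3 both genes receive a term; at Level 5 only gene_a does, so half of this tiny list loses coverage in exchange for the more specific "immune response" label.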

14. Are DomainCharts most beneficial for categorizing ESTs?  How else can I take advantage of this module?  DomainCharts would indeed be useful for grouping ESTs. A typical procedure would be to go through GoCharts first and find an interesting and well-represented biological process, say "signal transduction". Then drill down into the molecular function of the genes involved in that process to see kinases, receptors, transcription factors, etc. Lastly, go to DomainCharts and try to identify highly represented protein domains, such as kinase domains and zinc fingers, that may be relevant to the interesting groupings revealed by GoCharts. All of this is an exploratory, iterative process that helps users become intimately familiar with their gene lists, thus facilitating decisions about where to focus.

15.  What journal articles have cited DAVID or EASE?


16.  Not all of my genes are annotated!  Why?  The functional annotation of genomes is incomplete, and the particular types of annotation that any given gene has can differ. For example, when using DAVID you may find a gene that has GO classifications but no functional summary text, while another gene has functional summary text but no GO classifications, and others have no annotation whatsoever. This is why the database behind DAVID is continually updated, giving researchers access to the current state of functional annotation, which is always changing. Another reason is that some user input identifiers (particularly some Affy IDs) are too ambiguous to be mapped to any known gene.


17. What are the system requirements to run DAVID?  Please refer to the system requirements section.


18. Do any other sites mirror DAVID applications?  Different versions of DAVID are hosted on two servers: http://david.abcc.ncifcrf.gov and http://david.niaid.nih.gov

19. How can I take advantage of DAVID functional analysis modules?  DAVID provides a set of APIs that allow outside applications to interact directly with DAVID's calculation and visualization engines. Please refer to the Deep Linking section for details.

20. What computing technologies are used in DAVID 2006 applications to enhance speed?  DAVID 2006 uses Tomcat 5.2 on Red Hat Linux as its web server. All calculation engines and dynamic pages were written in Java/JSP. Extensive object-oriented programming techniques, along with HTML templates and style sheets, are used in DAVID development to ensure the quality and flexibility of the work. Older versions of DAVID relied on an Oracle database for the necessary annotation queries. The common speed bottleneck of all DAVID applications is the large volume of data querying and I/O. Starting with DAVID 2006, DAVID uses Java Remote Method Invocation (RMI) as a replacement for Oracle for annotation I/O. This change greatly increases the speed of all calculation engines in DAVID applications because it moves large-volume data I/O from disk to memory.

21. What is the quality of tissue expression data in DAVID 2006?  DAVID integrates the most popular, world-class tissue expression data from GNF-Affy, CGAP-SAGE, CGAP-EST and Unigene-EST. Together with DAVID's functional annotation engines, these data let investigators quickly identify the most enriched gene expression patterns across hundreds of normal/disease tissues for any given gene list, which can facilitate biomarker identification and gene expression pattern discovery. However, because high-throughput gene expression data are inherently noisy, results that are consistent across multiple resources deserve more confidence.
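One simple way to act on this advice is to keep only the tissues that come out as enriched in at least two independent sources. The Python sketch below assumes you already have, per source, a set of enriched tissue names; the function name and the tissue sets are hypothetical, and the two-source cut-off is an illustrative choice, not a DAVID setting.

```python
from collections import Counter

def consistent_tissues(enriched_by_source, min_sources=2):
    """Return tissues reported as enriched by at least `min_sources` sources."""
    support = Counter(t for tissues in enriched_by_source.values() for t in set(tissues))
    return sorted(t for t, n in support.items() if n >= min_sources)

# Hypothetical per-source enrichment results for one gene list.
enriched = {
    "GNF-Affy": {"thymus", "spleen", "liver"},
    "CGAP-SAGE": {"thymus", "brain"},
    "Unigene-EST": {"thymus", "spleen"},
}
print(consistent_tissues(enriched))                 # supported by at least 2 sources
print(consistent_tissues(enriched, min_sources=3))  # supported by all three sources
```

Raising min_sources trades recall for confidence: only "thymus" is supported by all three hypothetical sources.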

22. What are the choices of population background in DAVID enrichment analysis?  Enrichment analysis compares the annotation composition of your gene list with that of a population background of genes. The choice of population background can therefore affect the results significantly. Unfortunately, there is no gold-standard background that fits every kind of study. DAVID's default population background in enrichment calculations is the corresponding set of genome-wide genes with at least one annotation in the categories being analyzed. The default background is a good choice for studies that are genome-wide or close to genome-wide in scope. Since DAVID 2006, more background choices have been added to DAVID applications: Affymetrix chip backgrounds, Illumina chip backgrounds, and user-defined custom backgrounds. The pre-built Affymetrix and Illumina chip backgrounds can be selected through the "background" tab at the top of the Gene List Manager panel. Affymetrix and Illumina chip backgrounds are a better choice for gene lists derived from Affymetrix or Illumina microarray studies, respectively. Similar to submitting a gene list, users can input a custom population background as a gene list by choosing the "background" radio button in step 3 on the input tab. A custom background is a better choice for studies far below genome-wide scope, such as a 500-gene array.
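To see why the background matters, here is a minimal Python sketch of a one-sided hypergeometric (Fisher exact) enrichment test; it is a textbook version and may differ in detail from the statistic DAVID reports. The function name and all counts are hypothetical: the same list of 100 genes with 10 hits in a category is tested against a genome-wide background in which the category is rare and against a custom 500-gene array in which it is common.

```python
from math import comb

def enrichment_p_value(list_hits, list_total, pop_hits, pop_total):
    """One-sided hypergeometric P(X >= list_hits).

    list_hits  - list genes annotated with the category
    list_total - list genes with any annotation in the analyzed categories
    pop_hits   - background genes annotated with the category
    pop_total  - background genes with any annotation in the analyzed categories
    """
    upper = min(list_total, pop_hits)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / comb(pop_total, list_total)

# Same list (10 of 100 genes in the category), two different backgrounds.
p_genome = enrichment_p_value(10, 100, 200, 20000)  # category covers 1% of background
p_custom = enrichment_p_value(10, 100, 50, 500)     # category covers 10% of background
print(f"genome-wide background: p = {p_genome:.2e}")
print(f"custom 500-gene array:  p = {p_custom:.2f}")
```

Against the genome-wide background the 10 hits look highly significant (about 1 would be expected by chance), while against the custom array 10 hits is roughly what chance predicts, so the same category would not be reported as enriched.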

23. Does DAVID limit the maximum number of genes in a list? DAVID is designed to efficiently annotate a list of 3000 genes. Supported by advanced computing technologies, all DAVID tools have been tested against this 3000-gene goal and have been shown to return results in anywhere from a few seconds to no more than one minute. If a run takes longer than a minute, resubmit your request or check your input to make sure everything is correct. If you have trouble, please contact the DAVID Bioinformatic Team for help. Moreover, DAVID tools can handle lists much larger than the 3000-gene goal. Most DAVID tools, except the Functional Classification Tool, have no fixed input limit; the practical limit is reached only when DAVID can no longer return your results. We suggest that you try larger gene lists outside DAVID's peak traffic hours (10 am - 5 pm EST). Please share your experience with larger gene list analysis on the DAVID Forum. If you want to download genome-wide annotation data from DAVID for your bioinformatics projects, please refer to the Download section or ask the DAVID Bioinformatic Team for help.

24. What is the format requirement for my input gene list? You can either load a gene list from a file or paste a gene list into the text box. DAVID is designed to accept data starting from the first row, without a header (i.e., the first row must already be an accession). The gene list must have one gene per row, and only the first column is considered in the analysis. DAVID is case-insensitive for all accessions/IDs. Since DAVID's list manager is centralized, the format requirements for submitting a gene list are the same for ALL DAVID tools. In addition, a submitted gene list can be used as a custom background in the enrichment analysis, depending on your choice at step 3. A successful submission is indicated by the corresponding gene list appearing under the list tab or background tab, along with the expected number of genes.

Example:

1000_at
1001_at
1002_at
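If your list comes from a spreadsheet export, a short Python sketch like the one below can reduce it to the required one-identifier-per-row form before pasting or uploading. The function name and the has_header flag are hypothetical; removing case-insensitive duplicates is an optional tidy-up that DAVID does not require, included only because DAVID treats IDs case-insensitively.

```python
def prepare_gene_list(text, has_header=False):
    """Return first-column identifiers, one per row, with no header line."""
    lines = text.splitlines()
    if has_header:
        lines = lines[1:]  # DAVID expects data to start on the first row
    seen, genes = set(), []
    for line in lines:
        gene_id = line.split("\t")[0].strip()  # only the first column is used
        if not gene_id:
            continue  # skip blank rows
        key = gene_id.upper()  # optional: drop duplicates that differ only in case
        if key not in seen:
            seen.add(key)
            genes.append(gene_id)
    return "\n".join(genes)

# Hypothetical spreadsheet export with a header row and a fold-change column.
raw = "Probe\tFold\n1000_at\t2.1\n1001_AT\t1.7\n1001_at\t1.7\n\n1002_at\t-3.0\n"
print(prepare_gene_list(raw, has_header=True))
```

The result is three rows (1000_at, 1001_AT, 1002_at), matching the example format above.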


In addition, DAVID provides two pre-built demo lists for users who do not have a gene list and want to test DAVID applications. Simply click the Demo_list 1 or Demo_list 2 link at the top of the submission box to start the analysis. Information about the two demo lists follows:

About Demo List 1: One hundred sixty-four genes found to be upregulated in CD4+/CD62L- T cells relative to CD4+/CD62L+ T cells.

Cutting edge: L-selectin (CD62L) expression distinguishes small resting memory CD4+ T cells that preferentially respond to recall antigen.
Hengel RL, Thaker V, Pavlick MV, Metcalf JA, Dennis G Jr, Yang J, Lempicki RA, Sereti I, Lane HC.


J Immunol 2003 Jan 1;170(1):28-32


Naive CD4+ T cells use L-selectin (CD62L) expression to facilitate immune surveillance. However, the reasons for its expression on a subset of memory CD4+ T cells are unknown. We show that memory CD4+ T cells expressing CD62L were smaller, proliferated well in response to tetanus toxoid, had longer telomeres, and expressed genes and proteins consistent with immune surveillance function. Conversely, memory CD4+ T cells lacking CD62L expression were larger, proliferated poorly in response to tetanus toxoid, had shorter telomeres, and expressed genes and proteins consistent with effector function. These findings suggest that CD62L expression facilitates immune surveillance by programming CD4+ T cell blood and lymph node recirculation, irrespective of naive or memory CD4+ T cell phenotype.

About Demo List 2: Four hundred three genes found to be induced in peripheral blood mononuclear cells incubated with purified HIV envelope proteins.

HIV envelope induces a cascade of cell signals in non-proliferating target cells that favor virus replication.
Cicala C, Arthos J, Selig SM, Dennis G Jr, Hosack DA, Van Ryk D, Spangler ML, Steenbeke TD, Khazanie P, Gupta N, Yang J, Daucher M, Lempicki RA, Fauci AS.


Proc Natl Acad Sci U S A 2002 Jul 9;99(14):9380-5


Certain HIV-encoded proteins modify host-cell gene expression in a manner that facilitates viral replication. These activities may contribute to low-level viral replication in nonproliferating cells. Through the use of oligonucleotide microarrays and high-throughput Western blotting we demonstrate that one of these proteins, gp120, induces the expression of cytokines, chemokines, kinases, and transcription factors associated with antigen-specific T cell activation in the absence of cellular proliferation. Examination of transcriptional changes induced by gp120 in freshly isolated peripheral blood mononuclear cells and monocyte-derived-macrophages reveals a broad and complex transcriptional program conducive to productive infection with HIV. Observations include the induction of nuclear factor of activated T cells, components of the RNA polymerase II complex including TFII D, proteins localized to the plasma membrane, including several syntaxins, and members of the Rho protein family, including Cdc 42. These observations provide evidence that envelope-mediated signaling contributes to the productive infection of HIV in suboptimally activated T cells.

 
25. Which DAVID tool should I choose?  The following table may help you decide.
DAVID Tool Map

26. Why does DAVID give empty results after I walk away for a while?   The session timeout of the DAVID website is set to 30 minutes. In other words, if your web browser has no activity with the DAVID site for 30 minutes, all your web session information is discarded. The only way to resume your work is to resubmit your gene list to the DAVID website.