Obtaining the sequence from whole-genome sequencing is not the end of a genome project. The next major step is finding the structural elements of the genome and attaching their related functions, a process called “genome annotation”. Annotation adds pertinent biological information to raw DNA or protein sequences by describing the different regions of the sequence, identifying which regions can be called genes, and determining their products and functions. This includes identifying the locations and total number of genes, coding regions, intron-exon structure, start and stop codons, intron lengths, alternative splicing, SNPs, InDels and untranslated regions (UTRs), as well as determining what those genes do, along with their gene products and functional information. Once a genome is sequenced, it needs to be annotated to make sense of it. Annotation consists of two main steps:
1. Identifying elements on the genome: gene structure prediction
2. Attaching biological information to these elements: gene function prediction
A gene contains various parts with different functions: some may code for protein, others may contain regulatory information, and some may form introns, which are not translated and whose function is still unclear. The diagram shown below represents a fragment of DNA with a single hypothetical gene. Each region has to be annotated from the DNA sequence based on similarity searches or literature reviews.
Computer programs are essential to this process; however, human judgment is often required to evaluate computer-generated gene models. Several highly accurate automatic annotation tools are available. These tools perform annotation by computer analysis, as opposed to manual annotation, which relies on human expertise. Ideally, the two approaches co-exist and complement each other in the same annotation pipeline.
Structural annotation consists of the identification of genomic elements.
- open reading frames and their localisation
- gene structure
- coding regions
- location of regulatory motifs
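As a rough illustration of the first item in the list above, the following sketch scans a DNA string for open reading frames in all six reading frames. It is a deliberately naive example, not how production gene finders work: the function names, the `min_codons` threshold and the test sequence are all assumptions made for illustration, and the scan ignores introns, alternative start codons and other features that real structural annotation has to handle.

```python
# Naive six-frame ORF scan; a teaching sketch, not a real gene predictor.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Find ATG...stop stretches in all six reading frames.

    Returns a list of (strand, start, end, orf_sequence) tuples.
    start/end are 0-based, end-exclusive, and refer to the strand that
    was scanned (for "-" that is the reverse complement of the input).
    min_codons is an arbitrary illustrative cut-off that counts the
    start and stop codons.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step through the frame one codon at a time.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        orfs.append((strand, start, end, s[start:end]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    # Made-up sequence used only to exercise the function.
    for orf in find_orfs("CCATGAAATTTTAGGG"):
        print(orf)
```

Each reported ORF is only a candidate; whether it corresponds to a real gene still has to be judged from other evidence, such as similarity to known sequences.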
Functional annotation consists of attaching biological information to genomic elements.
- biochemical function
- biological function
- regulation and interactions involved
- expression
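One common way to attach a first-pass biochemical function to a predicted gene is to transfer it from the most similar sequence of known function. The sketch below assumes that a similarity search tool such as BLAST has already been run and its results saved in the standard 12-column tabular format (`-outfmt 6`). The function names, the E-value cut-off, the example rows and the `KNOWN_FUNCTIONS` lookup table are all hypothetical and exist only to make the example self-contained.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Map each query ID to its highest-bitscore subject ID.

    Hits with an E-value above max_evalue (an illustrative cut-off)
    are ignored, so weakly matching queries get no entry at all.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (hit["sseqid"], score)
    return {query: subject for query, (subject, _) in best.items()}


def transfer_functions(hits, known_functions):
    """Label each query with the known function of its best hit."""
    return {query: known_functions.get(subject, "unknown function")
            for query, subject in hits.items()}


# Made-up search results and reference annotations, for illustration only.
EXAMPLE_ROWS = [
    ["gene1", "refA", "92.5", "200", "15", "0", "1", "200", "1", "200", "1e-50", "350"],
    ["gene1", "refB", "60.0", "180", "72", "2", "5", "184", "3", "182", "1e-10", "80"],
    ["gene2", "refC", "35.0", "50", "32", "1", "1", "50", "10", "59", "0.5", "25"],
]
EXAMPLE_TEXT = "\n".join("\t".join(row) for row in EXAMPLE_ROWS)
KNOWN_FUNCTIONS = {"refA": "hypothetical kinase", "refB": "hypothetical transporter"}

if __name__ == "__main__":
    print(transfer_functions(best_hits(EXAMPLE_TEXT), KNOWN_FUNCTIONS))
```

A label transferred this way is a prediction rather than an established fact, which is one reason automated results are usually reviewed against experimental data or the literature.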
These steps may involve both biological experiments and in silico analysis. A variety of software tools have been developed to permit scientists to view and share genome annotations. The most basic level of annotation uses BLAST to find similarities and then annotates genomes based on them. However, much additional information is available to annotation platforms nowadays. Some databases use genome context information, similarity scores, experimental data, and integrations of other resources to provide genome annotations through their Subsystems approach. Other databases, such as Ensembl, rely on curated data sources as well as a range of software tools in their automated genome annotation pipeline.
The manuals provided here teach users the basics of gene annotation and provide access to the Phytophthora genome, which can be annotated. They cover computational annotation and annotation by literature mining. A list of manuals that explain more about genome annotation is available in the Genome annotation tab. Concepts of genome annotation can be found in the provided tutorials, along with opportunities to practice using the Annotation Tool. External links to other instructional websites are provided as additional resources.