Introduction

CpGAVAS2018: Chloroplast Genome Analysis Pipeline for Annotation, Visualization and GenBank Submission and Others.

In recent years, technical advances in second- and third-generation DNA sequencing technologies have resulted in a flood of sequenced plastomes. CPGAVAS has been widely used to annotate plastome sequences. However, to satisfy the need for more extensive analysis of plastome features, we have updated the original CPGAVAS2012 web server to provide a more streamlined workflow that meets high-throughput primary analysis needs. The functions are described in detail below, with new functions highlighted in red.

The Main Functions of CPGAVAS

1. "AnnotateGenome" - CPGAVAS takes a completely sequenced genome and returns three sets of results: a) the annotation results in GFF3 format, b) a circular map of the annotation, and c) the basic analysis results for the genome. The user can then download the GFF3 file and edit it with annotation-editing software such as Apollo. In addition, the following steps/functions have been upgraded or added:

(1) All proteins and CDSs from ~3000 plastomes reported in public databases have been clustered, aligned and curated to give a better reference proteome.
(2) An ORF-finding step has been added to the pipeline to improve identification of the start and end positions of the predicted proteins.
(3) The algorithm for merging overlapping "gene islands" has been simplified.
(4) Identification of microsatellites, or simple sequence repeats (SSRs), has been added to the pipeline.
(5) Identification of longer repeats has been added to the pipeline.
(6) Identification of single nucleotide polymorphisms (SNPs) has been added to the pipeline.
(7) Identification of RNA-editing sites has been added to the pipeline.
Please note that functions (6) and (7) are available only in the locally installed pipeline, because large FASTQ files are needed for these analyses.
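
To make step (4) concrete, the following is a minimal sketch of how perfect SSRs can be located with a regular-expression scan. It is not the CPGAVAS implementation: the function name find_ssrs and the minimum repeat counts in MIN_REPEATS are assumptions chosen for illustration, and the thresholds used by the pipeline are not stated here.

```python
import re
from typing import List, Tuple

# Hypothetical minimum copy numbers per motif length (1-6 bp); the
# thresholds CPGAVAS actually applies are not given in this document.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, motif, copies) per perfect SSR, 1-based inclusive."""
    seq = seq.upper()
    hits = []
    for unit_len, min_copies in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least (min_copies - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each tract is reported only under its shortest motif.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            copies = (m.end() - m.start()) // unit_len
            hits.append((m.start() + 1, m.end(), motif, copies))
    return sorted(hits)


# Toy sequence (made up for illustration): a poly-A tract and an (AT)7 tract.
toy = "GGC" + "A" * 12 + "CGC" + "AT" * 7 + "GCC"
ssrs = find_ssrs(toy)
```

Each reported tuple can be converted directly into a GFF3 feature line or plotted on the circular map. Imperfect or compound repeats would need a dedicated tool and are outside the scope of this sketch.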

2. "AnnotateGene" - A genome might have unusual features, such as extremely short exons (6 to 9 bases long), trans-spliced genes, and others. The CPGAVAS genome annotation pipeline cannot consistently identify these features correctly. This page allows users to run BLAST searches against particular genes to facilitate the identification of these features.

3. "ViewAnnotatioinResults" - This module allows the retrieval and examination of the annotation results.

4. "UpdateAnnotatioinResults" - A manually curated gene annotation file in GFF3 format can be re-analyzed using this function, which regenerates the circular map and the analysis results.
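
Before re-uploading a manually edited file, it can help to confirm that it still parses as GFF3. The sketch below reads the nine tab-separated columns defined by the GFF3 format and decodes the attribute column. The names Feature and parse_gff3, as well as the two example records, are hypothetical and serve only as an illustration; they are not part of CPGAVAS.

```python
from typing import Dict, Iterator, NamedTuple
from urllib.parse import unquote


class Feature(NamedTuple):
    seqid: str
    source: str
    feature_type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_gff3(text: str) -> Iterator[Feature]:
    """Yield one Feature per GFF3 feature line, skipping comments and directives."""
    for line in text.splitlines():
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        attrs = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[unquote(key)] = unquote(value)  # GFF3 attributes are URL-escaped
        yield Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      cols[5], cols[6], cols[7], attrs)


# Made-up records for illustration only.
example = (
    "##gff-version 3\n"
    "chloroplast\tmanual\tgene\t1\t1062\t.\t-\t.\tID=gene1;Name=psbA\n"
    "chloroplast\tmanual\tCDS\t1\t1062\t.\t-\t0\tID=cds1;Parent=gene1\n"
)
features = list(parse_gff3(example))
```

A malformed line raises ValueError, which makes a column lost during manual editing easy to spot before the file is submitted for re-analysis.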

5. "QuickDraw" - This module has been upgraded to draw a circular map showing the SNPs, RNA-editing sites and repeat elements identified in the plastome.

6. "PrepareDataBaseSubmission" - Following the instructions provided on this page, the user can generate the files for submission of the sequences to GenBank or EMBL.

7. "ExtractSeq" - With more than 3000 plastomes available, phylogenomic analyses can be used to understand the taxonomic relationships between newly obtained plastomes and those that have already been sequenced. We have clustered, aligned and curated the plastid protein sequences and constructed clusters of orthologs. This module allows a user to retrieve the sequences of a list of plastid genes for a list of species.
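
The retrieval described here amounts to selecting a gene-by-species subset from the ortholog clusters. The sketch below shows that selection on a toy in-memory table and writes one FASTA block per gene. The function extract_sequences, the dictionary layout, the header format and the sequences are all assumptions made for illustration, not the module's actual data model.

```python
from typing import Dict, Iterable

# Hypothetical layout: gene name -> species name -> protein sequence.
Clusters = Dict[str, Dict[str, str]]


def extract_sequences(
    clusters: Clusters, genes: Iterable[str], species: Iterable[str]
) -> Dict[str, str]:
    """Return one FASTA-formatted string per requested gene found in clusters."""
    species = list(species)  # materialize so the list can be reused for every gene
    fasta_by_gene = {}
    for gene in genes:
        members = clusters.get(gene, {})
        records = [f">{sp}|{gene}\n{members[sp]}" for sp in species if sp in members]
        if records:  # genes with no matching species are left out
            fasta_by_gene[gene] = "\n".join(records) + "\n"
    return fasta_by_gene


# Made-up placeholder sequences, not real plastid proteins.
toy_clusters = {
    "rbcL": {"Species A": "MSPQTE", "Species B": "MSPQTD", "Species C": "MSPKTE"},
    "matK": {"Species A": "MEEFQR", "Species C": "MEELQR"},
}
result = extract_sequences(
    toy_clusters, ["rbcL", "matK", "ndhF"], ["Species A", "Species B"]
)
```

Producing one FASTA block per gene matches the usual next step in a phylogenomic analysis, where each gene is aligned separately before the alignments are concatenated.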

The Overall Workflow of CPGAVAS

The input for CPGAVAS is a chloroplast DNA sequence, and the outputs include the gene models in GFF3 format, a circular map image, the analysis results, and files for GenBank submission. The workflow is shown below.

[Figure: workflow for CPGAVAS]

Last updated: January 2nd, 2019.
For questions and comments, please send email to cliu@implad.ac.cn or cliu6688@yahoo.com.

Center for Bioinformatics
Institute of Medicinal Plant Development
Peking Union Medical College
Chinese Academy of Medical Sciences
Address: No. 151, Malianwa North Road, Haidian District, Beijing 100093, P.R.China