Thesis

Genome-wide Analysis of Alternative Splicing across Marine Algae

Alternative splicing generates different protein isoforms, so that during gene expression a single gene can give rise to multiple protein sequences. It greatly increases the diversity of proteins encoded by the genome and plays an important role in the normal function of living organisms. Common types of alternative splicing events include intron retention, exon skipping, alternative donor, and alternative acceptor. Isoforms differ according to the combinations of sequences that these events produce.
The main purpose of our research was to analyze alternative splicing events across different marine algae. Marine algae were chosen because they are ecologically important and have a long history of use as a resource for food, medicine, and industrial applications. Little is known, however, about alternative splicing in marine algae. In this research, we applied bioinformatics tools such as TopHat2, Cufflinks, AStalavista, and BLAST to analyze genome sequences, genome annotations, and short reads of RNA-Seq data. These data were downloaded from public databases such as JGI and NCBI. We also wrote Python programs to gather various statistics on alternative splicing events. For example, the number of events of each type indicated that intron retention was the most common type of alternative splicing event. Among Emiliania huxleyi (Ehux), Chlamydomonas reinhardtii (Chlamy), Guillardia theta (Guith), Bigelowiella natans (Bigna), and Aureococcus anophagefferens (Auran), the number of alternative splicing events ranged from about two thousand to about thirty thousand. Across all five species, most of the predicted alternatively spliced genes had just two isoforms, indicating that only a single alternative splicing event had occurred. Surprisingly, nearly half of the alternatively spliced genes had no gene ontology (GO) annotations, indicating that many may be novel genes. The GO annotations were also analyzed to determine the degree to which alternatively spliced genes are conserved across species. Of the 1684 alternatively spliced genes with unique molecular function GO terms, 442 terms appeared in only one species, 300 terms were shared by all five species, and more than 182 terms were shared across four species.
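
The per-type event statistics mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis code: the record layout, the function names, and the toy data are all assumptions made for illustration, and the step that maps AStalavista output to event-type labels is left out.

```python
from collections import Counter

# Hypothetical toy records of (gene_id, event_type). These are NOT results
# from the thesis; they only illustrate the shape of the tally.
EVENTS = [
    ("gene1", "intron_retention"),
    ("gene1", "exon_skipping"),
    ("gene2", "intron_retention"),
    ("gene3", "alternative_donor"),
    ("gene4", "intron_retention"),
    ("gene4", "alternative_acceptor"),
]


def count_event_types(events):
    """Return how many events of each type occur in the records."""
    return Counter(event_type for _gene, event_type in events)


def events_per_gene(events):
    """Return how many alternative splicing events each gene has."""
    return Counter(gene for gene, _event_type in events)


if __name__ == "__main__":
    by_type = count_event_types(EVENTS)
    # most_common() sorts event types from most to least frequent.
    for event_type, n in by_type.most_common():
        print(f"{event_type}\t{n}")
    print("genes with exactly one event:",
          sum(1 for n in events_per_gene(EVENTS).values() if n == 1))
```

Running the same tally once per species would give a table from which the most common event type in each species can be read off directly.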

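
The cross-species comparison of GO terms can be sketched in the same spirit. The code below assumes each species has already been reduced to a set of molecular function GO terms found on its alternatively spliced genes. The function name and the placeholder term labels are hypothetical, and the counts it prints have no relation to the figures reported in the abstract.

```python
from collections import Counter

# Hypothetical toy input: species abbreviation -> set of GO term labels.
# The labels are placeholders, not real GO identifiers or thesis data.
TERMS_BY_SPECIES = {
    "Ehux": {"term_A", "term_B", "term_C"},
    "Chlamy": {"term_A", "term_B"},
    "Guith": {"term_A", "term_B", "term_D"},
    "Bigna": {"term_A", "term_B"},
    "Auran": {"term_A", "term_E"},
}


def sharing_histogram(terms_by_species):
    """Map k -> number of GO terms that occur in exactly k species."""
    species_per_term = Counter()
    for terms in terms_by_species.values():
        # Each species holds a set, so it adds at most 1 per term.
        species_per_term.update(terms)
    return Counter(species_per_term.values())


if __name__ == "__main__":
    hist = sharing_histogram(TERMS_BY_SPECIES)
    for k in sorted(hist, reverse=True):
        print(f"terms found in exactly {k} species: {hist[k]}")
```

Terms counted under k equal to the number of species are shared by all of them, and terms counted under k = 1 are specific to a single species.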
