Thesis

Developing a Prediction Model for Emiliania huxleyi Promoters

As more genomes are annotated, it is important to accurately identify their promoter regions. Because vast amounts of biological data are being produced, Promoter Prediction Programs (PPPs) are needed to automate promoter identification, and computational promoter prediction can greatly improve a genome's annotation. This study focuses on promoter prediction in the Emiliania huxleyi genome.

Emiliania huxleyi (E. huxleyi), the most abundant coccolithophorid, has served as a model system for environmental studies of global carbon cycling, oceanic sulfur biotransformations, and paleoclimatology [20] [25]. Previous studies of other model organisms have shown that core promoter regions have characteristics distinctive to each organism, while also sharing some common cis-regulatory motifs [8]. Within an organism, these regulatory motifs are not strictly conserved across every promoter region [12].

Since the previous analysis of E. huxleyi, an annotated genome has been released and many algorithms have been applied to promoter prediction with improved results. We used chemical and physical properties and drew on new prediction techniques to develop a method of promoter prediction, and we used the method described in [4] to compare our results against state-of-the-art PPPs. Our results confirm previous findings, notably that E. huxleyi shares no common motifs with Arabidopsis or Drosophila and that novel transcriptional control sequences may exist. The research also produced a new PPP. This PPP performed very well during cross-validation; however, when benchmarked [4], it produced a high number of false positives. Future work would include comparing more organisms against E. huxleyi to find shared motifs and refining the PPP to reduce false positives.

Keywords: Emiliania huxleyi, Bioinformatics, Computer Science, Promoter Prediction, Machine Learning
