Detecting Promoter Sequences using Deep Learning

Promoter sequences are the main regulatory elements of gene expression. Accurate promoter prediction remains a challenge because these key DNA regulatory regions have variable structures, yet their recognition by computer algorithms is fundamental for understanding gene expression patterns, cell specificity, and development. In this study, we use deep learning modules, namely Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, to analyze sequence characteristics of prokaryotic and eukaryotic promoters and build predictive models for them. We apply these approaches to identify promoters in four organisms: human, two bacteria (Escherichia coli and Bacillus subtilis), and a plant (Arabidopsis). The proposed model has several types of hyper-parameters, which are selected using validation accuracy. Based on the values of the hyper-parameters, we generate multiple models with distinct sets of parameters. The promoter test datasets are evaluated on all of the best models generated, and evaluation statistics such as sensitivity, specificity, and correlation coefficient are calculated. The developed models demonstrate the ability of the deep learning approach to capture complex promoter sequence characteristics and achieve higher accuracy than previously developed promoter prediction programs.
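
As an illustration of the evaluation statistics named above, the following minimal sketch computes sensitivity, specificity, and a correlation coefficient from binary promoter/non-promoter predictions. It assumes that the correlation coefficient is the Matthews correlation coefficient, which is a common choice for this task but is not confirmed by the text. The function names and the label convention (1 for promoter, 0 for non-promoter) are hypothetical.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = promoter, 0 = non-promoter)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def evaluate(y_true, y_pred):
    """Return sensitivity, specificity, and the correlation coefficient.

    Assumption: the correlation coefficient is the Matthews correlation
    coefficient (MCC). Undefined ratios are reported as 0.0.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "cc": cc}

if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(evaluate(y_true, y_pred))
```

Reporting sensitivity and specificity together with a correlation coefficient guards against a model that scores well on accuracy alone by favoring the majority class.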