Promoter Strength Predictor


So How Does Our Application Work?

This website serves as a Python-based machine learning platform that predicts the strength of σ70 core promoters in Escherichia coli, offering a cost-effective alternative to tedious experimental characterization.
Biopython routines were used to construct the position-specific scoring matrices (PSSMs) for the -10 and -35 regions.
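The PSSM step can be illustrated with a short, self-contained Python sketch. It builds a log2-odds matrix from aligned hexamers using a pseudocount and a uniform background, which is the same kind of matrix Biopython's Bio.motifs module produces with motifs.create(...), .counts.normalize(pseudocounts=...) and .log_odds(). The training sites (the 19 Anderson -35 hexamers from the dataset table further down this page), the pseudocount of 0.5 and the uniform background are assumptions chosen for illustration, so the scores will not necessarily reproduce the PSSM values listed in the table.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=0.5, background=None):
    """Return a log2-odds PSSM (one {base: score} dict per position) from aligned sites."""
    # ASSUMPTION: uniform background and a pseudocount of 0.5 per base.
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i].upper()] += 1
        total = sum(counts.values())
        # log2(observed frequency / background frequency) for each base
        pssm.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return pssm

def score(pssm, seq):
    """Sum the per-position log-odds scores of a sequence as long as the matrix."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match the PSSM length")
    return sum(column[base] for column, base in zip(pssm, seq.upper()))

# Hypothetical training set: the -35 hexamers of BBa_J23100..BBa_J23118.
SITES_35 = [
    "TTGACG", "TTTACA", "TTGACA", "CTGATA", "TTGACA", "TTTACG", "TTTACG",
    "TTTACG", "CTGACA", "TTTACA", "TTTACG", "TTGACG", "CTGATA", "CTGATG",
    "TTTATG", "TTTATA", "TTGACA", "TTGACA", "TTGACG",
]
pssm_35 = build_pssm(SITES_35)
for hexamer in ("TTGACA", "CTGATA"):
    print(hexamer, round(score(pssm_35, hexamer), 2))
```

The consensus-like TTGACA scores higher than CTGATA because its bases at positions 1 and 5 are the ones most frequent in the training sites.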

The user enters the -10 and -35 regions in the input boxes, and the platform returns the strength of the input promoter relative to that of the strongest Anderson promoter. The dynamic model is optional; once the sequences are entered, the user can click "Predict".
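A minimal sketch of what could happen when "Predict" is clicked is shown below. The function names, the weight tuple and the exponential back-transform are hypothetical: the sketch assumes the regression predicts ln(relative strength) from the two PSSM scores, so that exponentiating the prediction gives a strength on the scale where the strongest Anderson promoter, BBa_J23100, equals 1.0. The scoring functions are passed in as arguments (for example, PSSM scorers), and the sketch covers only input cleaning and the linear prediction, not the optional dynamic model.

```python
import math
from typing import Callable, Sequence

DNA = set("ACGT")

def clean_motif(raw: str, label: str, length: int = 6) -> str:
    """Upper-case a user-entered motif and check it is `length` bases of A/C/G/T."""
    seq = raw.strip().upper()
    if len(seq) != length or not set(seq) <= DNA:
        raise ValueError(f"{label} region must be {length} bases of A/C/G/T, got {raw!r}")
    return seq

def predict_relative_strength(
    raw_35: str,
    raw_10: str,
    score_35: Callable[[str], float],
    score_10: Callable[[str], float],
    weights: Sequence[float],
) -> float:
    """Return the predicted strength relative to BBa_J23100 (= 1.0).

    ASSUMPTION: the linear model predicts ln(relative strength) from the two
    PSSM scores, so exp() maps the prediction back to the relative scale.
    """
    s35 = score_35(clean_motif(raw_35, "-35"))
    s10 = score_10(clean_motif(raw_10, "-10"))
    w0, w1, w2 = weights  # hypothetical intercept and PSSM-score coefficients
    return math.exp(w0 + w1 * s35 + w2 * s10)

# Usage with placeholder scorers and weights (not the trained model):
print(predict_relative_strength("ttgacg", "TACAGT", lambda s: 8.80, lambda s: 7.05, (0.0, 0.0, 0.0)))  # prints 1.0
```

Validating the hexamers before scoring means a typo in the input boxes produces a clear error instead of a silently wrong prediction.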

The advent of genetic engineering over the last few decades has opened up new avenues for biologists, one of which is expressing proteins in organisms of their choice and tuning the level of expression. Escherichia coli has long been the workhorse of the lab and an ideal model organism, given the ease with which it can be grown, the extensive characterization of its many strains, and its high level of safety. Its σ70 promoters are ubiquitously used by genetic engineers to initiate transcription, yet characterizing them in the lab remains a time-consuming and expensive process.


The model is a multivariate linear regression whose parameters were optimized by gradient descent. The training data are the Anderson promoter collection, developed and characterized by the Anderson lab at UC Berkeley. This collection was chosen because it is widely used in academia and by teams participating in iGEM (the International Genetically Engineered Machine competition). Because these promoters are so well characterized, the training data are expected to be reliable, so a linear model trained on them can be expected to give robust predictions for promoters that have not been characterized as extensively.
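For reference, a multivariate linear regression on the two PSSM scores and its gradient-descent update can be written as follows. The squared-error cost with the 1/(2m) factor is the conventional textbook choice and is an assumption here, since the page does not name the cost function. x_{-35} and x_{-10} denote a promoter's PSSM scores, y its (possibly log-transformed) strength, m the number of training promoters and α the learning rate.

```latex
\begin{aligned}
h_\theta(x) &= \theta_0 + \theta_1 x_{-35} + \theta_2 x_{-10} \\
J(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 \\
\theta_j &\leftarrow \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}, \qquad x_0^{(i)} = 1
\end{aligned}
```

All three parameters are updated simultaneously on each iteration, using the errors computed with the previous values of θ.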

The input variables are the -10 and -35 motifs of each promoter. A position-specific scoring matrix (PSSM) is constructed for each region to capture a generative model of the -10 and -35 motifs, and each promoter's motifs are scored against it. These scores were then regressed against the relative promoter strengths reported by the Anderson lab, with gradient descent used to minimize a cost function; the cost was minimized by running gradient descent for 10,000 iterations at a learning rate of 0.015. In addition, Leave-One-Out Cross-Validation (LOOCV) was performed on the model, giving an R² (coefficient of determination) of 0.70 as a measure of goodness of fit.
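The regression and validation steps described above can be sketched in pure Python using the PSSM scores and ln(strength) values from the dataset table on this page, with the stated 10,000 iterations and learning rate of 0.015. Several details are assumptions, since the page does not state them: the target is taken to be ln(relative strength), the features are z-score standardized within each training fold, the cost is mean squared error, and R² is computed as 1 - PRESS/SS_tot from the held-out predictions. The printed value is therefore not guaranteed to match the reported 0.70.

```python
import math

# (PSSM -35, PSSM -10, ln(relative strength)) for BBa_J23100..BBa_J23118,
# copied (ln values rounded) from the dataset table on this page.
DATA = [
    (8.80, 7.05, 0.0), (8.50, 8.65, -0.3567), (8.94, 7.79, -0.1508),
    (5.76, 7.60, -4.6052), (8.94, 8.22, -0.3285), (8.36, 8.22, -1.4271),
    (8.36, 7.48, -0.7550), (8.36, 8.05, -1.0217), (7.16, 7.91, -0.6733),
    (8.50, 6.73, -3.2189), (8.36, 7.48, -1.1087), (8.80, 7.48, -0.5447),
    (5.76, 7.60, -4.6052), (5.76, 7.60, -4.6052), (6.96, 7.48, -2.3026),
    (7.10, 7.48, -1.8971), (8.94, 7.17, -1.8326), (8.94, 7.17, -2.8134),
    (8.80, 8.22, -0.5798),
]

def standardize(rows, ref_rows):
    """Z-score each column of rows using the mean and std of ref_rows (the training fold)."""
    cols = range(len(ref_rows[0]))
    means = [sum(r[j] for r in ref_rows) / len(ref_rows) for j in cols]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in ref_rows) / len(ref_rows)) or 1.0
            for j in cols]
    return [[(r[j] - means[j]) / stds[j] for j in cols] for r in rows]

def predict(theta, row):
    """h(x) = theta0 + theta1*x1 + theta2*x2 + ..."""
    return theta[0] + sum(t * x for t, x in zip(theta[1:], row))

def gradient_descent(X, y, alpha=0.015, iterations=10_000):
    """Batch gradient descent on the cost (1/2m) * sum of squared errors."""
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(iterations):
        errors = [predict(theta, row) - target for row, target in zip(X, y)]
        grad = [sum(errors) / m] + [
            sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(n)
        ]
        theta = [t - alpha * g for t, g in zip(theta, grad)]  # simultaneous update
    return theta

def loocv_r2(data, alpha=0.015, iterations=10_000):
    """Leave-one-out CV: refit on n-1 rows, predict the held-out row, R^2 = 1 - PRESS/SS_tot."""
    features = [row[:2] for row in data]
    targets = [row[2] for row in data]
    preds = []
    for i in range(len(data)):
        train_X = features[:i] + features[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        theta = gradient_descent(standardize(train_X, train_X), train_y, alpha, iterations)
        held_out = standardize([features[i]], train_X)[0]
        preds.append(predict(theta, held_out))
    mean_y = sum(targets) / len(targets)
    press = sum((y - p) ** 2 for y, p in zip(targets, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1 - press / ss_tot

r2 = loocv_r2(DATA)
print(f"LOOCV R^2 = {r2:.2f}")
```

Standardizing inside each fold keeps the held-out promoter out of the scaling statistics, so the cross-validated R² is not optimistically biased by the scaling step.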

What Is The Dataset Being Used?

The table below lists the -35 and -10 sequences of the Anderson promoters used to make the predictions, together with their PSSM scores and their respective strengths.

Promoter ID  -35 Sequence  -10 Sequence  PSSM -35  PSSM -10  Relative Strength  ln(Relative Strength)
BBa_J23100   TTGACG        TACAGT        8.80      7.05      1.00               0.0
BBa_J23101   TTTACA        TATTAT        8.50      8.65      0.70               -0.356674943939
BBa_J23102   TTGACA        TACTGT        8.94      7.79      0.86               -0.150822889735
BBa_J23103   CTGATA        GATTAT        5.76      7.60      0.01               -4.60517018599
BBa_J23104   TTGACA        TATTGT        8.94      8.22      0.72               -0.328504066972
BBa_J23105   TTTACG        TACTAT        8.36      8.22      0.24               -1.42711635564
BBa_J23106   TTTACG        TATAGT        8.36      7.48      0.47               -0.755022584278
BBa_J23107   TTTACG        TATTAT        8.36      8.05      0.36               -1.02165124753
BBa_J23108   CTGACA        TATAAT        7.16      7.91      0.51               -0.673344553264
BBa_J23109   TTTACA        GACTGT        8.50      6.73      0.04               -3.21887582487
BBa_J23110   TTTACG        TACAAT        8.36      7.48      0.33               -1.10866262452
BBa_J23111   TTGACG        TATAGT        8.80      7.48      0.58               -0.544727175442
BBa_J23112   CTGATA        GATTAT        5.76      7.60      0.00               -4.60517018599
BBa_J23113   CTGATG        GATTAT        5.76      7.60      0.01               -4.60517018599
BBa_J23114   TTTATG        TACAAT        6.96      7.48      0.10               -2.30258509299
BBa_J23115   TTTATA        TACAAT        7.10      7.48      0.15               -1.89711998489
BBa_J23116   TTGACA        GACTAT        8.94      7.17      0.16               -1.83258146375
BBa_J23117   TTGACA        GATTGT        8.94      7.17      0.06               -2.81341071676
BBa_J23118   TTGACG        TATTGT        8.80      8.22      0.56               -0.579818495253
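The last column is the natural logarithm of the relative strength (BBa_J23100 = 1.0). The short Python sketch below reproduces it. BBa_J23112 has a measured strength of 0.0 but is listed with the value of ln(0.01), so the sketch floors strengths at 0.01 before taking the log; that floor is inferred from the table values and is not stated on the page.

```python
import math

# ASSUMPTION (inferred from the table): a strength of 0.0 is floored at 0.01
# before taking the log, since the ln value listed for BBa_J23112 equals ln(0.01).
FLOOR = 0.01

def ln_strength(relative_strength: float) -> float:
    """Natural log of a relative strength (BBa_J23100 = 1.0), floored to avoid ln(0)."""
    return math.log(max(relative_strength, FLOOR))

for promoter, strength in [("BBa_J23100", 1.0), ("BBa_J23101", 0.7), ("BBa_J23112", 0.0)]:
    print(promoter, strength, round(ln_strength(strength), 6))
```

Working on the log scale spreads out the weak promoters (0.01 to 0.16), which would otherwise be compressed near zero.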

Who We Are

Ashok Palaniappan

Ashok Palaniappan is very interested in applying computational thinking to solve difficult biological problems. He has demonstrated expertise in the development of computational methods, notably the use of Fourier spectrum analysis to detect periodicity in evolutionary conservation of protein secondary structure.

He has shown the ability to develop novel approaches to difficult biological problems, notably the identification of stage-specific biomarkers of colon cancer tumorigenesis, progression and metastasis. Using differential ligand affinity and free energy of binding, he and a co-worker computationally analyzed the role of P-glycoprotein polymorphisms in patients' resistance to therapy.

Ashok obtained his PhD from the University of Illinois at Urbana-Champaign, USA (2005). He is a Senior Assistant Professor in the School of Chemical and Biotechnology, Sastra University, Thanjavur 613401. Selected publications:

1. A. Palaniappan, E. Jakobsson. Fourier analysis of conservation patterns in protein secondary structure. Comput Struct Biotechnol J 2017; 15: 265-271.
2. A. Palaniappan, K. Ramar, S. Ramalingam. Computational identification of novel stage-specific biomarkers in colorectal cancer progression. PLOS ONE 2016; 11(5): e0156665. doi:10.1371/journal.pone.0156665
3. S. Varghese, A. Palaniappan. Computational studies of P-glycoprotein polymorphisms in antiepileptic drug resistance mechanism. bioRxiv 2016; 095059. doi:10.1101/095059

Ramit B

Ramit is a final-year undergraduate studying Biotechnology Engineering at Sri Venkateswara College of Engineering, Anna University. He is looking to build a career in synthetic biology, given the exciting possibilities the field offers. He was an integral part of his college's 2016 iGEM team and is the team leader of its 2017 iGEM team.

Having learned about the prospects of machine learning algorithms, he intends to apply them to biological data sets to draw meaningful inferences, or to build tools that save biologists time and money in the lab. He believes that a true product of engineering is often born of a mixture of different fields, and that combining two burgeoning fields of the 21st century, synthetic biology and machine learning, could lead to exciting products and services in the future.

Keshav Aditya RP

Keshav Aditya is a final-year undergraduate studying Computer Science Engineering at Sri Venkateswara College of Engineering, Anna University, and is part of his college's 2017 iGEM team. He is passionate about programming and is looking to become a software developer. He is also an excellent sportsman.

He is a full-stack web developer and cross-platform mobile application developer, and he has deployed machine learning algorithms for various problems. He is starting to explore new avenues such as deep learning and hopes to implement and work with sophisticated learning algorithms in the future.

Contact Us


Ashok: +91 8056037107
Ramit B: +91 9940149332
Keshav Aditya R.P: +91 7299926896


Ashok: aplnppn@gmail.com
Ramit B: ramitb@rocketmail.com
Keshav Aditya R.P: keshavaditya26896@gmail.com