A Predikin Tutorial
This guide will walk you through the process of using the Predikin website.
Sequences are uploaded in FASTA format. You may either upload a file or paste sequences into the text area on the submission form (note that if you paste sequences into the text area you must also include FASTA headers for each sequence).
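Every pasted sequence needs its own header, so it can help to check the text before you submit it. The following is a minimal Python sketch, not part of Predikin. It splits FASTA text into (ID, sequence) pairs and rejects sequence lines that appear before any header. The function name `parse_fasta` is hypothetical. Taking the ID as the first whitespace-separated word of the header is a common convention, but this guide does not confirm it as Predikin's rule.

```python
def parse_fasta(text):
    """Split FASTA text into (id, sequence) pairs.

    Raises ValueError if sequence data appears before the first '>' header
    or if a header line is empty. The ID is ASSUMED to be the first word of
    the header line; this is a convention, not Predikin's documented rule.
    """
    records = []
    current_id = None
    chunks = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines carry no sequence data
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current_id is not None:
                records.append((current_id, "".join(chunks)))
            words = line[1:].split()
            if not words:
                raise ValueError(f"line {lineno}: empty FASTA header")
            current_id = words[0]
            chunks = []
        elif current_id is None:
            raise ValueError(f"line {lineno}: sequence data before any FASTA header")
        else:
            chunks.append(line)
    if current_id is not None:
        records.append((current_id, "".join(chunks)))
    return records
```

For example, `parse_fasta(">cla4 a kinase\nMSKL\nLTPP\n>sub1\nRPPSSNS")` returns `[("cla4", "MSKLLTPP"), ("sub1", "RPPSSNS")]`. Pasting sequence text without a header raises an error.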
The uploaded file should contain both kinases and potential substrates, in whatever proportion you choose. You may submit one kinase and many potential substrates to identify the best substrates for your kinase, or many kinases and a single substrate to find out which kinase is most likely to phosphorylate your protein. Any combination between these two extremes is also allowed.
If your FASTA database is less than 2MB in size, Predikin will return its predictions to you via the web. After a pause (which can last a few minutes for large databases) you will be redirected to a page that displays your results; the information on this page is explained below. However, if your FASTA database is larger than 2MB, your prediction job will be placed in a queue and your results will be emailed to you when ready. The time taken for queued jobs depends on the size of your database and on the number of jobs queued before yours. It is important to provide a valid email address; otherwise we will not be able to send you your results and they may be lost.
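You can check the file size locally to predict whether a submission will be answered on the web or queued. This is a minimal Python sketch with hypothetical function names. Reading "2MB" as 2 × 1024 × 1024 bytes is an assumption, because the guide does not say whether it means 2,000,000 bytes or 2 MiB. It also does not say how a file of exactly 2MB is handled.

```python
import os

# ASSUMPTION: "2MB" is read as 2 * 1024 * 1024 bytes; the guide does not
# define the unit precisely or say how a file of exactly 2MB is handled.
SIZE_LIMIT_BYTES = 2 * 1024 * 1024


def delivery_mode(size_bytes, limit=SIZE_LIMIT_BYTES):
    """Return 'web' for databases under the limit, otherwise 'queued'."""
    return "web" if size_bytes < limit else "queued"


def delivery_mode_for_file(path):
    """Look up the size of a FASTA file on disk and classify it."""
    return delivery_mode(os.path.getsize(path))
```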
Predikin Options
The following options are available:
- Confidence:
Select the minimum confidence level of sites used to build scoring matrices.
- Substitution matrix:
Select the matrix used to define what constitutes a similar SDR.
- Scoring methods:
If your FASTA database is less than 2MB in size, you can request any combination of the three available scoring methods: SDR, KSD and PANTHER. However, if your job is queued, only the SDR method will be used, because making predictions with the KSD and PANTHER methods takes considerable time.
- Email address:
If your submission is placed in the Predikin queue, we will need an address to send your results to.
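The way job size interacts with the choice of scoring methods can be written as a small rule. This hypothetical Python helper is not Predikin code. It only mirrors the behaviour described above: web jobs may use any combination of methods, while queued jobs use SDR only.

```python
ALL_METHODS = ("SDR", "KSD", "PANTHER")


def effective_methods(requested, queued):
    """Return the scoring methods that would actually be run.

    Web (non-queued) jobs may use any combination of SDR, KSD and PANTHER;
    queued jobs are restricted to SDR, as this guide describes.
    """
    unknown = set(requested) - set(ALL_METHODS)
    if unknown:
        raise ValueError(f"unknown scoring method(s): {sorted(unknown)}")
    if queued:
        return ["SDR"]
    # Keep the canonical order and drop any duplicates.
    return [m for m in ALL_METHODS if m in requested]
```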
Predikin Web Output
The web page that presents Predikin results is divided into three sections:
- Summary
A summary of the kinases and substrates found in your FASTA database.
- Predictions
If you have JavaScript enabled in your browser, your predictions will be presented in an interactive table.
The table can be sorted by any column by clicking the arrow at the top of that column. Multiple sort and selection criteria can be combined by holding down the Ctrl key while making selections. The columns in the table are:
- Kinase: the ID of the kinase (taken from the FASTA header).
- Domain: the number of the kinase domain. A protein may contain more than one kinase domain; domains are numbered sequentially.
- Substrate: the ID of the substrate (taken from the FASTA header).
- Position: the residue number of the phosphorylated residue in the substrate.
- TM Helix: 1 indicates that the phosphorylated residue is in a TM helix as predicted by TMHMM.
- Disordered: 1 indicates that the phosphorylated residue is in a disordered region as predicted by DisEMBL.
- Method: the scoring method used to make the prediction (one of SDR, KSD or PANTHER).
- Score: the Predikin score for the potential phosphorylation site.
You can save all your predictions to a text file by following the "Export predictions to text file" link.
- Kinase Details
Details of all the kinases identified in your FASTA database. The most important parts of this section are the frequency and weight matrices; there is one of each for every scoring method used (SDR, KSD and PANTHER). The weight matrices are what Predikin uses to score potential phosphorylation sites. A red cell in a frequency matrix indicates that the total frequency for that row is zero. When this occurs, the matrix is incomplete and Predikin will refuse to build a weight matrix. As a result, no predictions will be made for that kinase. To download any of the matrices, click on the "Matrix as text file" link.
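The completeness rule above can be expressed as a simple check: if any row total is zero, no weight matrix and hence no predictions. This Python sketch assumes that a frequency matrix is held as a list of rows of counts, like the rows in matrices.txt. The function names are hypothetical.

```python
def zero_total_rows(freq_matrix):
    """Return the indices of rows whose frequencies sum to zero.

    In the web output these are the rows flagged with red cells.
    """
    return [i for i, row in enumerate(freq_matrix) if sum(row) == 0]


def weight_matrix_possible(freq_matrix):
    """A weight matrix can only be built if no row has a zero total."""
    return not zero_total_rows(freq_matrix)
```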
Predikin Results File Format
If Predikin queues your submission, you will be emailed a link where you can download your result files. The results consist of three files:
- predictions.txt
A tab-separated file containing the prediction results. It has the following format:
cla4 1 cla4 11 0 1 NKISDND Ser/Thr SDR 62.33
cla4 1 cla4 26 0 1 RPPSSNS Ser/Thr SDR 77.11
cla4 1 cla4 27 0 1 PPSSNSQ Ser/Thr SDR 67.39
cla4 1 cla4 33 0 1 QGRTCYN Ser/Thr SDR 78.29
cla4 1 cla4 38 0 1 YNQTQPI Ser/Thr SDR 66.60
cla4 1 cla4 42 0 0 QPITKLM Ser/Thr SDR 76.88
cla4 1 cla4 46 0 1 KLMSQLD Ser/Thr SDR 66.75
cla4 1 cla4 51 0 1 LDLTSAS Ser/Thr SDR 63.04
cla4 1 cla4 54 0 1 TSASHLG Ser/Thr SDR 74.54
cla4 1 cla4 58 0 1 HLGTSTS Ser/Thr SDR 70.22
where the columns are:
- Kinase: The ID of the kinase (taken from the FASTA header).
- Domain: The number of the kinase domain. A protein may contain more than one kinase domain; domains are numbered sequentially.
- Substrate: The ID of the substrate (taken from the FASTA header).
- Position: The residue number of the phosphorylated residue in the substrate.
- TM Helix: 1 indicates that the phosphorylated residue is in a TM helix as predicted by TMHMM.
- Disordered: 1 indicates that the phosphorylated residue is in a disordered region as predicted by DisEMBL.
- Heptapeptide: The sequence surrounding the potential phosphorylation site. The phosphorylated residue is the centre residue.
- Kinase type: The type of kinase, one of Ser/Thr, CMGC or Tyr.
- Method: The scoring method used to make the prediction (one of SDR, KSD or PANTHER).
- Score: The Predikin score for the potential phosphorylation site.
- matrices.txt
The frequency and weight matrices for each kinase. It has the following format:
#FMAT cla4 1 SDR
1 0 0 0 0 0 0 0 4 1 0 1 0 1 4 2 0 0 0 1
6 0 1 3 2 3 0 0 3 1 0 1 3 2 11 8 3 2 0 1
4 0 4 3 5 13 2 2 6 9 3 5 3 3 6 14 6 2 2 7
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 127 99 0 0 1
0 0 1 1 2 0 1 3 4 4 1 0 1 0 2 1 0 3 0 0
0 0 0 0 2 0 0 3 1 3 0 0 1 0 0 1 1 2 0 1
3 0 2 4 2 7 3 2 5 6 1 6 4 3 0 13 3 4 0 2
#WMAT cla4 1 SDR
0.09 -0.90 -2.43 -2.74 -2.12 -2.70 -1.60 -2.65 1.71 -0.77 -1.36 0.81 -2.11 0.90 2.01 0.98 -2.08 -2.69 -0.22 0.70
0.90 -1.63 -1.22 -0.22 -0.11 -0.18 -2.33 -3.38 -0.21 -2.18 -2.09 -0.60 0.41 0.29 1.85 1.31 0.44 -0.69 -0.94 -0.72
-0.53 -2.07 -0.42 -1.09 0.18 0.89 -0.44 -1.49 -0.19 -0.30 0.28 0.48 -0.46 -0.07 0.11 1.17 0.46 -1.53 0.94 0.82
-4.26 -2.62 -4.15 -4.46 -3.84 -4.42 -3.32 -4.37 -4.45 -5.11 -3.08 -3.53 -3.83 -3.44 -4.15 3.16 3.25 -4.41 -1.94 -2.43
-2.81 -1.17 -0.36 -0.67 0.80 -2.98 0.47 0.80 1.11 0.45 0.71 -2.09 -0.04 -2.00 0.49 -0.46 -2.35 0.76 -0.49 -2.20
-2.54 -0.90 -2.43 -2.74 1.38 -2.70 -1.60 1.40 -0.10 0.65 -1.36 -1.81 0.52 -1.72 -2.43 0.10 0.55 0.81 -0.22 0.70
-0.45 -1.84 -0.84 -0.28 -0.53 0.50 0.49 -1.06 0.03 -0.39 -0.54 1.18 0.35 0.37 -3.37 1.53 0.01 -0.23 -1.16 -0.34
#END
- kinase_details.txt
Contains details of each kinase identified in the FASTA database. Currently it contains only the kinase type. It has the following format:
cla4 1 Ser/Thr
cla4 2 CMGC
cla4 3 Tyr
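The three result files are plain text, so they are easy to load for further analysis. The Python sketch below reads each file according to the formats shown above. The function names are hypothetical and not part of Predikin. The sketch makes several assumptions:
- predictions.txt has no header line and exactly ten tab-separated columns.
- Kinase and substrate IDs contain no whitespace.
- Each matrix in matrices.txt is a run of numbers that splits into rows of 20.

The example also suggests that rows are the seven heptapeptide positions and that columns are amino acids in alphabetical one-letter order, because the central row has its large counts in the S and T columns. That reading is an inference, not documented.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    kinase: str
    domain: int
    substrate: str
    position: int
    tm_helix: bool
    disordered: bool
    heptapeptide: str
    kinase_type: str
    method: str
    score: float


def read_predictions(lines):
    """Parse predictions.txt lines (ASSUMED: ten tab-separated columns, no header)."""
    preds = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        k, dom, sub, pos, tm, dis, hept, ktype, method, score = line.split("\t")
        preds.append(Prediction(k, int(dom), sub, int(pos), tm == "1",
                                dis == "1", hept, ktype, method, float(score)))
    return preds


def parse_matrices(text):
    """Parse matrices.txt into {(kind, kinase, domain, method): rows}.

    kind is "FMAT" or "WMAT". Each matrix body is split into rows of 20
    values; treating rows as heptapeptide positions and columns as the
    amino acids ACDEFGHIKLMNPQRSTVWY is an ASSUMPTION.
    """
    tokens = text.split()
    matrices = {}
    i = 0
    while i < len(tokens):
        tag = tokens[i]
        if tag == "#END":  # tolerate one #END per kinase or per file
            i += 1
            continue
        if tag not in ("#FMAT", "#WMAT"):
            raise ValueError(f"unexpected token {tag!r}")
        kinase, domain, method = tokens[i + 1], int(tokens[i + 2]), tokens[i + 3]
        i += 4
        values = []
        while i < len(tokens) and not tokens[i].startswith("#"):
            values.append(float(tokens[i]))
            i += 1
        if len(values) % 20:
            raise ValueError(f"{tag} {kinase} {domain}: {len(values)} values is not a multiple of 20")
        rows = [values[r:r + 20] for r in range(0, len(values), 20)]
        matrices[(tag[1:], kinase, domain, method)] = rows
    return matrices


def parse_kinase_details(text):
    """Parse kinase_details.txt into {(kinase, domain): kinase type}."""
    details = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kinase, domain, ktype = fields
        details[(kinase, int(domain))] = ktype
    return details


def top_sites(preds, n=5, method="SDR"):
    """Highest-scoring predictions for one scoring method."""
    chosen = [p for p in preds if p.method == method]
    return sorted(chosen, key=lambda p: p.score, reverse=True)[:n]
```

For example, `top_sites(read_predictions(open("predictions.txt")), n=10)` would list the ten highest-scoring SDR predictions. `parse_matrices(open("matrices.txt").read())` would give every frequency and weight matrix keyed by kinase, domain and method.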