Predikin Prediction Server

A Predikin Tutorial

This guide will walk you through the process of using the Predikin website.

Sequences are uploaded in FASTA format. You may either upload a file or paste sequences into the text area on the submission form (note that if you paste sequences into the text area you must also include FASTA headers for each sequence).

The uploaded file should contain both kinases and potential substrates; how many of each is up to you. You may submit one kinase and many potential substrates if you'd like to identify the best substrate for your kinase, or many kinases and just one substrate if you want to find out which kinase is most likely to phosphorylate your protein. You can also submit any combination between these two extremes.

If your FASTA database contains fewer than 50 sequences and is smaller than 2MB, Predikin will return its predictions to you via the web. After a pause (which can last a few minutes for large databases) you will be redirected to a page that displays your results. For an explanation of the information on this page, see below. However, if your FASTA database contains more than 50 sequences or is larger than 2MB, your prediction job will be placed in a queue and your results will be emailed to you when they are ready. Please note that the waiting time for queued jobs depends on the size of your database and the number of jobs queued before yours. It is also important to provide a valid email address; otherwise we will not be able to send you your results and they may be lost.
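
If you would like to know in advance whether a submission is likely to be answered on the web or queued, the limits above can be checked locally. The following is only a rough sketch and not part of Predikin: the function names are hypothetical, the sequences are made-up toy data, "2MB" is assumed to mean 2 x 1024 x 1024 bytes, and a database of exactly 50 sequences (a case the text above does not specify) is assumed to be queued.

```python
# Assumed limits, read off the tutorial text. The exact byte value of "2MB"
# and the behaviour at exactly 50 sequences are assumptions.
MAX_SEQUENCES = 50
MAX_BYTES = 2 * 1024 * 1024


def count_fasta_records(text):
    """Count FASTA records by counting header lines (lines starting with '>')."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def likely_delivery(text):
    """Guess whether results come back via the web page or by email."""
    n_records = count_fasta_records(text)
    n_bytes = len(text.encode("utf-8"))
    if n_records < MAX_SEQUENCES and n_bytes < MAX_BYTES:
        return "web"
    return "email"


# Toy input: two made-up sequences, each with its own FASTA header.
example = ">kinase1\nMKTAYIAK\n>substrate1\nMSSPTRRS\n"
print(count_fasta_records(example), likely_delivery(example))
```

Counting header lines also gives a quick check that every pasted sequence has its own FASTA header.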

Predikin Options

The following options are available:

Predikin Web Output

The web page that presents Predikin results is divided into three sections:

  1. Summary

    A summary of the kinases and substrates found in your FASTA database.

  2. Predictions

    If you have JavaScript enabled in your browser, your predictions will be presented in a table.

    The table can be sorted by any column by clicking the arrow at the top of that column. Multiple sort and selection criteria can be used by holding down the Ctrl key while making selections. The columns in the table are:

    • Kinase: the ID of the kinase (taken from the FASTA header).
    • Domain: the number of the kinase domain. A protein may contain more than one kinase domain; domains are numbered sequentially.
    • Substrate: the ID of the substrate (taken from the FASTA header).
    • Position: the residue number of the phosphorylated residue in the substrate.
    • TM Helix: 1 indicates that the phosphorylated residue is in a TM helix as predicted by TMHMM.
    • Disordered: 1 indicates that the phosphorylated residue is in a disordered region as predicted by DisEMBL.
    • Method: the scoring method used to make the prediction (one of SDR, KSD or PANTHER).
    • Score: the Predikin score for the potential phosphorylation site.

    You can save all your predictions to a text file by following the "Export predictions to text file" link.

  3. Kinase Details

    Details of all the kinases identified in your FASTA database. The most important parts of this section are the frequency and weight matrices. There will be one of each for each scoring method used (SDR, KSD and PANTHER). The weight matrices are what Predikin uses to score potential phosphorylation sites. Red cells in a frequency matrix indicate that the total frequency for a row is zero. If this occurs, the matrix is incomplete and Predikin will not build a weight matrix, which means that no predictions will be made for the kinase with that scoring method. To download any of the matrices, click on the "Matrix as text file" link.
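
The completeness rule described above (a row whose total frequency is zero makes the matrix incomplete) is easy to reproduce on a downloaded frequency matrix. This is a minimal sketch of that rule only, not Predikin's own code; the function names and the small toy matrix are hypothetical.

```python
def incomplete_rows(freq_matrix):
    """Return the indices of rows whose total frequency is zero."""
    return [i for i, row in enumerate(freq_matrix) if sum(row) == 0]


def can_build_weight_matrix(freq_matrix):
    """A frequency matrix is usable only if no row sums to zero."""
    return not incomplete_rows(freq_matrix)


# Toy matrix (made up): the second row has no counts at all.
toy = [
    [1, 0, 2],
    [0, 0, 0],
    [3, 1, 0],
]
print(incomplete_rows(toy), can_build_weight_matrix(toy))
```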

Predikin Results File Format

If Predikin queues your submission, you will be emailed a link where you can download your result files. The results consist of three files:

  1. predictions.txt

    A tab-separated file containing the prediction results. It has the following format:

    cla4     1      cla4      11    0       1       NKISDND Ser/Thr SDR     62.33
    cla4     1      cla4      26    0       1       RPPSSNS Ser/Thr SDR     77.11
    cla4     1      cla4      27    0       1       PPSSNSQ Ser/Thr SDR     67.39
    cla4     1      cla4      33    0       1       QGRTCYN Ser/Thr SDR     78.29
    cla4     1      cla4      38    0       1       YNQTQPI Ser/Thr SDR     66.60
    cla4     1      cla4      42    0       0       QPITKLM Ser/Thr SDR     76.88
    cla4     1      cla4      46    0       1       KLMSQLD Ser/Thr SDR     66.75
    cla4     1      cla4      51    0       1       LDLTSAS Ser/Thr SDR     63.04
    cla4     1      cla4      54    0       1       TSASHLG Ser/Thr SDR     74.54
    cla4     1      cla4      58    0       1       HLGTSTS Ser/Thr SDR     70.22
    

    where the columns are, from left to right:

    • Kinase: The ID of the kinase (taken from the FASTA header).
    • Domain: The number of the kinase domain. A protein may contain more than one kinase domain; domains are numbered sequentially.
    • Substrate: The ID of the substrate (taken from the FASTA header).
    • Position: The residue number of the phosphorylated residue in the substrate.
    • TM Helix: 1 indicates that the phosphorylated residue is in a TM helix as predicted by TMHMM.
    • Disordered: 1 indicates that the phosphorylated residue is in a disordered region as predicted by DisEMBL.
    • Heptapeptide: The sequence surrounding the potential phosphorylation site. The phosphorylated residue is the centre residue.
    • Kinase type: The type of kinase, one of Ser/Thr, CMGC or Tyr.
    • Method: The scoring method used to make the prediction (one of SDR, KSD or PANTHER).
    • Score: The Predikin score for the potential phosphorylation site.
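
The ten columns listed above can be read with any tab-aware parser. The following is a minimal sketch, assuming the file is tab-separated with no header line and exactly the column order listed above; the `Prediction` type and `read_predictions` function are hypothetical names, and the sample rows are taken from the excerpt shown earlier.

```python
import csv
import io
from typing import NamedTuple


class Prediction(NamedTuple):
    kinase: str
    domain: int
    substrate: str
    position: int
    tm_helix: bool      # True when the TM Helix column is 1
    disordered: bool    # True when the Disordered column is 1
    heptapeptide: str
    kinase_type: str
    method: str
    score: float


def read_predictions(handle):
    """Parse tab-separated prediction rows from an open text handle."""
    predictions = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        f = [field.strip() for field in row]
        predictions.append(Prediction(
            f[0], int(f[1]), f[2], int(f[3]),
            f[4] == "1", f[5] == "1",
            f[6], f[7], f[8], float(f[9]),
        ))
    return predictions


sample = (
    "cla4\t1\tcla4\t11\t0\t1\tNKISDND\tSer/Thr\tSDR\t62.33\n"
    "cla4\t1\tcla4\t33\t0\t1\tQGRTCYN\tSer/Thr\tSDR\t78.29\n"
)
predictions = read_predictions(io.StringIO(sample))
best = max(predictions, key=lambda p: p.score)
print(best.substrate, best.position, best.score)
```

For a downloaded file, pass `open("predictions.txt", newline="")` in place of the `io.StringIO` object.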

  2. matrices.txt

    The frequency and weight matrices for each kinase. It has the following format:

    #FMAT cla4 1 SDR
      1   0   0   0   0   0   0   0   4   1   0   1   0   1   4   2   0   0   0   1
      6   0   1   3   2   3   0   0   3   1   0   1   3   2  11   8   3   2   0   1
      4   0   4   3   5  13   2   2   6   9   3   5   3   3   6  14   6   2   2   7
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 127  99   0   0   1
      0   0   1   1   2   0   1   3   4   4   1   0   1   0   2   1   0   3   0   0
      0   0   0   0   2   0   0   3   1   3   0   0   1   0   0   1   1   2   0   1
      3   0   2   4   2   7   3   2   5   6   1   6   4   3   0  13   3   4   0   2
    #WMAT cla4 1 SDR
     0.09 -0.90 -2.43 -2.74 -2.12 -2.70 -1.60 -2.65  1.71 -0.77 -1.36  0.81 -2.11  0.90  2.01  0.98 -2.08 -2.69 -0.22  0.70
     0.90 -1.63 -1.22 -0.22 -0.11 -0.18 -2.33 -3.38 -0.21 -2.18 -2.09 -0.60  0.41  0.29  1.85  1.31  0.44 -0.69 -0.94 -0.72
    -0.53 -2.07 -0.42 -1.09  0.18  0.89 -0.44 -1.49 -0.19 -0.30  0.28  0.48 -0.46 -0.07  0.11  1.17  0.46 -1.53  0.94  0.82
    -4.26 -2.62 -4.15 -4.46 -3.84 -4.42 -3.32 -4.37 -4.45 -5.11 -3.08 -3.53 -3.83 -3.44 -4.15  3.16  3.25 -4.41 -1.94 -2.43
    -2.81 -1.17 -0.36 -0.67  0.80 -2.98  0.47  0.80  1.11  0.45  0.71 -2.09 -0.04 -2.00  0.49 -0.46 -2.35  0.76 -0.49 -2.20
    -2.54 -0.90 -2.43 -2.74  1.38 -2.70 -1.60  1.40 -0.10  0.65 -1.36 -1.81  0.52 -1.72 -2.43  0.10  0.55  0.81 -0.22  0.70
    -0.45 -1.84 -0.84 -0.28 -0.53  0.50  0.49 -1.06  0.03 -0.39 -0.54  1.18  0.35  0.37 -3.37  1.53  0.01 -0.23 -1.16 -0.34
    #END
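
The excerpt above suggests a simple block structure: a `#FMAT` or `#WMAT` header naming the kinase, domain and scoring method, followed by rows of whitespace-separated numbers, with `#END` closing the block. A reader for that layout might look like the sketch below. It is based only on the excerpt, so treat it as an assumption-laden illustration: `read_matrices` is a hypothetical name, kinase IDs are assumed to contain no whitespace, and the sample is abbreviated to two rows per matrix.

```python
def read_matrices(lines):
    """Return {(kind, kinase, domain, method): list of numeric rows}."""
    matrices = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#FMAT") or line.startswith("#WMAT"):
            tag, kinase, domain, method = line.split()
            key = (tag[1:], kinase, int(domain), method)
            current = matrices.setdefault(key, [])
        elif line.startswith("#END"):
            current = None
        elif current is not None:
            current.append([float(value) for value in line.split()])
    return matrices


# Abbreviated sample: two rows from each matrix in the excerpt above.
sample = """\
#FMAT cla4 1 SDR
  1   0   0   0   0   0   0   0   4   1   0   1   0   1   4   2   0   0   0   1
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 127  99   0   0   1
#WMAT cla4 1 SDR
 0.09 -0.90 -2.43 -2.74 -2.12 -2.70 -1.60 -2.65  1.71 -0.77 -1.36  0.81 -2.11  0.90  2.01  0.98 -2.08 -2.69 -0.22  0.70
-4.26 -2.62 -4.15 -4.46 -3.84 -4.42 -3.32 -4.37 -4.45 -5.11 -3.08 -3.53 -3.83 -3.44 -4.15  3.16  3.25 -4.41 -1.94 -2.43
#END
"""
matrices = read_matrices(sample.splitlines())
for key, rows in matrices.items():
    print(key, len(rows), "rows x", len(rows[0]), "columns")
```

Each row holds 20 values. The order of the columns is not stated here, so the sketch keeps them as plain lists rather than labelling them.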
    
  3. kinase_details.txt

    Contains details of each kinase identified in the FASTA database. Currently, it contains only the kinase type. It has the following format:

    cla4 1 Ser/Thr
    cla4 2 CMGC
    cla4 3 Tyr
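
Because each line holds a kinase ID, a domain number and a kinase type, the file maps naturally onto a lookup table keyed by kinase and domain. A minimal sketch, assuming whitespace-separated fields exactly as in the excerpt above (`read_kinase_details` is a hypothetical name):

```python
def read_kinase_details(lines):
    """Return {(kinase, domain): kinase_type} from kinase_details.txt lines."""
    details = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        kinase, domain, kinase_type = fields
        details[(kinase, int(domain))] = kinase_type
    return details


sample = "cla4 1 Ser/Thr\ncla4 2 CMGC\ncla4 3 Tyr\n"
details = read_kinase_details(sample.splitlines())
print(details[("cla4", 2)])
```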
    
© 2009-2011 University of Queensland