Predikin Prediction Server

A Guide to Substrate-Determining Residues

This page is a guide to substrate-determining residues, or SDRs: what they are, how Predikin determines them and how they are used in substrate prediction.

What is a Substrate-Determining Residue?

A substrate-determining residue (SDR) is an amino acid residue at a conserved position in the catalytic domain of a serine/threonine protein kinase that helps determine whether a protein is a likely substrate for the kinase.

When a kinase binds a substrate, the substrate residues at positions -3 to +3 relative to the phosphorylated residue (position 0) contact SDRs in a binding pocket on the surface of the kinase. The nature of the SDRs determines which residues are most likely to be found around the phosphorylation site, that is, which residues "fit" best in the binding pocket. The binding pocket therefore makes a major contribution to the specificity of the kinase for different substrates.
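The -3 to +3 window can be made concrete with a short Python sketch. The function names and the `-` padding for positions beyond either end of the sequence are illustrative choices, not part of Predikin. The example uses the PKA substrate heptapeptide RRASIHD from the figure caption further down, with the phosphorylated Ser at 0-based index 3.

```python
def heptapeptide(seq: str, site: int, pad: str = "-") -> str:
    """Return substrate positions -3..+3 around a phosphosite.

    `site` is the 0-based index of the phosphorylated Ser/Thr in `seq`;
    positions that fall outside the sequence are filled with `pad`
    (a convention chosen for this sketch only).
    """
    if not 0 <= site < len(seq):
        raise ValueError("site index out of range")
    return "".join(
        seq[i] if 0 <= i < len(seq) else pad
        for i in range(site - 3, site + 4)
    )


def by_position(seq: str, site: int) -> dict:
    """Label each residue of the heptapeptide with its substrate position."""
    labels = ["-3", "-2", "-1", "0", "+1", "+2", "+3"]
    return dict(zip(labels, heptapeptide(seq, site)))


print(heptapeptide("RRASIHD", 3))        # RRASIHD
print(heptapeptide("SIHD", 0))           # ---SIHD
print(by_position("RRASIHD", 3)["+1"])   # I
```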

OK, so which residues in the kinase are SDRs?

By examining X-ray structures of serine/threonine kinases bound to substrate heptapeptides, we have determined which kinase residues influence each of the -3 to +3 substrate positions. The results differ slightly for the so-called CMGC kinases, which include the cyclin-dependent (CDK), mitogen-activated protein (MAP), glycogen synthase kinase 3 (GSK3) and CK2-related kinases. CMGC kinases phosphorylate serine/threonine residues and show a strong preference for proline at substrate position +1.

SDRs are named by their position relative to one of six semi-conserved motifs in the kinase catalytic domain. In order from the N-terminus, these motifs are GXG, AMK, GEL, PEN, DFG and APE. Hence the SDR "GEL+4" is the residue 4 positions C-terminal to the "G" of the "GEL" motif, and "APE-2" is the residue 2 positions N-terminal to the "A" of the "APE" motif.
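The naming scheme is mechanical enough to express in a few lines of Python. This is a sketch only: `parse_sdr`, `sdr_position` and the residue number 100 used below are hypothetical, and in practice the motif positions come from the HMM alignment described in the next section. Names such as "E-5", which are not counted from one of the six motifs, are deliberately rejected.

```python
import re

# The six semi-conserved motifs, N- to C-terminal, as listed in the text.
MOTIFS = ("GXG", "AMK", "GEL", "PEN", "DFG", "APE")
_SDR_NAME = re.compile(r"^([A-Z]{3})([+-]\d+)$")


def parse_sdr(name: str) -> tuple:
    """Split an SDR name such as "GEL+4" into (motif, signed offset)."""
    m = _SDR_NAME.match(name)
    if m is None or m.group(1) not in MOTIFS:
        raise ValueError(f"not a motif-relative SDR name: {name!r}")
    return m.group(1), int(m.group(2))


def sdr_position(name: str, motif_start: dict) -> int:
    """Residue number of an SDR, given the residue number of the first
    letter of each motif (e.g. the "G" of "GEL") in the kinase."""
    motif, offset = parse_sdr(name)
    return motif_start[motif] + offset


print(parse_sdr("APE-2"))                      # ('APE', -2)
print(sdr_position("GEL+4", {"GEL": 100}))     # 104 (hypothetical numbering)
```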

The three images below are taken from Figure 2 of our article "Substrate specificity of protein kinases and computational prediction of substrates". They illustrate how a substrate binds to protein kinase A and the role of the SDRs at each position in the substrate.

Substrate binding in protein kinase A. (A) Schematic representation of the binding sites of the substrate side-chains, with the specificity-determining residues (SDRs) listed in each subsite. The subsites are coloured: S-3, red; S-2, yellow; S-1, green; S0, orange-red; S+1, dark blue; S+2, magenta; S+3, light blue. The same colour scheme for the subsites is used in (B) and (C). (B) Interactions of the heptapeptide region of the substrate (grey; sequence RRASIHD) with the SDRs, coloured according to the subsite. (C) Surface representation highlighting the individual subsites, coloured as in (A), and the heptapeptide region of the substrate (black; sequence RRASIHD).

How does Predikin find the SDRs?

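Predikin uses hmmsearch, part of the HMMER package, to align an input sequence to a profile hidden Markov model (HMM) of the kinase catalytic domain. The positions of the SDRs can then be read from the alignment. Shown below is a sample alignment of a serine/threonine kinase (gi|6319328) with the kinase HMM. In each block, the top row is the HMM consensus, the middle row marks matching residues (letters) and positive-scoring substitutions (+), and the bottom row is the kinase sequence with its residue numbers. The six semi-conserved motifs are highlighted in red and the corresponding SDRs in green: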

                   *->YellkklGkGaFGkVylardkktgrlvAiKvik.......erilrEi
                      Y+l +++G+G++G+Vy+a++k+t+++vAiK++  +++++ ++i+ Ei
  gi|6319328    25    YHLKQVIGRGSYGVVYKAINKHTDQVVAIKEVVyendeelNDIMAEI 71   

                   kiLkk.dHPNIVkLydvfed.dklylVmEyceGdlGdLfdllkkrgrrgl
                   ++Lk+ +H NIVk++++++ + +ly+ +Eyc +  G+L++l+ +  ++  
  gi|6319328    72 SLLKNlNHNNIVKYHGFIRKsYELYILLEYCAN--GSLRRLISRSSTG-- 117  

                   rkvlsE.earfyfrQilsaLeYLHsqgIiHRDLKPeNiLLds..hvKlaD
                      lsE+e + y+ Q+l++L+YLH  g+iHRD+K +NiLL+ +++vKlaD
  gi|6319328   118 ---LSEnESKTYVTQTLLGLKYLHGEGVIHRDIKAANILLSAdnTVKLAD 164

                   FGlArql....ttfvGTpeYmAPEvl...gYgkpavDiWSlGcilyEllt
                   FG++++++++  t+ GT+++mAPE+l+++g ++  +DiWSlG ++ E+lt
  gi|6319328   165 FGVSTIVnssaLTLAGTLNWMAPEILgnrGAST-LSDIWSLGATVVEMLT 213  

                   GkpPFp..qldlifkkig..........SpeakdLikklLvkdPekRlta
                     pP+++ +++ i+  + +++++++++ S+++kd+++k++vk+  kR+ta
  gi|6319328   214 KNPPYHnlTDANIYYAVEndtyyppssfSEPLKDFLSKCFVKNMYKRPTA 263  

                   .eaLedeldikaHPff<-*
                   +++L+       H ++   
  gi|6319328   264 dQLLK-------HVWI    272
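To make "read from the alignment" concrete, the Python sketch below reads SDR residues off the third and fourth blocks of the sample alignment (residues 118-213). It is not Predikin's code and rests on two stated assumptions: (1) a motif can be found by a literal, case-insensitive search of the HMM consensus row, which works for DFG and APE here but will not locate every motif; (2) SDR offsets are counted in the kinase's own residue numbering, starting from the residue aligned to the motif's first consensus column. Lowercase letters in the kinase row are insert-state residues and `-` marks a deletion.

```python
# Blocks 3 and 4 of the sample alignment, concatenated (columns line up).
CONSENSUS = ("rkvlsE.earfyfrQilsaLeYLHsqgIiHRDLKPeNiLLds..hvKlaD"
             "FGlArql....ttfvGTpeYmAPEvl...gYgkpavDiWSlGcilyEllt")
TARGET = ("---LSEnESKTYVTQTLLGLKYLHGEGVIHRDIKAANILLSAdnTVKLAD"
          "FGVSTIVnssaLTLAGTLNWMAPEILgnrGAST-LSDIWSLGATVVEMLT")
FIRST_RESIDUE = 118  # number of the first non-gap residue in TARGET


def number_columns(target: str, first: int):
    """Map alignment columns to residue numbers, and numbers to residues."""
    col_to_res, res_to_aa = {}, {}
    n = first
    for col, ch in enumerate(target):
        if ch == "-":              # deletion: no residue in this column
            continue
        col_to_res[col] = n
        res_to_aa[n] = ch.upper()  # insert-state (lowercase) residues count too
        n += 1
    return col_to_res, res_to_aa


def sdr_residues(names, consensus=CONSENSUS, target=TARGET,
                 first=FIRST_RESIDUE):
    """Return {SDR name: residue} for literal-motif names like "APE-1"."""
    assert len(consensus) == len(target)
    col_to_res, res_to_aa = number_columns(target, first)
    result = {}
    for name in names:
        motif, offset = name[:3], int(name[3:])
        col = consensus.upper().find(motif)  # assumption: literal motif
        if col < 0 or col not in col_to_res:
            raise ValueError(f"motif {motif} not found in alignment")
        result[name] = res_to_aa[col_to_res[col] + offset]
    return result


# Ser/Thr SDRs for substrate position +1.
print(sdr_residues(["APE-1", "APE-4", "DFG+3"]))
# {'APE-1': 'M', 'APE-4': 'L', 'DFG+3': 'V'}
```

Here the consensus "DFG" lines up with residues 164-166 and "APE" with residues 186-188 of gi|6319328, so APE-1 is residue 185 (M) and DFG+3 is residue 167 (V).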

Summary of SDRs

SDRs for serine/threonine and CMGC kinases are summarised below:

Substrate position   Ser/Thr SDRs                            CMGC SDRs
------------------   -------------------------------------   -------------------
-3                   GEL+1, GEL+3, GEL+4                     GEL+1, GEL+3, PEN+1
-2                   APE-2, APE-3, APE-5                     APE-2, APE-3, APE-5
-1                   GXG+2, GXG+3                            GXG+2, GXG+3
+1                   APE-1, APE-4, DFG+3                     APE-1, APE-4, DFG+3
+2*                  KE loop 12-17: AMK+10, AMK+11, AMK+12   GXG+3, GXG+4
                     KE loop 18-20: AMK+12, AMK+13, AMK+14
                     KE loop < 12 or > 20: E-5, E-6, E-7
+3                   APE-8, APE-9                            APE-9, APE-10

* The SDRs that influence the +2 position in substrates of serine/threonine kinases depend on a variable termed the KE loop length: the length of the region between the conserved AMK motif and a conserved Glu residue C-terminal to it. Which set of +2 SDRs applies depends on whether the KE loop length is 12-17, 18-20, "short" (< 12) or "long" (> 20).
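The table and its footnote can be encoded as a simple lookup, which makes the KE loop rule explicit. This is a minimal Python sketch: the dictionary layout and the names `sdrs_for` and `ser_thr_plus2` are our own, and the KE loop length is taken as an input rather than measured from a sequence, because the exact counting convention is not spelled out here.

```python
# Summary table as lookups keyed by substrate position (hypothetical layout).
SER_THR_SDRS = {
    -3: ["GEL+1", "GEL+3", "GEL+4"],
    -2: ["APE-2", "APE-3", "APE-5"],
    -1: ["GXG+2", "GXG+3"],
    1: ["APE-1", "APE-4", "DFG+3"],
    # position +2 depends on the KE loop length: see ser_thr_plus2()
    3: ["APE-8", "APE-9"],
}
CMGC_SDRS = {
    -3: ["GEL+1", "GEL+3", "PEN+1"],
    -2: ["APE-2", "APE-3", "APE-5"],
    -1: ["GXG+2", "GXG+3"],
    1: ["APE-1", "APE-4", "DFG+3"],
    2: ["GXG+3", "GXG+4"],
    3: ["APE-9", "APE-10"],
}


def ser_thr_plus2(ke_loop_length: int) -> list:
    """Ser/Thr SDRs for substrate position +2, chosen by KE loop length."""
    if 12 <= ke_loop_length <= 17:
        return ["AMK+10", "AMK+11", "AMK+12"]
    if 18 <= ke_loop_length <= 20:
        return ["AMK+12", "AMK+13", "AMK+14"]
    return ["E-5", "E-6", "E-7"]  # "short" (< 12) or "long" (> 20) loops


def sdrs_for(position: int, cmgc: bool = False, ke_loop_length=None) -> list:
    """SDR names that influence one substrate position (-3..+3, not 0)."""
    if cmgc:
        return CMGC_SDRS[position]
    if position == 2:
        if ke_loop_length is None:
            raise ValueError("Ser/Thr +2 SDRs need the KE loop length")
        return ser_thr_plus2(ke_loop_length)
    return SER_THR_SDRS[position]


print(sdrs_for(2, ke_loop_length=19))  # ['AMK+12', 'AMK+13', 'AMK+14']
print(sdrs_for(-3, cmgc=True))         # ['GEL+1', 'GEL+3', 'PEN+1']
```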

How does Predikin use SDRs to predict substrates?

In short, having determined the SDRs in your sequence, Predikin searches a database for kinases with similar SDRs, retrieves the known substrates of those kinases and builds a scoring matrix from them. For details, please see the "Predikin: how it works" FAQ entry.
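The three steps (find kinases with similar SDRs, collect their substrates, build a matrix) can be illustrated with a toy Python version. This is not Predikin's method: the SDR similarity measure, the database and the matrix construction it actually uses are described in the FAQ entry. Here a plain count of identical SDR residues stands in for the similarity score, raw per-position frequencies stand in for the scoring matrix, and the SDR strings and substrate peptides are made up for illustration.

```python
from collections import Counter

POSITIONS = ["-3", "-2", "-1", "0", "+1", "+2", "+3"]


def sdr_identity(a: str, b: str) -> int:
    """Number of SDR positions at which two kinases share a residue."""
    return sum(x == y for x, y in zip(a, b))


def frequency_matrix(query: str, database, top_n: int = 2) -> dict:
    """Pool substrates of the top_n most SDR-similar kinases.

    `query` holds the query kinase's SDR residues as one string;
    `database` is a list of (sdr_string, [substrate heptapeptides]).
    Returns {position: {residue: frequency}} over the pooled peptides.
    """
    ranked = sorted(database, key=lambda entry: sdr_identity(query, entry[0]),
                    reverse=True)
    peptides = [pep for _, peps in ranked[:top_n] for pep in peps]
    if not peptides:
        raise ValueError("no substrates found for the closest kinases")
    matrix = {}
    for col, pos in enumerate(POSITIONS):
        counts = Counter(pep[col] for pep in peptides)
        matrix[pos] = {aa: n / len(peptides) for aa, n in counts.items()}
    return matrix


# Made-up SDR strings and substrates, for illustration only.
toy_db = [
    ("EEGTW", ["RRASIHD", "KRLSVAE"]),
    ("DEGTW", ["RRGSLAE"]),
    ("AKPQY", ["PLSPRRE"]),
]
m = frequency_matrix("EEGTW", toy_db)
print(m["0"])   # {'S': 1.0}
```

The third toy kinase shares no SDR residues with the query, so its substrate does not contribute to the matrix.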

What about tyrosine kinases?

We have not determined SDRs for tyrosine kinases, for two reasons. First, it is not clear that tyrosine kinase substrates show the same level of conservation at positions -3 to +3 as is observed for serine/threonine kinases. This may be because the large size of the Tyr residue results in a less specific binding pocket. Second, very few structures of tyrosine kinases bound to heptapeptide substrates are available.

Predikin does, however, provide two methods that are applicable to tyrosine kinases: the KSD and PANTHER methods, which generate substrate frequency matrices based on kinase family.

© 2009-2011 University of Queensland