A Guide to Substrate Determining Residues
This page is a guide to substrate-determining residues, or SDRs: what they are, how Predikin determines them and how they are used in substrate prediction.
What is a Substrate Determining Residue?
A Substrate Determining Residue (SDR) is a conserved amino acid residue, located in the catalytic domain of a serine/threonine protein kinase, which determines whether a protein is a likely substrate for the kinase.
When a kinase binds to a substrate, the substrate amino acid residues at positions -3 to +3 relative to the phosphorylated residue make contact with SDRs in a binding pocket at the surface of the kinase. The nature of the SDRs determines which residues are most likely to be found around the phosphorylation site - i.e. which residues "fit" best in the binding pocket. The binding pocket therefore makes a major contribution to the specificity of the kinase for different substrates.
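To make the -3 to +3 numbering concrete, here is a minimal Python sketch that extracts the seven-residue window around a phosphorylation site. The function name `heptapeptide`, the 0-based `site_index` argument and the toy sequence are illustrative assumptions; they are not part of Predikin.

```python
def heptapeptide(protein, site_index):
    """Return the residues at positions -3 to +3 around a phosphorylation site.

    site_index is the 0-based index of the phosphorylated residue
    (position 0). Raises ValueError if the window runs off either end.
    """
    start, end = site_index - 3, site_index + 4
    if start < 0 or end > len(protein):
        raise ValueError("site is too close to a terminus for a full window")
    return protein[start:end]


# Toy sequence (hypothetical); the serine at index 7 is position 0.
window = heptapeptide("GRTGRRASIHD", 7)

# Label each residue with its substrate position, -3 through +3.
positions = dict(zip(range(-3, 4), window))
```

Here `positions[0]` is the phosphorylated serine, and `positions[-3]` and `positions[3]` are the outermost residues that contact the binding pocket.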
OK, so which residues in the kinase are SDRs?
By examining X-ray structures of serine/threonine kinases bound to substrate heptapeptides, we have determined which amino acid residues influence each of the -3 to +3 substrate positions. The results are slightly different for the so-called CMGC kinases. CMGC kinases include cyclin-dependent, MAP, glycogen synthase kinase 3 (GSK3) and CK2-related kinases. They phosphorylate serine/threonine residues and show a strong preference for proline at the substrate +1 position.
SDRs are named by their position relative to one of six semi-conserved motifs found in the kinase catalytic domain. From the N-terminus these motifs are GXG, AMK, GEL, PEN, DFG and APE. Hence the SDR "GEL+4" is the residue 4 positions C-terminal to the "G" in the "GEL" motif.
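The naming scheme can be read as "motif anchor plus offset". The sketch below shows one way to turn a label such as "GEL+4" into a residue. It assumes that the caller already knows the 0-based index of each motif's first residue in the sequence (in practice this comes from the HMM alignment, not from string matching), and that every offset is counted from that first residue. The text above confirms this only for GEL, so treat it as an assumption for the other motifs. All names are hypothetical.

```python
import re

# A label is a motif name followed by a signed offset, e.g. "GEL+4" or "APE-2".
SDR_LABEL = re.compile(r"^([A-Z]+)([+-]\d+)$")


def parse_sdr_label(label):
    """Split an SDR label into (motif, offset), e.g. "APE-2" -> ("APE", -2)."""
    match = SDR_LABEL.match(label)
    if match is None:
        raise ValueError(f"not an SDR label: {label!r}")
    return match.group(1), int(match.group(2))


def sdr_residue(sequence, motif_anchors, label):
    """Return the residue named by label.

    motif_anchors maps a motif name to the 0-based index of its first
    residue in sequence (an assumption made for this sketch).
    """
    motif, offset = parse_sdr_label(label)
    index = motif_anchors[motif] + offset
    if not 0 <= index < len(sequence):
        raise IndexError(f"{label} falls outside the sequence")
    return sequence[index]


# Toy sequence (hypothetical) with the "G" of GEL at index 3.
toy_sequence = "AAAGELRSTKV"
toy_anchors = {"GEL": 3}
gel_plus_4 = sdr_residue(toy_sequence, toy_anchors, "GEL+4")
```

In the toy sequence, "GEL+4" resolves to the residue four positions C-terminal to the "G", which is the "S" at index 7.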
The 3 images below are taken from Figure 2 of our article Substrate specificity of protein kinases and computational prediction of substrates. They illustrate how a substrate binds to protein kinase A and the role of SDRs at each position in the substrate.



Substrate binding in protein kinase A. (A) Schematic representation of the binding sites of the substrate side-chains, with the specificity-determining residues (SDRs) listed in each subsite. The subsites are coloured: S-3, red; S-2, yellow; S-1, green; S0, orange-red; S+1, dark blue; S+2, magenta; S+3, light blue. The same colour scheme for the subsites is used in (B) and (C). (B) Interactions of the heptapeptide region of the substrate (grey; sequence RRASIHD) with the SDRs, coloured according to the subsite. (C) Surface representation highlighting the individual subsites, coloured as in (A), and a heptapeptide region of the substrate (black; sequence RRASIHD).
How does Predikin find the SDRs?
Predikin uses hmmsearch, a program from the HMMER package, to align an input sequence to a profile hidden Markov model (HMM) of the kinase catalytic domain. The positions of the SDRs can then be determined from the alignment. Shown below is a sample alignment of a serine/threonine kinase with the kinase HMM. The six semi-conserved motifs are highlighted in red and the corresponding derived SDRs in green:
*->YellkklGkGaFGkVylardkktgrlvAiKvik.......erilrEi
Y+l +++G+G++G+Vy+a++k+t+++vAiK++ +++++ ++i+ Ei
gi|6319328 25 YHLKQVIGRGSYGVVYKAINKHTDQVVAIKEVVyendeelNDIMAEI 71
kiLkk.dHPNIVkLydvfed.dklylVmEyceGdlGdLfdllkkrgrrgl
++Lk+ +H NIVk++++++ + +ly+ +Eyc + G+L++l+ + ++
gi|6319328 72 SLLKNlNHNNIVKYHGFIRKsYELYILLEYCAN--GSLRRLISRSSTG-- 117
rkvlsE.earfyfrQilsaLeYLHsqgIiHRDLKPeNiLLds..hvKlaD
lsE+e + y+ Q+l++L+YLH g+iHRD+K +NiLL+ +++vKlaD
gi|6319328 118 ---LSEnESKTYVTQTLLGLKYLHGEGVIHRDIKAANILLSAdnTVKLAD 164
FGlArql....ttfvGTpeYmAPEvl...gYgkpavDiWSlGcilyEllt
FG++++++++ t+ GT+++mAPE+l+++g ++ +DiWSlG ++ E+lt
gi|6319328 165 FGVSTIVnssaLTLAGTLNWMAPEILgnrGAST-LSDIWSLGATVVEMLT 213
GkpPFp..qldlifkkig..........SpeakdLikklLvkdPekRlta
pP+++ +++ i+ + +++++++++ S+++kd+++k++vk+ kR+ta
gi|6319328 214 KNPPYHnlTDANIYYAVEndtyyppssfSEPLKDFLSKCFVKNMYKRPTA 263
.eaLedeldikaHPff<-*
+++L+ H ++
gi|6319328 264 dQLLK-------HVWI 272
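As a rough illustration of how a position can be read off such an alignment, the Python sketch below maps an alignment column to a residue number in the query sequence. It uses the first alignment block shown above. The function name is hypothetical, and locating the GXG and AMK motifs by searching the consensus line for "GkG" and "AiK" is an assumption made for this example, not a description of Predikin's own code.

```python
def residue_at_column(aligned_query, query_start, column):
    """Map an alignment column to (residue_number, residue) in the query.

    aligned_query is the query line of one alignment block, with "-" for gaps;
    query_start is the residue number printed at the start of that line.
    Returns None if the query has a gap in that column.
    """
    if aligned_query[column] == "-":
        return None
    gaps_before = aligned_query[:column].count("-")
    return query_start + column - gaps_before, aligned_query[column].upper()


# First alignment block from the sample above (query residues 25 to 71).
consensus = "YellkklGkGaFGkVylardkktgrlvAiKvik.......erilrEi"
query = "YHLKQVIGRGSYGVVYKAINKHTDQVVAIKEVVyendeelNDIMAEI"

# Assumption: the motifs appear as these literal strings in the consensus line.
gxg_column = consensus.find("GkG")
amk_column = consensus.find("AiK")

gxg_anchor = residue_at_column(query, 25, gxg_column)
gxg_plus_3 = residue_at_column(query, 25, gxg_column + 3)
amk_anchor = residue_at_column(query, 25, amk_column)
```

Under these assumptions the first "G" of GXG is residue 32 of the query and GXG+3 is the serine at residue 35. Offsets that cross a gap or an insertion would need to be counted along the query or the model consistently; this sketch simply counts alignment columns.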
Summary of SDRs
SDRs for serine/threonine and CMGC kinases are summarised below:
| Substrate position | SDRs (Ser/Thr kinases) | SDRs (CMGC kinases) |
|---|---|---|
| -3 | GEL+1, GEL+3, GEL+4 | GEL+1, GEL+3, PEN+1 |
| -2 | APE-2, APE-3, APE-5 | APE-2, APE-3, APE-5 |
| -1 | GXG+2, GXG+3 | GXG+2, GXG+3 |
| +1 | APE-1, APE-4, DFG+3 | APE-1, APE-4, DFG+3 |
| +2* | KE loop 12-17: AMK+10, AMK+11, AMK+12; KE loop 18-20: AMK+12, AMK+13, AMK+14; KE loop < 12 or > 20: E-5, E-6, E-7 | GXG+3, GXG+4 |
| +3 | APE-8, APE-9 | APE-9, APE-10 |
* The SDRs that influence the +2 position in substrates of serine/threonine kinases are dependent on a variable termed the KE loop length. This is the length of the region between the conserved AMK motif and a conserved Glu residue C-terminal of AMK. The SDRs for +2 are located based on KE loop lengths of 12-17, 18-20, "short" (< 12) or "long" (> 20).
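The KE loop rule for serine/threonine (non-CMGC) kinases can be written as a small lookup. This is a minimal sketch: the function name is hypothetical, the KE loop length is taken as an input because the exact counting convention is not spelled out here, and reading the "E-5" style labels as offsets from the conserved Glu is an assumption.

```python
def plus2_sdrs(ke_loop_length):
    """Return the SDR labels for substrate position +2 (Ser/Thr kinases).

    The ranges follow the summary table: 12-17, 18-20, and everything
    else ("short" < 12 or "long" > 20).
    """
    if 12 <= ke_loop_length <= 17:
        return ["AMK+10", "AMK+11", "AMK+12"]
    if 18 <= ke_loop_length <= 20:
        return ["AMK+12", "AMK+13", "AMK+14"]
    # Short or long loops: labels assumed to be relative to the conserved Glu.
    return ["E-5", "E-6", "E-7"]


labels_for_15 = plus2_sdrs(15)
labels_for_25 = plus2_sdrs(25)
```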
How does Predikin use SDRs to predict substrates?
Having determined the SDRs in your sequence, Predikin searches a database for kinases with similar SDRs, retrieves the substrates of those kinases and builds a scoring matrix from them. Please see the "Predikin: how it works" FAQ entry.
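The general idea can be sketched in a few lines of Python. This is a loose illustration only and not Predikin's actual algorithm: the toy database, the "count identical SDRs" similarity measure, the `min_shared` threshold and the additive score are all assumptions invented for this example.

```python
from collections import Counter

POSITIONS = (-3, -2, -1, 0, 1, 2, 3)


def shared_sdrs(query, other):
    """Count SDR labels at which two kinases carry the same residue."""
    return sum(1 for label, aa in query.items() if other.get(label) == aa)


def similar_kinases(query, database, min_shared):
    """Names of database kinases sharing at least min_shared SDRs with query."""
    return [name for name, entry in database.items()
            if shared_sdrs(query, entry["sdrs"]) >= min_shared]


def frequency_matrix(heptapeptides):
    """Per-position residue frequencies over a set of substrate heptapeptides."""
    matrix = {}
    for i, pos in enumerate(POSITIONS):
        counts = Counter(peptide[i] for peptide in heptapeptides)
        total = sum(counts.values())
        matrix[pos] = {aa: n / total for aa, n in counts.items()}
    return matrix


def score(matrix, peptide):
    """Sum the frequency of each residue of peptide at its position."""
    return sum(matrix[pos].get(aa, 0.0) for pos, aa in zip(POSITIONS, peptide))


# Hypothetical database: SDRs and substrate heptapeptides per kinase.
toy_db = {
    "kinase_A": {"sdrs": {"GEL+1": "E", "GEL+3": "F", "GEL+4": "D"},
                 "substrates": ["RRASIHD", "RKASLHD"]},
    "kinase_B": {"sdrs": {"GEL+1": "Q", "GEL+3": "L", "GEL+4": "K"},
                 "substrates": ["EDSSPEE"]},
}
query_sdrs = {"GEL+1": "E", "GEL+3": "F", "GEL+4": "N"}

hits = similar_kinases(query_sdrs, toy_db, min_shared=2)
peptides = [p for name in hits for p in toy_db[name]["substrates"]]
matrix = frequency_matrix(peptides)
candidate_score = score(matrix, "RRASIHD")
```

Only `kinase_A` shares enough SDRs with the query here, so the matrix is built from its two substrates, and a candidate site is scored by how well each of its residues matches the frequencies at positions -3 to +3.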
What about tyrosine kinases?
We have not determined SDRs for tyrosine kinases for several reasons. First, it is not clear that tyrosine kinase substrates exhibit the same level of conservation at the -3 to +3 positions as is observed for serine/threonine kinases. This may be because the large size of the Tyr residue results in a less-specific binding pocket. Second, there are very few available structures of tyrosine kinases bound to heptapeptide substrates.
Predikin provides two methods for generating substrate frequency matrices based on kinase family, the KSD and PANTHER methods, both of which are applicable to tyrosine kinases.