Overview
A dot matrix plot gives a global picture of the local similarities between two sequences. Such plots are appropriate:
- for comparing large sequences (several thousand residues)
- if one does not know in advance whether two sequences share detectable similarity or which parts of the sequences are related to each other.
They are useful for:
- detection of repeats within protein sequences
- detection of shared domains between protein sequences
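The principle behind a dot matrix plot fits in a few lines of code. The following Python sketch places a dot at position (i, j) whenever a window of residues starting at position i in the first sequence and at position j in the second contains at least a given number of identical residues at matching offsets. The function names and the `window` and `threshold` parameters are illustrative assumptions (a sliding window with a stringency cut-off is a common way to suppress random single-residue matches), not the settings or internals of the applet used below.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 1, threshold: int = 1) -> set[tuple[int, int]]:
    """Return the 0-based (i, j) positions where a dot is drawn.

    A dot is placed at (i, j) when seq_a[i:i+window] and seq_b[j:j+window]
    have at least `threshold` identical residues at the same offsets.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identities between the two windows, residue by residue.
            matches = sum(1 for k in range(window) if seq_a[i + k] == seq_b[j + k])
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(seq_a: str, seq_b: str, dots: set[tuple[int, int]]) -> str:
    """Render the plot as text: rows follow seq_a, columns follow seq_b."""
    lines = ["  " + seq_b]
    for i, residue in enumerate(seq_a):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    a, b = "MKVLAAGMKVL", "GMKVLT"
    print(render(a, b, dot_plot(a, b, window=3, threshold=3)))
```

In the self-comparison `dot_plot("ABCAB", "ABCAB", window=2, threshold=2)`, the dots at (0, 3) and (3, 0) lie off the main diagonal and reveal the repeated pair "AB"; internal repeats in a real protein show up the same way, as diagonals parallel to the main one.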
Exercise
We suggest using this free, publicly available Java applet to get familiar with dot matrix plots and to try drawing one yourself:
Once the applet has loaded, press the "Input" button and enter the protein sequences of your choice. SwissProt is a good database with a large collection of protein sequences, but for this exercise we provide three protein sequences that are ready to paste:
>sp|P06239|LCK_HUMAN (Name=LCK;)Proto-oncogene tyrosine-protein kinase LCK
GCGCSSHPEDDWMENIDVCENCHYPIVPLDGKGTLLIRNGSEVRDPLVTYEGSNPPASPLQDNLVIALHSYEPSHDGDLG
FEKGEQLRILEQSGEWWKAQSLTTGQEGFIPFNFVAKANSLEPEPWFFKNLSRKDAERQLLAPGNTHGSFLIRESESTAG
SFSLSVRDFDQNQGEVVKHYKIRNLDNGGFYISPRITFPGLHELVRHYTNASDGLCTRLSRPCQTQKPQKPWWEDEWEVP
RETLKLVERLGAGQFGEVWMGYYNGHTKVAVKSLKQGSMSPDAFLAEANLMKQLQHQRLVRLYAVVTQEPIYIITEYMEN
GSLVDFLKTPSGIKLTINKLLDMAAQIAEGMAFIEERNYIHRDLRAANILVSDTLSCKIADFGLARLIEDNEYTAREGAK
FPIKWTAPEAINYGTFTIKSDVWSFGILLTEIVTHGRIPYPGMTNPEVIQNLERGYRMVRPDNCPEELYQLMRLCWKERP
EDRPTFDYLRSVLEDFFTATEGQYQPQP
>sp|P16333|NCK1_HUMAN (Name=NCK1;..)Cytoplasmic protein NCK1 (NCK adaptor ...
MAEEVVVVAKFDYVAQQEQELDIKKNERLWLLDDSKSWWRVRNSMNKTGFVPSNYVERKNSARKASIVKNLKDTLGIGKV
KRKPSVPDSASPADDSFVDPGERLYDLNMPAYVKFNYMAEREDELSLIKGTKVIVMEKCSDGWWRGSYNGQVGWFPSNYV
TEEGDSPLGDHVGSLSEKLAAVVNNLNTGQVLHVVQALYPFSSSNDEELNFEKGDVMDVIEKPENDPEWWKCRKINGMVG
LVPKNYVTVMQNNPLTSGLEPSPPQCDYIRPSLTGKFAGNPWYYGKVTRHQAEMALNERGHEGDFLIRDSESSPNDFSVS
LKAQGKNKHFKVQLKETVYCIGQRKFSTMEELVEHYKKAPIFTSEQGEKLYLVKHLS
>sp|P15498|VAV_HUMAN (Name=VAV1;..)Vav proto-oncogene.[Homo sapiens]
MELWRQCTHWLIQCRVLPPSHRVTWDGAQVCELAQALRDGVLLCQLLNNLLPHAINLREVNLRPQMSQFLCLKNIRTFLS
TCCEKFGLKRSELFEAFDLFDVQDFGKVIYTLSALSWTPIAQNRGIMPFPTEEESVGDEDIYSGLSDQIDDTVEEDEDLY
DCVENEEAEGDEIYEDLMRSEPVSMPPKMTEYDKRCCCLREIQQTEEKYTDTLGSIQQHFLKPLQRFLKPQDIEIIFINI
EDLLRVHTHFLKEMKEALGTPGAANLYQVFIKYKERFLVYGRYCSQVESASKHLDRVAAAREDVQMKLEECSQRANNGRF
TLRDLLMVPMQRVLKYHLLLQELVKHTQEAMEKENLRLALDAMRDLAQCVNEVKRDNETLRQITNFQLSIENLDQSLAHY
GRPKIDGELKITSVERRSKMDRYAFLLDKALLICKRRGDSYDLKDFVNLHSFQVRDDSSGDRDNKKWSHMFLLIEDQGAQ
GYELFFKTRELKKKWMEQFEMAISNIYPENATANGHDFQMFSFEETTSCKACQMLLRGTFYQGYRCHRCRASAHKECLGR
VPPCGRHGQDFPGTMKKDKLHRRAQDKKRNELGLPKMEVFQEYYGLPPPPGAIGPFLRLNPGDIVELTKAEAEQNWWEGR
NTSTNEIGWFPCNRVKPYVHGPPQDLSVHLWYAGPMERAGAESILANRSDGTFLVRQRVKDAAEFAISIKYNVEVKHIKI
MTAEGLYRITEKKAFRGLTELVEFYQQNSLKDCFKSLDTTLQFPFKEPEKRTISRPAVGSTKYFGTAKARYDFCARDRSE
LSLKEGDIIKILNKKGQQGWWRGEIYGRVGWFPANYVEEDYSEYC
In the dialog box that appears, first enter a short name for the protein so that you can easily remember which protein is which, then paste one of the protein sequences above, or one taken from the SwissProt database. When copying, do not include the first line describing the protein: start your selection at the first amino-acid letter.
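The copy-paste rule above amounts to dropping the FASTA header line (the one starting with `>`) and keeping only the residue letters. A minimal Python sketch of that step, assuming the text is in FASTA format like the three entries above; `parse_fasta` is a hypothetical helper for illustration and is not part of the applet:

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without the leading '>') to its sequence.

    Header lines are kept only as keys; sequence lines are concatenated,
    upper-cased, and stripped of anything that is not a letter.
    Sequence lines that appear before the first header are ignored.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            # Drop spaces, digits and other non-letter characters.
            chunks.append("".join(ch for ch in line.upper() if ch.isalpha()))
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

For example, `parse_fasta(">demo\nMKV LAA\nGMK\n")` returns `{"demo": "MKVLAAGMK"}`: the header is set aside and only the amino-acid letters remain, which is what should be pasted into the dialog box.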
Now press the "OK" button; the dialog box should clear, allowing you to repeat the process with another sequence. Unless you want to compare a sequence with itself, which is of limited interest here and only produces a recurrence plot, you should load a second sequence as well.
Finally, press the "Compute" button to draw the plot and explore it.
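When exploring the plot, regions shared by the two sequences appear as diagonal runs of dots. The Python sketch below assumes the plot is available as a set of 0-based (i, j) dot positions, a hypothetical representation rather than the applet's internal format, and lists every diagonal run of at least `min_length` consecutive dots together with where it starts in each sequence.

```python
def diagonal_runs(dots: set[tuple[int, int]], min_length: int = 3) -> list[tuple[int, int, int]]:
    """Return (start_a, start_b, length) for each maximal diagonal run.

    A run is a series of dots (i, j), (i+1, j+1), ..., (i+L-1, j+L-1)
    that cannot be extended at either end.
    """
    runs = []
    for i, j in sorted(dots):
        if (i - 1, j - 1) in dots:
            continue  # (i, j) continues an earlier run, so it is not a start
        length = 1
        while (i + length, j + length) in dots:
            length += 1
        if length >= min_length:
            runs.append((i, j, length))
    return runs
```

If the dots were computed with a sliding window of w residues, a run of L dots starting at (i, j) corresponds to a similar segment of about L + w - 1 residues starting at position i in the first sequence and position j in the second.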
Questions
Do any of these proteins seem related to each other? Which regions do they share? Is this consistent with the repeat and domain architectures annotated in the SwissProt database for these three proteins (links provided)?