Assignment 2

Research and obtain a protein sequence, then BLAST it against the NCBI database to get a list of proteins from other organisms. Next, pare the list down to 10 species, compile a single FASTA file, perform an alignment, compute the genetic distances, and produce a tree.

I selected transketolase as my protein of interest. This protein is important for all living organisms and plays key roles in multiple pathways. Here is a ribbon structure of the protein and a diagram of its mechanism (source):

The following is the amino acid sequence of transketolase 1 from Saccharomyces cerevisiae, taken from the UniProt database (uniprot.org).

>sp|P23254|TKT1_YEAST Transketolase 1 OS=Saccharomyces cerevisiae
MTQFTDIDKLAVSTIRILAVDTVSKANSGHPGAPLGMAPAAHVLWSQMRMNPTNPDWINR
DRFVLSNGHAVALLYSMLHLTGYDLSIEDLKQFRQLGSRTPGHPEFELPGVEVTTGPLGQ
GISNAVGMAMAQANLAATYNKPGFTLSDNYTYVFLGDGCLQEGISSEASSLAGHLKLGNL
IAIYDDNKITIDGATSISFDEDVAKRYEAYGWEVLYVENGNEDLAGIAKAIAQAKLSKDK
PTLIKMTTTIGYGSLHAGSHSVHGAPLKADDVKQLKSKFGFNPDKSFVVPQEVYDHYQKT
ILKPGVEANNKWNKLFSEYQKKFPELGAELARRLSGQLPANWESKLPTYTAKDSAVATRK
LSETVLEDVYNQLPELIGGSADLTPSNLTRWKEALDFQPPSSGSGNYSGRYIRYGIREHA
MGAIMNGISAFGANYKPYGGTFLNFVSYAAGAVRLSALSGHPVIWVATHDSIGVGEDGPT
HQPIETLAHFRSLPNIQVWRPADGNEVSAAYKNSLESKHTPSIIALSRQNLPQLEGSSIE
SASKGGYVLQDVANPDIILVATGSEVSLSVEAAKTLAAKNIKARVVSLPDFFTFDKQPLE
YRLSVLPDNVPIMSVEVLATTCWGKYAHQSFGIDRFGASGKAPEVFKFFGFTPEGVAERA
QKTIAFYKGDKLISPLKKAF

This amino acid sequence was then used as the query for a BLASTp search on the NCBI web page. Because the results showed extremely high levels of sequence similarity, the NCBI Protein database was instead queried for the term “transketolase”, and ten proteins were selected to represent a variety of organisms. The FASTA files for these proteins were downloaded and compiled into the following list:

>Deinococcus maricopensis
MSVEQLSVNTIRTLSIDGVQQANSGHPGAPLGAAPMAYVLWQDFLRFNPKNPTWPGRDRFVLSPGHASML
IYSLLHLTGYDMSLDELKNFRQWGSKTPGHPEFFHTDGLDATTGPLGQGAAMTVGMAIAEAHLAARYNRP
EHEVFDNYTYAIVSDGDLQEGVNHEVASLAGHLKLHKLIWLYDDNDVQLDTATSKTFTDDTTKRYESYGW
NVLMVEDGNDLQAIRDAIKTAQTSDKPTLIRVKTVIGFGSPRAGTSKAHGEPLGADGVAATKEALGWTYP
PFTVPDEVRAHMDATERGAQFEAQWQAKQDAYRAAHPDLAAELDTMLQRGLPADLADKLPTFDVGGKALA
TRAASGKVINAVAESVPGLMGGSADLSGSTKTTIEAQGAMQPGDMGQRNVYFGVREFGMSAIANGMSLYG
GLRPMVGTFLVFADYLKPALRLSALQMQPVIYVLTHDSIGLGEDGPTHQPIEQIASLRATPHTHVYRPAD
ANETAAVWQMALERKDGPSVLALSRQDLPILPRNASGVRKGAYVVRDAEHAQVILIATGSEVAVALEGAD
ALASEGIGARVVSMPSMEVFREQDRSYIDSILTPGVKRVAIEAASPLGWHEWTGADGAVIAMQGFGASAP
AKTLYEKFGFSAQNIVKVVKGLL

>Pseudanabaena sp.
MAVATQTLEQLCINTIRFLSIDAVEKAKSGHPGLPMGAAPMAFTLFDRYLKFNPKNPKWVDRDRFVLSAG
HGCMLQYSLLHLTGYDSVPLDQIKQFRQWGSVTPGHPENFETAGVEVTTGPLGQGVGNAVGLAIAEAHLA
ARFNKPGHNIVDHYTYVILGDGCNMEGVASEAASLAGHLKLGKLIMMYDDNHISIDGSTDLAFTEDVGKR
YEAYGWHVQYVKEGNEDLDGIAKAIEAAKTISDKPSLIVVTTTIGYGSPGKAGTAGVHGAALGGDEVVAT
RKNLGWEYEPFEVPEDALKRFRTAIDKGATAEAAWNDRFAAYEKAYPAEAAQFKQMTAGELPDGWQKALE
PIKQNEKSTRLLSQDCLNALMPVLPGLLGGSADLAHSNMTVLKDYPDFQAGTYAGRNFRFGVREHGMGAV
LNGMDLHGGLVPYGATFLVFADYMRGAIRLSALSETGVIYIMTHDSIMLGEDGPTHQPVETLASLRAIPN
LLVLRPADANETVGSYEVAIASRKRPSLLAFTRQGVKNLAGTSSEGVKKGGYTVVEAANPDLILIATGSE
LALAVNAAESLKGEGKSVRVVSLPCWKLFDEQPQAYRDSVLTPGTKRVSVEASASFGWHKYVGSEGATVS
IDTFGASAPGPTCYKEFGFTVENVIATCKKVLG

>Cyanobacterium aponinum
MVVATQSIQELCINAIRFLSIDGVEKAKSGHPGLPMGAAPMAFVLWDQFMKFNPKNPQWFNRDRFILSAG
HGSMLQYSLLHLYGYDSVTIEDIKQFRQWKSKTPGHPENFVTAGIEVTTGPLGQGIANGVGLALAEAHLA
AKFNKPDATIVDHYTYVILGDGCNMEGISGEAASLAGHWGLGKLIALYDDNHISIDGSTDIAFTEDVCKR
YEAYGWHVQHVENGNTDLEAIASAIEAAKAVTDKPSLIKVTTTIGYGSPNKADTAGVHGAALGADEVALT
RKELGWNYDPFVVPEEVYNHFHKAIERGASLQAEWEETFATYKTKYPAEATEFENQISGKLPENWADCLP
SYTPEDKALASRKHSEICLNAIAPVLPQLVGGSADLTHSNLTEIHCSGDFQKGAYENRNIHFGVREHAMG
AICNGIALHNSGLIPYGATFLVFTDYMRNSIRLSALSEAKVIWVMTHDSIALGEDGPTHQPVEHVMSLRM
IPDLLVFRPADGNETSGAYKVAIEADKTPSLMALTRQGLPNLAGSSIDAVAKGGYVLSCGFAPEELDLIL
IGTGSEVGLCVEAAEKLKAEGLKVRVVSMPCVELFDQQDEAYKESVLPKSVKKRISVEAGVTYGWERFVG
DEGVCIGINTFGASAPGGVVMEKFGFTVDNVVAQAKAILG

>Mus musculus
MEGYHKPDQQKLQALKDTANRLRISSIQATTAAGSGHPTSCCSAAEIMAVLFFHTMRYKALDPRNPHNDR
FVLSKGHAAPILYAVWAEAGFLPEAELLNLRKISSDLDGHPVPKQAFTDVATGSLGQGLGAACGMAYTGK
YFDKASYRVYCMLGDGEVSEGSVWEAMAFAGIYKLDNLVAIFDINRLGQSDPAPLQHQVDIYQKRCEAFG
WHTIIVDGHSVEELCKAFGQAKHQPTAIIAKTFKGRGITGIEDKEAWHGKPLPKNMAEQIIQEIYSQVQS
KKKILATPPQEDAPSVDIANIRMPTPPSYKVGDKIATRKAYGLALAKLGHASDRIIALDGDTKNSTFSEL
FKKEHPDRFIECYIAEQNMVSIAVGCATRDRTVPFCSTFAAFFTRAFDQIRMAAISESNINLCGSHCGVS
IGEDGPSQMALEDLAMFRSVPMSTVFYPSDGVATEKAVELAANTKGICFIRTSRPENAIIYSNNEDFQVG
QAKVVLKSKDDQVTVIGAGVTLHEALAAAESLKKDKISIRVLDPFTIKPLDRKLILDSARATKGRILTVE
DHYYEGGIGEAVSAAVVGEPGVTVTRLAVSQVPRSGKPAELLKMFGIDKDAIVQAVKGLVTKG

>Arabidopsis thaliana
MASTSSLALSQALLARAISHHGSDQRGSLPAFSGLKSTGSRASASSRRRIAQSMTKNRSLRPLVRAAAVE
TVEPTTDSSIVDKSVNSIRFLAIDAVEKAKSGHPGLPMGCAPMAHILYDEVMRYNPKNPYWFNRDRFVLS
AGHGCMLLYALLHLAGYDSVQEDLKQFRQWGSKTPGHPENFETPGIEVTTGPLGQGIANAVGLALAEKHL
AARFNKPDAEVVDHYTYAILGDGCQMEGISNEACSLAGHWGLGKLIAFYDDNHISIDGDTEIAFTENVDQ
RFEALGWHVIWVKNGNTGYDEIRAAIKEAKTVTDKPTLIKVTTTIGYGSPNKANSYSVHGAALGEKEVEA
TRNNLGWPYEPFQVPDDVKSHWSRHTPEGATLESDWSAKFAAYEKKYPEEASELKSIITGELPAGWEKAL
PTYTPESPGDATRNLSQQCLNALAKVVPGFLGGSADLASSNMTLLKAFGDFQKATPEERNLRFGVREHGM
GAICNGIALHSPGLIPYCATFFVFTDYMRGAMRISALSEAGVIYVMTHDSIGLGEDGPTHQPIEHIASFR
AMPNTLMFRPADGNETAGAYKIAVTKRKTPSILALSRQKLPHLPGTSIEGVEKGGYTISDDSSGNKPDVI
LIGTGSELEIAAQAAEVLRKDGKTVRVVSFVCWELFDEQSDEYKESVLPSDVSARVSIEAASTFGWGKIV
GGKGKSIGINSFGASAPAPLLYKEFGITVEAVVDAAKSFF

>Xenopus laevis
MADYHKPDQQTLQALRDTANRLRVLSIKATSAAGSGHPTSCCSAAEIMSVLFFHTMKYKPKDPRNPNNDR
FVMSKGHAAPILYAAWSEAGFLQESELLNLRKLDSILEGHPVPKQEFVDVATGSLGQGLGAACGMAYTGK
FFDKASYRVFCLLGDGEVSEGSVWEAMAFAGFYKLDNLVAIFDVNRLGQSDPAPLQHKVEVYQKRCEAFG
WHSVVVDGHSVEELCKAFCHVKNQPTAIIAKTFKGKGISGVEDKENWHGKPLPKELAEQSIKEIEGKIQS
KKKLSPALPVEDAPVISIKNIKMPSPPSYKLGEKIATRKAYGLALAKLGHANDRVIALDGDTKNSTFSEL
FKKEHPGRYIECYIAEQNMVSVAIGSTTRDRTVAFASAFATFFSRAYDQIRMAAISESNINLCGSHCGVS
IGEDGPSQMGLEDLAMFRAVPTATVFYPSDAVSTEKAVELAANTKGICFIRTSRPEDAVIYSSTEEFKIG
HAKVVAQNKDDQVTVIGAGVTLHEALAAAEQLKKEKIHIRVIDPFTIKPLDKKTIVENAKATNGHIITVE
DHYHEGGIGEAVAAAVVGVPGITLKSLAVSHVPRSGKPTELLRMFEIDKEAIVAAVKGLVSHATNSK

>Sinorhizobium meliloti
MNVSQQIGPRAAASERSMADAIRFLSMDAVEKANSGHPGMPMGMADAVTVLFNRFIRIDPSHPDWPDRDR
FVLSAGHGSMLLYSLHHLIGFADMPMAELSSFRQLGSKTAGHPEYGHALGIETTTGPLGQGISTAVGMAI
AEQMMAARFGSALCNHFTYVVAGDGCLQEGISHEAIDLAGHLKLRKLVVLWDDNRISIDGSTDLSTSMNQ
LARFRAAGWDAQAVDGHDPDAAAKAIERARRTRKPSLIACRTRIGKGAASMEGSHKTHGAALGEKEIAAT
REKLGWPHPPFFVPPEIKAAWEKVATRGRTAREAWEIRLDASRSKKRYEQTVERQLDGEVGDLLARFRGA
HRTRATKVATRQASQMALEVINGATALTIGGSADLTGSNLTLTSQTQPISPGNFKGRYLHYGIREHGMAA
AMNGIALHGGFIPYGGTFLVFSDYARGAMRLSALMGLPVIYVLTHDSIGLGEDGPTHQPVEHLAMLRATP
NLNVFRPADIIETAECWEIAIGEKNTPSVLALSRQALPMLRRTDGNENLSALGAYVLREARGDRDITLLA
TGSEVEIAVAAAERLQAEERIAAAVVSMPCWEKFEAQDAAYQRQVLGDAPRIAIEAAGRLGWDRWMGPDS
AFVGMTGFGASAPAGDLYRHFGITADHVVAEALELLRRACPETPPIGARTGKPVAHIVRSSEEA

>Homo sapiens
MESYHKPDQQKLQALKDTANRLRISSIQATTAAGSGHPTSCCSAAEIMAVLFFHTMRYKSQDPRNPHNDR
FVLSKGHAAPILYAVWAEAGFLAEAELLNLRKISSDLDGHPVPKQAFTDVATGSLGQGLGAACGMAYTGK
YFDKASYRVYCLLGDGELSEGSVWEAMAFASIYKLDNLVAILDINRLGQSDPAPLQHQMDIYQKRCEAFG
WHAIIVDGHSVEELCKAFGQAKHQPTAIIAKTFKGRGITGVEDKESWHGKPLPKNMAEQIIQEIYSQIQS
KKKILATPPQEDAPSVDIANIRMPSLPSYKVGDKIATRKAYGQALAKLGHASDRIIALDGDTKNSTFSEI
FKKEHPDRFIECYIAEQNMVSIAVGCATRNRTVPFCSTFAAFFTRAFDQIRMAAISESNINLCGSHCGVS
IGEDGPSQMALEDLAMFRSVPTSTVFYPSDGVATEKAVELAANTKGICFIRTSRPENAIIYNNNEDFQVG
QAKVVLKSKDDQVTVIGAGVTLHEALAAAELLKKEKINIRVLDPFTIKPLDRKLILDSARATKGRILTVE
DHYYEGGIGEAVSSAVVGEPGITVTHLAVNRVPRSGKPAELLKMFGIDRDAIAQAVRGLITKA

>Plasmodium cynomolgi strain B
MNGEIDQKCINEIRMLSAELPLKANSGHQGAPIGCAPIAHILWAYVMNYYNEDTKWMNRDRFVLSNGHAS
ALLYTMLYLTKQGLTMEDLKNFRQLESLTPGHPEKHITKGVEVTTGPLGQGASNAVGMAICAHNLAEKYN
TKEFPIFDNYIYAMCGRLILLYDDNKITIDGNTELSFTENIEKKFEALKWEVRKVANGNTDFEGILTQIE
EAKKNTKQPSLIIVQTACGYGTKVEGTCKSHGLALKEEDLKKAKLFFGLDPEKQFHISEEVKKFYENIVQ
KKKENYLKWKKMFCDFTVQYPEKAQEIMRRFSKELPHNWVEVLPKYTTLDAPGATRNLSGVALNCINKVL
PELIGGSADLTESNCTALKEEKDICRDSFANKYIRYGVREHGMVAITNGIYAYGGFEPFCATFLNFYTYA
FGALRLAALSQYHIFCIATHDSVELGEDGPTHQPVEVLALLRATPNLNVIRPADGNEVSGAYLCHFTNPK
TPTLVALCRNKVPHLKNTSAEQVLKGAYVLEDFDNANDKQKVILAGCGSELHLCFDAKKILTEQHNLNVR
IVSFPSWNLFRQQSEDYQQSVMMHHDAKVVRFYIEPASTYGFDTYFNVYLGINQFGYSAPKNKIWEHLGF
TSENIVCKVLAYIKAKMHP

>Plasmodium falciparum
MNIMDNEIDTKCINEIRMLSAELPLEAKSGHQGAPIGCAPIAHILWSYVMNYYNEDTKWINRDRFILSNG
HASALLYTMLYLTEQGLSMEDLKSFRQFGSLTPGHPENHITKGVEVTTGPLGQGASNAVGMAIAAHNLAD
KYNTEEHKIFDNYVYAICGDGCMQEGVFCEAASLAGHLGLGRLILLYDDNKITIDGNTDLSFTENIEKKF
EALNWEVRRVEDGNKDYKKILHEIEQGKKNLQQPTLIIVRTACGFGTKVEGTCKSHGLALNDEDLKNAKS
FFGLDPQKKFHISDEVKEFYKNVIQKKKENYIKWKNMFDDFSLKYPQVSQEIIRRFQNDLPNNWKDALPK
YTPKDAPGATRNLSGIVLNSINKIFPELIGGSADLSESNCTSLKEENDIKKNSYGNKYIRFGVREHGMVA
ITNGLYAYGGFKPYCGTFLNFYTYAFGALRLAALSNHHILCIATHDSVELGEDGPTHQPIEVLSLLRSTP
NLNIIRPADGNEVSGAYLSHFSNPHTPTVIALCRNKVPHLNNTQPEQVLKGAYILEDFDTSNNPKVILTG
SGSELHLCFEAKEILKNQHQLNVRIVSFPSWTLFKKQPEDYQYSVMMHNHPNLPRFYIEPASTHGFDTYF
NVYIGINQFGYSAPKNKIWEHLGFTPENIVQKVLAFMKNKLK
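
Before submitting a compiled FASTA file to an aligner, it can help to confirm that it parses into the expected number of records with unique names. The following is a minimal sketch in Python, not part of the original workflow; the function name parse_fasta and the two truncated example records are illustrative assumptions.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    parts_by_header = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            header = line[1:].strip()
            if header in parts_by_header:
                raise ValueError(f"duplicate header: {header}")
            parts_by_header[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            parts_by_header[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in parts_by_header.items()}


# Two records shortened to a few residues purely for illustration.
example = """>Deinococcus maricopensis
MSVEQLSVNT
IRTLSIDGVQ

>Pseudanabaena sp.
MAVATQTLEQ
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(f"{name}: {len(seq)} residues")
```

Run on the full list above, a check like this should report ten records; a lower count or a duplicate-header error would point to a copy-and-paste problem in the compiled file.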

This list of AA sequences was then run through the following pipeline on the BCBB section of the Mobyle @Pasteur web portal.

This pipeline used the aligner clustalw-multialign, the distance matrix program protdist, the phylogenetic analysis program neighbor, and finally the tree viewer newicktops. The tree below is the output when the 10 transketolase sequences are run through the pipeline.
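
To illustrate what the distance step produces, the sketch below computes a simple p-distance matrix (the fraction of differing residues at ungapped sites) from an alignment. This is a deliberately simpler stand-in and not what protdist does: protdist estimates distances under an amino acid substitution model, which corrects for multiple substitutions at the same site. The function names and the three-sequence toy alignment are made up for illustration.

```python
def p_distance(a, b, gap="-"):
    """Fraction of differing residues over sites where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned):
    """Return (names, matrix) of pairwise p-distances for a dict of aligned sequences."""
    names = list(aligned)
    matrix = [[p_distance(aligned[r], aligned[c]) for c in names] for r in names]
    return names, matrix


# Toy alignment, invented for illustration only.
toy = {"A": "MKT-LV", "B": "MKTALV", "C": "MRT-IV"}
names, matrix = distance_matrix(toy)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{d:.2f}" for d in row))
```

A symmetric matrix of this kind, with zeros on the diagonal, is the sort of input a neighbor-joining program turns into a tree.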

Note that one of the programs truncates the sequence names in its output file. To correct the tree, the names were repaired in the input file used to draw the final tree.
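
An alternative to repairing names by hand is to replace them with short unique IDs before running the pipeline and to map them back in the Newick output. The sketch below assumes the truncation comes from the 10-character name field of the classic PHYLIP format; the source does not say which program is responsible. Under that assumption, "Plasmodium cynomolgi strain B" and "Plasmodium falciparum" would both be cut to "Plasmodium" and become indistinguishable. The function names, the seqNNN ID scheme, and the example tree with its branch lengths are all hypothetical.

```python
import re


def make_short_ids(names):
    """Map short IDs (seq001, seq002, ...) to full names; IDs fit in 10 characters."""
    return {f"seq{i:03d}": name for i, name in enumerate(names, start=1)}


def restore_names(newick, mapping):
    """Replace short IDs in a Newick string with the full names.

    Spaces become underscores because whitespace is not allowed in
    unquoted Newick labels.
    """
    def substitute(match):
        short = match.group(0)
        full = mapping.get(short)
        return full.replace(" ", "_") if full is not None else short

    return re.sub(r"seq\d{3}", substitute, newick)


full_names = [
    "Deinococcus maricopensis",
    "Plasmodium cynomolgi strain B",
    "Plasmodium falciparum",
]
mapping = make_short_ids(full_names)

# Hypothetical tree with invented branch lengths, as a tree program might emit it.
tree = "(seq001:0.5,(seq002:0.1,seq003:0.1):0.4);"
restored = restore_names(tree, mapping)
print(restored)
```

The short IDs would be written as the FASTA headers before alignment, and the mapping kept alongside so the final tree can be relabeled automatically.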

b2gof15/students/bradbows/class_sessions/2015.10.1.txt · Last modified: 2015/10/13 13:18 by bradbows