The assignment: research and obtain a protein sequence, then BLAST the sequence against the NCBI database to get a list of proteins from other organisms. Next, pare the list down to 10 species, compile a single FASTA file, perform an alignment, compute the genetic distances, and produce a tree.
I selected transketolase as my protein of interest. This protein is important for all living organisms and plays key roles in multiple pathways. Here is a ribbon structure of the protein and a diagram of its mechanism (source):
The following is the amino acid sequence of the Saccharomyces cerevisiae transketolase, taken from the online database UniProt (uniprot.org).
sp|P23254|TKT1_YEAST Transketolase 1 OS=Saccharomyces cerevisiae
MTQFTDIDKLAVSTIRILAVDTVSKANSGHPGAPLGMAPAAHVLWSQMRMNPTNPDWINR DRFVLSNGHAVALLYSMLHLTGYDLSIEDLKQFRQLGSRTPGHPEFELPGVEVTTGPLGQ GISNAVGMAMAQANLAATYNKPGFTLSDNYTYVFLGDGCLQEGISSEASSLAGHLKLGNL IAIYDDNKITIDGATSISFDEDVAKRYEAYGWEVLYVENGNEDLAGIAKAIAQAKLSKDK PTLIKMTTTIGYGSLHAGSHSVHGAPLKADDVKQLKSKFGFNPDKSFVVPQEVYDHYQKT ILKPGVEANNKWNKLFSEYQKKFPELGAELARRLSGQLPANWESKLPTYTAKDSAVATRK LSETVLEDVYNQLPELIGGSADLTPSNLTRWKEALDFQPPSSGSGNYSGRYIRYGIREHA MGAIMNGISAFGANYKPYGGTFLNFVSYAAGAVRLSALSGHPVIWVATHDSIGVGEDGPT HQPIETLAHFRSLPNIQVWRPADGNEVSAAYKNSLESKHTPSIIALSRQNLPQLEGSSIE SASKGGYVLQDVANPDIILVATGSEVSLSVEAAKTLAAKNIKARVVSLPDFFTFDKQPLE YRLSVLPDNVPIMSVEVLATTCWGKYAHQSFGIDRFGASGKAPEVFKFFGFTPEGVAERA QKTIAFYKGDKLISPLKKAF
This amino acid sequence was then used as the query in a BLASTp search on the NCBI web page. Because the hits showed extremely high levels of sequence similarity, the NCBI Protein database was instead queried for the term “transketolase”, and ten proteins were selected to represent a variety of organisms. The FASTA files for the proteins were downloaded and compiled into the following list:
>Deinococcus maricopensis MSVEQLSVNTIRTLSIDGVQQANSGHPGAPLGAAPMAYVLWQDFLRFNPKNPTWPGRDRFVLSPGHASML IYSLLHLTGYDMSLDELKNFRQWGSKTPGHPEFFHTDGLDATTGPLGQGAAMTVGMAIAEAHLAARYNRP EHEVFDNYTYAIVSDGDLQEGVNHEVASLAGHLKLHKLIWLYDDNDVQLDTATSKTFTDDTTKRYESYGW NVLMVEDGNDLQAIRDAIKTAQTSDKPTLIRVKTVIGFGSPRAGTSKAHGEPLGADGVAATKEALGWTYP PFTVPDEVRAHMDATERGAQFEAQWQAKQDAYRAAHPDLAAELDTMLQRGLPADLADKLPTFDVGGKALA TRAASGKVINAVAESVPGLMGGSADLSGSTKTTIEAQGAMQPGDMGQRNVYFGVREFGMSAIANGMSLYG GLRPMVGTFLVFADYLKPALRLSALQMQPVIYVLTHDSIGLGEDGPTHQPIEQIASLRATPHTHVYRPAD ANETAAVWQMALERKDGPSVLALSRQDLPILPRNASGVRKGAYVVRDAEHAQVILIATGSEVAVALEGAD ALASEGIGARVVSMPSMEVFREQDRSYIDSILTPGVKRVAIEAASPLGWHEWTGADGAVIAMQGFGASAP AKTLYEKFGFSAQNIVKVVKGLL >Pseudanabaena sp. MAVATQTLEQLCINTIRFLSIDAVEKAKSGHPGLPMGAAPMAFTLFDRYLKFNPKNPKWVDRDRFVLSAG HGCMLQYSLLHLTGYDSVPLDQIKQFRQWGSVTPGHPENFETAGVEVTTGPLGQGVGNAVGLAIAEAHLA ARFNKPGHNIVDHYTYVILGDGCNMEGVASEAASLAGHLKLGKLIMMYDDNHISIDGSTDLAFTEDVGKR YEAYGWHVQYVKEGNEDLDGIAKAIEAAKTISDKPSLIVVTTTIGYGSPGKAGTAGVHGAALGGDEVVAT RKNLGWEYEPFEVPEDALKRFRTAIDKGATAEAAWNDRFAAYEKAYPAEAAQFKQMTAGELPDGWQKALE PIKQNEKSTRLLSQDCLNALMPVLPGLLGGSADLAHSNMTVLKDYPDFQAGTYAGRNFRFGVREHGMGAV LNGMDLHGGLVPYGATFLVFADYMRGAIRLSALSETGVIYIMTHDSIMLGEDGPTHQPVETLASLRAIPN LLVLRPADANETVGSYEVAIASRKRPSLLAFTRQGVKNLAGTSSEGVKKGGYTVVEAANPDLILIATGSE LALAVNAAESLKGEGKSVRVVSLPCWKLFDEQPQAYRDSVLTPGTKRVSVEASASFGWHKYVGSEGATVS IDTFGASAPGPTCYKEFGFTVENVIATCKKVLG >Cyanobacterium aponinum MVVATQSIQELCINAIRFLSIDGVEKAKSGHPGLPMGAAPMAFVLWDQFMKFNPKNPQWFNRDRFILSAG HGSMLQYSLLHLYGYDSVTIEDIKQFRQWKSKTPGHPENFVTAGIEVTTGPLGQGIANGVGLALAEAHLA AKFNKPDATIVDHYTYVILGDGCNMEGISGEAASLAGHWGLGKLIALYDDNHISIDGSTDIAFTEDVCKR YEAYGWHVQHVENGNTDLEAIASAIEAAKAVTDKPSLIKVTTTIGYGSPNKADTAGVHGAALGADEVALT RKELGWNYDPFVVPEEVYNHFHKAIERGASLQAEWEETFATYKTKYPAEATEFENQISGKLPENWADCLP SYTPEDKALASRKHSEICLNAIAPVLPQLVGGSADLTHSNLTEIHCSGDFQKGAYENRNIHFGVREHAMG AICNGIALHNSGLIPYGATFLVFTDYMRNSIRLSALSEAKVIWVMTHDSIALGEDGPTHQPVEHVMSLRM IPDLLVFRPADGNETSGAYKVAIEADKTPSLMALTRQGLPNLAGSSIDAVAKGGYVLSCGFAPEELDLIL 
IGTGSEVGLCVEAAEKLKAEGLKVRVVSMPCVELFDQQDEAYKESVLPKSVKKRISVEAGVTYGWERFVG DEGVCIGINTFGASAPGGVVMEKFGFTVDNVVAQAKAILG >Mus musculus MEGYHKPDQQKLQALKDTANRLRISSIQATTAAGSGHPTSCCSAAEIMAVLFFHTMRYKALDPRNPHNDR FVLSKGHAAPILYAVWAEAGFLPEAELLNLRKISSDLDGHPVPKQAFTDVATGSLGQGLGAACGMAYTGK YFDKASYRVYCMLGDGEVSEGSVWEAMAFAGIYKLDNLVAIFDINRLGQSDPAPLQHQVDIYQKRCEAFG WHTIIVDGHSVEELCKAFGQAKHQPTAIIAKTFKGRGITGIEDKEAWHGKPLPKNMAEQIIQEIYSQVQS KKKILATPPQEDAPSVDIANIRMPTPPSYKVGDKIATRKAYGLALAKLGHASDRIIALDGDTKNSTFSEL FKKEHPDRFIECYIAEQNMVSIAVGCATRDRTVPFCSTFAAFFTRAFDQIRMAAISESNINLCGSHCGVS IGEDGPSQMALEDLAMFRSVPMSTVFYPSDGVATEKAVELAANTKGICFIRTSRPENAIIYSNNEDFQVG QAKVVLKSKDDQVTVIGAGVTLHEALAAAESLKKDKISIRVLDPFTIKPLDRKLILDSARATKGRILTVE DHYYEGGIGEAVSAAVVGEPGVTVTRLAVSQVPRSGKPAELLKMFGIDKDAIVQAVKGLVTKG >Arabidopsis thaliana MASTSSLALSQALLARAISHHGSDQRGSLPAFSGLKSTGSRASASSRRRIAQSMTKNRSLRPLVRAAAVE TVEPTTDSSIVDKSVNSIRFLAIDAVEKAKSGHPGLPMGCAPMAHILYDEVMRYNPKNPYWFNRDRFVLS AGHGCMLLYALLHLAGYDSVQEDLKQFRQWGSKTPGHPENFETPGIEVTTGPLGQGIANAVGLALAEKHL AARFNKPDAEVVDHYTYAILGDGCQMEGISNEACSLAGHWGLGKLIAFYDDNHISIDGDTEIAFTENVDQ RFEALGWHVIWVKNGNTGYDEIRAAIKEAKTVTDKPTLIKVTTTIGYGSPNKANSYSVHGAALGEKEVEA TRNNLGWPYEPFQVPDDVKSHWSRHTPEGATLESDWSAKFAAYEKKYPEEASELKSIITGELPAGWEKAL PTYTPESPGDATRNLSQQCLNALAKVVPGFLGGSADLASSNMTLLKAFGDFQKATPEERNLRFGVREHGM GAICNGIALHSPGLIPYCATFFVFTDYMRGAMRISALSEAGVIYVMTHDSIGLGEDGPTHQPIEHIASFR AMPNTLMFRPADGNETAGAYKIAVTKRKTPSILALSRQKLPHLPGTSIEGVEKGGYTISDDSSGNKPDVI LIGTGSELEIAAQAAEVLRKDGKTVRVVSFVCWELFDEQSDEYKESVLPSDVSARVSIEAASTFGWGKIV GGKGKSIGINSFGASAPAPLLYKEFGITVEAVVDAAKSFF >Xenopus laevis MADYHKPDQQTLQALRDTANRLRVLSIKATSAAGSGHPTSCCSAAEIMSVLFFHTMKYKPKDPRNPNNDR FVMSKGHAAPILYAAWSEAGFLQESELLNLRKLDSILEGHPVPKQEFVDVATGSLGQGLGAACGMAYTGK FFDKASYRVFCLLGDGEVSEGSVWEAMAFAGFYKLDNLVAIFDVNRLGQSDPAPLQHKVEVYQKRCEAFG WHSVVVDGHSVEELCKAFCHVKNQPTAIIAKTFKGKGISGVEDKENWHGKPLPKELAEQSIKEIEGKIQS KKKLSPALPVEDAPVISIKNIKMPSPPSYKLGEKIATRKAYGLALAKLGHANDRVIALDGDTKNSTFSEL FKKEHPGRYIECYIAEQNMVSVAIGSTTRDRTVAFASAFATFFSRAYDQIRMAAISESNINLCGSHCGVS 
IGEDGPSQMGLEDLAMFRAVPTATVFYPSDAVSTEKAVELAANTKGICFIRTSRPEDAVIYSSTEEFKIG HAKVVAQNKDDQVTVIGAGVTLHEALAAAEQLKKEKIHIRVIDPFTIKPLDKKTIVENAKATNGHIITVE DHYHEGGIGEAVAAAVVGVPGITLKSLAVSHVPRSGKPTELLRMFEIDKEAIVAAVKGLVSHATNSK >Sinorhizobium meliloti MNVSQQIGPRAAASERSMADAIRFLSMDAVEKANSGHPGMPMGMADAVTVLFNRFIRIDPSHPDWPDRDR FVLSAGHGSMLLYSLHHLIGFADMPMAELSSFRQLGSKTAGHPEYGHALGIETTTGPLGQGISTAVGMAI AEQMMAARFGSALCNHFTYVVAGDGCLQEGISHEAIDLAGHLKLRKLVVLWDDNRISIDGSTDLSTSMNQ LARFRAAGWDAQAVDGHDPDAAAKAIERARRTRKPSLIACRTRIGKGAASMEGSHKTHGAALGEKEIAAT REKLGWPHPPFFVPPEIKAAWEKVATRGRTAREAWEIRLDASRSKKRYEQTVERQLDGEVGDLLARFRGA HRTRATKVATRQASQMALEVINGATALTIGGSADLTGSNLTLTSQTQPISPGNFKGRYLHYGIREHGMAA AMNGIALHGGFIPYGGTFLVFSDYARGAMRLSALMGLPVIYVLTHDSIGLGEDGPTHQPVEHLAMLRATP NLNVFRPADIIETAECWEIAIGEKNTPSVLALSRQALPMLRRTDGNENLSALGAYVLREARGDRDITLLA TGSEVEIAVAAAERLQAEERIAAAVVSMPCWEKFEAQDAAYQRQVLGDAPRIAIEAAGRLGWDRWMGPDS AFVGMTGFGASAPAGDLYRHFGITADHVVAEALELLRRACPETPPIGARTGKPVAHIVRSSEEA >Homo sapiens MESYHKPDQQKLQALKDTANRLRISSIQATTAAGSGHPTSCCSAAEIMAVLFFHTMRYKSQDPRNPHNDR FVLSKGHAAPILYAVWAEAGFLAEAELLNLRKISSDLDGHPVPKQAFTDVATGSLGQGLGAACGMAYTGK YFDKASYRVYCLLGDGELSEGSVWEAMAFASIYKLDNLVAILDINRLGQSDPAPLQHQMDIYQKRCEAFG WHAIIVDGHSVEELCKAFGQAKHQPTAIIAKTFKGRGITGVEDKESWHGKPLPKNMAEQIIQEIYSQIQS KKKILATPPQEDAPSVDIANIRMPSLPSYKVGDKIATRKAYGQALAKLGHASDRIIALDGDTKNSTFSEI FKKEHPDRFIECYIAEQNMVSIAVGCATRNRTVPFCSTFAAFFTRAFDQIRMAAISESNINLCGSHCGVS IGEDGPSQMALEDLAMFRSVPTSTVFYPSDGVATEKAVELAANTKGICFIRTSRPENAIIYNNNEDFQVG QAKVVLKSKDDQVTVIGAGVTLHEALAAAELLKKEKINIRVLDPFTIKPLDRKLILDSARATKGRILTVE DHYYEGGIGEAVSSAVVGEPGITVTHLAVNRVPRSGKPAELLKMFGIDRDAIAQAVRGLITKA >Plasmodium cynomolgi strain B MNGEIDQKCINEIRMLSAELPLKANSGHQGAPIGCAPIAHILWAYVMNYYNEDTKWMNRDRFVLSNGHAS ALLYTMLYLTKQGLTMEDLKNFRQLESLTPGHPEKHITKGVEVTTGPLGQGASNAVGMAICAHNLAEKYN TKEFPIFDNYIYAMCGRLILLYDDNKITIDGNTELSFTENIEKKFEALKWEVRKVANGNTDFEGILTQIE EAKKNTKQPSLIIVQTACGYGTKVEGTCKSHGLALKEEDLKKAKLFFGLDPEKQFHISEEVKKFYENIVQ KKKENYLKWKKMFCDFTVQYPEKAQEIMRRFSKELPHNWVEVLPKYTTLDAPGATRNLSGVALNCINKVL 
PELIGGSADLTESNCTALKEEKDICRDSFANKYIRYGVREHGMVAITNGIYAYGGFEPFCATFLNFYTYA FGALRLAALSQYHIFCIATHDSVELGEDGPTHQPVEVLALLRATPNLNVIRPADGNEVSGAYLCHFTNPK TPTLVALCRNKVPHLKNTSAEQVLKGAYVLEDFDNANDKQKVILAGCGSELHLCFDAKKILTEQHNLNVR IVSFPSWNLFRQQSEDYQQSVMMHHDAKVVRFYIEPASTYGFDTYFNVYLGINQFGYSAPKNKIWEHLGF TSENIVCKVLAYIKAKMHP >Plasmodium falciparum MNIMDNEIDTKCINEIRMLSAELPLEAKSGHQGAPIGCAPIAHILWSYVMNYYNEDTKWINRDRFILSNG HASALLYTMLYLTEQGLSMEDLKSFRQFGSLTPGHPENHITKGVEVTTGPLGQGASNAVGMAIAAHNLAD KYNTEEHKIFDNYVYAICGDGCMQEGVFCEAASLAGHLGLGRLILLYDDNKITIDGNTDLSFTENIEKKF EALNWEVRRVEDGNKDYKKILHEIEQGKKNLQQPTLIIVRTACGFGTKVEGTCKSHGLALNDEDLKNAKS FFGLDPQKKFHISDEVKEFYKNVIQKKKENYIKWKNMFDDFSLKYPQVSQEIIRRFQNDLPNNWKDALPK YTPKDAPGATRNLSGIVLNSINKIFPELIGGSADLSESNCTSLKEENDIKKNSYGNKYIRFGVREHGMVA ITNGLYAYGGFKPYCGTFLNFYTYAFGALRLAALSNHHILCIATHDSVELGEDGPTHQPIEVLSLLRSTP NLNIIRPADGNEVSGAYLSHFSNPHTPTVIALCRNKVPHLNNTQPEQVLKGAYILEDFDTSNNPKVILTG SGSELHLCFEAKEILKNQHQLNVRIVSFPSWTLFKKQPEDYQYSVMMHNHPNLPRFYIEPASTHGFDTYF NVYIGINQFGYSAPKNKIWEHLGFTPENIVQKVLAFMKNKLK
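Compiling the downloaded records into a single FASTA file can also be scripted. The following is a minimal sketch, not the procedure used here (the files above were combined by hand). The function names format_fasta and parse_fasta are hypothetical, and the example sequences are only the first residues of two of the records above, truncated for illustration.

```python
def format_fasta(records, width=70):
    """Return multi-FASTA text for an iterable of (name, sequence) pairs."""
    lines = []
    for name, seq in records:
        # Drop spaces and line breaks left over from copy-paste.
        seq = "".join(seq.split()).upper()
        lines.append(">" + name)
        # Wrap the sequence at a fixed width, one chunk per line.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse multi-FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Truncated fragments, for illustration only.
example = [
    ("Deinococcus maricopensis", "MSVEQLSVNTIRTLSIDGVQQANSGHPGAPLG"),
    ("Mus musculus", "MEGYHKPDQQKLQALKDTANRLRISSIQATTA"),
]
print(format_fasta(example, width=20))
```

Writing each header on its own line and wrapping the residues at a fixed width keeps the file readable by most alignment tools, and parsing the file back is a quick check that all ten records survived the merge.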
This list of amino acid sequences was then run through the following pipeline on the BCBB section of the Mobyle @Pasteur web portal.
This pipeline used the aligner ClustalW-multialign, the distance-matrix program protdist, the phylogenetic analysis program neighbor, and finally the tree viewer newicktops. The tree below is the output when the 10 transketolase sequences are run through the pipeline.
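To make the distance and tree-building stages more concrete, here is a rough sketch in plain Python. It is not what the Mobyle programs do internally: it assumes the sequences are already aligned, it swaps in a simple p-distance (the fraction of differing residues) where protdist uses a substitution model, and its neighbor-joining routine is a textbook version written for illustration. All function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """aligned: dict of name -> aligned sequence. Returns {frozenset pair: distance}."""
    return {frozenset((m, n)): p_distance(aligned[m], aligned[n])
            for m, n in combinations(aligned, 2)}


def neighbor_joining(names, dist):
    """Build a Newick string by neighbor joining (branch lengths may be negative)."""
    nodes = list(names)
    d = dict(dist)

    def get(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {a: sum(get(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises the Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * get(*p) - r[p[0]] - r[p[1]])
        li = 0.5 * get(i, j) + (r[i] - r[j]) / (2 * (n - 2))
        lj = get(i, j) - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (get(i, k) + get(j, k) - get(i, j))
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    # The tree is unrooted; the last edge is split in half only for display.
    half = get(a, b) / 2
    return f"({a}:{half:.4f},{b}:{half:.4f});"


# Toy alignment, invented for illustration only.
toy = {"seqA": "MKTLV-A", "seqB": "MKTLVGA", "seqC": "MRTIVGA", "seqD": "MRSIVGS"}
print(neighbor_joining(list(toy), distance_matrix(toy)))
```

The resulting Newick string is the kind of text a tree viewer takes as input. Labels here contain no spaces because plain Newick labels would otherwise need quoting or underscores.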
Note that one of the programs truncates the sequence names in its output file. To correct the tree, the full names were restored in the input file for the final tree that was displayed.
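The name repair could be automated along the following lines. This is a sketch under stated assumptions, not the manual fix used here: it assumes the tree is a Newick string, that each truncated label is the first 10 characters of the full name (the name-field width of the PHYLIP format that protdist and neighbor read), and that spaces were written as underscores. The function name restore_names is hypothetical. Note that two of the ten names above begin with the same ten characters ("Plasmodium"), so a purely prefix-based repair is ambiguous for them; the sketch raises an error rather than guessing.

```python
import re


def restore_names(newick, full_names, limit=10):
    """Replace truncated leaf labels in a Newick string with their full names."""
    lookup = {}
    for name in full_names:
        label = name.replace(" ", "_")   # assumed: spaces become underscores
        short = label[:limit]            # assumed: labels cut to `limit` characters
        if short in lookup:
            raise ValueError(f"ambiguous truncated label {short!r}")
        lookup[short] = label
    # A leaf label directly follows "(" or "," and runs up to ":", ",", ")" or ";".
    return re.sub(r"(?<=[(,])([^():,;]+)",
                  lambda m: lookup.get(m.group(1), m.group(1)),
                  newick)


# Branch lengths below are made up for illustration.
tree = "((Deinococcu:0.1,Mus_muscul:0.2):0.05,Homo_sapie:0.3);"
names = ["Deinococcus maricopensis", "Mus musculus", "Homo sapiens"]
print(restore_names(tree, names))
```

Labels that are not in the lookup table are left unchanged, so the function can be run on a tree where only some names were truncated.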