1. Introduction to Bioinformatics

With the rapid development of molecular biology, a massive amount of biological data, including DNA, RNA and protein data, has accumulated over the last few decades. Powerful technologies such as NGS (Next Generation Sequencing), microarrays and RNA-Seq accelerated this growth. As a consequence, biologists needed an efficient way to store these data and to retrieve them whenever needed. Biologists have used computers as important tools since the early 1960s (a decade before sequencing technologies were developed) to handle the accumulating protein data generated by protein biochemistry (Hagen et al. 2000). Genome projects such as the HGP (Human Genome Project) further increased the need for computing power to analyze data and for databases to store them. As a result a new field, “Bioinformatics”, emerged.

The NCBI (National Center for Biotechnology Information) defines bioinformatics as follows:

“Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information.”

2. Databases

Three major databases function as DNA data repositories, while other databases were developed specifically to store protein sequences, protein structures, small RNA sequences, etc. A few of them are listed below:

1. DNA Databases:

2. Protein Sequence Databases:

3. Protein Structure Databases:

4. Small RNA Databases:

Besides these common databases there are organism-specific databases from which a massive amount of information can be retrieved. For example, researchers working with human systems have benefited from the development of the Human Genome Browser. Scientists working with the fruit fly Drosophila melanogaster make use of FlyBase. The model plant Arabidopsis has its own database, TAIR. Likewise there is a C. elegans-specific database, WormBase. These organism-specific databases help to narrow down searches.

These databases have been a major driving force for the rapid development of molecular biology. As biologists, it is now clear that we have an enormous amount of resources to help answer the biological questions we are investigating. Above all, we need to know how to use these resources effectively.

3. Use of Bioinformatics to Solve Biological Queries

a) Finding the Identity

As a researcher, I recently isolated, purified and sequenced a piece of DNA from a clinical sample of a patient with a particular neurological disease. The DNA sequence is 65542 bp long. Part of the sequence is given below.

TCCCGTGGAAACCAGGAGTCCCTTGGTGCAGACAGCTCTTCCTACTTTCC
CATGCAGTTCTTTTGTGCGACTTTGAGGGGCTCGTGAATGATTTCTAAAT
GTGTGCCTGCTGAGGCGAGCCGCACAGGGAGGGAGGAACCCAGCCGAGCC
GTGCCAGAGGAAGCCAACAGGATCCTAGCAGTGCGGGAGCTGGCTCAGCT
CTTGCATGCAGTTTTTGAAGTCAGCAAAACAGAAACCAAATTACTATCAT
ATTATGCTGGTGGAAGATCAAGAAGAGGGGACTCTACACCAGTTTAATTA
CTGTGAGAGATGCAGCGAGTCACAGAATAACAAATGTATCTCATGTGTGG
ACCCTGAAGACAAATGTAAGTTCTCATGCCGCTATATTTTATTGCTGTGT
AATTTTCTTTCCGGTTTGAAATCATGCTTGGCCAACATGTAATCATTTCA
ATGAGAATTTCCAGGGAGGAAAGTTGTCTGCTAATCTTTACTTAAGACTT
TTTTGTTTTCCTTTTATTAGCTAAGCAACATTATAGGAGCTGAAATTCCT
GACAGCAGCTGTGGCAATTCAGCTTAAGAATGGCTGAGAACTGTAACCCA
AAGTACATCCAATTACTATGGGATTAACACTGGATGTATTTTTAATTGAC
TTTCTTAATGTAGAATGTGTACATCCCCACTGTTTCTGATTGCATGCTAT
TTTAATAATACTGTTGCTAAACTAGTACCATCGGCATAACCAACAAAATG
AGATATAGTTAAACAAGAGTCCCAGTAGTTATAAAACTTTTCTTCTTTGT
CCAGGACATTTATCTTCCCGAGCGCTCAAAAAAAACCCTGCAACCTCTAT
GCTAAAAGTTCATTCTGCTTTTTTGTCCTCGGTTTGGTGAGAAAATAATA
AAACCAAACAGTGGACTCTCCTAAAATTGTGAATGAAGAAAACTTACAGC
CACCACAGTTCAGTTCTTTAACTATCATTGTAATAATGGAAGACAAAAAT
CCAGCCCCGGGAGAACAGCATGTACACCAGCCTCAGTGTTACAGAGTGTG
GGTACATCAAGGTGAATGGTGAGCAGAAACTATAACCTGTTAGTCCT...
...ATGGTAATTTTTTAAAGTGCTACCGTAGCCAAATTGAACTAAGTCAC
TGTACTGCTTTCAGCAAAGGGTGCTCCTCCCATTTGTGCATCAATGAGAC
ATATTTATAAAGTGCTAAATTATTCTGTGCCATATGTAACAAATACAGTG
AAGATTATTTTATGAACTTATTTTAATCAAGGCGATGCTAAAAGTTTTCA
AGAAAGGATAAATAACTGTAAATAAAGTAGACTCAAAAA 65542

The full sequence can be found in this PDF file: bio.pdf
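
Before an excerpt like the one above is pasted into a search form, the layout characters (line breaks, the "..." elision marks and the trailing position number) have to be removed. The following Python sketch shows one way this could be done. The function names clean_fragments, gc_content and to_fasta are illustrative and are not part of any tool used in this project. Elided regions are kept as separate fragments rather than joined, since the bases between them are not shown.

```python
import re


def clean_fragments(raw):
    """Split raw text at elision marks and keep only nucleotide letters."""
    fragments = []
    for piece in re.split(r"\.{3,}", raw):
        # Drop whitespace, digits and anything else that is not a base.
        seq = re.sub(r"[^ACGTN]", "", piece.upper())
        if seq:
            fragments.append(seq)
    return fragments


def gc_content(seq):
    """Fraction of G and C bases in a cleaned sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy input that mimics the layout of the excerpt.
    raw = "ACGTTGCA\nGGCC...\n...TTAACCGG 65542"
    for number, fragment in enumerate(clean_fragments(raw), start=1):
        print(to_fasta("fragment_%d" % number, fragment), end="")
        print("GC content: %.2f" % gc_content(fragment))
```

Each fragment can then be submitted as its own FASTA record.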

To begin the analysis, a BLAST search of the sequence was done using NCBI BLAST.

Interestingly, the sequence aligned perfectly with the region of human chromosome 13 where a G-protein coupled receptor, HTR2A (serotonin receptor), is located.

The same sequence was used to perform a BLAT genome search using the Human Genome Browser. A BLAT genome search, similar in purpose to an NCBI BLAST search, is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more.
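
To make the "95% and greater similarity over 25 bases or more" criterion concrete, the sketch below tests every ungapped 25-base window of a query against every window of a target. This is only a brute-force illustration of the threshold, not how BLAT works internally (BLAT uses an index of the genome and is far faster). The function names and the toy sequences are assumptions made for this example.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def find_hits(query, target, min_len=25, min_identity=0.95):
    """Return (query_start, target_start) pairs of ungapped windows that pass."""
    hits = []
    for q_start in range(len(query) - min_len + 1):
        window = query[q_start:q_start + min_len]
        for t_start in range(len(target) - min_len + 1):
            candidate = target[t_start:t_start + min_len]
            if identity(window, candidate) >= min_identity:
                hits.append((q_start, t_start))
    return hits


if __name__ == "__main__":
    # Toy 25-base query and a target carrying one mismatch (24/25 = 96%).
    query = "ACGTACGTTGCATGCAAGGTCCAAT"
    target = "TTTT" + "C" + query[1:] + "GGGG"
    print(find_hits(query, target))
```

With a window of 25 bases, a single mismatch still passes the threshold (96%), while two mismatches do not (92%).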

The genome browser view of the sequence is given below:

Therefore, according to the BLAST and BLAT search results, the DNA sequence may encode the protein HTR2A (serotonin/5-hydroxytryptamine receptor). To further support this hypothesis, a gene prediction tool such as GENIE, GENSCAN, GeneMark or Augustus can be used.

b) Gene Prediction to Prove the Hypothesis

The GENSCAN program predicts the locations and exon-intron structures of genes in genomic sequences from a variety of organisms (e.g. vertebrates, Arabidopsis, maize) (for more information click here). As the sequence is derived from a human, the program was run with vertebrate settings.

The predicted gene structure and the peptide sequence are given below.

To identify the protein, the predicted peptide sequence was used in an NCBI BLAST search. The BLAST search results are as follows:

Excitingly, the predicted protein sequence aligned with high similarity to the human 5-hydroxytryptamine receptor (HTR2A). However, the NCBI GenPept entry shows that the protein is only 471 amino acids long, whereas the GenScan-predicted peptide sequence is 607 amino acids long.

To examine the difference, the two protein sequences were aligned using BLAST.

The dot-plot revealed the difference between the two amino acid sequences. The NCBI GenPept entry lacks residues 206 to 345 of the amino acid sequence predicted by GenScan.
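
The idea behind a dot-plot can be sketched in a few lines of Python: a dot is placed wherever a short word from one sequence also occurs in the other, so a segment present in only one sequence shows up as a jump from one diagonal to another. The code below is a minimal word-match version under that assumption, not the method BLAST uses to draw its plots. The function names and the toy sequences are made up for illustration.

```python
def dot_plot(a, b, word=3):
    """Return the set of (i, j) where a[i:i+word] equals b[j:j+word]."""
    # Index every word of b by its start positions for fast lookup.
    index = {}
    for j in range(len(b) - word + 1):
        index.setdefault(b[j:j + word], []).append(j)
    dots = set()
    for i in range(len(a) - word + 1):
        for j in index.get(a[i:i + word], []):
            dots.add((i, j))
    return dots


def diagonal_offsets(dots):
    """Sorted distinct offsets i - j; a jump suggests an extra segment."""
    return sorted({i - j for i, j in dots})


def render(a, b, dots):
    """Draw the plot as text, one row per position in a."""
    rows = []
    for i in range(len(a)):
        rows.append("".join("*" if (i, j) in dots else "." for j in range(len(b))))
    return "\n".join(rows)


if __name__ == "__main__":
    longer = "ACDEFGHIKL"   # toy sequence
    shorter = "ACDEIKL"     # the same toy sequence with FGH removed
    dots = dot_plot(longer, shorter)
    print(render(longer, shorter, dots))
    print(diagonal_offsets(dots))
```

In the toy case the diagonal shifts by three positions, which corresponds to the three residues missing from the shorter sequence.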

The Augustus web server is a recently developed, reliable web platform that can be used to identify genes within a piece of DNA. An important feature of Augustus is that it considers alternative transcripts, which can be produced as a result of alternative splicing. The same sequence was scanned for genes using Augustus. The server offers a number of different source organisms (both prokaryotes and eukaryotes), and as the sequence is derived from a human, gene prediction was done with Homo sapiens settings.

The predicted coding sequence, amino acid sequence and the graphical output are given below:

atgcagtttttgaagtcagcaaaacagaaaccaaattactatcatattatgctggtggaagatcaagaagaggggactctacaccagtttaa
ttactgtgagagatgcagcgagtcacagaataacaaatgtatctcatgtgtggaccctgaagacaaatccgtagtgattattctaactattg
ctggaaacatactcgtcatcatggcagtgtccctagagaaaaagctgcagaatgccaccaactatttcctgatgtcacttgccatagctgat
atgctgctgggtttccttgtcatgcccgtgtccatgttaaccatcctgtatgggtaccggtggcctctgccgagcaagctttgtgcagtctg
gatttacctggacgtgctcttctccacggcctccatcatgcacctctgcgccatctcgctggaccgctacgtcgccatccagaatcccatcc
accacagccgcttcaactccagaactaaggcatttctgaaaatcattgctgtttggaccatatcagtaggtatatccatgccaataccagtc
tttgggctacaggacgattcgaaggtctttaaggaggggagttgcttactcgccgatgataactttgtcctgatcggctcttttgtgtcatt
tttcattcccttaaccatcatggtgatcacctactttctaactatcaagtcactccagaaagaagctactttgtgtgtaagtgatcttggca
cacgggccaaattagcttctttcagcttcctccctcagagttctttgtcttcagaaaagctcttccagcggtcgatccatagggagccaggg
tcctacacaggcaggaggactatgcagtccatcagcaatgagcaaaaggcatgcaaggtgctgggcatcgtcttcttcctgtttgtggtgat
gtggtgccctttcttcatcacaaacatcatggccgtcatctgcaaagagtcctgcaatgaggatgtcattggggccctgctcaatgtgtttg
tttggatcggttatctctcttcagcagtcaacccactagtctacacactgttcaacaagacctataggtcagccttttcacggtatattcag
tgtcagtacaaggaaaacaaaaaaccattgcagttaattttagtgaacacaataccggctttggcctacaagtctagccaacttcaaatggg
acaaaaaaagaattcaaagcaagatgccaagacaacagataatgactgctcaatggttgctctaggaaagcagcattctgaagaggcttcta
aagacaatagcgacggagtgaatgaaaaggtaaactag
MQFLKSAKQKPNYYHIMLVEDQEEGTLHQFNYCERCSESQNNKCISCVDPEDKSVVIIL
TIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYRWPLPSK
LCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVG
ISMPIPVFGLQDDSKVFKEGSCLLADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKE
ATLCVSDLGTRAKLASFSFLPQSSLSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACK
VLGIVFFLFVVMWCPFFITNIMAVICKESCNEDVIGALLNVFVWIGYLSSAVNPLVYTL
FNKTYRSAFSRYIQCQYKENKKPLQLILVNTIPALAYKSSQLQMGQKKNSKQDAKTTDN
DCSMVALGKQHSEEASKDNSDGVNEKVN
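
The relationship between the coding sequence and the amino acid sequence listed above can be checked by translating the codons with the standard genetic code. The sketch below does this in plain Python. The names BASES, AMINO_ACIDS, CODON_TABLE and translate are illustrative, and the code assumes an unambiguous sequence (only A, C, G and T) that starts in frame.

```python
BASES = "TCAG"
# Amino acids of the standard genetic code, ordered by codon with
# T, C, A, G at each of the three positions ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = "".join(cds.split()).upper()
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # First 27 bases of the predicted coding sequence shown above.
    print(translate("atgcagtttttgaagtcagcaaaacag"))  # prints MQFLKSAKQ
```

The output matches the first nine residues of the predicted amino acid sequence.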

The predicted protein sequence was used to do a BLAST search and the sequence perfectly aligned with the human HTR2A entry of GenPept.

Another BLAST search was done to align the GenPept entry and the Augustus-derived protein sequence. The resulting dot-plot revealed that they align with each other with high similarity.

Therefore, the GenScan and Augustus gene predictions, together with the BLAST searches, support the previous hypothesis that the isolated DNA can encode the protein HTR2A.

c) Structure of the Protein Product

As mentioned above, PDB (Protein Data Bank) is the repository for experimentally determined (X-ray crystallography, NMR, MS, etc.) protein structures. To check whether a resolved protein structure for HTR2A is available in PDB, a keyword search was done. No resolved protein structures were found in PDB.

Therefore, homology modeling was done using Swiss Model. Three protein structures were generated using three different templates, each with a different degree of homology to its template. A summary report can be found here. The three models are given below.

Additionally, the Protein Model Portal reports five models for human HTR2A.

d) Protein-Protein Interactions, Other Binding Partners, PTMs and so on

The UniProtKB entry for human HTR2A reports an enormous amount of information about the protein. UniProtKB is a great resource for learning about a particular protein, as it contains important information including the binding partners of the protein of interest, post-translational modifications, the domains and motifs present, phylogenetic information, etc.

BioGRID 3.4 reports four main binding partners of HTR2A. Their associations, as represented by the protein-protein interaction database STRING, are as follows:

Once a 3D structure is resolved, it is important to identify any PTMs (Post-Translational Modifications). A web resource available for this task is PhosphoSitePlus. PhosphoSitePlus helps to identify a variety of PTMs on a protein, such as phosphorylations, methylations, ubiquitinations, etc. According to PhosphoSitePlus, HTR2A has several modification sites, which are diagrammed in the following figure (Source: PhosphoSitePlus):

PyMOL view of the rat HTR2A model given in PhosphoSitePlus.

UniProtKB provides links to phylogenomic databases such as PhylomeDB and TreeFam where phylogenetic relationships are revealed. Ensembl showed domains present in the protein.

The Human Protein Atlas reports the tissue specific expression of the protein.

PaxDB, a comprehensive absolute protein abundance database containing whole-genome protein abundance information across organisms and tissues, can be used to find the abundance of the protein of interest.

e) Genetic Variants

NCBI dbSNP was used to identify genetic variants of HTR2A.

NCBI dbSNP reports 15514 SNPs in the HTR2A gene. One such variant is the SNP rs6308. The SNP is a transition, C → T, which causes a missense mutation in the resulting protein. The following diagrams show the gene view of HTR2A and the SNP rs6308.
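
A C → T change is called a transition because both bases are pyrimidines; a change between a purine and a pyrimidine would be a transversion. The short sketch below encodes that rule. The function name classify_substitution is illustrative only.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    print(classify_substitution("C", "T"))  # prints transition
```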

In one instance, the SNP changes amino acid 363 from alanine (the wild-type residue) to valine. The new amino acid sequence was used for homology modeling using Swiss Model. The resulting three models are given below:

MQFLKSAKQKPNYYHIMLVEDQEEGTLHQFNYCERCSESQNNKCISCVDPEDKSVVIIL
TIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYRWPLPSK
LCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVG
ISMPIPVFGLQDDSKVFKEGSCLLADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKE
ATLCVSDLGTRAKLASFSFLPQSSLSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACK
VLGIVFFLFVVMWCPFFITNIMAVICKESCNEDVIGALLNVFVWIGYLSSAVNPLVYTL
FNKTYRSAFSRYIQCQYKENKKPLQLILVNTIPALAYKSSQLQMGQKKNSKQDAKTTDN
DCSMVALGKQHSEEASKDNSDGVNEKVN
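
A single amino acid substitution such as the alanine-to-valine change described above can be applied to a sequence programmatically before it is submitted to a modeling server. The sketch below uses 1-based positions and refuses to edit the sequence when the expected wild-type residue is not found at the given position, which guards against numbering mismatches between different sequence versions. The function name and the toy peptide are hypothetical.

```python
def apply_substitution(protein, position, ref, alt):
    """Return the protein with the residue at a 1-based position replaced."""
    index = position - 1
    if not 0 <= index < len(protein):
        raise ValueError("position %d is outside the sequence" % position)
    if protein[index] != ref:
        raise ValueError(
            "expected %s at position %d, found %s" % (ref, position, protein[index])
        )
    return protein[:index] + alt + protein[index + 1:]


if __name__ == "__main__":
    # Toy peptide, not the HTR2A sequence: replace A at position 3 with V.
    print(apply_substitution("MKALV", 3, "A", "V"))  # prints MKVLV
```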

The 1000 Genomes Browser provides insight into the population diversity of the genetic variation of interest.

f) Gene Expression Analysis

The rapid development of functional genomics, driven by a range of new techniques, has made repositories for functional genomic data necessary. One such data repository is GEO (Gene Expression Omnibus). GEO is an international public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data submitted by the research community. GEO has three main goals:

1. Provide a versatile database to efficiently store functional genomic data

2. Offer simple submission procedures that support well-annotated data deposits from the research community

3. Provide user-friendly mechanisms to query, locate, review or retrieve data

GEO is made up of three components: GEO DataSets, GEO Profiles and GEO2R. The GEO DataSets database stores original submitter-supplied records (Series, Samples and Platforms) as well as curated DataSets, whereas the expression profiles of curated GEO DataSets are stored in GEO Profiles. An interactive web tool, GEO2R, allows users to compare two or more groups of Samples in a GEO Series in order to identify genes that are differentially expressed across experimental conditions (How to use GEO2R?).

Therefore GEO can be used to investigate functional genomic aspects of the gene.

Before running a GEO2R analysis, the value distribution of the samples should be inspected. The value distribution of the above entry and the results of the analysis are given below.
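
The purpose of the value-distribution check can be illustrated outside GEO2R: samples whose expression values are centered on similar medians are generally taken to be comparable. The sketch below computes a five-number summary per sample and compares the medians. The sample names, the expression values and the tolerance threshold are all hypothetical and chosen only for illustration; none of them comes from a GEO record or is a GEO2R parameter.

```python
import statistics


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def medians_are_comparable(samples, tolerance):
    """True when all sample medians lie within the given tolerance."""
    medians = [statistics.median(values) for values in samples.values()]
    return max(medians) - min(medians) <= tolerance


if __name__ == "__main__":
    # Hypothetical log2 expression values for three samples.
    samples = {
        "sample_1": [6.1, 7.4, 8.0, 8.8, 10.2],
        "sample_2": [5.9, 7.6, 8.1, 9.0, 10.5],
        "sample_3": [6.3, 7.2, 7.9, 8.7, 10.1],
    }
    for name, values in samples.items():
        print(name, five_number_summary(values))
    print(medians_are_comparable(samples, tolerance=0.5))
```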

g) Phylogeny

Proteins are the functional units of life. A fundamental assumption among molecular biologists is that proteins are related to one another through common ancestry, and the field of molecular phylogeny is built on this idea. The protein sequence of the identified HTR2A, retrieved from the Human Genome Browser, was used in a BLAST search, and the peptide sequences corresponding to HTR2A in a few other species were retrieved.

>Homo_sapiens
MDILCEENTSLSSTTNSLMQLNDDTRLYSNDFNSGEANTSDAFNWTVDSENRTNLSCEGCLSPSCLSLLH
LQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYR
WPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGISMPI
PVFGLQDDSKVFKEGSCLLADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEATLCVSDLGTRAKLAS
FSFLPQSSLSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICK
ESCNEDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILVNTIPALAYK
SSQLQMGQKKNSKQDAKTTDNDCSMVALGKQHSEEASKDNSDGVNEKVSCV

>Pan_troglodytes 
MDILCEENTSLSSTTNSLMQLNDDTRLYSNDFNSGEANTSDAFNWTVDSENRTNLSCEGCLSPSCLSLLH
LQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYR
WPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGISMPI
PVFGLQDDSKVFKEGSCLLADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEATLCVSDLGTRAKLAS
FSFLPQSSLSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICK
ESCNEDIIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILVNTIPALAYK
SSQLQMGQKKNSKQDAKTTDNDCSMVALGKQHSEEASKDNSDGVNEKVSCV

>Colobus_angolensis 
MDILCEENTSLSSTTNSLMQLNDDTRLYSNDFNSGEANTSDAFNWTVDSENRTNLSCEGCFSPSCLSLLH
LQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYR
WPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGISMPI
PVFGLQDDSKVFKEGSCLLADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEATLCVSDLGTRAKLAS
FSFLPQSSLSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICK
ESCNEDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILVNTIPALAYK
SSQLQMGQKKNSKQDAKTTDNDCSMVALGKQHSEDASKNNSDGVNEKVSCV

>Mandrillus_leucophaeus 
MDILCEENTSLSSTTNSLMQLNEDTRLYSNDFNSGEANTSDAFNWTVDSENRTNLSCEGCLSPSCLSLLH
LQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYR
WPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGISMPI
PVFGLQDDSKVFKEGSCLLADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEATLCVSDLGTRAKLAS
FSFLPQSSLSSEKLFQRSIHRDPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICK
ESCNEDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILVNTIPALAYK
SSQLQMGQKKNSKQDAKTTDNDCSMVALGKQHSEDASKDNSDGVNEKVSCV

>Pongo_abelii 
MDILCEENTSLSSTTNSLMQLNDDTRLYSNDFNSGEANTSDAFNWTVDSENRTNLSCEGCLSPSCLSLLH
LQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYR
WPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGISMPI
PVFGLQDDSKVFKEGSCLLADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEATLCVSDLGTRAKLAS
FSFLPQSSLSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACKVLGIVFSLFVVMWCPFFITNIMAVICK
ESCNEDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILVNTIPALAYK
SSQLQMGQKKNSKQDAKTTDNDCSMVALGKQHSEDASKDNSDGVNEKVSCV

>Macaca_mulatta
MDILCEENTSLSSTTNSLMQLNEDTRLYSNDFNSGEANTSDAFNWTVESENRTNLSCEGCLSPSCLSLLH
LQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYR
WPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGISMPI
PVFGLQDDSKVFKEGSCLLADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEATLCVSDLGTRAKLAS
FSFLPQSSLSSEKLFQRSIHRDPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICK
ESCNEDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILVNTIPALAYK
SSQLQMGQKKNSKQDAKTTDNDCSMVALGKQHSEDASKDNSDGVNEKVSCV

>Cercocebus atys 
MDILCEENTSLSSTTNSLMQLNEDTRLYSNDFNSGETNTSDAFNWTVDSENRTNLSCEGCLSPSCLSLLH
LQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYR
WPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGISMPI
PVFGLQDDSKVFKEGSCLLADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEATLCVSDLGTRAKLAS
FSFLPQSSLSSEKLFQRSIHRDPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICK
ESCNEDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILVNTIPALAYK
SSQLQMGQKKNSKQDAKTTDNDCSMVALGKQHSEDASKDNSDGVNEKVSCV

>Chlorocebus sabaeus 
MDILCEENTSLSSTTNSLMQFSDDTRLYSNDLNSGEANTSDAFNWTVDSENRTNLSCEGCLSPSCLSLLH
LQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYR
WPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGISMPI
PVFGLQDDSKVFKEGSCLLADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEATLCVSDLGTRAKLAS
FSFLPQSSLSSEKLFQRSIHRDPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICK
ESCNEDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILVNTIPALAYK
SSQLQMGQKKNSKQDAKTTDNDCSMVALGKQHSEDASKDNSDGVNEKVSCV

First, these sequences were used to perform a multiple sequence alignment using ClustalW. Then, to observe the phylogenetic relationships among these sequences, the following pipeline was generated on BCBB.
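
As a rough illustration of what a tree-building pipeline starts from, the sketch below parses FASTA records like those above and computes pairwise p-distances (the fraction of positions that differ). It assumes the sequences are already aligned and of equal length. A real pipeline would work from the ClustalW alignment, handle gaps and usually apply a substitution model. The function names are illustrative.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Normalise names such as "Cercocebus atys " to "Cercocebus_atys".
            name = line[1:].strip().replace(" ", "_")
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def p_distance(a, b):
    """Fraction of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(records):
    """Pairwise p-distances keyed by (name, name) for each unordered pair."""
    names = list(records)
    return {
        (first, second): p_distance(records[first], records[second])
        for i, first in enumerate(names)
        for second in names[i + 1:]
    }


if __name__ == "__main__":
    # First line (70 residues) of two of the records listed above.
    text = """>Homo_sapiens
MDILCEENTSLSSTTNSLMQLNDDTRLYSNDFNSGEANTSDAFNWTVDSENRTNLSCEGCLSPSCLSLLH
>Macaca_mulatta
MDILCEENTSLSSTTNSLMQLNEDTRLYSNDFNSGEANTSDAFNWTVESENRTNLSCEGCLSPSCLSLLH
"""
    for pair, distance in distance_matrix(parse_fasta(text)).items():
        print(pair, round(distance, 4))
```

A distance matrix of this kind is the input to distance-based tree methods such as neighbor joining.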

h) Motif Finding

Sequence and structural motifs in DNA and proteins are important in various ways, and many web-based tools have been developed to identify them in a given DNA or protein sequence. One such easy-to-use tool is XXmotif. A part of the isolated DNA sequence was used to identify any motifs within that sequence.
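
The simplest form of motif search is counting short words (k-mers) and locating the occurrences of a given word. The sketch below does only that and is far simpler than the statistical model a tool such as XXmotif applies; it is meant as an illustration of the idea, not of that tool's algorithm. The function names are assumptions made for this example.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping word of length k in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def find_positions(seq, motif):
    """Return 0-based start positions of all (overlapping) motif matches."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    # First line of the sequence excerpt shown in section 3 a).
    fragment = "TCCCGTGGAAACCAGGAGTCCCTTGGTGCAGACAGCTCTTCCTACTTTCC"
    for kmer, count in count_kmers(fragment, 4).most_common(3):
        print(kmer, count, find_positions(fragment, kmer))
```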

b2gof15/students/jbuddika/final_project.txt · Last modified: 2015/11/03 01:45 by jbuddika
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International