MSMEG_3240

Mycobacterium smegmatis is an unusual bacterium: it has no outer membrane, yet it differs from Gram-positive bacteria in many respects. In addition to their peptidoglycan cell wall, bacteria of the Mycobacterium genus are surrounded by a waxy layer of lipids called mycolic acids. This extra layer makes them very difficult to lyse and makes it hard for compounds such as antibiotics to reach their targets, which is why pathogens like Mycobacterium tuberculosis are particularly difficult to eradicate. Mycobacterium tuberculosis is also very slow growing, taking around a month to form colonies, so using another organism as a model is appealing. The organism used, Mycobacterium smegmatis, has over 2000 genes with sequence similarity to Mycobacterium tuberculosis genes but can form colonies in only three days. Below is a comparison of the relative positions of genes in the Mycobacterium smegmatis MC2 155 genome and the Mycobacterium tuberculosis H37Rv genome; both are common lab strains. It was generated using the CoGe SynMap tool.

The protein DevR in Mycobacterium tuberculosis has been identified as a response regulator, but has otherwise not been well characterized. Fortunately, the protein MSMEG_3240 from Mycobacterium smegmatis has a high degree of sequence similarity to DevR and can be used for preliminary work. MSMEG_3240 shares several domains with response regulators associated with nitrogen sensing, including a receiver domain and a C-terminal DNA-binding domain. This is shown in the graphic display of the genome from NCBI's gene database featured below.

The nucleotide and amino acid sequences are shown below:

Nucleotide:
GTGATGACCGATGCGGCACCCACGGTGATGGTGGTCGACGACCACCCGATCTGGCGAGATGCCGTCGCCC
GCGACCTCGCCGACGACGGTTTCGACGTCGTCGCCACGGCCGACGGCGTCGCGTCGGCGTCCCGCCGTGC
CGCGGTGGTCCGCCCCGACGTGGTCCTGATGGACATGCGCCTCGGCGACGGTTCCGGGGCTCAGGCCACC
GCGGAGGTGCTCGCGGTCTCACCGCGGTCGCGCGTGCTGGTGCTGTCGGCCTCCGACGAACGCGACGACG
TGCTGCAGGCGGTCAAGGCAGGCGCCACGGGATATCTGGTGAAGAGCGCATCGAGAACCGAACTCGCCGA
CGCGGTCCGCGCCACCGCGGAGGGCCGCGCGGTCTTCACCCCCGGTCTGGCGGGACTGGTGCTGGGGGAG
TATCGGCGCATCGCGCAACAACCGGCACAGGAGGGGCCCGCGACGCCCACCCTCACCGAACGCGAGACCG
AGATCCTGCGGTATGTGGCGAAAGGCCTGACGGCCAAACAGATCGCCGCGCGTCTTTCGCTGAGTCACCG
CACCGTGGAGAACCACGTGCAGGCGACGTTCCGCAAGCTCCAGGTCGCCAACCGGGTCGAACTCGCCCGC
TACGCGATAGAACACGGGCTGGACGAGTAG

Amino Acid:
MMTDAAPTVMVVDDHPIWRDAVARDLADDGFDVVATADGVASASRRAAVVRPDVVLMDMRLGDGSGAQAT
AEVLAVSPRSRVLVLSASDERDDVLQAVKAGATGYLVKSASRTELADAVRATAEGRAVFTPGLAGLVLGE
YRRIAQQPAQEGPATPTLTERETEILRYVAKGLTAKQIAARLSLSHRTVENHVQATFRKLQVANRVELAR
YAIEHGLDE
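
The relationship between the two sequences above can be checked with a short translation routine. The following is a minimal Python sketch, not the method used to produce the sequences: the function name `translate` and the set `START_CODONS` are illustrative, and the sketch assumes that the first codon is read as methionine when it is one of the most common bacterial start codons (ATG, GTG, TTG). That assumption is why the GTG at the start of the gene appears as M in the protein.

```python
from itertools import product

# Standard codon table, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: only the three most common bacterial start codons are handled.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if i == 0 and codon in START_CODONS:
            # An alternative start codon is still decoded as methionine.
            aa = "M"
        else:
            aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First and last 30 nucleotides of the MSMEG_3240 sequence shown above.
first_30 = "GTGATGACCGATGCGGCACCCACGGTGATG"
last_30 = "TACGCGATAGAACACGGGCTGGACGAGTAG"
print(translate(first_30))
print(translate(last_30))
```

The two printed fragments correspond to the beginning (MMTDAAPTVM) and the end (YAIEHGLDE) of the amino acid sequence listed above.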

The structure of this protein is not known; however, it has 35% sequence similarity to the protein DevR from Mycobacterium smegmatis, so it can be compared to that protein with Swiss Model. The orange sections of the image below are areas predicted to have little structural similarity to MSMEG_3240 and blue areas are predicted to have high similarity. While most of MSMEG_3240 has a high degree of similarity to DevR, the alpha helix and turns near the top of the image diverge significantly from it.
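
A figure such as the 35% quoted above comes from comparing two aligned sequences position by position. The sketch below shows one simple way to compute percent identity from a pairwise alignment. It is only an illustration: identity is related to, but not the same as, the similarity score a modelling tool reports, different tools use different denominators, and the two aligned strings are made-up toy data rather than the real MSMEG_3240 and DevR alignment.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments, used only to exercise the function.
toy_a = "MTDAAP-TVMVV"
toy_b = "MSDQAPKTILVV"
print(round(percent_identity(toy_a, toy_b), 1))
```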

When the sequence for MSMEG_3240 from NCBI's gene database is BLASTed against the NCBI database, a number of proteins with sequence similarity are found, most of them from various Mycobacterium strains. When the first ten results from different species are aligned with the Mobyle @Pasteur clustalw-multialign tool and the resulting sequence alignment is input into the ClustalW2 Phylogeny program, the following phylogenetic tree is generated.

When the amino acid sequence was put into the genome.jp MOTIF finder, the motifs shown below were found. They are primarily helix-turn-helix motifs, which bind DNA. These DNA-binding motifs, combined with a putative response regulator domain, indicate that the protein may be the response regulator part of a two-component regulatory system. MSMEG_3240 would likely activate the expression of certain genes when its as yet unidentified sensor detects some indicator of a hypoxic environment, as DevR does in Mycobacterium tuberculosis.

To see how the gene's open reading frame in the NCBI database compares to those predicted by online programs, the sequence for MSMEG_3240 from NCBI's gene database was taken along with the nucleotides from 3 kbp upstream to 7 kbp downstream of the gene, and this sequence was input into Augustus and GeneMarkS. Augustus was run with the S. aureus organism demonstration, as it is the only Gram-positive choice and M. smegmatis is most similar to Gram-positive bacteria, with genes reported on both strands and with few alternative transcripts. With GeneMarkS the settings used were Prokaryotic, LST, protein sequence output.
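
The window described above can be expressed as simple coordinate arithmetic. The following Python sketch is illustrative only: the function name `flanking_window` and the genomic start coordinate are hypothetical, the gene is assumed to lie on the plus strand, coordinates are 1-based and inclusive, and wrap-around on a circular chromosome is ignored.

```python
def flanking_window(gene_start, gene_end, upstream=3000, downstream=7000,
                    genome_length=None):
    """Return (window_start, window_end, gene_position_in_window).

    Coordinates are 1-based and inclusive; the gene is assumed to be on
    the plus strand, so "upstream" means lower coordinates.
    """
    window_start = max(1, gene_start - upstream)
    window_end = gene_end + downstream
    if genome_length is not None:
        window_end = min(window_end, genome_length)
    gene_in_window = (gene_start - window_start + 1,
                      gene_end - window_start + 1)
    return window_start, window_end, gene_in_window


# Hypothetical genomic position; only the gene length (660 nt) is from the text.
gene_start = 100_001
gene_end = gene_start + 660 - 1

window_start, window_end, gene_in_window = flanking_window(gene_start, gene_end)
print(window_end - window_start + 1)  # total window length
print(gene_in_window)                 # gene coordinates inside the window
```

For a 660-nucleotide gene these flank sizes give a window of 10,660 nucleotides, which agrees with the sequence length reported in the Augustus output (length = 10660). Under the plus-strand assumption, the gene then occupies positions 3001 to 3660 of the submitted sequence.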

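Below are the outputs from Augustus and GeneMarkS, respectively. Augustus did not find the MSMEG_3240 open reading frame; the only gene it predicted in that region, g3, lies on the opposite strand. GeneMarkS recovered the reading frame as gene 5 (positions 3004 to 3660), but its predicted protein lacks the first residue of the NCBI sequence, which means it chose the ATG codon immediately downstream of the annotated GTG as the start site. These differences are not particularly surprising. Augustus was using a different genus of bacteria as its reference, and GeneMarkS was using a heuristic model rather than one trained on Mycobacterium. In addition, Mycobacterium are unusual even among bacteria, so it is unsurprising that gene prediction in them may behave differently. A final possible reason is that the gene uses a GTG (valine) start codon, which the programs may not be as good at predicting, although Augustus did find several other open reading frames with GTG start codons.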

# This output was generated with AUGUSTUS (version 3.1.0).
# AUGUSTUS is a gene prediction tool written by Mario Stanke (mario.stanke@uni-greifswald.de),
# Oliver Keller, Stefanie König and Lizzy Gerischer.
# Please cite: Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008),
# Using native and syntenically mapped cDNA alignments to improve de novo gene finding
# Bioinformatics 24: 637-644, doi 10.1093/bioinformatics/btn013
# No extrinsic information on sequences given.
# Initialising the parameters using config directory /data/www/augustus/augustus/config/ ...
# s_aureus version. Using species specific transition matrix: /data/www/augustus/augustus/config/species/s_aureus/s_aureus_trans_shadow_bacterium.pbl
# Using species specific overlap length distribution: /data/www/augustus/augustus/config/species/s_aureus/s_aureus_ovlp_len.pbl
# admissible start codons and their probabilities: ATA(0), ATC(0), ATG(0.834), ATT(0), CTG(0), GTG(0.0746), TTG(0.091)
# Looks like /data/www/augustus/tmp/AUG-1753485020/input.fa is in fasta format.
# We have hints for 0 sequences and for 0 of the sequences in the input set.
#
# ----- prediction on sequence number 1 (length = 10660, name = unnamed-1) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 1 on both strands
# start gene g1
unnamed-1	AUGUSTUS	gene	312	1028	0.86	+	.	g1
unnamed-1	AUGUSTUS	transcript	312	1028	0.86	+	.	g1.t1
unnamed-1	AUGUSTUS	start_codon	312	314	.	+	0	transcript_id "g1.t1"; gene_id "g1";
unnamed-1	AUGUSTUS	single	312	1028	0.86	+	0	transcript_id "g1.t1"; gene_id "g1";
unnamed-1	AUGUSTUS	CDS	312	1028	0.86	+	0	transcript_id "g1.t1"; gene_id "g1";
unnamed-1	AUGUSTUS	stop_codon	1026	1028	.	+	0	transcript_id "g1.t1"; gene_id "g1";
# coding sequence = [gtgcatcaaccatctggagaccgtgagtgcggggcggctctacgtcgacggacaactcgtcggctaccgcgaacgcggc
# ggcaaactgcacgagatgaagccgtccgacgtggccaaacagcgtcgcgacgtcggaatggtgttccagcacttcaacttgttcccgcaccgcaccgc
# gctggccaacatcatcgaggcgcccatcaaggtcaagggcgtcaagaagaaggaggccatcgaccgggcccgcgatctgctcaaccaggtgggtctgg
# cggacaaggccgaggcctacccggcgcagctgtcgggtggtcagcaacagcgcgtggccatcgcgcgcgcgctcgcgatgaaccccaagctcatgctg
# ttcgacgagcccacctcggcgctggaccccgaactcgtcggcgatgtcctcggcgtgatgaagaagctcgcctccgagggcatgaccatggtggtggt
# cactcacgagatgggtttcgcgcgcgaggtcgccgacaagctggtcttcatggacggcggcgtcatcgtcgagagcggcgatccccgcgaggtcatgg
# caaacccgaaacacgaacggacaaaagccttcctgtccaaggtgatgtagcccgtcgggtagcgtcggacgggtggcgacaactcgttccgacgtctt
# catcagcaccgcggagctcatccagctgctcgcggcaggcggcccggtga]
# protein sequence = [MHQPSGDRECGAALRRRTTRRLPRTRRQTARDEAVRRGQTASRRRNGVPALQLVPAPHRAGQHHRGAHQGQGRQEEGG
# HRPGPRSAQPGGSGGQGRGLPGAAVGWSATARGHRARARDEPQAHAVRRAHLGAGPRTRRRCPRRDEEARLRGHDHGGGHSRDGFRARGRRQAGLHGR
# RRHRRERRSPRGHGKPETRTDKSLPVQGDVARRVASDGWRQLVPTSSSAPRSSSSCSRQAAR]
# end gene g1
###
# start gene g2
unnamed-1	AUGUSTUS	gene	2035	2433	0.81	+	.	g2
unnamed-1	AUGUSTUS	transcript	2035	2433	0.81	+	.	g2.t1
unnamed-1	AUGUSTUS	start_codon	2035	2037	.	+	0	transcript_id "g2.t1"; gene_id "g2";
unnamed-1	AUGUSTUS	single	2035	2433	0.81	+	0	transcript_id "g2.t1"; gene_id "g2";
unnamed-1	AUGUSTUS	CDS	2035	2433	0.81	+	0	transcript_id "g2.t1"; gene_id "g2";
unnamed-1	AUGUSTUS	stop_codon	2431	2433	.	+	0	transcript_id "g2.t1"; gene_id "g2";
# coding sequence = [atgcgcgacggtctatctgaccgggtcccgcaggcgcgggacgtgggtgatcgccgagctcgtcgtggtggtcgcgctg
# atgctgtcgacggagctggtggcgtccgaacagtggatcgccgacaaccagtcctggccgacgacgctgtgggcgaccaacgccaccatctcggtggc
# gttgcacttcggcccgatcgggggcatgtccgccgggctcgcggtgatggcgacggtcgcgctgctcaagggccatgtgagcgtcaacctcggccgca
# acgccaccatcgtgatcgagctcgcggtcggtctggctgtcgggatggccgcgcagaccgcgcggcgcgcgcacgccgaactggaacgcgccgtgcga
# ctctcggcggccctggaggaacgtga]
# protein sequence = [MRDGLSDRVPQARDVGDRRARRGGRADAVDGAGGVRTVDRRQPVLADDAVGDQRHHLGGVALRPDRGHVRRARGDGDG
# RAAQGPCERQPRPQRHHRDRARGRSGCRDGRADRAARARRTGTRRATLGGPGGT]
# end gene g2
###
# start gene g3
unnamed-1	AUGUSTUS	gene	3473	3670	0.7	-	.	g3
unnamed-1	AUGUSTUS	transcript	3473	3670	0.7	-	.	g3.t1
unnamed-1	AUGUSTUS	stop_codon	3473	3475	.	-	0	transcript_id "g3.t1"; gene_id "g3";
unnamed-1	AUGUSTUS	single	3473	3670	0.7	-	0	transcript_id "g3.t1"; gene_id "g3";
unnamed-1	AUGUSTUS	CDS	3473	3670	0.7	-	0	transcript_id "g3.t1"; gene_id "g3";
unnamed-1	AUGUSTUS	start_codon	3668	3670	.	-	0	transcript_id "g3.t1"; gene_id "g3";
# coding sequence = [atgagtaggactactcgtccagcccgtgttctatcgcgtagcgggcgagttcgacccggttggcgacctggagcttgcg
# gaacgtcgcctgcacgtggttctccacggtgcggtgactcagcgaaagacgcgcggcgatctgtttggccgtcaggcctttcgccacataccgcagga
# tctcggtctcgcgttcggtga]
# protein sequence = [MSRTTRPARVLSRSGRVRPGWRPGACGTSPARGSPRCGDSAKDARRSVWPSGLSPHTAGSRSRVR]
# end gene g3
###
# start gene g4
unnamed-1	AUGUSTUS	gene	3859	5580	0.86	+	.	g4
unnamed-1	AUGUSTUS	transcript	3859	5580	0.86	+	.	g4.t1
unnamed-1	AUGUSTUS	start_codon	3859	3861	.	+	0	transcript_id "g4.t1"; gene_id "g4";
unnamed-1	AUGUSTUS	single	3859	5580	0.86	+	0	transcript_id "g4.t1"; gene_id "g4";
unnamed-1	AUGUSTUS	CDS	3859	5580	0.86	+	0	transcript_id "g4.t1"; gene_id "g4";
unnamed-1	AUGUSTUS	stop_codon	5578	5580	.	+	0	transcript_id "g4.t1"; gene_id "g4";
# coding sequence = [gtggtccccgcaggcgcctcagcagccgccccggcagtggaccccgcaaccggttgcgccggtcgccccggtggccgtg
# cctgcccggcaaccggacaccccgcccaagccccgctcggaagggtggatcggcaaggtactggcgatggccggtgtcgcagtgacgctcgtcggcgt
# ggtgtcgctgctggtgctggccgcacaggccggcatcctgcgccccgaggtgcgagtggcggcaggggcggccctggccgtggcgctggtggccgtcg
# cgatgtggctcgaccgccgtcccggtggccgggtcggggcggtcgcactcgcggcgaccggtgtggccgcggcgtacatggacgtgatcgcggtgacg
# gccatctacgaatgggtcccggccccggtgggtttggcgctggccggggtcgtcggggcctgcggcctgatgctggcgcggtggtggggctccgagca
# actcgggttgctggtgttcgtgccgctgatcgcgttggcgccggtgatcaccgacggcgtgacgctgctgctgatcggtttcatgctggcgctgtcgg
# cggcgtcgcttccggtgcagttcggccgggactggttctggctgcacgccgcgcgcacggccgcggtgacgatcccgctgctcgtcgcactggtctcg
# gccgcgatcggcgggcgcgaggatctccggctggccctggtgtgtgcgctggcggcggtgcctgcgctcgtgggcggtgtgacggtgtcgcgcttcag
# caccaggccggtggcgaccaccgtggtctcggcgctgggaacggttccgctgctgtgtgtttcggccaccgcggaccgcgtgcctgccgtcctgctga
# tcgccgggctggcggccgcggcgctggccgtcgcggccatcggggaccggctgccgggcatcggggcgccggtgcgccgggtgtgggccgcaacctcg
# gcggcggccgctctgatcgccgtgctcgtggcgttcgacggcacggtggccgcgccggtgctgctggcgatgtcgatcgcgatcgcggtgggcgggca
# gcgcgatccggtcgggcgttgcgccgcaatcggtttcgcgttgatcggcgcgatgttctacctcgaccacgcggcgccggccatgctcgtcgaggcga
# caccgctcgacggcccgactgtcgcctcggtcgtgatcggcagtgtgatgctgatcggtgccgcggccgcgaacggctggacatggtcgcggacggtg
# tccgataccgaggttgtgcgcctggtgtgggtcgcggtatcagcggtgatcggatacgcggcgaccgcgctgaccgtcacggtcggggtggcgctggg
# cggggcggaggtgggctttttggccgggcatatggccgcaacgctcagttggatcgtggccgcggccctggcgttcggatacgccgcacgacgtccgg
# gcgcatcgcggtcggtgctgatcggcgggggactggtgctggtggccgcggcgacgggcaagctgttcctgttcgacctcggcacgctggacggcatg
# taccgcgtcgtgctgttcatcgtgggcgggctggtgctgctgggaatgggcgcgggttatgcacggtttctggcccagcagtccgacggccggtcgga
# tgcgcaaccgggaacggatcacgaggcccactcgacgtgacgtggcagggctcacaaagagcaaatttggaatga]
# protein sequence = [MVPAGASAAAPAVDPATGCAGRPGGRACPATGHPAQAPLGRVDRQGTGDGRCRSDARRRGVAAGAGRTGRHPAPRGAS
# GGRGGPGRGAGGRRDVARPPSRWPGRGGRTRGDRCGRGVHGRDRGDGHLRMGPGPGGFGAGRGRRGLRPDAGAVVGLRATRVAGVRAADRVGAGDHRR
# RDAAADRFHAGAVGGVASGAVRPGLVLAARRAHGRGDDPAARRTGLGRDRRARGSPAGPGVCAGGGACARGRCDGVALQHQAGGDHRGLGAGNGSAAV
# CFGHRGPRACRPADRRAGGRGAGRRGHRGPAAGHRGAGAPGVGRNLGGGRSDRRARGVRRHGGRAGAAGDVDRDRGGRAARSGRALRRNRFRVDRRDV
# LPRPRGAGHARRGDTARRPDCRLGRDRQCDADRCRGRERLDMVADGVRYRGCAPGVGRGISGDRIRGDRADRHGRGGAGRGGGGLFGRAYGRNAQLDR
# GRGPGVRIRRTTSGRIAVGADRRGTGAGGRGDGQAVPVRPRHAGRHVPRRAVHRGRAGAAGNGRGLCTVSGPAVRRPVGCATGNGSRGPLDVTWQGSQ
# RANLE]
# end gene g4
###
# start gene g5
unnamed-1	AUGUSTUS	gene	5630	6031	1	-	.	g5
unnamed-1	AUGUSTUS	transcript	5630	6031	1	-	.	g5.t1
unnamed-1	AUGUSTUS	stop_codon	5630	5632	.	-	0	transcript_id "g5.t1"; gene_id "g5";
unnamed-1	AUGUSTUS	single	5630	6031	1	-	0	transcript_id "g5.t1"; gene_id "g5";
unnamed-1	AUGUSTUS	CDS	5630	6031	1	-	0	transcript_id "g5.t1"; gene_id "g5";
unnamed-1	AUGUSTUS	start_codon	6029	6031	.	-	0	transcript_id "g5.t1"; gene_id "g5";
# coding sequence = [gtgcagcagatcggccgtgctgggatcctcggcgtccaccgcgtcgtggacgcgccggatggtgtcgaccgtggcgttg
# atacgggtggtgatgaggtcgacgacatcggctgtgctgcgctcgaacgcggggaattccggcagcgtcgtggtggccgccacggtgtcggaacggcc
# gtcgggaacggcgtccagcgcgcgcatccgttcggcgatcgtgtcgctgccctcgcgcgcgaagtcgaccaactcgtcgagctgcaggtgcaggtcac
# ggaagttgctgcccaccacgttccaatgggcctgtttgccctgcagggacagctcgatcaggtcgacgagaaccttctggaggttgccgccgaactcc
# ggtgtggcatggaaaccttggatatctga]
# protein sequence = [MQQIGRAGILGVHRVVDAPDGVDRGVDTGGDEVDDIGCAALERGEFRQRRGGRHGVGTAVGNGVQRAHPFGDRVAALA
# REVDQLVELQVQVTEVAAHHVPMGLFALQGQLDQVDENLLEVAAELRCGMETLDI]
# end gene g5
###
# start gene g6
unnamed-1	AUGUSTUS	gene	6252	6842	0.79	+	.	g6
unnamed-1	AUGUSTUS	transcript	6252	6842	0.79	+	.	g6.t1
unnamed-1	AUGUSTUS	start_codon	6252	6254	.	+	0	transcript_id "g6.t1"; gene_id "g6";
unnamed-1	AUGUSTUS	single	6252	6842	0.79	+	0	transcript_id "g6.t1"; gene_id "g6";
unnamed-1	AUGUSTUS	CDS	6252	6842	0.79	+	0	transcript_id "g6.t1"; gene_id "g6";
unnamed-1	AUGUSTUS	stop_codon	6840	6842	.	+	0	transcript_id "g6.t1"; gene_id "g6";
# coding sequence = [gtggcgcaaacggaggcacatgtcgtcgggcacctggatgcgcccttcgacatcggtgctctccatacgcgaggcgaca
# ttgaccgcgtcaccccacacgtcgtagaagaaccggcgcgcaccgaccaccccggccaccaccggtccggcggccaggccgatccgcagtggtacgcg
# cctgccttcgggatcggtgagatcggcgacggccgcggccatgtcgagcgcgagtgccgcgagcgcctcggcgtggtcggtgcggggctcggggatgc
# cgccgaccaccatgtacgaatcgccgctggtcttgaccttctccaggcagtgctgctcgacgagcgcatcgagatcggtgtagagcgtgtcgaggaac
# cgcaccagatcacacggcgcggtctcgctggcgcgcttggtgtagccggcgatgtcggcgaacaggatcgaggcgtcgtcgtatcggtcggcgatgat
# ggtgcgcgccgggtctttgagccgtgtcgcgatcgtggcgggaagaatgttcgcgagcaacttctccgagcgctggtactcggcctccatcgcgtcct
# cggcgcgcgcgatctcgcgtag]
# protein sequence = [MAQTEAHVVGHLDAPFDIGALHTRGDIDRVTPHVVEEPARTDHPGHHRSGGQADPQWYAPAFGIGEIGDGRGHVEREC
# RERLGVVGAGLGDAADHHVRIAAGLDLLQAVLLDERIEIGVERVEEPHQITRRGLAGALGVAGDVGEQDRGVVVSVGDDGARRVFEPCRDRGGKNVRE
# QLLRALVLGLHRVLGARDLA]
# end gene g6
###
# command line:
# /data/www/augustus/augustus/bin/augustus --species=s_aureus --strand=both --singlestrand=false --genemodel=partial --codingseq=on --sample=100 --keep_viterbi=true --alternatives-from-sampling=true --minexonintronprob=0.2 --minmeanexonintronprob=0.5 --maxtracks=2 /data/www/augustus/tmp/AUG-1753485020/input.fa --exonnames=on
GeneMark.hmm PROKARYOTIC (Version 3.26)
Date: Tue Oct 13 14:01:24 2015
Sequence file name: seq.fna
Model file name: GeneMark_hmm_heuristic.mod
RBS: false
Model information: Heuristic_model_for_genetic_code_11_and_GC_69

FASTA definition line: empty-fasta-def-line
Predicted genes
   Gene    Strand    LeftEnd    RightEnd       Gene     Class
    #                                         Length
    1        +          <2         148          147        1
    2        +         145         930          786        1
    3        +         953        1813          861        1
    4        +        1868        3004         1137        1
    5        +        3004        3660          657        1
    6        +        3704        5545         1842        1
    7        +        5609        6094          486        1
    8        -        6099        6911          813        1
    9        +        8173        8310          138        1
   10        +        8643        9269          627        1
   11        +        9547      >10659         1113        1




>gene_1|GeneMark.hmm|48_aa|+|2|148	>empty-fasta-def-line
TWYLVITSILMVGQYYLERYYSRGASRKLTTKQLEALAKAQTVGEAHP

>gene_2|GeneMark.hmm|261_aa|+|145|930	>empty-fasta-def-line
VTTAEATSGDYMVRAESVCKNFGALKVLRGVTLNVSKGQVLVLVGPSGSGKSTFLRCINH
LETVSAGRLYVDGQLVGYRERGGKLHEMKPSDVAKQRRDVGMVFQHFNLFPHRTALANII
EAPIKVKGVKKKEAIDRARDLLNQVGLADKAEAYPAQLSGGQQQRVAIARALAMNPKLML
FDEPTSALDPELVGDVLGVMKKLASEGMTMVVVTHEMGFAREVADKLVFMDGGVIVESGD
PREVMANPKHERTKAFLSKVM

>gene_3|GeneMark.hmm|286_aa|+|953|1813	>empty-fasta-def-line
VATTRSDVFISTAELIQLLAAGGPVTLLDVRWTLAEPNGEQAYLDGHLPGAVYVSLDDEL
ADHTVRGRGRHPLPSGRHLEAAARRWGVRDGVPTVVYDDWNRAGSARAWWCLTAAGISGV
RILDGGLGAWVAGGGGVETGPVTPEPGDVRVVHDDLYRGALPTLTADDVQSAAALIDARA
PERFRGEVEPVDPVAGHVPGAVNLPSTGLLNPDGTLRDEAQVRALLADRGVDDTGDTAVG
AYCGSGVTAALTVAGLAAAGVDAALFPGSWSEWVCDPGRPVARGEK

>gene_4|GeneMark.hmm|378_aa|+|1868|3004	>empty-fasta-def-line
VQREPDPVTPLWRAAQGFRLLSCLYALGFHIAITDDLRRPVLGWVLFAGLIVWSAACATV
YLTGSRRRGTWVIAELVVVVALMLSTELVASEQWIADNQSWPTTLWATNATISVALHFGP
IGGMSAGLAVMATVALLKGHVSVNLGRNATIVIELAVGLAVGMAAQTARRAHAELERAVR
LSAALEERERLSRRVHDGAIQVLALVARRGREIGGETAKLAELAGEQERALRRLVSAADT
DTMAGPLTDVGALLRTRASDRVSVSVPAEPVLLDHPVARELFAAAENALDNVAAHAGADA
RAFVLLEDLGEEVTVSIRDDGVGIPEGRLAEAERQGRMGVAKSIVGRMDWLGGTAVLTTG
PDSGTEWELTVPRTRKGQ

>gene_5|GeneMark.hmm|218_aa|+|3004|3660	>empty-fasta-def-line
MTDAAPTVMVVDDHPIWRDAVARDLADDGFDVVATADGVASASRRAAVVRPDVVLMDMRL
GDGSGAQATAEVLAVSPRSRVLVLSASDERDDVLQAVKAGATGYLVKSASRTELADAVRA
TAEGRAVFTPGLAGLVLGEYRRIAQQPAQEGPATPTLTERETEILRYVAKGLTAKQIAAR
LSLSHRTVENHVQATFRKLQVANRVELARYAIEHGLDE

>gene_6|GeneMark.hmm|613_aa|+|3704|5545	>empty-fasta-def-line
MTEPQRAVIARVTADLTAVSAYLNRMAGDLATLDRLVAQQSAAPRPEAVAPQWSPQAPQQ
PPRQWTPQPVAPVAPVAVPARQPDTPPKPRSEGWIGKVLAMAGVAVTLVGVVSLLVLAAQ
AGILRPEVRVAAGAALAVALVAVAMWLDRRPGGRVGAVALAATGVAAAYMDVIAVTAIYE
WVPAPVGLALAGVVGACGLMLARWWGSEQLGLLVFVPLIALAPVITDGVTLLLIGFMLAL
SAASLPVQFGRDWFWLHAARTAAVTIPLLVALVSAAIGGREDLRLALVCALAAVPALVGG
VTVSRFSTRPVATTVVSALGTVPLLCVSATADRVPAVLLIAGLAAAALAVAAIGDRLPGI
GAPVRRVWAATSAAAALIAVLVAFDGTVAAPVLLAMSIAIAVGGQRDPVGRCAAIGFALI
GAMFYLDHAAPAMLVEATPLDGPTVASVVIGSVMLIGAAAANGWTWSRTVSDTEVVRLVW
VAVSAVIGYAATALTVTVGVALGGAEVGFLAGHMAATLSWIVAAALAFGYAARRPGASRS
VLIGGGLVLVAAATGKLFLFDLGTLDGMYRVVLFIVGGLVLLGMGAGYARFLAQQSDGRS
DAQPGTDHEAHST

>gene_7|GeneMark.hmm|161_aa|+|5609|6094	>empty-fasta-def-line
MSARRTESDIQGFHATPEFGGNLQKVLVDLIELSLQGKQAHWNVVGSNFRDLHLQLDELV
DFAREGSDTIAERMRALDAVPDGRSDTVAATTTLPEFPAFERSTADVVDLITTRINATVD
TIRRVHDAVDAEDPSTADLLHGLIDGLEKQAWLIRSENRKV

>gene_8|GeneMark.hmm|270_aa|-|6099|6911	>empty-fasta-def-line
LSVGFVISTVSAAVMVVATVWSALREIARAEDAMEAEYQRSEKLLANILPATIATRLKDP
ARTIIADRYDDASILFADIAGYTKRASETAPCDLVRFLDTLYTDLDALVEQHCLEKVKTS
GDSYMVVGGIPEPRTDHAEALAALALDMAAAVADLTDPEGRRVPLRIGLAAGPVVAGVVG
ARRFFYDVWGDAVNVASRMESTDVEGRIQVPDDMCLRLRHAFVLEERGEVEVKGKGVMRT
WYLVGRRDGERAPLRTGDARSESVGNPAGG

>gene_9|GeneMark.hmm|45_aa|+|8173|8310	>empty-fasta-def-line
MDLLFAVLPGMAGLVLLTAAGGAIGVRHARAAQAVPAPQIARFMA

>gene_10|GeneMark.hmm|208_aa|+|8643|9269	>empty-fasta-def-line
MTGSTTDADRPRRVLIAEDEALIRLDLAEMLREEGYEVVGEAGDGQEAVEMAESLRPDLV
IMDVKMPRRDGIDAASEIASKRIAPIVILTAFSQRELVERARDAGAMAYLVKPFNINDLV
PAIEVAVSRFAELSALETEVATLSERLETRKLVERAKGLLQAKHKMTEPEAFKWIQRAAM
DRRTTMKRVAEVVLETLDDTKQAPAPEQ

>gene_11|GeneMark.hmm|371_aa|+|9547|10659	>empty-fasta-def-line
VVALAIAGCNQSTPEEEAAQTDLKIVEKVQIDENGAEVTGAGDVTPADPAGDGNAVCPPV
MIAMMGALNGPDAALGINIKNGVQMAIDKHNAANAQCQVQLKAFDTEGDPQKATGVAPQI
VDEPFIIGVVGPAFSGETKATGDVFNQAGLVATTASATNVQLSENGWRTFFRGLANDGVQ
GPSVANYMKNTLENKKVCVVDDSTDYGLGLAEAVRTTLGPVADASCNISVKKGDKDFSAA
VTQIKGAAPDSVFYSGYYSEAAPFVQQLKDGGVEATFISADGTKDPEFVKQAGESSKGAL
LSCPCGPATAEFAEEYTQKFGQEPGTYSTEGYDLGTILLKGIDSGAITRADLLNYVRNYE
GQGVARKYQWT
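
The predictions listed in the two outputs above can be compared with the annotated gene by coordinates. The sketch below is one hedged way to do this in Python. The prediction coordinates are copied from the outputs, but the annotated interval (3001 to 3660 on the plus strand) is an assumption derived from the 3 kbp upstream flank and the 660-nucleotide gene length, and the names `Prediction`, `same_stop`, and `overlapping` are illustrative.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    tool: str
    name: str
    strand: str
    left: int
    right: int


# Assumption: annotated gene position within the submitted window.
ANNOTATED = Prediction("NCBI", "MSMEG_3240", "+", 3001, 3660)

PREDICTIONS = [
    Prediction("AUGUSTUS", "g1", "+", 312, 1028),
    Prediction("AUGUSTUS", "g2", "+", 2035, 2433),
    Prediction("AUGUSTUS", "g3", "-", 3473, 3670),
    Prediction("AUGUSTUS", "g4", "+", 3859, 5580),
    Prediction("AUGUSTUS", "g5", "-", 5630, 6031),
    Prediction("AUGUSTUS", "g6", "+", 6252, 6842),
    Prediction("GeneMark", "gene_1", "+", 2, 148),        # partial at left edge
    Prediction("GeneMark", "gene_2", "+", 145, 930),
    Prediction("GeneMark", "gene_3", "+", 953, 1813),
    Prediction("GeneMark", "gene_4", "+", 1868, 3004),
    Prediction("GeneMark", "gene_5", "+", 3004, 3660),
    Prediction("GeneMark", "gene_6", "+", 3704, 5545),
    Prediction("GeneMark", "gene_7", "+", 5609, 6094),
    Prediction("GeneMark", "gene_8", "-", 6099, 6911),
    Prediction("GeneMark", "gene_9", "+", 8173, 8310),
    Prediction("GeneMark", "gene_10", "+", 8643, 9269),
    Prediction("GeneMark", "gene_11", "+", 9547, 10659),  # partial at right edge
]


def stop_coordinate(p):
    """Coordinate of the 3' end: right end on '+', left end on '-'."""
    return p.right if p.strand == "+" else p.left


def same_stop(ref, predictions):
    """Predictions on the same strand that end at the same stop codon."""
    return [p for p in predictions
            if p.strand == ref.strand
            and stop_coordinate(p) == stop_coordinate(ref)]


def overlapping(ref, predictions):
    """Predictions on either strand that overlap the reference interval."""
    return [p for p in predictions
            if p.left <= ref.right and p.right >= ref.left]


for p in same_stop(ANNOTATED, PREDICTIONS):
    print("same stop:", p.tool, p.name, p.left, p.right)
for p in overlapping(ANNOTATED, PREDICTIONS):
    print("overlaps:", p.tool, p.name, p.strand, p.left, p.right)
```

Under the stated assumption about the annotated interval, `same_stop` returns only GeneMark gene 5 (3004 to 3660), while `overlapping` additionally returns Augustus g3 on the opposite strand and GeneMark gene 4, which overlaps by a few nucleotides. Sharing a stop codon on the same strand is a common, lenient criterion for calling a prokaryotic gene prediction a match, because start sites are harder to predict than stops.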
b2gof15/students/jostrnat/finalproject.txt · Last modified: 2015/11/03 19:57 by jostrnat
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International