Supplementary Materials for
Sparks, M.E. and Brendel, V. (2008) MetWAMer: eukaryotic translation initiation site prediction


Table S1. Method performances on C. elegans TIS-containing data. 17,016 TIS-containing instances were used in three separate five-fold cross-validation experiments. Results are shown from applying a non-stratified parameter set (homogeneous); a priori-known cluster-specific parameter sets for k=3 (cluster-specific); and group-specific parameter sets for a random three-way split of the data (random split). TP is the number of instances for which the method correctly identified a TIS; FP, the number for which a prediction was made, though incorrect; and FN, the number for which no prediction was made, but should have been. Sn = TP/(TP+FP+FN), and Sp = TP/(TP+FP).

Deployment        Method        TP      FP      FN      Sn      Sp

                  1st-ATG   14,797   2,219       0  0.8696  0.8696

homogeneous       LLKR       6,852   9,185     979  0.4027  0.4273
                  WLLKR     10,339   4,378   2,299  0.6076  0.7025
                  MFCWLLKR  12,178   4,775      63  0.7157  0.7183
                  PFCWLLKR  12,002   4,334     680  0.7053  0.7347
                  BAYES      7,079   6,257   3,680  0.4160  0.5308

cluster-specific  LLKR       9,618   6,638     760  0.5652  0.5917
                  WLLKR     12,535   3,002   1,479  0.7367  0.8068
                  MFCWLLKR  13,883   3,069      64  0.8159  0.8190
                  PFCWLLKR  13,417   2,769     830  0.7885  0.8289
                  BAYES      9,661   4,788   2,567  0.5678  0.6686

random split      LLKR       6,857   9,092   1,067  0.4030  0.4299
                  WLLKR     10,278   4,412   2,326  0.6040  0.6997
                  MFCWLLKR  12,189   4,764      63  0.7163  0.7190
                  PFCWLLKR  11,752   4,275     989  0.6906  0.7333
                  BAYES      7,159   6,283   3,574  0.4207  0.5326
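
The Sn and Sp definitions given in the Table S1 caption can be checked against any row of the table. The Python sketch below is illustrative only: the function name `tis_metrics` is hypothetical and not part of MetWAMer, and the code assumes the counts give non-zero denominators.

```python
def tis_metrics(tp, fp, fn):
    """Return (Sn, Sp) using the definitions from the Table S1 caption.

    Sn = TP / (TP + FP + FN)
    Sp = TP / (TP + FP)
    Assumes both denominators are non-zero.
    """
    sn = tp / (tp + fp + fn)
    sp = tp / (tp + fp)
    return sn, sp


# Counts copied from the homogeneous LLKR row of Table S1.
sn, sp = tis_metrics(tp=6852, fp=9185, fn=979)
print(f"Sn = {sn:.4f}, Sp = {sp:.4f}")
```

For this row the sketch gives Sn = 0.4027 and Sp = 0.4273, matching the tabulated values. Since each instance is counted exactly once as TP, FP, or FN, the three counts in a row sum to the 17,016 instances evaluated.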

Table S2. Method performances on C. elegans non-TIS-containing data. 16,617 non-TIS-containing instances were used in three separate five-fold cross-validation experiments. Results are shown from applying a non-stratified parameter set (homogeneous); a priori-known cluster-specific parameter sets for k=3 (cluster-specific); and group-specific parameter sets for a random three-way split of the data (random split). TN is the number of instances for which the method correctly refused to predict a TIS, and FP the number for which some prediction was made (necessarily incorrect, since these instances contain no TIS). Sn = TN/(TN+FP).

Deployment        Method        TN      FP      Sn

                  1st-ATG      908  15,709  0.0546

homogeneous       LLKR       4,464  12,153  0.2686
                  WLLKR      6,589  10,028  0.3965
                  MFCWLLKR   2,034  14,583  0.1224
                  PFCWLLKR   5,310  11,307  0.3196
                  BAYES      7,249   9,368  0.4362

cluster-specific  LLKR       5,698  10,919  0.3429
                  WLLKR      7,239   9,378  0.4356
                  MFCWLLKR   2,381  14,236  0.1433
                  PFCWLLKR   8,244   8,373  0.4961
                  BAYES      8,177   8,440  0.4921

random split      LLKR       4,605  12,012  0.2771
                  WLLKR      6,577  10,040  0.3958
                  MFCWLLKR   2,040  14,577  0.1228
                  PFCWLLKR   6,224  10,393  0.3746
                  BAYES      7,237   9,380  0.4355
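
For the non-TIS-containing data, Sn is defined over TN and FP only, so each row can be checked against the instance count stated in the Table S2 caption. The following Python sketch does this for the homogeneous rows. The names `N_NON_TIS`, `non_tis_sn`, and `rows` are hypothetical and chosen for illustration; they do not come from MetWAMer.

```python
N_NON_TIS = 16_617  # non-TIS-containing instances, from the Table S2 caption


def non_tis_sn(tn, fp):
    """Sn as defined in the Table S2 caption: TN / (TN + FP)."""
    return tn / (tn + fp)


# (TN, FP) pairs copied from the homogeneous rows of Table S2.
rows = {
    "LLKR": (4464, 12153),
    "WLLKR": (6589, 10028),
    "MFCWLLKR": (2034, 14583),
    "PFCWLLKR": (5310, 11307),
    "BAYES": (7249, 9368),
}

for method, (tn, fp) in rows.items():
    # Every instance is either a TN or an FP, so the counts must sum to N.
    assert tn + fp == N_NON_TIS
    print(f"{method:<9} Sn = {non_tis_sn(tn, fp):.4f}")
```

Run as written, this reproduces the Sn column for those five rows to four decimal places.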

Table S3. Effect of parameter set indexing strategy on PFCWLLKR performance using C. elegans TIS-containing data. 17,016 TIS-containing instances were used in five-fold cross-validation experiments, in which parameter sets were selected for putative TIS evaluation using Hamming distance relative to cached medoids (edit), position weight matrix scores (PWM), or weight array matrix scores (WAM); parameter indexing was tested under both modulating and static approaches. k=3 clusters were considered. TP is the number of instances for which the method correctly identified a TIS; FP, the number for which a prediction was made, though incorrect; and FN, the number for which no prediction was made, but should have been. Sn = TP/(TP+FP+FN), and Sp = TP/(TP+FP).

Indexing strategy     TP      FP      FN      Sn      Sp

modulating  edit  11,548   4,981     487  0.6787  0.6987
            PWM   11,505   4,991     520  0.6761  0.6974
            WAM   11,621   4,917     478  0.6829  0.7027

static      edit  12,573   3,580     863  0.7389  0.7784
            PWM   12,514   3,534     968  0.7354  0.7798
            WAM   12,640   3,529     847  0.7428  0.7817

Table S4. Effect of parameter set indexing strategy on PFCWLLKR performance using C. elegans non-TIS-containing data. 16,617 non-TIS-containing instances were used in five-fold cross-validation experiments, in which parameter sets were selected for putative TIS evaluation using Hamming distance relative to cached medoids (edit), position weight matrix scores (PWM), or weight array matrix scores (WAM); parameter indexing was tested under both modulating and static approaches. k=3 clusters were considered. TN is the number of instances for which the method correctly refused to predict a TIS, and FP the number for which some prediction was made (necessarily incorrect, since these instances contain no TIS). Sn = TN/(TN+FP).

Indexing strategy     TN      FP      Sn

modulating  edit   4,470  12,147  0.2690
            PWM    4,524  12,093  0.2723
            WAM    4,468  12,149  0.2689

static      edit   5,466  11,151  0.3289
            PWM    5,560  11,057  0.3346
            WAM    5,460  11,157  0.3286

Table S5. Method performances on H. sapiens TIS-containing data. 273 TIS-containing instances, obtained from (non-withdrawn) gene annotations in the 30 April 2008 CCDS chromosome 21 annotation available at NCBI, were used in one ten-fold cross-validation experiment. Results from applying a non-stratified parameter set (homogeneous) are shown. TP is the number of instances for which the method correctly identified a TIS; FP, the number for which a prediction was made, though incorrect; and FN, the number for which no prediction was made, but should have been. Sn = TP/(TP+FP+FN), and Sp = TP/(TP+FP).

Method     TP   FP   FN      Sn      Sp

1st-ATG   273    0    0  1.0000  1.0000

LLKR      152   94   27  0.5568  0.6179
WLLKR     186   47   40  0.6813  0.7983
MFCWLLKR  211   61    1  0.7729  0.7757
PFCWLLKR  199   46   28  0.7289  0.8122
BAYES     144   86   43  0.5275  0.6261
