Table S1. Method performance on C. elegans TIS-containing data. 17,016 TIS-containing instances were used in three separate five-fold cross-validation experiments. Results are shown for a non-stratified parameter set (homogeneous); a priori known cluster-specific parameter sets for k=3 (cluster-specific); and group-specific parameter sets for a random three-way split of the data (random split). TP is the number of instances for which the method correctly identified a TIS; FP the number for which a prediction was made but was incorrect; and FN the number for which no prediction was made although one should have been. Sn = TP/(TP+FP+FN), and Sp = TP/(TP+FP).
Deployment | Method | TP | FP | FN | Sn | Sp |
- | 1st-ATG | 14,797 | 2,219 | 0 | 0.8696 | 0.8696 |
homogeneous | LLKR | 6,852 | 9,185 | 979 | 0.4027 | 0.4273 |
homogeneous | WLLKR | 10,339 | 4,378 | 2,299 | 0.6076 | 0.7025 |
homogeneous | MFCWLLKR | 12,178 | 4,775 | 63 | 0.7157 | 0.7183 |
homogeneous | PFCWLLKR | 12,002 | 4,334 | 680 | 0.7053 | 0.7347 |
homogeneous | BAYES | 7,079 | 6,257 | 3,680 | 0.4160 | 0.5308 |
cluster-specific | LLKR | 9,618 | 6,638 | 760 | 0.5652 | 0.5917 |
cluster-specific | WLLKR | 12,535 | 3,002 | 1,479 | 0.7367 | 0.8068 |
cluster-specific | MFCWLLKR | 13,883 | 3,069 | 64 | 0.8159 | 0.8190 |
cluster-specific | PFCWLLKR | 13,417 | 2,769 | 830 | 0.7885 | 0.8289 |
cluster-specific | BAYES | 9,661 | 4,788 | 2,567 | 0.5678 | 0.6686 |
random split | LLKR | 6,857 | 9,092 | 1,067 | 0.4030 | 0.4299 |
random split | WLLKR | 10,278 | 4,412 | 2,326 | 0.6040 | 0.6997 |
random split | MFCWLLKR | 12,189 | 4,764 | 63 | 0.7163 | 0.7190 |
random split | PFCWLLKR | 11,752 | 4,275 | 989 | 0.6906 | 0.7333 |
random split | BAYES | 7,159 | 6,283 | 3,574 | 0.4207 | 0.5326 |
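The Sn and Sp columns follow directly from the counts via the formulas in the Table S1 caption. As a minimal sketch (the function name `tis_metrics` is hypothetical and not part of the original work), the arithmetic can be checked like this:

```python
def tis_metrics(tp: int, fp: int, fn: int):
    """Return (Sn, Sp) using the definitions in the Table S1 caption.

    Sn = TP / (TP + FP + FN)
    Sp = TP / (TP + FP)
    Values are rounded to four decimals to match the table.
    """
    sn = tp / (tp + fp + fn)
    sp = tp / (tp + fp)
    return round(sn, 4), round(sp, 4)


# Counts copied from the "homogeneous | LLKR" row of Table S1.
sn, sp = tis_metrics(tp=6852, fp=9185, fn=979)
print(sn, sp)
```

Because every TIS-containing instance falls into exactly one of TP, FP, or FN, the three counts in each row sum to the 17,016 instances stated in the caption, which gives a quick consistency check on any row.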
Table S2. Method performance on C. elegans non-TIS-containing data. 16,617 non-TIS-containing instances were used in three separate five-fold cross-validation experiments. Results are shown for a non-stratified parameter set (homogeneous); a priori known cluster-specific parameter sets for k=3 (cluster-specific); and group-specific parameter sets for a random three-way split of the data (random split). TN is the number of instances for which the method correctly declined to predict a TIS, and FP the number for which a prediction was made, which is always incorrect for these data. Sn = TN/(TN+FP).
Deployment | Method | TN | FP | Sn |
- | 1st-ATG | 908 | 15,709 | 0.0546 |
homogeneous | LLKR | 4,464 | 12,153 | 0.2686 |
homogeneous | WLLKR | 6,589 | 10,028 | 0.3965 |
homogeneous | MFCWLLKR | 2,034 | 14,583 | 0.1224 |
homogeneous | PFCWLLKR | 5,310 | 11,307 | 0.3196 |
homogeneous | BAYES | 7,249 | 9,368 | 0.4362 |
cluster-specific | LLKR | 5,698 | 10,919 | 0.3429 |
cluster-specific | WLLKR | 7,239 | 9,378 | 0.4356 |
cluster-specific | MFCWLLKR | 2,381 | 14,236 | 0.1433 |
cluster-specific | PFCWLLKR | 8,244 | 8,373 | 0.4961 |
cluster-specific | BAYES | 8,177 | 8,440 | 0.4921 |
random split | LLKR | 4,605 | 12,012 | 0.2771 |
random split | WLLKR | 6,577 | 10,040 | 0.3958 |
random split | MFCWLLKR | 2,040 | 14,577 | 0.1228 |
random split | PFCWLLKR | 6,224 | 10,393 | 0.3746 |
random split | BAYES | 7,237 | 9,380 | 0.4355 |
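For the non-TIS-containing data, the single reported rate is Sn = TN/(TN+FP), as given in the Table S2 caption. The sketch below recomputes it for two rows; the helper name `non_tis_sn` is an assumption made for illustration only.

```python
TOTAL_NON_TIS = 16_617  # instance count stated in the Table S2 caption


def non_tis_sn(tn: int, fp: int) -> float:
    """Return Sn = TN / (TN + FP), rounded to four decimals as in Table S2."""
    # Each non-TIS instance is either a TN or an FP, so the counts
    # should add up to the total number of instances.
    if tn + fp != TOTAL_NON_TIS:
        raise ValueError("TN + FP does not match the stated instance count")
    return round(tn / (tn + fp), 4)


# Counts copied from the 1st-ATG and "homogeneous | BAYES" rows of Table S2.
print(non_tis_sn(908, 15_709))
print(non_tis_sn(7_249, 9_368))
```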
Table S3. Effect of parameter set indexing strategy on PFCWLLKR performance using C. elegans TIS-containing data. 17,016 TIS-containing instances were used in five-fold cross-validation experiments, in which parameter sets were selected for putative TIS evaluation using Hamming distance relative to cached medoids (edit), position weight matrix scores (PWM), or weight array matrix scores (WAM). Parameter indexing was tested under both modulating and static approaches, with k=3 clusters. TP is the number of instances for which the method correctly identified a TIS; FP the number for which a prediction was made but was incorrect; and FN the number for which no prediction was made although one should have been. Sn = TP/(TP+FP+FN), and Sp = TP/(TP+FP).
Approach | Indexing strategy | TP | FP | FN | Sn | Sp |
modulating | edit | 11,548 | 4,981 | 487 | 0.6787 | 0.6987 |
modulating | PWM | 11,505 | 4,991 | 520 | 0.6761 | 0.6974 |
modulating | WAM | 11,621 | 4,917 | 478 | 0.6829 | 0.7027 |
static | edit | 12,573 | 3,580 | 863 | 0.7389 | 0.7784 |
static | PWM | 12,514 | 3,534 | 968 | 0.7354 | 0.7798 |
static | WAM | 12,640 | 3,529 | 847 | 0.7428 | 0.7817 |
Table S4. Effect of parameter set indexing strategy on PFCWLLKR performance using C. elegans non-TIS-containing data. 16,617 non-TIS-containing instances were used in five-fold cross-validation experiments, in which parameter sets were selected for putative TIS evaluation using Hamming distance relative to cached medoids (edit), position weight matrix scores (PWM), or weight array matrix scores (WAM). Parameter indexing was tested under both modulating and static approaches, with k=3 clusters. TN is the number of instances for which the method correctly declined to predict a TIS, and FP the number for which a prediction was made, which is always incorrect for these data. Sn = TN/(TN+FP).
Approach | Indexing strategy | TN | FP | Sn |
modulating | edit | 4,470 | 12,147 | 0.2690 |
modulating | PWM | 4,524 | 12,093 | 0.2723 |
modulating | WAM | 4,468 | 12,149 | 0.2689 |
static | edit | 5,466 | 11,151 | 0.3289 |
static | PWM | 5,560 | 11,057 | 0.3346 |
static | WAM | 5,460 | 11,157 | 0.3286 |
Table S5. Method performance on H. sapiens TIS-containing data. 273 TIS-containing instances, obtained from (non-withdrawn) gene annotations in the 30 April 2008 CCDS chromosome 21 annotation available at NCBI, were used in one ten-fold cross-validation experiment. Results are shown for a non-stratified parameter set (homogeneous). TP is the number of instances for which the method correctly identified a TIS; FP the number for which a prediction was made but was incorrect; and FN the number for which no prediction was made although one should have been. Sn = TP/(TP+FP+FN), and Sp = TP/(TP+FP).
Method | TP | FP | FN | Sn | Sp |
1st-ATG | 273 | 0 | 0 | 1.0000 | 1.0000 |
LLKR | 152 | 94 | 27 | 0.5568 | 0.6179 |
WLLKR | 186 | 47 | 40 | 0.6813 | 0.7983 |
MFCWLLKR | 211 | 61 | 1 | 0.7729 | 0.7757 |
PFCWLLKR | 199 | 46 | 28 | 0.7289 | 0.8122 |
BAYES | 144 | 86 | 43 | 0.5275 | 0.6261 |