Table 1 Performance of BoostMe using different feature combinations

From: BoostMe accurately predicts DNA methylation values in whole-genome bisulfite sequencing of multiple human tissues

| Features | RMSE (all) | RMSE (int.) | AUROC | AUPRC | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Nearest non-missing upstream and downstream neighboring beta values and distances (N) | 0.15046 ± 0.00940 | 0.21429 ± 0.01133 | 0.94983 ± 0.00774 | 0.98595 ± 0.00337 | 0.93743 ± 0.01065 |
| Sample average (A) | 0.09594 ± 0.00478 | 0.14304 ± 0.00649 | 0.98954 ± 0.00177 | 0.99424 ± 0.00486 | 0.96237 ± 0.00464 |
| A, N | 0.09330 ± 0.00461 | 0.13768 ± 0.00620 | 0.99019 ± 0.00160 | 0.99769 ± 0.00049 | 0.96389 ± 0.00457 |
| A, N, transcription factor binding sites | 0.09333 ± 0.00459 | 0.13776 ± 0.00617 | 0.99018 ± 0.00159 | 0.99769 ± 0.00049 | 0.96384 ± 0.00457 |
| A, N, recombination rate | 0.09330 ± 0.00462 | 0.13774 ± 0.00621 | 0.99018 ± 0.00160 | 0.99769 ± 0.00049 | 0.96386 ± 0.00459 |
| A, N, ATAC-seq peaks (P) | 0.09327 ± 0.00461 | 0.13768 ± 0.00620 | 0.99019 ± 0.00160 | 0.99769 ± 0.00049 | 0.96389 ± 0.00457 |
| A, N, histone marks (H) | 0.09322 ± 0.00461 | 0.13758 ± 0.00619 | 0.99020 ± 0.00159 | 0.99769 ± 0.00049 | 0.96393 ± 0.00456 |
| A, N, GENCODE annotations (G) | 0.09323 ± 0.00461 | 0.13759 ± 0.00619 | 0.99019 ± 0.00159 | 0.99769 ± 0.00049 | 0.96390 ± 0.00457 |
| A, N, chromatin states (C) | 0.09318 ± 0.00461 | 0.13759 ± 0.00619 | 0.99019 ± 0.00159 | 0.99769 ± 0.00049 | 0.96390 ± 0.00457 |
| A, N, P, H, G, C^a | 0.09311 ± 0.00459 | 0.13735 ± 0.00616 | 0.99022 ± 0.00158 | 0.99770 ± 0.00049 | 0.96401 ± 0.00454 |

Feature selection performance was evaluated on holdout validation sets by repeating the training and validation process ten times using ten different random seeds. All metrics were calculated by averaging across all 58 samples and are displayed as mean ± standard deviation. RMSE, root-mean-squared error; int., intermediate beta values, defined as having a sample average between 0.2 and 0.8; AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve. Accuracy was calculated as the number of beta values correctly predicted as methylated or unmethylated divided by the total number of beta values.

^a Final set of features used to benchmark performance.
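
The metric definitions in the footnote can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`rmse`, `evaluate`), the 0.5 cutoff used to call a beta value methylated or unmethylated, the inclusive 0.2 to 0.8 bounds, and the toy input values are all assumptions made for illustration. AUROC and AUPRC are omitted.

```python
import math


def rmse(true, pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(true) != len(pred) or len(true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def evaluate(true, pred, sample_avg, threshold=0.5, lo=0.2, hi=0.8):
    """Compute RMSE (all), RMSE (int.) and accuracy for one sample.

    Assumptions (not stated in the source): `threshold` is the cutoff for
    calling a beta value methylated, and the intermediate range [lo, hi]
    is treated as inclusive.
    """
    # Intermediate CpGs: those whose across-sample average lies in [lo, hi].
    idx = [i for i, a in enumerate(sample_avg) if lo <= a <= hi]
    rmse_int = rmse([true[i] for i in idx], [pred[i] for i in idx]) if idx else float("nan")

    # Accuracy: fraction of CpGs whose binarized prediction matches the
    # binarized true value.
    correct = sum((t >= threshold) == (p >= threshold) for t, p in zip(true, pred))
    return {
        "rmse_all": rmse(true, pred),
        "rmse_int": rmse_int,
        "accuracy": correct / len(true),
    }


# Toy data (hypothetical values, for illustration only).
true_beta = [0.0, 1.0, 0.5, 0.9]
pred_beta = [0.1, 0.9, 0.7, 0.4]
avg_beta = [0.05, 0.95, 0.5, 0.7]

metrics = evaluate(true_beta, pred_beta, avg_beta)
print(metrics)
```

In this toy case only the last two CpGs fall in the intermediate range, so RMSE (int.) is computed on a harder subset than RMSE (all), which mirrors why the intermediate column in the table is consistently higher.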