Table 1 Negative training data sets in individual models, and corresponding accuracy, sensitivity, specificity and AUC values

From: Prediction of plant lncRNA by ensemble machine learning classifiers

| Training dataset | Negative data | AUC (GB) | AUC (RF) | Accuracy (GB) | Accuracy (RF) | Specificity (GB) | Specificity (RF) | Sensitivity (GB) | Sensitivity (RF) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 3000 H. sapiens (set A); 1000 M. musculus (set A); 3000 O. sativa (set A) | 0.940 | 0.943 | 0.962 | 0.956 | 0.988 | 0.990 | 0.548 | 0.404 |
| 2 | 3000 H. sapiens (set A); 3000 O. sativa (set A) | 0.943 | 0.944 | 0.960 | 0.953 | 0.988 | 0.989 | 0.576 | 0.461 |
| 3 | 3000 H. sapiens (set A); 1000 M. musculus (set A); 3000 A. thaliana (set A) | 0.961 | 0.962 | 0.973 | 0.970 | 0.990 | 0.992 | 0.693 | 0.592 |
| 4 | 3000 H. sapiens (set A); 3000 A. thaliana (set A) | 0.962 | 0.966 | 0.972 | 0.967 | 0.990 | 0.990 | 0.725 | 0.640 |
| 5 | 3000 H. sapiens (set B); 3000 A. thaliana (set B) | 0.955 | 0.959 | 0.965 | 0.958 | 0.991 | 0.980 | 0.608 | 0.530 |
| 6 | 4500 H. sapiens (set A + 1500 seq); 4500 A. thaliana (set A + 1500 seq) | 0.961 | 0.967 | 0.979 | 0.979 | 0.995 | 0.995 | 0.633 | 0.571 |
| 7 | 3000 H. sapiens (set A); 4500 A. thaliana (set A + 1500 seq) | 0.963 | 0.967 | 0.976 | 0.971 | 0.993 | 0.992 | 0.700 | 0.603 |
| 8 | 2000 H. sapiens (2000 from set A); 1000 M. musculus (set A); 3000 A. thaliana (set A) | 0.964 | 0.965 | 0.968 | 0.965 | 0.988 | 0.990 | 0.695 | 0.619 |

The table describes the training datasets of the individual random forest (RF) and gradient boosting (GB) models. The positive training dataset, 436 validated lncRNAs, was the same in every training dataset. Specificity, sensitivity, accuracy and AUC values were obtained by 10-fold cross-validation on all training data.
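
The relationship between the reported metrics can be checked with a short sketch. Accuracy is the class-size-weighted average of sensitivity and specificity, so with 436 positives against several thousand negatives it stays high even when sensitivity is low. The example below is not the authors' code. It assumes that training dataset 1 contains 436 positive and 3000 + 1000 + 3000 = 7000 negative sequences, and that the published values are rounded to three decimals. The function name `accuracy_from_rates` is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Accuracy implied by sensitivity and specificity at the given class sizes."""
    true_pos = sensitivity * n_pos   # positives correctly called lncRNA
    true_neg = specificity * n_neg   # negatives correctly rejected
    return (true_pos + true_neg) / (n_pos + n_neg)


# Assumed class sizes for training dataset 1 (from the table and its footnote).
N_POS = 436
N_NEG_MODEL_1 = 3000 + 1000 + 3000

# Sensitivity and specificity as reported for training dataset 1.
gb = accuracy_from_rates(0.548, 0.988, N_POS, N_NEG_MODEL_1)
rf = accuracy_from_rates(0.404, 0.990, N_POS, N_NEG_MODEL_1)

print(f"GB: {gb:.3f}, RF: {rf:.3f}")  # prints "GB: 0.962, RF: 0.956"
```

Both results agree with the accuracy values listed for training dataset 1. This suggests that accuracy here mostly reflects how well the large negative class is classified.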