Figure 1 | BMC Genomics

From: A new computational strategy for predicting essential genes

Flow chart for constructing FWM and assessing its performance in predicting essential genes between and within species. (A) FWM construction. When essential genes are predicted from species 1 to species 2, FWM computes the score vector S_i and the weighting coefficient vector W. S_i is calculated mainly by kernel density estimation (KDE) combined with Naïve Bayes estimation (see Methods). To calculate W, prior information is first collected (e.g., known essential genes in species 2 or in a closely related species). This information serves as the training-prediction set, which is used together with the training set to estimate W. Finally, the posterior probability that each gene in species 2 is essential is calculated with the weighted Naïve Bayes (WNB) method. (B) FWM performance in predicting essential genes between and within species. To assess FWM within species (e.g., SCE→SCE or SPO→SPO), 20%, 50%, or 80% of all genes were randomly selected as the training set, and the remaining genes formed the testing set. The training set itself served as the training-prediction set for calculating the weights, and the AUC for the testing set was then calculated with the WNB method. This process was replicated 1,000 times to obtain the corresponding AUC distributions. To predict essential genes between species (e.g., SCE→SPO or SPO→SCE), all genes in SCE (or SPO) were used as the training set, 20%, 50%, or 80% of the SPO (or SCE) genes were randomly selected as the training-prediction set, and the remaining genes were designated as the testing set. As in the within-species comparison, AUC distributions were obtained by replicating the process 1,000 times.
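The scoring step in panel (A) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes continuous gene features, a Gaussian kernel with the normal-reference (Silverman) bandwidth, and that each entry of S_i is a per-feature log-likelihood ratio between the essential and non-essential KDEs. The weighted Naïve Bayes posterior then adds the prior log-odds to the W-weighted sum of these scores. The caption defers the estimation of W to the Methods, so W is simply passed in here; `fit_score_function`, `wnb_posterior`, and `EPS` are hypothetical names.

```python
import math
import statistics
from typing import Callable, List, Sequence

EPS = 1e-300  # floor for densities so log() never receives an exact 0


def gaussian_kde(samples: Sequence[float]) -> Callable[[float], float]:
    """1-D Gaussian KDE with the normal-reference (Silverman) bandwidth."""
    n = len(samples)
    sd = statistics.stdev(samples) if n > 1 else 1.0
    h = 1.06 * (sd if sd > 0 else 1.0) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


def fit_score_function(
    X_train: List[List[float]], y_train: List[int]
) -> Callable[[Sequence[float]], List[float]]:
    """Fit per-feature class-conditional KDEs; return a map from a gene's features to S_i.

    Assumption: S_i[j] = log f_E,j(x_ij) - log f_N,j(x_ij), where E = essential
    (label 1) and N = non-essential (label 0). Both classes must be present.
    """
    n_features = len(X_train[0])
    kde_e, kde_n = [], []
    for j in range(n_features):
        kde_e.append(gaussian_kde([x[j] for x, y in zip(X_train, y_train) if y == 1]))
        kde_n.append(gaussian_kde([x[j] for x, y in zip(X_train, y_train) if y == 0]))

    def score(x: Sequence[float]) -> List[float]:
        return [
            math.log(max(kde_e[j](x[j]), EPS)) - math.log(max(kde_n[j](x[j]), EPS))
            for j in range(n_features)
        ]

    return score


def wnb_posterior(s_i: Sequence[float], weights: Sequence[float], prior_essential: float) -> float:
    """P(essential | gene i): log-odds = log prior odds + sum_j W[j] * S_i[j]."""
    log_odds = math.log(prior_essential / (1 - prior_essential))
    log_odds += sum(w * s for w, s in zip(weights, s_i))
    if log_odds >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Toy training set (hypothetical): essential genes have feature values near 1.
    X_train = [[1.0, 0.9], [1.2, 1.1], [0.8, 1.0], [0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
    y_train = [1, 1, 1, 0, 0, 0]
    score = fit_score_function(X_train, y_train)
    W = [1.0, 1.0]  # placeholder weights; FWM estimates W from the training-prediction set
    print(wnb_posterior(score([1.0, 1.0]), W, 0.5))  # near 1
    print(wnb_posterior(score([0.0, 0.0]), W, 0.5))  # near 0
```

With every weight set to 1 this reduces to ordinary Naïve Bayes; a weight of 0 removes a feature's evidence entirely, which is what lets a weighted model discount features that carry less information in the target species.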
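The resampling protocol in panel (B) can be sketched as a loop that draws a random split, scores the testing set, and records the AUC. This is a hedged sketch: `auc_distribution` and the `fit_predict` callback are hypothetical names, AUC is computed with the Mann-Whitney rank-sum formula, and the toy scorer in the demo merely stands in for FWM/WNB. Within species, the callback would train on the same genes it receives as the training-prediction set; between species, it would train on all genes of the source species and use the selected target-species genes only for estimating the weights.

```python
import random
import statistics
from typing import Callable, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney rank-sum statistic (tied scores get average ranks)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one essential and one non-essential gene")
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auc_distribution(
    labels: Sequence[int],
    fit_predict: Callable[[List[int], List[int]], List[float]],
    fraction: float,
    n_replicates: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Repeatedly select `fraction` of the target-species genes at random as the
    training-prediction set, score the remaining genes (the testing set), and
    record the AUC of each replicate.

    `fit_predict(train_pred_idx, test_idx)` must return one score per test gene.
    """
    rng = random.Random(seed)
    n = len(labels)
    n_sel = round(fraction * n)
    aucs = []
    for _ in range(n_replicates):
        idx = list(range(n))
        rng.shuffle(idx)
        train_pred, test = idx[:n_sel], idx[n_sel:]
        scores = fit_predict(train_pred, test)
        aucs.append(auc(scores, [labels[i] for i in test]))
    return aucs


if __name__ == "__main__":
    # Toy data (hypothetical): one noisy feature that is higher for essential genes.
    gen = random.Random(1)
    labels = [1] * 20 + [0] * 80
    feature = [y + gen.gauss(0, 0.5) for y in labels]
    # Stand-in scorer that ignores the training genes; replace with FWM/WNB.
    dist = auc_distribution(labels, lambda tr, te: [feature[i] for i in te], 0.5, n_replicates=100)
    print(statistics.mean(dist), min(dist), max(dist))
```

The rank-based AUC costs O(n log n) per replicate rather than comparing every essential/non-essential pair, which keeps 1,000 replicates on a genome-sized gene set practical.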
