Analysis of merged whole blood transcriptomic datasets to identify circulating molecular biomarkers of feed efficiency in growing pigs

Table 3 Iterative steps for model reduction to predict FCR values¹

	Number of probes	Number of genes	R²	RMSE
Random Forest procedure
FCR	604	411	0.42	0.366
	100	58	0.62	0.301
	50	30	0.65	0.293
	25	17	0.67	0.281
	10	8	0.68	0.278
Gradient Tree Boosting
FCR	728	477	0.78	0.241
	100	56	0.79	0.235
	50	27	0.80	0.234
	25	12	0.81	0.229
	10	5	0.80	0.223

¹ Random forest (RF) or gradient tree boosting (GTB) algorithms were applied to a transcriptomic dataset containing 26,687 molecular probes measured in whole blood sampled from 148 pigs. The dataset was split into training (n = 74) and validation (n = 74) subsets to evaluate model performance in predicting feed conversion ratio (FCR). Out of the 26,687 expressed annotated probes, the first rounds led to model stabilization with 604 molecular probes as very important variables (VIP) for FCR prediction using RF and 728 probes using GTB. The second row for each algorithm reports an iteration of the same procedure that used the VIP identified in the first step as the new inputs. This increased the accuracy of the prediction, as evaluated by the root mean square error (RMSE) and the coefficient of determination (R²). Further iterative steps were then performed. The numbers of annotated probes and of their corresponding unique genes identified as VIP are indicated at each step. Because the reduced models were almost equivalent in performance, those including 27 to 30 unique genes (50 probes) were selected for further analysis. Models obtained with the GTB algorithm performed better than those obtained with the RF procedure.
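The two performance measures reported in the table can be illustrated with a minimal sketch. It assumes that R² is computed as 1 − SS_res/SS_tot on the validation subset, which the footnote does not state explicitly (a squared correlation between observed and predicted values is another common definition). The function names and the FCR values below are hypothetical and serve only as illustration; they are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical FCR values for four validation animals (not from the study).
observed_fcr = [2.4, 2.6, 2.8, 3.0]
predicted_fcr = [2.5, 2.6, 2.7, 3.1]

print(f"RMSE = {rmse(observed_fcr, predicted_fcr):.3f}")
print(f"R2   = {r_squared(observed_fcr, predicted_fcr):.2f}")
```

Under these definitions, a lower RMSE and a higher R² both indicate that predicted FCR values lie closer to the observed ones, which is how the rows of the table can be compared.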

ISSN: 1471-2164