Abstract
Heat stress threatens dairy cows during periods of high temperature. Dairy farms often assess the respiratory rate (breaths per minute, bpm) of cows and then implement timely environmental control measures to mitigate heat stress, thus improving animal welfare and production performance. However, previous empirical or numerical models cannot accurately predict the respiratory rate of individual dairy cows in real environments. Data-driven machine learning models can be expected to better represent the relationship between the various influencing variables and the actual respiratory rate. In this study, a random forest (RF)-based prediction model was introduced for the cow respiratory rate. Five hyperparameters were tuned using a genetic algorithm (GA), a differential evolution algorithm (DE), a particle swarm optimization algorithm (PSO), and Bayesian optimization (BO). These random forest-based models were compared with artificial neural network (ANN) and extreme gradient boosting (XGBoost) models tuned by grid search (GS). The data were obtained from a commercial dairy farm in North China. The thermal environment parameters (air temperature, relative humidity, wind speed, and solar radiation) were combined with the sampling time blocks and cattle-related variables. The thermal environment variables were also used to construct four heat stress indices, namely the temperature-humidity index (THI), adjusted temperature-humidity index (ATHI), equivalent temperature index for cattle (ETIC), and skin temperature index (STIC). The dataset comprised 3005 records, with 80% allocated for training and 20% for testing. Hyperparameter optimization was conducted on the training set using 5-fold cross-validation. The correlation analysis revealed highly significant correlations (P<0.01) between the respiratory rate of cows and all thermal environment variables and heat stress indices. 
Additionally, significant correlations (P<0.01) were observed between the thermal environment variables and the heat stress indices. To account for this feature collinearity, the overall dataset was partitioned into five sub-datasets: the EP, THI, ATHI, ETIC, and STIC datasets. The sub-datasets differed only in their environmental features. Specifically, the EP dataset contained the four environmental parameters, while each of the other datasets contained the corresponding index instead. Results indicated that the baseline random forest model achieved its highest prediction accuracy when using the adjusted temperature-humidity index, time block, milk production, days of lactation, body posture, and number of calvings as inputs. On the test set of the ATHI dataset, the RF-based models tuned by the four intelligent optimization algorithms outperformed the GS-ANN and GS-XGBoost models. The random forest model optimized by the differential evolution algorithm (DE-RF) achieved the highest accuracy, with a coefficient of determination (R2), mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) on the test set of 0.614, 7.708 bpm, 14.4%, and 9.730 bpm, respectively. It was followed by the Bayesian-optimized random forest model (BO-RF), with an R2, MAE, MAPE, and RMSE on the test set of 0.614, 7.723 bpm, 14.4%, and 9.737 bpm, respectively. Notably, the BO-RF required only 1/200 of the time taken by the DE-RF. As a result, the BO-RF showed the most favourable overall performance in terms of prediction accuracy and computational time. Subsequently, the relative importance (RI) of the input features was determined using the BO-RF. 
The feature importance analysis revealed that the adjusted temperature-humidity index had the highest importance in the model prediction (RI value = 0.73), followed by time block (RI value = 0.09), milk production (RI value = 0.07), and days of lactation (RI value = 0.06). Body posture (RI value = 0.03) and the number of calvings (RI value = 0.02) had only a marginal impact on the model predictions. These findings can offer valuable insights for precise and intelligent environmental control systems in dairy barns.
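
For readers who wish to reproduce the four accuracy measures reported above (R2, MAE, MAPE, and RMSE), a minimal sketch using their standard textbook definitions is given below. It is not the authors' code: the function name `regression_metrics` and the sample respiratory rates are hypothetical and chosen only for illustration, and MAPE is assumed to be expressed as a percentage of the observed value.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MAPE (%) and RMSE for observed and predicted values.

    Assumes all observed values are non-zero (true for respiratory rates
    in bpm) and that the observed values are not all identical.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    mean_true = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


if __name__ == "__main__":
    # Hypothetical observed and predicted respiratory rates (bpm).
    observed = [40.0, 50.0, 60.0, 70.0]
    predicted = [45.0, 50.0, 55.0, 70.0]
    for name, value in regression_metrics(observed, predicted).items():
        print(f"{name}: {value:.3f}")
```

MAE and RMSE carry the unit of the target (bpm), whereas R2 and MAPE are dimensionless, which is why the abstract reports units only for the former two.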