Holdout data is a portion of historical, labeled data that is held out of the data sets used to train and validate supervised machine learning models. It is also called test data. The first step in supervised learning is to fit a variety of candidate models to the training data and compare their predictive performance on the validation data. After a model has been selected and tuned with the validation data set, it is tested with the holdout data set for a final evaluation of its accuracy, sensitivity, specificity, and consistency in predicting the right outcomes.
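As a rough illustration of the three data sets described above, the following Python sketch shuffles labeled records and splits them into disjoint training, validation, and holdout sets. The function name `split_train_validation_holdout`, the 70/15/15 proportions, and the synthetic `records` are assumptions made for this example only; they are not part of any C3 AI product.

```python
import random


def split_train_validation_holdout(records, validation_fraction=0.15,
                                   holdout_fraction=0.15, seed=0):
    """Shuffle labeled records and split them into three disjoint sets.

    The fractions and the seed are illustrative defaults, not recommendations.
    """
    if not 0 < validation_fraction + holdout_fraction < 1:
        raise ValueError("fractions must leave room for training data")

    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility

    n = len(shuffled)
    n_holdout = round(n * holdout_fraction)
    n_validation = round(n * validation_fraction)

    # The holdout slice is carved off first and is not used again until
    # the final evaluation.
    holdout = shuffled[:n_holdout]
    validation = shuffled[n_holdout:n_holdout + n_validation]
    training = shuffled[n_holdout + n_validation:]
    return training, validation, holdout


# Hypothetical labeled data: (record_id, label) pairs.
records = [(i, i % 2) for i in range(100)]
training, validation, holdout = split_train_validation_holdout(records)
print(len(training), len(validation), len(holdout))
```

A random shuffle suits records that are independent of one another. For time-ordered data, a chronological split, in which the most recent records form the holdout set, is usually the safer choice.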
Holdout data is important in supervised machine learning because it verifies that a model trained and validated on historical data will perform similarly on new data once it is in operation. Holdout data should be kept separate from the training and validation data sets and used only in the final assessment of the model’s performance. This independence matters: if holdout data influences training or tuning, the final evaluation will be optimistically biased and will not properly represent the behavior of the model with new data input going forward.
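The final assessment can be sketched as a comparison of the model's predictions on the holdout set against the known labels. The example below computes accuracy, sensitivity, and specificity for a binary classifier. The function name `holdout_metrics` and the label and prediction lists are hypothetical and hand-written for illustration; they are not output from a real model.

```python
def holdout_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Sensitivity: share of actual positives the model identified.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of actual negatives the model identified.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical holdout labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = holdout_metrics(y_true, y_pred)
print(metrics)
```

If these holdout figures fall well below the validation figures, that suggests the model was overfit to the training or validation data during tuning.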
C3.ai makes it easy to manage different data sets for the training, validation, and testing functions of the ML model development life cycle. The C3 AI® Suite is a complete, end-to-end platform for designing, developing, deploying, and operating enterprise AI applications at industrial scale. Both C3 AI ML Studio and C3 AI Ex Machina support organizing incoming data into normalized time series and then splitting that data into separate sets for training, validation, and testing, using low-code or no-code methods to adjust the parameters.