What Is Holdout Data? Model Testing Basics

Holdout Data

What is holdout data?

Holdout data refers to a portion of historical, labeled data that is held out of the data sets used for training and validating supervised machine learning models. It is also called test data. The first step in supervised learning is to train a variety of models on the training data and then evaluate their predictive performance on the validation data. After a model is validated and tuned with the validation data set, it is tested with the holdout data set for a final evaluation of its accuracy, sensitivity, specificity, and consistency in predicting the right outcomes.
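For a binary classifier, three of the final-evaluation metrics named above come straight from the confusion-matrix counts on the holdout set. The following is a minimal sketch in plain Python, assuming labels are encoded as 1 (positive) and 0 (negative); the function name `holdout_metrics` is illustrative and is not part of any C3 AI API.

```python
from typing import Sequence


def holdout_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Score binary predictions (1 = positive, 0 = negative) on the holdout set.

    Hypothetical helper for illustration; assumes 0/1 label encoding.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        # Fraction of all predictions that were correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity (recall): fraction of actual positives the model caught.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Specificity: fraction of actual negatives the model correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Usage: score a small set of holdout predictions.
print(holdout_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Reporting sensitivity and specificity alongside accuracy matters when classes are imbalanced, because a model can score high accuracy while missing most of the rare positive cases.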

Why is holdout data important?

Holdout data is important in supervised machine learning because it verifies that a model trained and validated on historical data will perform similarly on new data once it is in operation. Holdout data should be kept separate from the training and validation data sets and used only in the final assessment of the model's performance. This independence keeps the evaluation from being biased by data the model has already seen during training or tuning, so the holdout results properly represent how the model will behave on new data going forward.
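One simple way to keep the three sets independent is to shuffle record indices once and cut them into non-overlapping slices. This is a minimal sketch in plain Python, assuming a random split is acceptable for the data; the function name `three_way_split`, the 15% validation and 15% holdout fractions, and the fixed seed are illustrative choices, not values from the source, and this is not the C3 Agentic AI Platform's API.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def three_way_split(
    records: Sequence[T],
    val_frac: float = 0.15,
    holdout_frac: float = 0.15,
    seed: int = 42,
) -> tuple[list[T], list[T], list[T]]:
    """Split records into disjoint training, validation, and holdout sets.

    Hypothetical helper for illustration; fractions and seed are assumptions.
    """
    if not 0 < val_frac + holdout_frac < 1:
        raise ValueError("val_frac + holdout_frac must be between 0 and 1")
    indices = list(range(len(records)))
    # A fixed seed makes the split reproducible, so the holdout set
    # stays the same across runs and is never silently reshuffled.
    random.Random(seed).shuffle(indices)
    n_holdout = int(len(records) * holdout_frac)
    n_val = int(len(records) * val_frac)
    # Consecutive, non-overlapping slices of the shuffled indices
    # guarantee that no record lands in more than one set.
    holdout_idx = indices[:n_holdout]
    val_idx = indices[n_holdout:n_holdout + n_val]
    train_idx = indices[n_holdout + n_val:]
    return (
        [records[i] for i in train_idx],
        [records[i] for i in val_idx],
        [records[i] for i in holdout_idx],
    )


# Usage: split 100 records; the holdout list is set aside until the final evaluation.
train, val, holdout = three_way_split(list(range(100)))
print(len(train), len(val), len(holdout))
```

For time-ordered data, a chronological cut (oldest records for training, the most recent for holdout) is generally preferable to a random shuffle, because shuffling lets the model train on observations from after the period it is later tested on.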

How C3 AI enables organizations to use holdout data

C3 AI makes it easy to manage different data sets for the training, validation, and testing functions of the ML model development life cycle. The C3 Agentic AI Platform is a complete, end-to-end platform for designing, developing, deploying, and operating enterprise AI applications at industrial scale. The C3 Agentic AI Platform supports organizing incoming data into normalized time series and then splitting that data into separate sets for training, validation, and testing, using low-code or no-code methods to adjust the parameters.