Model Validation

What is Model Validation?

Model validation is the phase of machine learning that quantifies how well an ML or statistical model produces predictions or outputs with enough fidelity to be used reliably toward business objectives. In practice, it estimates the performance that can be expected from a given model on unseen data. For this reason, validation is usually performed on data that the model did not see during training. There are different approaches to designating data sets for model development, such as a percentage-based train/validate/test split (for example, 80/10/10), k-fold cross-validation, and time-based splits. There are also different metrics for measuring validation performance, including accuracy, precision, and recall for classification problems, and mean absolute error (MAE) and root mean square error (RMSE) for regression problems.
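As an illustration of the split and metrics named above, here is a minimal sketch in plain Python. The function names (`train_validate_test_split`, `classification_metrics`, `regression_metrics`) are hypothetical and chosen only for this example; a real project would more likely rely on an established library.

```python
import math
import random


def train_validate_test_split(rows, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle rows and cut them into train/validate/test partitions.

    Assumption: rows are independent, so a random shuffle is acceptable.
    For time-ordered data a time-based split would be used instead.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    validate = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, validate, test


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def regression_metrics(y_true, y_pred):
    """Mean absolute error (MAE) and root mean square error (RMSE)."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}
```

Because RMSE squares each error before averaging, it penalizes large errors more heavily than MAE does, which is one reason the choice of metric depends on the business problem.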


Why is Model Validation Important?

More often than not, the data scientist developing the model decides which combination of validation approach and metrics is right for a given business problem.

Data scientists use these validation approaches and metrics iteratively for feature selection, algorithm selection, and hyperparameter tuning. These are critical steps leading up to the development of production-grade AI-based applications.
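To make the iterative use of validation concrete, the sketch below tunes a single hyperparameter with k-fold cross-validation. It assumes a toy one-dimensional nearest-neighbor classifier with binary labels (0 or 1); the helper names (`k_fold_indices`, `knn_predict`, `select_n_neighbors`) are hypothetical and exist only for this example.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        val_idx = indices[fold::n_folds]  # every n_folds-th sample
        held_out = set(val_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, val_idx


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the nearest training points (labels are 0 or 1)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:n_neighbors]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def cross_validated_accuracy(xs, ys, n_neighbors, n_folds=5):
    """Average validation accuracy over all folds for one hyperparameter value."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), n_folds):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], n_neighbors) == ys[i]
            for i in val_idx
        )
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)


def select_n_neighbors(xs, ys, candidates, n_folds=5):
    """Return the candidate with the highest cross-validated accuracy."""
    scores = {
        c: cross_validated_accuracy(xs, ys, c, n_folds) for c in candidates
    }
    best = max(scores, key=scores.get)
    return best, scores
```

The same loop structure applies to feature or algorithm selection: each candidate is scored on held-out folds, and the comparison is made on those scores rather than on training performance.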


How is Model Validation Enabled?

Out-of-the-box tooling (programmatic and visual) is available to evaluate and analyze model performance after training and tuning. Model results are iteratively compared against the initial validation results and tested for satisfactory performance. The best-performing models are retrained on the entire dataset and promoted for deployment. [Application Development Methodology]
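The compare, check, and retrain workflow described above can be sketched generically as follows. This is not the tooling referred to in the text; it is a plain-Python illustration under stated assumptions: two toy regression models, RMSE as the validation metric, and a hypothetical acceptance threshold `max_rmse`. All function names are invented for the example.

```python
import math


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def rmse(model, xs, ys):
    """Root mean square error of a fitted model on the given data."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def select_and_retrain(trainers, train, validation, max_rmse):
    """Pick the best candidate on validation data, then refit it on all data.

    trainers maps a name to a fit function. Returns (name, model, val_rmse),
    or None when no candidate meets the hypothetical max_rmse threshold.
    """
    train_x, train_y = train
    val_x, val_y = validation
    scores = {
        name: rmse(fit(train_x, train_y), val_x, val_y)
        for name, fit in trainers.items()
    }
    best = min(scores, key=scores.get)
    if scores[best] > max_rmse:
        return None  # nothing is good enough to promote
    # Retrain the winning candidate on the entire dataset before promotion.
    final_model = trainers[best](train_x + val_x, train_y + val_y)
    return best, final_model, scores[best]
```

One consequence of retraining on the full dataset is that the validation score describes the earlier fit, not the promoted model, so it is best read as an estimate rather than a measurement of the final model's performance.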