Regularization is a technique for adjusting how closely a model is trained to fit historical data, helping to balance overfitting against underfitting during training. Both overfitting and underfitting ultimately lead to poor predictions on new data. A supervised model that is overfit typically performs well on the data it was trained on but poorly on data it has not seen before. A supervised model that is underfit typically performs poorly on both its training data and data it has not seen before.
Overfitting occurs when a machine learning model learns the noise in the training data rather than the underlying patterns or trends. Such a model is considered to have “high variance” and “low bias.” Underfitting occurs when a model fails to capture genuine variation in the data, that is, variation that is not caused by noise. Such a model is considered to have “high bias” and “low variance.”
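To make the contrast concrete, the following sketch uses k-nearest-neighbors regression, a technique chosen here only for illustration (the source does not mention it), on hypothetical noisy data. With k = 1 the model memorizes every training point, noise included, so its training error is exactly zero: it overfits (high variance). With k equal to the training-set size it predicts the same average everywhere and ignores the input entirely: it underfits (high bias). The function names, the data-generating process, and the chosen values of k are all assumptions made for the example.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (input x, target y)


def knn_predict(train: List[Point], x: float, k: int) -> float:
    """Predict y at x as the mean target of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train: List[Point], data: List[Point], k: int) -> float:
    """Mean squared error of a k-NN model built from `train`, evaluated on `data`."""
    errors = [(y - knn_predict(train, x, k)) ** 2 for x, y in data]
    return sum(errors) / len(errors)


def make_data(n: int, rng: random.Random) -> List[Point]:
    """Hypothetical data: a linear trend y = 2x plus Gaussian noise (std 2)."""
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        points.append((x, 2.0 * x + rng.gauss(0.0, 2.0)))
    return points


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(30, rng)
test = make_data(30, rng)  # unseen data drawn from the same process

# k = 1: memorizes the training set; k = len(train): always predicts the mean.
for k in (1, 5, len(train)):
    print(f"k={k:>2}: train MSE={mse(train, train, k):6.2f}, "
          f"test MSE={mse(train, test, k):6.2f}")
```

Comparing the printed errors shows the pattern described above: the k = 1 model's test error is well above its zero training error, while the k = 30 model does poorly on both. The exact figures depend on the random draw.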
One way to apply regularization is to add a penalty term to the loss function that grows with model complexity, for example with the magnitude of the model's weights. A regularization parameter controls how heavily this penalty is weighted, so it determines how closely the model is trained to fit historical data. More regularization helps prevent overfitting, while less regularization helps prevent underfitting. Tuning the regularization parameter, for example by comparing performance on held-out validation data, helps find a good tradeoff between bias and variance.
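As a concrete illustration of such a penalty, the sketch below implements ridge regression, one common form of regularization (the source does not name a specific method), in plain Python. It assumes a linear model with no intercept and the loss "sum of squared errors + lam × sum of squared weights", where lam plays the role of the regularization parameter. Minimizing this loss has the closed-form solution (XᵀX + lam·I) w = Xᵀy, which the code solves by Gaussian elimination. The function names and the toy data are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X: Matrix, y: Vector, lam: float) -> Vector:
    """Minimize sum((y_i - w.x_i)^2) + lam * sum(w_j^2) in closed form."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    for i in range(d):
        xtx[i][i] += lam  # the L2 penalty adds lam to each diagonal entry
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def mse(X: Matrix, y: Vector, w: Vector) -> float:
    """Mean squared error of the linear predictions X w on (X, y)."""
    errors = [yi - sum(wj * xj for wj, xj in zip(w, row)) for row, yi in zip(X, y)]
    return sum(e * e for e in errors) / len(y)


# Hypothetical toy data: 6 samples, 3 features.
X = [[1.0, 2.0, 0.5], [2.0, 0.1, 1.0], [3.0, 1.5, 2.0],
     [4.0, 3.0, 0.2], [5.0, 2.2, 1.1], [6.0, 0.7, 3.0]]
y = [3.1, 2.9, 6.2, 7.8, 9.1, 9.9]

for lam in (0.0, 1.0, 10.0, 100.0):
    w = ridge_fit(X, y, lam)
    norm = sum(wj * wj for wj in w) ** 0.5
    print(f"lam={lam:>5}: ||w||={norm:.3f}, train MSE={mse(X, y, w):.3f}")
```

Because the penalty pulls the weights toward zero, the weight norm shrinks as lam grows while the training error rises. Whether error on unseen data improves depends on the data, which is why the parameter is usually tuned on a validation set.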
Regularization is just one of many advanced machine learning techniques that can be easily employed using the C3 AI Suite and C3 AI Applications, enabling data scientists, developers, and analysts to create robust machine learning models.