Feature importance is the most common tool for explaining a machine learning model. It is so popular that many data scientists end up believing that feature importance equals feature goodness.
It is not so.
When a feature is important, it simply means that the model found it useful in the training set. However, this doesn’t say anything about how well the feature generalizes to new data!
To account for that, we need to make a distinction between two concepts:
- Prediction Contribution: the weight that a variable has in the predictions made by the model. This is determined by the patterns that the model found in the training set. This is equivalent to feature importance.
- Error Contribution: the weight that a variable has in the errors made by the model on a holdout dataset. This is a better proxy for the feature’s performance on new data.
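To make the distinction concrete, here is a minimal sketch of how the two quantities could be computed. It rests on several assumptions that are not part of the definitions above: each feature has an additive per-row contribution in log-odds space (SHAP values of a classifier would be one such decomposition), the error is measured with log loss, and a feature's effect on the error is estimated by removing its contribution from the raw score. The function names and the toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def individual_log_loss(y_true, log_odds, eps=1e-15):
    """Per-row log loss computed from raw scores in log-odds space."""
    proba = 1.0 / (1.0 + np.exp(-log_odds))
    proba = np.clip(proba, eps, 1 - eps)
    return -(y_true * np.log(proba) + (1 - y_true) * np.log(1 - proba))

def prediction_contribution(contribs):
    """Mean absolute contribution of each feature (one value per column)."""
    return np.abs(contribs).mean(axis=0)

def error_contribution(contribs, y_true, base_value=0.0):
    """Mean change in log loss attributable to each feature.

    Negative: the feature lowers the error. Positive: it raises the error.
    Meant to be computed on holdout rows, not on the training set.
    """
    total = base_value + contribs.sum(axis=1)
    loss_with = individual_log_loss(y_true, total)
    result = np.empty(contribs.shape[1])
    for j in range(contribs.shape[1]):
        # Assumption: "removing" a feature means subtracting its contribution.
        loss_without = individual_log_loss(y_true, total - contribs[:, j])
        result[j] = (loss_with - loss_without).mean()
    return result

# Hypothetical, hand-made contributions for 4 holdout rows and 2 features.
# Feature 0 pushes every row toward its true label, feature 1 pushes away.
y = np.array([1, 0, 1, 0])
contribs = np.array([
    [ 2.0, -1.0],
    [-2.0,  1.0],
    [ 2.0, -1.0],
    [-2.0,  1.0],
])

pred_contrib = prediction_contribution(contribs)
err_contrib = error_contribution(contribs, y)
print(pred_contrib, err_contrib)
```

In this toy setup both features have a nonzero Prediction Contribution, but only the first one has a negative Error Contribution. The second feature carries weight in the predictions and still makes the errors larger, which is exactly the situation that feature importance alone cannot reveal.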
In this article, I will explain the logic behind the calculation of these two quantities for a classification model. I will also show an example in which using Error Contribution for feature selection leads to a far better result than using Prediction Contribution.
If you are more interested in regression than in classification, you can read my previous article “Your Features Are Important? It Doesn’t Mean They Are Good”.
- Starting from a toy example
- Which “error” should we use for classification models?
- How should we manage SHAP values in classification models?
- Computing “Prediction Contribution”
- Computing “Error Contribution”
- A real dataset example
- Proving it works: Recursive Feature Elimination with “Error Contribution”
- Conclusions