# 5 Explainable Machine Learning Models You Should Understand

## Why use complex model when simple do trick?

Photo by Nick Morrison on Unsplash

# Introduction

Machine Learning is inescapable in our day-to-day lives: product recommendations on Amazon, targeted advertising and suggestions of what to watch.

If something goes wrong with these, it probably won't ruin your life. Maybe you won't get the perfect selfie, or maybe companies will have to spend more on advertising.

What about facial recognition in law enforcement?

In high-risk applications, we can't go in blind. We need to understand and explain our model before it goes anywhere near a production system.

Explainable Machine Learning is needed when we are making decisions about people that can have a negative impact on their lives.

Explainable models allow for a better understanding of fairness, privacy and causality, and they build more trust in the model.


# Types of Explainability

- Post-hoc explainability: the model is explained after it has been trained, often by analysing the predictions it makes. These methods allow us to explain complex models, but they can be flawed under certain conditions and need an extra layer of complexity to generate explanations. The SHAP and LIME Python packages are examples.
- Inherent explainability: some models can be explained out of the box. These are typically simpler and are often thought to have less predictive power, although some researchers argue that this isn't always the case.

This article covers inherently explainable models.

# Inherently Explainable Models

Lipton (2016) uses 3 criteria to define model explainability:

- Can a human walk through the model's steps in a reasonable amount of time?
- Can the model's features, parameters and weights be broken down?
- Is it possible to understand how the model will react to unseen data?

The models that meet these criteria are Decision Trees and Logistic Regression. Both meet all 3 criteria as long as the model doesn't use too many features.

Using too many features, or highly engineered features, reduces explainability, because we also need to be able to explain our data.

There are several lesser-known explainable models that are great to have in your toolkit. Using them in your next project can help you balance explainability with accuracy.

## Generalised Linear Models (GLMs)

How does it work?

A GLM is a generalisation of Linear Regression. Linear and Logistic Regression are both GLMs, and the framework adds components that make linear models more flexible.

A GLM has 3 components:

- Linear predictor: the regression equation is a linear combination of variables.

Linear Regression Equation. Image by Author.

- Link function: links our linear combination of variables to the mean of a probability distribution. In linear regression, this is the identity link function.
- Probability distribution: generates the y variable. In linear regression, this is the normal distribution.
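Written out, the three components look like this, where μ is the expected value of y. The notation (η for the linear predictor, g for the link function) is my own shorthand rather than any particular library's.

```latex
\begin{aligned}
\eta &= \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p && \text{(linear predictor)} \\
g(\mu) &= \eta, \quad \mu = \mathbb{E}[y] && \text{(link function)} \\
y &\sim \text{a distribution with mean } \mu && \text{(probability distribution)}
\end{aligned}

\begin{aligned}
\text{Linear:} \quad & g(\mu) = \mu, && y \sim \mathcal{N}(\mu, \sigma^2) \\
\text{Logistic:} \quad & g(\mu) = \log\frac{\mu}{1 - \mu}, && y \sim \text{Bernoulli}(\mu) \\
\text{Poisson:} \quad & g(\mu) = \log \mu, && y \sim \text{Poisson}(\mu)
\end{aligned}
```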

We can get different models by changing these components. For example, Logistic Regression uses the logit link function with a binomial distribution.

A lesser-known variant is Poisson Regression, which uses a log link and a Poisson distribution: the logarithm of the expected value of y is assumed to be a linear combination of the variables.
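To make the log link concrete, here is a minimal sketch that fits a Poisson GLM in plain NumPy using iteratively reweighted least squares (IRLS), a standard way of fitting GLMs. The data is simulated and `fit_poisson_glm` is a hypothetical helper for illustration; in practice you would reach for a library.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=25, tol=1e-8):
    """Hypothetical helper: fit a Poisson GLM with a log link using IRLS."""
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # sensible start: constant expected count
    for _ in range(n_iter):
        eta = X @ beta            # linear predictor: a linear combination of variables
        mu = np.exp(eta)          # inverse of the log link gives the expected count
        z = eta + (y - mu) / mu   # working response
        W = mu                    # Poisson variance equals the mean
        XtW = X.T * W             # scale each observation by its weight
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Simulated counts whose log-mean is 0.5 + 0.3 * x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=5000)
y = rng.poisson(np.exp(0.5 + 0.3 * x))

beta = fit_poisson_glm(x, y)
print(beta)  # should land close to [0.5, 0.3]
```

Because of the log link, exp(β1) is the multiplicative change in the expected count for a one-unit increase in x: exp(0.3) ≈ 1.35.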

Check out my article to learn more about Logistic Regression.

Why is it understandable?

- It's just simple math.

The graph below shows a simple, single-variable linear regression. We know how the line is calculated, and we can use it to calculate our target.

The maths still holds as we scale this up to as many variables as we want: we add our terms together to get a result.

Logistic and Poisson Regression are more complex, but the core concept still holds: we are summing a linear combination of variables.

Simple, single variable linear regression. Image by Author.

- The coefficients tell us something.

The coefficients are given in terms of the target variable. This allows us to make statements like:

Increasing square footage by 50m2 will increase our house price.

In Logistic Regression, the coefficients are expressed in log odds, which we can convert into odds ratios or probabilities.

The explainability of linear models is increased by the fact that coefficients can be converted into human-readable statements.
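Here is a small sketch of turning a fitted coefficient into that kind of statement. The house-price data is simulated with a hypothetical £20 per m2, so the numbers are illustrative only.

```python
import numpy as np

# Hypothetical house-price data: £20 per square metre, plus noise.
rng = np.random.default_rng(42)
sqm = rng.uniform(50, 250, size=200)
price = 50_000 + 20 * sqm + rng.normal(0, 200, size=200)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones_like(sqm), sqm])
intercept, per_sqm = np.linalg.lstsq(X, price, rcond=None)[0]

# The coefficient is in the units of the target, so it reads as a plain statement.
print(f"Increasing square footage by 50m2 increases the price by about £{50 * per_sqm:,.0f}")
```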

- Interactions have to be programmed.

Interactions between variables can increase complexity and make a model much harder to understand. For example:

Increasing square footage by 50m2 will increase our house price by £1000 until we get to 200m2, where our house price increases by £1000 per 50m2

A Linear model can only capture this type of complex relationship if we calculate it and program it in ourselves during feature engineering.
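One common way to program a relationship like that in is a hinge (piecewise linear) feature, max(0, m2 - 200), which lets the slope change at 200m2. This sketch uses simulated, hypothetical data where the price rises by £20 per m2 up to 200m2 and by £40 per m2 after that.

```python
import numpy as np

# Hypothetical data: £20 per m2 up to 200m2, then £40 per m2 beyond that.
rng = np.random.default_rng(0)
sqm = rng.uniform(50, 300, size=500)
hinge = np.maximum(0, sqm - 200)  # engineered feature: zero below 200m2
price = 50_000 + 20 * sqm + 20 * hinge + rng.normal(0, 200, size=500)

# The linear model itself is unchanged; only the feature set has grown.
X = np.column_stack([np.ones_like(sqm), sqm, hinge])
intercept, slope, extra_slope = np.linalg.lstsq(X, price, rcond=None)[0]

print(f"Below 200m2: about £{50 * slope:,.0f} per 50m2")
print(f"Above 200m2: about £{50 * (slope + extra_slope):,.0f} per 50m2")
```

Each coefficient still reads as a plain statement, but only because we chose the 200m2 breakpoint ourselves; the model would not have found it on its own.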

Implementation.
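A minimal sketch using scikit-learn's `LogisticRegression` on its bundled breast-cancer dataset (my choice of example). The features are standardised, so each odds ratio describes the effect of a one-standard-deviation increase rather than a one-unit increase.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target

# Standardising puts every coefficient on the same scale (change per standard deviation).
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

coefs = model.named_steps["logisticregression"].coef_[0]  # log-odds coefficients
odds_ratios = np.exp(coefs)                                # convert to odds ratios

# Show the five features with the largest effect on the odds.
for i in np.argsort(np.abs(coefs))[::-1][:5]:
    print(f"{data.feature_names[i]:<25} odds ratio per 1 SD: {odds_ratios[i]:.2f}")
```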

## Decision Trees

Decision Tree example for classifying fruit. Image by Author.

How does it work?

Most people will have seen a Decision Tree before. The algorithm repeatedly searches for the feature and threshold that best split our data, building up the tree one split at a time.

To understand how it works, take a look here.

Why is it understandable?

Decision Trees allow us to extract the whole tree and follow along with why the model made a prediction.

Humans can't follow large trees, because the number of leaves grows rapidly as the tree gets deeper. We can still write basic code to highlight the path our data took to reach its prediction, and to test how the model will react to unseen data.
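As a sketch of that basic code, the snippet below uses scikit-learn's `decision_path` and the fitted `tree_` arrays to list the rules one sample passes through. The iris dataset, `max_depth=3` and the `explain_prediction` helper are my own illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

def explain_prediction(clf, x, feature_names):
    """Hypothetical helper: return the split rules a single sample passes through."""
    tree = clf.tree_
    node_ids = clf.decision_path(x.reshape(1, -1)).indices  # nodes visited, root first
    rules = []
    for node in node_ids:
        if tree.children_left[node] == tree.children_right[node]:  # leaf: no split
            continue
        feature = tree.feature[node]
        threshold = tree.threshold[node]
        op = "<=" if x[feature] <= threshold else ">"
        rules.append(f"{feature_names[feature]} {op} {threshold:.2f}")
    return rules

sample = X[100]
print(explain_prediction(clf, sample, iris.feature_names))
print("Predicted class:", iris.target_names[clf.predict(sample.reshape(1, -1))[0]])
```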

Decision Trees are one of the most explainable models, because they involve very little math and use a decision-making concept that people already encounter elsewhere in everyday life.

Implementation.

I would always use scikit-learn for this model.

There are many different ways to plot the tree.
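The simplest option is a text rendering with scikit-learn's `export_text`; `plot_tree` (which needs matplotlib) and `export_graphviz` are graphical alternatives. The iris dataset and `max_depth=2` are only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# A text rendering of the whole tree: every rule the model uses is visible.
report = export_text(clf, feature_names=list(iris.feature_names))
print(report)
```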

## Generalised Additive Models (GAMs)

GAMs vs GLMs. Image by Author.

How does it work?

Generalised Additive Models address one of the main limitations of GLMs by allowing us to model non-linear relationships in our data.

Instead of a single weight per variable, GAMs estimate a smooth function for each variable, usually built from splines. The splines mean that each variable can have a non-linear relationship with the target value.
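Side by side, the only change from a GLM is that each weighted term becomes a smooth function of its variable, with μ the expected value of y. The spline form on the last line is the usual choice; the basis functions b_jk are generic notation, not any particular library's.

```latex
\begin{aligned}
\text{GLM:} \quad & g(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p \\
\text{GAM:} \quad & g(\mu) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_p(x_p) \\
& f_j(x_j) = \sum_{k=1}^{K} \beta_{jk}\, b_{jk}(x_j)
\end{aligned}
```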

You can read more about GAMs here.

Why is it understandable?

GAMs are less explainable than Logistic or Linear Regression, because the model is more complex. They still maintain a level of explainability, which makes them a great trade-off.

- It's still a combination of variables.

Instead of the target variable being the sum of all the variables with some weight, it is now the sum of a function modelling each variable. Most GAM packages use partial dependence plots to show this function for each feature.

As with GLMs, interactions have to be programmed manually. Because we know the function for each feature, we know how the model will behave on unseen data.

Implementation.

R seems to have the best package for GAMs. In Python, the two best options are Statsmodels and PyGAM.

Microsoft Research has open-sourced their Explainable Boosting Machine (EBM), which they describe as a GA2M. It uses GAMs with automatically detected pairwise interaction terms to maintain explainability, increase performance and reduce the need for Data Scientists to get too deep into the model.

## Monotonic Gradient Boosting

Monotonic vs Non-Monotonic Relationships. Image by Author.

How does it work?

Gradient boosted models are often the best models for tabular data, but due to the nature of boosting they are not interpretable. The models can use hundreds of trees, and they tend to work out their own interaction terms. It's common to use SHAP or LIME to increase interpretability.

In a monotonic relationship, as the feature increases, the target always moves in the same direction: it only ever goes up, or only ever goes down, although not necessarily in a straight line. For example:

- Your risk of a heart attack increases.
- Your credit score and likelihood of getting a loan go down.
- The number of bike rentals decreases as the amount of rain increases.

Boosted tree models don't guarantee monotonic relationships, because they include interactions and can model non-linear relationships.

However, boosting libraries provide a simple hyperparameter that can force a variable to have a positive or negative monotonic relationship with the target.

Why is it understandable?

We can use statements like those above to explain our model. Because the direction of each constrained relationship is fixed, the model meets the criteria. By building some of our real-world knowledge into the model, we also make it more understandable for business people.

Implementation.

- XGBoost

In XGBoost, monotone_constraints is a string with one number per feature in our dataset: 1 for a positive monotonic relationship, -1 for a negative monotonic relationship and 0 for no constraint.

We import xgboost as xgb, add monotone_constraints to the parameters and train the model with xgb.train, using num_boost_round=1000.

- LightGBM

LightGBM requires us to pass the constraints as a list rather than a string. It also has an extra parameter, monotone_constraints_method, which lets us choose how strictly the model tries to stick to the constraints.

The documentation describes the three options:

- basic: the most basic constraints method. It doesn't slow the library, but it over-constrains the predictions.
- intermediate: may slow the library slightly. This method is less constraining than the basic method and should improve the results.
- advanced: may slow the library. This method is even less constraining than the intermediate method and should improve the results further.

We import lightgbm as lgb, set "monotone_constraints": [-1, 0, 1] in the parameters and train the model with lgb.train for 1000 rounds.

- CatBoost

CatBoost is similar to the others but offers more flexibility: we can pass the constraints as an array, use slicing, and name features explicitly.

You can check out the CatBoost docs here.

## TabNet

How does it work?

TabNet was published by researchers in 2019. Neural Network approaches have historically struggled with tabular data, but TabNet was able to beat the leading tree-based models. It is more explainable than boosted tree models, and it can be used without feature preprocessing.

TabNet Model Architecture. Image by Author. Inspired by https://arxiv.org/pdf/1908.07442.pdf.

The model is covered in more detail in my article on TabNet.

Why is it understandable?

TabNet uses a sequential attention mechanism to pick the most important features at each decision step. This produces a mask which covers the least important features, and the weights of the mask allow us to understand which features the model is using to make predictions.

Masks are calculated at the row level, so we can also explore which features were selected for a single prediction. The model has a number of masks, one for each decision step.
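To see how several masks become one explanation, here is a NumPy sketch of the aggregation described in the TabNet paper, as I understand it: each step's mask is weighted by how much that step contributed to the output, the weighted masks are summed, and the result is normalised per row. The masks and step weights are made-up numbers, and `aggregate_masks` is a hypothetical helper, not part of any TabNet package.

```python
import numpy as np

def aggregate_masks(masks, step_weights):
    """Hypothetical helper: combine per-step masks into one importance per feature, per row.

    masks:        (n_steps, n_rows, n_features); each step's row sums to 1
    step_weights: (n_steps, n_rows); how much each step contributed to the output
    """
    weighted = (step_weights[:, :, None] * masks).sum(axis=0)  # (n_rows, n_features)
    return weighted / weighted.sum(axis=1, keepdims=True)      # each row sums to 1

# Hypothetical masks for 2 decision steps, 1 row and 3 features.
masks = np.array([
    [[0.7, 0.3, 0.0]],  # step 1 focuses mostly on feature 0
    [[0.0, 0.5, 0.5]],  # step 2 splits attention between features 1 and 2
])
step_weights = np.array([[1.0], [3.0]])  # step 2 mattered three times as much

print(aggregate_masks(masks, step_weights))  # ≈ [[0.175, 0.45, 0.375]]
```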

Implementation.

Dreamquark provides the best way to use TabNet, with a scikit-learn style wrapper. The notebooks provided by Dreamquark show how to implement TabNet while also working to verify the claims of the original authors.

The package supports both classification (TabNetClassifier) and regression (TabNetRegressor).

# Comparing the Models

To apply the 3 criteria to each model, we need to go back to Lipton. As a reminder, the criteria are:

- Can a human walk through the model's steps in a reasonable amount of time?
- Can the model's features, parameters and weights be broken down?
- Is it possible to understand how the model will react to unseen data?

We also want to look at Local Explainability: to what extent can a single prediction be explained, in terms of which features the model used and how much it relied on each feature to make its decision?

I scored each model Low, Medium or High for each criterion. Each score should be read relative to a linear regression.

Any feature engineering can throw these scores off. Creating complex interaction terms, applying mathematical transformations or using features derived from a neural network will decrease explainability, as will simply adding more features.

Score per explainability criteria for each model. Image by Author.

# Conclusions

Discussions around model explainability are increasing, and having models that can give the reasoning behind their decisions is more important than ever.

Let me know how it turns out if you give one of these models a try.
