Trust in LIME: Yes, No, Maybe So? 

September 27, 2018

In this Domino Data Science Field Note, we briefly discuss LIME (Local Interpretable Model-Agnostic Explanations), an algorithm and framework for generating explanations that may help data scientists, machine learning researchers, and engineers decide whether to trust the predictions of any classifier, including those of seemingly “black box” models.

Do you trust your model?

Trust is complicated.

Merriam-Webster’s definition of trust includes multiple components and conveys the nuances associated with trust. Data scientists, machine learning researchers, and engineers decide whether to trust a model, or the predictions of its classifiers, when building and deploying models. They also make tradeoffs regarding interpretability. Discussions around levels of rigor in machine learning interpretability, as well as the implications of its absence, have been covered previously on the Domino Data Science Blog. In this blog post, we briefly discuss LIME (Local Interpretable Model-Agnostic Explanations), an algorithm and framework for generating explanations that may help people decide whether to trust the predictions of any classifier, including those of seemingly “black box” models.

LIME (Local Interpretable Model-Agnostic Explanations)

In 2016, Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin released the paper “Why Should I Trust You?: Explaining the Predictions of Any Classifier”, which they presented at KDD 2016 and also discussed in a blog post. In the paper, Ribeiro, Singh, and Guestrin propose LIME as a means of “providing explanations for individual predictions as a solution to the ‘trust the prediction problem’, and selecting multiple such predictions (and explanations) as a solution to ‘trusting the model’ problem.” They also define LIME as “an algorithm that can explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model”. In a later Data Skeptic interview, Ribeiro also indicated that

“we set [it] up as a framework for generating explanations. We did some particular explanations, in particular, linear models, but LIME is basically an equation that tries to balance interpretability with faithfulness. So I’ll say it’s a framework you can derive many different kinds of explanations with the LIME framework.”
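That balance between interpretability and faithfulness is formalized in the paper as an optimization over a class \(G\) of interpretable models: the explanation for an instance \(x\) minimizes a locality-weighted fidelity loss plus a complexity penalty,

```latex
\xi(x) = \operatorname*{argmin}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)
```

where \(f\) is the black-box model being explained, \(\pi_x\) is a proximity kernel that weights samples by closeness to \(x\), and \(\Omega(g)\) penalizes the complexity of the candidate explanation \(g\) (e.g., the number of non-zero weights in a linear model).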

Why is Trust a Problem In Machine Learning?

Ribeiro, Singh, and Guestrin “argue that explaining predictions is an important aspect in getting humans to trust and use machine learning effectively, if the explanations are faithful and intelligible”. They also argue that “machine learning practitioners often have to select a model from a number of alternatives, requiring them to assess the relative trust between two or more models” as well as

“Every machine learning application also requires a certain measure of overall trust in the model. Development and evaluation of a classification model often consists of collecting annotated data, of which a held-out subset is used for automated evaluation. Although this is a useful pipeline for many applications, evaluation on validation data may not correspond to performance “in the wild”, as practitioners often overestimate the accuracy of their models [20], and thus trust cannot rely solely on it. Looking at examples offers an alternative method to assess truth in the model, especially if the examples are explained.”

Overfitting, data leakage, and dataset shift (when training data differs from test data) are a few of the ways that “a model or its evaluation can go wrong”. These types of challenges lead people to assess whether to trust a model during model development and deployment.
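As a toy illustration of how evaluation on held-out data can overestimate performance “in the wild” under dataset shift, consider the following sketch. The threshold classifier and the feature distributions are invented for illustration only; they are not from the paper.

```python
import random

random.seed(42)

# A toy classifier: predict "positive" whenever the feature exceeds 0.5.
# (Purely illustrative; it stands in for any trained model.)
def classify(x):
    return x > 0.5

def accuracy(samples):
    return sum(classify(x) == label for x, label in samples) / len(samples)

# Held-out validation data drawn from the same distribution as training:
# positives cluster near 0.8, negatives near 0.2.
validation = [(random.gauss(0.8, 0.05), True) for _ in range(500)] + \
             [(random.gauss(0.2, 0.05), False) for _ in range(500)]

# "In the wild", the positive class has drifted toward the threshold:
# positives now cluster near 0.45, so many fall below the cutoff.
wild = [(random.gauss(0.45, 0.05), True) for _ in range(500)] + \
       [(random.gauss(0.2, 0.05), False) for _ in range(500)]

print(round(accuracy(validation), 2))  # near-perfect on held-out data
print(round(accuracy(wild), 2))        # substantially lower after the shift
```

The held-out accuracy looks excellent, yet the same classifier degrades badly once the data drifts, which is exactly why the authors argue that trust cannot rest on validation metrics alone.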

How Does LIME Address Trust?

To learn the underlying behavior of a model, LIME enables humans to perturb the input in ways that make sense to a human, observe how the predictions change, and then evaluate whether or not to trust the model for specific tasks. Ribeiro, Singh, and Guestrin have used image classification as an example in many papers and talks. In the tree frog example, they evaluate whether the classifier can “predict how likely it is for the image to contain a tree frog”. They section the tree frog image into different interpretable components and then perturb, “mask”, or hide various interpretable components. Then

“for each perturbed instance, we get the probability that a tree frog is in the image according to the model. We then learn a simple (linear) model on this data set, which is locally weighted—that is, we care more about making mistakes in perturbed instances that are more similar to the original image. In the end, we present the super pixels with highest positive weights as an explanation, graying out everything else.”
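The procedure quoted above can be sketched end to end in a few lines of NumPy. Binary masks over interpretable components stand in for superpixel perturbations, and the black-box model here is a made-up linear scorer (not an actual image classifier), so the sketch only illustrates the perturb → predict → weight → fit-linear-model loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box_predict(masks):
    # Stand-in for the real classifier: the "tree frog" probability here
    # depends mostly on components 0 and 3 being visible. This dependence
    # is an assumption made purely so the example has a known answer.
    return 0.1 + 0.5 * masks[:, 0] + 0.35 * masks[:, 3]

n_components = 6   # interpretable pieces (e.g., superpixels)
n_samples = 500

# 1. Perturb: randomly keep (1) or gray out (0) each component.
masks = rng.integers(0, 2, size=(n_samples, n_components))

# 2. Query the model on every perturbed instance.
probs = black_box_predict(masks)

# 3. Weight samples by similarity to the original (all-components-visible)
#    instance, so mistakes near the original matter more.
distance = 1.0 - masks.mean(axis=1)        # fraction of components hidden
weights = np.exp(-(distance ** 2) / 0.25)  # exponential proximity kernel

# 4. Fit a weighted linear model via the normal equations (with intercept).
X = np.hstack([np.ones((n_samples, 1)), masks])
W = np.diag(weights)
coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ probs)

# Components with the largest positive weights form the explanation.
importance = coef[1:]
top = np.argsort(importance)[::-1][:2]
print(top)
```

Because the toy model is exactly linear in the masks, the fitted coefficients recover the two influential components; with a real classifier the locally weighted fit is only an approximation near the original image.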

The human is then able to observe the model’s predictions, compare them with their own observations of the image, and evaluate whether to trust the model. Ribeiro notes in the Data Skeptic interview that “we as humans have the intuition… we know what a good explanation looks like in a lot of cases and if we see if the model is acting in sensible ways we tend to trust it more.”

Resources to Consider

This blog post provided a distilled overview of LIME by excerpting highlights from LIME research. If you are interested in learning more about LIME and in further evaluating it as a viable tool to help you decide whether to trust your models, consider diving into the resources that were reviewed and cited in this blog post:

Code

Papers

Videos and Interviews

Blog Posts

Domino Data Science Field Notes provide highlights of data science research, trends, techniques, and more, that help data scientists and data science leaders accelerate their work or careers. If you are interested in your data science work being covered in this blog series, please send us an email at writeforus(at)dominodatalab(dot)com.
