
    Model Interpretability with TCAV (Testing with Concept Activation Vectors)

    on January 20, 2019

    This Domino Data Science Field Note provides distilled insights and excerpts from Been Kim’s MLConf 2018 talk and research on Testing with Concept Activation Vectors (TCAV), an interpretability method that allows researchers to understand and quantitatively measure the high-level concepts their neural network models use for prediction, “even if the concept was not part of the training”. For additional insights not covered in this blog post, please refer to the MLConf 2018 video, the ICML 2018 video, and the paper.


    What if there were a way to quantitatively measure whether your machine learning (ML) model reflects specific domain expertise or potential bias? What if it relied on post-training explanations, and worked on a global level instead of a local level? Would industry people be interested? These are the kinds of questions Been Kim, Senior Research Scientist at Google Brain, posed in the MLConf 2018 talk, “Interpretability Beyond Feature Attribution: Testing with Concept Activation Vectors (TCAV)”. The talk is based on a paper Kim co-authored, and the code is available. This Domino Data Science Field Note provides some distilled insights about TCAV, an interpretability method that allows researchers to understand and quantitatively measure the high-level concepts their neural network models use for prediction, “even if the concept was not part of the training” (Kim Slide 33). TCAV “uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result” (Kim et al. 2018).

    Introducing Concept Activation Vectors (CAVs)

    So, what are Concept Activation Vectors (CAVs)? CAV is the name the researchers created for a vector that represents a high-level concept. The “activation” part refers to the values (the network’s activations) whose direction the vector follows.

    Within the research, Kim et al. indicate that

    “CAV for a concept is simply a vector in the direction of the values (e.g., activations) of that concept’s set of examples… we derive CAVs by training a linear classifier between a concept’s examples and random counter examples and then taking the vector orthogonal to the decision boundary.”

    In the MLConf talk, Kim relays that “Well, this vector simply means that a direction is pointing from away from random, to more like the concept. And this idea of having this such a vector is not new”, as a gender direction was discussed in a word2vec paper. Kim also indicates that CAVs are “just another way to retrieve that vector”.
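
    The derivation quoted above from the paper (train a linear classifier between concept examples and random counterexamples, then take the vector orthogonal to the decision boundary) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the activations are synthetic stand-ins for a real network layer, the 8-dimensional layer size is arbitrary, and `train_cav` is a hypothetical helper that fits a plain logistic regression with gradient descent.

```python
import numpy as np

def train_cav(concept_acts, random_acts, lr=0.1, steps=500):
    """Fit a linear (logistic-regression) classifier separating concept
    activations (label 1) from random activations (label 0) and return the
    unit-length weight vector, which is orthogonal to the decision boundary."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept)
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient step on the log loss
        b -= lr * np.mean(p - y)
    return w / np.linalg.norm(w)

# Synthetic stand-ins for layer activations (assumption: an 8-dimensional layer
# in which the concept shifts activations along the first coordinate).
rng = np.random.default_rng(0)
dim = 8
planted_direction = np.zeros(dim)
planted_direction[0] = 1.0
concept_acts = rng.normal(size=(200, dim)) + 3.0 * planted_direction
random_acts = rng.normal(size=(200, dim))

cav = train_cav(concept_acts, random_acts)
print("cosine with planted direction:", float(cav @ planted_direction))
```

    Because the weight vector of a linear classifier is normal to its decision boundary, the normalized weights point “away from random, to more like the concept”, which is the direction Kim describes.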

    Testing with Concept Activation Vectors (TCAV): The Zebra

    A prominent image classification example used to showcase TCAV in the research is “how sensitive a prediction of zebra is to the presence of stripes”. In the MLConf talk, Kim explained how the quantitative explanation, the TCAV score, is obtained:

    “The core idea of TCAV is taking the directional derivative of the logit layer, the probability of zebra with respect to that vector we just got ….and intuitively what this means is if I make this picture more like the concept or a little less like the concept, how much would the probability of zebra change? If it changes a lot, it's an important concept, if it doesn't change a lot, it's not an important concept. We do that for many, many zebra pictures to get more robust and reliable explanation. The final score, TCAV score is simply the ratio of zebra pictures where having stripeness positively increased the probability of zebra. In other words, if you have a 100 pictures of zebras, how many of them returned positive directional derivative? That's it.”

    It is also important to note, as indicated on Slide 38 of the MLConf talk, that

    “TCAV provides quantitative importance of a concept if and only if your network has learned about it”.
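
    The scoring procedure Kim describes in the talk can be sketched in a few lines. This is a toy illustration under stated assumptions: `zebra_logit` is a made-up function of layer activations rather than a real network head, the two concept vectors are hand-picked rather than learned, and the directional derivative is approximated with a central finite difference instead of the network gradients a real implementation would use.

```python
import numpy as np

def directional_derivative(logit_fn, activation, cav, eps=1e-4):
    """Approximate how the logit changes when the activation is nudged
    slightly toward (+) or away from (-) the concept direction."""
    return (logit_fn(activation + eps * cav) - logit_fn(activation - eps * cav)) / (2 * eps)

def tcav_score(logit_fn, activations, cav):
    """Fraction of examples whose directional derivative along the CAV is positive."""
    derivs = np.array([directional_derivative(logit_fn, a, cav) for a in activations])
    return float(np.mean(derivs > 0))

def zebra_logit(a):
    # Hypothetical class logit: rises linearly with coordinate 0 and
    # depends non-monotonically on coordinate 1.
    return 2.0 * a[0] + np.sin(a[1])

rng = np.random.default_rng(0)
dim = 8
zebra_acts = rng.normal(size=(100, dim))  # stand-ins for 100 zebra pictures

stripes_cav = np.zeros(dim)
stripes_cav[0] = 1.0  # assumed "stripes" direction
dots_cav = np.zeros(dim)
dots_cav[1] = 1.0     # assumed "dots" direction

stripes_score = tcav_score(zebra_logit, zebra_acts, stripes_cav)
dots_score = tcav_score(zebra_logit, zebra_acts, dots_cav)
print("stripes:", stripes_score, "dots:", dots_score)
```

    In this toy setup every example has a positive derivative along the stripes direction, so that score is 1.0, while the dots score falls between 0 and 1 because only some examples respond positively. This mirrors Kim's question: out of 100 zebra pictures, how many returned a positive directional derivative?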

    Why Consider TCAV?

    In the MLConf talk, Kim argues for using interpretability as a tool to work towards using ML “more responsibly and more safely”. In the paper, Kim et al. indicate that

    “TCAV is a step toward creating a human-friendly linear interpretation of the internal state of a deep learning model so that questions about model decisions may be answered in terms of natural high-level concepts”.

    Using high-level concepts is potentially useful when explaining seemingly black box models to stakeholders who are not experts in ML. For example, when building a model to help medical doctors make a diagnosis, it may be helpful to test your model using doctor-friendly high-level concepts to surface bias or gauge its level of domain expertise. This blog post provided distilled highlights and excerpts from recent research on TCAV. If you are interested in additional insights, please refer to the following resources, which were referenced and reviewed during the writing of this blog post.


    MLConf 2018

    ICML 2018: TCAV-related Links

    Other Related Papers and Slide Decks

    Domino Data Science Field Notes provide highlights of data science research, trends, techniques, and more, to help data scientists and data science leaders accelerate their work or careers. If you are interested in your data science work being covered in this blog series, please send us an email at writeforus(at)dominodatalab(dot)com.
