    Latest

    Explaining black-box models using attribute importance, PDPs, and LIME

    In this article we cover explainability for black-box models and show how to use different methods from the Skater framework to provide insights into the inner workings of...

    Building a Speaker Recognition Model

    The ability of a system to recognize a person by their voice is a non-intrusive way to collect their biometric information. Unlike fingerprint...

    Building a Named Entity Recognition model using a BiLSTM-CRF network

    In this blog post we present the Named Entity Recognition problem and show how a BiLSTM-CRF model can be fitted using a freely available annotated...

    Fitting Support Vector Machines via Quadratic Programming

    In this blog post we take a deep dive into the internals of Support Vector Machines. We derive a Linear SVM classifier, explain its advantages, and...

    ML internals: Synthetic Minority Oversampling (SMOTE) Technique

    In this article we discuss why fitting models on imbalanced datasets is problematic, and how class imbalance is typically addressed. We present the...

    Credit Card Fraud Detection using XGBoost, SMOTE, and threshold moving

    In this article, we'll discuss the challenge organizations face around fraud detection, how machine learning can be used to identify and spot...

    On-Demand Spark clusters with GPU acceleration

    Apache Spark has become the de facto standard for processing large amounts of stationary and streaming data in a distributed fashion. The addition...

    Choosing the right Machine Learning Framework

    Machine learning (ML) frameworks are interfaces that allow data scientists and developers to build and deploy machine learning models faster and...

    Fireside Chat: Stig Pedersen from Topdanmark

    "In having one or two very successful algorithmic deployments, the business then begins coming to you to ask for assistance. It becomes a mutual...

    The importance of structure, coding style, and refactoring in notebooks

    Notebooks are increasingly crucial in the data scientist's toolbox. Although considered relatively new, their history traces back to systems like...

    Evaluating Ray: Distributed Python for Massive Scalability

    Dean Wampler provides a distilled overview of Ray, an open source system for scaling Python systems from single machines to large clusters. If you...

    Evaluating Generative Adversarial Networks (GANs)

    This article provides concise insights into GANs to help data scientists and researchers assess whether to investigate GANs further. If you are...

    Data Drift Detection for Image Classifiers

    This article covers how to detect data drift for models that ingest image data as their input in order to prevent their silent degradation in...

    Model Interpretability: The Conversation Continues

    This Domino Data Science Field Note covers a proposed definition of interpretability and a distilled overview of the PDR framework. Insights are drawn...

    Techniques for Collecting, Prepping, and Plotting Data: Predicting Social Media-Influence in the NBA

    This article provides insight on the mindset, approach, and tools to consider when solving a real-world ML problem. It covers questions to consider...