    Latest

    Spark, Dask, and Ray: Choosing the Right Framework

    Apache Spark, Dask, and Ray are three of the most popular frameworks for distributed computing. In this blog post we look at their history, intended use cases, strengths and...

    Data Science

    Data Exploration with Pandas Profiler and D-Tale

    We have all heard that data is the new oil. I always say that if that is the case, we need to go through some refinement process before that raw oil...

    Explaining black-box models using attribute importance, PDPs, and LIME

    In this article we cover explainability for black-box models and show how to use different methods from the Skater framework to provide insights...

    Building a Speaker Recognition Model

    The ability of a system to recognize a person by their voice is a non-intrusive way to collect their biometric information. Unlike fingerprint...

    Fundamentals of Signal Processing

    Basics of digital signal processing: A signal is defined as any physical quantity that varies with time, space or any other independent...

    Code

    Building a Named Entity Recognition model using a BiLSTM-CRF network

    In this blog post we present the Named Entity Recognition problem and show how a BiLSTM-CRF model can be fitted using a freely available annotated...

    Accelerating model velocity through Snowflake Java UDF integration

    Over the next decade, the companies that will beat competitors will be “model-driven” businesses. These companies often undertake large data...

    Machine Learning

    Fitting Support Vector Machines via Quadratic Programming

    In this blog post we take a deep dive into the internals of Support Vector Machines. We derive a Linear SVM classifier, explain its advantages, and...

    ML internals: Synthetic Minority Oversampling (SMOTE) Technique

    In this article we discuss why fitting models on imbalanced datasets is problematic, and how class imbalance is typically addressed. We present the...

    Practical Techniques

    Enterprise-class NLP with spaCy v3

    spaCy is a Python library that provides capabilities to conduct advanced natural language processing analysis and build models that can underpin...

    How to supercharge data exploration with Pandas Profiling

    Producing insights from raw data is a time-consuming process. Predictive modeling efforts rely on dataset profiles, whether consisting of summary...

    Leaders at Work

    Defining clear metrics to drive model adoption and value creation

    One of the biggest ironies of enterprise data science is that although data science teams are masters at using probabilistic models and diagnostic...

    Announcement: Domino is fully Kubernetes native

    Last week we announced that Domino is now fully Kubernetes native. This is great news for data science teams and IT organizations building modern DS...

    Model Management

    The Role of Containers on MLOps and Model Production

    Container technology has changed the way data science gets done. The original container use case for data science focused on what I call,...

    Manual Feature Engineering

    Many thanks to AWP Pearson for the permission to excerpt "Manual Feature Engineering: Manipulating Data for Fun and Profit" from the book, Machine...

    Engineering

    Announcing Domino 3.4: Furthering Collaboration with Activity Feed

    Our last release, Domino 3.3, saw the addition of two major capabilities: Datasets and Experiment Manager. “Datasets”, a high-performance, revisioned...