    Latest

    Explaining black-box models using attribute importance, PDPs, and LIME

    In this article we cover explainability for black-box models and show how to use different methods from the Skater framework to provide insights into the inner workings of...

    Building a Named Entity Recognition model using a BiLSTM-CRF network

    In this blog post we present the Named Entity Recognition problem and show how a BiLSTM-CRF model can be fitted using a freely available annotated...

    Accelerating model velocity through Snowflake Java UDF integration

    Over the next decade, the companies that will beat competitors will be “model-driven” businesses. These companies often undertake large data...

    Fitting Support Vector Machines via Quadratic Programming

    In this blog post we take a deep dive into the internals of Support Vector Machines. We derive a Linear SVM classifier, explain its advantages, and...

    ML internals: Synthetic Minority Oversampling (SMOTE) Technique

    In this article we discuss why fitting models on imbalanced datasets is problematic, and how class imbalance is typically addressed. We present the...

    Credit Card Fraud Detection using XGBoost, SMOTE, and threshold moving

    In this article, we'll discuss the challenge organizations face around fraud detection, how machine learning can be used to identify and spot...

    On-Demand Spark clusters with GPU acceleration

    Apache Spark has become the de facto standard for processing large amounts of stationary and streaming data in a distributed fashion. The addition...

    How to supercharge data exploration with Pandas Profiling

    Producing insights from raw data is a time-consuming process. Predictive modeling efforts rely on dataset profiles, whether consisting of summary...

    Performing Non-Compartmental Analysis with Julia and Pumas AI

    When analysing pharmacokinetic data to determine the degree of exposure of a drug and associated pharmacokinetic parameters (e.g., clearance,...

    Density-Based Clustering

    Original content by Manojit Nandi - Updated by Josh Poduska. Cluster analysis is an important problem in data analysis. Data scientists use...

    The Curse of Dimensionality

    Guest Post by Bill Shannon, Founder and Managing Partner of BioRankings. Danger of Big Data: Big data is the rage. This could be lots of rows (samples)...

    Providing fine-grained, trusted access to enterprise datasets with Okera and Domino

    Domino and Okera: provide data scientists with access to trusted datasets within reproducible and instantly provisioned computational environments. In...

    The importance of structure, coding style, and refactoring in notebooks

    Notebooks are increasingly crucial in the data scientist's toolbox. Although considered relatively new, their history traces back to systems like...