Latest

Credit Card Fraud Detection using XGBoost, SMOTE, and threshold moving

In this article, we'll discuss the challenge organizations face around fraud detection, how machine learning can be used to spot anomalies that the human...

On-Demand Spark clusters with GPU acceleration

Apache Spark has become the de facto standard for processing large amounts of stationary and streaming data in a distributed fashion. The addition of the MLlib...

Choosing the right Machine Learning Framework

Machine learning (ML) frameworks are interfaces that allow data scientists and developers to build and deploy machine learning models faster and more easily. Machine learning is used...

Bringing ML to Agriculture: Transforming a Millennia-old Industry

Guest post by Jeff Melching from The Climate Corporation. At The Climate Corporation, we aim to help farmers better understand their operations and make better decisions...

The importance of structure, coding style, and refactoring in notebooks

Notebooks are increasingly crucial in the data scientist's toolbox. Although considered relatively new, their history traces back to systems like Mathematica and MATLAB. This form of...

Evaluating Ray: Distributed Python for Massive Scalability

Dean Wampler provides a distilled overview of Ray, an open source system for scaling Python systems from single machines to large clusters. If you are interested...

Evaluating Generative Adversarial Networks (GANs)

This article provides concise insights into GANs to help data scientists and researchers assess whether to investigate GANs further. If you are interested in a tutorial...

Data Drift Detection for Image Classifiers

This article covers how to detect data drift for models that ingest image data as their input in order to prevent their silent degradation in production....

Model Interpretability: The Conversation Continues

This Domino Data Science Field Note covers a proposed definition of interpretability and a distilled overview of the PDR framework. Insights are drawn from Bin Yu, W....

Techniques for Collecting, Prepping, and Plotting Data: Predicting Social Media-Influence in the NBA

This article provides insight on the mindset, approach, and tools to consider when solving a real-world ML problem. It covers questions to consider as well as...

On Being Model-driven: Metrics and Monitoring

This article covers a couple of key Machine Learning (ML) vital signs to consider when tracking ML models in production to ensure model reliability, consistency and...

Clustering in R

This article covers clustering, including K-means and hierarchical clustering. A complementary Domino project is available. Introduction: Clustering is a machine learning technique that enables researchers and...

Understanding Causal Inference

This article covers causal relationships and includes a chapter excerpt from the book Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications by...
