Latest

Analyzing Large P Small N Data – Examples from Microbiome

Guest post by Bill Shannon, Co-Founder and Managing Partner of BioRankings. Introduction: High-throughput screening technologies have been developed to measure all the molecules of interest...

Bringing ML to Agriculture: Transforming a Millennia-old Industry

Guest post by Jeff Melching from The Climate Corporation. At The Climate Corporation, we aim to help farmers better understand their operations and make better decisions...

The Curse of Dimensionality

Guest post by Bill Shannon, Founder and Managing Partner of BioRankings. Danger of Big Data: Big data is the rage. This could be lots of rows...

Providing fine-grained, trusted access to enterprise datasets with Okera and Domino

Domino and Okera - Provide data scientists access to trusted datasets within reproducible and instantly provisioned computational environments. In the last few years, we’ve seen the...

Why models fail to deliver value and what you can do about it.

Building models requires a lot of time and effort. Data scientists can spend weeks just trying to find, capture and transform data into decent features for...

The importance of structure, coding style, and refactoring in notebooks

Notebooks are increasingly crucial in the data scientist's toolbox. Although considered relatively new, their history traces back to systems like Mathematica and MATLAB. This form of...

Data Drift Detection for Image Classifiers

This article covers how to detect data drift for models that ingest image data as their input in order to prevent their silent degradation in production....

Model Interpretability: The Conversation Continues

This Domino Data Science Field Note covers a proposed definition of interpretability and a distilled overview of the PDR framework. Insights are drawn from Bin Yu, W....

Techniques for Collecting, Prepping, and Plotting Data: Predicting Social Media-Influence in the NBA

This article provides insight on the mindset, approach, and tools to consider when solving a real-world ML problem. It covers questions to consider as well as...

On Being Model-driven: Metrics and Monitoring

This article covers a couple of key Machine Learning (ML) vital signs to consider when tracking ML models in production to ensure model reliability, consistency and...

Clustering in R

This article covers clustering, including K-means and hierarchical clustering. A complementary Domino project is available. Introduction: Clustering is a machine learning technique that enables researchers and...

Understanding Causal Inference

This article covers causal relationships and includes a chapter excerpt from the book Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications by...

Time Series with R

This article delves into methods for analyzing multivariate and univariate time series data. A complementary Domino project is available. Introduction: Conducting exploratory analysis and extracting meaningful...
