Latest

Data Drift Detection for Image Classifiers

This article covers how to detect data drift for models that ingest image data as their input in order to prevent their silent degradation in production. Run the example in a complementary Domino project. Introduction: preventing...

Model Interpretability: The Conversation Continues

This Domino Data Science Field Note covers a proposed definition of interpretability and a distilled overview of the PDR framework. Insights are drawn from...

On Being Model-driven: Metrics and Monitoring

This article covers a couple of key Machine Learning (ML) vital signs to consider when tracking ML models in production to ensure model...

Clustering in R

This article covers clustering, including K-means and hierarchical clustering. A complementary Domino project is available. Introduction: Clustering is a machine learning technique that...

Themes and Conferences per Pacoid, Episode 13

Paco Nathan's latest article covers data practices from the National Oceanic and Atmospheric Administration (NOAA) Environmental Data Management (EDM) workshop as well as...

Code

Natural Language in Python using spaCy: An Introduction

This article provides a brief introduction to natural language processing using spaCy and related libraries in Python. The complementary Domino project is also available....

Creating Multi-language Pipelines with Apache Spark or Avoid Having to Rewrite spaCy into Java

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid rewriting...

Machine Learning

Deep Learning Illustrated: Building Natural Language Processing Models

Many thanks to Addison-Wesley Professional for providing the permissions to excerpt "Natural Language Processing" from the book, Deep Learning Illustrated by Krohn, Beyleveld,...

Make Machine Learning Interpretability More Rigorous

This Domino Data Science Field Note covers a proposed definition of machine learning interpretability, why interpretability matters, and the arguments for considering a...

Featured

Reflections on the Data Science Platform Market

Reflections: Before we get too far into 2019, I wanted to take a brief moment to reflect on some of the changes we’ve...

Featured

On the Importance of Community-Led Open Source

Wes McKinney, Director of Ursa Labs and creator of the pandas project, presented the keynote, "Advancing Data Science Through Open Source" at Rev. McKinney's...

Practical Techniques

Techniques for Collecting, Prepping, and Plotting Data: Predicting Social Media-Influence in the NBA

This article provides insight on the mindset, approach, and tools to consider when solving a real-world ML problem. It covers questions to consider...

Time Series with R

This article delves into methods for analyzing multivariate and univariate time series data. A complementary Domino project is available. Introduction: Conducting exploratory analysis...

HyperOpt: Bayesian Hyperparameter Optimization

This article covers how to perform hyperparameter optimization using a sequential model-based optimization (SMBO) technique implemented in the HyperOpt Python package. There is...

Deep Reinforcement Learning

This article provides an excerpt of "Deep Reinforcement Learning" from the book, Deep Learning Illustrated by Krohn, Beyleveld, and Bassens. The article includes an...

Towards Predictive Accuracy: Tuning Hyperparameters and Pipelines

This article provides an excerpt of “Tuning Hyperparameters and Pipelines” from the book, Machine Learning with Python for Everyone by Mark E. Fenner....

Seeking Reproducibility within Social Science: Search and Discovery

Julia Lane, NYU professor, economist, and cofounder of the Coleridge Initiative, presented “Where’s the Data: A New Approach to Social Science Search &...

Announcement: Domino is fully Kubernetes native

Last week we announced that Domino is now fully Kubernetes native. This is great news for data science teams and IT organizations building...

Data Science vs Engineering: Tension Points

This blog post provides highlights and a full written transcript from the panel, “Data Science Versus Engineering: Does It Really Have To Be...
