Latest

On Being Model-driven: Metrics and Monitoring

This article covers a couple of key Machine Learning (ML) vital signs to consider when tracking ML models in production to ensure model reliability, consistency and performance in the future. Many thanks to Don Miner for...

Clustering in R

This article covers clustering including K-means and hierarchical clustering. A complementary Domino project is available. Introduction Clustering is a machine learning technique that...

Themes and Conferences per Pacoid, Episode 13

Paco Nathan's latest article covers data practices from the National Oceanic and Atmospheric Administration (NOAA) Environment Data Management (EDM) workshop as well as...

Understanding Causal Inference

This article covers causal relationships and includes a chapter excerpt from the book Machine Learning in Production: Developing and Optimizing Data Science Workflows...

Reflections on the Data Science Platform Market

Reflections Before we get too far into 2019, I wanted to take a brief moment to reflect on some of the changes we’ve...

Code

Natural Language in Python using spaCy: An Introduction

This article provides a brief introduction to natural language using spaCy and related libraries in Python. The complementary Domino project is also available....

Creating Multi-language Pipelines with Apache Spark or Avoid Having to Rewrite spaCy into Java

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid rewriting...

Machine Learning

Deep Learning Illustrated: Building Natural Language Processing Models

Many thanks to Addison-Wesley Professional for providing the permissions to excerpt "Natural Language Processing" from the book, Deep Learning Illustrated by Krohn, Beyleveld,...

Make Machine Learning Interpretability More Rigorous

This Domino Data Science Field Note covers a proposed definition of machine learning interpretability, why interpretability matters, and the arguments for considering a...

Featured

On the Importance of Community-Led Open Source

Wes McKinney, Director of Ursa Labs and creator of pandas project, presented the keynote, "Advancing Data Science Through Open Source" at Rev. McKinney's...

Featured

Model Management and the Era of the Model-Driven Business

Over the past few years, we’ve seen a new community of data science leaders emerge. Regardless of their industry, we have heard three...

Practical Techniques

MNIST Expanded: 50,000 New Samples Added

This post provides a distilled overview regarding the rediscovery of 50,000 samples within the MNIST dataset.  MNIST: The Potential Danger of Overfitting Recently,...

Manipulating Data with dplyr

Special thanks to Addison-Wesley Professional for permission to excerpt the following "Manipulating data with dplyr" chapter from the book, Programming Skills for Data...

HyperOpt: Bayesian Hyperparameter Optimization

This article covers how to perform hyperparameter optimization using a sequential model-based optimization (SMBO) technique implemented in the HyperOpt Python package. There is...

Deep Reinforcement Learning

This article provides an excerpt "Deep Reinforcement Learning" from the book, Deep Learning Illustrated by Krohn, Beyleveld, and Bassens. The article includes an...

Towards Predictive Accuracy: Tuning Hyperparameters and Pipelines

This article provides an excerpt of “Tuning Hyperparameters and Pipelines” from the book, Machine Learning with Python for Everyone by Mark E. Fenner....

Seeking Reproducibility within Social Science: Search and Discovery

Julia Lane, NYU Professor, Economist and cofounder of the Coleridge Initiative, presented “Where’s the Data: A New Approach to Social Science Search &...

Announcing Trial and Domino 3.5: Control Center for Data Science Leaders

Even the most sophisticated data science organizations struggle to keep track of their data science projects. Data science leaders want to know, at...

Data Science vs Engineering: Tension Points

This blog post provides highlights and a full written transcript from the panel, “Data Science Versus Engineering: Does It Really Have To Be...

Next page