Latest

Manual Feature Engineering

Many thanks to AWP Pearson for permission to excerpt "Manual Feature Engineering: Manipulating Data for Fun and Profit" from the book, Machine Learning with Python...

A Practitioner’s Guide to Deep Learning with Ludwig

Joshua Poduska provides a distilled overview of Ludwig, including when to use Ludwig’s command-line syntax and when to use its Python API. New tools are...

Themes and Conferences per Pacoid, Episode 11

Paco Nathan's latest article covers program synthesis, AutoPandas, model-driven data queries, and more. Welcome back to our monthly burst of themespotting and conference summaries. BTW,...

Addressing Irreproducibility in the Wild

This Domino Data Science Field Note provides highlights and excerpted slides from Chloe Mawer’s "The Ingredients of a Reproducible Machine Learning Model" talk at a recent...

Can Data Science Help Us Make Sense of the Mueller Report?

This blog post provides insights on how to apply Natural Language Processing (NLP) techniques. A complementary Domino project is available. The Mueller Report,...

Manipulating Data with dplyr

Special thanks to Addison-Wesley Professional for permission to excerpt the following "Manipulating data with dplyr" chapter from the book, Programming Skills for Data Science: Start Writing...

Highlights from the Maryland Data Science Conference: Deep Learning on Imagery and Text

Niels Kasch, cofounder of Miner & Kasch, an AI and Data Science consulting firm, provides insight from a deep learning session that occurred at the Maryland...

SHAP and LIME Python Libraries: Part 2 – Using SHAP and LIME

This blog post provides insights on how to use the SHAP and LIME Python libraries in practice and how to interpret their output, helping readers prepare...

Creating Multi-language Pipelines with Apache Spark or Avoid Having to Rewrite spaCy into Java

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid rewriting spaCy into Java. She...

SHAP and LIME Python Libraries: Part 1 – Great Explainers, with Pros and Cons to Both

This blog post provides a brief technical introduction to the SHAP and LIME Python libraries, followed by code and output to highlight a few pros and...

Making PySpark Work with spaCy: Overcoming Serialization Errors

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to use spaCy to process text data. Karau is a Developer Advocate at Google, as well...

Item Response Theory in R for Survey Analysis

In this guest blog post, Derrick Higgins, of American Family Insurance, covers item response theory (IRT) and how data scientists can apply it within a project. As...

Benchmarking NVIDIA CUDA 9 and Amazon EC2 P3 Instances Using Fashion MNIST

In this post, Josh Poduska, Chief Data Scientist at Domino Data Lab, writes about benchmarking NVIDIA CUDA 9 and Amazon EC2 P3 instances using Fashion MNIST....
