Latest

Manipulating Data with dplyr

Special thanks to Addison-Wesley Professional for permission to excerpt the following "Manipulating data with dplyr" chapter from the book, Programming Skills for Data Science: Start Writing...

Highlights from the Maryland Data Science Conference: Deep Learning on Imagery and Text

Niels Kasch, cofounder of Miner & Kasch, an AI and Data Science consulting firm, provides insight from a deep learning session that occurred at the Maryland...

Themes and Conferences per Pacoid, Episode 5

In Paco Nathan's latest column, he explores the theme of "learning data science" by diving into education programs, learning materials, educational approaches, as well as perceptions...

Creating Multi-language Pipelines with Apache Spark or Avoid Having to Rewrite spaCy into Java

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid rewriting spaCy into Java. She...

Making PySpark Work with spaCy: Overcoming Serialization Errors

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to use spaCy to process text data. Karau is a Developer Advocate at Google, as well...

Item Response Theory in R for Survey Analysis

In this guest blog post, Derrick Higgins, of American Family Insurance, covers item response theory (IRT) and how data scientists can apply it within a project. As...

Benchmarking NVIDIA CUDA 9 and Amazon EC2 P3 Instances Using Fashion MNIST

In this post, Josh Poduska, Chief Data Scientist at Domino Data Lab, writes about benchmarking NVIDIA CUDA 9 and Amazon EC2 P3 Instances Using Fashion MNIST....

Learn from the Reproducibility Crisis in Science

Key highlights from Clare Gollnick’s talk, “The limits of inference: what data scientists can learn from the reproducibility crisis in science”, are covered in this Domino...

Feature Engineering: A Framework and Techniques 

This Domino Field Note provides highlights and excerpted slides from Amanda Casari’s “Feature Engineering for Machine Learning” talk at QCon Sao Paulo. Casari is the Principal...

Three Simple Worrying Stats Problems

In this guest post, Sean Owen, writes about three data situations that provide ambiguous results and how causation helps clarifies the interpretation of data. A version...

Classify all the Things (with Multiple Labels)

Derrick Higgins of American Family Insurance presented a talk, “Classify all the Things (with multiple labels): The most common type of modeling task no one talks about”...

On the Importance of Community-Led Open Source

Wes McKinney, Director of Ursa Labs and creator of pandas project, presented the keynote, "Advancing Data Science Through Open Source" at Rev. McKinney's keynote covered open...

Put Models at the Core of Business Processes

At Rev, Nick Elprin, Domino's CEO, continued to provide insights on managing data science based upon years of candid discussions with customers. He also delved into...

Next page