Latest

MNIST Expanded: 50,000 New Samples Added

This post provides a distilled overview of the rediscovery of 50,000 samples within the MNIST dataset. MNIST: The Potential Danger of Overfitting. Recently, Chhavi Yadav (NYU) and Leon Bottou (Facebook AI Research and NYU) indicated in...

Themes and Conferences per Pacoid, Episode 10

Co-chair Paco Nathan provides highlights of Rev 2, a data science leaders summit. Introduction: Welcome back to our monthly burst of themespotting and...

Machine Learning Product Management: Lessons Learned

This Domino Data Science Field Note covers Pete Skomoroch’s recent Strata London talk. It focuses on his ML product management insights and lessons...

Addressing Irreproducibility in the Wild

This Domino Data Science Field Note provides highlights and excerpted slides from Chloe Mawer’s "The Ingredients of a Reproducible Machine Learning Model" talk...

Reflections on the Data Science Platform Market

Reflections: Before we get too far into 2019, I wanted to take a brief moment to reflect on some of the changes we’ve...

Code

Creating Multi-language Pipelines with Apache Spark or Avoid Having to Rewrite spaCy into Java

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid rewriting...

Data Scientist? Programmer? Are They Mutually Exclusive?

This Domino Data Science Field Note blog post provides highlights of Hadley Wickham’s ACM Chicago talk, “You Can’t Do Data Science in a GUI”....

Featured

Growing Data Scientists Into Manager Roles

In this post, Ricky Chachra, Research Science Manager at Lyft, provides insight for companies looking to home-grow their promising individual contributors (ICs) into...

Featured

Model Management and the Era of the Model-Driven Business

Over the past few years, we’ve seen a new community of data science leaders emerge. Regardless of their industry, we have heard three...

Practical Techniques

Manipulating Data with dplyr

Special thanks to Addison-Wesley Professional for permission to excerpt the following "Manipulating data with dplyr" chapter from the book, Programming Skills for Data...

Highlights from the Maryland Data Science Conference: Deep Learning on Imagery and Text

Niels Kasch, cofounder of Miner & Kasch, an AI and Data Science consulting firm, provides insight from a deep learning session that occurred...

Model Interpretability with TCAV (Testing with Concept Activation Vectors)

This Domino Data Science Field Note provides very distilled insights and excerpts from Been Kim’s recent MLConf 2018 talk and research about Testing with...

Trust in LIME: Yes, No, Maybe So? 

In this Domino Data Science Field Note, we briefly discuss an algorithm and framework for generating explanations, LIME (Local Interpretable Model-Agnostic Explanations), that...

Classify all the Things (with Multiple Labels)

Derrick Higgins of American Family Insurance presented a talk, “Classify all the Things (with multiple labels): The most common type of modeling task no...

Put Models at the Core of Business Processes

At Rev, Nick Elprin, Domino's CEO, continued to provide insights on managing data science based upon years of candid discussions with customers. He...

Announcing Domino 3.4: Furthering Collaboration with Activity Feed

Our last release, Domino 3.3, saw the addition of two major capabilities: Datasets and Experiment Manager. “Datasets”, a high-performance, revisioned data store, offers...

Data Science vs Engineering: Tension Points

This blog post provides highlights and a full written transcript from the panel, “Data Science Versus Engineering: Does It Really Have To Be...
