Techniques for Collecting, Prepping, and Plotting Data: Predicting Social Media Influence in the NBA

This article provides insight into the mindset, approach, and tools to consider when solving a real-world ML problem. It covers questions to consider as well as...

Clustering in R

This article covers clustering, including K-means and hierarchical clustering. A complementary Domino project is available. Clustering is a machine learning technique that enables researchers and...

Understanding Causal Inference

This article covers causal relationships and includes a chapter excerpt from the book Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications by...

Time Series with R

This article delves into methods for analyzing multivariate and univariate time series data. A complementary Domino project is available. Conducting exploratory analysis and extracting meaningful...

Comparing the Functionality of Open Source Natural Language Processing Libraries

In this guest post, Maziyar Panahi and David Talby provide a cheat sheet for choosing open source NLP libraries. What do natural language processing libraries do?...

Manipulating Data with dplyr

Special thanks to Addison-Wesley Professional for permission to excerpt the following "Manipulating data with dplyr" chapter from the book, Programming Skills for Data Science: Start Writing...

Highlights from the Maryland Data Science Conference: Deep Learning on Imagery and Text

Niels Kasch, cofounder of Miner & Kasch, an AI and Data Science consulting firm, provides insight from a deep learning session that occurred at the Maryland...

Machine Learning Projects: Challenges and Best Practices

Lukas Biewald is the founder of Weights & Biases. He was previously the founder of Figure Eight (formerly CrowdFlower). This blog post provides insights into why...

Model Interpretability with TCAV (Testing with Concept Activation Vectors)

This Domino Data Science Field Note provides distilled insights and excerpts from Been Kim’s recent MLConf 2018 talk and research about Testing with Concept Activation Vectors...

Creating Multi-language Pipelines with Apache Spark or Avoid Having to Rewrite spaCy into Java

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid rewriting spaCy into Java. She...

Making PySpark Work with spaCy: Overcoming Serialization Errors

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to use spaCy to process text data. Karau is a Developer Advocate at Google, as well...

Collaboration Between Data Science and Data Engineering: True or False?

This blog post includes candid insights about addressing tension points that arise when people collaborate on developing and deploying models. Domino’s Head of Content sat down...

Growing Data Scientists Into Manager Roles

In this post, Ricky Chachra, Research Science Manager at Lyft, provides insight for companies looking to home-grow their promising individual contributors (ICs) into effective managers. He...