Improving Zillow’s Zestimate with 36 Lines of Code

Zillow and Kaggle recently started a $1 million competition to improve the Zestimate. We are releasing a public Domino project that uses H2O’s AutoML to generate...

Horizontal Scaling for Parallel Experimentation

The amount of time data scientists spend waiting for experiment results is the difference between making incremental improvements and making significant advances. With parallel experimentation, data...

What Data Scientists Should Know About Hiring, Sharing, and Collaborating

In this post we summarize some of our most recent and favorite answers on Quora to questions from the community about hiring junior data scientists, sharing...

Multicore Data Science with R and Python

This article is an excerpt from the full video on Multicore Data Science in R and Python. Watch the full video to learn how to leverage...

Git Integration in Domino

We recently released new functionality that provides first-class integration between Domino and git. This post describes the new feature and our perspective on the unique...

The Cost of Doing Data Science on Laptops

At the heart of the data science process are the resource-intensive tasks of modeling and validation. During these tasks, data scientists will try and discard...

Benchmarking Predictive Models

It's been said that debugging is harder than programming. If we, as data scientists, are developing models ("programming") at the limits of our understanding, then we're...

Principles of Collaboration in Data Science

Data science is no longer a specialization of a single person or small group. It is now a key source of competitive advantage, and as a...

Achieving Reproducibility with Conda and Domino Environments

Managing “environments” (i.e., the set of packages, configuration, etc.) is a critical capability of any Data Science Platform. Not only does environment setup waste time on-boarding...

Gain Shell Access To Your Domino Instances

Domino offers a managed, scalable compute environment that provides push-button convenience to data scientists, whether they're interested in exploring data in a Python notebook, or training...

A Quick Benchmark of Hashtable Implementations in R

UPDATE: I am humbled and thankful to have had so much feedback on this post! It started out as a quick-and-dirty benchmark but I...

High-performance Computing with Amazon’s X1 Instance – Part II

When you have at your disposal 128 cores and 2TB of RAM, it’s hard not to experiment and attempt to find ways to leverage the amount...

High-performance Computing with Amazon’s X1 Instance

We’re excited to announce support for Amazon’s X1 instances. Now in Domino, you can do data science on machines with 128 cores and 2TB of RAM...