    About Eduardo Ariño de la Rubia

    Improving Zillow's Zestimate with 36 Lines of Code

    Zillow and Kaggle recently started a $1 million competition to improve the Zestimate. We used H2O’s AutoML to generate a solution. The new Kaggle...

    Horizontal Scaling for Parallel Experimentation

    The amount of time data scientists spend waiting for experiment results is the difference between making incremental improvements and making...

    Multicore Data Science with R and Python

    This post shows a number of different packages and approaches for leveraging parallel processing with R and Python. Multicore Data Science in R and...

    Git Integration in Domino

    We recently released new functionality that provides first-class integration between Domino and git. This post describes the new feature, and...

    The Cost of Doing Data Science on Laptops

    At the heart of the data science process are the resource intensive tasks of modeling and validation. During these tasks, data scientists will try...

    Benchmarking Predictive Models

    It's been said that debugging is harder than programming. If we, as data scientists, are developing models ("programming") at the limits of our...

    Principles of Collaboration in Data Science

    Data science is no longer a specialization of a single person or small group. It is now a key source of competitive advantage, and as a result, the...

    Achieving Reproducibility with Conda and Domino Environments

    Managing “environments” (i.e., the set of packages, configuration, etc.) is a critical capability of any Data Science Platform. Not only does...

    Gain Shell Access To Your Domino Instances

    Note: Please be advised that direct access to containers via SSH has been deprecated for Domino versions above 4.x. Indirect SSH access via Workspace...

    A Quick Benchmark of Hashtable Implementations in R

    UPDATE: I am humbled and thankful to have had so much feedback on this post! It started out as a quick and dirty benchmark but I had some great...

    High-performance Computing with Amazon's X1 Instance - Part II

    When you have at your disposal 128 cores and 2TB of RAM, it’s hard not to experiment and attempt to find ways to leverage the amount of power that is...

    High-performance Computing with Amazon's X1 Instance

    We’re excited to announce support for Amazon’s X1 instances. Now in Domino, you can do data science on machines with 128 cores and 2TB of RAM — with...

    Providing Digital Provenance: from Modeling through Production

    At last week's useR! R User conference, I spoke on digital provenance, the importance of reproducible research, and how Domino has solved many of the...

    Announcing Enhanced Apache Spark Support

    Domino now offers data scientists a simple, yet incredibly powerful way to conduct quantitative work using Apache Spark. Apache Spark has captured...

    Ugly Little Bits of the Data Science Process

    This morning there was a great conversation on Twitter, kicked off by Hadley Wickham, about one of the ugly little bits of the data science process.

    The R Data I/O Shootout

    We pit the newcomer R data I/O package, feather, against popular packages data.table, readr, and the venerable saveRDS/readRDS functions from base R....
