Tag: R

Three Simple Worrying Stats Problems

In this guest post, Sean Owen writes about three data situations that provide ambiguous results and how causation helps clarify the interpretation of data. A version...

On the Importance of Community-Led Open Source

Wes McKinney, Director of Ursa Labs and creator of the pandas project, presented the keynote, "Advancing Data Science Through Open Source," at Rev. McKinney's keynote covered open...

Large Visualizations in canvasXpress

Dr. Connie Brett is the owner of Aggregate Genius. Dr. Brett provides custom visualization tool development and support for the Translational Bioinformatics team at Bristol-Myers...

Data Scientist? Programmer? Are They Mutually Exclusive?

This Domino Data Science Field Note blog post provides highlights of Hadley Wickham’s ACM Chicago talk, “You Can’t Do Data Science in a GUI”. In his talk,...

Summertime Analytics: Predicting E. Coli and West Nile Virus

Gene Leynes (Senior Data Scientist) and Nick Lucius (Advanced Analytics) from the City of Chicago discussed two predictive analytics projects that forecasted potential risk involved with...

Model Deployment Powered by Kubernetes

In this article we explain how we’re using Kubernetes to enable data scientists to deploy predictive models as production-grade APIs. Background: Domino lets users publish R...

Horizontal Scaling for Parallel Experimentation

The amount of time data scientists spend waiting for experiment results is the difference between making incremental improvements and making significant advances. With parallel experimentation, data...

Multicore Data Science with R and Python

This article is an excerpt from the full video on Multicore Data Science in R and Python. Watch the full video to learn how to leverage...

Using Monte Carlo Simulations in R to Test Methodological Advances in Social Policy Research

This is a guest post written by Kristin Porter, Senior Research Associate at MDRC. MDRC is a nonprofit, nonpartisan education and social policy research organization dedicated...

R vs. Python for Data Science

While the elections are over, some debates continue. R and Python are both popular programming languages for data scientists. Each has its advantages for performing data...

A Quick Benchmark of Hashtable Implementations in R

UPDATE: I am humbled and thankful to have had so much feedback on this post! It started out as a quick and dirty benchmark but I...

High-performance Computing with Amazon’s X1 Instance – Part II

When you have at your disposal 128 cores and 2TB of RAM, it’s hard not to experiment and attempt to find ways to leverage the amount...

A Summary of Using k-NN in Production

This week, Domino’s Chief Data Scientist, Eduardo Ariño de la Rubia, presented a webinar: An Introduction to Using k-NN in Production. If you missed the live...