Author archive for Eduardo Ariño de la Rubia, page 2

Eduardo Ariño de la Rubia

Eduardo Ariño de la Rubia is a lifelong technologist with a passion for data science who thrives on effectively communicating data-driven insights throughout an organization. A student of negotiation, conflict resolution, and peace building, Ed is focused on building tools that help humans work with humans to create insights for humans.

Data Science

Principles of Collaboration in Data Science

Data science is no longer the specialty of a single person or small group. It is now a key source of competitive advantage, and as a result, the scale of projects continues to grow. Collaboration is critical because it enables teams to take on larger problems than any individual could. It also allows for specialization and a shared context that reduces dependency on "unicorn" employees, who don't scale and are a major source of key-man risk. The problem is that collaboration is a vague term that blurs multiple concepts and best practices. In this post, we clarify the differences between repeatability, reproducibility, and replicability, the gold standard to aim for whenever possible. By establishing best practices for frictionless in-team and cross-team collaboration, you can dramatically improve the efficiency and impact of your data science efforts.

By Eduardo Ariño de la Rubia | 17 min read

Data Science

Achieving Reproducibility with Conda and Domino Environments

Managing “environments” (i.e., the set of packages, configuration, etc.) is a critical capability of any Data Science Platform. Not only does environment setup waste time when onboarding people, but configuration issues across environments can undermine reproducibility and collaboration, and can introduce delays when moving models from development to production.
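To make "the set of packages" concrete, here is a minimal Python sketch that snapshots the installed packages as pinned `name==version` lines, similar in spirit to `pip freeze`. It is not how Conda or Domino environments work; `snapshot_environment`, `write_lockfile`, and the file name `requirements.lock.txt` are hypothetical names used only for illustration.

```python
# Minimal sketch (not the Conda or Domino mechanism): pin every installed
# Python distribution so the same package set can be reinstalled elsewhere.
from importlib.metadata import distributions


def snapshot_environment() -> list[str]:
    """Return sorted 'name==version' pins for every installed distribution."""
    pins = set()
    for dist in distributions():
        # .get() returns None instead of raising when metadata is incomplete.
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:
            pins.add(f"{name}=={version}")
    # Case-insensitive sort so the output is stable and easy to diff.
    return sorted(pins, key=str.lower)


def write_lockfile(path: str) -> int:
    """Write the pins to `path` (hypothetical file name) and return how many were written."""
    pins = snapshot_environment()
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(pins) + "\n")
    return len(pins)


if __name__ == "__main__":
    count = write_lockfile("requirements.lock.txt")
    print(f"Pinned {count} packages")
```

Reinstalling from such a list (for example with `pip install -r requirements.lock.txt`) narrows configuration drift but does not eliminate it: system libraries and non-Python dependencies are not captured, which is part of why tools like Conda exist.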

By Eduardo Ariño de la Rubia | 8 min read

Data Science

Gain Shell Access To Your Domino Instances

Note: Please be advised that direct access to containers via SSH has been deprecated for Domino versions above 4.x. Indirect SSH access via Workspace terminals (e.g., JupyterLab, VS Code) is still available in all Domino releases.

By Eduardo Ariño de la Rubia | 3 min read

Data Science

A Quick Benchmark of Hashtable Implementations in R

UPDATE: I am humbled by and thankful for all the feedback on this post! It started out as a quick-and-dirty benchmark, but I received great feedback from Reddit, from comments on this post, and even from Hadley himself, so this post now includes several updates. The major update is that R's new.env(hash=TRUE) actually provides the fastest hash table if your keys are always valid R symbols. This is one of the things I really love about the data science community and the data science process: iteration and peer review are key to great results!
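Why a hashed lookup tends to win can be illustrated with a rough Python analog. This sketch does not reproduce the R benchmark or use `new.env(hash=TRUE)`; it compares Python's built-in `dict` (a hash table) against a linear scan over a list of key/value pairs. The helper names `build_tables`, `linear_lookup`, and `benchmark` are hypothetical.

```python
# Python analog (not the original R benchmark): hashed lookup vs. linear scan.
import timeit


def build_tables(n: int):
    """Build a dict and an equivalent list of (key, value) pairs, keys 'k0'..'k{n-1}'."""
    keys = [f"k{i}" for i in range(n)]
    as_dict = {k: i for i, k in enumerate(keys)}  # hashed: average O(1) lookup
    as_pairs = list(zip(keys, range(n)))           # unhashed: O(n) lookup
    return keys, as_dict, as_pairs


def linear_lookup(pairs, key):
    """Scan the pairs in order until the key matches."""
    for k, v in pairs:
        if k == key:
            return v
    raise KeyError(key)


def benchmark(n: int = 10_000, number: int = 100, repeat: int = 5):
    """Return the best-of-`repeat` time for `number` lookups in each structure."""
    keys, as_dict, as_pairs = build_tables(n)
    probe = keys[-1]  # last key: worst case for the linear scan
    t_dict = min(timeit.repeat(lambda: as_dict[probe], number=number, repeat=repeat))
    t_scan = min(timeit.repeat(lambda: linear_lookup(as_pairs, probe),
                               number=number, repeat=repeat))
    return t_dict, t_scan


if __name__ == "__main__":
    t_dict, t_scan = benchmark()
    print(f"dict: {t_dict:.6f}s  linear scan: {t_scan:.6f}s  (per 100 lookups)")
```

Absolute timings depend on the machine, so treat the numbers as relative: the dict lookup stays roughly constant as `n` grows, while the scan grows linearly.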

By Eduardo Ariño de la Rubia | 8 min read

Data Science

High-performance Computing with Amazon's X1 Instance - Part II

When you have 128 cores and 2TB of RAM at your disposal, it’s hard not to experiment and look for ways to put all that power to use. As a reminder, Domino supports Amazon’s X1 instances, so you can do data science on machines with 128 cores and 2TB of RAM with one click.
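Putting that many cores to work usually means splitting a job into independent chunks. The following is a generic Python sketch using the standard library's `concurrent.futures`, not a Domino- or X1-specific API; `chunk_ranges`, `sum_squares`, and `parallel_sum_squares` are hypothetical names, and summing squares merely stands in for real CPU-bound work.

```python
# Generic sketch (not Domino- or X1-specific): fan CPU-bound work out across cores.
import os
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Optional


def chunk_ranges(n: int, parts: int) -> list[tuple[int, int]]:
    """Split range(n) into `parts` contiguous (start, stop) chunks of near-equal size."""
    size, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + size + (1 if i < extra else 0)  # first `extra` chunks get one more
        bounds.append((start, stop))
        start = stop
    return bounds


def sum_squares(bounds: tuple[int, int]) -> int:
    """CPU-bound work for one chunk: sum of i*i over [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_squares(n: int, workers: Optional[int] = None,
                         executor_cls: type[Executor] = ProcessPoolExecutor) -> int:
    """Sum i*i for i in range(n), splitting the work into one chunk per worker."""
    workers = workers or os.cpu_count() or 1  # logical CPUs visible to the OS
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, chunk_ranges(n, workers)))


if __name__ == "__main__":
    # The guard matters: process pools may re-import this module in each worker.
    print(parallel_sum_squares(50_000_000))
```

With the default settings the pool starts one worker per logical CPU that `os.cpu_count()` reports. Separate processes sidestep Python's GIL for CPU-bound work, and the `executor_cls` parameter lets you swap in `ThreadPoolExecutor` for quick tests.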

By Eduardo Ariño de la Rubia | 5 min read

Data Science

High-performance Computing with Amazon's X1 Instance

We’re excited to announce support for Amazon’s X1 instances. Now in Domino, you can do data science on machines with 128 cores and 2TB of RAM with one click.

By Eduardo Ariño de la Rubia | 10 min read
