
Building an Open Product for Power Users

Nick Elprin | 2015-04-12


This post describes our engineering philosophy of building an “open” product, i.e., one that supports existing tools and libraries, rather than building our own custom versions of existing functionality. Aside from making our developers more productive, we’ve found this approach makes our users much more productive, particularly power users, who are especially important to us.

Building an open product

Domino is a workbench for data scientists: it lets you run your analyses (in Python, R, Julia, etc.) on scalable infrastructure, keeps your work tracked and centralized so it can be easily shared and reproduced, and lets you expose models through web forms or web services.

From the beginning, we have consciously built Domino to integrate with existing languages, libraries, and tools, rather than building a proprietary way of achieving commonly used functionality. For example, rather than building a new way to create and share visualizations, we let data scientists use existing libraries like matplotlib or ggplot or anything else, and we simply serve the resulting files (HTML, JavaScript, images, etc.) via our web interface.
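To make the "we simply serve files" idea concrete, here is a minimal sketch of a user script that produces a shareable chart without any platform-specific library: it writes a plain SVG bar chart plus an HTML page that embeds it. The function names `bar_chart_svg` and `write_report`, the `results` directory, and the sample data are all hypothetical; in practice a user would more likely call matplotlib's `savefig` or ggplot2's `ggsave` and end up with the same kind of ordinary file.

```python
from __future__ import annotations

from html import escape
from pathlib import Path


def bar_chart_svg(data: dict[str, float], width: int = 400, bar_height: int = 20) -> str:
    """Render a horizontal bar chart as a standalone SVG string.

    Bars are scaled so the largest value spans the full `width`.
    """
    max_value = max(data.values())
    rows = []
    for i, (label, value) in enumerate(data.items()):
        y = i * (bar_height + 5)  # 5px gap between bars
        bar_width = int(width * value / max_value)
        rows.append(
            f'<rect x="0" y="{y}" width="{bar_width}" height="{bar_height}" fill="steelblue"/>'
            f'<text x="5" y="{y + bar_height - 5}" fill="white">{escape(label)}</text>'
        )
    total_height = len(data) * (bar_height + 5)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{total_height}">'
        + "".join(rows)
        + "</svg>"
    )


def write_report(data: dict[str, float], out_dir: Path) -> list[Path]:
    """Write chart.svg and a report.html that references it; return both paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    svg_path = out_dir / "chart.svg"
    svg_path.write_text(bar_chart_svg(data), encoding="utf-8")
    html_path = out_dir / "report.html"
    html_path.write_text(
        "<!DOCTYPE html><html><body><h1>Results</h1>"
        '<img src="chart.svg" alt="chart"></body></html>',
        encoding="utf-8",
    )
    return [svg_path, html_path]
```

Calling `write_report({"model A": 0.82, "model B": 0.91}, Path("results"))` leaves two ordinary files in the project; any web UI that can serve HTML and SVG can display them, with no special output format involved.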

Technically, this approach manifests through two central architectural decisions:

We run users’ code in Docker containers, which gives them the flexibility to do almost anything they want. Our default environment supports most languages (R, Python, Java, C/C++, Julia, Matlab via Octave, and more), and we let users customize their own environments if they need something else. They can even securely store VPN credentials, so their code can connect to their own databases.
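As a rough illustration of the container approach (a sketch, not Domino's actual implementation), the function below builds a `docker run` command that executes a user's script inside an image with the project directory bind-mounted, and forwards named secrets from the host environment instead of baking them into the image. The function name `build_run_command`, the `/mnt/project` mount point, the image name, and the `DB_PASSWORD` variable are assumptions for illustration; the `docker run` flags used (`--rm`, `-v`, `-w`, `-e`) are standard.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def build_run_command(
    image: str,
    project_dir: Path,
    command: list[str],
    env_vars: list[str] | None = None,
    workdir: str = "/mnt/project",  # hypothetical mount point inside the container
) -> list[str]:
    """Return an argv list for `docker run` that runs `command` inside `image`.

    The project directory is bind-mounted at `workdir`, so anything the
    script writes there (charts, HTML, CSV) lands back in the project as
    ordinary files. Each name in `env_vars` is passed as `-e NAME`, which
    tells Docker to copy that variable's value from the host environment.
    """
    argv = [
        "docker", "run", "--rm",  # remove the container when it exits
        "-v", f"{project_dir.resolve()}:{workdir}",  # bind mounts need an absolute host path
        "-w", workdir,  # run the command from inside the project directory
    ]
    for name in env_vars or []:
        argv += ["-e", name]
    argv.append(image)
    argv += command
    return argv


if __name__ == "__main__":
    cmd = build_run_command(
        "python:3.12-slim", Path("."), ["python", "analysis.py"], env_vars=["DB_PASSWORD"]
    )
    print(shlex.join(cmd))  # print the command; this sketch does not execute it
```

Returning an argument list rather than a shell string keeps the command safe to pass to `subprocess.run` without shell quoting issues.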

A project and its results are just files, which we know how to serve through the web UI. There’s no proprietary storage or output format; no special Domino library or package you need to use.
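Because results are plain files, serving them largely comes down to sending each one with an appropriate `Content-Type` so the browser can render it. Below is a minimal sketch using Python's standard `mimetypes` module, under a hypothetical policy: types a browser typically displays in-page (text, images, PDF, JSON) are shown inline, and everything else is offered as a download. The function name `serving_headers` and the inline policy are assumptions, not a description of how Domino's web UI works.

```python
from __future__ import annotations

import mimetypes
from pathlib import Path

# Hypothetical policy: MIME types a browser can usually display directly.
INLINE_PREFIXES = ("text/", "image/")
INLINE_TYPES = {"application/pdf", "application/json"}


def serving_headers(path: str | Path) -> dict[str, str]:
    """Choose Content-Type and Content-Disposition headers for a result file."""
    mime, _encoding = mimetypes.guess_type(str(path))
    if mime is None:
        mime = "application/octet-stream"  # unknown extension: treat as opaque binary
    inline = mime.startswith(INLINE_PREFIXES) or mime in INLINE_TYPES
    if inline:
        disposition = "inline"
    else:
        disposition = f'attachment; filename="{Path(path).name}"'
    return {"Content-Type": mime, "Content-Disposition": disposition}
```

With this approach, a `report.html` produced by knitr and a `plot.png` produced by matplotlib are served the same way, because the server only looks at the file, not at the tool that made it.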

This has given us, and our users, a tremendous amount of flexibility, particularly with respect to visualization capabilities.

Examples

Here are some of the use cases we’ve supported in the last several months:

  1. Enabling interactive visualizations with HTML Widgets and Knitr. Rather than implementing our own functionality for visualization, we were able to immediately use a new R package, htmlwidgets, that provided powerful capabilities. Similarly, rather than creating our own functionality for report generation and Markdown from R, the existing knitr package just works, and it works great.
  2. Integration with IPython Notebook — for Python or R (with R Notebook). Users can fire up notebook sessions with one click on whatever hardware they want. We didn’t need to invent or build a new “interactive analytics” UI.
  3. Integration with Plotly. When we saw that Plotly worked with IPython Notebooks, it was a no-brainer to demonstrate this working out-of-the-box on Domino. Now all of our users get the benefits of Plotly’s great interactive widgets in Python or R.

Knitr, htmlwidgets, IPython Notebook, Plotly: these are all fantastic tools that people have worked on for years, and that many people already use and love. By letting users bring them to Domino, rather than forcing them to adapt to some new proprietary way of working, we see ourselves as embracing a philosophy of “standing on the shoulders of giants.” Why reinvent the wheel when such great solutions already exist?

Benefits

The benefits of our “open” approach have been massive, in two different ways.

First, user adoption and user satisfaction have been higher, because data scientists have the flexibility to use exactly the libraries and tools they already know and love. If they want to make a beautiful chart or report, they just use whatever package they normally would. There’s nothing new to learn: no new language, syntax, or UI.

Users love this, and the more sophisticated technical users tend to be particularly excited about it, because they have full freedom to do exactly what they need to do. Need to make a precisely formatted report? Just write the code — no need to muck around in some constrained UI.

Second, we have gained engineering capacity and efficiency. In the early stages of building a product, an engineering team needs to respond quickly to users’ needs. Because users can rely on existing libraries and tools, we have been able to give them solutions that meet their requirements quickly and without consuming our engineering resources. (Beyond the up-front development time, this has also saved us the time required to test, document, and support features that are often subtle and complicated from an engineering perspective.)

It’s a win-win: our engineers stay focused on functionality that is core to our product and plays to our comparative advantage, and our users get to use the languages, tools, and libraries that they prefer anyway.

If we had opted to make our own language, or visualization package, or notebook UI, or markdown/reporting syntax, I am confident that we’d have far fewer users, they’d be much less happy, and we’d be burning engineering time building things that already exist.

Disadvantages

I think the upside of our approach outweighs the downside, by far — but like all decisions, this isn’t free from disadvantages.

The first disadvantage is that less technical users face a steeper learning curve. Domino gives you the flexibility to do anything you want by writing code, but that means there aren’t wizards and simplified UIs to control complex functions. As a result, Domino is not a platform that “automates the process of data science” or lets non-technical users build rich visualizations. Said differently, Domino is more like a power saw than like safety scissors. We’re fine with that, for now, but it is worth calling out explicitly.

The second disadvantage is that we have to tolerate the warts that come with existing third-party tools. To be sure, if we set out to build a new interactive analytics notebook from scratch, we could imagine dozens of ways it would be better than IPython Notebook. But at the same time, we don’t want the perfect to be the enemy of the good, especially when the good already exists and the perfect would take years to create.

Conclusion

Our decision to build an open product, rather than creating proprietary functionality for common analytical use cases (especially visualization), has accelerated both the growth of our user base and the velocity of our engineering team. It’s hard for me to imagine taking a different approach, yet we see plenty of products that opt to build their own new ways of delivering existing functionality (e.g., notebooks).

Our approach is also, largely, an extension of our “buy vs. build” philosophy for software components. If there’s an existing library or software package that provides the functionality we need, we generally prefer to use it rather than build our own (unless it’s a capability so core to our competitive advantage that we need complete control over it; more on this topic in a later post). Similarly, if we can provide entire capabilities and features to our users by integrating with existing solutions, we prefer that to building new features of our own.

Nick Elprin is the CEO and co-founder of Domino Data Lab, provider of the open data science platform that powers model-driven enterprises such as Allstate, Bristol Myers Squibb, Dell and Lockheed Martin. Before starting Domino, Nick built tools for quantitative researchers at Bridgewater, one of the world's largest hedge funds. He has over a decade of experience working with data scientists at advanced enterprises. He holds a BA and MS in computer science from Harvard.
