Uber and the Need for a Data Science Platform


For those wondering if data science platforms are really a thing, there’s a great article by Kevin Novak, the head of Uber’s Data Science Platform team. He describes the problems they saw with their data science team, and why they needed a platform.

Uber is famous for the central role of data science in their business, so it’s no surprise that they’re forward thinking about how data science needs to be done. For people looking at how data science teams should organize their work, this is a great case study on where data-driven companies are investing.

An important takeaway from Kevin’s article is that even high-powered, well-managed teams run into basic problems as they scale up. In the case of Uber:

  • They were reinventing the wheel: “Duplicative work was happening.”
  • They weren’t building best practices: “Our pattern of sharing best ideas via communally owned libraries, wikis, and Jupyter notebooks weren’t scaling.”
  • Their infrastructure slowed down data science: “The engineering infrastructure necessary to query data at scale, process experimental data, or build and maintain machine learning models wasn’t built with data scientists in mind.”
  • They were missing opportunities to generate insights: “We missed opportunities to leverage data because of friction.”

The conclusion that Kevin Novak reached was that they needed a “data science platform” to make data science more efficient. Their goal was to “find the fundamental problems that every Uber data scientist faces, and form cross-functional teams dedicated to solving that issue awesomely.”

We know where Kevin is coming from. This is exactly the way we see our mission at Domino, and the driving motivation behind our product.

As for features, Kevin is completely correct when he describes the need for sharing, reusability, and faster deployment of models into production. These are baseline features that any data science platform should have.

But the most important point Kevin makes is the one about “seductive technology.” Getting people to make even small changes to the way they work is really hard. The more you ask of them, the harder it gets. People don’t want to have to learn a new IDE or port existing work to a new development paradigm.

It is the lack of user adoption that kills so many of these initiatives. You end up with all the cost and none of the benefits. So before you buy or build a product, make sure that people will really love the product you’re delivering.

There is, however, one area where we at Domino disagree with Kevin.

Not surprisingly, we don’t think companies need to build their own platform. There’s a great example of why building your own is a bad idea. Fifteen years ago, there were plenty of companies that turned up their noses at Salesforce.com and decided to build their own CRM.

The justifications for building their own were the same then as the ones Kevin uses now: we need special features, we’ll have more control, we’ve got great engineers who could build it cheaply, and of course, this is strategic to the business.

In the long run, these homegrown platforms turned out to be boat anchors, not accelerators. The code base ossified and became unusably complex. The platform stopped being the cool new thing people wanted to work on. Companies underestimated the cost and resources required for ongoing maintenance and support. It became impossible to keep up with all the features that users were demanding. For CRM systems, the team of “we can build that” morphed into the team of “you can’t have that.” As a result, companies with homegrown CRM systems found that their platform put them at a disadvantage, even as it cost a fortune to maintain.

If you’re thinking about building your own platform, it's important to make sure this is a project you can commit substantial resources to for the next five years or more. The total investment to keep up with the basic functionality will be millions of dollars. For a company like Uber, with over $1 billion raised, and a deep bench of engineering talent, this may have been the right decision. For many companies, it’s clearly the wrong one.

But this is a small disagreement on a topic where we generally see eye to eye with Kevin. Data science teams need collaboration and reusability. If you have people doing quantitative research that's important to your business, you’ll need a data science platform.