
In Pursuit of Purple Unicorns

Matthew Granade | 2015-04-09 | 7 min read

Many analytics managers we know describe their greatest challenge as recruiting data scientists. They often blame the problem on too much demand now that “data scientist” is the “hottest job of the 21st century.”

That’s a convenient explanation but it understates the managers’ own responsibility for the problem. In my view, one of the biggest causes of this problem is having designed a role with so many skill requirements it’s almost impossible to fill.

Whether you accept this explanation or not, there are several creative strategies for making it easier to fill the various functions analytics managers want data scientists to play. I describe three below.

The problem: Everything and the kitchen sink

While the specifics vary from place to place, one analytics manager we know wants the following skills and abilities in his data scientists:

Statistics
Algorithms (e.g. machine learning)
Programming languages (e.g. Python, R)
Data technologies (e.g. databases)
Engineering/Infrastructure (e.g. EC2, web services)
Domain knowledge (though this can be acquired)

Unfortunately, he finds it nearly impossible to recruit for such a role. I think that’s because he has designed a role for a purple unicorn. What is a purple unicorn? Something we all wish existed but doesn’t (or is so rare it might as well not exist).

My view is that the so-called data scientist is often an ill-designed idea. The skills and abilities are so vast and diverse that there will never be enough data scientists. Rather than clarifying the role, process design, or technology, executives are simply piling more required skills onto the job specifications. Here is a prime example of the everything-and-the-kitchen-sink approach. For those wanting to skip the link, the job description includes everything from SQL to “high proficiency in programming” to experience analyzing large data sets to statistics and experience with special data.

Requiring engineering and infrastructure skills, for example, creates a significant step-up in the engineering know-how required for the job. What percentage of statistical experts also have that kind of engineering ability? Even if thirty or forty percent of them do (which is unlikely), companies are needlessly limiting their candidate pool when better technology could easily take care of the problem.

Of course, this is happening in a context of tremendous demand. I won’t repeat all the statistics we read over and over but it’s worth remembering that while simultaneously adding more requirements to the role, companies are searching for more data scientists than ever.

Making the hunt easier

Here are three ways to mitigate the data-scientist-as-purple-unicorn problem:

*Role disaggregation.* Rather than glomming more and more skills onto a single role, the most successful analytical companies are thinking about what specific skills and abilities they need for a project. They can then assemble roles for a team that have skillsets that appear more frequently (and naturally) in the wild.
Maybe you need an analyst who has a deep grounding in statistical techniques and problem solving. That should be easier to find. You then hire a data expert who knows how to access, clean, and manage data sources, and a technical person who can code and set up infrastructure.
Broadly speaking, we are seeing this more frequently now that “data engineer” roles are distinguished from “data scientist” roles. They all share a deep commitment to the goal, but the disaggregated skillsets are easier to find.
Disaggregation of roles requires more collaboration. We see this occurring at a cultural level — data scientists are leaving their silos and working with others — as well as a technological level, which is making it easier for data scientists to work on projects together.
*Provide data analysts and scientists with technology and [tools to automate parts of their work](https://www.dominodatalab.com?utm_source=blog&utm_medium=post&utm_campaign=in-pursuit-of-purple-unicorns).* Why require a data scientist to spin up an EC2 cluster, for instance, when that process can be automated? By not automating it, you are adding a new skill set to the job requirements for your data scientists, in this case infrastructure management. Given how hard data scientists are to find, you want to keep these additional skill sets to a minimum and tools and technology are a key way of doing that. (Imagine if to be a taxi driver you had to be able to build a car.)
This requires investment in tools and technology, which too many companies are uncomfortable with. In our experience, the companies that do the best data science spend 30 to 40 percent of an analyst’s salary on equipping them to do their jobs.
*Hire data scientists and analysts based on the abilities that matter most for the job.* Don’t create a laundry list of everything required to overcome every obstacle between data and impact on the enterprise.
Instead, look for deep curiosity and strong problem structuring / solving ability, as well as statistics, data science algorithms, and some programming ability—enough to use R. Exclude heavier engineering skills (e.g. infrastructure, data engineering). Do not get too carried away with programming abilities, because these can be supplemented.
Said differently, having helped hire a lot of these folks in my career, one of my main observations is that a few core abilities matter more than specific skills. Skills can be learned. Abilities tend to be more innate and also more empowering: they’re the engine that really makes a person go. And of those abilities, probably the most important is asking questions in a smart way that helps propel you to answers.

I think Rob makes this point nicely at the very end of this interview.

We think these approaches, when applied together, are better than the endless hunt for more purple unicorns. The theme, in short, is to take a more functional view of what you want the data scientist to achieve and design for that using various tools, not just one role.

Matthew Granade

Matthew is an experienced executive and business builder at the intersection of advanced analytics, data, finance, and technology. As a co-founder and member of the board, he brings to Domino decades of experience in management and market positioning to help the company unleash data science at the world’s most sophisticated companies. Matthew previously built Point72 Ventures and was Co-Head of Research at Bridgewater Associates.
