Pete Skomoroch’s “Product Management for AI” session at Rev provided a “crash course” on what product managers and leaders need to know about shipping machine learning (ML) projects and how to navigate key challenges. Skomoroch proposes that managing ML projects is challenging for organizations because shipping ML projects requires an experimental culture that fundamentally changes how many companies approach building and shipping software. Yet this challenge is not insurmountable. Skomoroch advocates that organizations consider installing product leaders with data expertise and ML-oriented intuition (i.e., a sense of what is and isn’t possible) to address these challenges.
A few highlights from the session include:
- Companies with successful ML projects are often companies that already have an experimental culture in place as well as analytics that enable them to learn from data.
- Ensure that product managers work on projects that matter to the business and/or are aligned to strategic company metrics.
- Unfortunately, many ML-oriented ideas that bubble up from the business side aren’t feasible. As a result, product leaders managing ML projects need good intuition for what is and isn’t possible so they can have hard, pragmatic conversations about prioritization. Good product managers “know the difference between easy, hard, and impossible problems.”
- Be aware that machine learning often involves working on something that isn’t guaranteed to work. It is similar to R&D. As a result, Skomoroch advocates getting “designers and data scientists, machine learning folks together and using real data and prototyping and testing” as quickly as possible.
For more insights from this session, watch the video or read through the transcript.
Hi, I’m Peter Skomoroch. This talk will be a crash course in the top things product managers and leaders need to know about shipping machine learning projects. Companies that understand how to apply machine learning will be best positioned to scale and win their respective markets over the next decade. That said, AI product management is harder than most people realize. Without large amounts of labeled training data, solving most AI problems is not possible. The talent and leadership needed to bridge the worlds of product design, machine learning, research, and user experience are scarce. Some of these people are in this room. How many people in this room are in a product role right now? How many people are in a machine learning or data science role? How many people are designers? Probably not many. This will be relevant for everybody in the room. All of these people need to come together to make this work. This talk will describe how you can navigate the challenges you’re going to face and build a business where every product interaction benefits from your investment in machine learning.
Before we dive into that crash course, let me give you a little background on myself. I was most recently the co-founder and CEO of an AI startup called SkipFlag that was acquired by Workday at the beginning of last year. SkipFlag was a knowledge base that built itself from your enterprise communications and content: email, Slack, and wiki pages. It used deep learning to build an automated question-answering system and a knowledge base from that information. It was like the Google Knowledge Graph, with all those smart, intelligent cards, plus the ability to create your own cards out of your own data.
Previously, I was an early member of the data team at LinkedIn, where my SkipFlag co-founder Sam Shah and I built and scaled a whole bunch of innovative data and machine learning products that you probably use, like LinkedIn Skills, People You May Know, and Who Viewed My Profile. We also worked on a series of machine-learning-powered things behind the scenes that helped products like Sales Navigator and others at LinkedIn. I was also recently co-host of the O’Reilly AI Bots podcast, where we interviewed many leading researchers, platforms, and startups building conversational AI and services.
I’ve been involved with a large number of machine learning projects over my career, some successful ones, some not so successful.
I pulled together a set of lessons based on that experience. I hope you find them useful.
What are we talking about specifically when I say AI products? AI products are automated systems that collect and learn from data to make user-facing decisions with machine learning. In this talk, I’ll sometimes use the terms “AI” and “machine learning” interchangeably. We won’t go deep into the mathematics or the engineering details of how machine learning actually works; I assume a good number of people here already have a fair amount of background there. All you need to know for now is that machine learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to learn from data by being trained on past examples.
Over the last decade, many consumer companies like LinkedIn, Facebook, Google, and Netflix have deployed machine learning at scale to power friend recommendations, recommend movies, personalize ads, and rank search results. This was all enabled and accelerated by an earlier wave of big data technologies ushered in by Google’s MapReduce paper and the open source Hadoop project. Now many people are using Spark, and we’ve moved on to PyTorch and TensorFlow, where we spend most of our time coding. But all of these things are built on the foundation laid by the consumer internet wave of the last 20 years.
If machine learning is so amazing, why hasn’t every company applied it and reinvented itself already? Well, it turns out that even basic machine learning is really hard, and managing AI projects in a real business is much harder than most people realize. Machine learning isn’t fairy dust that you can sprinkle on your existing product by saying, “get me some AI.” Even acquiring a company doesn’t instantly make you an AI company. You can’t just plug it in off the shelf using cloud APIs. Cloud APIs are great, but they are only one piece of the puzzle, and they’re not going to magically make your entire product intelligent. A few months back, I tweeted that, as a rule of thumb, you can expect your enterprise company’s transition to machine learning to be about 100X harder than its transition to mobile was.
How many people have been involved in shipping a mobile app for their company in some form?
One of the fundamental differences is that with mobile, the underlying engineering stack and what you’re building are very similar; you’re just presenting it in a smaller form factor. With machine learning, it’s like ripping out all the plumbing. You’re fundamentally changing how you build and ship software. One pattern we’ve seen emerge is that successful projects happen in companies that already have an experimental culture and analytics in place, where they are already learning from data.
Companies that succeed at machine learning tend to build on their existing analytics use cases. These measurement-obsessed companies have an advantage when it comes to AI. Google, Facebook, and other leaders have set up a culture of extreme measurement, where every part of their product experience is instrumented to optimize clicks and drive user engagement. It turns out that this level of measurement and tracking is what you need to throw off all the data and labels required for building machine learning products. It also means that the 100X number I mentioned is probably an underestimate if you haven’t already shifted your culture in that way. Many of the startups and larger companies that I talk to or advise are just starting on this journey. They’re interested in building some cool AI feature, and it turns out they don’t have any tracking data. They haven’t been tracking searches, impressions, clicks, or results in any way. Then you have to give them the hard news that they are about two years away from doing anything real.
Now let’s talk about some of the organizational requirements. For the rest of this talk, I’m going to focus on a product manager who’s working to ship an AI product. For people who aren’t familiar with what a product manager does, an AI product manager looks something like this: you’ve got a user experience (UX) background, some technical capability, and some knowledge of the business world. The Venn diagram represents traditional product management, where the product manager is the product owner: the person who sets strategy, defines the rules of the market game you’re playing, and works with all the different teams and functions within the business to prioritize projects and execute on a roadmap to deliver them.
Being a product leader is a hard role. You are the quarterback, making sure that things ship on time. It’s even harder for the many folks in this role who don’t have any data or machine learning background and are thrust into shipping something like this. They probably lack what a lot of folks in this room have: domain expertise around data, machine learning, and how these models work. The social problem of bridging these worlds is one of the hardest things to solve. You’re lucky if you have a product leader who has some of that background. One of the most important requirements of a good data PM is good intuition for how machine learning works and what is and isn’t possible. A lot of ideas that come from the business side really just aren’t feasible. How many people have had something put forth as a project, and you had to give them the hard news that it’s just not possible with the data you have? Okay. So that happens a lot. You really have to have that intuition. When people are coming up with these ideas and things are being prioritized, people in this room need to be in the room where that decision is happening.
What should product managers know about machine learning when building these products? Ideally, if you’re hiring a new PM, you want someone who’s done this before: someone who’s worked at a company with a feedback loop, who has shipped a product, seen the journey of iterating to get the model working, and dealt with all the issues you encounter once something is live. There’s no substitute for that experience. You really have to just go through it. But hopefully the rest of this talk will give you an outline of what you need to know.
A good data product manager knows the difference between easy, hard, and impossible problems. A good example of this is automatic text summarization. It seems like something that’s within reach of our current machine learning algorithms. You always see interesting stories on Twitter about new algorithms that can summarize text or write stories like a human. But the reality is that if you give a system an arbitrary news article and ask it to generate a summary, what comes out is often not factually correct and sometimes not even sensible. It just looks like plausible English. If you have a high-priority project that requires something like that, you’re depending on a technological leap, and it’s probably not feasible, at least in the near term. I call knowing the difference between these things, knowing what is and isn’t feasible, the “art of the possible.” It’s one of the most important things to be able to do.
The other thing to consider is that even if something is feasible, that’s not the same as having product-market fit. Just because you can do something doesn’t mean it’s what people want. Even if it’s something that customers want and could work, it might not be the best thing to invest your time and resources in. It might take too long to build, or something else might have more impact. Another pattern I’ve seen in good PMs is that they’re very metric-driven. They already know their company’s data inside and out, which usually means being able to run SQL queries and understanding tracking, logs, data coverage, and data quality. They’re the people who can run around the company and help your team get data it doesn’t have, get access to it, or change the product or the business so that it can collect that data.
If you have a good PM and a data team, you now have to get them together and prioritize ideas. You can recognize the ones that aren’t feasible and narrow the list down to a smaller set. There should be no shortage of ideas if you get a bunch of smart people together. The problem is more often that not all ideas are worth pursuing, so you have to figure out how to prioritize your list of AI projects. Data scientists and machine learning engineers should avoid spending time on things that won’t serve the needs of the end user. Clever approaches to problems that people don’t really have are not that useful. It’s very easy to fall into this trap, especially for people coming from academia. You have your pet algorithm or project that you’re interested in and want to work on, but often it doesn’t map directly to the business. I’ve done that myself. Everybody has a pet project or something they want to work on and tries to shoehorn it into the business. That’s a pattern to avoid.
There’s a huge mountain of organizational and data engineering effort required to ship any machine learning project, and the ultimate payoff needs to match that investment. This means that, as a leader, you can actually undermine your team if you try to underpromise and overdeliver on a small win for the business that won’t drive the core metric. That’s another common pattern: people want to just dip their toes in with a small sample project. The danger is that the small sample project will be hard to ship too, so you’ll do a lot of work to get it out and then see no ROI at the end.
So how do we prioritize? I’d start with your company mission. Think about your mission and your near-term strategic objectives. For example, at LinkedIn, our mission was to connect the world’s professionals to make them more productive and successful. We had a set of strategic objectives at any given time, and one of the main ones was to become the professional profile of record. You want to be the system of record. Essentially, everybody’s resume is now on LinkedIn, but it didn’t happen overnight. There was a lot of work to get those profiles filled out and to get people connected.
If you know what your overall objective is, you can take all the various ideas that your data scientists or product folks come up with and start grouping them thematically. If they’re aligned to one of those important objectives, you can then do the deeper work of estimating the level of effort. I like to use T-shirt sizes to stay out of the weeds: small, medium, or large effort. Then do the same for impact. When you stack rank these things, you’ll see some that have a high degree of difficulty and low impact. Those don’t spark joy; get rid of them if you can. This can be a fun exercise. In the end, what I’ve often found is that there isn’t a huge amount of disagreement. People can generally get on the same page about these things.
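The alignment and effort-versus-impact exercise can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Skomoroch’s actual process: the numeric scores for T-shirt sizes, the rule that drops large-effort, small-impact ideas, and every idea name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Hypothetical ordinal scores for T-shirt sizes (an assumption for illustration).
SIZE_SCORE = {"S": 1, "M": 2, "L": 3}


@dataclass
class Idea:
    name: str
    objective: Optional[str]  # strategic objective the idea supports, if any
    effort: str  # T-shirt size: "S", "M", or "L"
    impact: str  # T-shirt size: "S", "M", or "L"


def prioritize(ideas: Iterable[Idea], objectives: Set[str]) -> List[Idea]:
    """Keep ideas aligned with a strategic objective, drop large-effort/small-impact
    ones, and stack rank the rest: highest impact first, then lowest effort."""
    aligned = [i for i in ideas if i.objective in objectives]
    kept = [i for i in aligned if not (i.effort == "L" and i.impact == "S")]
    return sorted(kept, key=lambda i: (-SIZE_SCORE[i.impact], SIZE_SCORE[i.effort]))


# Hypothetical backlog, loosely modeled on the LinkedIn objectives from the talk.
objectives = {"connect professionals", "profile of record"}
backlog = [
    Idea("skill suggestions", "profile of record", effort="M", impact="L"),
    Idea("connection recommendations", "connect professionals", effort="S", impact="L"),
    Idea("profile-photo filter", "profile of record", effort="L", impact="S"),
    Idea("pet algorithm", None, effort="M", impact="M"),  # not tied to any objective
]

for idea in prioritize(backlog, objectives):
    print(idea.name, idea.effort, idea.impact)
```

Sorting on a tuple of (negated impact, effort) is one simple way to break ties in favor of cheaper work; a real team would likely weigh the uncertainty of success as well.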
The most important advice I could give a machine learning product manager is to make sure you work on things that matter to the business. The foundational work of a product manager is making sure you’re solving the right problems. Understand for yourself how your core business and product work and what the strategic goals are. Then pair your machine learning applications directly with one of those objectives. Your team will be tracking machine learning metrics like accuracy and false positives, but ideally, whatever metric your machine learning team is improving has a direct, visible impact on another metric in your business. On the left, you see a chart of LinkedIn member growth; on the right, total endorsements. People You May Know was probably the most successful example of this at LinkedIn.
As I mentioned before, our mission was to connect the world’s professionals, and People You May Know was directly paired with that. When we shipped an algorithm update or improved the model, you would see a step-function change in growth: more people getting connected and more people coming to the site, bringing in more members. Within months of People You May Know launching, it drove massive impact, and then it created a flywheel. The people who came in also used it, and that drove massive growth.
Another example is a product I came up with at LinkedIn: LinkedIn Endorsements. When the objective was to become the profile of record, people needed to fill out their profiles. In the early days, a lot of people would sign up, put in their name, job title, and where they worked, and not much more. That’s not very useful for finding candidates, for job matching, for advertising, or for all the other things you want to do. We wanted to get those profiles filled out. Endorsements used machine learning recommendations to prompt social tagging of the people in your network, and that drove massive amounts of profile completion. We had, I think, a billion endorsements in the first six months of the product.
That’s great when you can do it. If you can find a machine learning application that ties to a direct business metric, it can be magical. The problem is there’s a lot of work to get there. To build something like endorsements, there’s a lot of front-end user experience work. There is a lot of plumbing and piping across different existing systems, along with new machine learning models, and all of it had to be rolled out in phases. It didn’t happen overnight; it happened over many, many months. The projects that you prioritize may be sub-projects that, put together in phases, become your project roadmap. The problem with traditional product roadmaps, when it comes to machine learning, is that most product managers at heart are not comfortable with ideas that are expensive and have an uncertain probability of success.
One thing I left out of that prioritization is that in machine learning you’re often doing things you’ve never done before. You don’t know if it’s actually going to work. You’re estimating the level of difficulty, but at the end of the road, it may not work the way you intended. That is very uncomfortable. It’s closer to R&D than to a well-understood engineering project. Most PMs don’t know how to begin to justify the expense of AI roadmaps, and you’ll struggle to make the case in most organizations for the research investment needed up front.
How many people start a project with a bunch of research and fiddling with models? Eric Colson [formerly of Netflix and now at Stitch Fix] has said something similar: you want to get working with the data. You don’t want to just ideate and come up with ideas in a waterfall approach. As quickly as possible, you want to get designers and data scientists, machine learning folks together and using real data and prototyping and testing.
Product managers who are comfortable with roadmaps like this are really valuable. They are going to be your advocates in the organization, working with other PMs who aren’t focused on machine learning. How do you tell the difference between an AI technology you can productize now and something that may only become viable in some uncertain timeframe? Again, I think that just comes from experience. Unfortunately, most of the interesting things in AI are on the cutting edge of what we can do with engineering, which makes them unpredictable for planning purposes.
One of the reasons adapting your company to machine learning is hard is that it represents a fundamental shift in the software engineering approach. A lot of classic project planning and the waterfall approach to software engineering break down when the data is changing underneath you, and data is constantly changing. If you have a user-facing product, the data you had when you prototyped the model may be very different from what you actually have in production. This really rewards companies with an experimental culture, where they can take intelligent risks and are comfortable with that uncertainty.
There’s a great quote on the right from Jeff Bezos that I like: “If you only do things where you know the answer in advance, your company goes away.” In many ways, this need for an experimental culture means that machine learning is currently better suited to the consumer space than to a lot of enterprise companies. Enterprise product management is traditionally very top-down. You talk to your biggest customers, form ideas and roadmaps, and requirements are driven that way. Consumer is more bottom-up. You have strategic objectives, you’re tracking a lot of very granular data, and things are more often driven by data, which makes ideas from data scientists and machine learning engineers more likely to be accepted.
As I mentioned, I think that’s why companies like Google and Facebook really have an advantage in this space: they have the foundations of data infrastructure. I didn’t cover it in this talk, but there’s a great diagram from Monica Rogati, the Data Science Hierarchy of Needs, which shows the foundational infrastructure, logging, feature extraction, and all the other things you need to make these models work. That’s often the first year or two of a new company’s journey, and often the AI or data PMs are doing just that work in those first couple of years.
Once you’ve done that work, you’re in the mode where you’ve down-selected an idea and want to ship it. This is typically what that looks like; these are some of the key steps to shipping machine learning products. A key thing to understand is that steps two to four take the majority of the time. It’s critical that you do the upfront work to select the problem correctly. The biggest time sink is often data collection, labeling, and cleaning. Assuming that’s in place, the next biggest lever for improving model accuracy and metrics is feature engineering; clever feature engineering often makes or breaks projects. Coming up with creative signals you can extract from the raw data to use as better inputs to your machine learning models is definitely worth investing in.
The other thing to keep in mind when you’re building these things is that perfect is the enemy of done. This applies to both algorithms and design. You want to get a version one out as quickly as possible, because 80% of the real work happens after your product is live. You’re going to learn so much more once it’s out there, and if you design it to collect feedback, such as ratings and reviews, you’re probably going to get orders of magnitude more training data. You launch the feature, you get those signals, and then you can feed them back in as new training data. Notice how the last step is to leverage the derived data from your model building process to power other machine learning systems. That’s like the highest level of actualization here. Once you start shipping products, you can combine the data you get from those various products to do new things that you couldn’t do before.
Endorsements came from what we built with skill recommendations. For an individual user, we could infer what skills you might have and show them to you. The more active users would add them themselves, but the vast majority of users are not on the site that often. You can actually combine the People You May Know algorithms and the skill-suggestion algorithms to build something like endorsements. That’s powerful when you can make it happen.
I mentioned earlier that design is critical. When working with designers, product managers need to keep in mind that some of the key data needed for a machine-learning-powered solution or interface might be missing. You might not have that data. A lot of times, designs are done in the absence of reality: someone comes up with a cool idea or concept for what you could do with machine learning, but many designers don’t understand machine learning deeply. The training data you have might be less informative than your designer assumed, or the design might require an algorithmic leap, like the text summarization example I mentioned. You really need to get these people together in a room. Get the machine learning folks, the designers, and the product folks together and talk about what’s really possible in a supportive way.
When possible, you want to provide your designers with real sample data. You can use tools like Framer or Figma. What you’re seeing here is Figma, a collaborative design tool in which you can build real prototypes. You can wire up a Flask web application that shows real data to your designers and helps them get an intuitive feel for what the data is going to look like for different users. Then you can wire it directly into these prototyping tools, so you can see what the product would be like and judge the quality of the recommendations before you build the whole thing.
Data quality is something I keep talking about, and it’s important to emphasize that you’re going to need to spend a lot of time on data quality no matter what you’re building. The quote on the right is from Ruslan Belkin, former VP of Engineering at Salesforce: “Every single company I’ve worked at and talked to has the same problem without a single exception so far: poor data quality, especially tracking data.” There’s either incomplete data, missing tracking data, duplicate tracking data, or similar problems.
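A basic audit for the kinds of tracking problems Belkin describes might look like the following minimal sketch. The required-field schema and the use of a unique `event_id` to detect duplicates are assumptions about a hypothetical event log, not a standard or anything described in the talk.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical schema for a tracking event (an assumption for illustration).
REQUIRED_FIELDS = ("event_id", "user_id", "event_type", "timestamp")


def audit_tracking_events(events: Iterable[Dict]) -> Dict[str, int]:
    """Count events with missing required fields and duplicate event IDs."""
    events = list(events)
    missing = sum(
        1 for e in events if any(e.get(field) is None for field in REQUIRED_FIELDS)
    )
    id_counts = Counter(e["event_id"] for e in events if e.get("event_id") is not None)
    # Every occurrence of an event_id beyond the first counts as a duplicate.
    duplicates = sum(count - 1 for count in id_counts.values() if count > 1)
    return {"total": len(events), "missing_fields": missing, "duplicates": duplicates}


log = [
    {"event_id": 1, "user_id": "u1", "event_type": "click", "timestamp": 100},
    {"event_id": 1, "user_id": "u1", "event_type": "click", "timestamp": 100},  # duplicate
    {"event_id": 2, "user_id": None, "event_type": "impression", "timestamp": 101},
    {"user_id": "u2", "event_type": "search", "timestamp": 102},  # no event_id
]
print(audit_tracking_events(log))
```

Running a check like this at collection time, rather than after models are trained, fits the talk’s point that mistakes made when data is collected are the hardest to correct later.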
One of the most important jobs of PMs is to collect the right data at the right time. You want to build user-facing products the way a physicist builds instruments. Physicists build things like the Super Collider to get better measurements of the world. In a way, when you’re building a product, you’re building a Super Collider: a system that also collects data and enables all the future products you’re going to build. At the time of collection, you want to make sure your systems are collecting things accurately. Mistakes made at that phase are extremely hard to correct after the fact.
Testing. We’re getting towards the end of the product life cycle here, and testing is critical. Unfortunately, a lot of the traditional unit tests and other deterministic checks you do in QA will break with machine learning. Because the models change, and the data underneath that trains the models changes, it’s very hard to have stable tests. The first thing to know is that sunlight is the best disinfectant. Algorithm work that drags on without integration into the end product, where the results can be tested, is really risky. You should focus on proving out the core concept of the product, rolling out a simple version of the model if possible, and then getting tests in place so you can tell whether you’re improving, regressing, or breaking things.
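Because exact-output unit tests tend to break as models and data change, one common alternative is a tolerance-based regression check: evaluate a candidate model and the current baseline on the same fixed evaluation set and fail only if the candidate is meaningfully worse. The sketch below assumes plain accuracy as the metric and a tolerance of one percentage point; both are illustrative choices, and the predictions are made-up data.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def is_regression(candidate: float, baseline: float, tolerance: float = 0.01) -> bool:
    """True if the candidate metric is worse than the baseline by more than `tolerance`."""
    return candidate < baseline - tolerance


# Illustrative fixed evaluation set and predictions from two model versions.
labels = [1, 0, 1, 1, 0, 1, 0, 0]
baseline_preds = [1, 0, 1, 0, 0, 1, 0, 1]   # 6 of 8 correct
candidate_preds = [1, 0, 1, 1, 0, 1, 0, 1]  # 7 of 8 correct

baseline_acc = accuracy(baseline_preds, labels)
candidate_acc = accuracy(candidate_preds, labels)
assert not is_regression(candidate_acc, baseline_acc), "candidate model regressed"
```

The tolerance absorbs small, noisy fluctuations between training runs while still catching real regressions; how large it should be depends on the size of the evaluation set.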
You want to be aware of unintended consequences from seemingly small product changes. Changing the phrasing of a question on your website, for example, can alter the nature of the data being collected. It may seem like a small change to a designer or a PM, but for the machine learning team, it’s going to corrupt their data or create some kind of time-based artifact. The other thing to remember is that the prototype is not the product. If something looks good in the design, remember that it was running on sample data. You have to get through this process so you can test what it looks like with real production data from real users and real customers; then QA can really get to work.
I mentioned using a prototype app so you can see a design and see how the product is going to come together with the machine learning model. I think you also need more granular, more detailed windows into the innards of your system and your models. I don’t just mean metric dashboards showing how your model is doing. What you see here is an example of a viewer app for the skill recommendations I mentioned. At LinkedIn, we could put in any member ID, and there was a button for random members so you could see how it would work. Keep in mind that almost everybody has a public profile. You can take that information and see what skills would be inferred. In this case, these were the skills inferred for me; this was probably nine years ago at this point. That was immensely valuable because we could see things like, in the early days, it wasn’t working for teachers or it didn’t work for doctors. Maybe the data coverage wasn’t sufficient in those areas, or there were other issues, or maybe there were just some model difficulties with those domains. But we could assess that, iterate, and adjust our roadmap to improve the model and ship on time.
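A viewer app surfaces problems like the teacher and doctor cases by eye; a complementary, scriptable check is to slice an evaluation metric by segment. This minimal sketch assumes each evaluation row carries a hypothetical occupation segment alongside the prediction and the true label; the 0.8 accuracy threshold is arbitrary and the rows are made-up data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, int]  # (segment, prediction, label)


def accuracy_by_segment(rows: Iterable[Row]) -> Dict[str, float]:
    """Compute accuracy separately for each segment."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for segment, prediction, label in rows:
        total[segment] += 1
        correct[segment] += int(prediction == label)
    return {segment: correct[segment] / total[segment] for segment in total}


def weak_segments(scores: Dict[str, float], min_accuracy: float = 0.8) -> List[str]:
    """Return segments whose accuracy falls below `min_accuracy`, sorted by name."""
    return sorted(segment for segment, acc in scores.items() if acc < min_accuracy)


# Hypothetical evaluation rows grouped by occupation.
rows = [
    ("engineer", 1, 1), ("engineer", 0, 0),
    ("teacher", 1, 0), ("teacher", 1, 1),
    ("doctor", 0, 1),
]
scores = accuracy_by_segment(rows)
print(scores)
print(weak_segments(scores))
```

With very few examples per segment, these per-segment numbers are noisy, so in practice you would also look at segment counts before adjusting the roadmap.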
Machine learning products that interact with humans are fundamentally different because they are complex systems where everything is changing. You need these windows into the data and into your models, and you need to be able to test and change them visually. If you’ve done the hard work of debugging, shipped the product, and it’s moving a metric, the really powerful thing that can happen is that you get what are called flywheel effects. Users generate data as a side effect of using most software products, and that data in turn can improve the product’s algorithms and enable new types of recommendations, leading to more data. These AI flywheels get better the more customers use them, and they lead to competitive moats.
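One turn of such a flywheel is converting usage into new training data. Here is a minimal sketch, assuming a hypothetical event log where each record has `user_id`, `item_id`, and an optional `rating` (echoing the ratings-and-reviews feedback mentioned earlier). The rating threshold of 4 for a positive label, and the rule that newer feedback overrides older labels, are illustrative assumptions rather than anything from the talk.

```python
from typing import Dict, Iterable, List, Tuple

Example = Tuple[str, str, int]  # (user_id, item_id, label)


def ratings_to_examples(events: Iterable[Dict], threshold: int = 4) -> List[Example]:
    """Convert logged rating events into labeled training examples.

    Ratings at or above `threshold` become positive labels (1), others negative (0).
    Events without a rating carry no explicit feedback and are skipped.
    """
    examples = []
    for event in events:
        rating = event.get("rating")
        if rating is None:
            continue
        examples.append((event["user_id"], event["item_id"], int(rating >= threshold)))
    return examples


def merge_training_data(existing: List[Example], new: List[Example]) -> List[Example]:
    """Merge new examples into the training set; newer feedback for the same
    (user, item) pair replaces the older label."""
    merged = {(user, item): label for user, item, label in existing}
    merged.update({(user, item): label for user, item, label in new})
    return [(user, item, label) for (user, item), label in merged.items()]


events = [
    {"user_id": "u1", "item_id": "a", "rating": 5},
    {"user_id": "u1", "item_id": "b", "rating": 2},
    {"user_id": "u2", "item_id": "a"},  # impression with no rating
]
training_set = merge_training_data([("u1", "b", 1), ("u3", "c", 0)], ratings_to_examples(events))
print(training_set)
```

The retrained model would then produce new recommendations, which generate new events, closing the loop; this sketch covers only the data step.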
There’s been a bit of controversy lately about whether data moats are real. I think they definitely are real in consumer companies. We have yet to see that connection as clearly or as widely in the enterprise, but I think when these dots are connected, it will be pretty powerful in your organization. Companies like Microsoft, Google, and Amazon recognize this reality. They are reorganizing their entire companies and org charts around machine learning, with senior executives driving company-wide efforts. Amazon now has a powerful AI flywheel; there’s a good article by Steven Levy about this.
Here are some final thoughts to leave you with on succeeding at shipping machine learning products. Invest in data and training infrastructure, as well as logging and tools for rapid iteration. Enable people in small teams to come together and use data from across your company. Some of the biggest challenges in making this all work are organizational and cultural, not technical. Ensure you have real executive support at the top, staff good product leaders who understand this domain, and invest broadly in AI talent, including product managers, data engineers, data scientists, and designers who have some capability around data. Finally, find a machine learning problem with a direct connection to a strategic company metric and ship it. Thank you.
Editorial note: This transcript has been edited for readability. Also, Skomoroch presented different and earlier iterations of this talk. For more information on earlier iterations, read this article.