In this Data Science Popup session, Danny Lange, VP of AI and Machine Learning at Unity Technologies, gives an inside look at practical applications and challenges of AI in enterprises such as Unity, Netflix, Uber, and elsewhere.
I’m Danny, and I’m going to have a little talk about making corporations smart again.
That title seemed funny a few months ago. But I don’t know if it’s funny today. I actually mean in the sense that as corporations, they grow larger and larger and more remote. Your taxi company is not a local thing any longer. It’s a global corporation like Uber. Or your bookstore disappeared and became a global Amazon.com.
As that happens, it’s more and more important for these corporations or enterprises to treat each customer like it’s a small village, where they walk up to the desk in the general store, and they are known customers.
So that’s really, really important. And that is really differentiating a lot of businesses. That’s why some of the mid-sized businesses doesn’t get it that they need to treat each customer as the cohort of one. They are getting in trouble against these much, much bigger guys who actually have the ability to execute on the technology needed for things like that.
So very quickly, just so we can get the context of where I’m coming from, I’m VP of AI and Machine Learning at Unity Technologies right now. I was recently head of machine learning at Uber, where I was leading the data team and the data platform team—not the data science team, the data platform team—and the machine learning team.
We thought of machine learning as an internal cloud service at Uber that every team across the organization could rely on. Think of the machine learning in that context as infrastructure. I did the exact same thing at Amazon, where I ran an elastic machine learning team, which was an internal machine learning service used by a hundred teams across the organization. And subsequently also, rebuilt that as a public service on Amazon Web Services under the brand Amazon Machine Learning. Before that, machine learning at Microsoft, a couple of startups, and IBM research way, way back.
Here’s the interesting thing. If I ask to raise your hand if you used Uber, you’re going to probably all do this. Same for Amazon, et cetera. If I ask you about Unity, you’re going to look at me and say, “What is that?”
Very quickly, I just want to give you an idea that Unity is a gaming engine. The really interesting thing is that we have 700, 800 million monthly active users. It’s actually really big. It’s really up there. Pokemon Go, Nintendo, almost 40% of all mobile games are running on Unity.
Think about some of the things we can do to create one-on-one experiences with the gamers. Let’s talk about typical business challenges very quickly. Some of these are more or less—should we call them—trivial, not trivial to solve. But you very easily run into them nowadays, where everything has to be super cheap. And I need it within 60 minutes or two hours delivery or next morning.
I find myself increasingly shopping on Amazon. I mean, I shop very much on Amazon. The kids, the other day, had a broken Xbox controller, which is really bad, if you can imagine that. And they had same-day delivery by 8 p.m., like that. We didn’t even have to go anywhere, and then they came early, like 4 p.m. It was like, kids were like, whoa, okay!
That’s something about customer expectations. Imagine running a global corporation that can deliver me an Xbox controller within hours east of Seattle. That’s pretty crazy. Demand volatility, seasonal stuff going on, suddenly everybody wants something for Valentines in pink. Suddenly everything needs to be shipped. We have 300 something days a year. And then everybody needs to ship you a quarter of what they need in a year a few days before the Christmas holidays.
That’s a crazy-high number of products. I believe—I don’t know the exact number—I believe that Amazon has close to a billion products, a billion SKUs in their catalog, more than they have customers. It’s crazy.
Supply complexities, the frequent shipments, people order several times a day, handling all that. Then on top of that, of course, we start to demand of these corporations to give us recyclable packaging and not too much packaging, though not too little either, et cetera, et cetera.
All that stuff is really, really hard. And despite that 99.9% of enterprises out there, they develop stuff this way, which is they hire programmers. And programmers write in C++, C, C#, Python, whatever. They have program managers and product managers. They write specs. They know everything. And they write this finite program that takes input and produces output that is slightly different from what they expected. And therefore there’s a feedback loop, where they go back and modify and modify and modify.
This very Newtonian way of developing software sucks. Yet everybody’s doing it. It’s very finite. You’ll see it everywhere. There are a few places where it’s really good, like your bank account. Probably a good deal, good idea there.
But most other things, if you start looking around, are not Newtonian. They’re not exact. I hate when Google Maps says that you’re going to be able to walk there over here in 11 minutes. It’s kind of 11 minutes, maybe a little faster, maybe a little slower. It really depends. 11, why are they giving me that number?
What they should instead is, of course, instead of hiring a bunch of developers, they should just hire a few. Let’s call them machine learning engineers. We can also call them data scientists. But we’re going to get back to that later. They take a learning algorithm. They take historic data from the past operations of the business. And they build a model. And they feed in new data from their ongoing business into that model. And they make predictions.
That’s much more of a Heisenberg principle. It’s much more based on the uncertainties in this world. And all things we do are really about probabilities and distributions. So this is how 90% of the software should be written, except the transactional software. A lot of software really ought to be this way.
I can give tons and tons of examples, both at Amazon and Uber—I’m going to dream up a few now—where this is still in transition when it actually should have happened much faster. So we can look at examples. I love to take Uber Eats as an example, because some of the very first— you all know Uber Eats in here? Ordering your lunch and the Uber driver will pick it up at the restaurant and then deliver it to you?
Some of the very first implementations of that literally had a line of code in there saying distance between you and the restaurant times the average speed in a North American city plus—let’s say—12 minutes, which basically led to everybody getting their meals much, much later than they were promised here, because you cannot compute that in a finite number.
Actually that was made a little change to it. So the 12 minutes became a variable where it could be changed a little. If it was a burger, it would be faster than some other meal. But what you really need to do there, and what happened was that that team gathered some of their data, fit it and built a model that would estimate delivery time for your meal.
What I really loved about that team was that even the UX team got it and gave you a range that would basically say, your meal is going to be delivered between 17 to 21 minutes. So that’s a 60% probability it’s going to be in there, maybe slower, maybe faster. The Uber app, and the normal Uber at is still giving me a finite.
This morning, the car was three minutes. Then it was seven minutes. Then it was—it was all over the map. I can’t use those very finite numbers. So show me distributions. Visualize them. And understand that Heisenberg is actually right. If we try to run a business, we are modeling the real world out there. To model the real world, there are so many uncertainties. So do not try to implement it in C# or Python or C++. It’s just not going to work.
I also found inspiration in something else, because we talk about, let’s get models in there. Let’s build predictive applications so that we can tell you when the burger’s going to be there. But that’s only a part of the story. We need to look at the OODA loop, which is basically John Boyd, US Air Force, studying combat air, combat in Korea during the Korean War.
He basically observed that pilots when they’re in air combat, they go through a process where they fundamentally observe where the enemy is and what the situation is. They orient themselves in that 3D space. And then they basically decide to do something. I’m going to fly over here. I’m going to shoot at the guy or whatever. I’m going to do something. And they’re going to act on that.
The interesting thing is not as much the loop. The loop is trivial and well understood. What he observed was that the faster you move through the loop, the better. So even if you make wrong decisions, if you move fast through the loop, you can correct those decisions. If you move very slowly and think very deliberate and try to do it well, you end up actually losing the combat. That was a very, very important observation.
He wrote a nice paper on it. And what I loved was that number one, he mentioned Heisenberg. So I was like, good. We are on a good path here, which is basically from the pilot’s perspective, you can only observe that much. You cannot keep observing and getting more and more accurate in your observation. There are limits to how accurate you can observe.
Think about this in a business context. Think about the product managers that write specs. I remember—I think we are moving a bit away from it—but I remember specs that would be hundreds of pages of specification of a system. They haven’t even started programming yet. We’re still up in the observe thing where we still dream up stuff. It goes so slow.
The other thing was Goedel, in this paper, which is also very interesting because it’s an observation about any model of reality is incomplete. So we have to continuously refine the model and look at the data and refine the model. Important component too, important component too. We cannot just assume that it’s a one-shot and then we are done. We have to keep refining.
The last one is Boltzmann. I like Boltzmann too – a very, very overlooked person. H’s the guy that if you drop some glass, then it becomes many pieces. But it doesn’t come back together very easily. So that’s basically any attempt that we have to build a business, build some IT systems around it, it will break, constantly break. That’s why we need to move through this loop.
Even if we talk about machine learning and data science and modeling, if you want to build a business, you have to be able to work through this loop very fast to stay on top of your competition.
This brings me onto something that I experienced a lot at Amazon. Amazon has—what is it—300 million monthly active users or customers? Not users, they are customers. They pay. What you want to do in a company like Amazon and most other companies on this earth—but most of them are not really doing it yet—is you want to really work through the loop I talked to you about with each individual customer.
You are data scientists here. How many know—do you all know the multi-armed bandit? Very, very, very, very important, very important at Amazon, super important, because we would basically look at maximizing the winnings. At Amazon, that means to sell the most stuff to you. So what Amazon is trying to do—and this is not just Amazon, but let’s use as an example—is to constantly run this exploitation versus exploration.
The exploitation is that you look at some nice shoes for your wife’s birthday. I did that. And then there are shoes everywhere I look, which is a little tiring. That’s exploitation. They are running bandits on me. They really are narrowing down and saying, he likes shoes, women’s shoes. However, there is in there, 5%, 10%, 15%, 20% of something else. That’s the exploration, Because we really don’t want to keep thinking that shoes is the only thing you’re interested in.
Netflix is the same model. So there’s the stuff that we recommend because we know you like it. Or we think you like it. And then there’s all this other stuff, the exploration part stuff that we serve to you trying to figure out if you like it. What’s important here is it’s equally important to show it. And you don’t click on it. And you click on those two events are equally important.
What the bandit approach allows a company like Amazon or a company like Netflix to do is to really find out you don’t like sci-fi. But maybe you like sci-fi comedy. Try to dig deep. Try to get to know you better. And that’s how you use the bandit algorithm to really start creating this cohort of one. So despite you having 100, 200, 300 million users, every single user is treated in a very personalized way.
The Uber app that came out in November has this awesome feature using machine learning, which is basically suggesting three likely destinations that you’re not going to go to. Awesome feature. I heard a lot of people like it. It’s done using machine learning. That’s good. And it’s personal.
The other day I was going to go somewhere, Saturday night. I was yelling to my wife, saying, I can’t remember the address. And I looked at the app. And it suggested that place. I’m like, OK, let’s go there. I don’t even have to bother about the address. That is exploitation. They try to say “Hey, we know you. We will send you where you usually go Saturday night.”
The other thing they did too is that they basically say, interesting bars in this area, which is like, I didn’t ask for that. So now they try to see if I’m going to bite and if they’re going to get me to go somewhere else. And maybe I don’t bite. And then I ignore it. And they figure out that I’m not interested in that. But they will try to probe around, very, very important.
There are scary things there because when you start showing people stuff you don’t know if they like or not, you’re actually wasting space and opportunity on stuff that may not sell. So, therefore, that’s the bandit in there. It’s the trade-off between the shoe exploitation and the risky exploration. Finance guys, they know all about this.
The other thing is the adversarial environments that people start gaming the system and try to confuse it. It happened a lot to Netflix by accident, because it turned out that families would have accounts. And I would watch. My wife would watch. The kids would watch. The kids are in an age range. And Netflix got completely lost. They don’t get it. There’s nothing that makes sense. We want to watch all kinds of stuff. And then the recommendation gets really bad. So that’s an adversarial environment. It’s kind of tricky to deal with. But it’s a reality.
I want to show you this video. This is by Jeff Schneider and a few other guys from Carnegie Mellon. These guys ran the– I don’t think the video is running. It should. OK, done.
Let’s stop this one for a minute. You’re going to miss the robotic snake. It’s so cute. So this is by Jeff Schneider. He’s one of the key guys on Uber’s self-driving car project. And it’s a very interesting example. It’s a mechanical snake.
You can imagine a mechanical snake. I can pull it up after my talk. You have to see it. I have it on my laptop. It’s a mechanical snake. It can move. It’s really, really hard to program such a thing. You have to program it. It’s an event-driven program. It has sensors on the side of its body. So it can feel where there is friction. And it can move and utilize that friction.
Super, super hard to code that in C or whatever programming language you’re going to use, event-driven. And it’s going to real—you’re going to have to think back to Newton. You’re going to have to think about how does snake—you have to understand how snakes move. And you have to implement that, say, in C or C++ or whatever, in some deterministic way, really, really difficult.
What these guys did instead was to use reinforcement learning. So basically they tell the snake to cross a barrier or wiggle around. And wiggle, wiggle, and wiggle until you accidentally figure out to get to the other side. And you will get a reward. And you’ll be a good snake when you get to the other side. And it wiggles.
As you can see in this video that you can’t watch now, it will wiggle. It will learn to—actually initially they put a big barrier. And it can’t figure it out ever. It can wiggle for a week. It will not wiggle over. It’s a foot, 12 inches. But then they make it two inches. And it wiggles over. And then they make it four. And now it knows how to wiggle over something. So it connects to figure out how to do it over four, four inches after a few attempts. And it goes all the way up to 18 inches suddenly. So the 12 inches that initially was a problem was not a problem any longer.
Reinforcement learning is so interesting, because in that one, we don’t code the thing any longer. So we’re not really trying to understand. Nor do we use data science to understand anything. We’re actually letting the thing figure it out on its own, which is actually super, super powerful. Because if you think about that snake as a business, then whether that’s book selling or movie streaming from Netflix, if you use that to learn how to get people to buy, you don’t even have to understand why people are buying as long as they are buying.
Take the bandits one step up and you have a system of figuring that out. There’s an even—so the snake example were based on some Bayesian-based, fairly traditional machine learning, reinforcement-based machine learning.
This next example is the DeepMind one. I want you to see it, because this is the Terry examples that DeepMind did. There’s a reinforcement learning algorithm, Deep-Q. It can operate a joystick. And you can see a screen. And then it knows the score. And it’s supposed to get a good high score. And this thing is training. So it knows nothing. It’s completely generic. It’s just trying to figure out how to play this game. And it’s not very good at it for the first 10 minutes at least.
Think of it, it’s a bit like the snake example. It’s just randomly trying to figure it out. But after a couple of hours, it’s actually getting pretty good. Now see it the way it’s playing, shooting down all the blue ones, going for the green ones, so working its way up. Anyone who actually played this game knows there’s a trick to it. What is the trick?
You want to spend a little time in the corner and get the thing, the ball, up above. Look what happened after four hours of training. Nobody told this generic algorithm that. Look, it’s working very hard over there in the corner. It will get there. There we go. This is crucial. This is the cornerstone of business, which is long term value over short term value.
Good business people go for long-term value. The short-term value guys, they get killed. And this is an example. So here you have—and this is new—24 months ago, not possible. This is new that you have a generic algorithm. I think they played 50 different Atari, 40 or 50 different Atari games with the same algorithm, Deep-Q. So it’s Q learning, traditional Q learning moved to a deep net world.
The key point here is that that generic algorithm can contain enough information because of the deep net to capture the fact that it’s okay to invest a little in building that channel up, because you’re going to get a bigger reward at the end. A simple Q learning algorithm would not be able to try to have that kind of non-linear reward function in it. It cannot figure that out. The chain of value is just too complex.
Actually I met with some game-designers last week in Europe. They all, actually a couple of them were physicists from the Niels Bohr Institute but gone into gaming 10 years ago. And I taught them a bit about this. And they said, oh, no, no, no. We tried that 10 years ago. Been there, done that. Just doesn’t work. I’m like, well, this stuff is not really even two years old.
There’s a lot of people out there who doesn’t know about this and doesn’t really understand that you can start applying these reinforcement learning algorithms to business problems to improve performance.
I actually had some game designers explain how they built these games where they are computer agents moving around and single shooter games. And they look pretty natural. It’s very interesting. They explained to me how they do a mesh. And they assign values to each field and stuff like that. I’m like, oh, I’ve heard about that. But I come from the Q learning world, where I have an algorithm doing that. They actually do it by hand, which is interesting.
The snake example was what we called progression learning, which means that you actually send the snake to school. Go first grade, second grade, third grade, because it can’t learn the end goal right away. It has to be brought there, which is also something that’s not really well understood at this point. That machine learning, you can’t go always for the big bang. You have to move there gradually.
The breakaway example is what I said, the short-term versus long-term. And deep net really offers, deep learning really offers a difference there. This stuff cannot be done with any other technology at this point. So I want to finish up. I’m always nervous when I go to talk, because there’s all these data scientists around. And some of this stuff here is—and I can tell you both from Uber perspective and from Amazon perspective—it’s less about data science and more about operations and execution.
It’s about letting an algorithm loose on the customer and get to know the customer really well, more than trying to understand a deep analysis around the behavior of that customer. As long as you get the snake to climb the wall, or you get the customer to buy the book or watch the streaming movie or grab an Uber, you don’t really care that much.
Where does that leave the data scientist? Because we are no longer thinking about cohorts of a million, or cohorts by county, or cohorts by gender or age all that stuff. We don’t care about that, because the algorithm deals with every single customer in a unique way, and learn their preferences and behavior and what it takes to sell to every single customer.
To do that, it’s all about operations. We cannot put data scientist. We’ve got to put people into that loop, because it’s going to slow it down. We have 300 million customers. They’re shopping all the time. Data scientists move to the side. Let the system figure it out, which means that data scientists should move up the food chain. It’s no longer a question about analyzing.
Let me give an example. I have a consulting company stopping by the other day. They wanted Unity to buy their services, because they have a new AI division. And that AI division, they have 50 people. And they were going to help us. So they put up a slide deck.
The first slide had three boxes on it. They said, this is our AI architecture, first data ingest and storage. Makes sense. Number two, data processing and machine learning, makes sense. Number three, tableau-based visualization. That last thing, that’s the check out. Or that’s the Uber app or whatever. What’s the visualization for? So they came with a very, very old fashioned data science perspective. They wanted me to use that technology to produce reports to impact policy, the company policy like let’s sell more of this. Let’s address this particular age group, or millennials this or that.
When in fact, what is happening is that we’re moving beyond that. And we are actually automating a lot of that. So I think that data scientists need to think much more strategic. We need to see how we can improve those processes to sell even more, to make the system learn even more about the customer, and get out of creating visualizations and reports and work on something for two months and report back the results of that.
I know I’m exaggerating a little here. But I’m trying to make a point. And I think also in all of this, there is interesting stuff going on. I don’t really have time to talk about it. But Netflix is doing some amazing stuff, where they totally manipulate even the imagery they use for movies to hook you. They show only actors, when they do the little DVD thing or whatever, the movie thing, the image, they only show you actors that they know you like. But they won’t show the actors you don’t like in that movie to get you to watch, stuff like that.
There’s this constant stuff going on, which brings me to the last point, which is, yeah. But at one point, we need to have someone to make sure that we still keep the customer interest in mind. So I think that was what I was going to say today.
Q: Are there any particular conditions you see for an organization to make transition?
Yeah, I think it’s very important, number one, that the organization make machine learning available, easily available as a scalable service, whether it’s an external cloud service they use…
It needs to make it available to the team so that the data scientists, together with software engineers, can lift the quality of the service. If you have to go and download your own Python this and Python that, or R this or that, remember it needs to run automatically. Because you’re off skiing in Tahoe and can’t get back because of snow, and the model goes out-of-band, and the service is terrible. All that stuff needs to be running automatically.
Data scientists should be given the proper infrastructure to actually spend their brain waves on something that’s really important and not operational or semi-operational stuff. So one of the things we did at Uber was to basically monitor the performance.
Say every time we make an ETA, a few minutes later, we have the ground truth. So we would actually monitor the models. And you could see it. That was actually a graph that you could see it. And when you went out of band, that would kick off a rebuild of the model. No people involved until that rebuild doesn’t work or whatever.
Then you get picked, which means that now the data scientists can start going around saying, how can we improve this even more? Should we have additional data sources in, more traffic, more weather information? Will weather information help us and so on? But remove that operational level is very important. You mentioned that you were in the machine learning team and data science team. What’s the difference, basic difference between them?
I consider machine learning infrastructure problem. It’s basically machine learning is about a set of predefined algorithms, some scalable hardware and software running in a cloud service internally, externally. I don’t care. And that’s the machine learning side of it, initial support workflows and data links and have feature storage et cetera, et cetera.
The data science part of it, I consider data science should be embedded within program teams. So you have teams. Maybe you have a full-stack. You have data science in there. You have analytics team. At Uber, we had Uber Eats. You had the Uber app itself, et cetera. At Amazon, you have all of these different organizations in the big organization that’s using machine learning. That’s where the data scientists are embedded.
I do not necessarily think that it should be a separate organization. I think it probably makes more sense to have them in there where they can actually assist the individual teams with data science expertise. In the situations that you described as on Uber and Netflix, and the snake example, one thing that’s very obvious in those cases that doesn’t generalize to many other cases is that it’s cheap and quick to try out different things.
There are many businesses where trying out something is actually very expensive. It’s certainly the case that in the world of—let’s say—mathematical optimization. This approach of not aiming at the precise, what we think is the optimal solution, but going close enough to there, and maybe randomly looking around it to figure out the better or what the real solution is, that’s an approach that has been around for a while.
Q: Do you think that—let’s say—in the middle ground, where we have some idea of what the cost is associated with trying out things, that maybe thinking about it in those traditional optimization terms is fruitful?
So let’s go back to the snake a little, because there’s this—we developed a demo of a chicken that was going to learn to cross the road. And the chicken dies 1,000 times before it figures out how to cross the road safely. And that’s, if you think about it in a real world, that’s 1,000 dead chickens for that experiment, not good.
That’s what I meant by, we need to start thinking a bit about the educational side of this. How can we train some of these algorithms without doing too much harm initially? So can we artificially train them a bit? And I will give you example in a moment. Can we artificially train them a bit so that we get up to a level where we can let them loose on something that’s more costly if a mistake is made? So let me give an example.
In self-driving cars, the traditional approach is let’s drive around with operators behind the wheel ready to break. And let’s gather data. Let’s go and hand label all the data, pedestrian, parked car, driving car, bicycle. And then let’s do that for years. And then maybe we have 1 million or 2 million miles gathered that way. Or let’s build a realistically rendering cityscape, with cars and people, everything simulated. And train the car, the car system, train it in that synthetic environment. Let it drive a billion miles.
It’s low cost, because it happens all in a GPU somewhere, or 1,000 GPUs. And then I reach my 90% level. And then I put it out on the real street. And I will teach it the last 10%. This is oversimplified. But it’s an example of some thinking that is actually not happening that much yet, where we actually do this progression learning. We don’t want to experiment on customers.
Boeing don’t want to experiment reinforcement learning on airplanes. And they are going to crash 1,000 times before they succeed. That’s not going to fly. They actually do a lot of simulation, by the way, already before they go to the real world. And I think that’s something that needs to be studied much more.
Q: How do you manage the time?
So I think—I think this is actually a little bit of the last question.
One of the things that is common in all the examples you shared is whether they’re graphing feedback. I think that what on you’re touching on is important. But all the struggles I’ve seen in terms of applying machine learning and AI is taking the feedback where you don’t have a million or billion tries to figure it out.
One of the examples is meteorology versus climatology. In meteorology, you write a forecast, and you find out in the next 24 hours if this happened or did not. Climatology you forecasted, you won’t find out for 100, 1,000 or 10,000 years.
Where do you think some of the opportunities are for being of use, and machine learning, in this cases where the real feedback isn’t going to be available? The timeline is screwy to do this.
There are situations where it’s not—there are lots, there are thousands, if not millions. There are probably millions of examples where either the feedback loop is too long or it’s too costly. That makes this very, very difficult.
This is not a solution to all problems in the world. However, it has worked pretty well for the human race to send kids to school and learn general stuff and keep learning through life. That’s sort of my message is that if corporations understand, enterprises understand, that their infrastructure is not finite and fixed and deterministic, but a very complex infrastructure that needs to keep adjusting and learning from the surroundings, that’s my key message.
Yes there’s a million cases where you needed a bunch of smart data scientists in there to hold hands and make sure that things don’t go crazy. But I think the long term goal is to start using these technologies to a much greater extent.