In this Data Science Popup session, W. Whipple Neely, Director of Data Science at Electronic Arts, explains why data scientists have responsibilities beyond just data science.
I wanted to talk to you a little bit about something that I’m surprised to find is apparently so current for everyone, which is this engineering/data science divide.
We have a team that is mostly people with PhDs: mostly statisticians, economists, and others with a mathematical background, and relatively few people who really have an engineering background. My own background is a little odd, and I think it’s relevant to tell you a little bit about it first.
When I was very young, I decided that the only sane response to the universe was to get a PhD in mathematics, which I did. I taught for a few years, which I loved. Then, I decided to go into industry in the ’90s and started working as basically a mathematical consultant for software engineering teams.
I ended up being trained by them to be a software engineer myself because it was easier for them to tell me, go make this happen, than to have me describe the mathematics of a problem to them.
I really lucked out. I was taught by amazing software engineers how to function as part of a team, how to make production software. I did that for a number of years, and then fell in love with statistics.
After about seven years of that, I went back to graduate school and got a second PhD, in statistics. I then graduated into a world that had just learned the term “big data,” which was very fortunate.
It was the second time I had graduated from a major degree program into a recession. So it was very, very lucky that there was no recession for the particular collection of skills that I’d fallen in love with purely through random chance.
A lot of my time is spent teaching my team about software engineering practices. And this talk is really about that transformation.
It’s about taking a team of people who really understand what it means to program in a personal sense, on their own computers, to make a beautiful model. Sometimes these are tremendously gorgeous, perfectly sophisticated models that are well honed to the problem at hand, and then they can’t be productionalized. These are also people for whom learning a programming language is not a problem at all.
My team eats new programming languages for breakfast. They’ve done lots of them over the years. So it’s easy to learn a new programming language.
It’s not so easy to learn how to talk to a data engineer. That’s part of why I’m personally so grateful for some of the earlier talks, especially Kimberley’s, where she talked about making friends with data engineers. In many ways, this talk is just about one team that approaches the problem from a particular direction.
This talk is really about telling you some of our journey. I’m hoping to hear from you about parts of your journey, so that I can learn and absorb those skills, and that history, too.
I also have a small thing I wanted to interpret here. How many of you have seen this diagram? Many of you. There are many versions of it. Of course, it’s about what a data scientist is.
It’s somebody who knows the math and stats. It’s somebody who knows computer science, which the original article describes as “hacker skills.”
I really hate that part about hacker skills. Then, there’s some version of domain expertise. I seem to have problems with direction.
If you do a Google Image Search on this, you will see all sorts of variations on this. I don’t know if it’s relevant. But when I put “who” in the Image Search, I got the third from the top there. Or, third from the left on the top.
I got a female figure with the “who.” I get the same search results with a male figure there. I don’t know if that means anything or what it means, but it amused me no end.
This is what I really think this diagram should be about. From what I’m hearing from the previous talks, I think a lot of you may have very similar experiences.
There’s the science part. To me, that’s the mathematics, the stats, the computer science, the machine learning: all of those things, including the domain expertise. And then there’s the engineering bit, which is, how do we make this happen?
I mean, how do we actually make it scalable, and repeatable, and make it something that our data engineers don’t run screaming from the room when we describe it?
Then there’s the collaboration piece. It’s about really being part of the business, having a seat at the table. But also, really understanding the question that’s being asked.
That one was very important in my own journey. My very first gig doing applied mathematics, in the early ’90s, began when a biologist called me up, and his first question was, “I need to understand Legendre polynomials.”
I thought, well, that’s easy. I can teach about Legendre polynomials. And I did a whole song and dance about Legendre polynomials. It turned out that he had map data with biological diversity information, and he just needed to know the latitudes and longitudes. He’d done some reading about maps and found out that a lot of maps approximate this with Legendre polynomials.
It was an early lesson for me that sometimes the question being asked doesn’t lead to the answer that’s needed. But if you ever want to know about Legendre polynomials, I’ve got a great talk I can give you.
I try to talk to my team a lot about this version of the diagram, as opposed to the one they’ve usually seen. The problems we really faced were mostly in that engineering piece. My team is actually pretty good at the collaboration.
One of the nice things about people who are trained in statistics and have a lot of experience is that they’ve already got that collaboration piece. Their bread and butter is talking to people and turning the question into a problem that both sides understand. They have that collaboration piece pretty well.
They can do the science piece pretty well. Since I work for Electronic Arts, which is a video game company, my team attracts people who love video games. They understand the domain. And they understand the collaboration piece.
They’re not so good at the engineering piece. A lot of this process has been about bringing them up to speed on that. And one of the things we found is that there isn’t a really easy intermediate step between the kind of programming they could do on their laptops and the enterprise-level things they needed to do.
One of the things we tried to do is to figure out where the gaps are for them. Where we started was people would write an R or Python script. They’d run it manually. Then, they’d update a report.
Or, if they were trying to do something on the enterprise level, they’d write the script. They’d run it manually. Then, they’d update a static model implementation.
Our starting point with the data engineering team was that our chief data architect told me point-blank, “Look, any model at all can be described as some SQL.” I tried to talk about fitting the model, and about what that would actually look like.
I tried to show him what happened when my team, with a lot of skill and with help from data engineers, tried to render a fairly simple forecast model into SQL. As many of you know, that turns out to be rather hard. So this is where we started.
We needed some way to boost our engineering skills and also to work better with our data engineers. We started by looking at where the real problems were with what our team could do, even locally.
The longer-term relationship with the data engineers was a slower process. It’s going very well, but it was slower. What we found really motivated people was being able to say, you know, code gets lost.
We keep reimplementing the same thing over and over again. There’s tons of manual work. Those arguments really appealed to people. My team is the data science team, but we also serve as what my boss likes to call the center of expertise.
I hate that term, but it’s really a center of consulting for analytics teams throughout the company. EA has, I can’t remember exactly, 8,000 to 10,000 people. We have lots of analytics teams.
They tend to come to us for consulting, so we also tried all of this with some of our analytics teams. Everybody was complaining about manual work. One of the analysts on one of our embedded teams told me, “I am sick of being a walking dashboard.” I heard similar complaints over and over again.
The other thing was that there were no automated checks for any sort of correctness or robustness. There was no culture of testing things, or of what it meant to test things. A report went out, a dashboard went up, and there was nothing people could do to check it other than eyeball it.
We hadn’t built this culture of, what does it mean to repeat things? Of course, if you start automating things, you’re forced to do that. Or, at least I would be forced to do it.
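The missing checks can be made concrete. Below is a minimal Python sketch of an automated check that runs before a report or dashboard is published, in place of eyeballing it. It assumes a hypothetical report held as a list of row dictionaries; the column name, the bounds, and the `publish` callback are illustrative placeholders, not anything EA used.

```python
from typing import Callable


def check_report(rows: list[dict], column: str,
                 min_value: float, max_value: float) -> list[str]:
    """Return the problems found in a report's rows; an empty list means it passed."""
    problems = []
    if not rows:
        problems.append("report is empty")
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            problems.append(f"row {i}: missing {column}")
        elif not min_value <= value <= max_value:
            problems.append(f"row {i}: {column}={value} outside [{min_value}, {max_value}]")
    return problems


def publish_if_valid(rows: list[dict], column: str, min_value: float,
                     max_value: float, publish: Callable[[list[dict]], None]) -> None:
    """Call `publish` only when every check passes; otherwise refuse loudly."""
    problems = check_report(rows, column, min_value, max_value)
    if problems:
        raise ValueError("refusing to publish: " + "; ".join(problems))
    publish(rows)
```

Collecting every problem, rather than stopping at the first one, means a failed run tells the author everything that needs fixing at once.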
Take Colleen Chrisco, the Director of Analytics at PopCap Games, which is actually where I started. I bullied them into hiring Colleen, by the way. They didn’t want to hire an analyst who had so much math background. I told them that they had to.
One of the things Colleen said was, “Our analysts are pretty good at writing scripts and generating reports, but our team needs help with the bookends.”
I think for her and for the rest of us, that “bookends” term was really key. What she meant was scheduling and serving: they had no idea how to schedule a task, and they didn’t have a way of serving the results automatically. We needed a solution for that if we were really going to seduce people into this engineering approach.
I think seduction is probably a good way to do it. Because often, people come into these things and it’s like, my god. I’ve got all these headaches with actually trying to write the code, and build the model, and get the stuff out.
Why do I have to acquire all these new, unpleasant, difficult skills? Actually, I think they’re pretty wonderful skills. But if they’re new, they might seem unpleasant.
I guess this is a little repetitive. The point is, in terms of the diagram, we were weak on the engineering skills.
We gave a pair of talks: I talked about software engineering and analytics, and then she followed up with one in which she said that she’d taken this back to her team and they had said, “We’re not programmers. I’ve never scheduled a job before. I don’t even know where to start.” That was her image for how the team felt about this whole process.
To calm things down, we had to do something really, really clever, or, in my case, get very lucky. What we did was build a very simple system. Simple. And part of it was the seduction.
We said, you can solve your problem with scheduling. Just get your code checked into a source code repository, which was itself a victory, since most of the code for our various teams resided on the laptops where they’d been doing all their work. And we built a new technology called R Server.
You check in your files along with some scheduling information, and R Server runs them automatically and then outputs the results. You could, of course, do this with Jupyter Notebooks, too, but our team was heavily into R and actually needed a variety of outputs, not just a notebook.
So for the first time in the entire company, people were able to schedule things. It also meant that they had to have their code in source code control. That’s why it was a little seductive. It’s transformed several of our teams.
When I say “we” did this, my part was knowing that I wanted somebody to do it. I was very lucky: I hired a genius.
I found the 0.07 unicorn that Kimberley spoke about. I found Ben Weber, who now works at Twitch. And he’s a computer scientist who’s worked as a director of analytics and loves games. And he came in.
He did an amazing job of working with our team and finding out where their problems were, and he really built this system. I’ve forgotten the URL, but he has made it available as an open source system. It really transformed things.
Sometimes, you can find a unicorn. They’re great, even if you only have them for a little while. And we had Ben for about a year, but he really made this happen. I asked him to do something like this, and he made a realization that was far better than anything I could have conceived of, so I’m very grateful for that.
Where did it land us? We’ve gotten a lot better. We’ve automated things. We got Colleen’s bookends covered. The system was adopted by multiple analytics teams throughout Electronic Arts, and those teams really started using the technology to improve their work. Teams became more efficient.
The analyst who had made the “walking dashboard” comment came back and said, “I’m not a walking dashboard anymore. I love this.”
Because the system was so general, she was able to serve up her reports as automatically updated R Markdown files. How many of you know R Markdown? Ah. If you went to the R User Conference at Stanford last year, you would have seen Don Knuth, whom Paco talked about, discuss R Markdown as an example of literate programming.
I’ve been talking to my own team about literate programming for a while, and about how great an example of it R Markdown is. When they went to the conference, they came back and said to me, “R Markdown is an example of literate programming. This is really exciting.”
I said, yes, I know. One of the really powerful things about this whole system was that people actually had a motivation to remember to check their code into the source code control system.
I could have mandated that. I pushed them gently in that direction for a long time, and then I started mandating it. It just wasn’t natural for them. But when it became part of their workflow, it was the most natural thing in the world.
It didn’t solve everything, however. We produced more tools, but we’ve really only taken the first step toward being a real software-producing organization.
I honestly believe that whether you’re partnering with data engineering teams whose job is to take what you’ve done and productionalize it, or all you’re doing is the old-fashioned model where you’ve got some code on your laptop, you run it, and then you write a report, an analytics or data science team is still a software-producing organization. So much of our value is in the software we produce, even if it’s a small script.
One of my goals for the team is for the whole team to really understand this in terms of what it means to make things repeatable. What it means to make their lives easier because things have unit tests.
We’ve only taken the first step there. And effectively, we’ve really expanded the laptop model. The system of R Server—I wouldn’t promote this as an enterprise-level data science or data engineering solution.
It’s a good intermediate step for training people to get where they need to be, and it’s been very effective there. But it also introduced some issues that are worth thinking about with a system like this. Because the server system we set up is so popular with multiple teams, not just mine, there’s been an incredible proliferation of models and other things running on R Server.
We didn’t stop to think, “Oh, we’d better curate these. We’d better figure out when we need to retire them.” That’s one of the things we’re grappling with now. It’s actually really helping the conversations with the data engineers.
Suddenly, my team understands that if they get the data engineers to productionalize something, one of the things they need to say is, these are the criteria under which this can be turned off and thrown away.
One of our engineering team’s big worries is, “You want us to keep these automated processes going, but you can’t tell us when they go away. And we don’t want this big proliferation.”
My team is now dealing with that because they’re maintaining the R Server. They have all of these other analytics teams that are running these things on this and they really want to know when they can get rid of them.
Part of the problem is that people leave, but the server processes they started keep on going. This is just one step in what we’ve done.
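One way to make “these are the criteria under which this can be turned off” explicit is to record them with each scheduled job. The following Python sketch assumes a hypothetical registry in which every job has an owner and an agreed retirement date; a periodic sweep flags jobs that are past that date or whose owner has left. The names are illustrative, not fields of R Server.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScheduledJob:
    name: str
    owner: str
    retire_after: date  # hypothetical: the agreed date after which the job may be turned off


def jobs_to_review(jobs: list[ScheduledJob], active_staff: set[str],
                   today: date) -> dict[str, list[str]]:
    """Map each job that needs a human decision to the reasons it was flagged."""
    flagged: dict[str, list[str]] = {}
    for job in jobs:
        reasons = []
        if today > job.retire_after:
            reasons.append("past retirement date")
        if job.owner not in active_staff:
            reasons.append("owner has left")
        if reasons:
            flagged[job.name] = reasons
    return flagged
```

The sweep flags rather than deletes, leaving the decision to retire a job with a person.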
I think I’ve probably already said this. We needed to make a cultural change. We’ve done a whole bunch of things. We’ve started teaching R classes in the company that teach good coding style.
We actually introduced a style guide for our R code, which is basically a very, very basic style guide, but it was fun for me because I got to tell some horror stories about how bad code can be, which I enjoy doing.
We started to do code and project reviews, though I think we could do that much, much better. We’ve also started training the team in programming and engineering in new languages.
Our colleagues on the central engineering team, not on our team, also have a small data science team that has started thinking about how to use Spark as a platform to make models and predictive code first-class citizens of the enterprise system.
They’re doing it in Spark, so they’re teaching some Spark classes, which I’m very happy about. To avoid the unicorn problem, I’ve also taken some of my headcount and said, these are reserved for people whose primary background is in software engineering.
I’m not necessarily looking for somebody who really knows analytics or data science. They just have to be comfortable with the idea of working with analysts and data scientists. Fortunately, a lot of them are.
What’s next? I really want us to set up dev, test, and prod environments, so that we can really move things across them.
We’ve gotten some of the cultural change going. People are thinking about the difference between developing code, testing code, and then productionalizing it. If you think about that old laptop model (run the script, get the images, get the results, write them up in a report), there’s no real distinction between dev, test, and prod other than what’s implicit.
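A common way to make the dev/test/prod distinction explicit, rather than implicit, is to have the same script pick its settings from the environment it runs in. This Python sketch assumes a hypothetical `DS_ENV` environment variable and made-up output paths; it illustrates the pattern, not EA’s setup.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvConfig:
    output_dir: str
    publish_to_dashboard: bool


# Hypothetical settings; a real team would point these at its own locations.
CONFIGS = {
    "dev": EnvConfig(output_dir="/tmp/ds-dev", publish_to_dashboard=False),
    "test": EnvConfig(output_dir="/shared/ds-test", publish_to_dashboard=False),
    "prod": EnvConfig(output_dir="/shared/ds-prod", publish_to_dashboard=True),
}


def current_config(environ: Optional[Mapping[str, str]] = None) -> EnvConfig:
    """Choose settings from DS_ENV, defaulting to dev so nothing reaches prod by accident."""
    environ = os.environ if environ is None else environ
    env = environ.get("DS_ENV", "dev")
    if env not in CONFIGS:
        raise ValueError(f"unknown DS_ENV {env!r}; expected one of {sorted(CONFIGS)}")
    return CONFIGS[env]
```

Defaulting to dev means a script run by hand on a laptop can’t publish to prod by accident.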
We’re working with our central data engineering team to be able to say, OK, what does this look like if we really set this up for the data scientists?
We’re also upgrading our tool set and beginning to do pair programming with people. We’ve found in the past that really helps.
Q: What is the solution again?
Neely: Let me go back. Maybe I can talk through the diagram.
So what we had before was the laptop. Imagine I’m holding a laptop. My data scientists would write all of their code in a local directory on the laptop and get some results. Then they would either write those up in a report or translate them into a SQL script that could be put into the enterprise system.
That meant all the power R or Python offered for fitting and choosing the model was lost. The code also lived on the laptop. Weeks and weeks of time could go into creating this code, and where did it live? In some random directory on their laptop.
So, Perforce, which is our enterprise source code control system. We started saying, you can use Perforce. I guess I should be honest: I started saying, I want you to use Perforce. It was a big shift for them, so I didn’t make it an annual goal for them.
It wasn’t “Thou shalt use Perforce or you won’t get your bonus.” It was more, “I want you to do this.” But what we then did was develop our own technology.
Or rather, Ben, as part of the group, developed this R Server technology for us. It was a Java app that he wrote. Basically, what that Java app does is go to Perforce and grab a directory, basically a project, and say, “Oh, here’s the script in there that I know how to run.”
It runs the script automatically, pipes the results out, and then produces a web report, something in a repository, or whatever general output the script generates.
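The loop described here (grab a project, find the script, run it on a schedule, keep the output) is small enough to sketch. This is a minimal Python approximation, not Ben Weber’s Java implementation: it assumes the project directory has already been synced from source control (the Perforce step is omitted), that a hypothetical `schedule.txt` file names the script and how often to run it, and that the script is Python. For an R project, the command would use `Rscript` instead.

```python
import subprocess
import sys
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Job:
    project: Path        # directory already synced from source control
    script: str          # script to run, relative to the project directory
    interval: timedelta  # how often to run it


def load_job(project: Path) -> Job:
    """Read the hypothetical schedule.txt, e.g. 'script=report.py' and 'every_hours=24'."""
    fields = {}
    for line in (project / "schedule.txt").read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()
    return Job(project, fields["script"], timedelta(hours=float(fields["every_hours"])))


def is_due(job: Job, last_run: Optional[datetime], now: datetime) -> bool:
    """A job is due if it has never run or its interval has elapsed."""
    return last_run is None or now - last_run >= job.interval


def run_job(job: Job, output_root: Path) -> int:
    """Run the script inside its project directory and save stdout/stderr as a log."""
    # For an R project this command would be ["Rscript", job.script].
    result = subprocess.run([sys.executable, job.script], cwd=job.project,
                            capture_output=True, text=True)
    output_root.mkdir(parents=True, exist_ok=True)
    (output_root / f"{job.project.name}.log").write_text(result.stdout + result.stderr)
    return result.returncode


def tick(projects: list[Path], last_runs: dict[str, datetime],
         now: datetime, output_root: Path) -> None:
    """One scheduler pass: run every project whose interval has elapsed."""
    for project in projects:
        job = load_job(project)
        if is_due(job, last_runs.get(project.name), now):
            run_job(job, output_root)
            last_runs[project.name] = now
```

A real server would call `tick` from a timer and persist `last_runs`; here it is an in-memory dictionary to keep the sketch short.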
This had two effects. First, the Java app that was our server (I’m sorry if I blasted through that too quickly) could handle all the scheduling people needed.
It’s hard to schedule something on your laptop. You might have closed it, for example. And it also did something that was intentionally a little manipulative. I hope that’s not a bad word.
It gave people a reason to use Perforce: do you want to schedule this? You want it to run on the server? That’s lovely. You’d better check the code into Perforce.
There was some gentle heavy-handedness. Did that answer your question? Yeah.
It wasn’t just the Perforce. It was the combination of the Perforce with our locally-developed R Server app.
Q: I think the organization I’m with is in a very similar place. We’re very much R focused. You mentioned sort of extending what you guys are doing with Python. I’m just curious. Not being super-experienced with it, I still view it in my mind just being an alternative to R rather than something where you would extend like that. I understand we’re working in that direction. Can you just share, just for my benefit, maybe some examples of things that you see Python as being able to do that R isn’t able to do? Because you’re sort of setting it up as the next step rather than an alternative path, which is I think how we still view it.
Neely: That’s a great question. And I have the perfect answer, which is it’s not about the relative strengths of the languages.
I actually don’t think that’s the important variable. The place where we’ve used Python is—part of this is particular to Electronic Arts’ model.
We have a central analytics and a central data science team with lots of embedded groups. The thing that Python allows us to do that R simply can’t do and will never be able to do is it allows us to talk to our client groups who are used to using Python.
That sounds silly, but I actually think it’s one of the really important considerations in choosing a computer language. We did a matchmaking algorithm for one of our studios. The data scientist on my team who developed the matchmaking algorithm did it in R because he’s very familiar with R.
The studio we were working with, their engineering team that needed to implement this really loved Jupyter and IPython Notebooks for good reason. They’re eminently lovable. And that was the language they knew. And so we had two choices:
He could take the model and the algorithms that he developed in R and turn them into sort of a pure mathematical algorithmic description that he could have then handed off to the engineers.
Or he could re-render it as an IPython Notebook. He did that, and it was very easy for them to implement. So I agree with you.
I mean, I’m a statistician by background. I’ve been using R since, I think, 1996, which was when I first encountered it. It has an incredible breadth and depth of models that Python hasn’t yet matched.
But I honestly think that for a lot of our work, that’s not the most important variable. It’s what our client teams and partners can use.
I encourage my team to follow our partners. When we’re working with central engineering, we use Scala. They like Scala, and they like Spark, so we use that.
When we’re dealing with one of EA’s studios where the engineers and the embedded analysts are fond of Python, we’ll use Python.
Q: Does the system track dependencies between outputs?
Neely: It can in a very primitive sense, but not, I think, in the wider, more sophisticated sense you’re asking about. It’s possible because the system just takes scripts from a Perforce directory and runs them. If the work in that Perforce directory was set up to do that tracking, it can.
But at this point, the system is really very much an extension of that laptop model. It doesn’t take the results, put them in a repository, and run things in a way that lets you cache the results and retrieve them, except insofar as R and good practices within that directory support it natively.
But that’s a great idea. I’m going to take it back to the team.
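For readers curious what tracking dependencies between outputs could look like, here is a minimal Python sketch using the standard-library `graphlib` module (Python 3.9+). It assumes each project declares which other projects’ outputs it reads; the project names are made up, and this illustrates the questioner’s idea rather than a feature of the system described.

```python
from graphlib import TopologicalSorter


def run_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order projects so each one runs after every project whose output it reads."""
    # TopologicalSorter takes a mapping of node -> predecessors and
    # raises graphlib.CycleError if the declared dependencies form a loop.
    return list(TopologicalSorter(depends_on).static_order())


def downstream_of(changed: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Projects whose outputs become stale once `changed` is rerun."""
    stale: set[str] = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for project, inputs in depends_on.items():
            if current in inputs and project not in stale:
                stale.add(project)
                frontier.append(project)
    return stale
```

Knowing what is downstream also helps with retirement: a job with nothing downstream of it is an easier candidate to turn off.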
Q: I’m curious. How long did it take, and how many engineers, to build this pipeline? And on an ongoing basis, how much effort is required to maintain it?
Neely: Oh, good question. So how many engineers it took was one. But remember that I did find a unicorn.
I found a unicorn who made it happen in fairly short order once he spent the time—and this is a beautiful example of the importance of collaboration and engineering. He spent the time working with the team to figure out their workflows and realized that a fairly lightweight system could support this for the next step in our maturation as a software-producing organization.
It took Ben relatively little time to make this, but he’s a genius and a unicorn. I haven’t found another one since he moved off to Twitch. I forgive him for that. I’m happy for him. He’s very happy there.
In terms of maintaining it, this too, I think, is a reflection of Ben. He trained one of our data scientists to understand and maintain it. That person is primarily a data scientist, with a master’s degree in statistics, but loves programming and has been doing it for a few years. And it’s been relatively problem-free.
So we haven’t had to do much maintenance, except small things like updates when some R package changes. It’s been pretty easy. It’s very lightweight. It’s really a small thing.
I think Colleen in her description of this as dealing with the bookends for the system was really quite perceptive in terms of where the teams needed help.