Why understanding key differences between data science and engineering matters
As data science becomes more mature within an organization, engineering leaders are often pulled into leading, enabling, and collaborating with data science team members. While there are similarities between data science and software development (e.g., both include code), well-intentioned engineering leaders may make assumptions about data science that lead to avoidable conflict and unproductive workflows, which those same leaders are then tasked with resolving. Unlike software development, data science is closer to research, has unique computing demands, and is done by teams that often work closely with business stakeholders with whom engineering teams don't typically engage.
Data science is more like research than engineering
Engineering involves building something that is understood ahead of time. This allows engineering teams to track, monitor, predict, and control the engineering process. Data science projects, however, are often centered on answering a question, and the answer may turn into an insight or a model. This focus on answering a question is what makes data science an exploratory and experimental research process. It also means data science needs more flexibility and agility in its infrastructure and tooling than engineering does.
Variable computing demands
Engineering teams build software that may run on high-performance architecture. The engineering team uses infrastructure for testing and QA, and the infrastructure needs are static and predictable. Individual engineers often work on a single machine with 16-32 GB of RAM and four to eight cores. In contrast, the compute capacity a data science project needs is neither predictable nor constant. Data science work involves computationally intensive experiments, and memory and CPU can become bottlenecks. For example, it could take 30 minutes to write the code for an experiment and then eight hours to run that experiment on a laptop. To avoid this type of bottleneck, the data scientist may use large machines to parallelize work across cores or to load more data into memory.
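To make "parallelizing work across cores" concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not a description of any particular team's setup: `run_experiment` is a hypothetical stand-in for a CPU-bound experiment (a toy Monte Carlo estimate of pi), and `run_all` is an invented helper name.

```python
from concurrent.futures import ProcessPoolExecutor
import random


def run_experiment(seed: int, n_samples: int = 100_000) -> float:
    """Hypothetical stand-in for one CPU-bound experiment.

    Estimates pi by sampling random points in the unit square and
    counting how many fall inside the quarter circle.
    """
    rng = random.Random(seed)  # per-run generator keeps each run reproducible
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def run_all(seeds, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run one experiment per seed, spread across a pool of workers.

    With the default ProcessPoolExecutor and max_workers=None, the pool
    size defaults to the number of processors on the machine, so a
    larger machine finishes the same batch sooner.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # map returns results in the same order as the input seeds
        return list(pool.map(run_experiment, seeds))
```

Calling `run_all(range(8))` from a script (under an `if __name__ == "__main__":` guard, which process pools require on some platforms) runs the eight experiments in parallel. Because the experiments are independent, the wall-clock time for the batch shrinks roughly with the number of cores available, which is why the same code can be slow on a laptop and much faster on a large machine.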
Integration with other parts of the organization
While engineering is aligned with the organization's overall priorities, engineering teams are often independent and their work does not require close integration with finance, marketing, or HR teams. Data science projects are often focused on answering a question for a business stakeholder. For example, a data science team would work very closely with the HR team when building models for employee retention.
In this post, we discussed data science’s similarity to research, data science’s variable computing demands, and how data scientists often work closely with business stakeholders with whom engineering teams do not typically engage. If you are interested in reading more about how to enable data science within your organization, please see The Practical Guide to Managing Data Science at Scale.