At last week's useR! R User conference, I spoke on digital provenance, the importance of reproducible research, and how Domino has solved many of the challenges faced by data scientists when attempting this best practice. More on the topic, and a recording of the talk, below.
What are you doing to ensure that you're mitigating the many risks associated with provenance (or lack thereof)?
Reproducibility is important throughout the entire data science process. As recent studies have shown, subconscious biases in the exploratory analysis phase of a project can have vast repercussions over final conclusions. The problems with managing the deployment and life-cycle of models in production are vast and varied, and often reproducibility stops at the level of the individual analyst. Though R has best in class support for reproducible research, with tools like KnitR to packrat, they are limited in their scope.
In this talk we present a solution we have developed at Domino, which allows for every model in production to have full reproducibility from EDA to the training run and exact datasets which were used to generate. We discuss how we leverage Docker as our reproducibility engine, and how this allows us to provide the irrefutable provenance of a model.
New to Domino? Consider a Guided Tour.Watch a Demo of Domino
Recent PostsSnowflake and RAPIDS For On-Demand Computing by a Storm Parallel Computing with Dask: A Step-by-Step Tutorial Lightning fast CPU-based image captioning pipelines with Deep Learning and Ray Everything You Need to Know about Feature Stores 5 MLOps Best Practices for Large Organizations Choosing a Data-Governance Framework for Your Organization Transformers - Self-Attention to the rescue How data science can fail faster to leap ahead N-shot and Zero-shot learning with Python A Hands-on Tutorial for Transfer Learning in Python
Other posts you might be interested in
Subscribe to the Data Science Blog
Receive data science tips and tutorials from leading Data Scientists right to your inbox.