Subject archive for "reproducibility"

Comma separated values containing integers

Perspective

The Case for Reproducible Data Science

Reproducibility is a cornerstone of the scientific method and ensures that tests and experiments can be reproduced by different teams using the same method. In the context of data science, reproducibility means that everything needed to recreate the model and its results such as data, tools, libraries, frameworks, programming languages and operating systems, have been captured, so with little effort the identical results are produced regardless of how much time has passed since the original project.

By Sundeep Teki8 min read

Perspective

Seeking Reproducibility within Social Science: Search and Discovery

Julia Lane, NYU Professor, Economist and cofounder of the Coleridge Initiative, presented “Where’s the Data: A New Approach to Social Science Search & Discovery” at Rev. Lane described the approach that the Coleridge Initiative is taking to address the science reproducibility challenge. The approach is to provide remote access for government analysts and researchers to confidential data in a secure data facility and to build analytical capacity and collaborations through an Applied Data Analytics training program. This article provides a distilled summary and a written transcript of Lane’s talk at Rev. Many thanks to Julia Lane for providing feedback on this post prior to publication.

By Ann Spencer25 min read

Data Science

MNIST Expanded: 50,000 New Samples Added

This post provides a distilled overview regarding the rediscovery of 50,000 samples within the MNIST dataset.

By Ann Spencer5 min read

Data Science

Addressing Irreproducibility in the Wild

This Domino Data Science Field Note provides highlights and excerpted slides from Chloe Mawer’s "The Ingredients of a Reproducible Machine Learning Model" talk at a recent WiMLDS meetup. Mawer is a Principal Data Scientist at Lineage Logistics as well as an Adjunct Lecturer at Northwestern University. Special thanks to Mawer for the permission to excerpt the slides in this Domino Data Science Field Note. The full deck is available here.

By Ann Spencer7 min read

Data Science

Learn from the Reproducibility Crisis in Science

Key highlights from Clare Gollnick’s talk, “The limits of inference: what data scientists can learn from the reproducibility crisis in science”, are covered in this Domino Data Science Field Note. The full video is available for viewing here.

By Domino5 min read

Data Science

Data Scientist? Programmer? Are They Mutually Exclusive?

This Domino Data Science Field Note blog post provides highlights of Hadley Wickham’s ACM Chicago talk, “You Can’t Do Data Science in a GUI”. In his talk, Wickham advocates that, unlike a GUI, using code provides reproducibility, data provenance, and the ability to track changes so that data scientists have the ability to see how the data analysis has evolved. As the creator of ggplot2, it is not a surprise that Wickham also advocates the use of visualizations and models together to help data scientists find the real signals within their data. This blog post also provides clips from the original video and follows the Creative Commons license affiliated with the original video recording.

By Ann Spencer7 min read

12 3 4

Subscribe to the Domino Newsletter

Receive data science tips and tutorials from leading Data Science leaders, right to your inbox.

By submitting this form you agree to receive communications from Domino related to products and services in accordance with Domino's privacy policy and may opt-out at anytime.