Tag: Data Engineering
Simplify data access and publish model results in Snowflake using Domino Data Lab
Introduction: Arming data science teams with the access and capabilities needed to establish a two-way flow of information is one critical challenge many organizations face when...
The Curse of Dimensionality
Guest Post by Bill Shannon, Founder and Managing Partner of BioRankings. Danger of Big Data: Big data is the rage. This could be lots of rows...
Machine Learning in Production: Software Architecture
Special thanks to Addison-Wesley Professional for permission to excerpt the following "Software Architecture" chapter from the book, Machine Learning in Production. This chapter excerpt provides data...
Themes and Conferences per Pacoid, Episode 7
Paco Nathan covers recent research on data infrastructure as well as adoption of machine learning and AI in the enterprise. Introduction: Welcome back to our monthly...
Creating Multi-language Pipelines with Apache Spark or Avoid Having to Rewrite spaCy into Java
In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid rewriting spaCy into Java. She...
Collaboration Between Data Science and Data Engineering: True or False?
This blog post includes candid insights about addressing tension points that arise when people collaborate on developing and deploying models. Domino’s Head of Content sat down...
Docker, but for Data
Aneesh Karve, Co-founder and CTO of Quilt, visited the Domino MeetUp to discuss the evolution of data infrastructure. This blog post provides a session summary, video,...
New G3 Instances in AWS – Worth it for Machine Learning?
We benchmarked AWS’s new G3 instances for deep learning tasks and found they significantly outperform the older P2 instances. The new G3 instances are now available...
Data Science on AWS: Benefits and Common Pitfalls
More than two years ago, we wrote about the misguided fear of the cloud among many enterprise companies. How quickly things change! Today, every enterprise we...
Deep Learning on GPUs without the Environment Setup
We have seen an explosion of interest among data scientists who want to use GPUs for training deep learning models. While the libraries to support this...
Enabling Data Science Agility with Docker
This post describes how Domino uses Docker to solve a number of interconnected problems for data scientists and researchers, related to environment agility and reproducibility of...