Subject archive for "sparkling-ml"

Machine Learning

Creating Multi-language Pipelines with Apache Spark or Avoid Having to Rewrite spaCy into Java

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid rewriting spaCy into Java. She has already written a complementary blog post for Domino on using spaCy to process text data. Karau is a Developer Advocate at Google, as well as a co-author of "High Performance Spark" and "Learning Spark". She also has a repository of her talks, code reviews, and code sessions on Twitch and YouTube.

By Holden Karau | 5 min read

Data Science

Making PySpark Work with spaCy: Overcoming Serialization Errors

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to use spaCy to process text data. Karau is a Developer Advocate at Google, as well as a co-author of "High Performance Spark" and "Learning Spark". She has a repository of her talks, code reviews and code sessions on Twitch and YouTube. She is also working on Distributed Computing 4 Kids.

By Domino | 8 min read
