    Latest

    Spark, Dask, and Ray: Choosing the Right Framework

    Apache Spark, Dask, and Ray are three of the most popular frameworks for distributed computing. In this blog post we look at their history, intended use cases, strengths and...

    On-Demand Spark clusters with GPU acceleration

    Apache Spark has become the de facto standard for distributed processing of large amounts of batch and streaming data. The addition...

    Themes and Conferences per Pacoid, Episode 13

    Paco Nathan's latest article covers data practices from the National Oceanic and Atmospheric Administration (NOAA) Environment Data Management (EDM)...

    Creating Multi-language Pipelines with Apache Spark or Avoid Having to Rewrite spaCy into Java

    In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid...

    Making PySpark Work with spaCy: Overcoming Serialization Errors

    In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to use spaCy to process text data. Karau is a Developer Advocate...

    Using Apache Spark to Analyze Large Neuroimaging Datasets

    This article was written by Sergul Aydore, Ph.D., and Syed Ashrafulla, Ph.D. Sergul and Syed received their Ph.D.s in Electrical Engineering in 2014...

    Announcing Enhanced Apache Spark Support

    Domino now offers data scientists a simple, yet incredibly powerful way to conduct quantitative work using Apache Spark. Apache Spark has captured...