    Latest

    Data Exploration with Pandas Profiler and D-Tale

    We have all heard that data is the new oil. I always say that if that is the case, we need to go through some refinement process before that raw oil is converted into useful...

    Explaining black-box models using attribute importance, PDPs, and LIME

    In this article we cover explainability for black-box models and show how to use different methods from the Skater framework to provide insights...

    Building a Named Entity Recognition model using a BiLSTM-CRF network

    In this blog post we present the Named Entity Recognition problem and show how a BiLSTM-CRF model can be fitted using a freely available annotated...

    ML internals: Synthetic Minority Oversampling (SMOTE) Technique

    In this article we discuss why fitting models on imbalanced datasets is problematic, and how class imbalance is typically addressed. We present the...

    Fireside Chat: Stig Pedersen from Topdanmark

    "In having one or two very successful algorithmic deployments, the business then begins coming to you to ask for assistance. It becomes a mutual...

    Faster data exploration in Jupyter through Lux

    Notebooks have become one of the primary tools for many data scientists. They offer a clear way to collaborate with others throughout the process...

    Density-Based Clustering

    Original content by Manojit Nandi - Updated by Josh Poduska. Cluster Analysis is an important problem in data analysis. Data scientists use...

    Analyzing Large P Small N Data - Examples from Microbiome

    Guest Post by Bill Shannon, Co-Founder and Managing Partner of BioRankings. Introduction: High-throughput screening technologies have been developed to...

    The Curse of Dimensionality

    Guest Post by Bill Shannon, Founder and Managing Partner of BioRankings. Danger of Big Data: Big data is the rage. This could be lots of rows (samples)...

    Providing fine-grained, trusted access to enterprise datasets with Okera and Domino

    Domino and Okera - Provide data scientists access to trusted datasets within reproducible and instantly provisioned computational environments. In...

    Evaluating Ray: Distributed Python for Massive Scalability

    Dean Wampler provides a distilled overview of Ray, an open source system for scaling Python systems from single machines to large clusters. If you...

    Evaluating Generative Adversarial Networks (GANs)

    This article provides concise insights into GANs to help data scientists and researchers assess whether to investigate GANs further. If you are...

    Themes and Conferences per Pacoid, Episode 12

    Paco Nathan's latest monthly article covers Sci Foo as well as why data science leaders should rethink hiring and training priorities for their data...