    Latest

    Feature extraction and image classification using Deep Neural Networks and OpenCV

    In a previous blog post we talked about the foundations of computer vision, the history and capabilities of the OpenCV framework, and how to make your first steps in...

    Data Exploration with Pandas Profiler and D-Tale

    We all have heard how data is the new oil. I always say that if that is the case, we need to go through some refinement process before that raw oil...

    Explaining black-box models using attribute importance, PDPs, and LIME

    In this article we cover explainability for black-box models and show how to use different methods from the Skater framework to provide insights...

    Building a Named Entity Recognition model using a BiLSTM-CRF network

    In this blog post we present the Named Entity Recognition problem and show how a BiLSTM-CRF model can be fitted using a freely available annotated...

    ML internals: Synthetic Minority Oversampling (SMOTE) Technique

    In this article we discuss why fitting models on imbalanced datasets is problematic, and how class imbalance is typically addressed. We present the...

    Fireside Chat: Stig Pedersen from Topdanmark

    "In having one or two very successful algorithmic deployments, the business then begins coming to you to ask for assistance. It becomes a mutual...

    Faster data exploration in Jupyter through Lux

    Notebooks have become one of the primary tools for many data scientists. They offer a clear way to collaborate with others throughout the process...

    Density-Based Clustering

    Original content by Manojit Nandi - Updated by Josh Poduska. Cluster Analysis is an important problem in data analysis. Data scientists use...

    Analyzing Large P Small N Data - Examples from Microbiome

    Guest Post by Bill Shannon, Co-Founder and Managing Partner of BioRankings. Introduction: High-throughput screening technologies have been developed to...

    The Curse of Dimensionality

    Guest Post by Bill Shannon, Founder and Managing Partner of BioRankings. Danger of Big Data: Big data is the rage. This could be lots of rows (samples)...

    Providing fine-grained, trusted access to enterprise datasets with Okera and Domino

    Domino and Okera provide data scientists with access to trusted datasets within reproducible and instantly provisioned computational environments. In...

    Evaluating Ray: Distributed Python for Massive Scalability

    Dean Wampler provides a distilled overview of Ray, an open source system for scaling Python systems from single machines to large clusters. If you...

    Evaluating Generative Adversarial Networks (GANs)

    This article provides concise insights into GANs to help data scientists and researchers assess whether to investigate GANs further. If you are...

    Themes and Conferences per Pacoid, Episode 12

    Paco Nathan's latest monthly article covers Sci Foo as well as why data science leaders should rethink hiring and training priorities for their data...
