Latest

Explaining black-box models using attribute importance, PDPs, and LIME

In this article we cover explainability for black-box models and show how to use different methods from the Skater framework to provide insights into the inner workings of a simple credit scoring neural network model. The...

Fitting Support Vector Machines via Quadratic Programming

In this blog post we take a deep dive into the internals of Support Vector Machines. We derive a Linear SVM classifier, explain...

The Future of Data Science – Mining GTC 2021 for Trends

Deep learning enthusiasts are increasingly putting NVIDIA’s GTC at the top of their gotta-be-there conference list. I enjoyed mining this year’s talks for...

Enterprise-class NLP with spaCy v3

spaCy is a Python library that provides capabilities to conduct advanced natural language processing analysis and build models that can underpin document analysis,...

How to supercharge data exploration with Pandas Profiling

Producing insights from raw data is a time-consuming process. Predictive modeling efforts rely on dataset profiles, whether consisting of summary statistics or descriptive...

Code

The Curse of Dimensionality

Guest post by Bill Shannon, Founder and Managing Partner of BioRankings. Danger of Big Data: Big data is the rage. This could be...

The importance of structure, coding style, and refactoring in notebooks

Notebooks are increasingly crucial in the data scientist's toolbox. Although considered relatively new, their history traces back to systems like Mathematica and MATLAB....

Machine Learning

Building a Named Entity Recognition model using a BiLSTM-CRF network

In this blog post we present the Named Entity Recognition problem and show how a BiLSTM-CRF model can be fitted using a freely...

Deep Learning Illustrated: Building Natural Language Processing Models

Many thanks to Addison-Wesley Professional for granting permission to excerpt "Natural Language Processing" from the book Deep Learning Illustrated by Krohn, Beyleveld,...

Featured

Simplify data access and publish model results in Snowflake using Domino Data Lab

Arming data science teams with the access and capabilities needed to establish a two-way flow of information is one critical challenge many...

Featured

Why models fail to deliver value and what you can do about it.

Building models requires a lot of time and effort. Data scientists can spend weeks just trying to find, capture and transform data into...

Practical Techniques

Accelerating model velocity through Snowflake Java UDF integration

Over the next decade, the companies that will beat competitors will be “model-driven” businesses. These companies often undertake large data science efforts in...

Faster data exploration in Jupyter through Lux

Notebooks have become one of the primary tools for many data scientists. They offer a clear way to collaborate with others throughout...

Leaders at Work

Fireside Chat: Stig Pedersen from Topdanmark

"In having one or two very successful algorithmic deployments, the business then begins coming to you to ask for assistance. It becomes a...

Defining clear metrics to drive model adoption and value creation

One of the biggest ironies of enterprise data science is that although data science teams are masters at using probabilistic models and diagnostic...

The Role of Containers on MLOps and Model Production

Container technology has changed the way data science gets done. The original container use case for data science focused on what I call,...

HyperOpt: Bayesian Hyperparameter Optimization

This article covers how to perform hyperparameter optimization using a sequential model-based optimization (SMBO) technique implemented in the HyperOpt Python package. There is...

Deep Reinforcement Learning

This article provides an excerpt, "Deep Reinforcement Learning", from the book Deep Learning Illustrated by Krohn, Beyleveld, and Bassens. The article includes an...

Towards Predictive Accuracy: Tuning Hyperparameters and Pipelines

This article provides an excerpt of “Tuning Hyperparameters and Pipelines” from the book, Machine Learning with Python for Everyone by Mark E. Fenner....

Ray for Data Science: Distributed Python tasks at scale

Editor's Note: This article was originally posted on Patterson Consulting's blog at http://www.pattersonconsultingtn.com/blog/blog_index.html and has been republished with permission....

On-Demand Spark clusters with GPU acceleration

Apache Spark has become the de facto standard for processing large amounts of stationary and streaming data in a distributed fashion. The addition...
