Latest

Accelerating model velocity through Snowflake Java UDF integration

Over the next decade, the companies that will beat competitors will be “model-driven” businesses. These companies often undertake large data science efforts in order to shift...

ML internals: Synthetic Minority Oversampling (SMOTE) Technique

In this article we discuss why fitting models on imbalanced datasets is problematic, and how class imbalance is typically addressed. We present the inner workings of...

Credit Card Fraud Detection using XGBoost, SMOTE, and threshold moving

In this article, we'll discuss the challenge organizations face around fraud detection, how machine learning can be used to spot anomalies that the human...

Ray for Data Science: Distributed Python tasks at scale

Editor's Note: This article was originally posted on Patterson Consulting's blog, where it can be found at http://www.pattersonconsultingtn.com/blog/blog_index.html, and has been republished with permission. Why Do We...

Enterprise-class NLP with spaCy v3

spaCy is a Python library that provides capabilities to conduct advanced natural language processing analysis and build models that can underpin document analysis, chatbot capabilities, and...

How to supercharge data exploration with Pandas Profiling

Producing insights from raw data is a time-consuming process. Predictive modeling efforts rely on dataset profiles, whether consisting of summary statistics or descriptive charts. Pandas Profiling,...

Simplify data access and publish model results in Snowflake using Domino Data Lab

Arming data science teams with the access and capabilities needed to establish a two-way flow of information is one critical challenge many organizations face when...

PyCaret 2.2: Efficient Pipelines for Model Development

Data science is an exciting field, but it can be intimidating to get started, especially for those new to coding. Even for experienced developers and data...

Faster data exploration in Jupyter through Lux

Notebooks have become one of the primary tools for many data scientists. They offer a clear way to collaborate with others throughout the process of...

Performing Non-Compartmental Analysis with Julia and Pumas AI

When analysing pharmacokinetic data to determine the degree of exposure of a drug and associated pharmacokinetic parameters (e.g., clearance, elimination half-life, maximum observed concentration ($C_{max}$), time...

Density-Based Clustering

Original content by Manojit Nandi, updated by Josh Poduska. Cluster analysis is an important problem in data analysis. Data scientists use clustering to identify malfunctioning...

Analyzing Large P Small N Data – Examples from Microbiome

Guest post by Bill Shannon, Co-Founder and Managing Partner of BioRankings. High-throughput screening technologies have been developed to measure all the molecules of interest...

Bringing ML to Agriculture: Transforming a Millennia-old Industry

Guest post by Jeff Melching from The Climate Corporation. At The Climate Corporation, we aim to help farmers better understand their operations and make better decisions...
