Vasco Yasenov

Advanced Topics in Statistical Data Science

 

The Kolmogorov–Smirnov Test as a Goodness-of-Fit Test

statistical inference
The Kolmogorov–Smirnov (KS) test is a staple in the statistical toolbox for checking how well data fit a hypothesized distribution. It comes in both a one-sample and a…
May 5, 2025
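A minimal sketch of the one-sample version in Python (the standard-normal null hypothesis and the simulated data are purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.2, scale=1.0, size=500)  # data whose fit we want to check

# One-sample KS test of the data against a hypothesized N(0, 1) distribution
stat, pval = stats.kstest(x, "norm")
print(f"KS statistic = {stat:.3f}, p-value = {pval:.4f}")
```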
 

Jackknife vs. Bootstrap: A Tale of Two Resamplers

bootstrap
statistical inference
If you’ve ever dived into resampling methods, you’ve likely come across the jackknife and the bootstrap. They both aim to help us estimate uncertainty or bias without…
May 4, 2025
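A quick side-by-side illustration of the two resamplers, estimating the standard error of a sample mean on synthetic data (the closed-form SE is printed as a sanity check):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200)
n = len(x)

# Jackknife: recompute the mean leaving out one observation at a time
jack = np.array([np.delete(x, i).mean() for i in range(n)])
se_jack = np.sqrt((n - 1) / n * np.sum((jack - jack.mean()) ** 2))

# Bootstrap: recompute the mean on samples drawn with replacement
boot = np.array([rng.choice(x, size=n, replace=True).mean() for _ in range(2000)])
se_boot = boot.std(ddof=1)

print(se_jack, se_boot, x.std(ddof=1) / np.sqrt(n))  # all three should be close
```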
 

The Roles of Covariates in Randomized Experiments

randomized experiments
causal inference
Properly implemented randomized experiments—such as randomized controlled trials (RCTs) and A/B tests—guarantee unbiased estimates of the causal effect of a treatment \(T\) o…
May 2, 2025
 

Causal Inference with Residualized Regressions

causal inference
linear models
The Frisch-Waugh-Lovell (FWL) theorem offers an elegant alternative to standard multivariate linear regression when estimating causal effects. Instead of running a full…
May 1, 2025
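A small numerical sketch of the FWL result using plain least squares on simulated data (variable names and the data-generating process are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 2))                 # covariates
T = 0.5 * X[:, 0] + rng.normal(size=n)      # treatment, correlated with X
y = 2.0 * T + X @ np.array([1.0, -1.0]) + rng.normal(size=n)

def ols_resid(a, B):
    """Residuals from regressing a on B (with an intercept)."""
    B1 = np.column_stack([np.ones(len(B)), B])
    coef, *_ = np.linalg.lstsq(B1, a, rcond=None)
    return a - B1 @ coef

# FWL: regress the residualized outcome on the residualized treatment
y_res, t_res = ols_resid(y, X), ols_resid(T, X)
beta_fwl = (t_res @ y_res) / (t_res @ t_res)

# Full multivariate regression for comparison
Z = np.column_stack([np.ones(n), T, X])
beta_full = np.linalg.lstsq(Z, y, rcond=None)[0][1]
print(beta_fwl, beta_full)  # numerically identical
```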
 

Causal vs. Predictive Modeling: Subtle, but Crucial Differences

causal inference
machine learning
It’s one of the most common mix-ups I see among data scientists—especially those coming from a machine learning background: confusing causal modeling with predictive…
Apr 30, 2025
 

The Two Types of Weights in Causal Inference

weights
causal inference
Causal inference fundamentally seeks to answer: What is the effect of a treatment or intervention? The challenge lies in ensuring that the comparison groups—treated versus…
Feb 28, 2025

Binscatter: A New Visual Tool for Data Analysis

correlation
In the realm of data visualization, the classical scatter plot has long been a staple for exploring bivariate relationships. However, as datasets grow larger and more…
Feb 9, 2025
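A bare-bones version of the idea, assuming a simple quantile-binning scheme; the full binscatter machinery goes further, but this captures the core of the plot:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.normal(size=50_000)
y = np.sin(x) + rng.normal(scale=0.5, size=x.size)

# Cut x into 20 equal-sized bins and plot the bin means of x and y
df = pd.DataFrame({"x": x, "y": y})
bins = df.groupby(pd.qcut(df["x"], q=20, duplicates="drop"), observed=True).mean()

plt.scatter(bins["x"], bins["y"])
plt.xlabel("x (bin means)")
plt.ylabel("y (bin means)")
plt.show()
```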
 

Filling in Missing Data with MCMC

missing data
Every dataset inevitably contains missing or incomplete values. Practitioners then face the dilemma of how to address these missing observations. A common approach, though…
Jan 31, 2025
 

The Limits of Semiparametric Models: The Efficiency Bound

statistical inference
semiparametric models
The efficiency bound is a cornerstone of the academic literature on semiparametric models, and it’s easy to see why. This bound quantifies the potential loss in efficiency…
Jan 22, 2025

The Limits of Nonparametric Models

statistical inference
nonparametric models
Nonparametric statistics offers a powerful toolkit for data analysis when the underlying data-generating process is too complex or unknown to be captured by parametric…
Jan 22, 2025
 

The Limits of Parametric Models: The Cramér-Rao Bound

statistical inference
parametric models
Obtaining the lowest possible variance is a primary goal for anyone working with statistical models. Efficiency (or precision), as the jargon goes, is a cornerstone of…
Jan 12, 2025
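In its simplest form, for an unbiased estimator \(\hat{\theta}\) based on \(n\) i.i.d. observations, the bound reads:

\[
\operatorname{Var}(\hat{\theta}) \ge \frac{1}{n\, I(\theta)},
\qquad
I(\theta) = \mathbb{E}\!\left[\left(\frac{\partial}{\partial \theta} \log f(X;\theta)\right)^{2}\right],
\]

where \(I(\theta)\) is the per-observation Fisher information.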

The Three Classes of Statistical Models

statistical models
Statistical modeling is among the most exciting elements of working with data. When mentoring junior data scientists, I never fail to see the spark in their eyes when our…
Jan 12, 2025
 

The Delta Method: Simplifying Confidence Intervals for Complex Estimators

statistical inference
You’ve likely encountered this scenario: you’ve calculated an estimate for a particular parameter, and now you require a confidence interval. Seems straightforward, doesn’t…
Jan 10, 2025
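The approximation at the heart of the method, for a smooth transformation \(g\) of an asymptotically normal estimator:

\[
\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{d} N(0, \sigma^{2})
\quad\Longrightarrow\quad
\sqrt{n}\,\bigl(g(\hat{\theta}) - g(\theta)\bigr) \xrightarrow{d} N\!\bigl(0,\; [g'(\theta)]^{2}\sigma^{2}\bigr),
\]

which yields the approximate interval \(g(\hat{\theta}) \pm z_{1-\alpha/2}\, |g'(\hat{\theta})|\, \hat{\sigma}/\sqrt{n}\).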
 

Stein’s Paradox: A Simple Illustration

statistical inference
paradox
In the realm of statistics, few findings are as counterintuitive and fascinating as Stein’s paradox. It defies our common sense about estimation and provides a glimpse into…
Jan 10, 2025

Mutual Information: What, Why, How, and When

correlation
When exploring dependencies between variables, the data scientist’s toolbox often relies on correlation measures to reveal relationships and potential patterns. But what if…
Jan 2, 2025
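A small sketch contrasting the Pearson correlation with an estimated mutual information on a nonmonotonic relationship (simulated data; scikit-learn's k-NN-based estimator is used for convenience):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(4)
x = rng.uniform(-np.pi, np.pi, size=5_000)
y = np.cos(x) + rng.normal(scale=0.2, size=x.size)   # strong but nonmonotonic dependence

print(np.corrcoef(x, y)[0, 1])                       # Pearson correlation is near 0
print(mutual_info_regression(x.reshape(-1, 1), y))   # mutual information is clearly positive
```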

Generating Variables with Predefined Correlation

correlation
Suppose you are working on a project where the relationship between two variables is influenced by an unobserved confounder, and you want to simulate data that reflects this…
Dec 20, 2024
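One common way to do this in the bivariate normal case, sketched with NumPy (the target correlation of 0.6 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, rho = 10_000, 0.6  # sample size and target Pearson correlation

# Draw from a bivariate normal with the desired correlation structure
cov = np.array([[1.0, rho],
                [rho, 1.0]])
x, y = rng.multivariate_normal(mean=[0, 0], cov=cov, size=n).T

print(np.corrcoef(x, y)[0, 1])  # close to 0.6
```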
 

Stratified Sampling with Continuous Variables

randomized experiments
causal inference
Stratified sampling is a foundational technique in survey design, ensuring that observations capture key characteristics of a population. By dividing the data into distinct…
Dec 18, 2024
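A minimal sketch of one approach: discretize the continuous variable into quantile-based strata and sample within each stratum (the income variable, number of strata, and sampling fraction are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({"income": rng.lognormal(mean=10, sigma=1, size=10_000)})

# Turn the continuous variable into decile strata, then sample within each
df["stratum"] = pd.qcut(df["income"], q=10, labels=False)
sample = df.groupby("stratum", group_keys=False).sample(frac=0.1, random_state=0)

print(sample["stratum"].value_counts().sort_index())  # balanced across strata
```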
 

Column-Sampling Bootstrap?

bootstrap
statistical inference
The bootstrap is a versatile resampling technique traditionally focused on rows. Let’s add a twist to the plain vanilla bootstrap. Imagine you have a wide dataset—many…
Dec 16, 2024
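A rough sketch of what resampling columns rather than rows might look like, under my own choice of statistic (a row-level composite score); this is not necessarily the scheme the post develops:

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 40))  # wide dataset: 200 rows, 40 columns

# Statistic: a composite score per row, defined as the mean across columns
def composite(mat):
    return mat.mean(axis=1)

# Column-sampling bootstrap: resample *columns* with replacement to gauge
# how sensitive the composite is to the particular set of columns observed
draws = []
for _ in range(1000):
    cols = rng.integers(0, X.shape[1], size=X.shape[1])
    draws.append(composite(X[:, cols]))
draws = np.array(draws)                 # shape (1000, 200)
se_per_row = draws.std(axis=0, ddof=1)  # column-bootstrap SE of each row's score
print(se_per_row[:5])
```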
 

The Bootstrap and its Limitations

bootstrap
statistical inference
The bootstrap is a powerful resampling technique used to estimate the sampling distribution of a statistic. By repeatedly drawing observations with replacement from the…
Dec 16, 2024

Simpson’s Paradox: A Simple Illustration

paradox
causal inference
Simpson’s paradox is one of the most counterintuitive phenomena in data analysis. It describes situations where a trend observed within groups disappears—or even…
Dec 6, 2024
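A tiny illustrative table (made-up numbers) in which the treated group wins within every stratum yet loses in the pooled comparison:

```python
import pandas as pd

# Treatment is used mostly in the harder stratum, which drives the reversal
df = pd.DataFrame({
    "stratum":   ["easy", "easy", "hard", "hard"],
    "group":     ["treated", "control", "treated", "control"],
    "successes": [90, 850, 300, 30],
    "trials":    [100, 1000, 500, 60],
})
df["rate"] = df["successes"] / df["trials"]
print(df)  # treated beats control within each stratum

pooled = df.groupby("group")[["successes", "trials"]].sum()
print(pooled["successes"] / pooled["trials"])  # ...but loses in the pooled data
```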
 

Causation without Correlation

causal inference
correlation
While most people understand that correlation doesn’t imply causation, it might surprise many to learn that causation doesn’t always result in correlation. In the absence of…
Nov 21, 2024
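A quick demonstration of the classic case, where the outcome is entirely caused by \(X\) yet the Pearson correlation is essentially zero:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=100_000)                     # the cause
y = x**2 + rng.normal(scale=0.1, size=x.size)    # outcome fully driven by x

print(np.corrcoef(x, y)[0, 1])  # ~0: causation without (linear) correlation
```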
 

Gradient Boosting Methods: A Brief Overview

machine learning
Gradient boosting has emerged as one of the most powerful techniques for predictive modeling. In its simplest form, we can think of gradient boosting as having a team of…
Nov 6, 2024

Bayesian Analysis of Randomized Experiments: A Modern Approach

bayesian methods
randomized experiments
Imagine you’re a data scientist evaluating an A/B test of a new recommendation algorithm. The results show a modest but promising 0.5% lift in conversion rate—up from \(8\%\)…
Oct 29, 2024
 

Weights in Statistical Analyses

weights
statistical inference
Weights in statistical analyses offer a way to assign varying importance to observations in a dataset. Although powerful, they can be quite confusing due to the various…
Sep 18, 2024
 

Causality without Experiments, Unconfoundedness, or Instruments

causal inference
instrumental variables
Causality is central to many practical data-related questions. Conventional methods for isolating causal relationships rely on experimentation, assume unconfoundedness, or…
Aug 12, 2024
 

FOCI: A New Variable Selection Method

variable selection
machine learning
In our data-abundant world, we often have access to tens, hundreds, or even thousands of variables. Most of these features are usually irrelevant or redundant, leading to…
Jun 11, 2024
 

Nonlinear Correlations and Chatterjee’s Coefficient

correlation
Much of data science is concerned with learning about the relationships between different variables. The most basic tool to quantify relationship strength is the correlation…
Apr 12, 2024
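A quick implementation of the no-ties version of Chatterjee's \(\xi\) for intuition (simulated data; a sketch, not production code):

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank coefficient (formula for data without ties)."""
    n = len(x)
    order = np.argsort(x)                           # sort observations by x
    ranks = np.argsort(np.argsort(y[order])) + 1    # ranks of y in that order
    return 1 - 3 * np.sum(np.abs(np.diff(ranks))) / (n**2 - 1)

rng = np.random.default_rng(9)
x = rng.uniform(-3, 3, size=2_000)
y = x**2 + rng.normal(scale=0.1, size=x.size)       # strong but nonmonotonic dependence
print(np.corrcoef(x, y)[0, 1], chatterjee_xi(x, y))  # Pearson ~0, xi clearly positive
```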
 

A Brief Introduction to Conformal Inference

machine learning
Traditional confidence intervals estimate the range in which a population parameter, such as a mean or regression coefficient, is likely to fall with a specified level of…
Dec 20, 2023
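A minimal split-conformal sketch for regression, using a generic scikit-learn model (the model choice, data, and 10% miscoverage level are arbitrary):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(10)
X = rng.uniform(-3, 3, size=(3_000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=len(X))

# Split conformal: fit on one half, calibrate absolute residuals on the other
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_fit, y_fit)

alpha = 0.1
resid = np.abs(y_cal - model.predict(X_cal))
q = np.quantile(resid, np.ceil((1 - alpha) * (len(resid) + 1)) / len(resid))

x_new = np.array([[1.0]])
pred = model.predict(x_new)[0]
print(pred - q, pred + q)  # ~90% prediction interval for a new observation
```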
 

Using Conformal Inference for Variable Importance in Machine Learning

machine learning
Many machine learning (ML) methods operate as opaque systems, generating predictions when given a dataset as input. Identifying which variables have the greatest impact on…
Dec 20, 2023
 

New Developments in False Discovery Rate

multiple testing
statistical inference
A while back I wrote an article summarizing various approaches to correcting for multiple hypothesis testing. The dominant framework, False Discovery Rate (FDR), controls…
Oct 27, 2023
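For reference, the standard Benjamini-Hochberg adjustment is a one-liner with statsmodels (simulated p-values):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(11)
# 90 null p-values plus 10 p-values from a genuine signal
pvals = np.concatenate([rng.uniform(size=90), rng.uniform(0, 0.001, size=10)])

reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(reject.sum(), "discoveries at FDR = 5%")
```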
 

ML-Based Regression Adjustments in Randomized Experiments

machine learning
randomized experiments
Randomized experiments are the gold standard for measuring causal relationships with data. In settings with small treatment effects or underpowered designs, a…
Aug 1, 2023
 

The Alphabet of Learners for Heterogeneous Treatment Effects

machine learning
randomized experiments
heterogeneous treatment effects
Numerous tales illustrate the inadequacy of the average to capture meaningful quantities. Statisticians love these. In my favorite one the protagonist places her head in a…
Jul 28, 2023
 

Lasso for Heterogeneous Treatment Effects Estimation

heterogeneous treatment effects
causal inference
Lasso is one of my favorite machine learning algorithms. It is so simple, elegant, and powerful. My feelings aside, Lasso indeed has a lot to offer. While, admittedly, it is…
Jun 30, 2023
 

An Overview of Machine Learning Methods in Causal Inference

machine learning
causal inference
The most exciting trend in causal inference over the last decade has been the infusion of machine learning (ML) techniques. Supervised machine learning is designed to find…
Apr 30, 2023
 

The Variance of Propensity Score Matching Estimators

propensity score
causal inference
Propensity score matching (PSM) is among the most popular methods for estimating causal effects with observational data. It lends its fame to both its power and simplicity.…
Mar 30, 2023
 

Correlation is a Cosine

correlation
statistical inference
You might have come across the statement, “correlation is a cosine,” but never taken the time to explore its precise meaning. It certainly sounds intriguing—how can the…
Feb 9, 2023
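The statement in symbols, for mean-centered data vectors:

\[
\operatorname{corr}(X, Y)
= \frac{\langle \tilde{x}, \tilde{y} \rangle}{\lVert \tilde{x} \rVert\, \lVert \tilde{y} \rVert}
= \cos\theta,
\qquad \tilde{x}_i = x_i - \bar{x},\quad \tilde{y}_i = y_i - \bar{y},
\]

where \(\theta\) is the angle between the two centered data vectors.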
 

Correlation is Not (Always) Transitive

correlation
statistical inference
At first, I found this really puzzling. \(X\) is correlated (Pearson) with \(Y\), and \(Y\) is correlated with \(Z\). Does this mean \(X\) is necessarily correlated with \(Z\)?…
Dec 22, 2022
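A quick counterexample: with \(Y = X + Z\) and \(X\), \(Z\) independent, \(Y\) correlates with both while \(X\) and \(Z\) do not correlate at all:

```python
import numpy as np

rng = np.random.default_rng(12)
n = 100_000
x = rng.normal(size=n)
z = rng.normal(size=n)   # independent of x
y = x + z                # correlated with both x and z

corr = lambda a, b: np.corrcoef(a, b)[0, 1]
print(corr(x, y), corr(y, z), corr(x, z))  # ~0.71, ~0.71, ~0
```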
 

Lord’s Paradox: A Simple Illustration

correlation
paradox
Lord’s paradox presents a fascinating challenge in causal inference and statistics. It highlights how different statistical methods applied to the same data can lead to…
Dec 18, 2022
 

Hypothesis Testing in Linear Machine Learning Models

hypothesis testing
machine learning
Machine learning models are an indispensable part of data science. They are incredibly good at what they are designed for – making excellent predictions. They fall short in…
Nov 6, 2022

Multiple Testing: Methods Overview

multiple testing
statistical inference
The abundance of data around us is a major factor making the data science field so attractive. It enables all kinds of impactful, interesting, or fun analyses. I admit this…
Oct 22, 2022
 

Hypothesis Testing with Population Data

hypothesis testing
statistical inference
Classical statistical theory is built on the idea of working with a sample of data from a given population of interest. Our software packages compute confidence intervals to…
Sep 23, 2022
 

Overlapping Confidence Intervals and Statistical (In)Significance

statistical inference
hypothesis testing
This is a mistake I’ve made myself—more times than I’d like to admit. Even seasoned professors and expert data scientists sometimes fall into the same trap.
Aug 12, 2022