Vasco Yasenov
Categories
All (42)
bayesian methods (1)
bootstrap (3)
causal inference (11)
correlation (8)
heterogeneous treatment effects (2)
hypothesis testing (3)
instrumental variables (1)
linear models (1)
machine learning (9)
missing data (1)
multiple testing (2)
nonparametric models (1)
paradox (3)
parametric models (1)
propensity score (1)
randomized experiments (5)
semiparametric models (1)
statistical inference (16)
statistical models (1)
variable selection (1)
weights (2)
Advanced Topics in Statistical Data Science
The Kolmogorov–Smirnov Test as a Goodness-of-Fit Test
statistical inference
The Kolmogorov–Smirnov (KS) test is a staple in the statistical toolbox for checking how well data fit a hypothesized distribution. It comes in both a one-sample and a…
May 5, 2025
Jackknife vs. Bootstrap: A Tale of Two Resamplers
bootstrap
statistical inference
If you’ve ever dived into resampling methods, you’ve likely come across the jackknife and the bootstrap. They both aim to help us estimate uncertainty or bias without…
May 4, 2025
The Roles of Covariates in Randomized Experiments
randomized experiments
causal inference
Properly implemented randomized experiments—such as randomized controlled trials (RCTs) and A/B tests—guarantee unbiased estimates of the causal effect of a treatment \(T\) o…
May 2, 2025
Causal Inference with Residualized Regressions
causal inference
linear models
The Frisch-Waugh-Lovell (FWL) theorem offers an elegant alternative to standard multivariate linear regression when estimating causal effects. Instead of running a full…
May 1, 2025
Causal vs. Predictive Modeling: Subtle, but Crucial Differences
causal inference
machine learning
It’s one of the most common mix-ups I see among data scientists—especially those coming from a machine learning background: confusing causal modeling with predictive…
Apr 30, 2025
The Two Types of Weights in Causal Inference
weights
causal inference
Causal inference fundamentally seeks to answer: What is the effect of a treatment or intervention? The challenge lies in ensuring that the comparison groups—treated versus…
Feb 28, 2025
Binscatter: A New Visual Tool for Data Analysis
correlation
In the realm of data visualization, the classical scatter plot has long been a staple for exploring bivariate relationships. However, as datasets grow larger and more…
Feb 9, 2025
Filling in Missing Data with MCMC
missing data
Every dataset inevitably contains missing or incomplete values. Practitioners then face the dilemma of how to address these missing observations. A common approach, though…
Jan 31, 2025
The Limits of Semiparametric Models: The Efficiency Bound
statistical inference
semiparametric models
The efficiency bound is a cornerstone of the academic literature on semiparametric models, and it’s easy to see why. This bound quantifies the potential loss in efficiency…
Jan 22, 2025
The Limits of Nonparametric Models
statistical inference
nonparametric models
Nonparametric statistics offers a powerful toolkit for data analysis when the underlying data-generating process is too complex or unknown to be captured by parametric…
Jan 22, 2025
The Limits of Parametric Models: The Cramér-Rao Bound
statistical inference
parametric models
Obtaining the lowest possible variance is a primary goal for anyone working with statistical models. Efficiency (or precision), as is the jargon, is a cornerstone of…
Jan 12, 2025
The Three Classes of Statistical Models
statistical models
Statistical modeling is among the most exciting elements of working with data. When mentoring junior data scientists, I never fail to see the spark in their eyes when our…
Jan 12, 2025
The Delta Method: Simplifying Confidence Intervals for Complex Estimators
statistical inference
You’ve likely encountered this scenario: you’ve calculated an estimate for a particular parameter, and now you require a confidence interval. Seems straightforward, doesn’t…
Jan 10, 2025
Stein’s Paradox: A Simple Illustration
statistical inference
paradox
In the realm of statistics, few findings are as counterintuitive and fascinating as Stein’s paradox. It defies our common sense about estimation and provides a glimpse into…
Jan 10, 2025
Mutual Information: What, Why, How, and When
correlation
When exploring dependencies between variables, the data scientist’s toolbox often relies on correlation measures to reveal relationships and potential patterns. But what if…
Jan 2, 2025
Generating Variables with Predefined Correlation
correlation
Suppose you are working on a project where the relationship between two variables is influenced by an unobserved confounder, and you want to simulate data that reflects this…
Dec 20, 2024
Stratified Sampling with Continuous Variables
randomized experiments
causal inference
Stratified sampling is a foundational technique in survey design, ensuring that observations capture key characteristics of a population. By dividing the data into distinct…
Dec 18, 2024
Column-Sampling Bootstrap?
bootstrap
statistical inference
The bootstrap is a versatile resampling technique traditionally focused on rows. Let’s add a twist to the plain vanilla bootstrap. Imagine you have a wide dataset—many…
Dec 16, 2024
The Bootstrap and its Limitations
bootstrap
statistical inference
The bootstrap is a powerful resampling technique used to estimate the sampling distribution of a statistic. By repeatedly drawing observations with replacement from the…
Dec 16, 2024
Simpson’s Paradox: A Simple Illustration
paradox
causal inference
Simpson’s paradox is one of the most counterintuitive phenomena in data analysis. It describes situations where a trend observed within groups disappears—or even…
Dec 6, 2024
Causation without Correlation
causal inference
correlation
While most people understand that correlation doesn’t imply causation, it might surprise many to learn that causation doesn’t always result in correlation. In the absence of…
Nov 21, 2024
Gradient Boosting Methods: A Brief Overview
machine learning
Gradient boosting has emerged as one of the most powerful techniques for predictive modeling. In its simplest form, we can think of gradient boosting like having a team of…
Nov 6, 2024
Bayesian Analysis of Randomized Experiments: A Modern Approach
bayesian methods
randomized experiments
Imagine you’re a data scientist evaluating an A/B test of a new recommendation algorithm. The results show a modest but promising 0.5% lift in conversion rate—up from \(8\%\)…
Oct 29, 2024
Weights in Statistical Analyses
weights
statistical inference
Weights in statistical analyses offer a way to assign varying importance to observations in a dataset. Although powerful, they can be quite confusing due to the various…
Sep 18, 2024
Causality without Experiments, Unconfoundedness, or Instruments
causal inference
instrumental variables
Causality is central to many practical data-related questions. Conventional methods for isolating causal relationships rely on experimentation, assume unconfoundedness, or…
Aug 12, 2024
FOCI: A New Variable Selection Method
variable selection
machine learning
In our data-abundant world, we often have access to tens, hundreds, or even thousands of variables. Most of these features are usually irrelevant or redundant, leading to…
Jun 11, 2024
Nonlinear Correlations and Chatterjee’s Coefficient
correlation
Much of data science is concerned with learning about the relationships between different variables. The most basic tool to quantify relationship strength is the correlation…
Apr 12, 2024
A Brief Introduction to Conformal Inference
machine learning
Traditional confidence intervals estimate the range in which a population parameter, such as a mean or regression coefficient, is likely to fall with a specified level of…
Dec 20, 2023
Using Conformal Inference for Variable Importance in Machine Learning
machine learning
Many machine learning (ML) methods operate as opaque systems, generating predictions when given a dataset as input. Identifying which variables have the greatest impact on…
Dec 20, 2023
New Developments in False Discovery Rate
multiple testing
statistical inference
A while back I wrote an article summarizing various approaches to correcting for multiple hypothesis testing. The dominant framework, False Discovery Rate (FDR), controls…
Oct 27, 2023
ML-Based Regression Adjustments in Randomized Experiments
machine learning
randomized experiments
Randomized experiments are the gold standard when interested in measuring causal relationships with data. In settings with small treatment effects or underpowered designs, a…
Aug 1, 2023
The Alphabet of Learners for Heterogeneous Treatment Effects
machine learning
randomized experiments
heterogeneous treatment effects
Numerous tales illustrate the inadequacy of the average to capture meaningful quantities. Statisticians love these. In my favorite one the protagonist places her head in a…
Jul 28, 2023
Lasso for Heterogeneous Treatment Effects Estimation
heterogeneous treatment effects
causal inference
Lasso is one of my favorite machine learning algorithms. It is so simple, elegant, and powerful. My feelings aside, Lasso indeed has a lot to offer. While, admittedly, it is…
Jun 30, 2023
An Overview of Machine Learning Methods in Causal Inference
machine learning
causal inference
The most exciting trend in causal inference over the last decade has been the infusion of machine learning (ML) techniques. Supervised machine learning is designed to find…
Apr 30, 2023
The Variance of Propensity Score Matching Estimators
propensity score
causal inference
Propensity score matching (PSM) is among the most popular methods for estimating causal effects with observational data. It owes its fame to both its power and its simplicity.…
Mar 30, 2023
Correlation is a Cosine
correlation
statistical inference
You might have come across the statement, “correlation is a cosine,” but never taken the time to explore its precise meaning. It certainly sounds intriguing—how can the…
Feb 9, 2023
Correlation is Not (Always) Transitive
correlation
statistical inference
At first, I found this really puzzling. \(X\) is correlated (Pearson) with \(Y\), and \(Y\) is correlated with \(Z\). Does this mean \(X\) is necessarily correlated with \(Z\)?…
Dec 22, 2022
Lord’s Paradox: A Simple Illustration
correlation
paradox
Lord’s paradox presents a fascinating challenge in causal inference and statistics. It highlights how different statistical methods applied to the same data can lead to…
Dec 18, 2022
Hypothesis Testing in Linear Machine Learning Models
hypothesis testing
machine learning
Machine learning models are an indispensable part of data science. They are incredibly good at what they are designed for – making excellent predictions. They fall short in…
Nov 6, 2022
Multiple Testing: Methods Overview
multiple testing
statistical inference
The abundance of data around us is a major factor making the data science field so attractive. It enables all kinds of impactful, interesting, or fun analyses. I admit this…
Oct 22, 2022
Hypothesis Testing with Population Data
hypothesis testing
statistical inference
Classical statistical theory is built on the idea of working with a sample of data from a given population of interest. Our software packages compute confidence intervals to…
Sep 23, 2022
Overlapping Confidence Intervals and Statistical (In)Significance
statistical inference
hypothesis testing
This is a mistake I’ve made myself—more times than I’d like to admit. Even seasoned professors and expert data scientists sometimes fall into the same trap.
Aug 12, 2022