<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Vasco Yasenov</title>
<link>https://vyasenov.github.io/blog/</link>
<atom:link href="https://vyasenov.github.io/blog/index.xml" rel="self" type="application/rss+xml"/>
<description>Personal Site of Vasco Yasenov</description>
<generator>quarto-1.9.37</generator>
<lastBuildDate>Mon, 06 Apr 2026 07:00:00 GMT</lastBuildDate>
<item>
  <title>6 Underrated Plot Types</title>
  <link>https://vyasenov.github.io/blog/six-plots-you-should-know.html</link>
  <description><![CDATA[ 





<div class="reading-time">7 min read</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>Most data science workflows rely on a familiar trio of plots: histograms, scatterplots, and boxplots. They are useful, but they leave a lot of structure hidden in the data.</p>
<p>There are several plots that statisticians use regularly but that rarely show up in typical data science notebooks. Many of these are extremely informative for diagnostics, distribution comparison, or exploring high-dimensional relationships.</p>
<p>In this post I’ll look at six of them. To keep things simple I will use the same dataset throughout: the classic <code>iris</code> dataset. The goal is not mathematical rigor but practical intuition and code you can reuse. All examples below are shown in <code>R</code> and <code>Python</code>.</p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<p>Let’s start by loading the data.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggridges)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># install.packages("ggridges")</span></span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(hexbin)    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># install.packages("hexbin")</span></span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(corrplot)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># install.packages("corrplot")</span></span>
<span id="cb1-5"></span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(iris)</span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> seaborn <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> sns</span>
<span id="cb2-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> scipy.stats <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> stats</span>
<span id="cb2-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.datasets <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> load_iris</span>
<span id="cb2-6"></span>
<span id="cb2-7">iris_bunch <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_iris(as_frame<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb2-8">iris <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris_bunch.frame.copy()</span>
<span id="cb2-9">iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"species"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"target"</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">map</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(iris_bunch.target_names)))</span>
<span id="cb2-10">iris <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris.rename(columns<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>{</span>
<span id="cb2-11">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sepal length (cm)"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sepal_length"</span>,</span>
<span id="cb2-12">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sepal width (cm)"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sepal_width"</span>,</span>
<span id="cb2-13">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"petal length (cm)"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"petal_length"</span>,</span>
<span id="cb2-14">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"petal width (cm)"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"petal_width"</span>,</span>
<span id="cb2-15">})</span></code></pre></div></div>
</div>
</div>
</div>
<section id="q-q-plot" class="level3">
<h3 class="anchored" data-anchor-id="q-q-plot">Q-Q Plot</h3>
<p>A Q-Q plot compares sample quantiles to theoretical quantiles from a reference distribution. In practice that reference is usually the normal distribution, which makes the plot a fast diagnostic for residual checks and distributional shape. If the points line up, the sample is broadly consistent with the reference. If they bend away from the line, that tells you where the mismatch lives: skewness shows up as asymmetric curvature, while heavy tails pull the extremes away from the line. One can also use Q-Q plots to compare two empirical distributions, but I’d argue there are better ways to do that.</p>
<p>What I like about Q-Q plots is that they force you to think about <em>where</em> a distribution departs from a model, not just whether a normality test rejects. The downside is that they are easy to overread in small samples and less useful if you do not have a meaningful reference distribution in mind. Unlike traditional statistical tests, Q-Q plots do not spit out a <img src="https://latex.codecogs.com/png.latex?p">-value, so you have to interpret the plot yourself.</p>
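<p>To make the mechanics concrete, here is a minimal standard-library Python sketch of the quantile pairs that a normal Q-Q plot draws. It assumes the plotting positions <code>(i - 0.5) / n</code>, which is one common convention; <code>stat_qq</code> and <code>stats.probplot</code> may use slightly different positions and reference lines. The helper <code>qq_points</code> and the synthetic sample are illustrative assumptions and are not taken from <code>iris</code>.</p>

```python
import random
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical, observed) quantile pairs against a standard normal.

    Hypothetical helper for illustration only.
    """
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n keep every probability strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, xs


rng = random.Random(0)
# Synthetic stand-in for a roughly normal measurement (assumption, not iris data).
sample = [rng.gauss(5.8, 0.8) for _ in range(150)]

theo, obs = qq_points(sample)

# One simple reference line: observed = mean + sd * theoretical.
m, s = mean(sample), stdev(sample)
max_gap = max(abs(o - (m + s * t)) for t, o in zip(theo, obs))
print(f"largest vertical gap from the reference line: {max_gap:.3f}")
```

<p>Plotting <code>theo</code> against <code>obs</code> gives the familiar picture. The largest gaps usually sit in the tails, which is where the plot tends to be most informative and also most noisy.</p>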
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="../files/qq-plot.png" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://vyasenov.github.io/files/qq-plot.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:85.0%" alt="Q-Q plot of iris sepal length"></a></p>
</figure>
</div>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(iris, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> Sepal.Length)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#66c2a5"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb3-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Q-Q Plot of Sepal Length"</span>,</span>
<span id="cb3-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Theoretical Quantiles"</span>,</span>
<span id="cb3-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sample Quantiles"</span></span>
<span id="cb3-9">  )</span></code></pre></div></div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">stats.probplot(iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sepal_length"</span>], dist<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"norm"</span>, plot<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>plt)</span>
<span id="cb4-2">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Q-Q Plot of Sepal Length"</span>)</span>
<span id="cb4-3">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Theoretical Quantiles"</span>)</span>
<span id="cb4-4">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sample Quantiles"</span>)</span>
<span id="cb4-5">plt.show()</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="violin-plot" class="level3">
<h3 class="anchored" data-anchor-id="violin-plot">Violin Plot</h3>
<p>A violin plot combines a boxplot with a kernel density estimate that is mirrored around a central axis. That makes it useful when a plain boxplot feels too compressed. Two groups can have similar medians and quartiles but very different shapes, and a violin plot makes that visible immediately. In the <code>iris</code> data, it is a quick way to see that species differ not only in central tendency but also in how concentrated or dispersed their sepal lengths are.</p>
<p>The main drawback is that the density is smoothed, so small samples can look more structured than they really are. The shape is also sensitive to the smoothing parameters, with the bandwidth mattering more than the kernel type, and the plot becomes noisy if you cram in too many categories. Still, when I want a compact distribution comparison across a handful of groups, violin plots are usually an upgrade over boxplots.</p>
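<p>The bandwidth sensitivity is easy to see with a hand-rolled Gaussian kernel density estimate. The sketch below uses only the Python standard library; <code>make_kde</code>, <code>count_modes</code>, the two bandwidth values, and the synthetic sample are all illustrative assumptions rather than what <code>geom_violin</code> or <code>sns.violinplot</code> do internally.</p>

```python
import math
import random


def make_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian KDE (hypothetical helper)."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample) / norm

    return density


def count_modes(sample, bandwidth, grid_size=400):
    """Count local maxima of the KDE on an evenly spaced grid."""
    lo = min(sample) - 3 * bandwidth
    hi = max(sample) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    kde = make_kde(sample, bandwidth)
    ys = [kde(lo + i * step) for i in range(grid_size)]
    return sum(
        1
        for i in range(1, grid_size - 1)
        if ys[i] > ys[i - 1] and ys[i] >= ys[i + 1]
    )


rng = random.Random(1)
# Small synthetic sample drawn from a single normal (assumption, not iris data).
sample = [rng.gauss(5.8, 0.8) for _ in range(30)]

narrow = count_modes(sample, bandwidth=0.05)  # very little smoothing
wide = count_modes(sample, bandwidth=0.8)     # heavy smoothing
print("modes with narrow bandwidth:", narrow)
print("modes with wide bandwidth:", wide)
```

<p>The data are identical in both calls, so any difference in the number of bumps comes from the bandwidth alone. That is the reason to be careful about reading fine structure into a violin drawn from a small sample.</p>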
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="../files/violin-plot.png" class="lightbox" data-gallery="quarto-lightbox-gallery-2"><img src="https://vyasenov.github.io/files/violin-plot.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:85.0%" alt="Violin plot of iris sepal length by species"></a></p>
</figure>
</div>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(iris, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Species, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Sepal.Length, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Species)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_violin</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">trim =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_boxplot</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.12</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">outlier.shape =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Violin Plot of Sepal Length by Species"</span>)</span></code></pre></div></div>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1">sns.violinplot(data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>iris, x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"species"</span>, y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sepal_length"</span>, inner<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"box"</span>)</span>
<span id="cb6-2">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Violin Plot of Sepal Length by Species"</span>)</span>
<span id="cb6-3">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span>
<span id="cb6-4">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sepal Length"</span>)</span>
<span id="cb6-5">plt.show()</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="ecdf-plot" class="level3">
<h3 class="anchored" data-anchor-id="ecdf-plot">ECDF Plot</h3>
<p>The empirical cumulative distribution function shows the share of observations less than or equal to a given value. That sounds modest, but it is one of the cleanest ways to compare distributions because it avoids arbitrary bin choices and displays the full sample directly. When one ECDF sits entirely to the right of another (equivalently, at or below it at every value), you can read that as a first-order stochastic dominance story, at least visually.</p>
<p>The ECDF is defined as</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7BF%7D_n(x)%20=%20%5Cfrac%7B1%7D%7Bn%7D%20%5Csum_%7Bi=1%7D%5En%201(X_i%20%5Cle%20x).%0A"></p>
<p>Recall that the PDF is the derivative of the CDF, which is part of why the CDF is so central to probability theory and to understanding any variable at hand. In microeconomic theory, comparisons of CDFs are what define stochastic dominance relationships. I like ECDFs because they are honest. They show every observation’s contribution to the distribution without smoothing it away. The tradeoff is that they are less familiar to many audiences and can look busy when too many groups are overlaid. For side-by-side distribution comparison, though, they are hard to beat.</p>
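<p>The ECDF definition translates almost directly into code. The following standard-library Python sketch builds the step function with <code>bisect_right</code> and checks weak first-order dominance at the observed values, which is enough because an ECDF only changes at data points. The helpers <code>ecdf</code> and <code>dominates</code> and the two tiny samples are made up for illustration.</p>

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F_hat(x): the share of observations less than or equal to x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def dominates(a, b):
    """True if the ECDF of a never exceeds the ECDF of b (a sits to the right).

    This is a weak check: ties between the two curves are allowed.
    """
    F_a, F_b = ecdf(a), ecdf(b)
    points = sorted(set(a) | set(b))
    return all(F_b(x) >= F_a(x) for x in points)


# Made-up values for illustration; they are not taken from iris.
short = [4.9, 5.0, 5.1, 5.4]
long_ = [5.9, 6.3, 6.5, 7.1]

F = ecdf(short)
print(F(5.0))  # two of the four values are at or below 5.0
print(dominates(long_, short))
print(dominates(short, long_))
```

<p>With real data the curves often cross, in which case <code>dominates</code> returns <code>False</code> in both directions and neither group dominates the other.</p>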
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="../files/ecdf-plot.png" class="lightbox" data-gallery="quarto-lightbox-gallery-3"><img src="https://vyasenov.github.io/files/ecdf-plot.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:85.0%" alt="ECDF plot of iris sepal length by species"></a></p>
</figure>
</div>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(iris, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(Sepal.Length, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> Species)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_ecdf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb7-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ECDF of Sepal Length by Species"</span>,</span>
<span id="cb7-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sepal Length"</span>,</span>
<span id="cb7-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Empirical CDF"</span></span>
<span id="cb7-8">  )</span></code></pre></div></div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1">sns.ecdfplot(data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>iris, x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sepal_length"</span>, hue<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"species"</span>)</span>
<span id="cb8-2">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ECDF of Sepal Length by Species"</span>)</span>
<span id="cb8-3">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sepal Length"</span>)</span>
<span id="cb8-4">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Empirical CDF"</span>)</span>
<span id="cb8-5">plt.show()</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="ridgeline-plot" class="level3">
<h3 class="anchored" data-anchor-id="ridgeline-plot">Ridgeline Plot</h3>
<p>Ridgeline plots stack several density curves vertically, which makes them especially useful when you want to compare many related distributions at once. The distributions need to share roughly the same scale, however, for the plot to make sense. They are common in cohort analysis and time-based comparisons, but they also work well for grouped exploratory analysis like the species differences in <code>iris</code>.</p>
<p>Their advantage is compactness: you can compare several distributions without the visual clutter of heavy overlap. Their weakness is that they are still density plots, so the same caution about smoothing applies. I use ridgelines when I want a plot that is more expressive than small multiples but less chaotic than overlaying five or six densities in one panel.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="../files/ridgeline-plot.png" class="lightbox" data-gallery="quarto-lightbox-gallery-4"><img src="https://vyasenov.github.io/files/ridgeline-plot.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:85.0%" alt="Ridgeline plot of iris sepal length by species"></a></p>
</figure>
</div>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-5-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-1" aria-controls="tabset-5-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-2" aria-controls="tabset-5-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-5-1" class="tab-pane active" aria-labelledby="tabset-5-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(iris, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> Sepal.Length, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> Species, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> Species)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_density_ridges</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_ridges</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb9-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Ridgeline Plot of Sepal Length by Species"</span>,</span>
<span id="cb9-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sepal Length"</span>,</span>
<span id="cb9-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span></span>
<span id="cb9-8">  )</span></code></pre></div></div>
</div>
<div id="tabset-5-2" class="tab-pane" aria-labelledby="tabset-5-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1">species_order <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"setosa"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"versicolor"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"virginica"</span>]</span>
<span id="cb10-2">x_grid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.linspace(iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sepal_length"</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>,</span>
<span id="cb10-3">                     iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sepal_length"</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span>)</span>
<span id="cb10-4">offsets <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>]</span>
<span id="cb10-5"></span>
<span id="cb10-6">fig, ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplots(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb10-7"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> offset, species <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(offsets, species_order):</span>
<span id="cb10-8">    subset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris.loc[iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"species"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> species, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sepal_length"</span>]</span>
<span id="cb10-9">    kde <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stats.gaussian_kde(subset)</span>
<span id="cb10-10">    density <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> kde(x_grid)</span>
<span id="cb10-11">    density <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> density <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> density.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span></span>
<span id="cb10-12">    ax.fill_between(x_grid, offset, offset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> density, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>)</span>
<span id="cb10-13">    ax.plot(x_grid, offset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> density, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>)</span>
<span id="cb10-14">    ax.text(x_grid.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.02</span>, offset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.12</span>, species, ha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"right"</span>)</span>
<span id="cb10-15"></span>
<span id="cb10-16">ax.set_title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Ridgeline Plot of Sepal Length by Species"</span>)</span>
<span id="cb10-17">ax.set_xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sepal Length"</span>)</span>
<span id="cb10-18">ax.set_yticks([])</span>
<span id="cb10-19">plt.show()</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="hexbin-plot" class="level3">
<h3 class="anchored" data-anchor-id="hexbin-plot">Hexbin Plot</h3>
<p>Scatterplots are great until they are not. Have you tried a scatterplot with a million points? It is slow to render and hard to read. Once the sample gets large enough, overplotting hides the very structure you want to see. Hexbin plots solve that by aggregating points into small hexagonal cells and coloring those cells by count. You give up the exact point cloud, but in return you get a much clearer view of where the data are concentrated.</p>
<p>The <code>iris</code> data are too small to truly need a hexbin, which is worth saying out loud. But the plot still illustrates the logic well. On genuinely large datasets, this is often the right substitute for a scatterplot. The cost is that rare points and local outliers become less visible, so it is better for density structure than for point-level inspection.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="../files/hexbin-plot.png" class="lightbox" data-gallery="quarto-lightbox-gallery-5"><img src="https://vyasenov.github.io/files/hexbin-plot.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:85.0%" alt="Hexbin plot of iris sepal length and petal length"></a></p>
</figure>
</div>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-6-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-1" aria-controls="tabset-6-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-6-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-2" aria-controls="tabset-6-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-6-1" class="tab-pane active" aria-labelledby="tabset-6-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(iris, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(Sepal.Length, Petal.Length)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hex</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb11-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hexbin Plot of Sepal vs Petal Length"</span>,</span>
<span id="cb11-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sepal Length"</span>,</span>
<span id="cb11-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal Length"</span></span>
<span id="cb11-8">  )</span></code></pre></div></div>
</div>
<div id="tabset-6-2" class="tab-pane" aria-labelledby="tabset-6-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1">plt.hexbin(</span>
<span id="cb12-2">    iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sepal_length"</span>],</span>
<span id="cb12-3">    iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"petal_length"</span>],</span>
<span id="cb12-4">    gridsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>,</span>
<span id="cb12-5">    cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"YlOrRd"</span></span>
<span id="cb12-6">)</span>
<span id="cb12-7">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sepal Length"</span>)</span>
<span id="cb12-8">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal Length"</span>)</span>
<span id="cb12-9">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hexbin Plot of Sepal vs Petal Length"</span>)</span>
<span id="cb12-10">plt.colorbar(label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Count"</span>)</span>
<span id="cb12-11">plt.show()</span></code></pre></div></div>
</div>
</div>
</div>
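<p>To see the aggregation at work on a sample large enough to overplot, here is a minimal Python sketch. The simulated data, the sample size, and the <code>gridsize</code> value are assumptions chosen for illustration; they are not part of the <code>iris</code> example above.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Simulated data (an assumption for illustration, not the iris measurements)
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)

fig, ax = plt.subplots(figsize=(7, 5))
hb = ax.hexbin(x, y, gridsize=40, cmap="YlOrRd")
fig.colorbar(hb, ax=ax, label="Count")

# hexbin returns a collection whose array holds the per-cell counts
counts = np.asarray(hb.get_array())
print(f"{n} points drawn as {counts.size} hexagons; busiest cell holds {int(counts.max())}")
plt.close(fig)
```

<p>The plot draws one polygon per cell rather than one marker per observation, which is why it stays readable and fast as the sample grows.</p>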
</section>
<section id="corrgram" class="level3">
<h3 class="anchored" data-anchor-id="corrgram">Corrgram</h3>
<p>A corrgram turns a correlation matrix into something you can actually read. Before fitting a regression, building a clustering pipeline, or running PCA, I almost always want to know which variables are moving together and which are largely independent. A corrgram gives that answer in a single glance.</p>
<p>The upside is speed: strong blocks, redundancies, and likely multicollinearity jump out immediately. The downside is that (Pearson) correlation is a blunt summary. It only captures linear association, ignores conditional relationships, and can be badly distorted by outliers. The same plot can be built from rank-based measures such as Spearman or Kendall correlation, which respond to monotone rather than strictly linear association and are less sensitive to outliers. Corrgrams also become hard to read once the number of variables gets large. So I treat corrgrams as a screening device, not as evidence of mechanism. Used that way, they are extremely effective.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="../files/corrgram.png" class="lightbox" data-gallery="quarto-lightbox-gallery-6"><img src="https://vyasenov.github.io/files/corrgram.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:80.0%" alt="Corrgram of iris measurements"></a></p>
</figure>
</div>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-7-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-1" aria-controls="tabset-7-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-7-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-2" aria-controls="tabset-7-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-7-1" class="tab-pane active" aria-labelledby="tabset-7-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">corr_matrix <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cor</span>(iris[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>])</span>
<span id="cb13-2"></span>
<span id="cb13-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">corrplot</span>(</span>
<span id="cb13-4">  corr_matrix,</span>
<span id="cb13-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"color"</span>,</span>
<span id="cb13-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"upper"</span>,</span>
<span id="cb13-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tl.col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>,</span>
<span id="cb13-8">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tl.srt =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">45</span></span>
<span id="cb13-9">)</span></code></pre></div></div>
</div>
<div id="tabset-7-2" class="tab-pane" aria-labelledby="tabset-7-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1">corr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris.drop(columns<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"species"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"target"</span>]).corr()</span>
<span id="cb14-2"></span>
<span id="cb14-3">sns.heatmap(corr, annot<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"RdBu_r"</span>, center<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb14-4">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Correlation Matrix (Corrgram)"</span>)</span>
<span id="cb14-5">plt.show()</span></code></pre></div></div>
</div>
</div>
</div>
<p>In the <code>iris</code> data, the corrgram immediately tells you that petal length and petal width are carrying very similar information. That is exactly the kind of thing you want to know before moving on to feature engineering, PCA, or a predictive model.</p>
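<p>The caveat about outliers is easy to make concrete. The sketch below uses a toy vector (an assumption for illustration, not the <code>iris</code> data) in which twenty points lie exactly on a line and one of them is then replaced by an extreme value. The helper functions <code>pearson</code> and <code>spearman</code> are hypothetical names defined here; the rank-based version is included only as a point of comparison.</p>

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    """Spearman correlation as Pearson on ranks (valid here because there are no ties)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return pearson(rank_a, rank_b)

# Toy data: a perfect linear relationship with one corrupted observation
x = np.arange(1.0, 21.0)
y = x.copy()
y[-1] = -100.0

print(f"Pearson:  {pearson(x, y):.2f}")
print(f"Spearman: {spearman(x, y):.2f}")
```

<p>A single bad point is enough to push the Pearson coefficient slightly below zero (roughly -0.17), while the rank-based coefficient stays around 0.71. A corrgram cell built from the first number would hide a relationship that holds for nineteen of the twenty observations.</p>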
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>Q-Q plots are among the fastest ways to diagnose whether a distributional assumption is wrong and where it fails.</li>
<li>Violin plots and ECDFs are often better than boxplots and histograms when the goal is comparing full distributions across groups.</li>
<li>Ridgeline plots are excellent for compact multi-group distribution comparisons, especially when overlaid densities start to look messy.</li>
<li>Hexbin plots are the right replacement for scatterplots once overplotting becomes a real problem.</li>
<li>Corrgrams are simple but high-value screening tools before modeling, especially when redundancy and multicollinearity are on the table.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>Wilke’s <em>Fundamentals of Data Visualization</em> is what I have on my bookshelf, but I admit I don’t reach for it very often. Novice data scientists will surely benefit from it, though.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Wilke, C. O. (2019). <em>Fundamentals of Data Visualization</em>. O’Reilly Media.</p>


</section>

 ]]></description>
  <category>correlation</category>
  <category>statistical inference</category>
  <guid>https://vyasenov.github.io/blog/six-plots-you-should-know.html</guid>
  <pubDate>Mon, 06 Apr 2026 07:00:00 GMT</pubDate>
</item>
<item>
  <title>The Many Flavors of Principal Component Analysis</title>
  <link>https://vyasenov.github.io/blog/flavors-pca.html</link>
  <description><![CDATA[ 





<div class="reading-time">7 min read</div>
<!-- this is for social media sharing buttons -->
<div class="sharethis-inline-share-buttons pt-5">

</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>Principal component analysis (PCA) is one of those methods that everyone learns early and then quietly keeps using for years. The appeal is obvious: take a high-dimensional data matrix, rotate it into orthogonal directions of maximum variance, and keep only the first few directions. That gives you compression, visualization, denoising, and sometimes a useful preprocessing step for downstream models.</p>
<p>The common misconception is that PCA is a generic tool for finding the “most important” variables or the “true latent factors” in the data. It is neither. Classical PCA finds directions of high variance. That is often useful, but it is not the same thing as finding predictive features, interpretable components, or nonlinear structure. Once you keep that distinction straight, the many PCA variants make much more sense: each flavor modifies classical PCA to target a different practical goal.</p>
<p>In this post I will use the standard PCA formulation as the baseline and then focus on four variants that I think matter most in applied work. The goal is to get a broad sense of some of the most popular ways PCA has evolved over the years.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Let <img src="https://latex.codecogs.com/png.latex?X%20%5Cin%20%5Cmathbb%7BR%7D%5E%7Bn%20%5Ctimes%20p%7D"> be a data matrix with rows as observations and columns as variables. Assume the columns have been centered so that</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B1%7D%7Bn%7D%5Csum_%7Bi=1%7D%5En%20X_%7Bij%7D=0%20%5Cqquad%20%5Ctext%7Bfor%20%7D%20j=1,%5Cdots,p.%0A"></p>
<p>When variables are on very different scales, it is often better to standardize them as well and work with the correlation matrix rather than the covariance matrix. I will write the empirical covariance matrix as</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AS%20=%20%5Cfrac%7B1%7D%7Bn%7DX'X.%0A"></p>
<p>The first principal component loading vector <img src="https://latex.codecogs.com/png.latex?v_1%20%5Cin%20%5Cmathbb%7BR%7D%5Ep"> solves</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Av_1%20=%20%5Carg%5Cmax_%7B%5C%7Cv%5C%7C_2=1%7D%20v'Sv.%0A"></p>
<p>Subsequent components solve the same problem subject to orthogonality constraints. If <img src="https://latex.codecogs.com/png.latex?V_k%20=%20%5Bv_1,%5Cdots,v_k%5D">, the corresponding score matrix is</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AZ_k%20=%20XV_k.%0A"></p>
<p>Equivalently, if <img src="https://latex.codecogs.com/png.latex?X%20=%20UDV'"> is the singular value decomposition, the columns of <img src="https://latex.codecogs.com/png.latex?V"> are the loading vectors and the diagonal entries of <img src="https://latex.codecogs.com/png.latex?D%5E2/n"> are the explained variances.</p>
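<p>The equivalence between the eigendecomposition of the covariance matrix and the singular value decomposition of the centered data can be checked numerically. The following is a minimal sketch on an arbitrary simulated matrix (the data, the seed, and the choice of two components are assumptions for illustration), using the 1/n convention from the notation above.</p>

```python
import numpy as np

rng = np.random.default_rng(1988)
n, p, k = 300, 8, 2

# Arbitrary correlated columns, then center each column
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
X = X - X.mean(axis=0)

# Route 1: eigenvalues of the empirical covariance matrix S = X'X / n
S = X.T @ X / n
eigvals = np.linalg.eigvalsh(S)[::-1]  # sorted in descending order

# Route 2: singular values of X, with explained variances d^2 / n
U, d, Vt = np.linalg.svd(X, full_matrices=False)
explained_var = d**2 / n

# Scores from the first k loading vectors: Z_k = X V_k = U_k D_k
V_k = Vt[:k].T
Z_k = X @ V_k

print(np.allclose(eigvals, explained_var))
print(np.allclose(Z_k, U[:, :k] * d[:k]))
```

<p>Both checks should print <code>True</code>. Note that some software divides by n - 1 instead of n; that rescales every explained variance by the same factor and leaves the loading vectors unchanged.</p>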
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="classical-pca" class="level3">
<h3 class="anchored" data-anchor-id="classical-pca">Classical PCA</h3>
<p>Classical PCA is the benchmark because its optimization problem is clean and its geometry is transparent. The first component is the unit vector that captures the most sample variance; the second is the best such vector orthogonal to the first; and so on. If the singular values of <img src="https://latex.codecogs.com/png.latex?X"> are <img src="https://latex.codecogs.com/png.latex?d_1%20%5Cge%20%5Ccdots%20%5Cge%20d_r">, then the proportion of variance explained by the first <img src="https://latex.codecogs.com/png.latex?k"> components is</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5Csum_%7Bj=1%7D%5Ek%20d_j%5E2%7D%7B%5Csum_%7Bj=1%7D%5Er%20d_j%5E2%7D.%0A"></p>
<p>In practice, two issues matter more than the derivation. First, PCA is extremely sensitive to scaling. If one variable is measured in dollars and another in percentages, the dollar variable may dominate the first component unless the data are standardized. Second, variance is not the same thing as signal. A noisy feature with large variance can easily drive the first component. I treat classical PCA as a compression tool, not as an automatic discovery engine.</p>
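<p>The scaling issue is easy to reproduce. In the sketch below, two simulated variables share the same latent driver but are recorded in very different units; the variable names <code>income</code> and <code>rate</code>, the units, and the helper <code>first_loading</code> are all hypothetical and chosen only for illustration.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)  # shared latent driver

# Two equally informative variables on very different scales (hypothetical units)
income = 50_000 + 10_000 * (z + rng.normal(scale=0.5, size=n))  # dollars
rate = 0.05 + 0.01 * (z + rng.normal(scale=0.5, size=n))        # fractions

X = np.column_stack([income, rate])
X_centered = X - X.mean(axis=0)
X_standardized = X_centered / X_centered.std(axis=0)

def first_loading(M):
    """First right singular vector of a centered matrix, i.e. the first PCA loading."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[0]

v_raw = first_loading(X_centered)
v_std = first_loading(X_standardized)
print("covariance-based loading: ", np.round(np.abs(v_raw), 3))
print("correlation-based loading:", np.round(np.abs(v_std), 3))
```

<p>Without standardization the first component is essentially the dollar variable alone. After standardization the two variables receive equal weight, which is what you would expect given that they were generated symmetrically.</p>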
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(stats)</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb1-4">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span></span>
<span id="cb1-5">p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span></span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate a low-rank signal with two latent factors</span></span>
<span id="cb1-8">latent_factors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb1-9">loadings_true <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb1-10">  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>,</span>
<span id="cb1-11">  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>,</span>
<span id="cb1-12">  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>,</span>
<span id="cb1-13">  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>,</span>
<span id="cb1-14">  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>,</span>
<span id="cb1-15">  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>,</span>
<span id="cb1-16"> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>,</span>
<span id="cb1-17">  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span></span>
<span id="cb1-18">), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nrow =</span> p, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">byrow =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb1-19"></span>
<span id="cb1-20">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> latent_factors <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(loadings_true) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> p, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>), n, p)</span>
<span id="cb1-21"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colnames</span>(X) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"feature_"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>p)</span>
<span id="cb1-22"></span>
<span id="cb1-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Standardize before PCA because variables may be on different scales</span></span>
<span id="cb1-24">pca_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prcomp</span>(X, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">center =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scale. =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb1-25"></span>
<span id="cb1-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Explained variance ratio</span></span>
<span id="cb1-27">explained_var <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> pca_fit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>sdev<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(pca_fit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>sdev<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb1-28"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(explained_var[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>], <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb1-29"></span>
<span id="cb1-30"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># First two loading vectors</span></span>
<span id="cb1-31"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(pca_fit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>rotation[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>], <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.decomposition <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> PCA</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.preprocessing <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> StandardScaler</span>
<span id="cb2-4"></span>
<span id="cb2-5">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb2-6">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span></span>
<span id="cb2-7">p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span></span>
<span id="cb2-8"></span>
<span id="cb2-9">latent_factors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb2-10">loadings_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array([</span>
<span id="cb2-11">    [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>],</span>
<span id="cb2-12">    [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>],</span>
<span id="cb2-13">    [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>],</span>
<span id="cb2-14">    [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>],</span>
<span id="cb2-15">    [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>],</span>
<span id="cb2-16">    [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>],</span>
<span id="cb2-17">    [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>],</span>
<span id="cb2-18">    [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>,  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>],</span>
<span id="cb2-19">])</span>
<span id="cb2-20"></span>
<span id="cb2-21">X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> latent_factors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> loadings_true.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n, p) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span></span>
<span id="cb2-22"></span>
<span id="cb2-23">scaler <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> StandardScaler()</span>
<span id="cb2-24">X_std <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> scaler.fit_transform(X)</span>
<span id="cb2-25"></span>
<span id="cb2-26">pca <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> PCA(n_components<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span>
<span id="cb2-27">pca.fit(X_std)</span>
<span id="cb2-28"></span>
<span id="cb2-29"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Explained variance ratio:"</span>, np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(pca.explained_variance_ratio_, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span>
<span id="cb2-30"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"First two loading vectors:</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>, np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(pca.components_[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>].T, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span></code></pre></div></div>
</div>
</div>
</div>
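<p>The explained variance ratio in the code above is just each component variance divided by the total. As a minimal sketch of that arithmetic, assuming only <code>NumPy</code> and the same simulated two-factor design (the generator <code>rng</code> and the names <code>R</code> and <code>s</code> are illustrative choices, not part of the original code), the ratios can be computed directly from the eigenvalues of the correlation matrix and cross-checked against the singular values of the standardized data:</p>

```python
import numpy as np

rng = np.random.default_rng(1988)
n, p = 300, 8

# Same two-factor loading pattern as in the simulation above
loadings_true = np.array([
    [0.9, 0.0], [0.8, 0.1], [0.7, -0.1], [0.6, 0.2],
    [0.0, 0.8], [0.1, 0.7], [-0.1, 0.6], [0.2, 0.5],
])
X = rng.standard_normal((n, 2)) @ loadings_true.T + 0.3 * rng.standard_normal((n, p))

# Standardize columns (population standard deviation, matching the 1/n below)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenvalues of the correlation matrix are the component variances
R = X_std.T @ X_std / n
eigvals = np.linalg.eigvalsh(R)[::-1]      # eigvalsh is ascending, so reverse
explained_var = eigvals / eigvals.sum()
print("Explained variance ratio:", np.round(explained_var[:4], 3))

# The squared singular values of X_std give the same ratios
s = np.linalg.svd(X_std, compute_uv=False)
print("Matches SVD route:", np.allclose(explained_var, s**2 / np.sum(s**2)))
```

<p>Because the data are standardized, the eigenvalues sum to <code>p</code>, so each ratio can also be read as an eigenvalue divided by the number of variables.</p>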
</section>
<section id="sparse-pca" class="level3">
<h3 class="anchored" data-anchor-id="sparse-pca">Sparse PCA</h3>
<p>Sparse PCA modifies the loading vectors so that many coordinates are exactly zero. A convenient way to write the idea is</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmax_%7B%5C%7Cv%5C%7C_2=1%7D%20v'Sv%0A%5Cqquad%0A%5Ctext%7Bsubject%20to%20%7D%20%5C%7Cv%5C%7C_1%20%5Cle%20c,%0A"></p>
<p>or, equivalently, with an <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> penalty on the loadings. The point is not to improve the mathematics of PCA. The point is to make the components readable.</p>
<p>This matters when <img src="https://latex.codecogs.com/png.latex?p"> is large and the classical loading vectors spread small weight across almost every variable. In genomics, marketing, or text applications, such dense components are often hard to interpret substantively. Sparse PCA forces each component to be built from a smaller set of variables. The tradeoff is that you lose some explained variance, the loading vectors are generally no longer orthogonal (and the component scores can be correlated), and the components can be more sensitive to tuning choices. In practice, I reach for Sparse PCA when interpretation matters at least as much as compression.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># install.packages("elasticnet")</span></span>
<span id="cb3-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(elasticnet)</span>
<span id="cb3-3"></span>
<span id="cb3-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb3-5">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span></span>
<span id="cb3-6">p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span></span>
<span id="cb3-7"></span>
<span id="cb3-8">latent_factor <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n)</span>
<span id="cb3-9">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(</span>
<span id="cb3-10">  latent_factor <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>),</span>
<span id="cb3-11">  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> latent_factor <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>),</span>
<span id="cb3-12">  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> latent_factor <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>),</span>
<span id="cb3-13">  <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> latent_factor <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>),</span>
<span id="cb3-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)), n, p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span>
<span id="cb3-15">)</span>
<span id="cb3-16"></span>
<span id="cb3-17">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale</span>(X)</span>
<span id="cb3-18"></span>
<span id="cb3-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Ask for two sparse components with at most 4 nonzero loadings each</span></span>
<span id="cb3-20">spca_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">spca</span>(X, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">K =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"predictor"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sparse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"varnum"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">para =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb3-21"></span>
<span id="cb3-22"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(spca_fit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>loadings[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>], <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div></div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb4-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.decomposition <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> SparsePCA</span>
<span id="cb4-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.preprocessing <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> StandardScaler</span>
<span id="cb4-4"></span>
<span id="cb4-5">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb4-6">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span></span>
<span id="cb4-7">p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span></span>
<span id="cb4-8"></span>
<span id="cb4-9">latent_factor <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(n)</span>
<span id="cb4-10">X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.column_stack([</span>
<span id="cb4-11">    latent_factor <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>,</span>
<span id="cb4-12">    <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> latent_factor <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>,</span>
<span id="cb4-13">    <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> latent_factor <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>,</span>
<span id="cb4-14">    <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> latent_factor <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>,</span>
<span id="cb4-15">    np.random.randn(n, p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span>
<span id="cb4-16">])</span>
<span id="cb4-17"></span>
<span id="cb4-18">X_std <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> StandardScaler().fit_transform(X)</span>
<span id="cb4-19"></span>
<span id="cb4-20">spca <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> SparsePCA(n_components<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, random_state<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb4-21">spca.fit(X_std)</span>
<span id="cb4-22"></span>
<span id="cb4-23"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sparse loadings:</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>, np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(spca.components_.T, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span></code></pre></div></div>
</div>
</div>
</div>
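<p>To see how an <img src="https://latex.codecogs.com/png.latex?%5Cell_1">-type penalty produces exact zeros, here is a minimal sketch of a soft-thresholded power iteration for the first sparse loading vector. This is a simplified heuristic, not the algorithm behind <code>elasticnet::spca</code> or <code>SparsePCA</code>; the function names <code>soft_threshold</code> and <code>sparse_pc1</code> and the threshold <code>lam = 0.8</code> are hypothetical choices made for illustration only.</p>

```python
import numpy as np

def soft_threshold(a, lam):
    """Shrink each entry toward zero by lam; entries below lam in size become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_pc1(X, lam, n_iter=200):
    """First sparse loading vector via a soft-thresholded power iteration (illustrative)."""
    S = X.T @ X / (X.shape[0] - 1)           # sample covariance of centered X
    v = np.linalg.eigh(S)[1][:, -1]          # start from the ordinary first loading vector
    for _ in range(n_iter):
        w = soft_threshold(S @ v, lam)       # power step followed by shrinkage
        norm = np.linalg.norm(w)
        if norm == 0:                        # lam too large: every loading was shrunk to zero
            return w
        v = w / norm                         # renormalize to unit length
    return v

# Same design as above: 4 variables driven by one latent factor, 8 pure-noise variables
rng = np.random.default_rng(1988)
n, p = 200, 12
latent = rng.standard_normal(n)
X = np.column_stack([
    latent + 0.2 * rng.standard_normal(n),
    0.9 * latent + 0.2 * rng.standard_normal(n),
    0.8 * latent + 0.2 * rng.standard_normal(n),
    0.7 * latent + 0.2 * rng.standard_normal(n),
    rng.standard_normal((n, p - 4)),
])
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

v_dense = np.linalg.eigh(X.T @ X / (n - 1))[1][:, -1]
v_sparse = sparse_pc1(X, lam=0.8)

print("Dense loadings: ", np.round(v_dense, 3))
print("Sparse loadings:", np.round(v_sparse, 3))
```

<p>The dense vector puts small but nonzero weight on every noise variable, while the thresholded version sets loadings that fall below the threshold to exactly zero. Raising <code>lam</code> zeros out more loadings at the cost of explained variance.</p>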
</section>
<section id="kernel-pca" class="level3">
<h3 class="anchored" data-anchor-id="kernel-pca">Kernel PCA</h3>
<p>Kernel PCA keeps the variance-maximization logic but applies it in a nonlinear feature space. Instead of diagonalizing the covariance matrix of <img src="https://latex.codecogs.com/png.latex?X">, we work with the kernel matrix</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AK_%7Bij%7D%20=%20k(x_i,%20x_j),%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?k(%5Ccdot,%5Ccdot)"> might be a radial basis function kernel or a polynomial kernel. The matrix <img src="https://latex.codecogs.com/png.latex?K"> is centered in feature space and then diagonalized; its leading eigenvectors, suitably scaled, give the component scores. PCA is thus performed on <img src="https://latex.codecogs.com/png.latex?K"> rather than on the original variables.</p>
<p>This is useful when the data lie on a curved manifold rather than near a linear subspace. The classic example is concentric circles: ordinary PCA sees almost no useful low-dimensional linear structure, while Kernel PCA can often unfold the geometry. The price is interpretability. Classical PCA gives loading vectors in the original variables; Kernel PCA gives components in an implicit feature space. In practice, that makes it more of a nonlinear embedding method than a variable-summary tool. It is also sensitive to kernel choice and scale, so I do not treat it as a push-button replacement for standard PCA.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># install.packages("kernlab")</span></span>
<span id="cb5-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(kernlab)</span>
<span id="cb5-3"></span>
<span id="cb5-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb5-5">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span></span>
<span id="cb5-6">angles <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> pi)</span>
<span id="cb5-7">radius <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">each =</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>)</span>
<span id="cb5-8"></span>
<span id="cb5-9">X_circle <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(</span>
<span id="cb5-10">  radius <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cos</span>(angles),</span>
<span id="cb5-11">  radius <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sin</span>(angles)</span>
<span id="cb5-12">)</span>
<span id="cb5-13"></span>
<span id="cb5-14">kpca_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">kpca</span>(</span>
<span id="cb5-15">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> X_circle,</span>
<span id="cb5-16">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">kernel =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rbfdot"</span>,</span>
<span id="cb5-17">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">kpar =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sigma =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>),</span>
<span id="cb5-18">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">features =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb5-19">)</span>
<span id="cb5-20"></span>
<span id="cb5-21"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rotated</span>(kpca_fit))</span></code></pre></div></div>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb6-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.decomposition <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> KernelPCA</span>
<span id="cb6-3"></span>
<span id="cb6-4">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb6-5">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span></span>
<span id="cb6-6">angles <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.uniform(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.pi, size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>n)</span>
<span id="cb6-7">radius <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.repeat([<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>], repeats<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span></span>
<span id="cb6-8"></span>
<span id="cb6-9">X_circle <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.column_stack([</span>
<span id="cb6-10">    radius <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.cos(angles),</span>
<span id="cb6-11">    radius <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.sin(angles),</span>
<span id="cb6-12">])</span>
<span id="cb6-13"></span>
<span id="cb6-14">kpca <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> KernelPCA(n_components<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, kernel<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rbf"</span>, gamma<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb6-15">X_embedded <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> kpca.fit_transform(X_circle)</span>
<span id="cb6-16"></span>
<span id="cb6-17"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(X_embedded[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>], <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="probabilistic-pca" class="level3">
<h3 class="anchored" data-anchor-id="probabilistic-pca">Probabilistic PCA</h3>
<p>Probabilistic PCA (PPCA) replaces the deterministic projection view with a latent variable model:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Ax_i%20=%20%5Cmu%20+%20W%20z_i%20+%20%5Cvarepsilon_i,%0A%5Cqquad%0Az_i%20%5Csim%20N(0,%20I_q),%0A%5Cqquad%0A%5Cvarepsilon_i%20%5Csim%20N(0,%20%5Csigma%5E2%20I_p).%0A"></p>
<p>Here <img src="https://latex.codecogs.com/png.latex?z_i"> is a <img src="https://latex.codecogs.com/png.latex?q">-dimensional latent factor and <img src="https://latex.codecogs.com/png.latex?%5Cvarepsilon_i"> is isotropic Gaussian noise. Under maximum likelihood, the columns of the estimated <img src="https://latex.codecogs.com/png.latex?W"> span the same principal subspace as classical PCA, and the classical projection is recovered in the limit <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2%20%5Cto%200">. The formulation buys you something important: a likelihood, uncertainty quantification, and a principled way to deal with missing values.</p>
<p>That makes PPCA attractive when PCA is part of a generative modeling workflow rather than just a preprocessing step. I especially like it when the data matrix has moderate missingness and I do not want to impute first and hope for the best. The main caveat is the isotropic-noise assumption. If feature-specific noise levels differ substantially, PPCA can be too restrictive and factor analysis may be the better model.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># install.packages("pcaMethods")</span></span>
<span id="cb7-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(pcaMethods)</span>
<span id="cb7-3"></span>
<span id="cb7-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb7-5">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">150</span></span>
<span id="cb7-6">p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span></span>
<span id="cb7-7"></span>
<span id="cb7-8">latent_factors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb7-9">loadings_true <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), p, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb7-10">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> latent_factors <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(loadings_true) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> p, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>), n, p)</span>
<span id="cb7-11"></span>
<span id="cb7-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Introduce missing values</span></span>
<span id="cb7-13">missing_index <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(X), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(X))</span>
<span id="cb7-14">X[missing_index] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span></span>
<span id="cb7-15"></span>
<span id="cb7-16">ppca_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pca</span>(X, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ppca"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nPcs =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">seed =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb7-17"></span>
<span id="cb7-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Completed data and estimated scores</span></span>
<span id="cb7-19">X_completed <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">completeObs</span>(ppca_fit)</span>
<span id="cb7-20"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scores</span>(ppca_fit)</span></code></pre></div></div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># pip install ppca-py</span></span>
<span id="cb8-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb8-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> ppca <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> PPCA</span>
<span id="cb8-4"></span>
<span id="cb8-5">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb8-6">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">150</span></span>
<span id="cb8-7">p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span></span>
<span id="cb8-8"></span>
<span id="cb8-9">latent_factors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb8-10">loadings_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(p, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb8-11">X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> latent_factors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> loadings_true.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n, p) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span></span>
<span id="cb8-12"></span>
<span id="cb8-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Introduce missing values</span></span>
<span id="cb8-14">missing_mask <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.rand(n, p) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.10</span></span>
<span id="cb8-15">X[missing_mask] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.nan</span>
<span id="cb8-16"></span>
<span id="cb8-17">ppca <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> PPCA(n_components<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb8-18">ppca.fit(X)</span>
<span id="cb8-19"></span>
<span id="cb8-20">scores, score_cov <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ppca.posterior_latent(X)</span>
<span id="cb8-21">X_imputed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ppca.sample_missing(X, n_draws<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb8-22"></span>
<span id="cb8-23"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Estimated noise variance:"</span>, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(ppca.noise_variance_, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb8-24"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"First five latent scores:</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>, np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(scores[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>], <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span></code></pre></div></div>
</div>
</div>
</div>
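<p>To see why the PPCA subspace matches classical PCA, it can help to compute the closed-form maximum likelihood solution of Tipping and Bishop (1999) by hand on complete data. The sketch below is a minimal NumPy illustration under that complete-data assumption, not necessarily what the packages above do internally; the function name <code>ppca_closed_form</code> is hypothetical, and the arbitrary rotation in the solution is fixed to the identity.</p>

```python
import numpy as np

def ppca_closed_form(X, q):
    """Closed-form PPCA maximum likelihood fit for complete data (no NaNs)."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)        # ML covariance (divides by n)
    eigvals, eigvecs = np.linalg.eigh(S)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                   # average discarded eigenvalue
    W = eigvecs[:, :q] * np.sqrt(eigvals[:q] - sigma2)  # rescaled PCA directions
    return mu, W, sigma2

# Simulated data (an assumption for illustration): 2 factors, noise sd 0.2
rng = np.random.default_rng(1988)
n, p, q = 500, 6, 2
Z = rng.standard_normal((n, q))
W_true = rng.standard_normal((p, q))
X = Z @ W_true.T + 0.2 * rng.standard_normal((n, p))

mu, W, sigma2 = ppca_closed_form(X, q)

# Compare the span of W with the top-q classical PCA directions
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
V_q = Vt[:q].T
proj_W = W @ np.linalg.pinv(W)        # projector onto span(W)
proj_pca = V_q @ V_q.T                # projector onto the PCA subspace

print("Noise variance estimate:", round(float(sigma2), 4))
print("Max projector difference:", np.abs(proj_W - proj_pca).max())
```

<p>The two projectors agree up to floating-point error, while the noise variance estimate is the average of the discarded eigenvalues. With missing entries this closed form no longer applies, which is where iterative fitting comes in.</p>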
</section>
<section id="truncated-pca" class="level3">
<h3 class="anchored" data-anchor-id="truncated-pca">Truncated PCA</h3>
<p>This last flavor is a little different. Truncated PCA does not change the statistical target. It changes the computation. Instead of computing the full singular value decomposition, we directly approximate the top <img src="https://latex.codecogs.com/png.latex?k"> singular vectors:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AX%20%5Capprox%20U_k%20D_k%20V_k'.%0A"></p>
<p>When <img src="https://latex.codecogs.com/png.latex?n"> and <img src="https://latex.codecogs.com/png.latex?p"> are large, or when <img src="https://latex.codecogs.com/png.latex?X"> is sparse, that distinction matters a lot. If all you want are the first few components, computing the full decomposition is wasted effort.</p>
<p>For practitioners, this is often the most useful PCA variant of all because it makes the classical method scale. The catch is conceptual rather than mathematical: randomized or truncated PCA is not discovering a different notion of component. It is approximating the same principal subspace more cheaply. If the approximation error is small, great. If it is not, the remedy is a tighter tolerance or more iterations, because what you have is a computational shortcut, not a new estimator.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-5-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-1" aria-controls="tabset-5-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-2" aria-controls="tabset-5-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-5-1" class="tab-pane active" aria-labelledby="tabset-5-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># install.packages("irlba")</span></span>
<span id="cb9-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(irlba)</span>
<span id="cb9-3"></span>
<span id="cb9-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb9-5">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span></span>
<span id="cb9-6">p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span></span>
<span id="cb9-7"></span>
<span id="cb9-8">latent_factors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>), n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb9-9">loadings_true <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>), p, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb9-10">X_large <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> latent_factors <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(loadings_true) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> p, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>), n, p)</span>
<span id="cb9-11"></span>
<span id="cb9-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fast approximation to the first 5 principal components</span></span>
<span id="cb9-13">pca_fast <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prcomp_irlba</span>(X_large, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">center =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scale. =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb9-14"></span>
<span id="cb9-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Share of total variance; dividing by sum(sdev^2) would only normalize over the 5 retained components</span></span>
<span id="cb9-16">pca_fast<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>sdev<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> pca_fast<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>totalvar</span></code></pre></div></div>
</div>
<div id="tabset-5-2" class="tab-pane" aria-labelledby="tabset-5-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb10-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.decomposition <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> PCA</span>
<span id="cb10-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.preprocessing <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> StandardScaler</span>
<span id="cb10-4"></span>
<span id="cb10-5">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb10-6">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span></span>
<span id="cb10-7">p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span></span>
<span id="cb10-8"></span>
<span id="cb10-9">latent_factors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb10-10">loadings_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(p, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb10-11">X_large <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> latent_factors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> loadings_true.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n, p) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span></span>
<span id="cb10-12"></span>
<span id="cb10-13">X_large <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> StandardScaler().fit_transform(X_large)</span>
<span id="cb10-14"></span>
<span id="cb10-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Randomized SVD computes an approximate leading subspace</span></span>
<span id="cb10-16">pca_fast <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> PCA(n_components<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, svd_solver<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"randomized"</span>, random_state<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb10-17">pca_fast.fit(X_large)</span>
<span id="cb10-18"></span>
<span id="cb10-19"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(pca_fast.explained_variance_ratio_, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span></code></pre></div></div>
</div>
</div>
</div>
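<p>The claim that truncated PCA targets the same subspace is easy to check numerically. The sketch below implements a basic randomized range finder with power iterations, in the spirit of Halko, Martinsson, and Tropp (2011), and compares it with the full SVD. It is a simplified illustration, not the algorithm that <code>irlba</code> or scikit-learn run internally; the function name <code>randomized_pca</code> and the defaults for <code>n_oversamples</code> and <code>n_iter</code> are assumptions chosen for this example.</p>

```python
import numpy as np

def randomized_pca(X, k, n_oversamples=10, n_iter=4, seed=0):
    """Approximate top-k singular values and right singular vectors of centered X."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # A random test matrix sketches the range of Xc
    Omega = rng.standard_normal((Xc.shape[1], k + n_oversamples))
    Q, _ = np.linalg.qr(Xc @ Omega)
    # Power iterations pull the sketch toward the leading subspace
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Q)
    # Exact SVD of the small projected matrix
    _, s, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    return s[:k], Vt[:k]

# Simulated low-rank-plus-noise matrix (an assumption for illustration)
rng = np.random.default_rng(1988)
n, p, k = 1000, 200, 5
X_large = rng.standard_normal((n, k)) @ rng.standard_normal((k, p))
X_large += 0.5 * rng.standard_normal((n, p))

s_fast, Vt_fast = randomized_pca(X_large, k)
Xc = X_large - X_large.mean(axis=0)
_, s_full, Vt_full = np.linalg.svd(Xc, full_matrices=False)

# Cosines of the principal angles between the two k-dimensional subspaces;
# values near 1 mean the subspaces coincide.
cosines = np.linalg.svd(Vt_fast @ Vt_full[:k].T, compute_uv=False)
rel_err = np.abs(s_fast - s_full[:k]).max() / s_full[0]

print("Smallest principal-angle cosine:", cosines.min())
print("Relative singular value error:", rel_err)
```

<p>The comparison works well here because the simulated matrix has a large gap between the fifth and sixth singular values. When the spectrum decays slowly, more power iterations or oversampling are typically needed to reach the same accuracy.</p>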
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>Classical PCA is a variance-maximizing compression tool, not a generic device for finding the “most important” variables or latent causes.</li>
<li>Sparse PCA is the right upgrade when interpretability matters and dense loading vectors are getting in the way.</li>
<li>Kernel PCA is useful for nonlinear geometry, but you give up the clean loading-vector interpretation that makes ordinary PCA attractive.</li>
<li>Probabilistic PCA is worth using when likelihood, uncertainty, or missing data matter; otherwise classical PCA is usually simpler.</li>
<li>Truncated PCA is often the most practical choice on large matrices because it targets the same principal subspace at a much lower computational cost.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>For the classical theory, Jolliffe’s <em>Principal Component Analysis</em> is still the standard reference and Jolliffe and Cadima (2016) is a concise modern review. Zou, Hastie, and Tibshirani (2006) is the canonical sparse PCA paper. Schölkopf, Smola, and Müller (1998) remains the core reference for Kernel PCA, while Tipping and Bishop (1999) is the paper to read for the probabilistic view. If your main concern is computation at scale, Halko, Martinsson, and Tropp (2011) is the right randomized linear algebra entry point.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Halko, N., Martinsson, P. G., &amp; Tropp, J. A. (2011). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. <em>SIAM Review</em>, 53(2), 217-288.</p>
<p>Jolliffe, I. T., &amp; Cadima, J. (2016). Principal component analysis: A review and recent developments. <em>Philosophical Transactions of the Royal Society A</em>, 374(2065), 20150202.</p>
<p>Schölkopf, B., Smola, A., &amp; Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. <em>Neural Computation</em>, 10(5), 1299-1319.</p>
<p>Tipping, M. E., &amp; Bishop, C. M. (1999). Probabilistic principal component analysis. <em>Journal of the Royal Statistical Society: Series B</em>, 61(3), 611-622.</p>
<p>Zou, H., Hastie, T., &amp; Tibshirani, R. (2006). Sparse principal component analysis. <em>Journal of Computational and Graphical Statistics</em>, 15(2), 265-286.</p>


</section>

 ]]></description>
  <category>machine learning</category>
  <category>flavors</category>
  <guid>https://vyasenov.github.io/blog/flavors-pca.html</guid>
  <pubDate>Sun, 05 Apr 2026 07:00:00 GMT</pubDate>
</item>
<item>
  <title>Brief Overview of Treatment Effect Bounds</title>
  <link>https://vyasenov.github.io/blog/treatment-effects-bounds.html</link>
  <description><![CDATA[ 





<div class="reading-time">7 min read</div>
<!-- this is for social media sharing buttons -->
<div class="sharethis-inline-share-buttons pt-5">

</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>In applied causal work, the real problem is often not estimation but identification. Attrition, imperfect take-up, endogenous selection, and missing outcomes can all make the average treatment effect impossible to point-identify from the data at hand. In those settings, a precise estimate is not a sign of rigor. It is usually a sign that strong assumptions have been smuggled in.</p>
<p>Bounding methods take a more honest route. Rather than asking for the exact value of a treatment effect, they ask which values remain consistent with the observed data and a stated set of assumptions. The answer is an interval, not a point. That interval may be wide, but its width is itself informative: it tells you how much the design really buys you before additional structure is imposed.</p>
<p>This is why I think treatment effect bounds are worth knowing even for practitioners who usually work with point estimators. They are useful both as primary estimands and as a diagnostic. If weak-assumption bounds are already tight, your design is doing real work. If they are wide, that is a warning against overconfident causal claims.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>For each unit <img src="https://latex.codecogs.com/png.latex?i">, let <img src="https://latex.codecogs.com/png.latex?Y_i(1)"> and <img src="https://latex.codecogs.com/png.latex?Y_i(0)"> denote the potential outcomes under treatment and control, and let <img src="https://latex.codecogs.com/png.latex?D_i%20%5Cin%20%5C%7B0,1%5C%7D"> be the treatment indicator. When needed, I use <img src="https://latex.codecogs.com/png.latex?Z"> for an ordered instrument or covariate. The observed outcome is</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AY_i%20=%20D_i%20Y_i(1)%20+%20(1-D_i)Y_i(0).%0A"></p>
<p>The target parameter is the average treatment effect</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Ctau%20=%20%5Cmathbb%7BE%7D%5BY(1)-Y(0)%5D.%0A"></p>
<p>When <img src="https://latex.codecogs.com/png.latex?%5Ctau"> is not point-identified, the object of interest becomes an identified set</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Ctau%20%5Cin%20%5B%5Cunderline%7B%5Ctau%7D,%20%5Coverline%7B%5Ctau%7D%5D,%0A"></p>
<p>where the endpoints depend on the observed distribution and the maintained assumptions. A bound is <em>sharp</em> if every value in that interval is attainable under some data-generating process consistent with the observed data and those assumptions. Sharp bounds cannot be tightened without adding assumptions, so sharpness is always what you want.</p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="manski-bounds" class="level3">
<h3 class="anchored" data-anchor-id="manski-bounds">Manski Bounds</h3>
<p>Manski (1990) is the natural starting point because it assumes almost nothing beyond bounded outcomes. Suppose <img src="https://latex.codecogs.com/png.latex?Y%20%5Cin%20%5By_%7B%5Cmin%7D,%20y_%7B%5Cmax%7D%5D">, let <img src="https://latex.codecogs.com/png.latex?p%20=%20%5Cmathbb%7BP%7D(D=1)">, and define</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmu_1%20=%20%5Cmathbb%7BE%7D(Y%20%5Cmid%20D=1),%20%5Cqquad%20%5Cmu_0%20=%20%5Cmathbb%7BE%7D(Y%20%5Cmid%20D=0).%0A"></p>
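<p>Because each missing counterfactual mean is known only to lie in <img src="https://latex.codecogs.com/png.latex?%5By_%7B%5Cmin%7D,%20y_%7B%5Cmax%7D%5D">, the potential-outcome means satisfy</p>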
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5BY(1)%5D%20%5Cin%20%5Cleft%5Bp%5Cmu_1%20+%20(1-p)y_%7B%5Cmin%7D,%20%5C;%20p%5Cmu_1%20+%20(1-p)y_%7B%5Cmax%7D%5Cright%5D%0A"></p>
<p>and</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5BY(0)%5D%20%5Cin%20%5Cleft%5B(1-p)%5Cmu_0%20+%20py_%7B%5Cmin%7D,%20%5C;%20(1-p)%5Cmu_0%20+%20py_%7B%5Cmax%7D%5Cright%5D.%0A"></p>
<p>Combining them gives sharp bounds on the ATE:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Ctau%20%5Cin%20%5Cleft%5B%0A%5Cunderline%7B%5Ctau%7D,%20%5Coverline%7B%5Ctau%7D%0A%5Cright%5D,%0A"></p>
<p>where</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cunderline%7B%5Ctau%7D%20=%20p%5Cmu_1%20-%20(1-p)%5Cmu_0%20+%20(1-p)y_%7B%5Cmin%7D%20-%20py_%7B%5Cmax%7D%0A"></p>
<p>and</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Coverline%7B%5Ctau%7D%20=%20p%5Cmu_1%20-%20(1-p)%5Cmu_0%20+%20(1-p)y_%7B%5Cmax%7D%20-%20py_%7B%5Cmin%7D.%0A"></p>
<p>These bounds are usually wide, and that is exactly the point. Manski bounds tell you what the data alone can support before you add structure. In practice, I treat them as the baseline honesty check.</p>
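<p>The formulas above translate almost line by line into code. The following is a minimal sketch, assuming the outcome is known to lie in <code>[y_min, y_max]</code>; the function name <code>manski_bounds</code> and the toy data are illustrative choices of mine, not part of any library.</p>

```python
import numpy as np

def manski_bounds(y, d, y_min, y_max):
    """Worst-case ATE bounds for an outcome known to lie in [y_min, y_max]."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    p = d.mean()              # P(D = 1)
    mu1 = y[d == 1].mean()    # E[Y | D = 1]
    mu0 = y[d == 0].mean()    # E[Y | D = 0]
    lower = p * mu1 - (1 - p) * mu0 + (1 - p) * y_min - p * y_max
    upper = p * mu1 - (1 - p) * mu0 + (1 - p) * y_max - p * y_min
    return lower, upper

# Toy binary outcome: 4 treated units (3 successes), 6 controls (2 successes)
y = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])
d = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
lower, upper = manski_bounds(y, d, y_min=0.0, y_max=1.0)
```

<p>Whatever the data, the width of this interval equals <code>y_max - y_min</code>, so without further assumptions the bounds on a binary outcome always have width one and always contain zero.</p>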
</section>
<section id="tightening-manski-mtr-mts-and-miv" class="level3">
<h3 class="anchored" data-anchor-id="tightening-manski-mtr-mts-and-miv">Tightening Manski: MTR, MTS, and MIV</h3>
<p>The usual next step is to ask whether credible qualitative restrictions can narrow the interval. Manski and Pepper (2000) study three of the most useful ones. My first job market paper as a PhD candidate employed these restrictions to tighten the Manski bounds in the context of the labor market impact of immigration.</p>
<p>First, under <em>Monotone Treatment Response (MTR)</em>, treatment weakly helps everyone: <img src="https://latex.codecogs.com/png.latex?Y(1)%20%5Cge%20Y(0)%20%5Ctext%7B%20for%20every%20unit%20%7D."></p>
<p>MTR tightens the bounds by ruling out any configuration in which treatment hurts some units, so the lower bound rises and negative treatment effects become harder or impossible to sustain. For example, under MTR, <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5BY(1)%5Cmid%20D=0%5D"> cannot be below <img src="https://latex.codecogs.com/png.latex?%5Cmu_0"> (each control’s missing <img src="https://latex.codecogs.com/png.latex?Y(1)"> is at least that unit’s observed <img src="https://latex.codecogs.com/png.latex?Y(0)">), not merely <img src="https://latex.codecogs.com/png.latex?y_%7B%5Cmin%7D">; and <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5BY(0)%5Cmid%20D=1%5D"> cannot exceed <img src="https://latex.codecogs.com/png.latex?%5Cmu_1">.</p>
<p>Second, under <em>Monotone Treatment Selection (MTS)</em>, treated units are systematically stronger than untreated units in terms of their potential outcomes:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5BY(d)%5Cmid%20D=1%5D%20%5Cge%20%5Cmathbb%7BE%7D%5BY(d)%5Cmid%20D=0%5D,%20%5Cqquad%20d%20%5Cin%20%5C%7B0,1%5C%7D.%0A"></p>
<p>MTS tightens the bounds by imposing an ordering on who selects into treatment, so the observed outcomes in one group become informative about the missing potential outcomes in the other. For example, under MTS, <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5BY(0)%5Cmid%20D=1%5D"> is bounded below by <img src="https://latex.codecogs.com/png.latex?%5Cmu_0">, not merely <img src="https://latex.codecogs.com/png.latex?y_%7B%5Cmin%7D">; and <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5BY(1)%5Cmid%20D=0%5D"> is bounded above by <img src="https://latex.codecogs.com/png.latex?%5Cmu_1">, not merely <img src="https://latex.codecogs.com/png.latex?y_%7B%5Cmax%7D">.</p>
<p>Third, under a <em>Monotone Instrumental Variable (MIV)</em> assumption, an ordered variable <img src="https://latex.codecogs.com/png.latex?Z"> shifts potential outcomes in a known direction:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmathbb%7BE%7D%5BY(d)%5Cmid%20Z=z_1%5D%20%5Cle%20%5Cmathbb%7BE%7D%5BY(d)%5Cmid%20Z=z_2%5D%20%5Cquad%20%5Ctext%7Bfor%20%7D%20z_1%20%5Cle%20z_2,%5C%20d%20%5Cin%20%5C%7B0,1%5C%7D.%0A"></p>
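<p>In words, MIV lets us use the ordering in <img src="https://latex.codecogs.com/png.latex?Z"> to intersect bounds across instrument values, which can noticeably shrink the identified set. These assumptions become more powerful in combination. For example, MTR and MTS together imply <img src="https://latex.codecogs.com/png.latex?0%20%5Cle%20%5Ctau%20%5Cle%20%5Cmu_1%20-%20%5Cmu_0">, an interval that no longer depends on <img src="https://latex.codecogs.com/png.latex?y_%7B%5Cmin%7D"> or <img src="https://latex.codecogs.com/png.latex?y_%7B%5Cmax%7D">.</p>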
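<p>To see how MTR and MTS act on the worst-case interval, here is a small sketch that tightens the bounds on the two unobserved conditional means and then recombines them. The function <code>monotone_bounds</code> and its flags are hypothetical names of mine, MIV is left out to keep the example short, and MTS is taken in the direction written above (treated units have weakly higher potential outcomes).</p>

```python
import numpy as np

def monotone_bounds(y, d, y_min, y_max, mtr=False, mts=False):
    """ATE bounds under optional MTR and/or MTS restrictions."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    p = d.mean()
    mu1, mu0 = y[d == 1].mean(), y[d == 0].mean()

    # Worst-case bounds on the two unobserved conditional means
    y1_d0_lo, y1_d0_hi = y_min, y_max   # E[Y(1) | D = 0]
    y0_d1_lo, y0_d1_hi = y_min, y_max   # E[Y(0) | D = 1]
    if mtr:   # Y(1) >= Y(0) for every unit
        y1_d0_lo = max(y1_d0_lo, mu0)
        y0_d1_hi = min(y0_d1_hi, mu1)
    if mts:   # E[Y(d) | D = 1] >= E[Y(d) | D = 0]
        y1_d0_hi = min(y1_d0_hi, mu1)
        y0_d1_lo = max(y0_d1_lo, mu0)

    ey1_lo = p * mu1 + (1 - p) * y1_d0_lo
    ey1_hi = p * mu1 + (1 - p) * y1_d0_hi
    ey0_lo = (1 - p) * mu0 + p * y0_d1_lo
    ey0_hi = (1 - p) * mu0 + p * y0_d1_hi
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo

# Toy binary outcome: 4 treated units (3 successes), 6 controls (2 successes)
y = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])
d = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
no_assumptions = monotone_bounds(y, d, 0.0, 1.0)
mtr_only = monotone_bounds(y, d, 0.0, 1.0, mtr=True)
mts_only = monotone_bounds(y, d, 0.0, 1.0, mts=True)
mtr_and_mts = monotone_bounds(y, d, 0.0, 1.0, mtr=True, mts=True)
```

<p>In this toy example MTR lifts the lower bound to zero, MTS pulls the upper bound down to the naive difference in means, and the two together leave only the interval between them. If the treated mean were below the control mean, the combined restrictions would produce an empty interval, which is a sign that the data reject them.</p>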
</section>
<section id="balke-pearl-bounds-for-noncompliance" class="level3">
<h3 class="anchored" data-anchor-id="balke-pearl-bounds-for-noncompliance">Balke-Pearl Bounds for Noncompliance</h3>
<p>Balke and Pearl (1997) address randomized assignment with imperfect compliance. Instead of jumping directly to LATE under exclusion and monotonicity, they ask a broader question: what does the observed joint distribution of <img src="https://latex.codecogs.com/png.latex?(Y,D,Z)"> imply about the population treatment effect under weaker assumptions?</p>
<p>The answer is a sharp nonparametric bound obtained by optimizing over all latent compliance-response types consistent with the observed data:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmin_%7Bq%20%5Cin%20%5Cmathcal%7BQ%7D(P_%7BYDZ%7D)%7D%20%5Cmathbb%7BE%7D_q%5BY(1)-Y(0)%5D%0A%5C;%5Cle%5C;%0A%5Ctau%0A%5C;%5Cle%5C;%0A%5Cmax_%7Bq%20%5Cin%20%5Cmathcal%7BQ%7D(P_%7BYDZ%7D)%7D%20%5Cmathbb%7BE%7D_q%5BY(1)-Y(0)%5D.%0A"></p>
<p>This is best viewed as a separation between what the experiment identifies and what extra assumptions identify. Balke-Pearl bounds are often much wider than a LATE estimate, but they answer a different question. LATE is a point-identified effect for compliers under stronger structure. Balke-Pearl bounds are partial-identification statements about broader causal quantities. When the policy question is about the full eligible population rather than compliers, that distinction matters.</p>
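<p>The optimization above becomes a small linear program when <img src="https://latex.codecogs.com/png.latex?Z">, <img src="https://latex.codecogs.com/png.latex?D">, and <img src="https://latex.codecogs.com/png.latex?Y"> are all binary. The sketch below is my own illustration of that idea rather than a reproduction of the closed-form expressions in the paper. It assumes random assignment of <img src="https://latex.codecogs.com/png.latex?Z">, exclusion, and no monotonicity, and it uses <code>scipy.optimize.linprog</code>; the function name <code>balke_pearl_bounds</code> and the example probabilities are made up.</p>

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def balke_pearl_bounds(p_obs):
    """ATE bounds for binary Z, D, Y, where p_obs[z][d][y] = P(Y=y, D=d | Z=z)."""
    # A latent type is (d0, d1, y0, y1): treatment taken under Z=0 and Z=1,
    # and the outcome under D=0 and D=1 (exclusion: Z affects Y only via D).
    types = list(itertools.product([0, 1], repeat=4))
    ate = np.array([y1 - y0 for (_, _, y0, y1) in types], dtype=float)

    # Each observed cell probability is the total mass of the types producing it.
    rows, rhs = [], []
    for z, d, y in itertools.product([0, 1], repeat=3):
        row = []
        for d0, d1, y0, y1 in types:
            d_z = d1 if z == 1 else d0
            y_z = y1 if d_z == 1 else y0
            row.append(1.0 if (d_z == d and y_z == y) else 0.0)
        rows.append(row)
        rhs.append(p_obs[z][d][y])
    a_eq, b_eq = np.array(rows), np.array(rhs)

    lo = linprog(ate, A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    hi = linprog(-ate, A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not (lo.success and hi.success):
        raise ValueError("observed distribution is inconsistent with the model")
    return lo.fun, -hi.fun

# Made-up example with one-sided noncompliance: nobody is treated when Z = 0
p_obs = np.array([
    [[0.66, 0.34], [0.00, 0.00]],   # Z = 0: rows are D = 0, 1; columns are Y = 0, 1
    [[0.24, 0.06], [0.21, 0.49]],   # Z = 1
])
lower, upper = balke_pearl_bounds(p_obs)
```

<p>In this example 30 percent of units never take the treatment, and their outcome under treatment is never observed. That is exactly why the interval has width 0.3: the linear program lets their unobserved outcome range over everything the data cannot rule out.</p>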
</section>
<section id="lee-bounds-for-sample-selection" class="level3">
<h3 class="anchored" data-anchor-id="lee-bounds-for-sample-selection">Lee Bounds for Sample Selection</h3>
<p>Lee (2009) is the method I see most often in practice because the intuition is so transparent. Suppose treatment is randomized, but outcomes are only observed for selected units. Wages, which are observed only for employed workers, are the canonical example. If treatment changes employment, comparing observed wages across treatment arms is contaminated by selection.</p>
<p>Lee’s key assumption is <em>monotone selection</em>: treatment can move selection in only one direction for every unit. If treatment raises the probability of observation, then the treated group contains some “extra” observed units relative to control. Those units must be trimmed away from one tail or the other of the treated outcome distribution.</p>
<p>Let <img src="https://latex.codecogs.com/png.latex?S"> indicate whether the outcome is observed and suppose <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BP%7D(S=1%20%5Cmid%20D=1)%20%3E%20%5Cmathbb%7BP%7D(S=1%20%5Cmid%20D=0)">. The excess selected share in the treated group is</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cpi%20=%20%5Cfrac%7B%5Cmathbb%7BP%7D(S=1%20%5Cmid%20D=1)%20-%20%5Cmathbb%7BP%7D(S=1%20%5Cmid%20D=0)%7D%7B%5Cmathbb%7BP%7D(S=1%20%5Cmid%20D=1)%7D.%0A"></p>
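<p>Trimming a fraction <img src="https://latex.codecogs.com/png.latex?%5Cpi"> from the upper tail of the treated outcome distribution gives the lower bound; trimming it from the lower tail gives the upper bound. Both bounds refer to the average effect for units whose outcome would be observed regardless of treatment assignment.</p>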
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm:">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm:
</div>
</div>
<div class="callout-body-container callout-body">
<ol type="1">
<li>Compute the selection rate in each treatment arm.</li>
<li>Identify the arm with the higher selection rate.</li>
<li>Trim the excess share from one tail and then the other of that arm’s observed outcome distribution.</li>
<li>Compare the trimmed means to the mean outcome in the arm with the lower selection rate.</li>
</ol>
</div>
</div>
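<p>The four steps above can be sketched in a few lines. This is a simplified illustration, assuming no covariates and trimming whole observations rather than interpolating quantiles; the function name <code>lee_bounds</code> and the toy data are mine.</p>

```python
import numpy as np

def lee_bounds(y, d, s):
    """Trimming bounds; y is only used where s == 1 (outcome observed)."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    s = np.asarray(s, dtype=int)

    # Step 1: selection rate in each arm
    rate1 = s[d == 1].mean()
    rate0 = s[d == 0].mean()
    y1 = np.sort(y[(d == 1) & (s == 1)])
    y0 = np.sort(y[(d == 0) & (s == 1)])

    # Steps 2-4: trim the arm with the higher selection rate, then compare means
    if rate1 >= rate0:
        share = (rate1 - rate0) / rate1
        k = int(np.floor(share * len(y1) + 1e-9))      # observations to drop
        lower = y1[: len(y1) - k].mean() - y0.mean()   # drop the k largest
        upper = y1[k:].mean() - y0.mean()              # drop the k smallest
    else:
        share = (rate0 - rate1) / rate0
        k = int(np.floor(share * len(y0) + 1e-9))
        lower = y1.mean() - y0[k:].mean()              # drop the k smallest controls
        upper = y1.mean() - y0[: len(y0) - k].mean()   # drop the k largest controls
    return lower, upper

# Toy data: 10 units per arm; 8 treated and 6 control outcomes are observed
d = np.array([1] * 10 + [0] * 10)
s = np.array([1] * 8 + [0] * 2 + [1] * 6 + [0] * 4)
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, np.nan, np.nan,
              2, 3, 4, 5, 6, 7, np.nan, np.nan, np.nan, np.nan])
lower, upper = lee_bounds(y, d, s)
```

<p>Here the excess share is 0.25, so two of the eight observed treated outcomes are trimmed from one tail and then from the other. The small constant inside <code>floor</code> only guards against floating-point error when the product is a whole number.</p>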
<p>I like Lee bounds because they are easy to explain and easy to audit. The practical warning is equally simple: if treatment plausibly pushes some units into the sample and others out, the monotone-selection logic breaks.</p>
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>Bounds are not a consolation prize. They are the right estimand when the data do not support point identification.</li>
<li>Manski bounds are the benchmark because they show what your design identifies before assumptions start doing the heavy lifting.</li>
<li>Monotonicity restrictions, Lee trimming, and Balke-Pearl bounds can be very informative, but only when their substantive assumptions are defensible.</li>
<li>Wide bounds are often the most important empirical result in the paper because they reveal how little the design alone can rule out.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>For a broad introduction, I would start with Manski’s <em>Partial Identification of Probability Distributions</em>, which remains the cleanest entry point into the logic of identification regions. Manski and Pepper (2000) is the canonical reference for monotone restrictions such as MTR and MIV. Balke and Pearl (1997) is still the core paper for noncompliance bounds, while Lee (2009) is the practical workhorse for attrition and sample selection.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Balke, A., &amp; Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. <em>Journal of the American Statistical Association</em>, 92(439), 1171-1176.</p>
<p>Kowalski, A. E. (2016). Doing more when you’re running LATE: Applying marginal treatment effect methods to examine treatment effect heterogeneity in experiments. <em>American Economic Journal: Applied Economics</em>, 8(2), 1-17.</p>
<p>Lee, D. S. (2009). Training, wages, and sample selection: Estimating sharp bounds on treatment effects. <em>Review of Economic Studies</em>, 76(3), 1071-1102.</p>
<p>Manski, C. F. (1990). Nonparametric bounds on treatment effects. <em>American Economic Review</em>, 80(2), 319-323.</p>
<p>Manski, C. F. (2003). <em>Partial Identification of Probability Distributions</em>. Springer.</p>
<p>Manski, C. F., &amp; Pepper, J. V. (2000). Monotone instrumental variables: With an application to the returns to schooling. <em>Econometrica</em>, 68(4), 997-1010.</p>


</section>

 ]]></description>
  <category>causal inference</category>
  <guid>https://vyasenov.github.io/blog/treatment-effects-bounds.html</guid>
  <pubDate>Thu, 02 Apr 2026 07:00:00 GMT</pubDate>
</item>
<item>
  <title>What OLS Estimates in Causal Inference</title>
  <link>https://vyasenov.github.io/blog/interpret-OLS-causal-inference.html</link>
  <description><![CDATA[ 





<div class="reading-time">7 min read</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>OLS is still the default causal estimator in a surprising amount of applied work. That is often understandable. Regression is simple, transparent, and often a reasonable first pass. The problem is interpretation. Once we move beyond randomized experiments with additive constant effects, the coefficient on treatment is not automatically the average treatment effect (ATE), or even an average treatment effect for a population we care about.</p>
<p>What makes this topic tricky is that there are really two separate questions. First, what population quantity does the OLS coefficient target? Second, under what assumptions can that quantity be interpreted causally? OLS itself does not assume a potential outcomes framework. It solves a least-squares projection problem. Potential outcomes enter only when we try to map that projection coefficient to objects like the ATE, ATT, or ATU.</p>
<p>Several related papers sharpen this distinction. This note gives a brief overview of the key developments in our understanding of OLS in causal inference. Taken together, these results explain both why OLS can be useful and why its causal interpretation is often more delicate than practitioners realize.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Let <img src="https://latex.codecogs.com/png.latex?Y_i"> be the observed outcome, <img src="https://latex.codecogs.com/png.latex?D_i%20%5Cin%20%5C%7B0,1%5C%7D"> a treatment indicator, and <img src="https://latex.codecogs.com/png.latex?X_i"> a vector of covariates. Potential outcomes are <img src="https://latex.codecogs.com/png.latex?Y_i(1)"> and <img src="https://latex.codecogs.com/png.latex?Y_i(0)">, so</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AY_i%20=%20D_iY_i(1)%20+%20(1-D_i)Y_i(0).%0A"></p>
<p>Define the conditional mean functions</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Am_d(x)=%5Cmathbb%7BE%7D%5BY(d)%5Cmid%20X=x%5D,%20%5Cqquad%20%5Ctau(x)=m_1(x)-m_0(x),%0A"></p>
<p>and the usual causal targets</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BATE%7D%20=%20%5Cmathbb%7BE%7D%5B%5Ctau(X)%5D,%20%5Cqquad%20%5Ctext%7BATT%7D%20=%20%5Cmathbb%7BE%7D%5B%5Ctau(X)%5Cmid%20D=1%5D,%20%5Cqquad%20%5Ctext%7BATU%7D%20=%20%5Cmathbb%7BE%7D%5B%5Ctau(X)%5Cmid%20D=0%5D.%0A"></p>
<p>Now consider the linear regression</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AY_i%20=%20%5Calpha%20+%20%5Ctau_%7B%5Ctext%7BOLS%7D%7D%20D_i%20+%20X_i'%5Cbeta%20+%20u_i.%0A"></p>
<p>The coefficient <img src="https://latex.codecogs.com/png.latex?%5Ctau_%7B%5Ctext%7BOLS%7D%7D"> is the population linear projection coefficient on <img src="https://latex.codecogs.com/png.latex?D">. By Frisch-Waugh-Lovell,</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Ctau_%7B%5Ctext%7BOLS%7D%7D%20=%20%5Cfrac%7B%5Cmathbb%7BE%7D%5BV_iY_i%5D%7D%7B%5Cmathbb%7BE%7D%5BV_iD_i%5D%7D,%0A%5Cqquad%0AV_i%20=%20D_i%20-%20%5Cmathbb%7BL%7D(D_i%5Cmid%20X_i),%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BL%7D(D_i%5Cmid%20X_i)"> is the best linear predictor of <img src="https://latex.codecogs.com/png.latex?D_i"> using <img src="https://latex.codecogs.com/png.latex?X_i">. This expression is purely statistical.</p>
<p>The causal question is whether <img src="https://latex.codecogs.com/png.latex?%5Ctau_%7B%5Ctext%7BOLS%7D%7D"> coincides with a treatment effect parameter under additional assumptions.</p>
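<p>The Frisch-Waugh-Lovell expression has an exact sample analogue, which is easy to check numerically. The sketch below uses simulated data of my own choosing and compares the coefficient from the full regression with the ratio of residualized moments; all variable names are illustrative.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))
d = (x[:, 0] + rng.normal(size=n) > 0).astype(float)   # treatment depends on X
y = 1.0 + 2.0 * d + x @ np.array([0.5, -1.0]) + rng.normal(size=n)

# Full regression of Y on (1, D, X)
design = np.column_stack([np.ones(n), d, x])
tau_full = np.linalg.lstsq(design, y, rcond=None)[0][1]

# FWL: residualize D on (1, X), then take the ratio of sample moments
controls = np.column_stack([np.ones(n), x])
gamma = np.linalg.lstsq(controls, d, rcond=None)[0]
v = d - controls @ gamma            # sample analogue of V_i
tau_fwl = (v @ y) / (v @ d)
```

<p>The two numbers agree up to floating-point error. Nothing in this calculation refers to potential outcomes, which is the sense in which the expression is purely statistical.</p>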
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="regression-is-a-projection-not-a-causal-model" class="level3">
<h3 class="anchored" data-anchor-id="regression-is-a-projection-not-a-causal-model">Regression Is a Projection, Not a Causal Model</h3>
<p>This is the first point I would emphasize in practice. Writing down</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AY_i%20=%20%5Calpha%20+%20%5Ctau%20D_i%20+%20X_i'%5Cbeta%20+%20u_i%0A"></p>
<p>does not, by itself, assume homogeneous treatment effects or even invoke potential outcomes. It simply defines the best linear predictor of <img src="https://latex.codecogs.com/png.latex?Y"> given <img src="https://latex.codecogs.com/png.latex?D"> and <img src="https://latex.codecogs.com/png.latex?X">. If the goal is prediction, that is the end of the story.</p>
<p>For causal interpretation, however, we need more. Under random assignment or selection on observables, plus enough structure on how outcomes vary with <img src="https://latex.codecogs.com/png.latex?X">, the projection coefficient may line up with a causal estimand. Under constant treatment effects and correct linear adjustment, that estimand is the ATE. Once treatment effects vary with <img src="https://latex.codecogs.com/png.latex?X">, the coefficient generally becomes a weighted average of heterogeneous effects rather than their simple unweighted average.</p>
</section>
<section id="aronow-and-samii-asymptotic-view" class="level3">
<h3 class="anchored" data-anchor-id="aronow-and-samii-asymptotic-view">Aronow and Samii: Asymptotic View</h3>
<p>Aronow and Samii (2016) show that regression-adjusted estimators need not be representative of the sample as a whole. In large samples, the estimand targeted by regression can be written as a weighted average of conditional treatment effects, where the weights depend on how treatment assignment varies with covariates and on the linear adjustment built into the regression.</p>
<p>The key practical point is that OLS does not weight covariate strata equally. These weights are proportional to residualized treatment variation (via FWL), not to the precision of outcome estimates. In particular, they do not correspond to inverse-variance weights in general. So even under ignorability, the regression coefficient need not correspond to the ATE for the empirical covariate distribution. It is often better understood as an ATE for an implicit reweighted population. That is a subtle point, but it matters whenever overlap is uneven or the linear model fits some regions of the covariate space much better than others.</p>
</section>
<section id="chattopadhyay-and-zubizarreta-finite-sample-view" class="level3">
<h3 class="anchored" data-anchor-id="chattopadhyay-and-zubizarreta-finite-sample-view">Chattopadhyay and Zubizarreta: Finite-Sample View</h3>
<p>One limitation of the Aronow-Samii perspective is that it is asymptotic. Chattopadhyay and Zubizarreta (2023) go further by showing that common linear regression estimators admit exact finite-sample weighting representations. For a regression-adjusted ATE estimator,</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Ctau%7D_%7B%5Ctext%7BOLS%7D%7D%20=%20%5Csum_%7Bi:D_i=1%7D%20w_i%5E%7B(1)%7DY_i%20-%20%5Csum_%7Bi:D_i=0%7D%20w_i%5E%7B(0)%7DY_i,%0A"></p>
<p>where the weights are functions of only <img src="https://latex.codecogs.com/png.latex?D"> and <img src="https://latex.codecogs.com/png.latex?X">, not the realized outcomes.</p>
<p>This is useful for two reasons. First, it makes regression adjustment look less mysterious: OLS is implicitly constructing a weighted comparison between treated and control outcomes. Second, the implied weights can be inspected directly. In their framework, the weights clarify when regression adjustment achieves exact balance on included covariates, how dispersed the weights are, and whether the regression is targeting a population that still looks like the observed sample. That is a much more practical diagnostic than simply reporting a coefficient table.</p>
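<p>One way to see such weights concretely is to read them off the Frisch-Waugh-Lovell residuals. The sketch below is my own derivation for the regression of <code>Y</code> on an intercept, <code>D</code>, and <code>X</code>, not code from the paper; the function name <code>implied_ols_weights</code> and the simulated data are assumptions for illustration.</p>

```python
import numpy as np

def implied_ols_weights(d, x):
    """Weights w_i such that tau_hat = sum_treated w_i Y_i - sum_control w_i Y_i."""
    d = np.asarray(d, dtype=float)
    controls = np.column_stack([np.ones(len(d)), x])
    gamma = np.linalg.lstsq(controls, d, rcond=None)[0]
    v = d - controls @ gamma           # residualized treatment
    w = v / (v @ d)                    # signed weight on each outcome
    return np.where(d == 1, w, -w)     # flip the sign for control units

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=(n, 2))
d = (x[:, 0] + rng.normal(size=n) > 0).astype(float)
y = 1.0 + 2.0 * d + x @ np.array([0.5, -1.0]) + rng.normal(size=n)

w = implied_ols_weights(d, x)
treated, control = d == 1, d == 0
tau_from_weights = w[treated] @ y[treated] - w[control] @ y[control]
balance_gap = w[treated] @ x[treated] - w[control] @ x[control]
```

<p>The weights depend only on <code>d</code> and <code>x</code>, they sum to one within each arm, and <code>balance_gap</code> is zero up to rounding, which is the exact mean balance on included covariates. Individual weights can be negative, and that is one of the things worth inspecting.</p>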
</section>
<section id="słoczyński-heterogeneous-effects-view" class="level3">
<h3 class="anchored" data-anchor-id="słoczyński-heterogeneous-effects-view">Słoczyński: Heterogeneous Effects View</h3>
<p>Słoczyński (2022) asks what the OLS coefficient means when treatment effects are heterogeneous. His central result is that the coefficient on treatment is generally not the ATE. Instead, it is a convex combination of two group-specific effect parameters that, under additional conditions, can be interpreted as the ATT and the ATU. The striking part is the weighting: the smaller treatment arm gets the larger implicit weight.</p>
<p>So if treated units are rare, OLS tends to lean toward effects for treated units. If treated units are common, it leans toward effects for untreated units. The exact formula depends on the specification and on how treatment assignment varies with covariates, but the qualitative message is robust: heterogeneity changes the target, and OLS can overweight the effect for the smaller group.</p>
<p>This is one of those results that sounds surprising at first and obvious in hindsight. Regression learns treatment effects from residual variation in treatment status. When one group is small, comparisons involving that group carry disproportionate identifying content. The practical implication is straightforward: if you care specifically about the ATE or ATT, you should not assume OLS is giving it to you just because the regression includes controls.</p>
</section>
<section id="angrist-and-pischke-saturated-model-view" class="level3">
<h3 class="anchored" data-anchor-id="angrist-and-pischke-saturated-model-view">Angrist and Pischke: Saturated Model View</h3>
<p>The cleanest interpretation of regression comes from saturated models with discrete covariates, an approach emphasized by Angrist and coauthors. If <img src="https://latex.codecogs.com/png.latex?X"> takes only a small number of values and the regression includes a full set of dummies for those cells, then OLS is just averaging within-cell treatment-control differences, with weights proportional to each cell’s size times its within-cell variance of treatment. In that case, regression is a dressed-up version of exact matching with its own weighting scheme.</p>
<p>That perspective is helpful because it shows where the causal content comes from. The coefficient is credible when comparisons are being made within genuinely comparable covariate cells. But it also shows the limitation immediately: with continuous or high-dimensional covariates, literal saturation is impossible and the argument breaks down. At that point, OLS is no longer exact within-cell adjustment. It is a parametric approximation that extrapolates across covariate values. That is often reasonable, but it is no longer harmless.</p>
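<p>The within-cell averaging can be verified directly. In the sketch below, which uses simulated data of my own, the treatment coefficient from a regression on <code>D</code> and a full set of cell dummies is compared with a weighted average of within-cell differences in means, using weights equal to cell size times the within-cell variance of treatment.</p>

```python
import numpy as np

rng = np.random.default_rng(42)
n = 600
cell = rng.integers(0, 3, size=n)                  # discrete covariate with 3 cells
p_treat = np.array([0.2, 0.5, 0.8])[cell]          # treatment share differs by cell
d = (p_treat > rng.random(n)).astype(float)
effect = np.array([1.0, 2.0, 3.0])[cell]           # heterogeneous treatment effects
y = cell + effect * d + rng.normal(size=n)

# OLS of Y on D and a full set of cell dummies (D enters additively)
dummies = (cell[:, None] == np.arange(3)).astype(float)
design = np.column_stack([d, dummies])
tau_ols = np.linalg.lstsq(design, y, rcond=None)[0][0]

# Within-cell differences in means, weighted by n_x * p_x * (1 - p_x)
diffs, weights = [], []
for c in range(3):
    in_cell = cell == c
    share = d[in_cell].mean()
    diffs.append(y[in_cell & (d == 1)].mean() - y[in_cell & (d == 0)].mean())
    weights.append(in_cell.sum() * share * (1 - share))
tau_weighted = np.average(diffs, weights=weights)
```

<p>The two estimates coincide up to floating-point error. Because the weights favor cells where treatment is closest to a coin flip, the result generally differs from the average that weights cells by size alone whenever effects vary across cells.</p>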
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>OLS does not inherently estimate a causal effect. It estimates a linear projection coefficient that becomes causal only under additional assumptions.</li>
<li>Aronow and Samii show that regression adjustment targets a weighted causal estimand in large samples rather than automatically targeting the sample ATE.</li>
<li>Chattopadhyay and Zubizarreta make this weighting interpretation exact in finite samples and turn it into a useful diagnostic tool.</li>
<li>With heterogeneous treatment effects, Słoczyński shows that OLS becomes a weighted average of group-specific effects, often interpretable as ATT- and ATU-type objects, and the smaller treatment arm gets more weight.</li>
<li>Saturated regressions with discrete covariates are the clean benchmark. With continuous <img src="https://latex.codecogs.com/png.latex?X">, standard OLS necessarily relies on approximation and implicit weighting.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>Aronow and Samii (2016) is the right place to start if you want the representativeness argument behind regression adjustment. Chattopadhyay and Zubizarreta (2023) is the most useful paper for understanding exact implied weights in finite samples. Słoczyński (2022) is now the canonical reference on how heterogeneous treatment effects distort the interpretation of the OLS coefficient. For the saturated-regression perspective, I would still point readers to Angrist and Pischke (2009), which makes clear why exact matching logic breaks down once covariates become continuous.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Angrist, J. D., &amp; Pischke, J.-S. (2009). <em>Mostly Harmless Econometrics: An Empiricist’s Companion</em>. Princeton University Press.</p>
<p>Aronow, P. M., &amp; Samii, C. (2016). Does regression produce representative estimates of causal effects? <em>American Journal of Political Science</em>, 60(1), 250-267.</p>
<p>Chattopadhyay, A., &amp; Zubizarreta, J. R. (2023). On the implied weights of linear regression for causal inference. <em>Biometrika</em>, 110(3), 615-629.</p>
<p>Słoczyński, T. (2022). Interpreting OLS estimands when treatment effects are heterogeneous: Smaller groups get larger weights. <em>Review of Economics and Statistics</em>, 104(3), 501-509.</p>


</section>

 ]]></description>
  <category>causal inference</category>
  <category>parametric models</category>
  <guid>https://vyasenov.github.io/blog/interpret-OLS-causal-inference.html</guid>
  <pubDate>Wed, 01 Apr 2026 07:00:00 GMT</pubDate>
</item>
<item>
  <title>The Many Flavors of Lasso</title>
  <link>https://vyasenov.github.io/blog/flavors-lasso.html</link>
  <description><![CDATA[ 





<div class="reading-time">9 min read</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>The Lasso (Least Absolute Shrinkage and Selection Operator), introduced by Tibshirani in 1996, has become one of the go-to tools for variable selection and shrinkage in regression problems. But the classic Lasso is just the starting point. Over the years, researchers have developed many variants of Lasso, each designed to address specific limitations or tailor the method to different kinds of data structures.</p>
<p>This article provides a tour of the most popular flavors of Lasso, from standard <img src="https://latex.codecogs.com/png.latex?%5Cell_1">-penalized regression to modern adaptations like Adaptive Lasso, Elastic Net, Square-root Lasso, and more. For each version, I’ll lay out the objective function, describe when it’s applicable, and summarize its key characteristics.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Before diving into the variants, let’s revisit what makes Lasso special. In a standard linear regression setup, we model <img src="https://latex.codecogs.com/png.latex?y%20=%20X%5Cbeta%20+%20%5Cepsilon,"></p>
<p>where:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?y"> is the outcome,</li>
<li><img src="https://latex.codecogs.com/png.latex?X"> is our design matrix,</li>
<li><img src="https://latex.codecogs.com/png.latex?%5Cbeta"> are the coefficients, and</li>
<li><img src="https://latex.codecogs.com/png.latex?%5Cepsilon"> is the error term.</li>
</ul>
<p>Traditional ordinary least squares (OLS) minimizes the sum of squared residuals without any constraint on the coefficients.</p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="standard-lasso" class="level3">
<h3 class="anchored" data-anchor-id="standard-lasso">Standard Lasso</h3>
<p>The standard Lasso solves the following optimization problem: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cbeta%7D%20=%20%5Carg%20%5Cmin_%7B%5Cbeta%7D%20%5Cleft(%20%5Cfrac%7B1%7D%7B2n%7D%20%5C%7C%20y%20-%20X%20%5Cbeta%20%5C%7C_2%5E2%20+%20%5Clambda%20%5C%7C%20%5Cbeta%20%5C%7C_1%20%5Cright)%0A"></p>
<p>The appeal of Lasso is straightforward: it trades a convex penalty for exact zeros in the solution. In moderately high dimensions, this often works surprisingly well as a first pass.</p>
<p>The main issue shows up when predictors are correlated. Lasso will typically pick one variable from a correlated group and ignore the rest, and which one it picks can be unstable across folds or small perturbations of the data. At the same time, all coefficients are shrunk, including the large ones, which introduces bias that doesn’t go away even with large samples.</p>
<p>In practice, I treat standard Lasso as a baseline rather than a final model. If it’s stable and predictive, great. If not, it’s usually pointing to a structural issue in the design.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(glmnet)</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate data</span></span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb1-5">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb1-6">p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span></span>
<span id="cb1-7">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> p), n, p)</span>
<span id="cb1-8">beta_true <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Only 3 non-zero coefficients</span></span>
<span id="cb1-9">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> X <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> beta_true <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n)</span>
<span id="cb1-10"></span>
<span id="cb1-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit standard Lasso</span></span>
<span id="cb1-12">lasso_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glmnet</span>(X, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># alpha = 1 for Lasso</span></span>
<span id="cb1-13"></span>
<span id="cb1-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Cross-validation to select lambda</span></span>
<span id="cb1-15">cv_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(X, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-16">lambda_opt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> cv_fit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lambda.min</span>
<span id="cb1-17"></span>
<span id="cb1-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get coefficients at optimal lambda</span></span>
<span id="cb1-19"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(cv_fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)</span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Lasso, LassoCV</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.datasets <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> make_regression</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-4"></span>
<span id="cb2-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate data</span></span>
<span id="cb2-6">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb2-7">X, y, coef_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_regression(n_samples<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, n_features<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, </span>
<span id="cb2-8">                                   n_informative<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, coef<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, </span>
<span id="cb2-9">                                   noise<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, random_state<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>
<span id="cb2-10"></span>
<span id="cb2-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit Lasso with cross-validation</span></span>
<span id="cb2-12">lasso <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LassoCV(cv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, random_state<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>
<span id="cb2-13">lasso.fit(X, y)</span>
<span id="cb2-14"></span>
<span id="cb2-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Display results</span></span>
<span id="cb2-16"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Optimal lambda: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>lasso<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>alpha_<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.4f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb2-17"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Number of non-zero coefficients: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(lasso.coef_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb2-18"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Selected coefficients:</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>lasso<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>coef_[lasso.coef_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="adaptive-lasso" class="level3">
<h3 class="anchored" data-anchor-id="adaptive-lasso">Adaptive Lasso</h3>
<p>Adaptive Lasso extends the standard Lasso by using data-driven weights for each coefficient: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cbeta%7D%20=%20%5Carg%20%5Cmin_%7B%5Cbeta%7D%20%5Cleft(%20%5Cfrac%7B1%7D%7B2n%7D%20%5C%7C%20y%20-%20X%20%5Cbeta%20%5C%7C_2%5E2%20+%20%5Clambda%20%5Csum_%7Bj=1%7D%5Ep%20w_j%20%7C%20%5Cbeta_j%20%7C%20%5Cright)%0A"> where <img src="https://latex.codecogs.com/png.latex?w_j%20=%201%20/%20%7C%5Chat%7B%5Cbeta%7D_j%5E%7B%5Ctext%7Binit%7D%7D%7C%5E%5Cgamma"> and <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D_j%5E%7B%5Ctext%7Binit%7D%7D"> comes from an initial estimator like OLS or Ridge.</p>
<p>The idea here is to penalize coefficients unevenly. Variables that look important in a first-stage model get penalized less, while weaker ones get pushed harder toward zero. This reduces the bias on large coefficients and, provided the initial estimator is itself consistent, can deliver consistent variable selection without the stricter design conditions that standard Lasso needs.</p>
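<p>To see how uneven the penalty can get, here is a minimal sketch of the weight formula in plain Python. The first-stage estimates are made-up numbers chosen for illustration, not output from any model in this post.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"># Hypothetical first-stage estimates (illustrative values only)
beta_init = [3.0, -2.0, 0.05]
gamma = 1.0

# Adaptive weights: w_j = 1 / |beta_init_j|^gamma
weights = [1.0 / abs(b) ** gamma for b in beta_init]

for b, w in zip(beta_init, weights):
    print(f"beta_init = {b:5.2f}, weight = {w:5.2f}")</code></pre></div>
<p>With these numbers the near-zero estimate gets a penalty weight about 60 times larger than the strongest signal, which makes it far more likely to be set to exactly zero in the second stage.</p>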
<p>In practice, Adaptive Lasso is less about prediction and more about recovering a meaningful support, that is, the set of variables with nonzero coefficients. If you care about which variables are selected, and not just about predictive accuracy, it’s often worth the extra step.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Continue from previous example</span></span>
<span id="cb3-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(glmnet)</span>
<span id="cb3-3"></span>
<span id="cb3-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 1: Get initial estimates using Ridge</span></span>
<span id="cb3-5">ridge_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glmnet</span>(X, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># alpha = 0 for Ridge</span></span>
<span id="cb3-6">cv_ridge <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(X, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb3-7">beta_init <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.vector</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(cv_ridge, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>))[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Remove intercept</span></span>
<span id="cb3-8"></span>
<span id="cb3-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 2: Compute adaptive weights</span></span>
<span id="cb3-10">gamma <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Common choice</span></span>
<span id="cb3-11">weights <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(beta_init) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e-8</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span>gamma  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add small constant to avoid division by zero</span></span>
<span id="cb3-12"></span>
<span id="cb3-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 3: Fit Adaptive Lasso</span></span>
<span id="cb3-14">adaptive_lasso <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glmnet</span>(X, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty.factor =</span> weights)</span>
<span id="cb3-15">cv_adaptive <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(X, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty.factor =</span> weights)</span>
<span id="cb3-16"></span>
<span id="cb3-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compare coefficients</span></span>
<span id="cb3-18"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Standard Lasso non-zero:"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(cv_fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb3-19"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Adaptive Lasso non-zero:"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(cv_adaptive, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Ridge, Lasso</span>
<span id="cb4-2"></span>
<span id="cb4-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 1: Get initial estimates using Ridge</span></span>
<span id="cb4-4">ridge <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Ridge(alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>)</span>
<span id="cb4-5">ridge.fit(X, y)</span>
<span id="cb4-6">beta_init <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ridge.coef_</span>
<span id="cb4-7"></span>
<span id="cb4-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 2: Compute adaptive weights</span></span>
<span id="cb4-9">gamma <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb4-10">weights <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(beta_init) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e-8</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>gamma</span>
<span id="cb4-11"></span>
<span id="cb4-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 3: Fit Adaptive Lasso (manual implementation via weighted penalty)</span></span>
<span id="cb4-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Divide each feature by its weight so that a plain Lasso applies the weighted penalty</span></span>
<span id="cb4-14">X_weighted <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> weights</span>
<span id="cb4-15"></span>
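<span id="cb4-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit Lasso on weighted features, choosing lambda by CV (LassoCV is imported in the previous example)</span></span>
<span id="cb4-17">adaptive_lasso <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LassoCV(cv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, random_state<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>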
<span id="cb4-18">adaptive_lasso.fit(X_weighted, y)</span>
<span id="cb4-19"></span>
<span id="cb4-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Transform back to original scale</span></span>
<span id="cb4-21">adaptive_coef <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> adaptive_lasso.coef_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> weights</span>
<span id="cb4-22"></span>
<span id="cb4-23"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Standard Lasso non-zero: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(lasso.coef_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb4-24"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Adaptive Lasso non-zero: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(adaptive_coef <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="relaxed-lasso" class="level3">
<h3 class="anchored" data-anchor-id="relaxed-lasso">Relaxed Lasso</h3>
<p>Relaxed Lasso separates selection from estimation. First, run Lasso to pick variables; then refit on that subset, either with OLS or with partial shrinkage via a parameter <img src="https://latex.codecogs.com/png.latex?%5Cphi%20%5Cin%20%5B0,1%5D">. At <img src="https://latex.codecogs.com/png.latex?%5Cphi=1"> you recover Lasso, and at <img src="https://latex.codecogs.com/png.latex?%5Cphi=0"> you get post-selection OLS.</p>
<p>The point is to reduce shrinkage bias. Lasso is good at finding the support but tends to underestimate large coefficients. Relaxing the penalty after selection keeps sparsity while improving estimates.</p>
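<p>A small sketch can make the shrinkage bias concrete. It assumes an orthonormal design, where the Lasso estimate of each coefficient is just the soft-thresholded OLS estimate and refitting on the selected subset returns the OLS values unchanged. The estimates and the penalty level are hypothetical numbers, and <code>soft_threshold</code> is a helper written for this illustration, not a library function.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import math

def soft_threshold(b, lam):
    """Lasso estimate of one coefficient when the design is orthonormal."""
    shrunk = max(abs(b) - lam, 0.0)
    return math.copysign(shrunk, b) if shrunk else 0.0

# Hypothetical OLS estimates and penalty level (illustrative values only)
ols_est = [3.1, -1.9, 0.2, -0.1]
lam = 0.5

# Lasso: every coefficient moves toward zero by lam; small ones hit zero
lasso_est = [soft_threshold(b, lam) for b in ols_est]

# Relaxed refit: keep the Lasso support, undo the shrinkage on the survivors
relaxed_est = [b if est != 0.0 else 0.0 for b, est in zip(ols_est, lasso_est)]

print("Lasso:  ", [round(v, 2) for v in lasso_est])
print("Relaxed:", [round(v, 2) for v in relaxed_est])</code></pre></div>
<p>Both estimates keep the same two variables, but the Lasso values sit closer to zero by the penalty level, while the relaxed values do not. With correlated predictors the refit no longer has this closed form, which is why the examples in this section rely on fitted models instead.</p>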
<p>In practice, this works well when you trust the selected variables but want better coefficient accuracy. The main risk is overfitting if too many variables are selected, so it’s worth tuning both <img src="https://latex.codecogs.com/png.latex?%5Clambda"> and <img src="https://latex.codecogs.com/png.latex?%5Cphi">.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(glmnet)</span>
<span id="cb5-3"></span>
<span id="cb5-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit relaxed Lasso using glmnet (has built-in support)</span></span>
<span id="cb5-5">relaxed_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glmnet</span>(X, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">relax =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb5-6">cv_relaxed <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(X, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">relax =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb5-7"></span>
<span id="cb5-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Manual two-stage approach</span></span>
<span id="cb5-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Stage 1: Standard Lasso selection</span></span>
<span id="cb5-10">lasso_coef <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(cv_fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb5-11">selected <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">which</span>(lasso_coef <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb5-12"></span>
<span id="cb5-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Stage 2: OLS on selected variables</span></span>
<span id="cb5-14"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(selected) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) {</span>
<span id="cb5-15">  X_selected <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> X[, selected]</span>
<span id="cb5-16">  ols_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> X_selected)</span>
<span id="cb5-17">  </span>
<span id="cb5-18">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compare coefficients</span></span>
<span id="cb5-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Lasso coefficients (selected):</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb5-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(lasso_coef[selected])</span>
<span id="cb5-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Relaxed (OLS) coefficients:</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb5-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(ols_fit)[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb5-23">}</span></code></pre></div></div>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Manual two-stage relaxed Lasso</span></span>
<span id="cb6-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> LinearRegression</span>
<span id="cb6-3"></span>
<span id="cb6-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Stage 1: Lasso selection</span></span>
<span id="cb6-5">lasso_coef <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lasso.coef_</span>
<span id="cb6-6">selected <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.where(lasso_coef <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb6-7"></span>
<span id="cb6-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Lasso selected </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(selected)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> variables"</span>)</span>
<span id="cb6-9"></span>
<span id="cb6-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Stage 2: OLS on selected variables</span></span>
<span id="cb6-11"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(selected) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:</span>
<span id="cb6-12">    X_selected <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X[:, selected]</span>
<span id="cb6-13">    ols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LinearRegression()</span>
<span id="cb6-14">    ols.fit(X_selected, y)</span>
<span id="cb6-15">    </span>
<span id="cb6-16">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compare coefficient magnitudes</span></span>
<span id="cb6-17">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Lasso coefficients (mean abs): </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(lasso_coef[selected])<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.4f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb6-18">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Relaxed coefficients (mean abs): </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(ols.coef_)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.4f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb6-19">    </span>
<span id="cb6-20">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Often relaxed coefficients are larger in magnitude</span></span></code></pre></div></div>
</div>
</div>
</div>
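<p>The closing comment in the Python code above says relaxed coefficients are often larger in magnitude. Here is a minimal sketch of why, under assumptions of my own: a single simulated predictor scaled so that its mean square is 1, no intercept, and a Lasso loss scaled by <code>1/(2n)</code>. In that special case the Lasso estimate is the OLS slope soft-thresholded by the penalty, so the refit undoes a shrinkage of exactly that size. The data, seed, and penalty value are illustrative only.</p>

```python
import random

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if -z > t:
        return z + t
    return 0.0

# Simulated data (illustrative assumption, not the post's dataset)
rng = random.Random(1)
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]
scale = (sum(v * v for v in x) / n) ** 0.5
x = [v / scale for v in x]                      # now (1/n) * sum(x^2) = 1
y = [1.5 * v + rng.gauss(0, 1) for v in x]

# With (1/n) * sum(x^2) = 1 the no-intercept OLS slope is just (1/n) * sum(x * y)
ols = sum(a * b for a, b in zip(x, y)) / n

lam = 0.3                                       # arbitrary penalty for the demo
lasso = soft_threshold(ols, lam)                # stage 1: shrunken estimate
relaxed = ols if lasso != 0.0 else 0.0          # stage 2: refit only if selected

print(f"OLS slope:     {ols:.4f}")
print(f"Lasso slope:   {lasso:.4f}")
print(f"Relaxed slope: {relaxed:.4f}")
print(f"Shrinkage removed by the refit: {relaxed - lasso:.4f}")
```

<p>With several correlated predictors the gap is no longer exactly the penalty, but the direction of the effect is typically the same.</p>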
</section>
<section id="square-root-lasso" class="level3">
<h3 class="anchored" data-anchor-id="square-root-lasso">Square-root Lasso</h3>
<p>Square-root Lasso, also known as Scaled Lasso, modifies the objective function to: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cbeta%7D%20=%20%5Carg%20%5Cmin_%7B%5Cbeta%7D%20%5Cleft(%20%5Cfrac%7B1%7D%7B%5Csqrt%7Bn%7D%7D%20%5C%7C%20y%20-%20X%20%5Cbeta%20%5C%7C_2%20+%20%5Clambda%20%5C%7C%20%5Cbeta%20%5C%7C_1%20%5Cright)%0A"></p>
<p>The crucial difference from standard Lasso is that the loss term uses the <img src="https://latex.codecogs.com/png.latex?%5Cell_2"> norm of the residuals directly, without squaring it. This seemingly small change has an important consequence: the estimator becomes scale-invariant, in the sense that the right penalty level no longer depends on the noise scale, so you don’t need to know or estimate the error variance <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2"> to set the penalty parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda">. In standard Lasso, the theoretically justified choice of <img src="https://latex.codecogs.com/png.latex?%5Clambda"> is proportional to the unknown noise level <img src="https://latex.codecogs.com/png.latex?%5Csigma">; in square-root Lasso that factor drops out.</p>
<p>This variant is particularly valuable when the error variance is unknown or hard to estimate. Because the penalty does not depend on the noise level, you can use theoretically motivated choices for <img src="https://latex.codecogs.com/png.latex?%5Clambda"> without a preliminary variance estimate. In practice, this often translates to more stable selection across datasets and makes the method especially appealing in settings where variance estimation is challenging or the homoskedasticity assumption is questionable.</p>
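<p>To make the scale-invariance concrete, here is a small sketch for the simplest case I could work out in closed form: one predictor, no intercept, scaled so that its mean square is 1. Under those assumptions standard Lasso soft-thresholds the OLS slope by a fixed amount, while square-root Lasso shrinks it by an amount proportional to the residual standard deviation. The function names and the simulated data are mine, not part of any library.</p>

```python
import math
import random

def lasso_1d(b, lam):
    """Standard Lasso with one predictor: soft-threshold the OLS slope b by lam."""
    return math.copysign(max(abs(b) - lam, 0.0), b)

def sqrt_lasso_1d(b, s2, lam):
    """Square-root Lasso with one predictor, lam strictly between 0 and 1.
    b  = OLS slope (1/n) * sum(x * y), s2 = (1/n) * sum(y^2).
    Minimizes sqrt(s2 - 2*b*beta + beta^2) + lam * |beta| in closed form."""
    v = max(s2 - b * b, 0.0)                       # residual variance of the OLS fit
    shrink = lam * math.sqrt(v / (1.0 - lam * lam))
    return math.copysign(max(abs(b) - shrink, 0.0), b)

# Simulated data (illustrative assumption)
rng = random.Random(42)
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]
norm = math.sqrt(sum(v * v for v in x) / n)
x = [v / norm for v in x]                          # (1/n) * sum(x^2) = 1
y = [0.5 * v + rng.gauss(0, 1) for v in x]

lam = 0.2
for c in (1.0, 0.1):                               # rescale y, e.g. a change of units
    yc = [c * v for v in y]
    b = sum(a * t for a, t in zip(x, yc)) / n
    s2 = sum(t * t for t in yc) / n
    print(f"scale={c}: Lasso selects={lasso_1d(b, lam) != 0.0}, "
          f"sqrt-Lasso selects={sqrt_lasso_1d(b, s2, lam) != 0.0}")
```

<p>Rescaling <code>y</code> multiplies the square-root Lasso coefficient by the same factor and leaves the selection decision unchanged at a fixed penalty. Standard Lasso with the same fixed penalty drops the variable once the response is small enough, so its penalty would have to be rescaled along with the data.</p>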
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(scalreg)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># For square-root Lasso</span></span>
<span id="cb7-2"></span>
<span id="cb7-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit square-root Lasso</span></span>
<span id="cb7-4">sqrt_lasso <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scalreg</span>(X, y)</span>
<span id="cb7-5"></span>
<span id="cb7-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compare with standard Lasso</span></span>
<span id="cb7-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Standard Lasso selected:"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(cv_fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"variables</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb7-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Square-root Lasso selected:"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(sqrt_lasso<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coefficients <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"variables</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Square-root Lasso is not in sklearn, but we can implement a simple version</span></span>
<span id="cb8-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> LassoLars</span>
<span id="cb8-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy.optimize <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> minimize</span>
<span id="cb8-4"></span>
<span id="cb8-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Manual implementation using CVXPY (if available)</span></span>
<span id="cb8-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb8-7">    <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> cvxpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> cp</span>
<span id="cb8-8">    </span>
<span id="cb8-9">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Define variables</span></span>
<span id="cb8-10">    beta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cp.Variable(X.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb8-11">    </span>
<span id="cb8-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Define objective: ||y - X*beta||_2 / sqrt(n) + lambda * ||beta||_1</span></span>
<span id="cb8-13">    lambda_sqrt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span></span>
<span id="cb8-14">    objective <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cp.Minimize(cp.norm(y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> beta, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> np.sqrt(X.shape[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> lambda_sqrt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> cp.norm(beta, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb8-15">    </span>
<span id="cb8-16">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Solve</span></span>
<span id="cb8-17">    prob <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cp.Problem(objective)</span>
<span id="cb8-18">    prob.solve()</span>
<span id="cb8-19">    </span>
<span id="cb8-20">    sqrt_lasso_coef <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> beta.value</span>
<span id="cb8-21">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Square-root Lasso selected </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(sqrt_lasso_coef) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e-6</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> variables"</span>)</span>
<span id="cb8-22">    </span>
<span id="cb8-23"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ImportError</span>:</span>
<span id="cb8-24">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Square-root Lasso requires cvxpy package"</span>)</span>
<span id="cb8-25">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Install with: pip install cvxpy"</span>)</span></code></pre></div></div>
</div>
</div>
</div>
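<p>The Python example above hard-codes the penalty. As a hedged alternative, here is one commonly cited plug-in rule from the square-root Lasso literature (Belloni, Chernozhukov and Wang), rewritten for the objective shown earlier with the <code>1/sqrt(n)</code> scaling on the loss. It assumes the columns of <code>X</code> are standardized to mean square 1. The function name and the defaults <code>c = 1.1</code> and <code>level = 0.05</code> are conventional choices that I treat as assumptions here.</p>

```python
import math
from statistics import NormalDist

def pivotal_lambda(n, p, c=1.1, level=0.05):
    """Plug-in penalty for ||y - X b||_2 / sqrt(n) + lam * ||b||_1.

    n = number of observations, p = number of predictors.
    Assumes each column of X satisfies (1/n) * sum(x_j^2) = 1.
    Note that no noise level sigma appears anywhere in the formula.
    """
    quantile = NormalDist().inv_cdf(1.0 - level / (2.0 * p))
    return c * quantile / math.sqrt(n)

for n, p in [(100, 20), (100, 200), (1000, 20)]:
    print(f"n={n}, p={p}: lambda = {pivotal_lambda(n, p):.4f}")
```

<p>The penalty grows slowly with the number of predictors and shrinks at rate <code>1/sqrt(n)</code>. Treat it as a starting point rather than a replacement for checking sensitivity to the penalty on your own data.</p>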
</section>
<section id="elastic-net" class="level3">
<h3 class="anchored" data-anchor-id="elastic-net">Elastic Net</h3>
<p>Elastic Net blends <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> and <img src="https://latex.codecogs.com/png.latex?%5Cell_2"> regularization by minimizing: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cbeta%7D%20=%20%5Carg%20%5Cmin_%7B%5Cbeta%7D%20%5Cleft(%20%5Cfrac%7B1%7D%7B2n%7D%20%5C%7C%20y%20-%20X%20%5Cbeta%20%5C%7C_2%5E2%20+%20%5Clambda_1%20%5C%7C%20%5Cbeta%20%5C%7C_1%20+%20%5Clambda_2%20%5C%7C%20%5Cbeta%20%5C%7C_2%5E2%20%5Cright)%0A"></p>
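<p>This is often reparametrized as <img src="https://latex.codecogs.com/png.latex?%5Clambda%20%5Cleft%5B%20%5Calpha%20%5C%7C%20%5Cbeta%20%5C%7C_1%20+%20(1-%5Calpha)%20%5C%7C%20%5Cbeta%20%5C%7C_2%5E2%20%5Cright%5D"> where <img src="https://latex.codecogs.com/png.latex?%5Calpha%20%5Cin%20%5B0,1%5D"> controls the mixing between <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> and <img src="https://latex.codecogs.com/png.latex?%5Cell_2"> penalties. Software conventions differ slightly: both <code>glmnet</code> and scikit-learn put an extra factor of 1/2 on the <img src="https://latex.codecogs.com/png.latex?%5Cell_2"> term, and in scikit-learn the mixing parameter is called <code>l1_ratio</code> while <code>alpha</code> plays the role of <img src="https://latex.codecogs.com/png.latex?%5Clambda">.</p>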
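<p>The two parametrizations carry the same information. A small sketch of the conversion, using the formulas exactly as written above (so without any library-specific scaling factors); the helper names are hypothetical:</p>

```python
def to_lambda_alpha(lam1, lam2):
    """(lambda_1, lambda_2) -> (lambda, alpha), where
    lambda_1 = lambda * alpha and lambda_2 = lambda * (1 - alpha)."""
    lam = lam1 + lam2
    if lam == 0:
        # alpha is undefined when there is no penalty; 0.0 is an arbitrary placeholder
        return 0.0, 0.0
    return lam, lam1 / lam

def to_lambda_pair(lam, alpha):
    """(lambda, alpha) -> (lambda_1, lambda_2)."""
    return lam * alpha, lam * (1.0 - alpha)

lam, alpha = to_lambda_alpha(0.03, 0.07)
print(f"lambda = {lam:.3f}, alpha = {alpha:.3f}")
print("round trip:", to_lambda_pair(lam, alpha))
```

<p>Pure Lasso corresponds to a zero second penalty (mixing parameter 1), pure Ridge to a zero first penalty (mixing parameter 0).</p>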
<p>Elastic Net fixes a key issue with Lasso: when predictors are highly correlated, Lasso tends to pick one arbitrarily and ignore the rest. Adding an <img src="https://latex.codecogs.com/png.latex?%5Cell_2"> penalty induces a grouping effect, so correlated variables enter or leave together, while the <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> term still enforces sparsity.</p>
<p>This makes it a better default in settings with multicollinearity, which is common in practice. The mixing parameter <img src="https://latex.codecogs.com/png.latex?%5Calpha"> controls the trade-off: values closer to 1 behave like Lasso, values closer to <img src="https://latex.codecogs.com/png.latex?0"> like Ridge. Moderate values (e.g.&nbsp;<img src="https://latex.codecogs.com/png.latex?0.5">) are a reasonable starting point, with cross-validation refining the choice.</p>
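<p>The grouping effect is easiest to see in the extreme case of two identical predictors. The sketch below is a bare-bones coordinate descent for the penalty written above, in pure Python with no intercept. It is my own toy solver under those assumptions, not how <code>glmnet</code> or scikit-learn are implemented, and the data and penalty values are made up for illustration.</p>

```python
import random

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if -z > t:
        return z + t
    return 0.0

def elastic_net_cd(cols, y, lam, alpha, n_iter=200):
    """Coordinate descent for
    (1/2n) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) * ||b||_2^2).
    cols is a list of predictor columns, each a list of n floats."""
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)                                   # y - X beta, kept up to date
    col_sq = [sum(v * v for v in col) / n for col in cols]
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of column j with the residual that excludes its own fit
            z = sum(c * r for c, r in zip(col, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(z, lam * alpha) / (col_sq[j] + 2.0 * lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta

rng = random.Random(0)
n = 50
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = list(x1)                                         # exact copy of x1
x3 = [rng.gauss(0, 1) for _ in range(n)]              # unrelated noise predictor
y = [2.0 * a + 0.1 * rng.gauss(0, 1) for a in x1]

lasso_beta = elastic_net_cd([x1, x2, x3], y, lam=0.1, alpha=1.0)
enet_beta = elastic_net_cd([x1, x2, x3], y, lam=0.1, alpha=0.5)
print("Lasso:      ", [round(b, 3) for b in lasso_beta])
print("Elastic Net:", [round(b, 3) for b in enet_beta])
```

<p>With this update order, Lasso should put essentially all the weight on the first copy and leave the second at zero, an arbitrary choice since any split between the two gives the same objective value. With the added <img src="https://latex.codecogs.com/png.latex?%5Cell_2"> term the objective becomes strictly convex, so the two copies end up with equal coefficients.</p>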
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-5-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-1" aria-controls="tabset-5-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-2" aria-controls="tabset-5-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-5-1" class="tab-pane active" aria-labelledby="tabset-5-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(glmnet)</span>
<span id="cb9-2"></span>
<span id="cb9-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create correlated predictors to demonstrate Elastic Net advantage</span></span>
<span id="cb9-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb9-5">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb9-6">X_base <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>), n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb9-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add correlated predictors</span></span>
<span id="cb9-8">X_corr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(X_base, X_base[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>), n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb9-9">beta_true <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.2</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.3</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># True coefficients: 5 base predictors, then the 2 correlated copies</span></span>
<span id="cb9-10">y_corr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> X_corr <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> beta_true <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n)</span>
<span id="cb9-11"></span>
<span id="cb9-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit Elastic Net with alpha = 0.5 (equal mix of L1 and L2 penalties)</span></span>
<span id="cb9-13">elastic_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(X_corr, y_corr, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb9-14"></span>
<span id="cb9-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compare with pure Lasso (alpha = 1)</span></span>
<span id="cb9-16">lasso_corr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(X_corr, y_corr, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb9-17"></span>
<span id="cb9-18"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Elastic Net coefficients:</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb9-19"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(elastic_fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>))</span>
<span id="cb9-20"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Lasso coefficients:</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb9-21"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(lasso_corr, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>))</span></code></pre></div></div>
</div>
<div id="tabset-5-2" class="tab-pane" aria-labelledby="tabset-5-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> ElasticNetCV, LassoCV</span>
<span id="cb10-2"></span>
<span id="cb10-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create correlated predictors</span></span>
<span id="cb10-4">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb10-5">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb10-6">X_base <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb10-7">X_corr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.hstack([X_base, X_base[:, :<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>])</span>
<span id="cb10-8">beta_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.2</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.3</span>])</span>
<span id="cb10-9">y_corr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X_corr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> beta_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n)</span>
<span id="cb10-10"></span>
<span id="cb10-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit Elastic Net with l1_ratio = 0.5 (equal mix)</span></span>
<span id="cb10-12">elastic <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ElasticNetCV(l1_ratio<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, cv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb10-13">elastic.fit(X_corr, y_corr)</span>
<span id="cb10-14"></span>
<span id="cb10-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compare with Lasso</span></span>
<span id="cb10-16">lasso_corr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LassoCV(cv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb10-17">lasso_corr.fit(X_corr, y_corr)</span>
<span id="cb10-18"></span>
<span id="cb10-19"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Elastic Net coefficients:"</span>)</span>
<span id="cb10-20"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(elastic.coef_)</span>
<span id="cb10-21"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Elastic Net selected </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(elastic.coef_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> variables"</span>)</span>
<span id="cb10-22"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Lasso selected </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(lasso_corr.coef_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> variables"</span>)</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="group-lasso" class="level3">
<h3 class="anchored" data-anchor-id="group-lasso">Group Lasso</h3>
<p>Group Lasso extends the <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> penalty to operate on predefined groups of variables: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cbeta%7D%20=%20%5Carg%20%5Cmin_%7B%5Cbeta%7D%20%5Cleft(%20%5Cfrac%7B1%7D%7B2n%7D%20%5C%7C%20y%20-%20X%20%5Cbeta%20%5C%7C_2%5E2%20+%20%5Clambda%20%5Csum_%7Bg=1%7D%5EG%20%5C%7C%20%5Cbeta%5E%7B(g)%7D%20%5C%7C_2%20%5Cright)%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Cbeta%5E%7B(g)%7D"> represents the coefficients belonging to group <img src="https://latex.codecogs.com/png.latex?g">, and <img src="https://latex.codecogs.com/png.latex?%5C%7C%20%5Ccdot%20%5C%7C_2"> is the <img src="https://latex.codecogs.com/png.latex?%5Cell_2"> norm applied within each group.</p>
<p>The key insight is that the <img src="https://latex.codecogs.com/png.latex?%5Cell_2"> norm within groups, combined with summation across groups, induces sparsity at the group level: the penalty acts like an <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> norm applied to the vector of group norms. Either all coefficients in a group are set to zero, or all are kept (though shrunk). This “all or nothing” behavior respects the natural grouping structure in your data.</p>
<p>Group Lasso is useful when variables come in meaningful groups. A common example is a categorical feature encoded as dummies: you usually want to include or exclude the whole variable, not individual levels. Similar structure appears in multi-task settings or grouped scientific measurements.</p>
<p>Instead of sparsity at the coefficient level, Group Lasso selects entire groups while allowing dense coefficients within them. This makes the model align better with how features are constructed.</p>
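<p>To see where the all-or-nothing behavior comes from, here is a minimal sketch of the group soft-thresholding step, which is the proximal operator of the group penalty. The function name <code>group_soft_threshold</code>, the example vector, and the threshold of 0.5 are assumptions made for illustration; they are not taken from any package.</p>

```python
import numpy as np

def group_soft_threshold(beta, groups, threshold):
    """Shrink each group's block of coefficients by `threshold` in l2 norm."""
    out = np.zeros_like(beta, dtype=float)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(beta[idx])
        # A group whose norm does not exceed the threshold is zeroed as a whole;
        # any other group is rescaled, so all of its entries stay non-zero.
        scale = max(0.0, 1.0 - threshold / norm) if norm != 0 else 0.0
        out[idx] = scale * beta[idx]
    return out

# Hypothetical coefficients: groups 1 and 3 are strong, group 2 is weak
beta = np.array([2.0, -1.0, 1.5, 0.1, -0.2, 0.1, 0.05, 1.0, -0.5, 0.8, 1.2, -1.0])
groups = np.array([1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3])

shrunk = group_soft_threshold(beta, groups, threshold=0.5)
for g in np.unique(groups):
    print(f"Group {g}:", np.round(shrunk[groups == g], 3))
```

<p>Group 2 has norm 0.25, below the threshold, so every one of its entries is set to zero together. Groups 1 and 3 are only rescaled, which is why no individual coefficient inside a selected group is dropped.</p>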
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-6-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-1" aria-controls="tabset-6-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-6-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-2" aria-controls="tabset-6-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-6-1" class="tab-pane active" aria-labelledby="tabset-6-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(grpreg)</span>
<span id="cb11-2"></span>
<span id="cb11-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create data with natural groups</span></span>
<span id="cb11-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Suppose we have 3 categorical variables with 3, 4, and 5 levels</span></span>
<span id="cb11-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb11-6">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb11-7">X1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">model.matrix</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb11-8">X2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">model.matrix</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb11-9">X3 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">model.matrix</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb11-10">X_grouped <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(X1, X2, X3)</span>
<span id="cb11-11"></span>
<span id="cb11-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Define groups (which columns belong to which group)</span></span>
<span id="cb11-13">groups <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb11-14"></span>
<span id="cb11-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># True model: only group 1 and 3 are relevant</span></span>
<span id="cb11-16">beta_true <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.2</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb11-17">y_grouped <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> X_grouped <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> beta_true <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n)</span>
<span id="cb11-18"></span>
<span id="cb11-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit Group Lasso</span></span>
<span id="cb11-20">group_lasso <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.grpreg</span>(X_grouped, y_grouped, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> groups, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"grLasso"</span>)</span>
<span id="cb11-21"></span>
<span id="cb11-22"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Group Lasso coefficients by group:</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb11-23">coefs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(group_lasso, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lambda =</span> group_lasso<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lambda.min)[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb11-24"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (g <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(groups)) {</span>
<span id="cb11-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Group %d: %d non-zero out of %d</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>, </span>
<span id="cb11-26">              g, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(coefs[groups <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> g] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(groups <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> g)))</span>
<span id="cb11-27">}</span></code></pre></div></div>
</div>
<div id="tabset-6-2" class="tab-pane" aria-labelledby="tabset-6-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb12-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Note: True group Lasso requires specialized packages</span></span>
<span id="cb12-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We'll demonstrate with a simplified example</span></span>
<span id="cb12-4"></span>
<span id="cb12-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate grouped structure</span></span>
<span id="cb12-6">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb12-7">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb12-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create 3 groups with 3, 4, 5 features each</span></span>
<span id="cb12-9">X_grouped <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)</span>
<span id="cb12-10">groups <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb12-11"></span>
<span id="cb12-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># True coefficients (group 1 and 3 active, group 2 zero)</span></span>
<span id="cb12-13">beta_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.2</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb12-14">y_grouped <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X_grouped <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> beta_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n)</span>
<span id="cb12-15"></span>
<span id="cb12-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fitting a true Group Lasso needs a package such as 'group-lasso' or 'celer'</span></span>
<span id="cb12-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This block only simulates the grouped data; it does not fit a model</span></span>
<span id="cb12-18"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"For Python Group Lasso, install specialized packages:"</span>)</span>
<span id="cb12-19"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"  pip install group-lasso"</span>)</span>
<span id="cb12-20"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"  pip install celer"</span>)</span></code></pre></div></div>
</div>
</div>
</div>
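<p>The Python tab above simulates grouped data but does not fit a model. As a rough, dependency-free illustration, the sketch below minimizes the Group Lasso objective from this section with proximal gradient descent in plain NumPy. The function name <code>group_lasso_prox_grad</code>, the penalty level <code>lam=0.6</code>, and the iteration count are assumptions chosen for this example. Unlike <code>grpreg</code>, the sketch has no intercept, no group-size weights, and no cross-validation.</p>

```python
import numpy as np

def group_lasso_prox_grad(X, y, groups, lam, n_iter=2000):
    """Minimize (1/2n)||y - Xb||^2 + lam * sum_g ||b_g||_2 by proximal gradient."""
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, with L the largest eigenvalue of X'X / n
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        # Gradient step on the least-squares part
        z = beta - step * (X.T @ (X @ beta - y)) / n
        # Group soft-thresholding: zero a whole group or rescale it
        for g in np.unique(groups):
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            scale = max(0.0, 1.0 - step * lam / norm) if norm != 0 else 0.0
            beta[idx] = scale * z[idx]
    return beta

# Simulated data: 3 groups with 3, 4, and 5 features; group 2 is inactive
rng = np.random.default_rng(1988)
n = 100
X = rng.standard_normal((n, 12))
groups = np.array([1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3])
beta_true = np.array([2, -1, 1.5, 0, 0, 0, 0, 1, -0.5, 0.8, 1.2, -1])
y = X @ beta_true + rng.standard_normal(n)

beta_hat = group_lasso_prox_grad(X, y, groups, lam=0.6)
for g in np.unique(groups):
    kept = np.sum(beta_hat[groups == g] != 0)
    print(f"Group {g}: {kept} non-zero out of {np.sum(groups == g)}")
```

<p>Because the thresholding acts on whole groups, each group in <code>beta_hat</code> is either entirely zero or entirely non-zero. For real work a maintained package is the better choice, since it handles tuning and group weighting for you.</p>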
</section>
<section id="fused-lasso" class="level3">
<h3 class="anchored" data-anchor-id="fused-lasso">Fused Lasso</h3>
<p>Fused Lasso adds a penalty on differences between adjacent coefficients: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cbeta%7D%20=%20%5Carg%20%5Cmin_%7B%5Cbeta%7D%20%5Cleft(%20%5Cfrac%7B1%7D%7B2n%7D%20%5C%7C%20y%20-%20X%20%5Cbeta%20%5C%7C_2%5E2%20+%20%5Clambda_1%20%5C%7C%20%5Cbeta%20%5C%7C_1%20+%20%5Clambda_2%20%5Csum_%7Bj=2%7D%5Ep%20%7C%20%5Cbeta_j%20-%20%5Cbeta_%7Bj-1%7D%20%7C%20%5Cright)%0A"></p>
<p>This method introduces two types of penalties: the standard <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> penalty <img src="https://latex.codecogs.com/png.latex?%5Clambda_1%20%5C%7C%20%5Cbeta%20%5C%7C_1"> encourages overall sparsity (setting coefficients to zero), while the fusion penalty <img src="https://latex.codecogs.com/png.latex?%5Clambda_2%20%5Csum_%7Bj=2%7D%5Ep%20%7C%20%5Cbeta_j%20-%20%5Cbeta_%7Bj-1%7D%20%7C"> encourages adjacent coefficients to be equal. The fusion penalty means that nearby coefficients in the ordering are pulled toward each other, creating piecewise-constant patterns in the coefficient profile.</p>
<p>Fused Lasso is useful when features have a natural ordering and the true coefficients are expected to stay constant over runs of neighboring features, changing only at a few breakpoints. Rather than treating coefficients independently, it trades off sparsity against similarity between neighbors.</p>
<p>Such orderings arise in time series, spatial data, and ordered genomic features. The two tuning parameters control the trade-off: <img src="https://latex.codecogs.com/png.latex?%5Clambda_1"> drives sparsity, while <img src="https://latex.codecogs.com/png.latex?%5Clambda_2"> controls how strongly adjacent coefficients are fused. Setting <img src="https://latex.codecogs.com/png.latex?%5Clambda_1"> to zero leaves a pure fusion penalty, and setting <img src="https://latex.codecogs.com/png.latex?%5Clambda_2"> to zero recovers the ordinary Lasso.</p>
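<p>A small numerical sketch can make the two penalty terms concrete. It evaluates them for a blocky coefficient vector and for the same values in shuffled order. The helper <code>fused_lasso_penalty</code> and the example vectors are hypothetical and serve only as illustration.</p>

```python
import numpy as np

def fused_lasso_penalty(beta, lam1, lam2):
    """lam1 * ||beta||_1 + lam2 * sum_j |beta_j - beta_{j-1}|."""
    sparsity = np.sum(np.abs(beta))          # l1 term
    fusion = np.sum(np.abs(np.diff(beta)))   # adjacent-difference term
    return lam1 * sparsity + lam2 * fusion

# Piecewise-constant profile: zeros, a block at 2, zeros, a block at -1.5, zeros
blocky = np.concatenate([np.zeros(10), np.full(15, 2.0), np.zeros(10),
                         np.full(10, -1.5), np.zeros(5)])

# Same values in random order: identical l1 norm, but many more jumps
rng = np.random.default_rng(1988)
scattered = rng.permutation(blocky)

for name, b in [("blocky", blocky), ("scattered", scattered)]:
    l1 = np.sum(np.abs(b))
    tv = np.sum(np.abs(np.diff(b)))
    print(f"{name}: l1 = {l1}, fusion = {tv}, total = {fused_lasso_penalty(b, 1.0, 1.0)}")
```

<p>Both vectors have the same <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> norm, so the ordinary Lasso penalty cannot tell them apart. Only the fusion term, which counts the size of the jumps between neighbors, favors the blocky profile.</p>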
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-7-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-1" aria-controls="tabset-7-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-7-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-2" aria-controls="tabset-7-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-7-1" class="tab-pane active" aria-labelledby="tabset-7-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(genlasso)</span>
<span id="cb13-2"></span>
<span id="cb13-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate data with ordered features (e.g., time series or spatial)</span></span>
<span id="cb13-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb13-5">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb13-6">p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span></span>
<span id="cb13-7"></span>
<span id="cb13-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create design matrix with ordered features</span></span>
<span id="cb13-9">X_ordered <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> p), n, p)</span>
<span id="cb13-10"></span>
<span id="cb13-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># True coefficients with piecewise constant structure</span></span>
<span id="cb13-12">beta_true <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb13-13">y_ordered <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> X_ordered <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> beta_true <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n)</span>
<span id="cb13-14"></span>
<span id="cb13-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit the 1-D Fused Lasso; gamma is the ratio lambda_1 / lambda_2 (default 0 = fusion penalty only)</span></span>
<span id="cb13-16">fused_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fusedlasso1d</span>(y_ordered, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">X =</span> X_ordered)</span>
<span id="cb13-17"></span>
<span id="cb13-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get coefficients at a specific lambda</span></span>
<span id="cb13-19">lambda_idx <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Example index</span></span>
<span id="cb13-20">coefs_fused <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(fused_fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lambda =</span> fused_fit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lambda[lambda_idx])<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>beta</span>
<span id="cb13-21"></span>
<span id="cb13-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Visualize coefficient profile</span></span>
<span id="cb13-23"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(coefs_fused, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"s"</span>, </span>
<span id="cb13-24">     <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Fused Lasso Coefficient Profile"</span>,</span>
<span id="cb13-25">     <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Feature Index"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Coefficient"</span>,</span>
<span id="cb13-26">     <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb13-27"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lines</span>(beta_true, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lwd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb13-28"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">legend</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"topright"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Estimated"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"True"</span>), </span>
<span id="cb13-29">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span></code></pre></div></div>
</div>
<div id="tabset-7-2" class="tab-pane" aria-labelledby="tabset-7-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Standard Lasso baseline on ordered features (scikit-learn has no fused lasso estimator)</span></span>
<span id="cb14-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Lasso</span>
<span id="cb14-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb14-4"></span>
<span id="cb14-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate ordered features</span></span>
<span id="cb14-6">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb14-7">n, p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span></span>
<span id="cb14-8">X_ordered <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(n, p)</span>
<span id="cb14-9"></span>
<span id="cb14-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Piecewise constant true coefficients</span></span>
<span id="cb14-11">beta_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.concatenate([</span>
<span id="cb14-12">    np.zeros(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), np.full(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), np.zeros(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), </span>
<span id="cb14-13">    np.full(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>), np.zeros(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb14-14">])</span>
<span id="cb14-15">y_ordered <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X_ordered <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> beta_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n)</span>
<span id="cb14-16"></span>
<span id="cb14-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Standard Lasso (for comparison)</span></span>
<span id="cb14-18">lasso_ordered <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Lasso(alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>)</span>
<span id="cb14-19">lasso_ordered.fit(X_ordered, y_ordered)</span>
<span id="cb14-20"></span>
<span id="cb14-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># A true fused lasso needs a specialized solver, so only the standard Lasso is fit here</span></span>
<span id="cb14-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compare the standard Lasso estimate with the true piecewise constant coefficients</span></span>
<span id="cb14-23">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb14-24">plt.plot(beta_true, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'r--'</span>, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'True'</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb14-25">plt.plot(lasso_ordered.coef_, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'b-'</span>, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Standard Lasso'</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>)</span>
<span id="cb14-26">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Feature Index'</span>)</span>
<span id="cb14-27">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Coefficient Value'</span>)</span>
<span id="cb14-28">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Coefficient Profile: Standard Lasso vs. True Piecewise Constant Coefficients'</span>)</span>
<span id="cb14-29">plt.legend()</span>
<span id="cb14-30">plt.grid(alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb14-31">plt.show()</span>
<span id="cb14-32"></span>
<span id="cb14-33"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"For true Fused Lasso in Python, consider packages:"</span>)</span>
<span id="cb14-34"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"  skfda (functional data analysis)"</span>)</span>
<span id="cb14-35"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"  or implement using cvxpy with fusion penalty"</span>)</span></code></pre></div></div>
</div>
</div>
</div>
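<p>The Python tab above only fits a standard Lasso. As a rough sketch of how a fusion penalty can be obtained with the same tools, one option is to reparametrize the coefficients as cumulative sums of jumps, so that an ordinary <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> penalty on the jumps acts as a penalty on successive differences. The names <code>L</code>, <code>Z</code>, and <code>theta</code>, the random generator, and the value <code>alpha=0.1</code> are illustrative assumptions rather than tuned choices. Note that this variant also penalizes the level of the first coefficient and has no separate sparsity penalty on the coefficients themselves.</p>

```python
import numpy as np
from sklearn.linear_model import Lasso

# Simulated data with the same design as the example above
# (variable names and the random generator are illustrative)
rng = np.random.default_rng(1988)
n, p = 100, 50
X = rng.standard_normal((n, p))
beta_true = np.concatenate([
    np.zeros(10), np.full(15, 2.0), np.zeros(10),
    np.full(10, -1.5), np.zeros(5)
])
y = X @ beta_true + rng.standard_normal(n)

# Write beta = L @ theta with L a lower triangular matrix of ones, so that
# theta[0] = beta[0] and theta[j] = beta[j] - beta[j-1] for j = 1, ..., p-1.
L = np.tril(np.ones((p, p)))
Z = X @ L

# An l1 penalty on theta is then a penalty on successive differences.
# Caveat: theta[0], the level of the first coefficient, is penalized too.
fit = Lasso(alpha=0.1, fit_intercept=False, max_iter=100_000)
fit.fit(Z, y)
theta = fit.coef_

# Map the jumps back to the original coefficients via a cumulative sum
beta_fused = np.cumsum(theta)
n_jumps = np.count_nonzero(theta[1:])
print("number of estimated jumps:", n_jumps)
```

<p>Plotting <code>beta_fused</code> against <code>beta_true</code> gives a step-shaped profile to compare with the standard Lasso fit. For the full penalty with a separate sparsity term, a dedicated solver is still the safer route.</p>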
</section>
<section id="graphical-lasso" class="level3">
<h3 class="anchored" data-anchor-id="graphical-lasso">Graphical Lasso</h3>
<p>Graphical Lasso applies <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> penalization to the estimation of precision matrices (inverse covariance matrices): <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5CTheta%7D%20=%20%5Carg%20%5Cmin_%7B%5CTheta%20%5Csucc%200%7D%20%5Cleft(%20-%5Clog%20%5Cdet%20%5CTheta%20+%20%5Ctext%7Btrace%7D(S%20%5CTheta)%20+%20%5Clambda%20%5C%7C%20%5CTheta%20%5C%7C_1%20%5Cright)%0A"> where <img src="https://latex.codecogs.com/png.latex?%5CTheta"> is the precision matrix, <img src="https://latex.codecogs.com/png.latex?S"> is the sample covariance matrix, and <img src="https://latex.codecogs.com/png.latex?%5CTheta%20%5Csucc%200"> ensures positive definiteness.</p>
<p>Graphical Lasso shifts the focus from regression to covariance structure by estimating a sparse precision matrix. For Gaussian data, a zero entry <img src="https://latex.codecogs.com/png.latex?%5CTheta_%7Bij%7D%20=%200"> means that variables <img src="https://latex.codecogs.com/png.latex?i"> and <img src="https://latex.codecogs.com/png.latex?j"> are conditionally independent given all the others, so the nonzero pattern of <img src="https://latex.codecogs.com/png.latex?%5CTheta"> directly encodes a network of relationships.</p>
<p>This is useful when the goal is to recover dependency structure rather than to predict an outcome, as is common in genomics, finance, and neuroscience. The <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> penalty enforces sparsity, leading to interpretable graphs where most connections are absent. In practice, the main challenge is tuning <img src="https://latex.codecogs.com/png.latex?%5Clambda"> to balance fit and sparsity: larger values remove more edges from the estimated graph.</p>
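<p>To make the link between zeros in the precision matrix and conditional independence concrete, here is a small sketch using a hypothetical 4-variable precision matrix (the values are chosen only for illustration). It converts the precision matrix to partial correlations and compares them with the implied covariance matrix.</p>

```python
import numpy as np

# Hypothetical precision matrix: a chain 0 - 1 - 2 plus an isolated variable 3
Theta = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.4, 0.0],
    [0.0, 0.4, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Partial correlation between i and j given all other variables:
# rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj)
d = np.sqrt(np.diag(Theta))
partial_corr = -Theta / np.outer(d, d) + 0.0  # adding 0.0 turns -0.0 into 0.0
np.fill_diagonal(partial_corr, 1.0)

# The implied covariance is generally denser than the precision matrix
Sigma = np.linalg.inv(Theta)

print("partial correlation (0, 2):", partial_corr[0, 2])
print("covariance (0, 2):", round(Sigma[0, 2], 3))
```

<p>Variables 0 and 2 have zero partial correlation because they are linked only through variable 1, yet their covariance is nonzero. This is why the graph is read off the precision matrix rather than the covariance matrix.</p>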
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-8-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-1" aria-controls="tabset-8-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-8-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-2" aria-controls="tabset-8-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-8-1" class="tab-pane active" aria-labelledby="tabset-8-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(glasso)</span>
<span id="cb15-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(igraph)</span>
<span id="cb15-3"></span>
<span id="cb15-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate multivariate data</span></span>
<span id="cb15-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb15-6">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb15-7">p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span></span>
<span id="cb15-8"></span>
<span id="cb15-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a sparse precision matrix (true network structure)</span></span>
<span id="cb15-10">Theta_true <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, p, p)</span>
<span id="cb15-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diag</span>(Theta_true) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb15-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add some conditional dependencies</span></span>
<span id="cb15-13">Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span></span>
<span id="cb15-14">Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.4</span></span>
<span id="cb15-15">Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span></span>
<span id="cb15-16">Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span></span>
<span id="cb15-17"></span>
<span id="cb15-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generate data from this precision matrix</span></span>
<span id="cb15-19">Sigma <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">solve</span>(Theta_true)</span>
<span id="cb15-20">X_network <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> MASS<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mvrnorm</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, p), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sigma =</span> Sigma)</span>
<span id="cb15-21"></span>
<span id="cb15-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute sample covariance</span></span>
<span id="cb15-23">S <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov</span>(X_network)</span>
<span id="cb15-24"></span>
<span id="cb15-25"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit Graphical Lasso</span></span>
<span id="cb15-26">glasso_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glasso</span>(S, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rho =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># rho is the penalty parameter</span></span>
<span id="cb15-27"></span>
<span id="cb15-28"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract estimated precision matrix</span></span>
<span id="cb15-29">Theta_est <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> glasso_fit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>wi</span>
<span id="cb15-30"></span>
<span id="cb15-31"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Visualize network</span></span>
<span id="cb15-32"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create adjacency matrix (thresholded)</span></span>
<span id="cb15-33">adj_matrix <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(Theta_est) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb15-34"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diag</span>(adj_matrix) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb15-35"></span>
<span id="cb15-36"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot network</span></span>
<span id="cb15-37">graph_obj <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">graph_from_adjacency_matrix</span>(adj_matrix, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"undirected"</span>)</span>
<span id="cb15-38"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(graph_obj, </span>
<span id="cb15-39">     <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Estimated Conditional Dependence Network"</span>,</span>
<span id="cb15-40">     <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">vertex.size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,</span>
<span id="cb15-41">     <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">vertex.label.cex =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>)</span></code></pre></div></div>
</div>
<div id="tabset-8-2" class="tab-pane" aria-labelledby="tabset-8-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.covariance <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> GraphicalLassoCV</span>
<span id="cb16-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> networkx <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> nx</span>
<span id="cb16-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb16-4"></span>
<span id="cb16-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simulate multivariate data</span></span>
<span id="cb16-6">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb16-7">n, p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span></span>
<span id="cb16-8"></span>
<span id="cb16-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># True sparse precision matrix</span></span>
<span id="cb16-10">Theta_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.eye(p)</span>
<span id="cb16-11">Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span></span>
<span id="cb16-12">Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.4</span></span>
<span id="cb16-13">Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span></span>
<span id="cb16-14">Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Theta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span></span>
<span id="cb16-15"></span>
<span id="cb16-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generate data</span></span>
<span id="cb16-17">Sigma <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.linalg.inv(Theta_true)</span>
<span id="cb16-18">X_network <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.multivariate_normal(np.zeros(p), Sigma, size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>n)</span>
<span id="cb16-19"></span>
<span id="cb16-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit Graphical Lasso with cross-validation</span></span>
<span id="cb16-21">glasso <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> GraphicalLassoCV(cv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb16-22">glasso.fit(X_network)</span>
<span id="cb16-23"></span>
<span id="cb16-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get estimated precision matrix</span></span>
<span id="cb16-25">Theta_est <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> glasso.precision_</span>
<span id="cb16-26"></span>
<span id="cb16-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Visualize network</span></span>
<span id="cb16-28">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb16-29"></span>
<span id="cb16-30"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create adjacency matrix (thresholded)</span></span>
<span id="cb16-31">adj_matrix <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(Theta_est) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>).astype(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb16-32">np.fill_diagonal(adj_matrix, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb16-33"></span>
<span id="cb16-34"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot using networkx</span></span>
<span id="cb16-35">G <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nx.from_numpy_array(adj_matrix)</span>
<span id="cb16-36">pos <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nx.spring_layout(G, seed<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>
<span id="cb16-37"></span>
<span id="cb16-38">plt.subplot(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb16-39">nx.draw(G, pos, with_labels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, node_color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'lightblue'</span>, </span>
<span id="cb16-40">        node_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span>, font_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, font_weight<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'bold'</span>)</span>
<span id="cb16-41">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Estimated Network Structure'</span>)</span>
<span id="cb16-42"></span>
<span id="cb16-43"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Show precision matrix heatmap</span></span>
<span id="cb16-44">plt.subplot(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb16-45">plt.imshow(Theta_est, cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'RdBu_r'</span>, vmin<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, vmax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb16-46">plt.colorbar(label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Precision Matrix Entry'</span>)</span>
<span id="cb16-47">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Estimated Precision Matrix'</span>)</span>
<span id="cb16-48">plt.tight_layout()</span>
<span id="cb16-49">plt.show()</span>
<span id="cb16-50"></span>
<span id="cb16-51"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Sparsity: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>np<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(Theta_est) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2%}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
</div>
</div>
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>The Lasso family has expanded to include specialized methods (e.g., Adaptive, Elastic Net, Group Lasso) that address unique challenges like bias reduction, feature correlation, grouping structures, and network discovery.</li>
<li>Selection depends on data characteristics—correlated predictors (Elastic Net), grouped features (Group Lasso), ordered data (Fused Lasso), or bias concerns (Adaptive/Relaxed Lasso)—while all share a core principle of sparsity-promoting penalization.</li>
<li>Despite their differences, all variants solve a penalized optimization problem; what changes is the form of the penalty, which determines the kind of structure each method recovers.</li>
<li>Modern tools (R: <code>glmnet</code>, <code>grpreg</code>; Python: <code>scikit-learn</code>, <code>group-lasso</code>) make these methods widely available.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>For a comprehensive treatment of penalized regression methods, see “The Elements of Statistical Learning” by Hastie, Tibshirani, and Friedman (2009), which covers Lasso and many variants in detail. “Statistical Learning with Sparsity” by Hastie, Tibshirani, and Wainwright (2015) provides a more recent and focused treatment. For theoretical properties and high-dimensional asymptotics, Bühlmann and van de Geer’s “Statistics for High-Dimensional Data” (2011) is excellent, though likely too technical and dense for most readers.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li><p>Belloni, A., Chernozhukov, V., &amp; Wang, L. (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. <em>Biometrika</em>, 98(4), 791–806.</p></li>
<li><p>Bühlmann, P., &amp; van de Geer, S. (2011). <em>Statistics for High-Dimensional Data: Methods, Theory and Applications</em>. Springer.</p></li>
<li><p>Friedman, J., Hastie, T., &amp; Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. <em>Biostatistics</em>, 9(3), 432–441.</p></li>
<li><p>Hastie, T., Tibshirani, R., &amp; Friedman, J. (2009). <em>The Elements of Statistical Learning: Data Mining, Inference, and Prediction</em> (2nd ed.). Springer.</p></li>
<li><p>Hastie, T., Tibshirani, R., &amp; Wainwright, M. (2015). <em>Statistical Learning with Sparsity: The Lasso and Generalizations</em>. CRC Press.</p></li>
<li><p>Meinshausen, N. (2007). Relaxed lasso. <em>Computational Statistics &amp; Data Analysis</em>, 52(1), 374–393.</p></li>
<li><p>Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. <em>Journal of the Royal Statistical Society: Series B (Methodological)</em>, 58(1), 267–288.</p></li>
<li><p>Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., &amp; Knight, K. (2005). Sparsity and smoothness via the fused lasso. <em>Journal of the Royal Statistical Society: Series B (Statistical Methodology)</em>, 67(1), 91–108.</p></li>
<li><p>Yuan, M., &amp; Lin, Y. (2006). Model selection and estimation in regression with grouped variables. <em>Journal of the Royal Statistical Society: Series B (Statistical Methodology)</em>, 68(1), 49–67.</p></li>
<li><p>Zou, H., &amp; Hastie, T. (2005). Regularization and variable selection via the elastic net. <em>Journal of the Royal Statistical Society: Series B (Statistical Methodology)</em>, 67(2), 301–320.</p></li>
<li><p>Zou, H. (2006). The adaptive lasso and its oracle properties. <em>Journal of the American Statistical Association</em>, 101(476), 1418–1429.</p></li>
</ul>


</section>

 ]]></description>
  <category>machine learning</category>
  <category>flavors</category>
  <guid>https://vyasenov.github.io/blog/flavors-lasso.html</guid>
  <pubDate>Sat, 14 Mar 2026 07:00:00 GMT</pubDate>
</item>
<item>
  <title>The Oracle Property: What It Promises (and What It Doesn’t)</title>
  <link>https://vyasenov.github.io/blog/oracle-property.html</link>
  <description><![CDATA[ 





<div class="reading-time">4 min read</div>
<!-- this is for social media sharing buttons -->
<div class="sharethis-inline-share-buttons pt-5">

</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>In high-dimensional regression, we sometimes hear that a method possesses the oracle property. The phrase sounds impressive: it suggests that an estimator behaves as if the true sparsity pattern were known in advance—hence the name, as though an oracle had revealed the true support beforehand.</p>
<p>This note explains what the oracle property actually means, why it is considered desirable, and where its practical relevance is limited. The goal is to distinguish asymptotic guarantees from practical performance. As usual, I introduce some notation so that the discussion rests on a clear mathematical foundation and a shared framework.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Consider the linear model</p>
<p><img src="https://latex.codecogs.com/png.latex?Y%20=%20X%5Cbeta%20+%20%5Cvarepsilon,%20%5Cquad%20%5Cvarepsilon%20%5Csim%20(0,%20%5Csigma%5E2%20I_n),"></p>
<p>with <img src="https://latex.codecogs.com/png.latex?X%20%5Cin%20%5Cmathbb%7BR%7D%5E%7Bn%20%5Ctimes%20p%7D"> and <img src="https://latex.codecogs.com/png.latex?p"> potentially large. Let the true parameter vector be sparse:</p>
<p><img src="https://latex.codecogs.com/png.latex?S%20=%20%5C%7Bj%20:%20%5Cbeta_j%20%5Cneq%200%5C%7D,%20%5Cquad%20s%20=%20%7CS%7C."></p>
<p>Put simply, <img src="https://latex.codecogs.com/png.latex?S"> is the set of indices whose coefficients are non-zero in the true parameter vector <img src="https://latex.codecogs.com/png.latex?%5Cbeta">, and <img src="https://latex.codecogs.com/png.latex?s"> is the number of such coefficients.</p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="definition" class="level3">
<h3 class="anchored" data-anchor-id="definition">Definition</h3>
<p>An estimator <img src="https://latex.codecogs.com/png.latex?%5Chat%5Cbeta"> is said to have the oracle property if it satisfies two conditions:</p>
<ul>
<li>Selection consistency: <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BP%7D(%5Chat%20S%20=%20S)%20%5Cto%201,%20%5Ctext%7B%20where%20%7D%20%5Chat%20S%20=%20%5C%7Bj%20:%20%5Chat%5Cbeta_j%20%5Cneq%200%5C%7D,"></li>
<li>Asymptotic efficiency: <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7Bn%7D(%5Chat%5Cbeta_S%20-%20%5Cbeta_S)%0A%5Coverset%7Bd%7D%7B%5Clongrightarrow%7D%0A%5Cmathcal%7BN%7D(0,%20%5Csigma%5E2%20C_S%5E%7B-1%7D),"> where <img src="https://latex.codecogs.com/png.latex?C_S%20=%20%5Clim_%7Bn%20%5Cto%20%5Cinfty%7D%20X_S%5E%5Ctop%20X_S%20/%20n"> is assumed to exist. This is the same limiting distribution as that of the OLS estimator that knows <img src="https://latex.codecogs.com/png.latex?S"> in advance.</li>
</ul>
<p>If the support <img src="https://latex.codecogs.com/png.latex?S"> were known, estimation would reduce to low-dimensional OLS on <img src="https://latex.codecogs.com/png.latex?X_S">. That estimator is unbiased, efficient, and easy to analyze. Some of you will remember the Gauss-Markov theorem from your econometrics course, which states that the OLS estimator is the best linear unbiased estimator (BLUE) under homoskedasticity.</p>
<div class="callout callout-style-default callout-note callout-titled" title="Oracle Property Definition">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Oracle Property Definition
</div>
</div>
<div class="callout-body-container callout-body">
<p>Can a data-driven procedure simultaneously discover <img src="https://latex.codecogs.com/png.latex?S"> and then estimate as efficiently as if <img src="https://latex.codecogs.com/png.latex?S"> were given?</p>
</div>
</div>
<p>This is an appealing theoretical benchmark for sparse estimators. You can hardly do better than that.</p>
</section>
<section id="which-methods-achieve-it" class="level3">
<h3 class="anchored" data-anchor-id="which-methods-achieve-it">Which Methods Achieve It</h3>
<p>Classical LASSO does not generally satisfy the oracle property. Its <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> penalty introduces shrinkage bias that persists asymptotically.</p>
<p>Nonconvex penalties (e.g., SCAD and MCP) were explicitly designed to achieve the oracle property under regularity conditions. Adaptive LASSO can also achieve it when weights are constructed from a root-<img src="https://latex.codecogs.com/png.latex?n"> consistent pilot estimator.</p>
<p>The key mechanism is reduced shrinkage for large coefficients while still penalizing small ones.</p>
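<p>As a rough illustration of that mechanism, the sketch below compares the lasso's soft-thresholding rule with the MCP (firm) thresholding rule in the special case of an orthonormal design, where each coefficient is thresholded on its own. The function names, the values of <code>lam</code> and <code>gamma</code>, and the input vector are assumptions made for this example; they do not come from any particular library.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np

def soft_threshold(z, lam):
    # Lasso rule: every surviving coefficient is pulled toward zero by lam
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def mcp_threshold(z, lam, gamma=3.0):
    # MCP (firm) rule, gamma must exceed 1: zero out small coefficients,
    # shrink moderate ones, and leave those beyond gamma * lam unchanged
    shrunk = gamma / (gamma - 1.0) * np.maximum(np.abs(z) - lam, 0.0)
    return np.sign(z) * np.minimum(np.abs(z), shrunk)

z = np.array([0.5, 2.0, 10.0])  # hypothetical least-squares coefficients
print(soft_threshold(z, lam=1.0))
print(mcp_threshold(z, lam=1.0))</code></pre></div>
<p>Soft thresholding moves every surviving coefficient toward zero by <code>lam</code>, however large it is, whereas the MCP rule returns coefficients larger than <code>gamma * lam</code> in absolute value unchanged.</p>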
</section>
<section id="practical-implications" class="level3">
<h3 class="anchored" data-anchor-id="practical-implications">Practical Implications</h3>
<p>The oracle property is an <em>asymptotic</em> statement: it offers no finite-sample guarantee. Even asymptotically, it requires conditions such as:</p>
<ul>
<li>correct model specification,</li>
<li>suitable signal strength (minimum nonzero coefficient size),</li>
<li>regularity conditions on the design matrix,</li>
<li>appropriate tuning parameter rates.</li>
</ul>
<p>In finite samples, especially when signals are weak or predictors are highly correlated, procedures that theoretically satisfy the oracle property may not outperform simpler methods. In practice, prediction risk often matters more than exact support recovery.</p>
<p>There is also a conceptual point: the oracle benchmark assumes that the “true” model is sparse and well-defined. In many modern applications, sparsity is an approximation rather than a literal truth.</p>
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>The oracle property means consistent variable selection plus asymptotically efficient estimation on the true support.</li>
<li>Nonconvex penalties and adaptive LASSO can achieve it; standard LASSO typically does not.</li>
<li>The property is asymptotic and depends on strong conditions (signal strength, design assumptions, tuning rates).</li>
<li>In practice, predictive performance and stability often matter more than satisfying oracle-style guarantees.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>Fan and Li (2001) introduced SCAD and formalized the oracle property in penalized likelihood estimation. Zhang (2010) proposed MCP, and Zou (2006) showed how adaptive LASSO can achieve oracle behavior. Bühlmann and van de Geer’s <em>Statistics for High-Dimensional Data</em> (2011) provides a modern, rigorous treatment of sparsity, regularization paths, and inference in high-dimensional regimes.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
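<p>Bühlmann, P., &amp; van de Geer, S. (2011). <em>Statistics for High-Dimensional Data: Methods, Theory and Applications</em>. Springer.</p>
<p>Fan, J., &amp; Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. <em>Journal of the American Statistical Association</em>, 96(456), 1348–1360.</p>
<p>Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. <em>The Annals of Statistics</em>, 38(2), 894–942.</p>
<p>Zou, H. (2006). The adaptive lasso and its oracle properties. <em>Journal of the American Statistical Association</em>, 101(476), 1418–1429.</p>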


</section>

 ]]></description>
  <category>machine learning</category>
  <category>variable selection</category>
  <guid>https://vyasenov.github.io/blog/oracle-property.html</guid>
  <pubDate>Fri, 13 Mar 2026 07:00:00 GMT</pubDate>
</item>
<item>
  <title>Why Some Confidence Intervals Are Not Symmetric</title>
  <link>https://vyasenov.github.io/blog/nonsymmetric-conf-int.html</link>
  <description><![CDATA[ 





<div class="reading-time">4 min read</div>
<!-- this is for social media sharing buttons -->
<div class="sharethis-inline-share-buttons pt-5">

</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>Most of us were trained to think of a <img src="https://latex.codecogs.com/png.latex?95%5C%25"> confidence interval as</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%20%5Cpm%201.96%20%5Ccdot%20%5Cmathrm%7BSE%7D(%5Chat%7B%5Ctheta%7D)."></p>
<p>That template is deeply ingrained. It works beautifully for estimators whose sampling distributions are symmetric and well behaved. But have you ever come across a confidence interval with an off-center point estimate?</p>
<p>The “<img src="https://latex.codecogs.com/png.latex?%5Cpm"> margin of error” representation is not a defining property of confidence intervals. It is a <em>consequence of</em> symmetry. Once symmetry disappears because of skewed sampling distributions, nonlinear transformations, boundary constraints, or small-sample behavior, the interval need not be centered around the point estimate.</p>
<p>The goal of this note is to unpack where asymmetry comes from, when it is expected, and how different construction principles lead to intervals that look very different from the textbook <img src="https://latex.codecogs.com/png.latex?t">-interval. I will also illustrate the phenomenon with a bootstrap example in <code>R</code> and <code>Python</code>.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Let <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20%5Cin%20%5CTheta"> denote a scalar parameter of interest, and let <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%20=%20%5Chat%7B%5Ctheta%7D(X_1,%5Cdots,X_n)"> be an estimator.</p>
<p>A <img src="https://latex.codecogs.com/png.latex?(1-%5Calpha)"> confidence interval is a random set <img src="https://latex.codecogs.com/png.latex?C(X)"> such that, by definition,</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BP%7D_%5Ctheta%5Cbig(%5Ctheta%20%5Cin%20C(X)%5Cbig)%20%5Cge%201-%5Calpha%0A%5Cquad%20%5Ctext%7Bfor%20all%20%7D%20%5Ctheta%20%5Cin%20%5CTheta.%0A"></p>
<p>When <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D"> satisfies an asymptotic normality result, a Wald-type interval takes the familiar form</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%20%5Cpm%20z_%7B%5Calpha/2%7D%5C,%5Cwidehat%7B%5Cmathrm%7BSE%7D%7D(%5Chat%7B%5Ctheta%7D),"></p>
<p>where the critical value <img src="https://latex.codecogs.com/png.latex?z_%7B%5Calpha/2%7D"> is the <img src="https://latex.codecogs.com/png.latex?(1-%5Calpha/2)"> quantile of the standard normal.</p>
<p>This interval is symmetric around <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D"> by construction. Its symmetry is inherited from the symmetry of the limiting Gaussian distribution. Remove that symmetry or step outside the world where the approximation is valid, and the interval will generally no longer be symmetric.</p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<p>I will now examine four common sources of confidence interval asymmetry.</p>
<section id="skewed-sampling-distributions" class="level3">
<h3 class="anchored" data-anchor-id="skewed-sampling-distributions">Skewed Sampling Distributions</h3>
<p>Symmetry of the interval reflects symmetry of the sampling distribution, not symmetry of the data.</p>
<p>Consider estimating a proportion <img src="https://latex.codecogs.com/png.latex?p"> from a binomial model. The MLE is <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bp%7D%20=%20X/n">. For moderate <img src="https://latex.codecogs.com/png.latex?n"> and <img src="https://latex.codecogs.com/png.latex?p"> near <img src="https://latex.codecogs.com/png.latex?0"> or <img src="https://latex.codecogs.com/png.latex?1">, the distribution of <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bp%7D"> is visibly skewed. A Wald interval,</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%7Bp%7D%20%5Cpm%20z_%7B%5Calpha/2%7D%5Csqrt%7B%5Cfrac%7B%5Chat%7Bp%7D(1-%5Chat%7Bp%7D)%7D%7Bn%7D%7D,"></p>
<p>may extend below <img src="https://latex.codecogs.com/png.latex?0"> or above <img src="https://latex.codecogs.com/png.latex?1">. That is a red flag: the procedure ignores the geometry of the parameter space.</p>
<p>Score intervals and logit-transformed intervals are asymmetric in <img src="https://latex.codecogs.com/png.latex?p"> precisely because they respect this skewness and the <img src="https://latex.codecogs.com/png.latex?%5B0,1%5D"> constraint. The asymmetry is not a flaw—it is the correction.</p>
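<p>To make the contrast concrete, here is a minimal sketch that computes the Wald interval and the Wilson score interval for a binomial proportion. The function names and the data (1 success in 20 trials) are hypothetical choices for this example.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from math import sqrt
from statistics import NormalDist

def wald_interval(x, n, alpha=0.05):
    # Symmetric interval: p_hat plus or minus z times the estimated SE
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def wilson_interval(x, n, alpha=0.05):
    # Score interval: obtained by inverting the score test for p
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical data: 1 success in 20 trials
print(wald_interval(1, 20))
print(wilson_interval(1, 20))</code></pre></div>
<p>With these numbers the Wald lower limit falls below zero, while the Wilson limits stay inside the unit interval and sit unevenly around the point estimate of 0.05.</p>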
</section>
<section id="nonlinear-transformations" class="level3">
<h3 class="anchored" data-anchor-id="nonlinear-transformations">Nonlinear Transformations</h3>
<p>Suppose <img src="https://latex.codecogs.com/png.latex?%5Cphi%20=%20g(%5Ctheta)"> for a nonlinear <img src="https://latex.codecogs.com/png.latex?g">. Even if <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D"> is approximately normal, the distribution of</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cphi%7D%20=%20g(%5Chat%7B%5Ctheta%7D)"></p>
<p>is generally not symmetric in finite samples.</p>
<p>A first-order delta method approximation gives</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7Bn%7D%5Cbig(%5Chat%7B%5Cphi%7D%20-%20%5Cphi%5Cbig)%0A%5C;%5Coverset%7Bd%7D%7B%5Clongrightarrow%7D%5C;%0A%5Cmathcal%7BN%7D%5Cleft(0,%20%5Cbig(g'(%5Ctheta)%5Cbig)%5E2%20V(%5Ctheta)%5Cright),%0A"></p>
<p>which suggests a symmetric interval in <img src="https://latex.codecogs.com/png.latex?%5Cphi">-space. However, mapping that interval back to <img src="https://latex.codecogs.com/png.latex?%5Ctheta">-space via <img src="https://latex.codecogs.com/png.latex?g%5E%7B-1%7D"> typically produces asymmetry.</p>
<p>This is routine in practice. Log-scale confidence intervals for positive parameters (e.g., rate ratios, hazard ratios) are symmetric in <img src="https://latex.codecogs.com/png.latex?%5Clog%20%5Ctheta"> but asymmetric in <img src="https://latex.codecogs.com/png.latex?%5Ctheta">. The asymmetry reflects curvature in <img src="https://latex.codecogs.com/png.latex?g">.</p>
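<p>The log-scale case can be sketched in a few lines. The function name, the point estimate of 2.0, and the log-scale standard error of 0.3 are hypothetical values chosen only for illustration.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from math import exp, log
from statistics import NormalDist

def log_scale_interval(theta_hat, se_log, alpha=0.05):
    # Symmetric Wald interval for log(theta), mapped back with exp()
    z = NormalDist().inv_cdf(1 - alpha / 2)
    center = log(theta_hat)
    return exp(center - z * se_log), exp(center + z * se_log)

# Hypothetical hazard ratio of 2.0 with standard error 0.3 on the log scale
theta_hat = 2.0
lower, upper = log_scale_interval(theta_hat, se_log=0.3)
print(theta_hat - lower, upper - theta_hat)</code></pre></div>
<p>The endpoints are the point estimate divided and multiplied by the same factor, <code>exp(z * se_log)</code>, so the upper endpoint is always farther from the estimate than the lower one.</p>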
</section>
<section id="likelihood-based-intervals" class="level3">
<h3 class="anchored" data-anchor-id="likelihood-based-intervals">Likelihood-Based Intervals</h3>
<p>Likelihood-ratio intervals solve</p>
<p><img src="https://latex.codecogs.com/png.latex?2%5Cbig(%5Cell(%5Chat%7B%5Ctheta%7D)%20-%20%5Cell(%5Ctheta)%5Cbig)%20%5Cle%20%5Cchi%5E2_%7B1,1-%5Calpha%7D,"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Cell(%5Ctheta)"> is the log-likelihood. When <img src="https://latex.codecogs.com/png.latex?%5Cell(%5Ctheta)"> is not quadratic, as is common in small samples or near boundaries, the resulting set is not symmetric around <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D">.</p>
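<p>The quadratic approximation that underlies Wald intervals is a second-order Taylor expansion of the log-likelihood around <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D">. If the log-likelihood is skewed, that approximation becomes inaccurate away from <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D">, and symmetric intervals misrepresent uncertainty.</p>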
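<p>As a small illustration, the sketch below computes a likelihood-ratio interval for the rate of an exponential sample by scanning a grid of candidate rates and keeping those whose deviance is below the chi-square cutoff. The function name, the grid limits, and the sample summary (20 observations with mean 1) are assumptions for this example; a root finder would work just as well as the grid.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np
from statistics import NormalDist

def lr_interval_exp_rate(n, xbar, alpha=0.05, grid_size=200001):
    # Log-likelihood of an Exponential(rate) sample of size n with mean xbar
    def loglik(rate):
        return n * np.log(rate) - rate * n * xbar

    rate_hat = 1.0 / xbar  # maximum likelihood estimate
    # The chi-square(1) quantile equals the squared normal quantile
    cutoff = NormalDist().inv_cdf(1 - alpha / 2) ** 2

    # Assumption: this grid is wide enough to contain the whole interval
    grid = np.linspace(rate_hat / 1000, 10 * rate_hat, grid_size)
    deviance = 2 * (loglik(rate_hat) - loglik(grid))
    accepted = grid[np.less_equal(deviance, cutoff)]
    return rate_hat, accepted.min(), accepted.max()

# Hypothetical sample: 20 observations with mean 1
rate_hat, lower, upper = lr_interval_exp_rate(n=20, xbar=1.0)
print(rate_hat - lower, upper - rate_hat)</code></pre></div>
<p>Because the exponential log-likelihood falls off more slowly above the MLE than below it, the upper endpoint sits farther from the estimate than the lower one.</p>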
</section>
<section id="bootstrap-percentile-intervals" class="level3">
<h3 class="anchored" data-anchor-id="bootstrap-percentile-intervals">Bootstrap Percentile Intervals</h3>
<p>Bootstrap percentile intervals are defined directly from empirical quantiles of the bootstrap distribution:</p>
<p><img src="https://latex.codecogs.com/png.latex?C_%7B%5Ctext%7Bperc%7D%7D%20=%20%5Cleft%5B%0A%5Chat%7B%5Ctheta%7D%5E*_%7B(%5Calpha/2)%7D,%0A%5Chat%7B%5Ctheta%7D%5E*_%7B(1-%5Calpha/2)%7D%0A%5Cright%5D,"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%5E*"> are bootstrap replicates.</p>
<p>No symmetry is imposed. If the empirical distribution of <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%5E*"> is skewed, the interval is skewed. This is often desirable: the procedure adapts to the shape of the sampling distribution.</p>
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm: Percentile Bootstrap CI">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm: Percentile Bootstrap CI
</div>
</div>
<div class="callout-body-container callout-body">
<ol type="1">
<li>Draw <img src="https://latex.codecogs.com/png.latex?B"> bootstrap samples by resampling with replacement.</li>
<li>Compute <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%5E%7B*(b)%7D"> for each resample.</li>
<li>Form the interval from the empirical <img src="https://latex.codecogs.com/png.latex?%5Calpha/2"> and <img src="https://latex.codecogs.com/png.latex?1-%5Calpha/2"> quantiles of <img src="https://latex.codecogs.com/png.latex?%5C%7B%5Chat%7B%5Ctheta%7D%5E%7B*(b)%7D%5C%7D_%7Bb=1%7D%5EB">.</li>
</ol>
</div>
</div>
<p>The percentile method is not universally optimal, but it makes the asymmetry explicit instead of suppressing it.</p>
</section>
</section>
<section id="an-example" class="level2">
<h2 class="anchored" data-anchor-id="an-example">An Example</h2>
<p>We simulate from an exponential distribution, which is right-skewed. Even the (asymptotically normal) sample mean can have a noticeably skewed sampling distribution at moderate <img src="https://latex.codecogs.com/png.latex?n">.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generate 50 observations from Exp(1)</span></span>
<span id="cb1-4">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rexp</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rate =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-5">sample_mean <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(x)</span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Bootstrap distribution of the mean</span></span>
<span id="cb1-8">boot_means <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replicate</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)))</span>
<span id="cb1-9"></span>
<span id="cb1-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Percentile CI</span></span>
<span id="cb1-11">ci_percentile <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quantile</span>(boot_means, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.025</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.975</span>))</span>
<span id="cb1-12"></span>
<span id="cb1-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Symmetric normal approximation</span></span>
<span id="cb1-14">se <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(x) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(x))</span>
<span id="cb1-15">ci_symmetric <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(sample_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.96</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> se,</span>
<span id="cb1-16">                  sample_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.96</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> se)</span>
<span id="cb1-17"></span>
<span id="cb1-18">lower_distance <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> sample_mean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> ci_percentile[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb1-19">upper_distance <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ci_percentile[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> sample_mean</span>
<span id="cb1-20"></span>
<span id="cb1-21"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(lower_distance, upper_distance))</span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-2"></span>
<span id="cb2-3">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb2-4"></span>
<span id="cb2-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generate 50 observations from Exp(1)</span></span>
<span id="cb2-6">x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.exponential(scale<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)</span>
<span id="cb2-7">sample_mean <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.mean(x)</span>
<span id="cb2-8"></span>
<span id="cb2-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Bootstrap distribution of the mean</span></span>
<span id="cb2-10">boot_means <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [</span>
<span id="cb2-11">    np.mean(np.random.choice(x, size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, replace<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>))</span>
<span id="cb2-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> _ <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>)</span>
<span id="cb2-13">]</span>
<span id="cb2-14"></span>
<span id="cb2-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Percentile CI</span></span>
<span id="cb2-16">ci_percentile <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.percentile(boot_means, [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">97.5</span>])</span>
<span id="cb2-17"></span>
<span id="cb2-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Symmetric normal approximation</span></span>
<span id="cb2-19">se <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.std(x, ddof<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> np.sqrt(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(x))</span>
<span id="cb2-20">ci_symmetric <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [</span>
<span id="cb2-21">    sample_mean <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.96</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> se,</span>
<span id="cb2-22">    sample_mean <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.96</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> se</span>
<span id="cb2-23">]</span>
<span id="cb2-24"></span>
<span id="cb2-25">lower_distance <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sample_mean <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> ci_percentile[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb2-26">upper_distance <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ci_percentile[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> sample_mean</span>
<span id="cb2-27"></span>
<span id="cb2-28"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(lower_distance, upper_distance)</span></code></pre></div></div>
</div>
</div>
</div>
<p>In typical runs, the upper distance exceeds the lower distance. The right tail of the exponential distribution propagates into the bootstrap distribution of the mean. The percentile interval reflects that skewness; the Wald interval (<code>ci_symmetric</code>), which is centered on the sample mean by construction, cannot.</p>
<p>As <img src="https://latex.codecogs.com/png.latex?n"> grows, the central limit theorem compresses this asymmetry. At <img src="https://latex.codecogs.com/png.latex?n=50">, it is still visible. At <img src="https://latex.codecogs.com/png.latex?n=5000">, it is largely gone. The interval geometry tracks the sampling distribution geometry.</p>
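<p>The shrinking asymmetry can be checked directly on the sampling distribution itself. The sketch below is an illustration rather than part of the bootstrap example: it relies on the fact that the mean of <code>n</code> i.i.d. Exp(1) draws is exactly Gamma distributed with shape <code>n</code> and scale <code>1/n</code>, and the function name <code>interval_asymmetry</code>, the seed, and the number of replications are assumptions made here.</p>

```python
import numpy as np

rng = np.random.default_rng(1988)  # arbitrary seed, chosen for illustration


def interval_asymmetry(n, reps=200_000):
    """Ratio of upper to lower distance for the central 95% range of the sample mean.

    For n i.i.d. Exp(1) draws the sample mean is exactly Gamma(n, scale=1/n),
    so it is simulated directly instead of averaging n draws each time.
    """
    means = rng.gamma(shape=n, scale=1.0 / n, size=reps)
    lower, upper = np.percentile(means, [2.5, 97.5])
    true_mean = 1.0
    return (upper - true_mean) / (true_mean - lower)


ratio_50 = interval_asymmetry(50)
ratio_5000 = interval_asymmetry(5000)

# Theoretical skewness of the sample mean is 2 / sqrt(n).
skew_50 = 2 / np.sqrt(50)
skew_5000 = 2 / np.sqrt(5000)

print(ratio_50, ratio_5000)
print(skew_50, skew_5000)
```

<p>A ratio of 1 corresponds to a symmetric range. The ratio at the smaller sample size should sit noticeably above 1, while the ratio at the larger one should be close to 1, in line with the skewness falling at rate <code>1 / sqrt(n)</code>.</p>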
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>Symmetric intervals arise from symmetric (often Gaussian) approximations; they are not a universal property of confidence intervals.</li>
<li>Skewness, nonlinear transformations, and boundary constraints naturally induce asymmetric intervals.</li>
<li>Likelihood-based and bootstrap methods often expose asymmetry that Wald intervals conceal.</li>
<li>If the parameter space or sampling distribution is asymmetric, an asymmetric interval is typically more faithful to the underlying uncertainty.</li>
</ul>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Casella, G., &amp; Berger, R. L. (2002). Statistical Inference.</p>
<p>Efron, B., &amp; Tibshirani, R. J. (1993). An Introduction to the Bootstrap.</p>


</section>

 ]]></description>
  <category>statistical inference</category>
  <category>hypothesis testing</category>
  <guid>https://vyasenov.github.io/blog/nonsymmetric-conf-int.html</guid>
  <pubDate>Tue, 10 Mar 2026 07:00:00 GMT</pubDate>
</item>
<item>
  <title>OLS with Fixed vs Random \(X\): What Actually Changes?</title>
  <link>https://vyasenov.github.io/blog/ols-fixed-random-x.html</link>
  <description><![CDATA[ 





<div class="reading-time">3 min read</div>
<!-- this is for social media sharing buttons -->
<div class="sharethis-inline-share-buttons pt-5">

</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>In regression courses, you will eventually hear the phrase: “OLS works whether <img src="https://latex.codecogs.com/png.latex?X"> is fixed or random.” That statement is correct, but dangerously compressed.</p>
<p>The distinction between fixed and random regressors is not about how you compute <img src="https://latex.codecogs.com/png.latex?%5Chat%5Cbeta">. The algebra is identical. The difference is in what is random, what we condition on, and how we interpret sampling statements.</p>
<p>The goal of this note is to make that distinction precise, and to clarify what does—and does not—depend on treating <img src="https://latex.codecogs.com/png.latex?X"> as fixed.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Consider the well-known linear model</p>
<p><img src="https://latex.codecogs.com/png.latex?Y%20=%20X%5Cbeta%20+%20%5Cvarepsilon,"></p>
<p>where:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?Y%20%5Cin%20%5Cmathbb%7BR%7D%5En">,</li>
<li><img src="https://latex.codecogs.com/png.latex?X%20%5Cin%20%5Cmathbb%7BR%7D%5E%7Bn%20%5Ctimes%20p%7D"> with full column rank,</li>
<li><img src="https://latex.codecogs.com/png.latex?%5Cbeta%20%5Cin%20%5Cmathbb%7BR%7D%5Ep">,</li>
<li><img src="https://latex.codecogs.com/png.latex?%5Cvarepsilon%20%5Cin%20%5Cmathbb%7BR%7D%5En"> with <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5B%5Cvarepsilon%20%5Cmid%20X%5D%20=%200"> and <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7BVar%7D(%5Cvarepsilon%20%5Cmid%20X)%20=%20%5Csigma%5E2%20I">.</li>
</ul>
<p>The standard OLS estimator is</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%5Cbeta%20=%20(X%5E%5Ctop%20X)%5E%7B-1%7D%20X%5E%5Ctop%20Y."></p>
<p>The key question is: are we conditioning on <img src="https://latex.codecogs.com/png.latex?X">, or is <img src="https://latex.codecogs.com/png.latex?X"> itself a random object in the data-generating process?</p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<p>Let’s take a closer look at the two cases.</p>
<section id="fixed-x-classical-linear-model" class="level3">
<h3 class="anchored" data-anchor-id="fixed-x-classical-linear-model">Fixed <img src="https://latex.codecogs.com/png.latex?X">: Classical Linear Model</h3>
<p>In the classical setup, <img src="https://latex.codecogs.com/png.latex?X"> is treated as fixed (non-stochastic). Then, all randomness comes from the error term <img src="https://latex.codecogs.com/png.latex?%5Cvarepsilon">.</p>
<p>Conditional on <img src="https://latex.codecogs.com/png.latex?X">,</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5B%5Chat%5Cbeta%20%5Cmid%20X%5D%20=%20%5Cbeta,"> <img src="https://latex.codecogs.com/png.latex?%5Coperatorname%7BVar%7D(%5Chat%5Cbeta%20%5Cmid%20X)%20=%20%5Csigma%5E2%20(X%5E%5Ctop%20X)%5E%7B-1%7D."></p>
<p>Inference is therefore <em>conditional</em> inference. Confidence intervals and <img src="https://latex.codecogs.com/png.latex?t">-tests are statements about the distribution of <img src="https://latex.codecogs.com/png.latex?%5Chat%5Cbeta"> given this specific design matrix, <img src="https://latex.codecogs.com/png.latex?X">.</p>
<p>This framework is natural in designed experiments, where <img src="https://latex.codecogs.com/png.latex?X"> is literally chosen by the researcher.</p>
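<p>A small simulation makes the conditional statement concrete. The sketch below is illustrative only: it draws one design matrix, holds it fixed, and redraws only the errors, then compares the simulated mean and covariance of the estimator with the two formulas above. The sample size, coefficients, seed, and Gaussian errors are assumptions chosen for the example.</p>

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

n, reps, sigma = 100, 5000, 1.0
beta = np.array([1.0, 2.0])  # illustrative true coefficients

# The design is drawn once and then held fixed across all replications.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
XtX_inv = np.linalg.inv(X.T @ X)
theory_cov = sigma**2 * XtX_inv  # sigma^2 (X'X)^{-1}

# Redraw only the errors; each row of Y is one replication on the same design.
eps = rng.normal(scale=sigma, size=(reps, n))
Y = X @ beta + eps
beta_hats = Y @ X @ XtX_inv  # row r holds (X'X)^{-1} X' y_r

mean_beta_hat = beta_hats.mean(axis=0)
empirical_cov = np.cov(beta_hats, rowvar=False)

print(mean_beta_hat)
print(empirical_cov)
print(theory_cov)
```

<p>Because the design never changes, every number printed here is a statement about the distribution of the estimator given this particular design matrix.</p>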
</section>
<section id="random-x-econometric-view" class="level3">
<h3 class="anchored" data-anchor-id="random-x-econometric-view">Random <img src="https://latex.codecogs.com/png.latex?X">: Econometric View</h3>
<p>In most observational settings, <img src="https://latex.codecogs.com/png.latex?X"> is random. We observe i.i.d. draws <img src="https://latex.codecogs.com/png.latex?(X_i,%20Y_i)"> from an unknown joint distribution, <img src="https://latex.codecogs.com/png.latex?F_%7BX,Y%7D">. Under standard regularity conditions, the same OLS estimator, <img src="https://latex.codecogs.com/png.latex?%5Chat%5Cbeta">, satisfies</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%5Cbeta%0A%5C;%5Cxrightarrow%7Bp%7D%5C;%0A%5Cbeta%0A%5Cquad%20%5Ctext%7Bif%7D%20%5Cquad%0A%5Cmathbb%7BE%7D%5BX_i%20%5Cvarepsilon_i%5D%20=%200."></p>
<p>The asymptotic variance becomes</p>
<p><img src="https://latex.codecogs.com/png.latex?%20%5Coperatorname%7BAvar%7D(%5Chat%5Cbeta)=%5Cleft(%20%5Cmathbb%7BE%7D%5BX_i%20X_i%5E%5Ctop%5D%20%5Cright)%5E%7B-1%7D%5Cmathbb%7BE%7D%5BX_i%20X_i%5E%5Ctop%20%5Cvarepsilon_i%5E2%5D%5Cleft(%20%5Cmathbb%7BE%7D%5BX_i%20X_i%5E%5Ctop%5D%20%5Cright)%5E%7B-1%7D."></p>
<p>Under homoskedasticity, this simplifies to</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2%20%5Cleft(%20%5Cmathbb%7BE%7D%5BX_i%20X_i%5E%5Ctop%5D%20%5Cright)%5E%7B-1%7D."></p>
<p>The algebra mirrors the fixed-<img src="https://latex.codecogs.com/png.latex?X"> case, but the interpretation changes: we are no longer conditioning on a specific realization of <img src="https://latex.codecogs.com/png.latex?X">; we are <em>averaging over its distribution</em>.</p>
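<p>The two variance formulas can be compared on simulated data. The following is a minimal sketch under assumed settings, not a reference implementation: the error standard deviation is made proportional to the absolute value of the regressor, and the sample analogue of the sandwich formula (often called HC0) is computed next to the homoskedastic one. All variable names and the data-generating process are assumptions for illustration.</p>

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

n = 20_000
x = rng.normal(size=n)
eps = x * rng.normal(size=n)  # Var(eps | x) = x^2: heteroskedastic by construction
y = 1.0 + 2.0 * x + eps       # illustrative coefficients
X = np.column_stack([np.ones(n), x])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Homoskedastic estimate: s^2 (X'X)^{-1}.
s2 = resid @ resid / (n - X.shape[1])
classical_cov = s2 * XtX_inv

# Sandwich (HC0) estimate: (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}.
meat = (X * resid[:, None] ** 2).T @ X
robust_cov = XtX_inv @ meat @ XtX_inv

classical_se = np.sqrt(np.diag(classical_cov))
robust_se = np.sqrt(np.diag(robust_cov))
print(classical_se)
print(robust_se)
```

<p>In this design the population sandwich variance for the slope is three times the homoskedastic expression, because the fourth moment of a standard normal is 3. The robust slope standard error should therefore be clearly larger than the classical one, while the two intercept standard errors should be close.</p>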
</section>
<section id="what-actually-changes" class="level3">
<h3 class="anchored" data-anchor-id="what-actually-changes">What Actually Changes?</h3>
<p>Three things matter.</p>
<p>First, the object of inference. With fixed <img src="https://latex.codecogs.com/png.latex?X">, inference is conditional on the design. With random <img src="https://latex.codecogs.com/png.latex?X">, inference is about repeated sampling of <img src="https://latex.codecogs.com/png.latex?F_%7BX,Y%7D">.</p>
<p>Second, exogeneity assumptions. In the fixed-<img src="https://latex.codecogs.com/png.latex?X"> model, we require <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5B%5Cvarepsilon%20%5Cmid%20X%5D%20=%200">. In the random-<img src="https://latex.codecogs.com/png.latex?X"> case, unbiasedness needs the same condition (consistency needs only the weaker moment condition <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5BX_i%20%5Cvarepsilon_i%5D%20=%200">), but it now constrains the joint distribution: once we know the regressors, the error term carries no systematic remaining signal. Violations become statements about endogeneity, meaning <img src="https://latex.codecogs.com/png.latex?X"> is statistically related to omitted factors inside <img src="https://latex.codecogs.com/png.latex?%5Cvarepsilon">.</p>
<p>Third, robustness. Heteroskedasticity-robust standard errors are naturally derived in the random-<img src="https://latex.codecogs.com/png.latex?X"> framework, where the conditional variance may depend on <img src="https://latex.codecogs.com/png.latex?X_i">. In other words, different parts of the regressor distribution can come with different noise levels, so inference has to account for that variation rather than rely on a single common variance.</p>
<p>What does not change is the formula for <img src="https://latex.codecogs.com/png.latex?%5Chat%5Cbeta">. Nor does unbiasedness depend on <img src="https://latex.codecogs.com/png.latex?X"> being fixed; it depends on the conditional mean-zero assumption.</p>
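<p>The exogeneity point can be illustrated with a short simulation. This is a sketch under assumed settings: the omitted factor <code>w</code>, the helper <code>ols_slope</code>, and the coefficient values are hypothetical. In both cases the regressor is random; only in the second does it share a component with the error term.</p>

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

n = 100_000
beta = 1.0                # illustrative true slope
w = rng.normal(size=n)    # omitted factor that ends up in the error term
u = rng.normal(size=n)    # pure noise


def ols_slope(x, y):
    """Slope from a regression of y on an intercept and x."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)


# Exogenous case: x is drawn independently of the error (w + u).
x_exog = rng.normal(size=n)
slope_exog = ols_slope(x_exog, beta * x_exog + w + u)

# Endogenous case: x shares the omitted factor w with the error term.
x_endog = w + rng.normal(size=n)
slope_endog = ols_slope(x_endog, beta * x_endog + w + u)

print(slope_exog, slope_endog)
```

<p>In the endogenous design the slope converges to the true value plus Cov(x, w) / Var(x), which is 1 + 1/2 here. Randomness of the regressor is harmless in the first case; the correlation with the error is what breaks consistency in the second.</p>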
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>The OLS estimator is algebraically identical whether <img src="https://latex.codecogs.com/png.latex?X"> is fixed or random.</li>
<li>Fixed-<img src="https://latex.codecogs.com/png.latex?X"> inference is conditional; random-<img src="https://latex.codecogs.com/png.latex?X"> inference averages over the joint distribution.</li>
<li>Consistency hinges on <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5BX_i%20%5Cvarepsilon_i%5D%20=%200">, not on whether <img src="https://latex.codecogs.com/png.latex?X"> is stochastic.</li>
<li>Robust variance formulas arise naturally once <img src="https://latex.codecogs.com/png.latex?X"> is treated as random.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>For a classical treatment, see Greene’s <em>Econometric Analysis</em>, which clearly distinguishes fixed and stochastic regressors. Wooldridge’s <em>Econometric Analysis of Cross Section and Panel Data</em> provides a modern random-<img src="https://latex.codecogs.com/png.latex?X"> perspective with emphasis on exogeneity conditions and robust inference.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Greene, W. H. (2018). Econometric Analysis.</p>
<p>Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.</p>


</section>

 ]]></description>
  <category>parametric models</category>
  <category>statistical inference</category>
  <guid>https://vyasenov.github.io/blog/ols-fixed-random-x.html</guid>
  <pubDate>Sun, 08 Mar 2026 08:00:00 GMT</pubDate>
</item>
<item>
  <title>Logistic Regression in Randomized Trials?</title>
  <link>https://vyasenov.github.io/blog/logit-randomized-experiments.html</link>
  <description><![CDATA[ 





<div class="reading-time">4 min read</div>
<!-- this is for social media sharing buttons -->
<div class="sharethis-inline-share-buttons pt-5">

</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>Randomized controlled trials (RCTs) are the gold standard for causal inference. Random assignment guarantees that treatment is independent of potential outcomes. As a result, simple differences in observed outcomes identify causal effects without requiring outcome modeling.</p>
<p>With binary outcomes, however, data scientists often default to logistic regression. That instinct feels natural: the outcome is binary, the logit model is standard, and regression allows covariate adjustment. But does logistic regression actually respect what randomization gives us?</p>
<p>Freedman (2008) argues that it does not. Randomization justifies design-based estimators. Logistic regression introduces additional modeling assumptions that randomization does not validate. When those assumptions fail, the regression coefficient on treatment need not estimate the causal quantity of interest—even in large samples.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Let there be <img src="https://latex.codecogs.com/png.latex?n"> subjects indexed by <img src="https://latex.codecogs.com/png.latex?i%20=%201,%20%5Cdots,%20n">. Each subject has:</p>
<ul>
<li>Treatment assignment <img src="https://latex.codecogs.com/png.latex?X_i%20%5Cin%20%5C%7B0,1%5C%7D"></li>
<li>Binary outcome <img src="https://latex.codecogs.com/png.latex?Y_i%20%5Cin%20%5C%7B0,1%5C%7D"></li>
<li>Covariates <img src="https://latex.codecogs.com/png.latex?Z_i"></li>
</ul>
<p>Each unit has two potential outcomes: <img src="https://latex.codecogs.com/png.latex?Y_i%5ET"> and <img src="https://latex.codecogs.com/png.latex?Y_i%5EC">. Define the finite-population averages</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Calpha_T%20=%20%5Cfrac%7B1%7D%7Bn%7D%20%5Csum_%7Bi=1%7D%5En%20Y_i%5ET,%0A%5Cquad%0A%5Calpha_C%20=%20%5Cfrac%7B1%7D%7Bn%7D%20%5Csum_%7Bi=1%7D%5En%20Y_i%5EC.%0A"></p>
<p>The causal contrast of interest is the <em>difference in log-odds</em>:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5CDelta%0A=%20%5Clog%5Cleft(%5Cfrac%7B%5Calpha_T%7D%7B1%20-%20%5Calpha_T%7D%5Cright)%0A-%20%5Clog%5Cleft(%5Cfrac%7B%5Calpha_C%7D%7B1%20-%20%5Calpha_C%7D%5Cright).%0A"></p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="what-randomization-identifies" class="level3">
<h3 class="anchored" data-anchor-id="what-randomization-identifies">What Randomization Identifies</h3>
<p>Because treatment is randomized, the sample analogues</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Calpha%7D_T%20=%20%5Cfrac%7B1%7D%7Bn_T%7D%5Csum_%7Bi%5Cin%20T%7D%20Y_i,%0A%5Cquad%0A%5Chat%7B%5Calpha%7D_C%20=%20%5Cfrac%7B1%7D%7Bn_C%7D%5Csum_%7Bi%5Cin%20C%7D%20Y_i%0A"></p>
<p>are unbiased for <img src="https://latex.codecogs.com/png.latex?%5Calpha_T"> and <img src="https://latex.codecogs.com/png.latex?%5Calpha_C">. The plug-in estimator</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5CDelta%7D%0A=%0A%5Clog%5Cleft(%5Cfrac%7B%5Chat%7B%5Calpha%7D_T%7D%7B1%20-%20%5Chat%7B%5Calpha%7D_T%7D%5Cright)%0A-%0A%5Clog%5Cleft(%5Cfrac%7B%5Chat%7B%5Calpha%7D_C%7D%7B1%20-%20%5Chat%7B%5Calpha%7D_C%7D%5Cright)%0A"></p>
<p>is therefore consistent and justified purely by the design.</p>
<p>No outcome model is required.</p>
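<p>As a minimal sketch, the plug-in estimator needs nothing beyond the two arm means. The function names <code>log_odds</code> and <code>plug_in_delta</code> and the toy data are made up for illustration.</p>

```python
import math


def log_odds(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def plug_in_delta(x, y):
    """Difference in log-odds between the treated and control arm means."""
    treated = [yi for xi, yi in zip(x, y) if xi == 1]
    control = [yi for xi, yi in zip(x, y) if xi == 0]
    alpha_t = sum(treated) / len(treated)
    alpha_c = sum(control) / len(control)
    return log_odds(alpha_t) - log_odds(alpha_c)


# Toy data: 3 of 4 treated units and 1 of 4 control units respond.
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 0, 0]
delta_hat = plug_in_delta(x, y)
print(delta_hat)  # 2 * log(3), about 2.197
```

<p>The helper assumes both arm means lie strictly between 0 and 1; with an arm mean of exactly 0 or 1 the log-odds is undefined.</p>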
</section>
<section id="what-logistic-regression-assumes" class="level3">
<h3 class="anchored" data-anchor-id="what-logistic-regression-assumes">What Logistic Regression Assumes</h3>
<p>A logistic regression specifies</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AP(Y_i%20=%201%20%5Cmid%20X_i,%20Z_i)%0A=%0A%5Cfrac%7B%5Cexp(%5Cbeta_1%20+%20%5Cbeta_2%20X_i%20+%20%5Cbeta_3%20Z_i)%7D%0A%7B1%20+%20%5Cexp(%5Cbeta_1%20+%20%5Cbeta_2%20X_i%20+%20%5Cbeta_3%20Z_i)%7D.%0A"></p>
<p>The coefficient <img src="https://latex.codecogs.com/png.latex?%5Cbeta_2"> is typically interpreted as the treatment effect. This interpretation relies on strong assumptions:</p>
<ul>
<li>The conditional log-odds is linear in <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Z">.</li>
<li>The functional form is correctly specified.</li>
<li>The model captures the true dependence of outcomes on covariates.</li>
</ul>
<p>Randomization does not validate any of these assumptions. It guarantees independence of treatment assignment—not correctness of the logit specification.</p>
<p>If the model is misspecified, the maximum likelihood estimator converges to a pseudo-true parameter: the value that best fits the assumed model, not necessarily the causal estimand <img src="https://latex.codecogs.com/png.latex?%5CDelta">.</p>
</section>
<section id="the-non-collapsibility-problem" class="level3">
<h3 class="anchored" data-anchor-id="the-non-collapsibility-problem">The Non-Collapsibility Problem</h3>
<p>There is a deeper issue. The logistic coefficient <img src="https://latex.codecogs.com/png.latex?%5Cbeta_2"> represents a <em>conditional</em> log odds ratio: the treatment contrast with the covariates held fixed. The estimand <img src="https://latex.codecogs.com/png.latex?%5CDelta"> is a <em>marginal</em> contrast, formed after averaging outcomes over the covariate distribution. These quantities are generally not equal.</p>
<p>Odds ratios are non-collapsible: adding a covariate that predicts the outcome changes the treatment coefficient even when there is no confounding. As a result, adjusting for <img src="https://latex.codecogs.com/png.latex?Z"> in a logit model can change the treatment coefficient even in a perfectly randomized experiment.</p>
<p>This is not bias from confounding. It is a structural property of the odds ratio. Thus, even with infinite data, <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D_2"> need not converge to <img src="https://latex.codecogs.com/png.latex?%5CDelta">.</p>
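<p>A small exact calculation shows the gap. The sketch below assumes a correctly specified logit model with a binary covariate that is independent of treatment; the coefficient values are hypothetical and chosen only to make the difference visible. No data are simulated, so the result involves no sampling noise.</p>

```python
import math


def expit(t):
    """Inverse logit."""
    return 1 / (1 + math.exp(-t))


def log_odds(p):
    return math.log(p / (1 - p))


# Hypothetical logit model: intercept -1, treatment 1, covariate 2.
b1, b2, b3 = -1.0, 1.0, 2.0
z_values = [0, 1]  # binary covariate, each value with probability 1/2, independent of X

# Average the conditional probabilities over Z within each arm.
alpha_t = sum(expit(b1 + b2 + b3 * z) for z in z_values) / len(z_values)
alpha_c = sum(expit(b1 + b3 * z) for z in z_values) / len(z_values)

conditional = b2
marginal = log_odds(alpha_t) - log_odds(alpha_c)
print(conditional, marginal)  # 1.0 and roughly 0.80
```

<p>The conditional log odds ratio is 1 at every covariate value, yet the marginal contrast is about 0.80. Treatment is independent of the covariate by construction, so the gap reflects non-collapsibility rather than confounding.</p>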
</section>
<section id="a-safer-use-of-logistic-regression" class="level3">
<h3 class="anchored" data-anchor-id="a-safer-use-of-logistic-regression">A Safer Use of Logistic Regression</h3>
<p>If logistic regression is used, the coefficient itself should not be interpreted as the estimand. Instead, compute model-based plug-in predictions:</p>
<ol type="1">
<li>Fit the logistic model and obtain <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D">.</li>
<li>Predict probabilities under treatment and control for every unit: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7Bp%7D_i%5E%7B(T)%7D,%20%5Cquad%20%5Chat%7Bp%7D_i%5E%7B(C)%7D.%0A"></li>
<li>Average predicted probabilities: <img src="https://latex.codecogs.com/png.latex?%0A%5Ctilde%7B%5Calpha%7D_T%20=%20%5Cfrac%7B1%7D%7Bn%7D%5Csum%20%5Chat%7Bp%7D_i%5E%7B(T)%7D,%0A%5Cquad%0A%5Ctilde%7B%5Calpha%7D_C%20=%20%5Cfrac%7B1%7D%7Bn%7D%5Csum%20%5Chat%7Bp%7D_i%5E%7B(C)%7D.%0A"></li>
<li>Form <img src="https://latex.codecogs.com/png.latex?%0A%5Ctilde%7B%5CDelta%7D%0A=%0A%5Clog%5Cleft(%5Cfrac%7B%5Ctilde%7B%5Calpha%7D_T%7D%7B1-%5Ctilde%7B%5Calpha%7D_T%7D%5Cright)%0A-%0A%5Clog%5Cleft(%5Cfrac%7B%5Ctilde%7B%5Calpha%7D_C%7D%7B1-%5Ctilde%7B%5Calpha%7D_C%7D%5Cright).%0A"></li>
</ol>
<p>This estimator targets the correct marginal quantity. Even if the logit model is misspecified, it remains consistent under randomization. The coefficient <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D_2"> does not share this guarantee.</p>
</section>
</section>
<section id="an-example" class="level2">
<h2 class="anchored" data-anchor-id="an-example">An Example</h2>
<p>We illustrate with a small randomized experiment. There are <img src="https://latex.codecogs.com/png.latex?n%20=%20200"> units; half are assigned to treatment (<img src="https://latex.codecogs.com/png.latex?X_i%20=%201">) and half to control (<img src="https://latex.codecogs.com/png.latex?X_i%20=%200">) by complete randomization. Each unit has a binary outcome <img src="https://latex.codecogs.com/png.latex?Y_i"> and a single covariate <img src="https://latex.codecogs.com/png.latex?Z_i">. We compute three quantities: the design-based plug-in estimator <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5CDelta%7D">, the logistic regression coefficient <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D_2"> on treatment, and the adjusted estimator <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7B%5CDelta%7D"> that uses the fitted logit model to predict probabilities under treatment and control for every unit, then marginalizes and forms the log-odds contrast.</p>
<p>The code below generates data (with a true treatment effect on the log-odds scale), fits a logistic regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Z">, and reports <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5CDelta%7D">, <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D_2">, and <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7B%5CDelta%7D">. In general these three numbers differ; <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5CDelta%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7B%5CDelta%7D"> target the marginal causal contrast, while <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D_2"> is a conditional parameter.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb1-2">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span></span>
<span id="cb1-3">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">each =</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># complete randomization</span></span>
<span id="cb1-4">z <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-5"></span>
<span id="cb1-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># True P(Y=1) follows a logit model in X and Z; treatment raises the conditional log-odds (given Z) by 0.8</span></span>
<span id="cb1-7">beta_true <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># intercept, treatment, covariate</span></span>
<span id="cb1-8">eta <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> beta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> beta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> beta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> z</span>
<span id="cb1-9">p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>eta))</span>
<span id="cb1-10">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbinom</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prob =</span> p)</span>
<span id="cb1-11"></span>
<span id="cb1-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Design-based plug-in: delta ---</span></span>
<span id="cb1-13">alpha_T_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(y[x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb1-14">alpha_C_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(y[x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb1-15">delta_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(alpha_T_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha_T_hat)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(alpha_C_hat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha_C_hat))</span>
<span id="cb1-16"></span>
<span id="cb1-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Logistic regression: beta_2 (coefficient on treatment) ---</span></span>
<span id="cb1-18">fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> z, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> binomial)</span>
<span id="cb1-19">beta_2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(fit)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>]</span>
<span id="cb1-20"></span>
<span id="cb1-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Adjusted estimator: marginalize fitted probs, then log-odds contrast ---</span></span>
<span id="cb1-22">p_under_treat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newdata =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> z), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"response"</span>)</span>
<span id="cb1-23">p_under_control <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newdata =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> z), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"response"</span>)</span>
<span id="cb1-24">alpha_T_tilde <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(p_under_treat)</span>
<span id="cb1-25">alpha_C_tilde <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(p_under_control)</span>
<span id="cb1-26">delta_tilde <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(alpha_T_tilde <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha_T_tilde)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(alpha_C_tilde <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha_C_tilde))</span>
<span id="cb1-27"></span>
<span id="cb1-28"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Design-based delta_hat:  "</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(delta_hat, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb1-29"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Logistic coef (beta_2):  "</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(beta_2, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb1-30"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Adjusted delta_tilde:    "</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(delta_tilde, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> LogisticRegression</span>
<span id="cb2-3"></span>
<span id="cb2-4">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb2-5">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span></span>
<span id="cb2-6">x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb2-7">np.random.shuffle(x)</span>
<span id="cb2-8">z <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.normal(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb2-9"></span>
<span id="cb2-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># True P(Y=1) depends on X and Z (logistic); treatment increases log-odds by 0.8</span></span>
<span id="cb2-11">beta_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>])   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># intercept, treatment, covariate</span></span>
<span id="cb2-12">eta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> beta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> beta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> beta_true[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> z</span>
<span id="cb2-13">p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.exp(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>eta))</span>
<span id="cb2-14">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.binomial(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, p, n)</span>
<span id="cb2-15"></span>
<span id="cb2-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Design-based plug-in: delta</span></span>
<span id="cb2-17">alpha_T_hat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y[x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>].mean()</span>
<span id="cb2-18">alpha_C_hat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y[x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].mean()</span>
<span id="cb2-19">delta_hat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.log(alpha_T_hat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha_T_hat)) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> np.log(alpha_C_hat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha_C_hat))</span>
<span id="cb2-20"></span>
<span id="cb2-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Logistic regression: beta_2 (coefficient on treatment)</span></span>
<span id="cb2-22">X_design <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.column_stack([np.ones(n), x, z])</span>
<span id="cb2-23">fit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LogisticRegression(C<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e10</span>).fit(X_design, y)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># very large C, so the default L2 penalty is effectively zero</span></span>
<span id="cb2-24">beta_2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fit.coef_[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb2-25"></span>
<span id="cb2-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Adjusted estimator: marginalize fitted probs, then log-odds contrast</span></span>
<span id="cb2-27">p_under_treat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fit.predict_proba(np.column_stack([np.ones(n), np.ones(n), z]))[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb2-28">p_under_control <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fit.predict_proba(np.column_stack([np.ones(n), np.zeros(n), z]))[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb2-29">alpha_T_tilde <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> p_under_treat.mean()</span>
<span id="cb2-30">alpha_C_tilde <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> p_under_control.mean()</span>
<span id="cb2-31">delta_tilde <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.log(alpha_T_tilde <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha_T_tilde)) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> np.log(alpha_C_tilde <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha_C_tilde))</span>
<span id="cb2-32"></span>
<span id="cb2-33"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Design-based delta_hat:  "</span>, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(delta_hat, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb2-34"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Logistic coef (beta_2):  "</span>, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(beta_2, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb2-35"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Adjusted delta_tilde:    "</span>, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(delta_tilde, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>Randomization identifies causal effects without modeling.</li>
<li>Design-based estimators and plug-in approaches respect the randomized design. The logit coefficient does not.</li>
<li>Logistic regression introduces functional-form assumptions that randomization does not justify.</li>
<li>The treatment coefficient estimates a conditional odds ratio, not the marginal causal contrast defined by the experiment.</li>
<li>The logistic regression coefficient generally differs from the experimental estimand—even in large samples.</li>
</ul>
</section>
<section id="reference" class="level2">
<h2 class="anchored" data-anchor-id="reference">Reference</h2>
<p>Freedman, D. A. (2008). <em>Randomization Does Not Justify Logistic Regression</em>. Statistical Science, 23(2), 237–249. https://doi.org/10.1214/08-STS262</p>


</section>

 ]]></description>
  <category>causal inference</category>
  <category>parametric models</category>
  <guid>https://vyasenov.github.io/blog/logit-randomized-experiments.html</guid>
  <pubDate>Tue, 17 Feb 2026 08:00:00 GMT</pubDate>
</item>
<item>
  <title>Randomization Inference: A Gentle Introduction</title>
  <link>https://vyasenov.github.io/blog/randomization-inference.html</link>
  <description><![CDATA[ 





<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>Randomization inference offers a refreshing alternative to traditional parametric inference, providing exact control over Type I error rates without relying on large-sample approximations or strict distributional assumptions. Born out of Fisher’s famous tea-tasting experiment, the approach leverages the symmetry and structure induced by randomization itself to test hypotheses.</p>
<p>This blog post unpacks the theory and intuition behind randomization inference, drawing on the excellent review by Ritzwoller, Romano, and Shaikh (2025). I’ll cover the key ideas, notation, and algorithms involved, and also touch on modern applications like two-sample tests, regression, and conformal inference. Throughout, I’ll emphasize the practical considerations — when it works, why it works, and where caution is needed.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Let <img src="https://latex.codecogs.com/png.latex?W%20%5Cin%20%5C%7B0,1%5C%7D%5En"> denote the treatment assignment vector and <img src="https://latex.codecogs.com/png.latex?Y"> the observed outcomes. In potential-outcomes notation, each unit has <img src="https://latex.codecogs.com/png.latex?Y_i(1)"> and <img src="https://latex.codecogs.com/png.latex?Y_i(0)">, and we observe</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AY_i%20=%20W_i%20Y_i(1)%20+%20(1%20-%20W_i)%20Y_i(0).%0A"></p>
<p>The assignment mechanism is known. For example, under complete randomization with <img src="https://latex.codecogs.com/png.latex?n_T"> treated units, <img src="https://latex.codecogs.com/png.latex?W"> is uniformly distributed over all binary vectors with exactly <img src="https://latex.codecogs.com/png.latex?n_T"> ones.</p>
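<p>As a rough illustration of this setup, the Python sketch below draws one assignment under complete randomization and builds the observed outcomes. The function names, the seed, and the simulated potential outcomes are hypothetical choices made for this illustration only.</p>

```python
import random


def complete_randomization(n, n_treated, rng):
    """Draw W uniformly from binary vectors of length n with exactly n_treated ones."""
    w = [1] * n_treated + [0] * (n - n_treated)
    rng.shuffle(w)  # every arrangement of the ones is equally likely
    return w


def observed_outcomes(w, y1, y0):
    """Apply Y_i = W_i * Y_i(1) + (1 - W_i) * Y_i(0) unit by unit."""
    return [wi * a + (1 - wi) * b for wi, a, b in zip(w, y1, y0)]


rng = random.Random(0)                     # arbitrary seed
y0 = [rng.gauss(0, 1) for _ in range(6)]   # hypothetical control potential outcomes
y1 = [v + 1.0 for v in y0]                 # hypothetical constant effect of 1
w = complete_randomization(6, 3, rng)
y = observed_outcomes(w, y1, y0)
```

<p>Only one of the two potential outcomes is ever revealed for each unit, which is why the known assignment mechanism carries so much of the inferential weight.</p>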
<p>Let <img src="https://latex.codecogs.com/png.latex?T(X)"> be a test statistic computed from the observed data <img src="https://latex.codecogs.com/png.latex?X%20=%20(Y,%20W)">.</p>
<p>Let <img src="https://latex.codecogs.com/png.latex?G"> denote the group of transformations consistent with the design (e.g., all permutations of the treatment labels that preserve the treated count). The randomization hypothesis states that, under the null, the distribution of the data is invariant to every such transformation:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AgX%20%5Coverset%7Bd%7D%7B=%7D%20X%20%5Cquad%20%5Ctext%7Bfor%20all%20%7D%20g%20%5Cin%20G.%0A"></p>
<p>This invariance is the engine of randomization inference.</p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="sharp-vs-regular-null-hypotheses" class="level3">
<h3 class="anchored" data-anchor-id="sharp-vs-regular-null-hypotheses">Sharp vs Regular Null Hypotheses</h3>
<p>The most important distinction in randomization inference is between <em>sharp</em> null hypotheses (which fully determine the unobserved potential outcomes) and <em>regular/weak</em> null hypotheses (which do not).</p>
<p>A <em>sharp null</em> specifies the treatment effect for every unit. The canonical example is Fisher’s <em>no-effect</em> null: <img src="https://latex.codecogs.com/png.latex?%0AH_0%5E%7B%5Ctext%7Bsharp%7D%7D:%5C;%20Y_i(1)%20=%20Y_i(0)%5Cquad%20%5Ctext%7Bfor%20all%20%7D%20i.%0A"> Under this null, the missing potential outcomes are <em>imputable</em> from the observed outcomes. That is what makes exact finite-sample randomization tests possible: for each candidate assignment <img src="https://latex.codecogs.com/png.latex?W'">, you can reconstruct the outcomes that would have been observed under <img src="https://latex.codecogs.com/png.latex?W'"> and recompute the test statistic.</p>
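<p>To make the imputation step concrete, here is a minimal Python sketch. It assumes a sharp null of the form "the effect equals <code>tau0</code> for every unit", which contains Fisher's no-effect null as the special case <code>tau0 = 0</code>. The helper names and the toy data are hypothetical.</p>

```python
def impute_under_sharp_null(y, w, tau0=0.0):
    """Return (y0, y1) implied by the sharp null Y_i(1) - Y_i(0) = tau0 for all i."""
    y0 = [yi - wi * tau0 for yi, wi in zip(y, w)]        # treated units: subtract tau0
    y1 = [yi + (1 - wi) * tau0 for yi, wi in zip(y, w)]  # control units: add tau0
    return y0, y1


def outcomes_under(w_new, y0, y1):
    """Outcomes that would have been observed under a candidate assignment w_new."""
    return [b if wi == 1 else a for wi, a, b in zip(w_new, y0, y1)]


y = [2.0, 3.5, 1.0, 4.0]   # made-up observed outcomes
w = [1, 0, 0, 1]           # made-up observed assignment

y0, y1 = impute_under_sharp_null(y, w)         # Fisher's no-effect null
y_alt = outcomes_under([0, 1, 1, 0], y0, y1)   # identical to y under this null
```

<p>Under the no-effect null the reconstructed outcomes coincide with the observed ones for every candidate assignment, so only the treatment labels need to be reshuffled when recomputing the test statistic.</p>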
<p>A <em>regular/weak null</em> is something like “the average treatment effect is zero,” <img src="https://latex.codecogs.com/png.latex?%0AH_0%5E%7B%5Ctext%7Bweak%7D%7D:%5C;%20%5Cmathbb%7BE%7D%5BY(1)%20-%20Y(0)%5D%20=%200,%0A"> or a regression-style null about a parameter in a model. This null does <em>not</em> let you impute all missing potential outcomes, so the randomization distribution of a non-studentized statistic typically depends on nuisance features (e.g., heteroskedasticity). In that setting, exactness generally fails, and validity is recovered (when it is) by using statistics that are asymptotically pivotal, often via studentization.</p>
<p>Note that a <img src="https://latex.codecogs.com/png.latex?p">-value from a test of a sharp null and one from a test of a weak null are not directly comparable, since they refer to different null hypotheses.</p>
</section>
<section id="exact-randomization-tests" class="level3">
<h3 class="anchored" data-anchor-id="exact-randomization-tests">Exact Randomization Tests</h3>
<p>If the randomization hypothesis holds, we can compute the distribution of <img src="https://latex.codecogs.com/png.latex?T(X)"> by applying all transformations in <img src="https://latex.codecogs.com/png.latex?G"> to the data. The <img src="https://latex.codecogs.com/png.latex?p">-value is simply the proportion of these transformed test statistics that are as extreme or more extreme than the observed <img src="https://latex.codecogs.com/png.latex?T(X)">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7Bp%7D%20=%20%5Cfrac%7B1%7D%7B%7CG%7C%7D%20%5Csum_%7Bg%20%5Cin%20G%7D%20I%5C%7B%20T(gX)%20%5Cgeq%20T(X)%20%5C%7D.%0A"></p>
<p>Because the null implies invariance under <img src="https://latex.codecogs.com/png.latex?G">, this procedure achieves exact finite-sample control of the Type I error rate.</p>
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm: Randomization Test">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm: Randomization Test
</div>
</div>
<div class="callout-body-container callout-body">
<ol type="1">
<li>Choose a test statistic <img src="https://latex.codecogs.com/png.latex?T(X)">.</li>
<li>Define the group <img src="https://latex.codecogs.com/png.latex?G"> of transformations.</li>
<li>Compute <img src="https://latex.codecogs.com/png.latex?T(X)"> on the observed data.</li>
<li>Apply all (or a random sample of) transformations <img src="https://latex.codecogs.com/png.latex?g%20%5Cin%20G"> to the data and recompute <img src="https://latex.codecogs.com/png.latex?T(gX)">.</li>
<li>Calculate the <img src="https://latex.codecogs.com/png.latex?p">-value as the proportion of transformed statistics that are at least as extreme as <img src="https://latex.codecogs.com/png.latex?T(X)">.</li>
</ol>
</div>
</div>
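<p>The steps above can be sketched in a few lines of Python for a design small enough to enumerate. This is an illustrative sketch, assuming complete randomization, the difference in means as the test statistic, and a one-sided comparison; the function names and toy data are hypothetical.</p>

```python
from itertools import combinations


def diff_in_means(y, treated_idx):
    """Difference in means between units in treated_idx and all other units."""
    treated_idx = set(treated_idx)
    t = [yi for i, yi in enumerate(y) if i in treated_idx]
    c = [yi for i, yi in enumerate(y) if i not in treated_idx]
    return sum(t) / len(t) - sum(c) / len(c)


def exact_randomization_pvalue(y, w):
    """Enumerate every assignment with the same treated count (one-sided test)."""
    n = len(y)
    observed_idx = [i for i, wi in enumerate(w) if wi == 1]
    t_obs = diff_in_means(y, observed_idx)
    # Under the sharp null of no effect, y is fixed and only the labels change.
    stats = [diff_in_means(y, idx)
             for idx in combinations(range(n), len(observed_idx))]
    tol = 1e-12  # guard against floating-point ties
    return sum(t >= t_obs - tol for t in stats) / len(stats)


y = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]   # made-up outcomes
w = [1, 1, 1, 0, 0, 0]               # made-up assignment
p = exact_randomization_pvalue(y, w)
```

<p>With six units and three treated there are only 20 possible assignments, so the smallest attainable one-sided <img src="https://latex.codecogs.com/png.latex?p">-value is 1/20.</p>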
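<p>In practice, <img src="https://latex.codecogs.com/png.latex?%7CG%7C"> can be very large. Drawing a random sample of transformations gives an accurate Monte Carlo approximation, and a simple +1 adjustment (counting the observed statistic as one of the draws, in both the numerator and the denominator) keeps the test valid in finite samples despite the random sampling.</p>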
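<p>A minimal Monte Carlo version might look like the following Python sketch. It assumes complete randomization, the difference in means, and a one-sided test; the function name, the default of 999 draws, and the seed are arbitrary choices for illustration.</p>

```python
import random


def mc_randomization_pvalue(y, w, n_draws=999, seed=0):
    """Monte Carlo randomization p-value with the +1 adjustment."""
    rng = random.Random(seed)

    def stat(assign):
        t = [yi for yi, wi in zip(y, assign) if wi == 1]
        c = [yi for yi, wi in zip(y, assign) if wi == 0]
        return sum(t) / len(t) - sum(c) / len(c)

    t_obs = stat(w)
    w_perm = list(w)
    count = 0
    for _ in range(n_draws):
        rng.shuffle(w_perm)                 # keeps the treated count fixed
        if stat(w_perm) >= t_obs - 1e-12:   # tolerance for floating-point ties
            count += 1
    # The observed assignment is counted as one extra draw.
    return (1 + count) / (1 + n_draws)


y = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]   # made-up outcomes
w = [1, 1, 1, 0, 0, 0]               # made-up assignment
p = mc_randomization_pvalue(y, w)
```

<p>Because of the adjustment, the reported value can never be exactly zero: its smallest possible value is 1 / (1 + <code>n_draws</code>).</p>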
</section>
<section id="when-exactness-fails" class="level3">
<h3 class="anchored" data-anchor-id="when-exactness-fails">When Exactness Fails</h3>
<p>Under weak nulls, permutation tests are no longer automatically valid. The permutation distribution of a statistic may not match its true sampling distribution.</p>
<p>The difference in means illustrates the issue. If treatment and control variances differ and the two groups are of unequal size, the permutation distribution of the raw difference in means has the wrong variance, and the test can severely over-reject. The statistic is not pivotal.</p>
<p>Studentization resolves the problem. Scaling by an estimated standard error produces an asymptotically pivotal statistic whose limiting null distribution does not depend on nuisance parameters. Rank-based procedures (e.g., Wilcoxon–Mann–Whitney) achieve a similar goal.</p>
<p>The general principle is simple: asymptotic validity requires asymptotic pivotality.</p>
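<p>As a sketch of what studentization looks like in code, the Python snippet below permutes treatment labels for both the raw and the studentized difference in means. The unpooled standard error, the two-sided comparison, the function names, and the simulated data (a small high-variance group and a large low-variance group) are assumptions made for this illustration.</p>

```python
import random
from statistics import mean, variance


def raw_diff(y, w):
    t = [yi for yi, wi in zip(y, w) if wi == 1]
    c = [yi for yi, wi in zip(y, w) if wi == 0]
    return mean(t) - mean(c)


def studentized_diff(y, w):
    """Difference in means divided by an unpooled standard error."""
    t = [yi for yi, wi in zip(y, w) if wi == 1]
    c = [yi for yi, wi in zip(y, w) if wi == 0]
    se = (variance(t) / len(t) + variance(c) / len(c)) ** 0.5
    return (mean(t) - mean(c)) / se


def permutation_pvalue(y, w, stat, n_draws=999, seed=0):
    """Two-sided Monte Carlo permutation p-value with the +1 adjustment."""
    rng = random.Random(seed)
    t_obs = abs(stat(y, w))
    w_perm = list(w)
    count = 0
    for _ in range(n_draws):
        rng.shuffle(w_perm)
        if abs(stat(y, w_perm)) >= t_obs - 1e-12:
            count += 1
    return (1 + count) / (1 + n_draws)


rng = random.Random(1)  # arbitrary seed
# Hypothetical data: 10 treated units with sd 3, 30 control units with sd 1.
y = [rng.gauss(0, 3) for _ in range(10)] + [rng.gauss(0, 1) for _ in range(30)]
w = [1] * 10 + [0] * 30

p_raw = permutation_pvalue(y, w, raw_diff)
p_studentized = permutation_pvalue(y, w, studentized_diff)
```

<p>A single pair of <img src="https://latex.codecogs.com/png.latex?p">-values says little on its own. The difference between the two statistics shows up in rejection rates when the simulation is repeated many times under the weak null.</p>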
</section>
<section id="strengths-and-limitations" class="level3">
<h3 class="anchored" data-anchor-id="strengths-and-limitations">Strengths and Limitations</h3>
<p>Randomization inference is particularly powerful when the randomization scheme is known and controlled, as in experiments, when the test statistic is chosen to be pivotal, and when exact finite-sample error control is important.</p>
<p>However, it becomes less effective when covariates are correlated with treatment assignment but not properly accounted for, or when the sample is so small that the randomization distribution has only a handful of support points, which limits the attainable <img src="https://latex.codecogs.com/png.latex?p">-values and the power of the test.</p>
</section>
</section>
<section id="an-example" class="level2">
<h2 class="anchored" data-anchor-id="an-example">An Example</h2>
<p>We illustrate the procedure with a small randomized experiment. There are <img src="https://latex.codecogs.com/png.latex?n%20=%2020"> units; exactly <img src="https://latex.codecogs.com/png.latex?n_%7B%5Ctext%7Btreat%7D%7D%20=%2010"> receive treatment under complete randomization, so <img src="https://latex.codecogs.com/png.latex?W"> is uniformly distributed over all binary vectors with ten ones. Each unit has potential outcomes <img src="https://latex.codecogs.com/png.latex?Y_i(0)"> and <img src="https://latex.codecogs.com/png.latex?Y_i(1)%20=%20Y_i(0)%20+%20%5Ctau"> with a constant effect <img src="https://latex.codecogs.com/png.latex?%5Ctau%20=%201">, and we observe <img src="https://latex.codecogs.com/png.latex?Y_i%20=%20W_i%20Y_i(1)%20+%20(1%20-%20W_i)%20Y_i(0)">. The test statistic is the difference in means, <img src="https://latex.codecogs.com/png.latex?T%20=%20%5Cbar%7BY%7D_1%20-%20%5Cbar%7BY%7D_0">.</p>
<p>We test Fisher’s sharp null of no effect: <img src="https://latex.codecogs.com/png.latex?Y_i(1)%20=%20Y_i(0)"> for all <img src="https://latex.codecogs.com/png.latex?i">. Under this null, the observed <img src="https://latex.codecogs.com/png.latex?Y"> would be the same under any assignment, so we can build the randomization distribution by repeatedly permuting <img src="https://latex.codecogs.com/png.latex?W"> (keeping the number of treated units fixed), recomputing <img src="https://latex.codecogs.com/png.latex?T"> for each permuted assignment, and then computing the proportion of those values that are as or more extreme than the observed <img src="https://latex.codecogs.com/png.latex?T">. That proportion is the randomization <img src="https://latex.codecogs.com/png.latex?p">-value.</p>
<p>The code below does exactly that, using <img src="https://latex.codecogs.com/png.latex?B%20=%205000"> random permutations and a standard +1 adjustment for the Monte Carlo <img src="https://latex.codecogs.com/png.latex?p">-value.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb1-2">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span></span>
<span id="cb1-3">n_treat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span></span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Potential outcomes (fixed) and assignment (random)</span></span>
<span id="cb1-6">y0 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-7">tau <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span></span>
<span id="cb1-8">w <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n_treat), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> n_treat)))</span>
<span id="cb1-9">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> y0 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> tau <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> w</span>
<span id="cb1-10"></span>
<span id="cb1-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Test statistic: difference in means</span></span>
<span id="cb1-12">t_obs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(y[w <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(y[w <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb1-13"></span>
<span id="cb1-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Randomization distribution under Fisher sharp null of no effect:</span></span>
<span id="cb1-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># under H0, y(1)=y(0), so observed outcomes are invariant to assignment.</span></span>
<span id="cb1-16">b <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5000</span></span>
<span id="cb1-17">t_perm <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replicate</span>(b, {</span>
<span id="cb1-18">  w_perm <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(w)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># preserves treated count (complete randomization)</span></span>
<span id="cb1-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(y[w_perm <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(y[w_perm <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb1-20">})</span>
<span id="cb1-21"></span>
<span id="cb1-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Two-sided p-value with a +1 adjustment (Monte Carlo exactness)</span></span>
<span id="cb1-23">p_value <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(t_perm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(t_obs))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (b <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-24">p_value</span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-2"></span>
<span id="cb2-3">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb2-4">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span></span>
<span id="cb2-5">n_treat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span></span>
<span id="cb2-6"></span>
<span id="cb2-7">y0 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.normal(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb2-8">tau <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span></span>
<span id="cb2-9">w <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> n_treat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> n_treat))</span>
<span id="cb2-10">w <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.permutation(w)</span>
<span id="cb2-11">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y0 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> tau <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> w</span>
<span id="cb2-12"></span>
<span id="cb2-13"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> diff_in_means(y, w):</span>
<span id="cb2-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> y[w <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>].mean() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> y[w <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].mean()</span>
<span id="cb2-15"></span>
<span id="cb2-16">t_obs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> diff_in_means(y, w)</span>
<span id="cb2-17"></span>
<span id="cb2-18">b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5000</span></span>
<span id="cb2-19">t_perm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.empty(b)</span>
<span id="cb2-20"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(b):</span>
<span id="cb2-21">    w_perm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.permutation(w)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># preserves treated count</span></span>
<span id="cb2-22">    t_perm[i] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> diff_in_means(y, w_perm)</span>
<span id="cb2-23"></span>
<span id="cb2-24">p_value <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(t_perm) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(t_obs))) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb2-25"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(p_value)</span></code></pre></div></div>
</div>
</div>
</div>
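<p>With only 20 units and 10 treated, there are <img src="https://latex.codecogs.com/png.latex?%5Cbinom%7B20%7D%7B10%7D%20=%20184%7B,%7D756"> possible assignments, so the randomization distribution can also be enumerated in full instead of sampled. The sketch below does this in Python. It uses NumPy's newer <code>default_rng</code> generator, so the simulated data (and hence the <img src="https://latex.codecogs.com/png.latex?p">-value) will not match the Monte Carlo code above exactly, and the helper name <code>diff_in_means_from_treated_sum</code> is made up for illustration.</p>

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1988)
n, n_treat, tau = 20, 10, 1.0
y0 = rng.normal(0.0, 1.0, n)
w = rng.permutation(np.r_[np.ones(n_treat, dtype=int), np.zeros(n - n_treat, dtype=int)])
y = y0 + tau * w

def diff_in_means_from_treated_sum(treated_sum, total, n_treat, n):
    """Difference in means computed from the sum of outcomes in the treated group."""
    return treated_sum / n_treat - (total - treated_sum) / (n - n_treat)

t_obs = diff_in_means_from_treated_sum(y[w == 1].sum(), y.sum(), n_treat, n)

# Every way of choosing which 10 of the 20 units are treated
treated_sets = np.array(list(combinations(range(n), n_treat)))
t_all = diff_in_means_from_treated_sum(y[treated_sets].sum(axis=1), y.sum(), n_treat, n)

# Exact two-sided p-value; the observed assignment is one of the enumerated rows,
# so no +1 adjustment is needed. The tolerance guards against floating-point ties.
p_exact = np.mean(np.abs(t_all) >= np.abs(t_obs) - 1e-12)
print(len(t_all), p_exact)
```

<p>Full enumeration removes the Monte Carlo error entirely, but the number of assignments grows quickly with <img src="https://latex.codecogs.com/png.latex?n">, which is why random permutations are the default in larger experiments.</p>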
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>Randomization inference provides exact finite-sample error control when the randomization hypothesis holds.</li>
<li>Asymptotic validity can often be rescued by choosing asymptotically pivotal (studentized) test statistics.</li>
<li>Without studentization, permutation tests may fail badly in the presence of unequal variances.</li>
<li>Randomization tests are flexible and nonparametric, making them attractive for experimental data and beyond.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>The best starting point is the recent review by Ritzwoller, Romano, and Shaikh (2025). For a foundational treatment of nonparametric inference, the go-to is Lehmann &amp; Romano’s two-volume door-stopper <em>Testing Statistical Hypotheses</em>. The practical guide to permutation tests by Good (2005) is also highly recommended.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li><p>Ritzwoller, D. M., Romano, J. P., &amp; Shaikh, A. M. (2025). Randomization Inference: Theory and Applications.</p></li>
<li><p>Lehmann, E. L., &amp; Romano, J. P. (2022). <em>Testing Statistical Hypotheses</em>. Springer.</p></li>
<li><p>Good, P. (2005). <em>Permutation, Parametric, and Bootstrap Tests of Hypotheses</em>. Springer.</p></li>
</ul>


</section>

 ]]></description>
  <category>causal inference</category>
  <category>statistical inference</category>
  <guid>https://vyasenov.github.io/blog/randomization-inference.html</guid>
  <pubDate>Thu, 12 Feb 2026 08:00:00 GMT</pubDate>
</item>
<item>
  <title>Generalized Additive Models: What You Need to Know</title>
  <link>https://vyasenov.github.io/blog/generalized-additive-models.html</link>
  <description><![CDATA[ 





<div class="reading-time">6 min read</div>
<!-- this is for social media sharing buttons -->
<div class="sharethis-inline-share-buttons pt-5">

</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>Generalized Additive Models (GAMs) are one of the most powerful and flexible tools in a data scientist’s toolbox for modeling complex, nonlinear relationships between covariates and an outcome. They generalize linear models by allowing smooth, nonparametric functions of the predictors while still maintaining interpretability and manageable computation. The core idea is simple: instead of forcing relationships to be straight lines, let the data speak for itself.</p>
<p>This article explains what you really need to know about GAMs, following the excellent review by Simon Wood (2025). I’ll go over the basics of how GAMs work, how smoothness is controlled, the computational strategies involved, and key pitfalls to watch out for. I’ll also walk through a code example in both <code>R</code> and <code>Python</code> to show how to fit and interpret these models in practice.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Consider an outcome variable <img src="https://latex.codecogs.com/png.latex?y"> and predictors <img src="https://latex.codecogs.com/png.latex?x_1,%20x_2,%20%5Cdots,%20x_p">. The simplest linear model is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Ay%20=%20%5Cbeta_0%20+%20%5Csum_%7Bj=1%7D%5Ep%20%5Cbeta_j%20x_j%20+%20%5Cvarepsilon.%0A"></p>
<p>The GAM replaces the linear terms <img src="https://latex.codecogs.com/png.latex?%5Cbeta_j%20x_j"> with smooth functions <img src="https://latex.codecogs.com/png.latex?f_j(x_j)">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Ay%20=%20%5Cbeta_0%20+%20%5Csum_%7Bj=1%7D%5Ep%20f_j(x_j)%20+%20%5Cvarepsilon.%0A"></p>
<p>More generally, for non-Gaussian outcomes, GAMs use a link function <img src="https://latex.codecogs.com/png.latex?g(%5Ccdot)">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Ag(%5Cmathbb%7BE%7D%5By%5D)%20=%20%5Cbeta_0%20+%20%5Csum_%7Bj=1%7D%5Ep%20f_j(x_j).%0A"></p>
<p>Each <img src="https://latex.codecogs.com/png.latex?f_j"> is estimated from the data and constrained to be “smooth” through penalization.</p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="what-makes-a-gam" class="level3">
<h3 class="anchored" data-anchor-id="what-makes-a-gam">What Makes a GAM?</h3>
<p>The backbone of a GAM is its smooth terms. These are typically represented using splines — basis functions that piece together polynomials smoothly at specified knots. But not just any spline will do! In GAMs, smoothness is enforced through penalty terms that discourage excessive wiggliness.</p>
<p>For example, for a cubic spline, the penalty is usually the integral of the squared second derivative:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cint%20(f''(x))%5E2%20%5C,%20dx.%0A"></p>
<p>With one such penalty per smooth term, estimation in the Gaussian case solves</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmin_%7Bf_1,%5Cdots,f_p%7D%20%5Cleft%5C%7B%20%5Csum_%7Bi=1%7D%5En%20%5Cleft(y_i%20-%20%5Cbeta_0%20-%20%5Csum_%7Bj=1%7D%5Ep%20f_j(x_%7Bij%7D)%5Cright)%5E2%20+%20%5Csum_%7Bj=1%7D%5Ep%20%5Clambda_j%20%5Cint%20%5Cleft(f_j''(x)%5Cright)%5E2%20%5C,%20dx%20%5Cright%5C%7D,%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Clambda_j%20%5Cgeq%200"> is the smoothing parameter attached to the <img src="https://latex.codecogs.com/png.latex?j">-th smooth term.</p>
<p>Everything in a GAM flows from this penalized least-squares (or, for non-Gaussian outcomes, penalized likelihood) objective. The smoothing parameters <img src="https://latex.codecogs.com/png.latex?%5Clambda_j"> control the balance between fitting the data and keeping each function smooth. This is regularization: the standard spline roughness penalties are quadratic in the coefficients (ridge-like). A higher <img src="https://latex.codecogs.com/png.latex?%5Clambda"> shrinks the function toward a straight line, which the second-derivative penalty leaves unpenalized; a lower <img src="https://latex.codecogs.com/png.latex?%5Clambda"> allows more flexibility.</p>
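<p>To see the penalty at work, here is a minimal NumPy sketch of a single penalized smooth. It is not how <code>mgcv</code> or <code>statsmodels</code> fit GAMs: it assumes a piecewise-linear basis on equally spaced knots and replaces the integrated squared second derivative with a second-difference penalty on the coefficients (a P-spline-style approximation). The names <code>hat_basis</code> and <code>penalized_fit</code> are hypothetical.</p>

```python
import numpy as np

def hat_basis(x, knots):
    """Piecewise-linear (degree-1 B-spline) basis evaluated at x."""
    eye = np.eye(len(knots))
    return np.column_stack([np.interp(x, knots, eye[j]) for j in range(len(knots))])

def penalized_fit(x, y, knots, lam):
    """Solve min ||y - X beta||^2 + lam * ||D beta||^2 and report the effective df."""
    X = hat_basis(x, knots)
    D = np.diff(np.eye(len(knots)), n=2, axis=0)  # second-difference operator
    S = D.T @ D                                   # quadratic (ridge-like) penalty matrix
    XtX = X.T @ X
    beta = np.linalg.solve(XtX + lam * S, X.T @ y)
    # Effective degrees of freedom: trace of the influence (hat) matrix
    edf = np.trace(np.linalg.solve(XtX + lam * S, XtX))
    return beta, edf

rng = np.random.default_rng(1988)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.3, 200)
knots = np.linspace(0, 10, 20)

edfs = {lam: penalized_fit(x, y, knots, lam)[1] for lam in (1e-3, 10.0, 1e6)}
for lam, edf in edfs.items():
    print(f"lambda = {lam:g}: effective degrees of freedom = {edf:.2f}")
```

<p>As <img src="https://latex.codecogs.com/png.latex?%5Clambda"> grows, the effective degrees of freedom fall from roughly the basis dimension toward 2, the dimension of the straight-line space that the penalty leaves untouched.</p>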
</section>
<section id="how-smoothness-is-estimated" class="level3">
<h3 class="anchored" data-anchor-id="how-smoothness-is-estimated">How Smoothness Is Estimated</h3>
<p>Model selection in GAMs involves three related but distinct questions:</p>
<ul>
<li>How smooth should each function be? (smoothing parameter selection, <img src="https://latex.codecogs.com/png.latex?%5Clambda">)</li>
<li>How flexible is the basis? (choice of basis dimension <img src="https://latex.codecogs.com/png.latex?k">)</li>
<li>Which smooth terms should be included at all? (term selection, <img src="https://latex.codecogs.com/png.latex?f_j">)</li>
</ul>
<p>The basis dimension <img src="https://latex.codecogs.com/png.latex?k"> controls the maximum possible flexibility (how rich the spline basis is), while the smoothing parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda"> controls how much of that flexibility is actually used. Intuitively, <img src="https://latex.codecogs.com/png.latex?k"> sets the size of the function space you search over; <img src="https://latex.codecogs.com/png.latex?%5Clambda"> determines the effective degrees of freedom (wiggliness) within that space. In practice, you choose <img src="https://latex.codecogs.com/png.latex?k"> “large enough” and let <img src="https://latex.codecogs.com/png.latex?%5Clambda"> do the regularization; if <img src="https://latex.codecogs.com/png.latex?k"> is too small, the smooth can be forced to underfit no matter how you tune <img src="https://latex.codecogs.com/png.latex?%5Clambda">.</p>
<p>There are two main strategies to estimate <img src="https://latex.codecogs.com/png.latex?%5Clambda">:</p>
<ol type="1">
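<li><strong>Cross-validation (CV)</strong>: Minimize prediction error by holding out parts of the data. This is the same idea used to tune traditional machine learning models; in GAM software it is often implemented as generalized cross-validation (GCV), which avoids refitting the model on each fold.</li>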
<li><strong>Marginal likelihood (REML)</strong>: An empirical Bayes approach that tends to perform well in practice.</li>
</ol>
<p>The marginal likelihood approach treats smooth coefficients as random effects with Gaussian priors (a mixed-model representation), and often yields better-behaved uncertainty quantification than ad hoc tuning.</p>
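<p>As a rough illustration of the cross-validation route, the sketch below picks <img src="https://latex.codecogs.com/png.latex?%5Clambda"> by minimizing the generalized cross-validation score <img src="https://latex.codecogs.com/png.latex?n%20%5Ccdot%20%5Ctext%7BRSS%7D%20/%20(n%20-%20%5Ctext%7Bedf%7D)%5E2"> over a grid. It makes simplifying assumptions: a piecewise-linear basis, a second-difference penalty, and a coarse grid search rather than the numerical optimization that real GAM software uses. REML is not shown. The function names are hypothetical.</p>

```python
import numpy as np

def hat_basis(x, knots):
    """Piecewise-linear (degree-1 B-spline) basis evaluated at x."""
    eye = np.eye(len(knots))
    return np.column_stack([np.interp(x, knots, eye[j]) for j in range(len(knots))])

def gcv_path(x, y, knots, lambdas):
    """Return rows of (lambda, edf, GCV score) for a single penalized smooth."""
    X = hat_basis(x, knots)
    D = np.diff(np.eye(len(knots)), n=2, axis=0)
    S = D.T @ D
    n = len(y)
    rows = []
    for lam in lambdas:
        A = X @ np.linalg.solve(X.T @ X + lam * S, X.T)  # influence (hat) matrix
        edf = np.trace(A)
        rss = np.sum((y - A @ y) ** 2)
        rows.append((lam, edf, n * rss / (n - edf) ** 2))
    return np.array(rows)

rng = np.random.default_rng(1988)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.3, 200)

lambdas = 10.0 ** np.arange(-3, 5)
path = gcv_path(x, y, np.linspace(0, 10, 20), lambdas)
best_lam, best_edf, best_gcv = path[np.argmin(path[:, 2])]
print(f"GCV picks lambda = {best_lam:g} with edf = {best_edf:.2f}")
```

<p>The GCV score needs only one fit per candidate <img src="https://latex.codecogs.com/png.latex?%5Clambda">, because the leave-one-out error is approximated from the trace of the influence matrix.</p>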
<p>Similarly, there are two common tools for model selection. The well-known <em>Akaike Information Criterion (AIC)</em> controls the trade-off between goodness of fit and model complexity. Alternatively, one can employ <em>hypothesis testing</em> to check whether each <img src="https://latex.codecogs.com/png.latex?f_j"> is significantly different from zero.</p>
<p>With <img src="https://latex.codecogs.com/png.latex?%5Clambda">, <img src="https://latex.codecogs.com/png.latex?k">, and <img src="https://latex.codecogs.com/png.latex?f_j"> selected, we can fit the GAM and make predictions. Let’s shift the focus to a few more nuanced, but important, topics.</p>
</section>
<section id="why-rank-reduction-matters" class="level3">
<h3 class="anchored" data-anchor-id="why-rank-reduction-matters">Why Rank Reduction Matters</h3>
<p>Full spline bases can be large and computationally expensive. To address this, GAMs often use <em>low-rank spline bases</em> (e.g., thin plate regression splines): you represent each smooth with a modest number of basis functions (controlled by <img src="https://latex.codecogs.com/png.latex?k">), rather than using a very large “full” basis. This keeps computation tractable while retaining most of the flexibility practitioners want. Consequently, GAM fitting scales better to larger datasets while preserving interpretability.</p>
</section>
<section id="beyond-the-mean" class="level3">
<h3 class="anchored" data-anchor-id="beyond-the-mean">Beyond the Mean</h3>
<p>GAMs aren’t limited to modeling the mean and naturally extend to modeling other aspects of the distribution. They can handle <em>location, scale, and shape</em> modeling — meaning that the variance, skewness, or other distributional parameters can also depend on smooth functions of predictors. This generalization brings GAMs into the world of generalized additive models for location, scale, and shape (GAMLSS).</p>
<p>They can even be extended to quantile regression and non-exponential family distributions, making them incredibly versatile. However, while GAMs allow flexible modeling of conditional expectations, they do not by themselves address thorny issues such as endogeneity, causal identification, or selection bias. They simply allow a richer description of the relationship between the outcome and the covariates, so they are best viewed as tools for prediction and description unless paired with a separate identification strategy.</p>
</section>
<section id="hypothesis-testing" class="level3">
<h3 class="anchored" data-anchor-id="hypothesis-testing">Hypothesis Testing</h3>
<p>Testing whether a smooth term is zero corresponds to testing whether its associated function is identically zero. Because smooth terms are penalized, the effective degrees of freedom are estimated from the data, and the resulting test statistics rely on large-sample approximations. The reported <img src="https://latex.codecogs.com/png.latex?p">-values are therefore approximate and should be interpreted as heuristic diagnostics rather than exact finite-sample guarantees.</p>
</section>
</section>
<section id="an-example" class="level2">
<h2 class="anchored" data-anchor-id="an-example">An Example</h2>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(mgcv)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb1-3">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span></span>
<span id="cb1-4">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb1-5">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sin</span>(x) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb1-6">model <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gam</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">s</span>(x), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"REML"</span>)</span>
<span id="cb1-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(model)</span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(model, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">residuals =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> statsmodels.api <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> sm</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb2-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> statsmodels.gam.api <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> GLMGam, BSplines</span>
<span id="cb2-5"></span>
<span id="cb2-6">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb2-7">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span></span>
<span id="cb2-8">x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.uniform(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, n)</span>
<span id="cb2-9">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.sin(x) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.normal(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>, n)</span>
<span id="cb2-10"></span>
<span id="cb2-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Build a cubic B-spline basis for x</span></span>
<span id="cb2-12">X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> x[:, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>]</span>
<span id="cb2-13">bs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> BSplines(X, df<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>], degree<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>], knot_kwds<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[{<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lower_bound"</span>: x.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"upper_bound"</span>: x.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>()}])</span>
<span id="cb2-14"></span>
<span id="cb2-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Gaussian GAM (identity link) via the GLM-GAM interface</span></span>
<span id="cb2-16">exog <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.ones((n, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># intercept only</span></span>
<span id="cb2-17">gam <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> GLMGam(y, smoother<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>bs, exog<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>exog).fit()</span>
<span id="cb2-18"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(gam.summary())</span>
<span id="cb2-19"></span>
<span id="cb2-20">plt.figure()</span>
<span id="cb2-21">XX <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.linspace(x.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(), x.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)[:, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>]</span>
<span id="cb2-22">exog_pred <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.ones((<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(XX), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb2-23">plt.plot(XX[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>], gam.predict(exog<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>exog_pred, exog_smooth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>XX), label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"GAM fit"</span>)</span>
<span id="cb2-24">plt.scatter(x, y, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb2-25">plt.legend()</span>
<span id="cb2-26">plt.show()</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>GAMs allow flexible, nonlinear modeling while retaining interpretability.</li>
<li>Smoothness is controlled by penalties, with the smoothing parameters estimated via cross-validation (CV) or marginal likelihood (REML).</li>
<li>Rank reduction makes GAMs computationally feasible even with large datasets.</li>
<li>GAMs generalize beyond means to scale, shape, and quantile modeling.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>The recent review by Simon Wood (2025) is the most comprehensive and readable guide to modern GAMs. For practical hands-on work, Wood’s book <em>Generalized Additive Models: An Introduction with R</em> (2017) remains the go-to resource. See also Hastie (2017). For Bayesian extensions check Rue et al.&nbsp;(2009).</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li><p>Hastie, T. J. (2017). Generalized additive models. Statistical models in S, 249-307.</p></li>
<li><p>Rue, H., Martino, S., &amp; Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. <em>Journal of the Royal Statistical Society Series B: Statistical Methodology</em>, 71(2), 319-392.</p></li>
<li><p>Wood, S. N. (2025). Generalized Additive Models. <em>Annual Review of Statistics and Its Application</em>, 12, 497–526.</p></li>
<li><p>Wood, S. N. (2017). <em>Generalized Additive Models: An Introduction with R</em>. CRC Press.</p></li>
</ul>


</section>

 ]]></description>
  <category>semiparametric models</category>
  <category>machine learning</category>
  <guid>https://vyasenov.github.io/blog/generalized-additive-models.html</guid>
  <pubDate>Thu, 12 Feb 2026 08:00:00 GMT</pubDate>
</item>
<item>
  <title>Understanding Correlated Random Effects Models</title>
  <link>https://vyasenov.github.io/blog/correlated-random-effects.html</link>
  <description><![CDATA[ 





<div class="reading-time">8 min read</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>For decades, panel data analysis has largely revolved around a familiar dichotomy: fixed effects (FE) versus random effects (RE). More recently, generalized fixed effects and difference-in-differences designs have surged in popularity, particularly in causal inference. Yet between FE and RE lies a more general and conceptually illuminating framework: the correlated random effects (CRE) model. Although it receives less attention today, CRE remains a powerful tool for understanding the foundations of panel data methods.</p>
<p>Fixed effects models eliminate all time-invariant unobserved heterogeneity but sacrifice the ability to estimate the effects of time-invariant covariates. Random effects models, by contrast, retain those variables but rely on a strong assumption: that the unobserved individual-specific effects are uncorrelated with the regressors. When this assumption fails, as it often does, the RE estimator becomes biased and inconsistent. The correlated random effects (CRE) model, also known as the hybrid model, relaxes this assumption by explicitly modeling the potential correlation.</p>
<p>In this article, I examine the intuition behind the CRE model, explain how it bridges FE and RE, and show how it decomposes within- and between-unit variation. I conclude with a hands-on implementation in both <code>R</code> and <code>Python</code> to demonstrate how the model works in practice. The focus is on the linear versions of these models, and extending these ideas to nonlinear models is not always straightforward.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Let us consider a standard panel data setup where we observe units <img src="https://latex.codecogs.com/png.latex?i=1,%5Cdots,N"> over time periods <img src="https://latex.codecogs.com/png.latex?t%20=%201,%20%5Cdots,%20T">. The outcome is <img src="https://latex.codecogs.com/png.latex?y_%7Bit%7D">, and <img src="https://latex.codecogs.com/png.latex?x_%7Bit%7D"> is a vector of time-varying covariates.</p>
<p>The linear panel data model is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Ay_%7Bit%7D%20=%20x_%7Bit%7D'%5Cbeta%20+%20%5Calpha_i%20+%20%5Cvarepsilon_%7Bit%7D%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Calpha_i"> is the individual-specific effect and <img src="https://latex.codecogs.com/png.latex?%5Cvarepsilon_%7Bit%7D"> is the idiosyncratic error term. Our goal is to consistently estimate the causal effect of time-varying regressors (a component of <img src="https://latex.codecogs.com/png.latex?x_%7Bit%7D">) when unobserved heterogeneity may be correlated with them.</p>
<p>The core differences between FE and RE models lie in the way they handle <img src="https://latex.codecogs.com/png.latex?%5Calpha_i">, and the assumptions they make about the relationship between <img src="https://latex.codecogs.com/png.latex?%5Calpha_i"> and <img src="https://latex.codecogs.com/png.latex?x_%7Bit%7D">.</p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="refresher-on-fixed-and-random-effects" class="level3">
<h3 class="anchored" data-anchor-id="refresher-on-fixed-and-random-effects">Refresher on Fixed and Random Effects</h3>
<p>In panel data models, the goal is often to account for unobserved heterogeneity across units (e.g., individuals, firms, regions). Two popular approaches to handle this are <em>fixed effects (FE)</em> and <em>random effects (RE)</em> models. Understanding these two approaches is critical before we dive into correlated random effects.</p>
<section id="fixed-effects-fe-model" class="level4">
<h4 class="anchored" data-anchor-id="fixed-effects-fe-model">Fixed Effects (FE) Model</h4>
<p>The fixed effects model controls for all time-invariant characteristics of the units by allowing each unit to have its own intercept. The key feature of FE models is that <em><img src="https://latex.codecogs.com/png.latex?%5Calpha_i"> is treated as a set of unknown parameters to be estimated (or differenced out)</em>. Importantly, <img src="https://latex.codecogs.com/png.latex?%5Calpha_i"> is allowed to be correlated with the regressors <img src="https://latex.codecogs.com/png.latex?x_%7Bit%7D"> (i.e., <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCov%7D(x_%7Bit%7D,%20%5Calpha_i)%20%5Cneq%200">). This addresses endogeneity driven by time-invariant omitted variables, but it does not, by itself, resolve endogeneity arising from time-varying confounding, simultaneity, or reverse causality (which lives in <img src="https://latex.codecogs.com/png.latex?%5Cvarepsilon_%7Bit%7D">).</p>
<p>Fixed effects estimation often proceeds by <em>demeaning</em> the data within each unit (also known as the “within transformation”), removing <img src="https://latex.codecogs.com/png.latex?%5Calpha_i">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Ay_%7Bit%7D%20-%20%5Cbar%7By%7D_i%20=%20(x_%7Bit%7D%20-%20%5Cbar%7Bx%7D_i)'%5Cbeta%20+%20(%5Cvarepsilon_%7Bit%7D%20-%20%5Cbar%7B%5Cvarepsilon%7D_i),%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Cbar%7By%7D_i"> and <img src="https://latex.codecogs.com/png.latex?%5Cbar%7Bx%7D_i"> are the within-unit means. This is convenient, but any time-invariant covariate is demeaned to zero, so its effect cannot be estimated, even though such effects are of interest in many applications. Estimating the <img src="https://latex.codecogs.com/png.latex?%5Calpha_i">’s themselves consistently is also usually not feasible because of the relatively short panels typically used in empirical work.</p>
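<p>To make the within transformation concrete, here is a minimal Python sketch on simulated data. The data-generating process, the seed, and all variable names (<code>beta_fe</code>, <code>beta_pooled</code>, and so on) are my own illustrative assumptions. It demeans <code>x</code> and <code>y</code> within each unit and compares the resulting slope with pooled OLS when the unit effect is correlated with the regressor.</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_units, n_periods = 200, 5

# Simulated panel (illustrative values): alpha_i is correlated with x_it
ids = np.repeat(np.arange(n_units), n_periods)
alpha = rng.normal(size=n_units)
x = alpha[ids] + rng.normal(size=ids.size)
y = 0.5 * x + alpha[ids] + rng.normal(size=ids.size)
df = pd.DataFrame({"id": ids, "x": x, "y": y})

# Within transformation: subtract each unit's own mean from x and y
x_dm = df["x"] - df.groupby("id")["x"].transform("mean")
y_dm = df["y"] - df.groupby("id")["y"].transform("mean")

# OLS slope on the demeaned data (no intercept is needed after demeaning)
beta_fe = float((x_dm * y_dm).sum() / (x_dm ** 2).sum())

# Pooled OLS slope for comparison; it ignores alpha_i
x_c = df["x"] - df["x"].mean()
beta_pooled = float((x_c * df["y"]).sum() / (x_c ** 2).sum())

print(f"within (FE) slope: {beta_fe:.3f}")
print(f"pooled OLS slope:  {beta_pooled:.3f}")
```

<p>With a true slope of 0.5 in this simulation, the within slope should land close to 0.5, while the pooled slope is pulled upward because it attributes part of the unit effect to the regressor.</p>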
<p>Fixed effects are especially popular in causal inference because they remove bias from any time-invariant omitted variables. They can be seen as a generalization of the familiar <em>difference-in-differences (DiD)</em> approach, which is just a special case of FE with two time periods and a treatment indicator. They can also fairly easily be extended to triple difference designs, staggered adoption designs, and other more complex causal inference settings.</p>
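<p>The link between DiD and FE can be checked numerically. The sketch below is an illustration under my own assumptions: a simulated two-period balanced panel, a made-up treatment effect of 2, and an FE regression that includes unit dummies, a period dummy, and the treatment indicator. The names (<code>did</code>, <code>fe_effect</code>, <code>d_it</code>) are hypothetical.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n_units = 400
treated = (np.arange(n_units) % 2 == 0).astype(float)  # half the units are treated
alpha = rng.normal(size=n_units) + treated              # unit effects differ by group
true_effect = 2.0                                       # illustrative value

# Period 0: nobody is treated. Period 1: the treated group receives treatment.
y0 = alpha + rng.normal(size=n_units)
y1 = alpha + 0.3 + true_effect * treated + rng.normal(size=n_units)

# Classic DiD: average change for treated minus average change for controls
dy = y1 - y0
did = dy[treated == 1].mean() - dy[treated == 0].mean()

# FE regression: unit dummies + period dummy + treatment indicator D_it
y = np.concatenate([y0, y1])
unit_dummies = np.vstack([np.eye(n_units), np.eye(n_units)])
post = np.concatenate([np.zeros(n_units), np.ones(n_units)])
d_it = np.concatenate([np.zeros(n_units), treated])
X = np.column_stack([unit_dummies, post, d_it])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
fe_effect = coef[-1]  # coefficient on the treatment indicator

print(f"DiD estimate: {did:.4f}")
print(f"FE estimate:  {fe_effect:.4f}")
```

<p>In this balanced two-period setting the two numbers should agree up to floating-point error. Building explicit unit dummies is only practical for small panels; it is used here to keep the equivalence visible.</p>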
<p>An example would be an analysis of state-level minimum wage changes on employment outcomes. Different states adopted minimum wage changes at different times, so a simple difference-in-differences analysis would be inappropriate. However, a fixed effects model can be used to estimate the effect of the minimum wage on employment outcomes, holding constant the state-specific time-invariant characteristics (e.g., state-level demographics, permanent economic conditions, policy environment, etc.).</p>
</section>
<section id="random-effects-re-model" class="level4">
<h4 class="anchored" data-anchor-id="random-effects-re-model">Random Effects (RE) Model</h4>
<p>In the RE model, <em><img src="https://latex.codecogs.com/png.latex?%5Calpha_i"> is treated as a random variable</em> drawn from a distribution (usually assumed to be normal):</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Calpha_i%20%5Csim%20N(0,%20%5Csigma_%5Calpha%5E2).%0A"></p>
<p>The crucial assumption in RE models is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BCov%7D(x_%7Bit%7D,%20%5Calpha_i)%20=%200.%0A"></p>
<p>Equivalently, RE assumes <img src="https://latex.codecogs.com/png.latex?E%5B%5Calpha_i%20%5Cmid%20X_i%5D%20=%200">, where <img src="https://latex.codecogs.com/png.latex?X_i%20=%20(x_%7Bi1%7D,%20%5Cdots,%20x_%7BiT%7D)">. This allows for more efficient estimation through Generalized Least Squares (GLS), but if the assumption fails, the RE estimates will be biased and inconsistent. The RE model is not commonly used in causal inference because, unlike the FE model, it rules out correlation between covariates and time-invariant unobserved heterogeneity. In short, the FE model is robust but discards between-unit variation, while the RE model is more efficient but relies on a strong independence assumption between covariates and unobserved heterogeneity. The Hausman test evaluates whether the additional orthogonality restrictions imposed by the random effects model are supported by the data.</p>
</section>
</section>
<section id="correlated-random-effects-cre-model" class="level3">
<h3 class="anchored" data-anchor-id="correlated-random-effects-cre-model">Correlated Random Effects (CRE) Model</h3>
<section id="intuition" class="level4">
<h4 class="anchored" data-anchor-id="intuition">Intuition</h4>
<p>The <em>correlated random effects (CRE)</em> model differs from standard fixed and random effects by explicitly modeling the correlation between the unit-specific effects <img src="https://latex.codecogs.com/png.latex?%5Calpha_i"> and the covariates <img src="https://latex.codecogs.com/png.latex?x_%7Bit%7D">. Instead of assuming independence (as in RE) or differencing out the effects entirely (as in FE), CRE includes the <em>unit-level means of the covariates</em> as additional regressors, allowing for consistent estimation while still retaining the ability to estimate the effects of time-invariant variables.</p>
<p>Put differently, CRE offers a middle ground between the two approaches. RE assumes that unobserved heterogeneity is uncorrelated with the covariates, while FE removes all unit-level heterogeneity but cannot estimate the effects of time-invariant covariates. By including the unit means of the time-varying covariates, CRE decomposes variation into within and between components. Instead of pretending the individual effect is unrelated to the observed covariates, we model exactly how it is related: through the individual’s average covariate values.</p>
</section>
<section id="estimation-and-inference" class="level4">
<h4 class="anchored" data-anchor-id="estimation-and-inference">Estimation and Inference</h4>
<p>One way to motivate CRE (Mundlak) is to model the conditional mean of the unit effect as a function of unit-level covariate averages. In the linear case, write:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Calpha_i%20=%20a%20+%20%5Cgamma%20%5Cbar%7Bx%7D_i%20+%20u_i,%20%5Cqquad%20E%5Bu_i%20%5Cmid%20X_i%5D%20=%200,%0A"></p>
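<p>where <img src="https://latex.codecogs.com/png.latex?%5Cbar%7Bx%7D_i"> is the individual mean of <img src="https://latex.codecogs.com/png.latex?x_%7Bit%7D">. Substituting into the outcome equation, taking <img src="https://latex.codecogs.com/png.latex?x_%7Bit%7D"> to be a scalar with slope <img src="https://latex.codecogs.com/png.latex?%5Cbeta_1"> and writing the intercept <img src="https://latex.codecogs.com/png.latex?a"> as <img src="https://latex.codecogs.com/png.latex?%5Cbeta_0">, yields:</p>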
<p><img src="https://latex.codecogs.com/png.latex?%0Ay_%7Bit%7D%20=%20%5Cbeta_0%20+%20%5Cbeta_1%20x_%7Bit%7D%20+%20%5Cgamma%20%5Cbar%7Bx%7D_i%20+%20u_i%20+%20%5Cvarepsilon_%7Bit%7D,%0A"></p>
<p>In practice, you include the unit means for each time-varying regressor (and for any transformations/interactions you want the CRE adjustment to apply to). Estimation uses RE-style methods on this augmented specification; the mean terms absorb the part of <img src="https://latex.codecogs.com/png.latex?%5Calpha_i"> that is correlated with <img src="https://latex.codecogs.com/png.latex?X_i">, leaving <img src="https://latex.codecogs.com/png.latex?u_i"> orthogonal. This also makes it easy to compare within and between effects (for a scalar <img src="https://latex.codecogs.com/png.latex?x">, the between effect is <img src="https://latex.codecogs.com/png.latex?%5Cbeta_1%20+%20%5Cgamma">).</p>
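<p>The within/between decomposition can be verified on simulated data. The following is a sketch under stated assumptions rather than a full CRE implementation: it swaps in pooled OLS for the RE-style estimator to stay short, uses a balanced panel, and all names (<code>b1</code>, <code>gamma</code>, <code>beta_within</code>, <code>beta_between</code>) and parameter values are illustrative.</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_units, n_periods = 300, 4
ids = np.repeat(np.arange(n_units), n_periods)

# Unit effect alpha_i is correlated with x_it through a unit-level latent z_i
z = rng.normal(size=n_units)
x = z[ids] + rng.normal(size=ids.size)
alpha = 0.7 * z + rng.normal(size=n_units)
y = 1 + 0.5 * x + alpha[ids] + rng.normal(size=ids.size)

df = pd.DataFrame({"id": ids, "x": x, "y": y})
df["mean_x"] = df.groupby("id")["x"].transform("mean")

# Mundlak-style regression: y on a constant, x_it, and the unit mean of x
X = np.column_stack([np.ones(len(df)), df["x"], df["mean_x"]])
b0, b1, gamma = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)[0]

# Within estimator: slope on unit-demeaned data
x_dm = df["x"] - df["mean_x"]
y_dm = df["y"] - df.groupby("id")["y"].transform("mean")
beta_within = float((x_dm * y_dm).sum() / (x_dm ** 2).sum())

# Between estimator: regress unit means of y on unit means of x
means = df.groupby("id")[["x", "y"]].mean()
beta_between = float(np.polyfit(means["x"].to_numpy(), means["y"].to_numpy(), 1)[0])

print(f"b1 = {b1:.4f}, within = {beta_within:.4f}")
print(f"b1 + gamma = {b1 + gamma:.4f}, between = {beta_between:.4f}")
```

<p>In a balanced panel, the coefficient on <code>x</code> should match the within estimate and <code>b1 + gamma</code> should match the between estimate, both up to floating-point error. A clearly nonzero <code>gamma</code> signals that the plain RE assumption is doubtful for these data.</p>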
</section>
<section id="advantages-and-challenges" class="level4">
<h4 class="anchored" data-anchor-id="advantages-and-challenges">Advantages and Challenges</h4>
<p>The CRE model offers several advantages. It allows estimation of the effects of time-invariant variables, decomposes effects into within- and between-unit components, improves efficiency under relaxed assumptions, and provides a diagnostic check on the plausibility of random effects assumptions.</p>
<p>It is well suited for repeated measures data where both time-varying and time-invariant predictors matter, especially when there is potential endogeneity between covariates and individual effects. Typical applications include policy evaluation, health research, and education studies.</p>
<p>However, CRE models still rely on the random intercept assumption, do not address endogeneity driven by time-varying unobservables (e.g., simultaneity or reverse causality), require care with interaction terms, and may produce biased estimates when the number of clusters is small.</p>
</section>
</section>
</section>
<section id="an-example" class="level2">
<h2 class="anchored" data-anchor-id="an-example">An Example</h2>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(plm)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb1-5">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb1-6">t <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb1-7">data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(</span>
<span id="cb1-8">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">each =</span> t),</span>
<span id="cb1-9">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">time =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>t, n)</span>
<span id="cb1-10">)</span>
<span id="cb1-11">data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(id) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb1-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),</span>
<span id="cb1-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> z),</span>
<span id="cb1-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),</span>
<span id="cb1-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">eps =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),</span>
<span id="cb1-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> alpha <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> eps</span>
<span id="cb1-19">  )</span>
<span id="cb1-20"></span>
<span id="cb1-21">pdata <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pdata.frame</span>(data, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">index =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"id"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"time"</span>))</span>
<span id="cb1-22"></span>
<span id="cb1-23">fe_model <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> pdata, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"within"</span>)</span>
<span id="cb1-24">re_model <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> pdata, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"random"</span>)</span>
<span id="cb1-25">pdata<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>mean_x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ave</span>(pdata<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>x, pdata<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>id, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">FUN =</span> mean)</span>
<span id="cb1-26">cre_model <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> mean_x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> pdata, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"random"</span>)</span>
<span id="cb1-27"></span>
<span id="cb1-28"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(fe_model)</span>
<span id="cb1-29"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(re_model)</span>
<span id="cb1-30"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(cre_model)</span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> statsmodels.formula.api <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> smf</span>
<span id="cb2-4"></span>
<span id="cb2-5">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb2-6">n, t <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb2-7">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.DataFrame({</span>
<span id="cb2-8">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'id'</span>: np.repeat(np.arange(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), t),</span>
<span id="cb2-9">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'time'</span>: np.tile(np.arange(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, t<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), n)</span>
<span id="cb2-10">})</span>
<span id="cb2-11"></span>
<span id="cb2-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Induce correlation between x_it and alpha_i via an id-level latent z_i</span></span>
<span id="cb2-13">z <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(n)</span>
<span id="cb2-14">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'z'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.repeat(z, t)</span>
<span id="cb2-15">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'x'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'z'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>t)</span>
<span id="cb2-16">alpha <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> z <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.randn(n)</span>
<span id="cb2-17">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'alpha'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.repeat(alpha, t)</span>
<span id="cb2-18"></span>
<span id="cb2-19">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'eps'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>t)</span>
<span id="cb2-20">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'y'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'x'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'alpha'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'eps'</span>]</span>
<span id="cb2-21">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean_x'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df.groupby(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'id'</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'x'</span>].transform(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mean'</span>)</span>
<span id="cb2-22"></span>
<span id="cb2-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Random-intercept RE model</span></span>
<span id="cb2-24">model_re <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> smf.mixedlm(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y ~ x"</span>, df, groups<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"id"</span>]).fit(reml<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb2-25"></span>
<span id="cb2-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># CRE (Mundlak) model: RE with unit means included</span></span>
<span id="cb2-27">model_cre <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> smf.mixedlm(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y ~ x + mean_x"</span>, df, groups<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"id"</span>]).fit(reml<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb2-28"></span>
<span id="cb2-29"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(model_re.summary())</span>
<span id="cb2-30"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(model_cre.summary())</span></code></pre></div></div>
</div>
</div>
</div>
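<p>The Python tab above fits only the RE and CRE models. As a rough cross-check of the Mundlak logic, the sketch below uses plain Python (no <code>statsmodels</code>) to compare the within (FE) slope with the slope on <code>x</code> from a pooled OLS regression of <code>y</code> on <code>x</code> and <code>mean_x</code>. It regenerates data with the same design rather than reusing <code>df</code>, and it swaps the mixed-model fit for pooled OLS, so treat it as an illustration under those assumptions rather than a replica of the models above.</p>

```python
import random

random.seed(1988)
n, t = 100, 5  # units and periods, mirroring the simulation design above

rows = []  # one tuple (x_it, mean_x_i, y_it) per observation
for i in range(n):
    z = random.gauss(0, 1)                  # id-level latent variable
    alpha = 0.7 * z + random.gauss(0, 1)    # unit effect, correlated with x through z
    xs = [z + random.gauss(0, 1) for _ in range(t)]
    mean_x = sum(xs) / t
    for x in xs:
        y = 1 + 0.5 * x + alpha + random.gauss(0, 1)
        rows.append((x, mean_x, y))

# Within (FE) estimator: slope of y on x after demeaning x within each unit
num = sum((x - m) * y for x, m, y in rows)
den = sum((x - m) ** 2 for x, m, y in rows)
beta_fe = num / den

# Mundlak regression: pooled OLS of y on x and mean_x (with intercept),
# solved from the 2x2 normal equations on centered variables
N = len(rows)
xb = sum(r[0] for r in rows) / N
mb = sum(r[1] for r in rows) / N
yb = sum(r[2] for r in rows) / N
sxx = sum((x - xb) ** 2 for x, m, y in rows)
smm = sum((m - mb) ** 2 for x, m, y in rows)
sxm = sum((x - xb) * (m - mb) for x, m, y in rows)
sxy = sum((x - xb) * (y - yb) for x, m, y in rows)
smy = sum((m - mb) * (y - yb) for x, m, y in rows)
det = sxx * smm - sxm ** 2
beta_x = (sxy * smm - smy * sxm) / det
beta_mean_x = (smy * sxx - sxy * sxm) / det

print(f"FE (within) slope:       {beta_fe:.4f}")
print(f"Mundlak slope on x:      {beta_x:.4f}")
print(f"Mundlak slope on mean_x: {beta_mean_x:.4f}")
```

<p>The two slopes on <code>x</code> agree up to floating-point error, which is the algebraic reason the CRE coefficient on <code>x</code> can be read as a within estimate.</p>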
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>CRE models relax the strict RE assumptions by modeling the correlation between unit effects and covariates.</li>
<li>They deliver both within and between estimates while still allowing time-invariant covariates in the model.</li>
<li>They are well suited to longitudinal, multilevel, and policy evaluation studies.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>“Microeconometrics: Methods and Applications” by one of my PhD advisors, Colin Cameron, and his long-time coauthor Pravin Trivedi, is a classic textbook that covers panel data models, and I have spent countless hours with it. It’s a great starting point for most of the material in my blog. Schunck (2013) provides a comprehensive overview of CRE models, and Mundlak (1978) is the foundational paper behind them. In <code>R</code>’s <code>plm</code> and <code>Python</code>’s <code>statsmodels</code>, these models can be fit by adding the unit means of the covariates as regressors, as in the code above.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li><p>Cameron, A. C., &amp; Trivedi, P. K. (2005). <em>Microeconometrics: Methods and Applications</em>. Cambridge University Press.</p></li>
<li><p>Schunck, R. (2013). Within and between estimates in random-effects models: Advantages and drawbacks of correlated random effects and hybrid models. <em>The Stata Journal</em>, 13(1), 65–76.</p></li>
<li><p>Mundlak, Y. (1978). On the pooling of time series and cross section data. <em>Econometrica</em>, 46(1), 69–85.</p></li>
</ul>


</section>

 ]]></description>
  <category>causal inference</category>
  <guid>https://vyasenov.github.io/blog/correlated-random-effects.html</guid>
  <pubDate>Wed, 11 Feb 2026 08:00:00 GMT</pubDate>
</item>
<item>
  <title>The Many Flavors of Propensity Score Methods In Causal Inference</title>
  <link>https://vyasenov.github.io/blog/flavors-prop-score-methods.html</link>
  <description><![CDATA[ 





<div class="reading-time">10 min read</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>Introduced by Rosenbaum and Rubin in 1983, the propensity score, the probability of receiving treatment given observed covariates, has become the workhorse for handling confounding in observational studies.</p>
<p>But here’s the thing: the propensity score itself is just the starting point. It anchors an entire class of statistical methods for treatment effect estimation. In practice, there are many ways to use propensity scores. You can match on them, stratify your sample, weight your observations, or plug them into doubly robust estimators that model both the treatment and the outcome. You can also tweak how you weight the units, for example by downweighting those with extreme scores or by focusing on the region where the treated and control groups overlap.</p>
<p>In this post, I’ll explore the many flavors of propensity score methods. As always, the focus is on the intuition, the basic math, and practical considerations. Oh, there is also some <code>R</code> and <code>Python</code> code.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>We’re operating in the familiar causal inference setup:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?D_i%20%5Cin%20%5C%7B0,%201%5C%7D">: treatment indicator.</li>
<li><img src="https://latex.codecogs.com/png.latex?X_i">: observed covariates.</li>
<li><img src="https://latex.codecogs.com/png.latex?Y_i(1),%20Y_i(0)">: potential outcomes.</li>
</ul>
<p>We conveniently invoke the traditional identification assumptions – conditional ignorability, overlap and SUTVA. As a refresher, the propensity score is simply: <img src="https://latex.codecogs.com/png.latex?%0Ae(X_i)%20=%20%5Cmathbb%7BP%7D(D_i%20=%201%20%5Cmid%20X_i).%0A"></p>
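<p>The key result from Rosenbaum and Rubin (1983) states that if treatment assignment is ignorable given the covariates, it is also ignorable given the propensity score alone: <img src="https://latex.codecogs.com/png.latex?%0A(Y(1),%20Y(0))%20%5Cperp%20D%20%5Cmid%20e(X),%0A"></p>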
<p>meaning that, conditional on the propensity score, treatment assignment is as good as random. The main implication of this theorem is dimensionality reduction – the propensity score alone is “enough” to adjust for bias between the treatment and control groups.</p>
<p>A crucial but often overlooked point is that different propensity score–based estimators target different causal estimands: the Average Treatment Effect (ATE), the Average Treatment Effect on the Treated (ATT), or effects defined on overlap populations. Choosing a method therefore implicitly means choosing which population’s effect you want to estimate.</p>
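<p>To see why the choice of estimand matters, here is a small simulation sketch in plain Python. Everything in it is an assumption made for illustration: a single covariate <code>x</code>, a logistic propensity score, and a unit-level effect <code>tau = 1 + x</code> that grows with <code>x</code>. Because the simulation generates the effects directly, the ATE and ATT can be read off without any estimation.</p>

```python
import math
import random

random.seed(0)
n = 20000

all_effects = []
treated_effects = []
for _ in range(n):
    x = random.gauss(0, 1)
    e_x = 1 / (1 + math.exp(-x))          # true propensity score e(X), assumed logistic
    d = random.choices([1, 0], weights=[e_x, 1 - e_x])[0]  # treatment draw
    tau = 1 + x                           # unit-level effect Y(1) - Y(0), assumed to grow with x
    all_effects.append(tau)
    if d == 1:
        treated_effects.append(tau)

ate = sum(all_effects) / len(all_effects)          # average over everyone
att = sum(treated_effects) / len(treated_effects)  # average over treated units only
print(f"ATE: {ate:.3f}, ATT: {att:.3f}")
```

<p>Units with larger <code>x</code> are both more likely to be treated and have larger effects, so the ATT comes out above the ATE. With homogeneous effects the two would coincide.</p>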
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="propensity-score-estimation" class="level3">
<h3 class="anchored" data-anchor-id="propensity-score-estimation">Propensity Score Estimation</h3>
<p>First things first. Before we can discuss propensity score methods, we need to estimate the propensity score itself. This is commonly done via logistic regression (probit has largely fallen out of fashion). In rare cases, such as randomized experiments, the propensity score is known and this step can be skipped. Machine learning methods can be used as well, but with care. Unlike in a traditional machine learning setup, the goal here is not the best predictive fit but removing in-sample imbalance between the treatment and control groups, and a model tuned purely for prediction can mislead us on that front.</p>
<p>The following examples apply several popular propensity score methods to the <code>iris</code> dataset using both <code>R</code> and <code>Python</code>. For demonstration, we define an artificial binary treatment (<code>D</code>) that equals 1 when <code>Petal.Length</code> exceeds 3. The outcome variable is <code>Sepal.Length</code>, and the covariates are <code>Sepal.Width</code> and <code>Petal.Width</code>.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load necessary libraries</span></span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(MatchIt)</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load iris dataset and create treatment variable</span></span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(iris)</span>
<span id="cb1-6">iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>D <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Petal.Length <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb1-7"></span>
<span id="cb1-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit propensity score model using logistic regression</span></span>
<span id="cb1-9">ps_model <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glm</span>(D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Width, </span>
<span id="cb1-10">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris, </span>
<span id="cb1-11">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binomial</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">link =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"logit"</span>))</span>
<span id="cb1-12"></span>
<span id="cb1-13"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(ps_model)</span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Import necessary libraries</span></span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> LogisticRegression</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.datasets <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> load_iris</span>
<span id="cb2-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb2-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-6"></span>
<span id="cb2-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load iris dataset and create treatment variable</span></span>
<span id="cb2-8">iris <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_iris(as_frame<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>).frame</span>
<span id="cb2-9">iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'D'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'petal length (cm)'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>).astype(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb2-10"></span>
<span id="cb2-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Prepare features and fit propensity score model</span></span>
<span id="cb2-12">X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal width (cm)'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'petal width (cm)'</span>]]</span>
<span id="cb2-13">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'D'</span>]</span>
<span id="cb2-14">model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LogisticRegression(max_iter<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>)</span>
<span id="cb2-15">model.fit(X, y)</span>
<span id="cb2-16"></span>
<span id="cb2-17"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Intercept: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>model<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>intercept_[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">, Coef: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>model<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>coef_<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>flatten()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
</div>
</div>
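<p>Since the aim is balance rather than predictive fit, a natural diagnostic is the standardized mean difference (SMD) of each covariate between the two groups. The sketch below is a minimal plain-Python version; the helper name <code>smd</code> and the toy numbers are made up for illustration and are not taken from the iris example. One common rule of thumb treats absolute SMDs below 0.1 as acceptable balance.</p>

```python
from statistics import mean, variance

def smd(x_treated, x_control):
    """Standardized mean difference: gap in means divided by the pooled SD."""
    pooled_sd = ((variance(x_treated) + variance(x_control)) / 2) ** 0.5
    return (mean(x_treated) - mean(x_control)) / pooled_sd

# Hypothetical covariate values for treated and control units
treated = [3.1, 2.9, 3.4, 3.0]
control = [2.5, 2.7, 2.4, 2.8]

print(f"SMD: {smd(treated, control):.2f}")
```

<p>In practice you would compute this for every covariate, before and after matching or weighting, and compare the two sets of values.</p>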
</section>
<section id="nearest-neighbor-matching" class="level3">
<h3 class="anchored" data-anchor-id="nearest-neighbor-matching">Nearest Neighbor Matching</h3>
<p><em>Target Estimand</em>: Typically ATT.</p>
<p>This is often the first method people try after estimating the propensity score. Once <img src="https://latex.codecogs.com/png.latex?e(X)"> is estimated, treated units are matched to control units with the closest propensity scores (nearest neighbor). You can match one-to-one, one-to-many, with or without replacement.</p>
<p>This class of methods tends to work well when the number of controls is large enough to find good matches for treated units. The approach is simple and intuitive, reducing high-dimensional matching to a single dimension. However, it’s worth noting that balance on the propensity score doesn’t guarantee balance on covariates, and the method can be sensitive to poor matches when suitable controls are scarce.</p>
<p>Lastly, inference after matching is subtle; standard errors must account for the matching procedure, and naïve bootstrap methods are generally invalid. Matching with replacement adds further complexity because the same control unit can be matched to several treated units.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">matchit_nn <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matchit</span>(D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Width, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nearest"</span>)</span>
<span id="cb3-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(matchit_nn)</span></code></pre></div></div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> causalinference <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> CausalModel</span>
<span id="cb4-2"></span>
<span id="cb4-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Prepare data for CausalModel</span></span>
<span id="cb4-4">Y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal length (cm)'</span>].values</span>
<span id="cb4-5">T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'D'</span>].values</span>
<span id="cb4-6">X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal width (cm)'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'petal width (cm)'</span>]].values</span>
<span id="cb4-7"></span>
<span id="cb4-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Nearest neighbor matching; note that est_via_matching matches on the covariates X, not on the estimated propensity score</span></span>
<span id="cb4-9">cm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> CausalModel(Y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>Y, D<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>T, X<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>X)</span>
<span id="cb4-10">cm.est_propensity_s()</span>
<span id="cb4-11">cm.est_via_matching(bias_adj<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb4-12"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(cm.estimates)</span></code></pre></div></div>
</div>
</div>
</div>
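<p>To make the mechanics concrete, here is a from-scratch sketch of 1:1 nearest neighbor matching on the propensity score, with replacement, targeting the ATT. It is not what <code>MatchIt</code> or <code>causalinference</code> do internally; the function name <code>att_nearest_neighbor</code> and the toy inputs are assumptions for illustration, and the propensity scores are taken as given.</p>

```python
def att_nearest_neighbor(y, d, ps):
    """ATT via 1:1 nearest neighbor matching on the propensity score, with replacement."""
    controls = [(p, yi) for yi, di, p in zip(y, d, ps) if di == 0]
    diffs = []
    for yi, di, p in zip(y, d, ps):
        if di == 1:
            # Pick the control whose propensity score is closest to this treated unit
            _, y_match = min(controls, key=lambda c: abs(c[0] - p))
            diffs.append(yi - y_match)
    return sum(diffs) / len(diffs)

# Hypothetical outcomes, treatment indicators, and estimated propensity scores
y = [6.0, 7.0, 8.0, 4.0, 5.5, 6.5]
d = [1, 1, 1, 0, 0, 0]
ps = [0.60, 0.70, 0.90, 0.30, 0.55, 0.75]

print(f"Matched ATT estimate: {att_nearest_neighbor(y, d, ps):.3f}")
```

<p>Note that the control with score 0.75 is reused for two treated units while the control with score 0.30 is never used, which is exactly the kind of reuse that complicates standard errors.</p>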
</section>
<section id="caliper-matching" class="level3">
<h3 class="anchored" data-anchor-id="caliper-matching">Caliper Matching</h3>
<p><em>Target Estimand</em>: Typically ATT.</p>
<p>Caliper matching adds a threshold: only match treated and control units if their propensity scores are within a specified distance (the caliper). Often the caliper is set to <img src="https://latex.codecogs.com/png.latex?0.2"> times the standard deviation of the logit of the propensity score (Austin 2010).</p>
<p>This approach is particularly useful when you want to avoid the bad matches that can arise in standard nearest neighbor matching. By imposing a maximum allowable distance, caliper matching prevents extreme mismatches and generally improves balance between the treatment and control groups. The main tradeoff is that it discards treated units with no control inside the caliper, which reduces sample size and shifts the estimand from the ATT toward the effect for the treated units that could be matched.</p>
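<p>Before the package-based code, here is a minimal plain-Python sketch of the caliper rule itself: compute the caliper as 0.2 times the standard deviation of the logit propensity score, match each treated unit to its nearest control on the logit scale, and drop treated units whose nearest control lies outside the caliper. The function name <code>caliper_match</code> and the toy scores are hypothetical, and matching is done with replacement to keep the example short.</p>

```python
import math
from statistics import stdev

def logit(p):
    return math.log(p / (1 - p))

def caliper_match(ps, d, multiplier=0.2):
    """1:1 nearest neighbor matching with replacement, restricted to a caliper
    of `multiplier` times the SD of the logit propensity score."""
    lps = [logit(p) for p in ps]
    caliper = multiplier * stdev(lps)
    controls = [i for i, di in enumerate(d) if di == 0]
    pairs, dropped = [], []
    for i, di in enumerate(d):
        if di != 1:
            continue
        # Nearest control on the logit scale
        j = min(controls, key=lambda c: abs(lps[c] - lps[i]))
        if caliper >= abs(lps[j] - lps[i]):
            pairs.append((i, j))   # acceptable match
        else:
            dropped.append(i)      # no control within the caliper
    return pairs, dropped, caliper

# Hypothetical propensity scores; the third treated unit has no comparable control
ps = [0.50, 0.52, 0.95, 0.48, 0.55, 0.20]
d = [1, 1, 1, 0, 0, 0]

pairs, dropped, caliper = caliper_match(ps, d)
print("matched pairs (treated, control):", pairs)
print("dropped treated units:", dropped)
```

<p>The treated unit with a score of 0.95 is discarded, so the resulting estimate describes only the treated units that found a match.</p>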
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Estimate propensity scores for caliper calculation</span></span>
<span id="cb5-2">ps_for_caliper <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glm</span>(D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Width, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> binomial)</span>
<span id="cb5-3">ps_vals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(ps_for_caliper, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"response"</span>)</span>
<span id="cb5-4">logit_ps <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(ps_vals <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> ps_vals))</span>
<span id="cb5-5"></span>
<span id="cb5-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate caliper (0.2 * SD of logit PS, following Austin 2010)</span></span>
<span id="cb5-7">caliper_width <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(logit_ps)</span>
<span id="cb5-8"></span>
<span id="cb5-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Perform caliper matching</span></span>
<span id="cb5-10">matchit_caliper <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matchit</span>(D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Width, </span>
<span id="cb5-11">                           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris, </span>
<span id="cb5-12">                           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nearest"</span>, </span>
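<span id="cb5-13">                           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">distance =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"glm"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">link =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"linear.logit"</span>,  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># match on the logit scale, the scale caliper_width is measured in</span></span>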
<span id="cb5-14">                           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">caliper =</span> caliper_width,</span>
<span id="cb5-15">                           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">std.caliper =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb5-16"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(matchit_caliper)</span></code></pre></div></div>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate propensity scores</span></span>
<span id="cb6-2">ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> model.predict_proba(X)[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb6-3"></span>
<span id="cb6-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate caliper (0.2 * SD of logit of propensity score)</span></span>
<span id="cb6-5">logit_ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.log(ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e-10</span>))  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Small constant to avoid division by zero</span></span>
<span id="cb6-6">caliper <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.std(logit_ps)</span>
<span id="cb6-7"></span>
<span id="cb6-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Simplified 1:1 caliper matching (with replacement)</span></span>
<span id="cb6-9">matched_pairs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb6-10">treated_idx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'D'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>].index</span>
<span id="cb6-11">control_idx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'D'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].index</span>
<span id="cb6-12"></span>
<span id="cb6-13"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> t_idx <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> treated_idx:</span>
<span id="cb6-14">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate distances in logit space</span></span>
<span id="cb6-15">    t_logit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> logit_ps[t_idx]</span>
<span id="cb6-16">    c_logits <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> logit_ps[control_idx]</span>
<span id="cb6-17">    distances <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(t_logit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> c_logits)</span>
<span id="cb6-18">    </span>
<span id="cb6-19">    min_dist <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> distances.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>()</span>
<span id="cb6-20">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> min_dist <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> caliper:</span>
<span id="cb6-21">        min_dist_idx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> control_idx[np.argmin(distances)]</span>
<span id="cb6-22">        matched_pairs.append((t_idx, min_dist_idx))</span>
<span id="cb6-23"></span>
<span id="cb6-24"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Matched </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(matched_pairs)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> out of </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(treated_idx)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> treated units within caliper"</span>)</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="stratification-blocking" class="level3">
<h3 class="anchored" data-anchor-id="stratification-blocking">Stratification / Blocking</h3>
<p><em>Target Estimand</em>: Typically ATE.</p>
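<p>Here, the range of propensity scores is divided into <img src="https://latex.codecogs.com/png.latex?K"> strata (often quintiles). The treatment effect is estimated within each stratum as a difference in mean outcomes, and the stratum estimates are then averaged, weighting each stratum by its share of all units for the ATE (or of treated units for the ATT). With equal-sized strata such as quintiles of the full sample, the ATE weighting reduces to a simple average.</p>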
<p>Stratification is particularly appealing when matching isn’t feasible or when you prefer a more aggregate approach to adjustment. The method is straightforward to implement and achieves balance on average within each stratum. However, because it discretizes the propensity score into bins, the adjustment can be somewhat coarse, and bias may not be fully eliminated within each stratum, especially if there’s substantial heterogeneity in propensity scores within a given stratum.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">matchit_strat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matchit</span>(D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Width, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"subclass"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subclass =</span> <span class="dv" style="color: #AD0000;
background-color: null;
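font-style: inherit;">5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">estimand =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ATE"</span>)</span>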
<span id="cb7-2">md <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">match.data</span>(matchit_strat)</span>
<span id="cb7-3">stratum_effects <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sapply</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">levels</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(md<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>subclass)), <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(s) {</span>
<span id="cb7-4">  sub <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> md[md<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>subclass <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> s, ]</span>
<span id="cb7-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(sub<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>D) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;&amp;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> sub<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>D) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb7-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(sub<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Sepal.Length[sub<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(sub<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Sepal.Length[sub<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span></span>
<span id="cb7-7">})</span>
<span id="cb7-8">ate_stratified <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(stratum_effects, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb7-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Stratified ATE:"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(ate_stratified, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)))</span></code></pre></div></div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate propensity scores</span></span>
<span id="cb8-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps'</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> iris.columns:</span>
<span id="cb8-3">    iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> model.predict_proba(X)[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb8-4"></span>
<span id="cb8-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Stratification by propensity score quintiles</span></span>
<span id="cb8-6">iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps_stratum'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.qcut(iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps'</span>], q<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, labels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>, duplicates<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'drop'</span>)</span>
<span id="cb8-7"></span>
<span id="cb8-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Estimate treatment effect within each stratum</span></span>
<span id="cb8-9">stratum_effects <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb8-10"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> stratum <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps_stratum'</span>].unique():</span>
<span id="cb8-11">    stratum_data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps_stratum'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> stratum]</span>
<span id="cb8-12">    treated <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stratum_data[stratum_data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'D'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal length (cm)'</span>]</span>
<span id="cb8-13">    control <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stratum_data[stratum_data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'D'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal length (cm)'</span>]</span>
<span id="cb8-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(treated) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(control) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:</span>
<span id="cb8-15">        effect <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> treated.mean() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> control.mean()</span>
<span id="cb8-16">        stratum_effects.append(effect)</span>
<span id="cb8-17"></span>
<span id="cb8-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Overall effect (simple average across strata)</span></span>
<span id="cb8-19">ate_stratified <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.mean(stratum_effects)</span>
<span id="cb8-20"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Stratified ATE: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ate_stratified<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
</div>
</div>
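<p>The stratified estimator described in this section can also be written with explicit stratum weights, which matters whenever strata end up with unequal sizes (for example, when some strata are dropped for lacking treated or control units). The following is a minimal sketch on synthetic data, not the post's <code>iris</code> setup: the function name <code>stratified_ate</code>, the column names <code>ps</code>, <code>D</code>, <code>y</code>, and the simulated data are all illustrative assumptions.</p>

```python
import numpy as np
import pandas as pd

def stratified_ate(df, ps_col="ps", d_col="D", y_col="y", n_strata=5):
    """Size-weighted average of within-stratum differences in mean outcomes."""
    # Quantile-based strata of the propensity score (hypothetical column names)
    strata = pd.qcut(df[ps_col], q=n_strata, labels=False, duplicates="drop")
    effects, sizes = [], []
    for _, block in df.groupby(strata):
        treated = block.loc[block[d_col] == 1, y_col]
        control = block.loc[block[d_col] == 0, y_col]
        # Skip strata that lack either group: no within-stratum contrast exists
        if treated.empty or control.empty:
            continue
        effects.append(treated.mean() - control.mean())
        sizes.append(len(block))
    # Weight each retained stratum by its number of units (targets the ATE)
    return float(np.average(effects, weights=sizes))

# Synthetic data: the true effect is 2 and x confounds treatment and outcome
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
ps = 1 / (1 + np.exp(-x))
D = rng.binomial(1, ps)
y = 2 * D + x + rng.normal(size=n)
demo = pd.DataFrame({"ps": ps, "D": D, "y": y})

naive = demo.loc[demo.D == 1, "y"].mean() - demo.loc[demo.D == 0, "y"].mean()
print(f"Naive difference: {naive:.3f}")
print(f"Stratified ATE:   {stratified_ate(demo):.3f}")
```

<p>Weighting by the number of treated units in each stratum instead of <code>len(block)</code> would target the ATT. Some residual bias typically remains because units within a stratum still differ in their propensity scores.</p>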
</section>
<section id="inverse-probability-weighting-ipw" class="level3">
<h3 class="anchored" data-anchor-id="inverse-probability-weighting-ipw">Inverse Probability Weighting (IPW)</h3>
<p><em>Target Estimand</em>: ATT/ATE.</p>
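<p>IPW turns the propensity score into weights. For the ATE: <img src="https://latex.codecogs.com/png.latex?%0Aw_i%20=%20%5Cfrac%7BD_i%7D%7Be(X_i)%7D%20+%20%5Cfrac%7B1%20-%20D_i%7D%7B1%20-%20e(X_i)%7D.%0A"> This reweights the treated and control groups so that each resembles the full sample on observed covariates. For the ATT, treated units keep a weight of 1 and controls are weighted by the odds of treatment, <img src="https://latex.codecogs.com/png.latex?e(X_i)/(1%20-%20e(X_i))">, so that controls resemble the treated group.</p>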
<p>This method is ideal when you want to utilize the entire dataset without discarding any units. IPW is conceptually simple and makes full use of all available observations. The main challenge, however, is its sensitivity to extreme propensity scores near 0 or 1. When units have very low or very high probabilities of treatment, the inverse weighting can produce extremely large weights, leading to unstable estimates with high variance. This is why trimming or other stabilization techniques are often employed alongside IPW.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-5-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-1" aria-controls="tabset-5-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-2" aria-controls="tabset-5-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-5-1" class="tab-pane active" aria-labelledby="tabset-5-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ps <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(ps_model, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"response"</span>)</span>
<span id="cb9-2">iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>weights <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ps, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ps))</span>
<span id="cb9-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>weights)</span></code></pre></div></div>
</div>
<div id="tabset-5-2" class="tab-pane" aria-labelledby="tabset-5-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1">iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> model.predict_proba(X)[:,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb10-2">iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'weights'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.where(iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'D'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps'</span>], <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps'</span>]))</span>
<span id="cb10-3">iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'weights'</span>].describe()</span></code></pre></div></div>
</div>
</div>
</div>
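<p>The weights alone are not yet an effect estimate. One common way to turn them into an ATE estimate while guarding against extreme scores is a normalized (Hájek-style) weighted difference in means with clipped propensity scores. The sketch below uses synthetic data; the function name <code>ipw_ate</code>, the clipping bounds of 0.01 and 0.99, and the simulated variables are illustrative assumptions rather than part of the analysis above.</p>

```python
import numpy as np

def ipw_ate(y, d, ps, clip=(0.01, 0.99)):
    """Normalized IPW estimate of the ATE with clipped propensity scores."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    # Clip scores away from 0 and 1 so no single unit gets an enormous weight
    e = np.clip(np.asarray(ps, dtype=float), clip[0], clip[1])
    w1 = d / e              # weights for treated units (zero for controls)
    w0 = (1 - d) / (1 - e)  # weights for control units (zero for treated)
    # Dividing by the sum of the weights, not by n, stabilizes the estimate
    return float(np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0))

# Synthetic data: the true effect is 2 and x confounds treatment and outcome
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
ps = 1 / (1 + np.exp(-x))
d = rng.binomial(1, ps)
y = 2 * d + x + rng.normal(size=n)

print(f"IPW ATE: {ipw_ate(y, d, ps):.3f}")
```

<p>Clipping trades a small amount of bias for lower variance. The bounds are a tuning choice, and it is worth reporting how sensitive the estimate is to them.</p>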
</section>
<section id="augmented-ipw-aipw-doubly-robust-estimators" class="level3">
<h3 class="anchored" data-anchor-id="augmented-ipw-aipw-doubly-robust-estimators">Augmented IPW (AIPW) / Doubly Robust Estimators</h3>
<p><em>Target Estimand</em>: ATT/ATE.</p>
<p>Many modern estimators can be viewed as combining propensity score weighting with outcome modeling. Their key appeal is the <em>doubly robust</em> property: the estimator remains consistent if either the propensity score model or the outcome model is correctly specified, but not necessarily both.</p>
<p>The AIPW estimator for the ATE looks like: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Ctau%7D_%7B%5Ctext%7BAIPW%7D%7D%20=%20%5Cfrac%7B1%7D%7Bn%7D%20%5Csum_%7Bi=1%7D%5En%20%5Cleft%5B%20%5Cfrac%7BD_i%20(Y_i%20-%20%5Chat%7Bm%7D_1(X_i))%7D%7Be(X_i)%7D%20-%20%5Cfrac%7B(1%20-%20D_i)%20(Y_i%20-%20%5Chat%7Bm%7D_0(X_i))%7D%7B1%20-%20e(X_i)%7D%20+%20%5Chat%7Bm%7D_1(X_i)%20-%20%5Chat%7Bm%7D_0(X_i)%20%5Cright%5D,%0A"> where <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bm%7D_d(X)"> is the predicted outcome for treatment group <img src="https://latex.codecogs.com/png.latex?d">.</p>
<p>This approach is particularly valuable when you are uncertain whether your propensity score model or your outcome model is correctly specified, since only one of the two needs to be right. AIPW also makes efficient use of the available data. The cost is added complexity: both the treatment and outcome models must be estimated, and because the residual terms are still divided by the propensity score, extreme scores can inflate the variance here too.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-6-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-1" aria-controls="tabset-6-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-6-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-2" aria-controls="tabset-6-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-6-1" class="tab-pane active" aria-labelledby="tabset-6-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Manual AIPW implementation</span></span>
<span id="cb11-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 1: Estimate propensity scores</span></span>
<span id="cb11-3">ps_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glm</span>(D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Width, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> binomial)</span>
<span id="cb11-4">iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ps <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(ps_fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"response"</span>)</span>
<span id="cb11-5"></span>
<span id="cb11-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 2: Estimate outcome models for each treatment group</span></span>
<span id="cb11-7">outcome_treated <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(Sepal.Length <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Width, </span>
<span id="cb11-8">                      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris[iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, ])</span>
<span id="cb11-9">outcome_control <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(Sepal.Length <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Width, </span>
<span id="cb11-10">                      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris[iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, ])</span>
<span id="cb11-11"></span>
<span id="cb11-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 3: Predict potential outcomes for all units</span></span>
<span id="cb11-13">iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>mu1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(outcome_treated, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newdata =</span> iris)</span>
<span id="cb11-14">iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>mu0 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(outcome_control, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newdata =</span> iris)</span>
<span id="cb11-15"></span>
<span id="cb11-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 4: Calculate AIPW estimator</span></span>
<span id="cb11-17">aipw_component <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span>(iris, </span>
<span id="cb11-18">  (D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (Sepal.Length <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> mu1) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> ps) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> </span>
<span id="cb11-19">  ((<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> D) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (Sepal.Length <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> mu0) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> ps)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb11-20">  (mu1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> mu0)</span>
<span id="cb11-21">)</span>
<span id="cb11-22">ate_aipw <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(aipw_component)</span>
<span id="cb11-23"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AIPW ATE:"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(ate_aipw, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)))</span></code></pre></div></div>
</div>
<div id="tabset-6-2" class="tab-pane" aria-labelledby="tabset-6-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Using EconML for Doubly Robust estimation</span></span>
<span id="cb12-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> econml.dr <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> DRLearner</span>
<span id="cb12-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> LinearRegression, LogisticRegression</span>
<span id="cb12-4"></span>
<span id="cb12-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Prepare data</span></span>
<span id="cb12-6">Y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal length (cm)'</span>].values</span>
<span id="cb12-7">T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'D'</span>].values</span>
<span id="cb12-8">X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal width (cm)'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'petal width (cm)'</span>]].values</span>
<span id="cb12-9">W <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Covariates for confounding</span></span>
<span id="cb12-10"></span>
<span id="cb12-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit doubly robust learner</span></span>
<span id="cb12-12">dr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> DRLearner(</span>
<span id="cb12-13">    model_propensity<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>LogisticRegression(max_iter<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>),</span>
<span id="cb12-14">    model_regression<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>LinearRegression(),</span>
<span id="cb12-15">    model_final<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>LinearRegression(),</span>
<span id="cb12-16">    cv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span></span>
<span id="cb12-17">)</span>
<span id="cb12-18">dr.fit(Y, T, X<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, W<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>W)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># X=None for constant treatment effect</span></span>
<span id="cb12-19"></span>
<span id="cb12-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Estimate ATE (ate() may return array; take scalar for display)</span></span>
<span id="cb12-21">ate_result <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dr.ate(X<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>)</span>
<span id="cb12-22">ate_est <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>(np.asarray(ate_result).flatten()[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb12-23"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Doubly Robust ATE: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ate_est<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.3f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
</div>
</div>
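<p>To make the doubly robust property concrete, here is a minimal standard-library Python sketch. It is not the iris analysis above: the data are simulated, the true effect (<code>TRUE_ATE = 2</code>), the logistic assignment rule, and the "good" and "bad" nuisance functions are all assumptions made for illustration, and the nuisance functions are plugged in directly rather than fitted. The point is only that the AIPW formula stays close to the truth when either the propensity function or the outcome functions are right, and drifts when both are wrong.</p>

```python
import math
import random

random.seed(42)
TRUE_ATE = 2.0  # assumed treatment effect in this simulation


def true_ps(x):
    """True treatment probability in the simulation (logistic in x)."""
    return 1.0 / (1.0 + math.exp(-x))


# Simulate (x, d, y) with y = 1 + x + TRUE_ATE * d + noise; x confounds d and y
data = []
for _ in range(20000):
    x = random.gauss(0.0, 1.0)
    d = 1 if random.random() < true_ps(x) else 0
    y = 1.0 + x + TRUE_ATE * d + random.gauss(0.0, 1.0)
    data.append((x, d, y))


def aipw(rows, ps, mu1, mu0):
    """AIPW estimate of the ATE given propensity and outcome functions of x."""
    total = 0.0
    for x, d, y in rows:
        e, m1, m0 = ps(x), mu1(x), mu0(x)
        total += d * (y - m1) / e - (1 - d) * (y - m0) / (1 - e) + (m1 - m0)
    return total / len(rows)


# Correct nuisance functions (known here only because the data are simulated)
def good_mu1(x):
    return 1.0 + x + TRUE_ATE


def good_mu0(x):
    return 1.0 + x


# Deliberately wrong nuisance functions
def bad_mu(x):
    return 0.0  # ignores x entirely


def bad_ps(x):
    return 0.5  # ignores confounding


est_ps_only = aipw(data, true_ps, bad_mu, bad_mu)      # only the propensity is right
est_mu_only = aipw(data, bad_ps, good_mu1, good_mu0)   # only the outcome model is right
est_neither = aipw(data, bad_ps, bad_mu, bad_mu)       # both are wrong

print(f"propensity correct only: {est_ps_only:.3f}")
print(f"outcome correct only:    {est_mu_only:.3f}")
print(f"neither correct:         {est_neither:.3f}")
```

<p>In practice both nuisance functions are estimated from data, as in the R and EconML examples above, so the comparison is never this clean; the sketch isolates the formula itself.</p>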
</section>
<section id="covariate-balancing-propensity-score-cbps" class="level3">
<h3 class="anchored" data-anchor-id="covariate-balancing-propensity-score-cbps">Covariate Balancing Propensity Score (CBPS)</h3>
<p><em>Target Estimand</em>: Typically ATE.</p>
<p>CBPS, introduced by Imai and Ratkovic (2014), estimates the propensity score while simultaneously optimizing covariate balance. Instead of fitting a logistic regression and then checking balance, CBPS ensures balance is achieved <em>as part of the estimation process</em>. The example below uses the R <code>CBPS</code> package. Note that its <code>CBPS()</code> function targets the ATT by default (<code>ATT = 1</code>); set <code>ATT = 0</code> to target the ATE. Python users can look to balance-focused weighting in other libraries.</p>
<p>This method shines when standard propensity score estimation leads to poor covariate balance. Rather than the typical iterate-and-check workflow, CBPS achieves good balance without requiring manual tuning, working directly toward the ultimate goal of creating comparable groups. The main drawbacks are that it’s more complex to implement than standard logistic regression and less widely available in standard statistical packages, though dedicated <code>R</code> packages do exist.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-7-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-1" aria-controls="tabset-7-1" aria-selected="true" href="">R</a></li></ul>
<div class="tab-content">
<div id="tabset-7-1" class="tab-pane active" aria-labelledby="tabset-7-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(CBPS)</span>
<span id="cb13-2">cbps_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">CBPS</span>(D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Width, <span class="at" style="color: #657422;
background-color: null;
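font-style: inherit;">data =</span> iris, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ATT =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>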
<span id="cb13-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(cbps_fit)</span></code></pre></div></div>
</div>
</div>
</div>
</section>
<section id="overlap-weights" class="level3">
<h3 class="anchored" data-anchor-id="overlap-weights">Overlap Weights</h3>
<p><em>Target Estimand</em>: Overlap-weighted ATE.</p>
<p>Overlap weighting focuses on the region of common support—where treated and control units both exist—by assigning weights: <img src="https://latex.codecogs.com/png.latex?%0Aw_i%20=%20D_i%20(1%20-%20e(X_i))%20+%20(1%20-%20D_i)%20e(X_i).%0A"> This downweights units with extreme scores near 0 or 1 and emphasizes comparability.</p>
<p>This weighting scheme is ideal when you want to avoid extrapolation and focus inference on the region where treated and control units truly overlap. The approach naturally sidesteps the instability that plagues standard IPW when propensity scores approach the boundaries, and it targets what’s sometimes called the “overlap population.” The key consideration is that the resulting estimate represents the treatment effect for this overlap population, which may differ from the overall ATE or the ATT, depending on how representative the overlap region is of the full sample.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-8-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-1" aria-controls="tabset-8-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-8-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-2" aria-controls="tabset-8-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-8-1" class="tab-pane active" aria-labelledby="tabset-8-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate overlap weights</span></span>
<span id="cb14-2">iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>overlap_weights <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>D <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ps, iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ps)</span>
<span id="cb14-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>overlap_weights)</span></code></pre></div></div>
</div>
<div id="tabset-8-2" class="tab-pane" aria-labelledby="tabset-8-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate propensity scores if not already done</span></span>
<span id="cb15-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps'</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> iris.columns:</span>
<span id="cb15-3">    iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> model.predict_proba(X)[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb15-4"></span>
<span id="cb15-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate overlap weights</span></span>
<span id="cb15-6">iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'overlap_weights'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.where(iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'D'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps'</span>], iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ps'</span>])</span>
<span id="cb15-7">iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'overlap_weights'</span>].describe()</span></code></pre></div></div>
</div>
</div>
</div>
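<p>The snippets above stop at the weights. One common way to turn them into an effect estimate is a weighted difference in means, with the weights normalized within each group. The sketch below assumes that estimator; the function name <code>overlap_weighted_effect</code> and the four-row toy inputs are hypothetical and only there to make the arithmetic easy to follow.</p>

```python
def overlap_weighted_effect(d, y, ps):
    """Weighted difference in means using overlap weights.

    Treated units get weight 1 - e(x), controls get weight e(x);
    weights are normalized separately within each group.
    """
    w = [1 - e if di == 1 else e for di, e in zip(d, ps)]
    treated_sum = sum(wi * yi for wi, yi, di in zip(w, y, d) if di == 1)
    treated_w = sum(wi for wi, di in zip(w, d) if di == 1)
    control_sum = sum(wi * yi for wi, yi, di in zip(w, y, d) if di == 0)
    control_w = sum(wi for wi, di in zip(w, d) if di == 0)
    return treated_sum / treated_w - control_sum / control_w


# Toy inputs (illustrative only, not the iris data)
d = [1, 1, 0, 0]
y = [3.0, 5.0, 1.0, 2.0]
ps = [0.9, 0.5, 0.5, 0.1]

effect = overlap_weighted_effect(d, y, ps)
print(f"Overlap-weighted effect: {effect:.3f}")
```

<p>Notice how the treated unit with a propensity score of 0.9 and the control with 0.1 contribute little: both sit in regions where the other group is scarce, so the estimate is driven by the two units with scores of 0.5.</p>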
</section>
<section id="entropy-balancing" class="level3">
<h3 class="anchored" data-anchor-id="entropy-balancing">Entropy Balancing</h3>
<p><em>Target Estimand</em>: ATT/ATE.</p>
<p>Entropy balancing (Hainmueller, 2012) directly reweights the control group so that chosen moments of the covariates (mean, variance, etc.) match exactly between treated and control groups. Instead of matching or stratifying, it solves a constrained optimization problem that keeps the weights as close as possible to a set of base weights (uniform by default), measured by Kullback-Leibler divergence, subject to the balance constraints. The example below uses the R <code>ebal</code> package, which reweights controls to match the treated group and therefore targets the ATT.</p>
<p>This method is particularly useful when balance proves difficult to achieve with traditional weighting schemes. Entropy balancing guarantees exact balance on the chosen covariate moments and fully utilizes all available data without discarding observations. The analyst must specify which moments (typically means, and sometimes variances and skewness) should be balanced, and the results can be sensitive to these choices. Nevertheless, the method offers strong guarantees and has gained popularity for applications where precise balance is paramount.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-9-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-9-1" aria-controls="tabset-9-1" aria-selected="true" href="">R</a></li></ul>
<div class="tab-content">
<div id="tabset-9-1" class="tab-pane active" aria-labelledby="tabset-9-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ebal)</span>
<span id="cb16-2"></span>
<span id="cb16-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit entropy balancing (balance covariates between treated and control)</span></span>
<span id="cb16-4">eb_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ebalance</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Treatment =</span> iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>D, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">X =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(iris[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sepal.Width"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal.Width"</span>)]))</span>
<span id="cb16-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(eb_fit)</span></code></pre></div></div>
</div>
</div>
</div>
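<p>For readers working in Python, the optimization behind entropy balancing can be sketched in a few lines. The version below is a deliberately reduced illustration, not a port of <code>ebal</code>: it balances only the mean of a single covariate, starts from uniform base weights, and solves the one-dimensional dual problem with Newton's method. The function name <code>entropy_balance_mean</code> and the simulated data are assumptions, and the target mean must lie strictly inside the range of the control values.</p>

```python
import math
import random


def entropy_balance_mean(x_control, target_mean, max_iter=100, tol=1e-10):
    """Weights on controls (summing to 1) whose weighted mean equals target_mean.

    Minimizing KL divergence from uniform weights under a mean constraint
    gives weights proportional to exp(lam * x); lam is found by Newton steps.
    """
    centered = [x - target_mean for x in x_control]
    lam = 0.0
    w = [1.0 / len(centered)] * len(centered)
    for _ in range(max_iter):
        exponents = [lam * c for c in centered]
        shift = max(exponents)  # subtract the max for numerical stability
        raw = [math.exp(e - shift) for e in exponents]
        total = sum(raw)
        w = [r / total for r in raw]
        grad = sum(wi * c for wi, c in zip(w, centered))  # weighted mean minus target
        if abs(grad) < tol:
            break
        hess = sum(wi * c * c for wi, c in zip(w, centered)) - grad ** 2  # weighted variance
        lam -= grad / hess
    return w


# Simulated covariate: controls centered at 0, treated centered at 0.5
random.seed(7)
controls = [random.gauss(0.0, 1.0) for _ in range(300)]
treated = [random.gauss(0.5, 1.0) for _ in range(100)]

target = sum(treated) / len(treated)
weights = entropy_balance_mean(controls, target)
weighted_mean = sum(w * x for w, x in zip(weights, controls))

print(f"treated mean:          {target:.6f}")
print(f"reweighted control mean: {weighted_mean:.6f}")
```

<p>Balancing several covariates or higher moments follows the same logic with a vector of multipliers, which is what dedicated packages handle for you.</p>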
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>Propensity score methods provide diverse approaches to estimate causal effects from observational data, each with unique strengths and trade-offs.</li>
<li>Some methods prioritize simplicity (e.g., stratification) or data retention (e.g., IPW), while others focus on robustness or balance (AIPW, CBPS).</li>
<li>Doubly robust methods like AIPW remain consistent when one of the two models is misspecified, while others (e.g., entropy balancing) guarantee exact balance on the specified covariate moments through optimization.</li>
<li>No single method is universally best. The choice hinges on practical considerations: sample size, covariate overlap between groups, and whether exact balance or data efficiency is prioritized.</li>
<li>All else equal, doubly robust estimators offer extra protection against biased results.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>For the original introduction to propensity scores, see Rosenbaum and Rubin’s (1983) landmark paper. Imai and Ratkovic’s (2014) work on CBPS is a must-read for understanding balance-focused estimation. The textbook <em>Causal Inference for Statistics, Social, and Biomedical Sciences</em> by Imbens and Rubin (2015) provides excellent coverage of these methods. There are also great tutorials and vignettes in R packages like <code>MatchIt</code>, <code>twang</code>, and <code>WeightIt</code>.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li><p>Rosenbaum, P. R., &amp; Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. <em>Biometrika</em>, 70(1), 41–55.</p></li>
<li><p>Imai, K., &amp; Ratkovic, M. (2014). Covariate balancing propensity score. <em>Journal of the Royal Statistical Society: Series B (Statistical Methodology)</em>, 76(1), 243–263.</p></li>
<li><p>Imbens, G. W., &amp; Rubin, D. B. (2015). <em>Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction</em>. Cambridge University Press.</p></li>
<li><p>Hainmueller, J. (2012). Entropy balancing for causal effects. <em>Political Analysis</em>, 20(1), 25–46.</p></li>
</ul>


</section>

 ]]></description>
  <category>causal inference</category>
  <category>flavors</category>
  <guid>https://vyasenov.github.io/blog/flavors-prop-score-methods.html</guid>
  <pubDate>Thu, 22 Jan 2026 08:00:00 GMT</pubDate>
</item>
<item>
  <title>The Wilcoxon-Mann-Whitney Test is Not a Test of Medians</title>
  <link>https://vyasenov.github.io/blog/wmw-test-fails-medians.html</link>
  <description><![CDATA[ 





<div class="reading-time">5 min read</div>
<!-- this is for social media sharing buttons -->
<div class="sharethis-inline-share-buttons pt-5">

</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>Nonparametric tests like the Wilcoxon-Mann-Whitney (WMW) are among the most popular alternatives to the <img src="https://latex.codecogs.com/png.latex?t">- and <img src="https://latex.codecogs.com/png.latex?z">-tests in settings where normality assumptions break down. Often described as a “test of medians,” WMW is used when comparing two independent groups without making strong assumptions about the underlying distributions. It is also known as the Mann-Whitney-Wilcoxon (MWW) test or the Wilcoxon rank-sum test.</p>
<p>Despite this common interpretation, the WMW test is <em>not</em> a test of medians—at least not in general. <a href="https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1305291">Divine et al.&nbsp;(2018)</a> dive deep into this misconception and show convincingly how the WMW test can lead you astray if you’re specifically interested in comparing medians.</p>
<p>This article explains why that happens, provides some intuition and math, and shows you how to think more clearly about what the WMW test actually does.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Let <img src="https://latex.codecogs.com/png.latex?X_1,%20%5Cldots,%20X_m%20%5Csim%20F"> and <img src="https://latex.codecogs.com/png.latex?Y_1,%20%5Cldots,%20Y_n%20%5Csim%20G"> be two independent random samples from distributions <img src="https://latex.codecogs.com/png.latex?F"> and <img src="https://latex.codecogs.com/png.latex?G">, respectively. The Wilcoxon-Mann-Whitney statistic is based on the probability:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(X%20%3C%20Y)%20+%20%5Cfrac%7B1%7D%7B2%7DP(X%20=%20Y)"></p>
<p>This quantity is sometimes referred to as the <em>probability of superiority</em>.</p>
<p>Let <img src="https://latex.codecogs.com/png.latex?%5Ctheta_F"> and <img src="https://latex.codecogs.com/png.latex?%5Ctheta_G"> denote the medians of <img src="https://latex.codecogs.com/png.latex?F"> and <img src="https://latex.codecogs.com/png.latex?G">. We often want to test:</p>
<p><img src="https://latex.codecogs.com/png.latex?H_0:%20%5Ctheta_F%20=%20%5Ctheta_G"></p>
<p>But WMW does not directly test this hypothesis unless very specific conditions are met.</p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="what-does-wmw-actually-test" class="level3">
<h3 class="anchored" data-anchor-id="what-does-wmw-actually-test">What Does WMW Actually Test?</h3>
<p>The WMW test assesses whether one distribution tends to produce larger values than the other. More formally, it tests:</p>
<p><img src="https://latex.codecogs.com/png.latex?H_0:%20P(X%20%3C%20Y)%20+%20%5Cfrac%7B1%7D%7B2%7DP(X%20=%20Y)%20=%200.5"></p>
<p>This is equivalent to testing whether the distributions are stochastically equal, not whether the medians are equal.</p>
<p>The WMW test can be performed via rank sums. After combining both samples, we rank all observations from smallest to largest. The test statistic <img src="https://latex.codecogs.com/png.latex?W"> is the sum of ranks assigned to the first sample:</p>
<p><img src="https://latex.codecogs.com/png.latex?W%20=%20%5Csum_%7Bi=1%7D%5Em%20R(X_i)"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?R(X_i)"> is the rank of <img src="https://latex.codecogs.com/png.latex?X_i"> in the combined sample.</p>
<p>This rank-based formulation is equivalent, up to a fixed linear transformation, to counting how many pairs <img src="https://latex.codecogs.com/png.latex?(X_i,%20Y_j)"> have <img src="https://latex.codecogs.com/png.latex?X_i%20%3C%20Y_j">, which ties it to the probability interpretation above. Under the null hypothesis that the two distributions are identical, the expected rank sum is exactly <img src="https://latex.codecogs.com/png.latex?m(m+n+1)/2">.</p>
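<p>The link between the rank sum and pair counting is easy to check numerically. The sketch below uses a tiny made-up sample with ties; the helper names <code>rank_sum</code> and <code>pairs_x_greater</code> are hypothetical. It verifies that the rank sum of the first sample equals <code>m(m+1)/2</code> plus the number of pairs in which the first-sample value is larger (ties counted as one half), and then converts the count into an estimate of the probability defined in the Notation section.</p>

```python
from bisect import bisect_left, bisect_right


def rank_sum(x, y):
    """Sum of the midranks of x within the combined sample."""
    combined = sorted(x + y)
    total = 0.0
    for v in x:
        below = bisect_left(combined, v)           # values strictly smaller than v
        equal = bisect_right(combined, v) - below  # values tied with v, including v itself
        total += below + (equal + 1) / 2           # midrank of v
    return total


def pairs_x_greater(x, y):
    """Count pairs with x_i > y_j, counting ties as one half."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)


# Toy samples with ties (illustrative only)
x = [1, 3, 3, 7]
y = [2, 3, 5]
m, n = len(x), len(y)

w = rank_sum(x, y)
u = pairs_x_greater(x, y)
identity_holds = w == m * (m + 1) / 2 + u

# Share of pairs with x_i < y_j (ties as one half): estimates P(X < Y) + 0.5 * P(X = Y)
prob_superiority = (m * n - u) / (m * n)

print(w, u, identity_holds, prob_superiority)
```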
</section>
<section id="understanding-stochastic-dominance" class="level3">
<h3 class="anchored" data-anchor-id="understanding-stochastic-dominance">Understanding Stochastic Dominance</h3>
<p>When we say the WMW test examines “stochastic dominance,” we mean it tests whether values from one distribution tend to exceed values from the other. Specifically, distribution <img src="https://latex.codecogs.com/png.latex?G"> stochastically dominates distribution <img src="https://latex.codecogs.com/png.latex?F"> if:</p>
<p><img src="https://latex.codecogs.com/png.latex?G(x)%20%5Cleq%20F(x)%20%5Ctext%7B%20for%20all%20%7D%20x"></p>
<p>with strict inequality for at least some values of <img src="https://latex.codecogs.com/png.latex?x">. Intuitively, this means a randomly selected value from <img src="https://latex.codecogs.com/png.latex?G"> is more likely to be larger than a randomly selected value from <img src="https://latex.codecogs.com/png.latex?F">.</p>
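<p>This is quite different from comparing medians. Two distributions can have identical medians but exhibit stochastic dominance, or they can have different medians but neither stochastically dominates the other. Dominance is also a stronger condition than what WMW detects: if <img src="https://latex.codecogs.com/png.latex?G"> dominates <img src="https://latex.codecogs.com/png.latex?F">, the probability of superiority defined above is at least 0.5, but that probability can differ from 0.5 without either distribution dominating the other.</p>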
</section>
<section id="when-does-it-coincide-with-a-median-test" class="level3">
<h3 class="anchored" data-anchor-id="when-does-it-coincide-with-a-median-test">When Does It Coincide with a Median Test?</h3>
<p>The WMW test functions as a test of medians only under additional assumptions, most commonly a pure location shift, where the two distributions have the same shape and spread and differ only in location. If the shapes differ (say, one is skewed left and the other right), then even if the medians are the same, WMW can reject the null. Worse, it might <em>fail</em> to reject when the medians are different but the distributions have similar overall ranks.</p>
</section>
<section id="alternative-tests" class="level3">
<h3 class="anchored" data-anchor-id="alternative-tests">Alternative Tests</h3>
<p>If your research question specifically concerns differences in medians, more appropriate tests include:</p>
<ul>
<li><strong>Mood’s median test</strong>: A true test of median equality that uses contingency tables based on counts above and below the combined median.</li>
<li><strong>Quantile regression</strong>: For more complex designs, quantile regression directly models the median (or other quantiles) and tests differences between groups.</li>
<li><strong>Bootstrap confidence intervals</strong>: Calculating confidence intervals for the difference in medians via bootstrapping provides both a test and measure of uncertainty.</li>
</ul>
<p>These approaches directly address median differences rather than the stochastic ordering tested by WMW.</p>
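<p>As a rough sketch of the first and third options, the snippet below runs Mood’s median test with <code>scipy.stats.median_test</code> and builds a percentile bootstrap interval for the difference in medians. The simulated data, the number of bootstrap replications, and the 95% level are assumptions made for illustration.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np
from scipy.stats import median_test

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200)
y = rng.exponential(scale=2 / 3, size=200)

# Mood's median test: counts above and below the pooled median,
# arranged in a contingency table and tested with a chi-square statistic.
stat, p_value, grand_median, table = median_test(x, y)

# Percentile bootstrap interval for median(x) - median(y).
n_boot = 2000
diffs = np.empty(n_boot)
for b in range(n_boot):
    xb = rng.choice(x, size=x.size, replace=True)
    yb = rng.choice(y, size=y.size, replace=True)
    diffs[b] = np.median(xb) - np.median(yb)
ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])

print("Mood's median test p-value:", p_value)
print("95% bootstrap CI for the median difference:", (ci_low, ci_high))
</code></pre></div>
<p>If the bootstrap interval excludes zero, that is evidence of a median difference at roughly the 5% level. Unlike the WMW p-value, the interval is stated on the scale of the medians themselves.</p>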
</section>
</section>
<section id="an-example" class="level2">
<h2 class="anchored" data-anchor-id="an-example">An Example</h2>
<p>Let’s see this in action with a small simulation.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>
<span id="cb1-2">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rexp</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rate =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)         <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Right-skewed</span></span>
<span id="cb1-3">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rexp</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rate =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>)       <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Also right-skewed, different rate</span></span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">median</span>(x)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Median of x</span></span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">median</span>(y)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Median of y</span></span>
<span id="cb1-7"></span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">wilcox.test</span>(x, y)</span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy.stats <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> mannwhitneyu</span>
<span id="cb2-3"></span>
<span id="cb2-4">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>
<span id="cb2-5">x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.exponential(scale<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb2-6">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.exponential(scale<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># scale = 1 / rate, so this matches rate = 1.5</span></span>
<span id="cb2-7"></span>
<span id="cb2-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Median x:"</span>, np.median(x))</span>
<span id="cb2-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Median y:"</span>, np.median(y))</span>
<span id="cb2-10"></span>
<span id="cb2-11">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mannwhitneyu(x, y, alternative<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'two-sided'</span>)</span>
<span id="cb2-12"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(res)</span></code></pre></div></div>
</div>
</div>
</div>
<p>This example illustrates the point: the sample medians differ (0.6334 vs.&nbsp;0.4865), and the WMW test rejects the null hypothesis (p = 0.004). However, the rejection occurs because exponential distributions with different rates are stochastically ordered, not because the test specifically targets the medians.</p>
<p>In other settings, the WMW test might fail to reject even though the medians differ, or it might reject <em>because</em> of shape differences rather than a difference in medians.</p>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>The Wilcoxon-Mann-Whitney test is not a general test of medians.</li>
<li>It tests for stochastic dominance or shift in distribution, not specifically median difference.</li>
<li>It behaves like a median test only under certain conditions (e.g., identical shape).</li>
<li>Be cautious interpreting WMW results as saying something about medians unless distributional assumptions are met.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>For a deeper dive, read the original Divine et al.&nbsp;(2018) paper. You might also want to look at literature on robust location tests or permutation-based alternatives that better target the median.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Divine, G. W., Norton, H. J., Barón, A. E., &amp; Juarez-Colunga, E. (2018). The Wilcoxon–Mann–Whitney procedure fails as a test of medians. <em>The American Statistician</em>, 72(3), 278–286.</p>
<p>Hollander, M., Wolfe, D. A., &amp; Chicken, E. (2013). <em>Nonparametric Statistical Methods</em> (3rd ed.). Wiley.</p>


</section>

 ]]></description>
  <category>statistical inference</category>
  <category>hypothesis testing</category>
  <guid>https://vyasenov.github.io/blog/wmw-test-fails-medians.html</guid>
  <pubDate>Tue, 20 Jan 2026 08:00:00 GMT</pubDate>
</item>
<item>
  <title>Unconditional Quantile Regression and Treatment Effects</title>
  <link>https://vyasenov.github.io/blog/unconditional-qreg.html</link>
  <description><![CDATA[ 





<div class="reading-time">9 min read</div>
<!-- this is for social media sharing buttons -->
<div class="sharethis-inline-share-buttons pt-5">

</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>Quantile regression has become a widely used tool in econometrics and statistics, thanks to its ability to model the entire distribution of an outcome variable rather than just its mean. Traditional quantile regression (Koenker and Bassett, 1978), however, is <em>conditional</em> as it models quantiles of the outcome given a set of covariates. But in many policy and causal inference applications, we are interested in changes to the <em>unconditional</em> distribution of the outcome variable.</p>
<p>For example, suppose we want to understand the effect of a job training program on wage inequality. A standard quantile regression would tell us how the program shifts quantiles <em>given</em> certain characteristics like education or experience. In other words, the focus is on the quantile of the residual term in the linear model. But we might instead want to estimate how the program shifts quantiles <em>in the population as a whole</em>. This is where <em>Unconditional Quantile Regression (UQR)</em> comes in.</p>
<p>The key breakthrough in this space was provided by Firpo, Fortin, and Lemieux (2009), who introduced a method based on the Recentered Influence Function (RIF). This allows us to estimate the effect of covariates on unconditional quantiles using simple linear regressions. Later, Frölich and Melly (2013) extended this framework to account for endogeneity, providing a way to estimate Unconditional Quantile Treatment Effects (UQTEs) in settings where treatment is not randomly assigned.</p>
<p>In this article, I’ll unpack the key ideas behind UQR, discuss how to estimate unconditional quantile treatment effects, and illustrate these concepts with an example in <code>R</code> and <code>Python</code>.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>As a refresher, the <img src="https://latex.codecogs.com/png.latex?%5Ctau">-th quantile of an outcome variable <img src="https://latex.codecogs.com/png.latex?Y"> is defined as: <img src="https://latex.codecogs.com/png.latex?%0AQ_%5Ctau(Y)%20=%20%5Cinf%20%5C%7B%20q%20:%20P(Y%20%5Cleq%20q)%20%5Cgeq%20%5Ctau%20%5C%7D.%0A"></p>
<p>UQR allows us to estimate how covariates influence these unconditional quantiles.</p>
<p>Now consider adding a set of covariates <img src="https://latex.codecogs.com/png.latex?X">. In traditional quantile regression, we estimate the conditional quantile function:</p>
<p><img src="https://latex.codecogs.com/png.latex?Q_%5Ctau(Y%20%7C%20X)%20=%20%5Cinf%20%5C%7B%20q%20:%20P(Y%20%5Cleq%20q%20%7C%20X)%20%5Cgeq%20%5Ctau%20%5C%7D."></p>
<p>This tells us how the <img src="https://latex.codecogs.com/png.latex?%5Ctau">-th quantile of <img src="https://latex.codecogs.com/png.latex?Y"> changes with <img src="https://latex.codecogs.com/png.latex?X">. Traditional quantile regression models the impact on this conditional quantile.</p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="unconditional-quantile-regression" class="level3">
<h3 class="anchored" data-anchor-id="unconditional-quantile-regression">Unconditional Quantile Regression</h3>
<section id="definition" class="level4">
<h4 class="anchored" data-anchor-id="definition">Definition</h4>
<p>Firpo et al.&nbsp;(2009) introduced an elegant way to estimate UQR using influence functions. The influence function of a statistic measures how much that statistic changes when an observation is perturbed. The recentered influence function (RIF) for a quantile <img src="https://latex.codecogs.com/png.latex?Q_%5Ctau"> is given by:</p>
<p><img src="https://latex.codecogs.com/png.latex?RIF(Y;%20Q_%5Ctau)%20=%20Q_%5Ctau%20+%20%5Cfrac%7B%5Ctau%20-%201%5C%7BY%20%5Cleq%20Q_%5Ctau%5C%7D%7D%7Bf_Y(Q_%5Ctau)%7D."></p>
<p>Here, <img src="https://latex.codecogs.com/png.latex?f_Y(Q_%5Ctau)"> is the density of <img src="https://latex.codecogs.com/png.latex?Y"> at <img src="https://latex.codecogs.com/png.latex?Q_%5Ctau">, which can be estimated nonparametrically. This nonparametric density estimation is often done via kernel density estimation but may be imprecise in the tails.</p>
<p>Firpo et al.&nbsp;showed that regressing <img src="https://latex.codecogs.com/png.latex?RIF(Y;%20Q_%5Ctau)"> on covariates <img src="https://latex.codecogs.com/png.latex?X"> via OLS provides a valid estimate of how <img src="https://latex.codecogs.com/png.latex?X"> affects the <img src="https://latex.codecogs.com/png.latex?%5Ctau">-th quantile of <img src="https://latex.codecogs.com/png.latex?Y">. This method is remarkably simple but powerful—it transforms a quantile regression problem into a standard linear regression problem.</p>
<p>This idea also generalizes to other distributional statistics (Gini, variance) by using the corresponding influence functions.</p>
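<p>Two properties of the quantile RIF are worth seeing numerically: it takes only two distinct values (one for observations at or below the quantile, one for those above), and its sample mean is approximately the quantile itself. The sketch below checks both on simulated normal data. The function name <code>rif_quantile</code> and the use of <code>gaussian_kde</code> with its default bandwidth are assumptions for illustration.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np
from scipy.stats import gaussian_kde

def rif_quantile(y, tau):
    """RIF(y; Q_tau) = Q_tau + (tau - 1{y at or below Q_tau}) / f_Y(Q_tau)."""
    y = np.asarray(y, dtype=float)
    q = np.quantile(y, tau)
    f_q = gaussian_kde(y)(q)[0]            # kernel density estimate at the quantile
    below = (q >= y).astype(float)         # indicator 1{y at or below q}
    return q + (tau - below) / f_q

rng = np.random.default_rng(0)
y = rng.normal(size=1000)

for tau in (0.1, 0.5, 0.9):
    rif = rif_quantile(y, tau)
    # Columns: tau, sample quantile, mean of the RIF, number of distinct RIF values
    print(tau, np.quantile(y, tau), rif.mean(), np.unique(rif).size)
</code></pre></div>
<p>Because the mean of the RIF recovers the quantile, averaging a regression of the RIF on covariates over the covariate distribution links the coefficients back to the unconditional quantile.</p>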
</section>
<section id="estimation" class="level4">
<h4 class="anchored" data-anchor-id="estimation">Estimation</h4>
<p>The estimation proceeds in three steps:</p>
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm:">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm:
</div>
</div>
<div class="callout-body-container callout-body">
<ol type="1">
<li>Estimate the sample quantile <img src="https://latex.codecogs.com/png.latex?q_%7B%5Ctau%7D">.</li>
<li>Estimate the density <img src="https://latex.codecogs.com/png.latex?f_Y(q_%7B%5Ctau%7D)">, typically via kernel density estimation.</li>
<li>Construct the RIF for each observation and regress it on the covariates.</li>
</ol>
</div>
</div>
<p>The basic regression is: <img src="https://latex.codecogs.com/png.latex?%0ARIF(Y;%20q_%7B%5Ctau%7D)%20=%20X'%20%5Cbeta%20+%20%5Cvarepsilon,%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> now captures the effect of <img src="https://latex.codecogs.com/png.latex?X"> on the <img src="https://latex.codecogs.com/png.latex?%5Ctau">-th unconditional quantile.</p>
<p>The most common implementation is RIF-OLS, though alternatives include RIF-Logit and nonparametric first stages (RIF-NP).</p>
</section>
<section id="inference" class="level4">
<h4 class="anchored" data-anchor-id="inference">Inference</h4>
<p>Density estimation is a critical step: a poor density estimate at the quantile leads to noisy RIF values and therefore to noisy coefficient estimates. Because estimation proceeds in several steps (quantile, density, RIF regression), computing standard errors is more involved than in ordinary OLS. Bootstrapping is commonly used and has been shown to perform well in practice.</p>
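<p>A pairs bootstrap for a single RIF-OLS slope can be sketched as follows. The key point is that the quantile and the density are re-estimated inside every replication, so their sampling variability is reflected in the standard error. The data-generating process, the helper name <code>rif_ols_slope</code>, and the number of replications are assumptions made for illustration.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np
from scipy.stats import gaussian_kde

def rif_ols_slope(y, x, tau):
    """Quantile, density, RIF, then the OLS slope of the RIF on x."""
    q = np.quantile(y, tau)
    f_q = gaussian_kde(y)(q)[0]
    rif = q + (tau - (q >= y).astype(float)) / f_q
    slope, intercept = np.polyfit(x, rif, deg=1)
    return slope

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)   # simulated data, illustration only

tau = 0.5
estimate = rif_ols_slope(y, x, tau)

# Pairs bootstrap: resample (y, x) rows together and redo every step.
n_boot = 500
draws = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    draws[b] = rif_ols_slope(y[idx], x[idx], tau)

print("slope:", estimate, "bootstrap SE:", draws.std(ddof=1))
</code></pre></div>
<p>Reusing the full-sample quantile and density inside the loop would be faster but would ignore the first two estimation steps and tend to understate the uncertainty.</p>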
</section>
<section id="challenges" class="level4">
<h4 class="anchored" data-anchor-id="challenges">Challenges</h4>
<p>RIF-OLS is a linear model and as such it assumes a linear relationship between the RIF and covariates. If the true relationship is nonlinear, flexible methods (logit, nonparametric) are preferred. UQR is especially appealing for estimating treatment effects on the distribution of outcomes in quasi-experimental settings. When treatment is exogenous (conditional on the covariates), including treatment indicators in the RIF regression yields estimates of the treatment effect at various unconditional quantiles. This is a perfect segue for the next section.</p>
</section>
</section>
<section id="unconditional-quantile-treatment-effects" class="level3">
<h3 class="anchored" data-anchor-id="unconditional-quantile-treatment-effects">Unconditional Quantile Treatment Effects</h3>
<p>One limitation of UQR as formulated by Firpo et al.&nbsp;is that it assumes covariates are exogenous. But in many causal inference settings, treatment assignment is endogenous (e.g., workers self-select into training programs). Frölich and Melly (2013) extended the UQR framework to handle endogeneity using instrumental variables (IV). The authors built on earlier work by Chernozhukov and Hansen (2005) which pioneered the estimation of (conditional) quantile treatment effects in the presence of endogeneity.</p>
<p>Frölich and Melly showed that under standard IV assumptions—relevance and exclusion—the unconditional quantile treatment effect (UQTE) can be estimated using a two-step approach:</p>
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm:">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm:
</div>
</div>
<div class="callout-body-container callout-body">
<ol type="1">
<li>Estimate a propensity score model (or an instrumented version of <img src="https://latex.codecogs.com/png.latex?D">) to account for selection bias.</li>
<li>Use IV-based weighting to recover the counterfactual unconditional outcome distributions for compliers, and apply RIF methods to estimate UQTEs.</li>
</ol>
</div>
</div>
<p>This approach provides a way to estimate distributional treatment effects while addressing selection bias—a crucial tool in policy evaluation and applied econometrics.</p>
</section>
<section id="rank-invariance-in-qtes" class="level3">
<h3 class="anchored" data-anchor-id="rank-invariance-in-qtes">Rank Invariance in QTEs</h3>
<p>A crucial assumption often invoked in the estimation of quantile treatment effects (QTEs) is rank invariance. This assumption states that units maintain their rank in the outcome distribution after receiving the treatment. In other words, if a treated unit was at the 30th percentile of the untreated outcome distribution, it would remain at the 30th percentile of the treated distribution.</p>
<p>While this assumption simplifies identification and interpretation of QTEs, it can be highly restrictive. It rules out the possibility that treatment reshuffles individuals across the distribution—a scenario that might be not only plausible but central in many applications.</p>
<p>Consider a school voucher program that offers private school access to low-income students. The effect of such a program may be heterogeneous: for high-performing students, access might enhance performance due to better environments. But for low-performing students, the same access could lead to worse outcomes due to higher academic pressure or poor fit. As a result, the program could re-rank students in the outcome distribution, violating rank invariance.</p>
<p>In such settings, assuming rank invariance could lead to misleading conclusions about who benefits and who loses from treatment. Alternative approaches, like those based on quantile treatment effect bounds (e.g., Melly, 2005; Chernozhukov &amp; Hansen, 2005), are more robust to such violations.</p>
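<p>A toy example makes the issue concrete. Suppose, hypothetically, that we could observe both potential outcomes for five units and that treatment exactly reverses their ranks. The marginal distributions with and without treatment are then identical, so every QTE is zero, even though four of the five units experience a nonzero effect. The numbers below are invented purely for illustration.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np

# Hypothetical potential outcomes for five units (illustrative numbers only).
y0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # outcome without treatment
y1 = y0[::-1].copy()                        # treatment reverses every rank

taus = [0.25, 0.50, 0.75]
qte = np.quantile(y1, taus) - np.quantile(y0, taus)
individual_effects = y1 - y0

print("QTE at", taus, "=", qte)
print("Individual effects =", individual_effects)
</code></pre></div>
<p>Without rank invariance, a QTE describes how the outcome distribution shifts, not how any particular unit is affected.</p>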
</section>
</section>
<section id="examples" class="level2">
<h2 class="anchored" data-anchor-id="examples">Examples</h2>
<section id="bitler-et-al.-2006" class="level3">
<h3 class="anchored" data-anchor-id="bitler-et-al.-2006">Bitler et al.&nbsp;(2006)</h3>
<p>When evaluating the effects of welfare reform, traditional analyses often focus on mean impacts, which can obscure critical insights into the distributional effects of policy changes. Quantile Treatment Effects (QTE) provide a powerful tool for understanding how reforms impact different segments of the population, revealing heterogeneity that mean impacts fail to capture. For example, the study “<em>What Mean Impacts Miss: Distributional Effects of Welfare Reform Experiments</em>” by Bitler, Gelbach, and Hoynes uses QTE to analyze Connecticut’s Jobs First program, a welfare reform initiative.</p>
<p>The authors find that while mean impacts suggest modest income gains, QTE reveal substantial variation: earnings effects are zero at the bottom, positive in the middle, and negative at the top of the distribution before time limits take effect. After time limits, income effects are mixed, with gains concentrated in higher quantiles and losses at the lower end. This nuanced approach highlights the importance of QTE in uncovering the true breadth of policy impacts, enabling data scientists to better inform decision-making and address equity concerns in policy design.</p>
</section>
<section id="code" class="level3">
<h3 class="anchored" data-anchor-id="code">Code</h3>
<p>Let’s illustrate these ideas with an example in <code>R</code> and <code>Python</code>. We’ll use the <code>iris</code> dataset to estimate the effect of <code>Sepal.Length</code> on different quantiles of <code>Petal.Length</code> using UQR.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rm</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">list=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ls</span>())</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(quantreg)</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load dataset</span></span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(iris)</span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Estimate unconditional quantiles</span></span>
<span id="cb1-8">taus <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.25</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.50</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.75</span>)</span>
<span id="cb1-9">q_vals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quantile</span>(iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Petal.Length, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">probs =</span> taus)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Estimate quantiles</span></span>
<span id="cb1-10">f_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">density</span>(iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Petal.Length)</span>
<span id="cb1-11"></span>
<span id="cb1-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute RIF values</span></span>
<span id="cb1-13">rif_values <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lapply</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(i) {</span>
<span id="cb1-14">  q <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> q_vals[i]</span>
<span id="cb1-15">  f <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> f_hat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">which.min</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(f_hat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> q))]</span>
<span id="cb1-16">  q <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> ((taus[i] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> (iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Petal.Length <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> q)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> f)</span>
<span id="cb1-17">})</span>
<span id="cb1-18"></span>
<span id="cb1-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Run RIF regression</span></span>
<span id="cb1-20">models <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lapply</span>(rif_values, <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(rif) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(rif <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Sepal.Length, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris))</span>
<span id="cb1-21"></span>
<span id="cb1-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Print results</span></span>
<span id="cb1-23"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lapply</span>(models, summary)</span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy.stats <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> gaussian_kde</span>
<span id="cb2-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> LinearRegression</span>
<span id="cb2-5"></span>
<span id="cb2-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load dataset</span></span>
<span id="cb2-7"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.datasets <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> load_iris</span>
<span id="cb2-8">iris_data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_iris(as_frame<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb2-9">iris <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris_data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data'</span>]</span>
<span id="cb2-10">iris.columns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sepal.Length'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sepal.Width'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Petal.Length'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Petal.Width'</span>]</span>
<span id="cb2-11"></span>
<span id="cb2-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Estimate unconditional quantiles</span></span>
<span id="cb2-13">taus <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.25</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.50</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.75</span>]</span>
<span id="cb2-14">q_vals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.quantile(iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Petal.Length'</span>], taus)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Estimate quantiles</span></span>
<span id="cb2-15">f_hat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> gaussian_kde(iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Petal.Length'</span>])</span>
<span id="cb2-16"></span>
<span id="cb2-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute RIF values</span></span>
<span id="cb2-18">rif_values <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb2-19"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i, tau <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(taus):</span>
<span id="cb2-20">    q <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> q_vals[i]</span>
<span id="cb2-21">    f <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> f_hat(q)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Density at the quantile</span></span>
<span id="cb2-22">    rif <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> q <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> ((tau <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> (iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Petal.Length'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> q).astype(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> f)</span>
<span id="cb2-23">    rif_values.append(rif)</span>
<span id="cb2-24"></span>
<span id="cb2-25"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Run RIF regression</span></span>
<span id="cb2-26">models <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb2-27"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> rif <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> rif_values:</span>
<span id="cb2-28">    model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LinearRegression(fit_intercept<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb2-29">    model.fit(iris[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Sepal.Length'</span>]], rif)</span>
<span id="cb2-30">    models.append(model)</span>
<span id="cb2-31"></span>
<span id="cb2-32"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Print results</span></span>
<span id="cb2-33"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i, model <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(models):</span>
<span id="cb2-34">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Model </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>i <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">:"</span>)</span>
<span id="cb2-35">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Coefficient for Sepal.Length: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>model<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>coef_[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb2-36">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Intercept: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>model<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>intercept_<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
</div>
</div>
<p>This simple example shows how RIF regression estimates the effect of a covariate (here <code>Sepal.Length</code>) on the unconditional quartiles of an outcome (here <code>Petal.Length</code>).</p>
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>UQR allows us to estimate the effect of covariates on unconditional quantiles, capturing total effects.</li>
<li>The RIF regression method transforms a quantile regression problem into a simple linear regression.</li>
<li>Frölich and Melly (2013) extend UQR to address endogeneity using instrumental variables.</li>
<li>These tools are invaluable for policy evaluation and causal inference.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>For a deeper dive into these methods, the foundational paper by Firpo, Fortin, and Lemieux (2009) provides a detailed introduction to UQR, while Frölich and Melly (2013) extend the framework to address endogeneity concerns. For a broader perspective on quantile regression, Koenker’s book <em>Quantile Regression</em> (2005) is a must-read.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Alejo, J., Favata, F., Montes-Rojas, G., &amp; Trombetta, M. (2021). Conditional vs unconditional quantile regression models: A guide to practitioners. Economía, 44(88), 76-93.</p>
<p>Bitler, M. P., Gelbach, J. B., &amp; Hoynes, H. W. (2006). What mean impacts miss: Distributional effects of welfare reform experiments. American Economic Review, 96(4), 988-1012.</p>
<p>Borah, B. J., &amp; Basu, A. (2013). Highlighting differences between conditional and unconditional quantile regression approaches through an application to assess medication adherence. Health Economics, 22(9), 1052-1070.</p>
<p>Borgen, N. T. (2016). Fixed effects in unconditional quantile regression. The Stata Journal, 16(2), 403-415.</p>
<p>Chernozhukov, V., &amp; Hansen, C. (2005). An IV model of quantile treatment effects. Econometrica, 73(1), 245-261.</p>
<p>Firpo, S., Fortin, N. M., &amp; Lemieux, T. (2009). Unconditional quantile regressions. Econometrica, 77(3), 953-973.</p>
<p>Frölich, M., &amp; Melly, B. (2013). Unconditional quantile treatment effects under endogeneity. Journal of Business &amp; Economic Statistics, 31(3), 346-357.</p>
<p>Koenker, R. (2017). Quantile regression: 40 years on. Annual Review of Economics, 9(1), 155-176.</p>
<p>Koenker, R., &amp; Hallock, K. F. (2001). Quantile regression. Journal of Economic Perspectives, 15(4), 143-156.</p>
<p>Koenker, R., &amp; Bassett Jr, G. (1978). Regression quantiles. Econometrica, 46(1), 33-50.</p>
<p>Sasaki, Y., Ura, T., &amp; Zhang, Y. (2022). Unconditional quantile regression with high‐dimensional data. Quantitative Economics, 13(3), 955-978.</p>


</section>

 ]]></description>
  <category>causal inference</category>
  <guid>https://vyasenov.github.io/blog/unconditional-qreg.html</guid>
  <pubDate>Sun, 21 Dec 2025 08:00:00 GMT</pubDate>
</item>
<item>
  <title>The Many Flavors of Matching for Causal Inference</title>
  <link>https://vyasenov.github.io/blog/flavors-matching-methods.html</link>
  <description><![CDATA[ 





<div class="reading-time">12 min read</div>
<!-- this is for social media sharing buttons -->
<div class="sharethis-inline-share-buttons pt-5">

</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>If you’ve worked on causal inference with observational data, you’ve likely faced the fundamental challenge: the treated and control groups often look very different. Matching methods aim to fix that. The idea is simple and intuitive—let’s compare treated units to similar control units and mimic the conditions of a randomized experiment as best as we can.</p>
<p>But here’s the twist: there are multiple ways to define “similar.” Should we look for exact matches? Should we match on covariates directly or on some summary score like the propensity score? Should we optimize the matches globally or locally? Over the years, researchers have developed a wide variety of matching methods, each with its own advantages and pitfalls. The landscape can be overwhelming, especially if you’re new to causal inference.</p>
<p>In this article, I’ll walk through the most popular matching strategies for causal inference. I’ll talk about what each method does, when to use it, and where it might lead you astray. The focus is on the intuition and technical description—not on the code. Whether you’re doing matching for the first time or looking to expand your toolkit, you will find something useful here.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Let’s set up the basic framework with minimal fluff. Suppose we have <img src="https://latex.codecogs.com/png.latex?n"> units indexed by <img src="https://latex.codecogs.com/png.latex?i%20=%201,%20%5Cdots,%20n">. Each unit has:</p>
<ul>
<li>A binary treatment indicator <img src="https://latex.codecogs.com/png.latex?D_i%20%5Cin%20%5C%7B0,%201%5C%7D">, where <img src="https://latex.codecogs.com/png.latex?D_i%20=%201"> for treated units and <img src="https://latex.codecogs.com/png.latex?D_i%20=%200"> for controls.</li>
<li>A vector of observed covariates <img src="https://latex.codecogs.com/png.latex?X_i">.</li>
<li>Potential outcomes <img src="https://latex.codecogs.com/png.latex?Y_i(1)"> and <img src="https://latex.codecogs.com/png.latex?Y_i(0)">, where <img src="https://latex.codecogs.com/png.latex?Y_i(1)"> is the outcome if treated, and <img src="https://latex.codecogs.com/png.latex?Y_i(0)"> if untreated. We observe only the realized outcome <img src="https://latex.codecogs.com/png.latex?Y_i%20=%20D_i%20Y_i(1)%20+%20(1%20-%20D_i)%20Y_i(0)">.</li>
</ul>
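<p>We impose the usual assumptions of unconfoundedness (treatment assignment is independent of potential outcomes given covariates) and overlap (at every covariate value, a unit has a positive probability of being treated and of being untreated, so treated and control units share a common region of covariate space).</p>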
<p>Our goal is to estimate treatment effects like the Average Treatment Effect (ATE) or the ATE on the Treated (ATT): <img src="https://latex.codecogs.com/png.latex?%0A%5Ctext%7BATT%7D%20=%20%5Cmathbb%7BE%7D%5BY(1)%20-%20Y(0)%20%5Cmid%20D%20=%201%5D.%0A"></p>
<p>The core idea behind matching is to find comparable untreated units for each treated unit so we can approximate <img src="https://latex.codecogs.com/png.latex?Y(0)"> for the treated group. We then discard the unmatched units and look at the difference in outcomes between treated and matched controls to estimate the treatment effect.</p>
<p>Let’s abuse notation a bit and define the sample-analogue of the ATT as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cwidehat%7B%5Ctext%7BATT%7D%7D=%5Cfrac%7B1%7D%7BN_%7B%5Ctext%7Btreated%7D%7D%7D%5Csum_%7Bi:D_i=1%7D%5Cleft%5BY_i(1)%20-%20%5Chat%7BY%7D_i(0)%5E%7B%5Ctext%7Bimputed%7D%7D%5Cright%5D."></p>
<p>These methods can be, and often are, combined with regression adjustments to reduce bias and improve efficiency and robustness, but I will leave that aside here.</p>
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<p>We are now ready to go through seven of the most popular matching approaches.</p>
<section id="exact-matching" class="level3">
<h3 class="anchored" data-anchor-id="exact-matching">Exact Matching</h3>
<p>Exact matching is the simplest, and most restrictive, matching approach:</p>
<blockquote class="blockquote">
<p>Match treated and control units <em>exactly</em> on all observed covariates <img src="https://latex.codecogs.com/png.latex?X">.</p>
</blockquote>
<p>That is, if a treated unit has <img src="https://latex.codecogs.com/png.latex?X%20=%20x">, we look for control units with the exact same <img src="https://latex.codecogs.com/png.latex?X%20=%20x">. While this method is conceptually elegant and easy to understand, it’s rarely practical.</p>
<p>Exact matches become increasingly unlikely in high-dimensional settings or when covariates are continuous, where no two units are likely to be identical. In those cases, exact matching often fails to find matches for many treated units, leading to loss of sample size or biased estimates. Despite its limitations, exact matching is an important baseline: it helps clarify the assumptions behind more flexible methods.</p>
<p>Exact matching works best when the covariates are discrete and few in number, and when there is decent overlap between the treated and control groups. Outside that setting, many units go unmatched and their data are discarded.</p>
<hr>
</section>
<section id="mahalanobis-distance-matching" class="level3">
<h3 class="anchored" data-anchor-id="mahalanobis-distance-matching">Mahalanobis Distance Matching</h3>
<p>Instead of requiring exact equality between covariates, Mahalanobis matching</p>
<blockquote class="blockquote">
<p>Uses a <em>continuous distance metric</em> to find treated and control units that are similar in terms of their covariate values.</p>
</blockquote>
<p>The Mahalanobis distance between two units <img src="https://latex.codecogs.com/png.latex?i"> and <img src="https://latex.codecogs.com/png.latex?j">, with covariates <img src="https://latex.codecogs.com/png.latex?X_i"> and <img src="https://latex.codecogs.com/png.latex?X_j">, is defined as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Ad(X_i,%20X_j)%20=%20%5Csqrt%7B(X_i%20-%20X_j)%5E%5Ctop%20S%5E%7B-1%7D%20(X_i%20-%20X_j)%7D,%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?S"> is the sample covariance matrix of the covariates <img src="https://latex.codecogs.com/png.latex?X">.</p>
<p>This metric accounts for both the scale and the correlation structure of the covariates. Unlike Euclidean distance, which treats each covariate as equally important and independent, Mahalanobis distance adjusts for the fact that some variables may be more variable than others, or may be correlated.</p>
<p>Intuitively, Mahalanobis distance answers the question: how many standard deviations apart are these two vectors, once we’ve accounted for the spread and correlation of the variables? A small Mahalanobis distance indicates that the two units are close in the joint covariate space, even if they differ somewhat along individual dimensions. It still becomes less reliable in high dimensions, where all units tend to be far from one another.</p>
<p>Unlike exact matching, Mahalanobis matching can handle continuous covariates, as well as mixes of discrete and continuous variables, which makes it a flexible choice when the number of covariates is modest.</p>
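<p>To make the formula concrete, here is a minimal <code>Python</code> sketch of the distance computation. The toy covariate matrix <code>X</code> and the helper name <code>mahalanobis_distance</code> are illustrative assumptions, not part of any package discussed in this post.</p>

```python
import numpy as np

def mahalanobis_distance(x_i, x_j, cov):
    """Mahalanobis distance between two covariate vectors, given a covariance matrix."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    # Solve S z = diff rather than forming the explicit inverse of S
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))

# Hypothetical toy data: six units, two strongly correlated covariates
X = np.array([[1.0, 2.0], [2.0, 3.5], [3.0, 6.5],
              [4.0, 7.0], [5.0, 10.5], [6.0, 12.0]])
S = np.cov(X, rowvar=False)  # sample covariance matrix of the covariates

d_maha = mahalanobis_distance(X[0], X[1], S)
d_eucl = float(np.linalg.norm(X[0] - X[1]))
print(d_maha, d_eucl)
```

<p>With an identity covariance matrix the function reduces to the Euclidean distance, which is one way to see that the covariance term is what handles scale and correlation.</p>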
<hr>
</section>
<section id="propensity-score-matching" class="level3">
<h3 class="anchored" data-anchor-id="propensity-score-matching">Propensity Score Matching</h3>
<p>Propensity Score Matching (PSM) is one of the most influential ideas in observational causal inference. Rosenbaum and Rubin’s foundational result shows that if treatment assignment is unconfounded given covariates <img src="https://latex.codecogs.com/png.latex?X">, then it is also unconfounded given the propensity score:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Ae(X)%20=%20%5Cmathbb%7BP%7D(D%20=%201%20%5Cmid%20X),%0A"></p>
<p>the probability of receiving treatment conditional on observed covariates. In other words,</p>
<blockquote class="blockquote">
<p>Instead of matching on the full covariate vector <img src="https://latex.codecogs.com/png.latex?X">, we can just match on a <em>single scalar summary—</em>the estimated propensity score.</p>
</blockquote>
<p>This is the key idea: propensity scores reduce the curse of dimensionality. By summarizing the information in <img src="https://latex.codecogs.com/png.latex?X"> into one number that captures the likelihood of treatment, we make matching more feasible and scalable, especially when <img src="https://latex.codecogs.com/png.latex?X"> includes many variables.</p>
<p>In practice, the propensity score is rarely known and must be estimated, typically using logistic regression, probit models, or machine learning methods like random forests or gradient boosting. Once estimated, treated and control units are matched based on the closeness of their propensity scores, often using nearest-neighbor matching, caliper matching, or kernel methods. Good matches are hard to find for units with very high or very low propensity scores, so trimming (excluding those units) is often an important step for improving balance and reducing bias.</p>
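<p>The estimate-then-match workflow can be sketched in a few lines of <code>Python</code>. Everything here is an assumption made for illustration: the data are simulated with a true effect of 2, the propensity score comes from a plain logistic regression, and matching is 1-nearest-neighbor with replacement, without trimming or calipers.</p>

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Simulated data (hypothetical): treatment depends on X, true effect is 2
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
p_true = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
D = rng.binomial(1, p_true)
Y = 2.0 * D + X[:, 0] + X[:, 1] + rng.normal(size=n)

# Step 1: estimate the propensity score e(X) = P(D = 1 | X)
ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]

# Step 2: for each treated unit, find the control with the closest score
treated = np.flatnonzero(D == 1)
control = np.flatnonzero(D == 0)
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched_control = control[idx.ravel()]

# Step 3: ATT estimate, using matched controls to impute Y(0) for the treated
att_hat = float(np.mean(Y[treated] - Y[matched_control]))
print(att_hat)
```

<p>Because controls are reused, standard errors need more care than a simple difference in means suggests; this sketch only produces the point estimate.</p>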
<p>PSM improves comparability between groups by balancing the covariates in expectation, but it comes with trade-offs. Matching on the propensity score alone does not guarantee covariate balance in any particular dataset, so it’s important to assess and diagnose balance post-matching. Moreover, PSM is sensitive to model misspecification and can perform poorly if the propensity score is estimated inaccurately or if the overlap between groups is weak.</p>
<p>Despite these caveats, PSM remains a popular and conceptually powerful tool, especially when combined with diagnostics and robustness checks. It can be particularly helpful when the covariates are numerous or mostly continuous.</p>
<hr>
</section>
<section id="coarsened-exact-matching" class="level3">
<h3 class="anchored" data-anchor-id="coarsened-exact-matching">Coarsened Exact Matching</h3>
<p>Coarsened Exact Matching (CEM) offers a practical compromise between the rigidity of exact matching and the flexibility needed for real-world data. The core idea is to</p>
<blockquote class="blockquote">
<p>Coarsen continuous covariates into broader, meaningful categories and then perform exact matching on these coarsened values.</p>
</blockquote>
<p>Formally, each covariate is discretized into bins, and treated and control units are matched only if they fall into the <em>same bin across all coarsened covariates</em>. This process reduces the granularity of the match criteria, increasing the likelihood of finding matches, while still ensuring comparability within the matched groups. Examples are turning age into 5-year intervals or income into quantile-based brackets.</p>
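<p>A rough <code>Python</code> sketch of the coarsen-then-match step is below. The data frame, the column names (<code>age</code>, <code>income</code>, <code>D</code>) and the bin choices are hypothetical, and the sketch only identifies matched strata; it does not compute the stratum weights a full CEM analysis would use.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical data: two covariates and a binary treatment indicator D
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "age": rng.integers(20, 60, size=n),
    "income": rng.lognormal(mean=10.5, sigma=0.5, size=n),
    "D": rng.binomial(1, 0.4, size=n),
})

# Coarsen: 5-year age intervals and income quartiles
df["age_bin"] = pd.cut(df["age"], bins=np.arange(20, 65, 5), right=False)
df["income_bin"] = pd.qcut(df["income"], q=4, labels=False)

# Exact match on the coarsened values: keep only strata that contain
# both treated and control units
strata = ["age_bin", "income_bin"]
n_groups = df.groupby(strata, observed=True)["D"].transform("nunique")
matched = df[n_groups == 2]
print(len(df), len(matched))
```

<p>Widening the bins keeps more units in <code>matched</code>; narrowing them moves the procedure toward exact matching.</p>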
<p>By construction, CEM guarantees balance on the coarsened covariates, unlike propensity score matching, where balance must be checked and cannot be guaranteed a priori. CEM also allows researchers to control the level of approximation: the finer the bins, the closer it is to exact matching; the coarser the bins, the more matches you retain but the more heterogeneity you permit within matched strata. Researchers can use finer bins for critical variables and coarser bins for less central ones.</p>
<p>However, CEM’s effectiveness depends heavily on the choice of binning. Poorly chosen coarsening can either lead to very few matches (if too fine) or poor covariate balance (if too coarse). There is a trade-off between retaining sample size and improving covariate similarity, and CEM makes this trade-off explicit and user-controllable.</p>
<hr>
</section>
<section id="optimal-matching" class="level3">
<h3 class="anchored" data-anchor-id="optimal-matching">Optimal Matching</h3>
<p>Optimal matching takes a <em>global approach</em> to the matching problem. Rather than matching each treated unit to its nearest control in isolation (as in nearest neighbor matching), it</p>
<blockquote class="blockquote">
<p>Finds the set of matched pairs that <em>minimizes the total distance</em> across all matched units.</p>
</blockquote>
<p>Formally, it solves:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmin_%7B%5Ctext%7Bmatching%7D%7D%20%5Csum_%7B(i,%20j)%20%5Cin%20%5Ctext%7Bpairs%7D%7D%20d(X_i,%20X_j),%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?d(X_i,%20X_j)"> is a distance measure between treated unit <img src="https://latex.codecogs.com/png.latex?i"> and control unit <img src="https://latex.codecogs.com/png.latex?j">.</p>
<p>The key benefit is that it avoids poor global matches that can arise when matching is done greedily or locally, one unit at a time. Optimal matching is especially useful when treatment and control groups differ significantly in size or distribution, and when you want to minimize overall imbalance rather than optimize matches for individual units.</p>
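<p>The contrast between global and greedy matching can be illustrated with a small <code>Python</code> sketch. It assumes 1:1 matching without replacement, Euclidean distance, and simulated covariates; the global problem is solved with <code>scipy.optimize.linear_sum_assignment</code>, which is one possible solver rather than the method any particular matching package uses.</p>

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical covariates for 20 treated and 60 control units
rng = np.random.default_rng(2)
x_treated = rng.normal(loc=0.5, size=(20, 2))
x_control = rng.normal(loc=0.0, size=(60, 2))

# Pairwise distances d(X_i, X_j): rows are treated, columns are controls
diff = x_treated[:, None, :] - x_control[None, :, :]
cost = np.sqrt((diff ** 2).sum(axis=2))

# Optimal matching: minimize the total distance over all pairs at once
rows, cols = linear_sum_assignment(cost)
optimal_total = float(cost[rows, cols].sum())

# Greedy matching: each treated unit takes its nearest unused control in turn
available = np.ones(cost.shape[1], dtype=bool)
greedy_total = 0.0
for i in range(cost.shape[0]):
    masked = np.where(available, cost[i], np.inf)
    j = int(np.argmin(masked))
    available[j] = False
    greedy_total += float(cost[i, j])

print(optimal_total, greedy_total)
```

<p>The optimal total can never exceed the greedy total, since the greedy assignment is one of the candidate matchings the optimizer considers.</p>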
<p>However, because it solves a global optimization problem, it can be computationally intensive for large datasets. Also, while it minimizes overall distance, it doesn’t necessarily guarantee good covariate balance unless combined with preprocessing (e.g., matching on propensity scores or coarsened covariates).</p>
<p>Still, optimal matching is a powerful and principled method, particularly when used with careful distance choices and diagnostics.</p>
<hr>
</section>
<section id="genetic-matching" class="level3">
<h3 class="anchored" data-anchor-id="genetic-matching">Genetic Matching</h3>
<p>Genetic matching is an advanced matching method that uses a genetic algorithm to find an optimal weighting of covariates in the distance metric. The idea is to</p>
<blockquote class="blockquote">
<p>Automate the process of choosing how much weight each covariate should receive when determining similarity between treated and control units.</p>
</blockquote>
<p>Rather than manually selecting a distance metric like Mahalanobis or Euclidean, genetic matching searches over a space of weighted Mahalanobis distances, adjusting the weights to minimize covariate imbalance after matching. The result is a <em>customized distance metric</em> that gives higher weight to variables that are harder to balance and less to those that are already balanced.</p>
<p>Genetic matching can be used with or without propensity score preprocessing, and can accommodate interactions or higher-order terms. It’s especially powerful in settings with many covariates or complex imbalance patterns that simple metrics fail to capture.</p>
<p>However, the method is computationally intensive, often requiring many iterations of matching and balance assessment. Its performance also depends on the choice of balance metrics and tuning parameters in the genetic algorithm.</p>
<hr>
</section>
<section id="caliper-matching" class="level3">
<h3 class="anchored" data-anchor-id="caliper-matching">Caliper Matching</h3>
<p>Caliper matching introduces a distance threshold to restrict which treated and control units can be matched. Specifically,</p>
<blockquote class="blockquote">
<p>A treated unit is only matched to a control unit if the distance between them is <em>within a pre-specified caliper</em>.</p>
</blockquote>
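<p>That is, a pair is accepted only if its distance falls below a set limit. For example, when matching on propensity scores, one might match only if the absolute difference in propensity scores is less than 0.1 (another widely used choice is 0.2 standard deviations of the logit of the propensity score):</p>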
<p><img src="https://latex.codecogs.com/png.latex?%0A%7Ce(X_i%5E%7B%5Ctext%7Btreated%7D%7D)%20-%20e(X_j%5E%7B%5Ctext%7Bcontrol%7D%7D)%7C%20%3C%20%5Ctext%7Bcaliper%7D%0A"></p>
<p>This constraint helps <em>avoid poor matches</em>, especially when treated and control groups have limited overlap. Without calipers, nearest neighbor matching might pair units with very different covariate profiles, particularly in the tails of the propensity score distribution. These poor matches can increase bias and undermine the credibility of causal estimates.</p>
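<p>As a small illustration, the caliper rule can be layered on top of nearest-neighbor matching as follows. The function name <code>caliper_match</code> and the propensity scores are made up for this sketch, and matching is done with replacement for simplicity.</p>

```python
import numpy as np

def caliper_match(ps_treated, ps_control, caliper):
    """Nearest-neighbor matching (with replacement) restricted by a caliper.

    Returns (treated_index, control_index) pairs; treated units whose
    nearest control is not strictly within the caliper are left unmatched.
    """
    ps_treated = np.asarray(ps_treated, dtype=float)
    ps_control = np.asarray(ps_control, dtype=float)
    dist = np.abs(ps_treated[:, None] - ps_control[None, :])
    nearest = dist.argmin(axis=1)
    nearest_dist = dist[np.arange(len(ps_treated)), nearest]
    within = caliper > nearest_dist
    return [(int(i), int(nearest[i])) for i in np.flatnonzero(within)]

# Hypothetical propensity scores
ps_t = [0.30, 0.52, 0.95]
ps_c = [0.28, 0.50, 0.60]
pairs = caliper_match(ps_t, ps_c, caliper=0.1)
print(pairs)  # prints [(0, 0), (1, 1)]
```

<p>The third treated unit, with a score of 0.95, has no control within 0.1 and is dropped, which is exactly the tail behavior the caliper is meant to guard against.</p>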
<p>Caliper matching is not a matching method on its own but rather a modification to existing strategies—most often to nearest neighbor matching. It can also be combined with optimal matching or Mahalanobis distance.</p>
<p>Choosing the right caliper width is important: too wide, and the constraint has little effect; too narrow, and many treated units may be left unmatched, reducing sample size and precision.</p>
<p>Caliper matching is particularly useful when the common support assumption is questionable—i.e., when treated and control groups do not overlap well in covariate space. In such cases, calipers serve as a safeguard to maintain the quality of matches by explicitly enforcing local comparability.</p>
<hr>
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>Matching methods are powerful tools for causal inference.</li>
<li>They come in many flavors, each with its own strengths and weaknesses.</li>
<li>No single method is best for all situations; the choice depends on the data, the research question, and the assumptions you are willing to make.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>The book <em>Causal Inference for Statistics, Social, and Biomedical Sciences</em> by Imbens and Rubin (2015) provides excellent coverage of matching and its theoretical underpinnings. I also recommend the seminal review paper by Stuart (2010), cited below. The documentation for the <code>MatchIt</code> and <code>Matching</code> <code>R</code> packages is also a goldmine of practical implementation details.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Abadie, A., &amp; Imbens, G. W. (2016). Matching on the estimated propensity score. Econometrica, 84(2), 781-807.</p>
<p>Ben-Michael, E., Feller, A., Hirshberg, D. A., &amp; Zubizarreta, J. R. (2021). The balancing act in causal inference. arXiv preprint arXiv:2110.14831.</p>
<p>Diamond, A., &amp; Sekhon, J. S. (2013). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Review of Economics and Statistics, 95(3), 932-945.</p>
<p>Iacus, S. M., King, G., &amp; Porro, G. (2012). Causal inference without balance checking: Coarsened exact matching. Political Analysis, 20(1), 1-24.</p>
<p>Imbens, G. W. (2015). Matching methods in practice: Three examples. Journal of Human Resources, 50(2), 373-419.</p>
<p>Imbens, G. W., &amp; Rubin, D. B. (2015). <em>Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction</em>. Cambridge University Press.</p>
<p>Rosenbaum, P. R. (2002). <em>Observational Studies</em>. Springer.</p>
<p>Rosenbaum, P. R., &amp; Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.</p>
<p>Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1), 1-21.</p>


</section>

 ]]></description>
  <category>causal inference</category>
  <category>flavors</category>
  <guid>https://vyasenov.github.io/blog/flavors-matching-methods.html</guid>
  <pubDate>Tue, 27 May 2025 07:00:00 GMT</pubDate>
</item>
<item>
  <title>The Many Flavors of Variable Selection</title>
  <link>https://vyasenov.github.io/blog/flavors-var-selection.html</link>
  <description><![CDATA[ 





<div class="reading-time">11 min read</div>
<!-- this is for social media sharing buttons -->
<div class="sharethis-inline-share-buttons pt-5">

</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>If you’ve ever worked with high-dimensional data, you’ve likely faced a familiar challenge: too many variables. Some features are pure noise, others are redundant or collinear, and only a handful truly matter. The question is: how do you tell the difference? This challenge lies at the heart of what we call variable selection.</p>
<p>Over time, statisticians and machine learning researchers have created a diverse toolbox of techniques to tackle this problem—each rooted in different ideas, with its own strengths and trade-offs. Some methods apply penalties to shrink coefficients, like Lasso and Ridge. Others use geometric insights, like Principal Components Analysis (PCA). There are methods built on randomization, like Model-X Knockoffs, and some that rely on greedy or stepwise searches, such as Forward Selection and Least Angle Regression (LAR).</p>
<p>In this post, I’ll take a guided tour through these approaches—what they do, when to use them, and why they work. I’ll also explore their limitations, because no method is a silver bullet. The goal isn’t to pick a winner, but to help you figure out which tool fits your problem. Think of it as a field guide to variable selection, focused on ideas and intuition—so you can navigate the landscape with more confidence and clarity. And, yes, there will be plenty of <code>R</code> and <code>Python</code> code snippets to illustrate each method in action.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Suppose we observe data <img src="https://latex.codecogs.com/png.latex?(Y,%20X)">, where <img src="https://latex.codecogs.com/png.latex?Y%20%5Cin%20%5Cmathbb%7BR%7D%5En"> is the outcome vector and <img src="https://latex.codecogs.com/png.latex?X%20%5Cin%20%5Cmathbb%7BR%7D%5E%7Bn%20%5Ctimes%20p%7D"> is the matrix of predictors (covariates, features, regressors—pick your favorite term).</p>
<p>We’re interested in estimating a relationship like: <img src="https://latex.codecogs.com/png.latex?%0AY%20=%20X%20%5Cbeta%20+%20%5Cvarepsilon,%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Cbeta%20%5Cin%20%5Cmathbb%7BR%7D%5Ep"> is the vector of coefficients and <img src="https://latex.codecogs.com/png.latex?%5Cvarepsilon"> is the error term.</p>
<p>In high-dimensional settings, <img src="https://latex.codecogs.com/png.latex?p"> may be large—possibly even larger than <img src="https://latex.codecogs.com/png.latex?n">. The core task of variable selection is to identify which components of <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> are nonzero (or, more generally, which features matter for predicting <img src="https://latex.codecogs.com/png.latex?Y">).</p>
<p>(Distinguishing prediction and inference is crucial here: we focus on the former, so we ignore things like confidence intervals or <img src="https://latex.codecogs.com/png.latex?p">-values for coefficients altogether. The latter is a much <a href="https://vyasenov.github.io/blog/hypothesis-testing-linear-ml.html">more complex problem</a>.)</p>
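<p>To make the notation concrete, here is a minimal simulated sketch of a sparse linear model with more features than observations. The sizes, coefficient values, and noise level are made up for illustration and are not taken from the iris data used in the rest of the post.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 50, 200                       # more features than observations
X = rng.normal(size=(n, p))          # predictor matrix, n x p

beta = np.zeros(p)                   # true coefficient vector
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]   # only the first 5 features matter

eps = rng.normal(scale=0.5, size=n)  # error term
Y = X @ beta + eps                   # outcome vector

support = np.flatnonzero(beta)       # indices of the nonzero coefficients
print("n =", n, "p =", p, "true support:", support.tolist())
```

<p>Variable selection, in this language, is the task of recovering <code>support</code> from <code>X</code> and <code>Y</code> alone.</p>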
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<p>We begin by loading the data.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.datasets <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> load_iris</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load iris data</span></span>
<span id="cb1-5">iris <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_iris(as_frame<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb1-6">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris.frame</span>
<span id="cb1-7">X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal width (cm)'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'petal length (cm)'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'petal width (cm)'</span>]]</span>
<span id="cb1-8">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal length (cm)'</span>]</span></code></pre></div></div>
</div>
</div>
</div>
<p>Now let’s examine each of the methods in turn.</p>
<section id="stepwise-selection-forward-backward-both" class="level3">
<h3 class="anchored" data-anchor-id="stepwise-selection-forward-backward-both">Stepwise Selection (Forward, Backward, Both)</h3>
<p>The classic workhorse of variable selection, stepwise procedures iteratively add or remove variables based on a criterion such as AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion), or <img src="https://latex.codecogs.com/png.latex?p">-values. In forward selection, you start with no variables and, at each step, add the one that improves the model the most. In backward elimination, which requires <img src="https://latex.codecogs.com/png.latex?p%3Cn"> so that the full model can be fit, you start with all variables and remove the least significant one at each step. The two can also be combined in a bidirectional stepwise approach. In every case, you stop when adding or removing variables no longer sufficiently improves the model according to your chosen criterion.</p>
<p>Stepwise selection can work well for smaller problems where computational cost is low and interpretability is key (and there has been some recent progress on the computational side). However, it is unstable, since small changes in the data can change the selected set, and it is prone to overfitting.</p>
<p>We are now ready to start with the modeling part.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(MASS)</span>
<span id="cb2-2">full_model <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(Sepal.Length <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Length <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Width, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris)</span>
<span id="cb2-3">step_model <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stepAIC</span>(full_model, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">direction =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"both"</span>)</span>
<span id="cb2-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(step_model)</span></code></pre></div></div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> mlxtend.feature_selection <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> SequentialFeatureSelector <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> SFS</span>
<span id="cb3-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> LinearRegression</span>
<span id="cb3-3"></span>
<span id="cb3-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Stepwise selection (both directions)</span></span>
<span id="cb3-5">sfs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> SFS(LinearRegression(),</span>
<span id="cb3-6">          k_features<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'best'</span>,  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Select best number of features</span></span>
<span id="cb3-7">          forward<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,</span>
<span id="cb3-8">          floating<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Enables bidirectional selection</span></span>
<span id="cb3-9">          scoring<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'neg_mean_squared_error'</span>,</span>
<span id="cb3-10">          cv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)               <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># No cross-validation: in-sample score, with no AIC-style penalty</span></span>
<span id="cb3-11"></span>
<span id="cb3-12">sfs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sfs.fit(X, y)</span>
<span id="cb3-13"></span>
<span id="cb3-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Selected features</span></span>
<span id="cb3-15"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Selected features:'</span>, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(sfs.k_feature_names_))</span>
<span id="cb3-16"></span>
<span id="cb3-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit final model</span></span>
<span id="cb3-18">selected_X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(sfs.k_feature_names_)]</span>
<span id="cb3-19">model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LinearRegression().fit(selected_X, y)</span>
<span id="cb3-20"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(pd.Series(model.coef_, index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>selected_X.columns))</span></code></pre></div></div>
</div>
</div>
</div>
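<p>To see what a criterion-driven search does under the hood, here is a minimal sketch of forward selection by AIC written with NumPy only. It is not how <code>stepAIC</code> or <code>mlxtend</code> are implemented: the function names, the simulated data, and the Gaussian AIC formula (computed up to an additive constant) are my own assumptions for illustration.</p>

```python
import numpy as np

def gaussian_aic(X, y, cols):
    """AIC (up to an additive constant) of an OLS fit with an intercept plus the columns in cols."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    k = design.shape[1]                       # number of fitted coefficients
    return n * np.log(rss / n) + 2 * k

def forward_select(X, y):
    """Greedy forward selection: add the column that lowers AIC most, stop when none does."""
    selected, remaining = [], list(range(X.shape[1]))
    best_aic = gaussian_aic(X, y, selected)   # start from the intercept-only model
    while remaining:
        scores = [(gaussian_aic(X, y, selected + [j]), j) for j in remaining]
        cand_aic, cand = min(scores)
        if cand_aic >= best_aic:              # no candidate improves the criterion
            break
        selected.append(cand)
        remaining.remove(cand)
        best_aic = cand_aic
    return selected, best_aic

# Simulated data (hypothetical): only columns 0 and 2 drive the outcome
rng = np.random.default_rng(42)
X_sim = rng.normal(size=(200, 6))
y_sim = 2.0 * X_sim[:, 0] - 1.5 * X_sim[:, 2] + rng.normal(scale=0.5, size=200)

selected, aic = forward_select(X_sim, y_sim)
print("Selected columns (in order added):", selected)
```

<p>The two relevant columns should be picked up early. Depending on the random draw, AIC may also admit a noise column or two, which is a small-scale picture of the overfitting risk mentioned above.</p>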
<hr>
</section>
<section id="lasso-aka-ell_1-regularization" class="level3">
<h3 class="anchored" data-anchor-id="lasso-aka-ell_1-regularization">Lasso (aka <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> Regularization)</h3>
<p>Lasso introduced the big idea of <em>sparsity</em>, that only some variables enter the model. It penalizes the sum of the absolute values of the coefficients:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cbeta%7D%5E%7B%5Ctext%7Blasso%7D%7D%20=%20%5Carg%5Cmin_%7B%5Cbeta%7D%20%5Cleft%5C%7B%20%5C%7C%20Y%20-%20X%20%5Cbeta%20%5C%7C_2%5E2%20+%20%5Clambda%20%5C%7C%20%5Cbeta%20%5C%7C_1%20%5Cright%5C%7D.%0A"></p>
<p>The magic of the <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> penalty is that it can shrink some coefficients exactly to zero, performing variable selection as part of the estimation. Over the years, Lasso has become a staple in the variable selection toolkit. Its theoretical properties have been studied extensively, and it has been shown to work well in many practical scenarios.</p>
<p>Part of its appeal is computational efficiency: modern algorithms can compute the entire regularization path quickly. Lasso comes in a wide variety of flavors, including group lasso, adaptive lasso, and fused lasso, which I will probably cover in a future blog post. Be careful, though: shrinkage biases Lasso estimates toward zero, so it’s great for prediction, but don’t take its coefficients at face value.</p>
<p>Lasso is a good idea when you believe that only a subset of predictors is relevant and you want an interpretable model. It can struggle with groups of correlated predictors, where it tends to pick one arbitrarily.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(glmnet)</span>
<span id="cb4-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(iris)</span>
<span id="cb4-3">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(iris[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sepal.Width"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal.Length"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal.Width"</span>)])</span>
<span id="cb4-4">Y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Sepal.Length</span>
<span id="cb4-5">fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(X, Y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb4-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)</span></code></pre></div></div>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> LassoCV</span>
<span id="cb5-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.datasets <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> load_iris</span>
<span id="cb5-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb5-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb5-5"></span>
<span id="cb5-6">iris <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_iris(as_frame<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>).frame</span>
<span id="cb5-7">X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal width (cm)'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'petal length (cm)'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'petal width (cm)'</span>]]</span>
<span id="cb5-8">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal length (cm)'</span>]</span>
<span id="cb5-9">lasso <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LassoCV(cv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>).fit(X, y)</span>
<span id="cb5-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(pd.Series(lasso.coef_, index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>X.columns))</span></code></pre></div></div>
</div>
</div>
</div>
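<p>Where do the exact zeros come from? A minimal sketch, assuming the unscaled objective written above and no intercept: cyclic coordinate descent updates one coefficient at a time with a soft-thresholding step, and that step returns exactly zero whenever a feature's correlation with the partial residual is small enough. The function names and simulated data are hypothetical, and this is not the algorithm as implemented in <code>glmnet</code> or <code>scikit-learn</code>, which add scaling, intercepts, and convergence checks.</p>

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; anything within t of zero becomes exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)           # x_j' x_j for each column
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: leave out feature j's current contribution
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta

# Simulated data (hypothetical): 3 relevant features out of 10
rng = np.random.default_rng(1)
X_sim = rng.normal(size=(100, 10))
true_beta = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y_sim = X_sim @ true_beta + rng.normal(scale=0.5, size=100)

for lam in [0.0, 50.0, 1000.0]:
    beta_hat = lasso_cd(X_sim, y_sim, lam)
    print(f"lambda = {lam:7.1f}  nonzero coefficients: {np.count_nonzero(beta_hat)}")
```

<p>With no penalty the fit coincides with least squares and every coefficient is nonzero. As the penalty grows, more coefficients are set exactly to zero, and for a large enough penalty the model is empty.</p>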
<hr>
</section>
<section id="ridge-regression-aka-ell_2-regularization" class="level3">
<h3 class="anchored" data-anchor-id="ridge-regression-aka-ell_2-regularization">Ridge Regression (aka <img src="https://latex.codecogs.com/png.latex?%5Cell_2"> Regularization)</h3>
<p>Ridge regression doesn’t exactly <em>select</em> variables—it shrinks them. The idea is to add a penalty on the size of the coefficients:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cbeta%7D%5E%7B%5Ctext%7Bridge%7D%7D%20=%20%5Carg%5Cmin_%7B%5Cbeta%7D%20%5Cleft%5C%7B%20%5C%7C%20Y%20-%20X%20%5Cbeta%20%5C%7C_2%5E2%20+%20%5Clambda%20%5C%7C%20%5Cbeta%20%5C%7C_2%5E2%20%5Cright%5C%7D.%0A"></p>
<p>Here, <img src="https://latex.codecogs.com/png.latex?%5Clambda%20%5Cge%200"> is a tuning parameter that controls the strength of the penalty. As <img src="https://latex.codecogs.com/png.latex?%5Clambda"> increases, the solution is increasingly biased toward zero, but the variance decreases, which can improve out-of-sample performance.</p>
<p>Unlike Lasso, Ridge regression does not produce sparse solutions: in general, none of the coefficients are exactly zero. Instead, it distributes shrinkage smoothly across all variables, which can be helpful when all predictors contribute weakly and roughly equally.</p>
<p>Ridge is also computationally convenient. The modified normal equations involve the matrix <img src="https://latex.codecogs.com/png.latex?X%5E%5Ctop%20X%20+%20%5Clambda%20I">, which is always invertible when <img src="https://latex.codecogs.com/png.latex?%5Clambda%20%3E%200">, even if <img src="https://latex.codecogs.com/png.latex?X%5E%5Ctop%20X"> is singular. As a result, Ridge provides a unique and stable solution even in high-dimensional settings where <img src="https://latex.codecogs.com/png.latex?p%20%3E%20n">—a situation where ordinary least squares (OLS) fails due to non-identifiability.</p>
<p>Ridge is especially good when multicollinearity is a problem; when you prefer stability over sparsity; or when many small effects contribute to the outcome.</p>
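<p>The invertibility claim above is easy to check numerically. The sketch below uses simulated data with more features than observations (all sizes and the penalty value are arbitrary choices of mine), solves the modified normal equations directly, and verifies that the gradient of the ridge objective vanishes at the solution.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50                                   # more features than observations
X_sim = rng.normal(size=(n, p))
y_sim = rng.normal(size=n)

# X'X is p x p but has the same rank as X, which is at most n, so it is singular here
print("rank of X:", np.linalg.matrix_rank(X_sim), "number of features:", p)

lam = 1.0
gram = X_sim.T @ X_sim
# Ridge solution of ||y - X b||^2 + lam * ||b||^2: solve (X'X + lam I) b = X'y
beta_ridge = np.linalg.solve(gram + lam * np.eye(p), X_sim.T @ y_sim)

# First-order condition: the gradient of the ridge objective should be (numerically) zero
grad = -2 * X_sim.T @ (y_sim - X_sim @ beta_ridge) + 2 * lam * beta_ridge
print("max abs gradient:", np.max(np.abs(grad)))
```

<p>Setting <code>lam = 0.0</code> in the same code leaves a singular system, which is the non-identifiability of OLS mentioned above.</p>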
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(glmnet)</span>
<span id="cb6-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(iris)</span>
<span id="cb6-3">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(iris[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sepal.Width"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal.Length"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal.Width"</span>)])</span>
<span id="cb6-4">Y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Sepal.Length</span>
<span id="cb6-5">fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(X, Y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb6-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)</span></code></pre></div></div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> RidgeCV</span>
<span id="cb7-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.datasets <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> load_iris</span>
<span id="cb7-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb7-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb7-5"></span>
<span id="cb7-6">iris <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_iris(as_frame<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>).frame</span>
<span id="cb7-7">X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal width (cm)'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'petal length (cm)'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'petal width (cm)'</span>]]</span>
<span id="cb7-8">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal length (cm)'</span>]</span>
<span id="cb7-9">ridge <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> RidgeCV(cv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>).fit(X, y)</span>
<span id="cb7-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(pd.Series(ridge.coef_, index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>X.columns))</span></code></pre></div></div>
</div>
</div>
</div>
<hr>
</section>
<section id="elastic-net" class="level3">
<h3 class="anchored" data-anchor-id="elastic-net">Elastic Net</h3>
<p>Elastic Net combines the strengths of both Ridge and Lasso by blending their penalties into a single regularization framework:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cbeta%7D%5E%7B%5Ctext%7BEN%7D%7D%20=%20%5Carg%5Cmin_%7B%5Cbeta%7D%20%5Cleft%5C%7B%20%5C%7C%20Y%20-%20X%20%5Cbeta%20%5C%7C_2%5E2%20+%20%5Clambda_1%20%5C%7C%20%5Cbeta%20%5C%7C_1%20+%20%5Clambda_2%20%5C%7C%20%5Cbeta%20%5C%7C_2%5E2%20%5Cright%5C%7D.%0A"></p>
<p>This formulation retains the sparsity-inducing property of the Lasso via the <img src="https://latex.codecogs.com/png.latex?%5Cell_1"> penalty while incorporating the stabilizing effect of Ridge regression through the <img src="https://latex.codecogs.com/png.latex?%5Cell_2"> penalty. The result is a model that not only performs variable selection but also handles groups of correlated predictors more gracefully than Lasso alone, which tends to pick one variable from a group and ignore the rest.</p>
<p>Elastic Net is especially helpful in high-dimensional settings where predictors are strongly correlated or when <img src="https://latex.codecogs.com/png.latex?p%20%5Cgg%20n">. The two tuning parameters, <img src="https://latex.codecogs.com/png.latex?%5Clambda_1"> and <img src="https://latex.codecogs.com/png.latex?%5Clambda_2">, control the trade-off between sparsity and smooth shrinkage. In practice, these are often reparameterized using a single penalty term <img src="https://latex.codecogs.com/png.latex?%5Clambda"> and a mixing proportion <img src="https://latex.codecogs.com/png.latex?%5Calpha"> (as in many software packages), where:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Clambda_1%20=%20%5Clambda%20%5Calpha,%20%5Cquad%20%5Clambda_2%20=%20%5Clambda%20(1%20-%20%5Calpha).%0A"></p>
<p>This makes it easy to interpolate between Ridge (<img src="https://latex.codecogs.com/png.latex?%5Calpha%20=%200">) and Lasso (<img src="https://latex.codecogs.com/png.latex?%5Calpha%20=%201">), giving you a continuum of models with different regularization characteristics.</p>
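<p>As a quick sanity check on this reparameterization, the sketch below maps a pair of overall penalty and mixing proportion to the two separate penalties and evaluates the Elastic Net objective exactly as written above. The helper names and the tiny dataset are made up for illustration.</p>

```python
import numpy as np

def split_penalty(lam, alpha):
    """Return (lam1, lam2) with lam1 = lam * alpha and lam2 = lam * (1 - alpha)."""
    return lam * alpha, lam * (1.0 - alpha)

def elastic_net_objective(beta, X, y, lam, alpha):
    """Evaluate ||y - X b||_2^2 + lam1 * ||b||_1 + lam2 * ||b||_2^2."""
    lam1, lam2 = split_penalty(lam, alpha)
    rss = np.sum((y - X @ beta) ** 2)
    return rss + lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(beta ** 2)

# Small made-up example
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_toy = np.array([1.0, 2.0, 3.0])
beta_toy = np.array([0.5, -1.0])

for alpha in [0.0, 0.5, 1.0]:
    lam1, lam2 = split_penalty(2.0, alpha)
    value = elastic_net_objective(beta_toy, X_toy, y_toy, 2.0, alpha)
    print(f"alpha = {alpha:.1f}  lambda_1 = {lam1:.1f}  lambda_2 = {lam2:.1f}  objective = {value:.3f}")
```

<p>One naming trap when moving between languages: in <code>glmnet</code>, <code>alpha</code> is the mixing proportion and <code>lambda</code> the overall strength, whereas in scikit-learn's <code>ElasticNetCV</code> the mixing proportion is called <code>l1_ratio</code> and <code>alpha</code> is the overall strength. Both libraries also scale the squared-error loss and the ridge term differently from the formula above, so penalty values are not directly comparable across parameterizations.</p>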
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-5-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-1" aria-controls="tabset-5-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-2" aria-controls="tabset-5-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-5-1" class="tab-pane active" aria-labelledby="tabset-5-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(glmnet)</span>
<span id="cb8-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(iris)</span>
<span id="cb8-3">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(iris[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sepal.Width"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal.Length"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal.Width"</span>)])</span>
<span id="cb8-4">Y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Sepal.Length</span>
<span id="cb8-5">fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(X, Y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb8-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)</span></code></pre></div></div>
</div>
<div id="tabset-5-2" class="tab-pane" aria-labelledby="tabset-5-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> ElasticNetCV</span>
<span id="cb9-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.datasets <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> load_iris</span>
<span id="cb9-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb9-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb9-5"></span>
<span id="cb9-6">iris <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_iris(as_frame<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>).frame</span>
<span id="cb9-7">X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal width (cm)'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'petal length (cm)'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'petal width (cm)'</span>]]</span>
<span id="cb9-8">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> iris[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sepal length (cm)'</span>]</span>
<span id="cb9-9">enet <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ElasticNetCV(cv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>).fit(X, y)</span>
<span id="cb9-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(pd.Series(enet.coef_, index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>X.columns))</span></code></pre></div></div>
</div>
</div>
</div>
<hr>
</section>
<section id="principal-components-regression-pcr" class="level3">
<h3 class="anchored" data-anchor-id="principal-components-regression-pcr">Principal Components Regression (PCR)</h3>
<p>Principal Components Analysis (PCA) finds linear combinations of the original variables that explain the most variance of the entire dataset.</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmax_%7Bw%20%5Cin%20%5Cmathbb%7BR%7D%5Ep,%5C,%20%5C%7Cw%5C%7C%20=%201%7D%20%5C;%20%5Cmathrm%7BVar%7D(Xw)%20=%20w%5E%5Ctop%20%5CSigma%20w%0A"></p>
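<p>In Principal Components Regression, we regress <img src="https://latex.codecogs.com/png.latex?Y"> on the top <img src="https://latex.codecogs.com/png.latex?k"> principal components of <img src="https://latex.codecogs.com/png.latex?X"> instead of on the original variables. With <img src="https://latex.codecogs.com/png.latex?V_k"> denoting the matrix whose columns are the first <img src="https://latex.codecogs.com/png.latex?k"> loading vectors and <img src="https://latex.codecogs.com/png.latex?Z%20=%20XV_k"> the corresponding component scores, the implied coefficients on the original variables are:</p>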
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7B%5Cbeta%7D_%7B%5Ctext%7BPCR%7D%7D%20=%20V_k%20(Z%5E%5Ctop%20Z)%5E%7B-1%7D%20Z%5E%5Ctop%20y%0A"></p>
<p>PCA is one of the most popular dimensionality-reduction methods, familiar even to junior data scientists, so I won’t spend too much time on it here. PCR has a dual nature, with one foot in unsupervised learning (finding components) and the other in supervised learning (variable selection). Note that “variable selection” is used loosely here, since PCR selects combinations of variables, not individual variables. Its main strengths are its versatility and its ability to handle high-dimensional data, but its output can be challenging to interpret.</p>
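<p>To make the formula concrete, here is a minimal NumPy sketch of PCR on synthetic data. The helper name <code>pcr_coefficients</code> and the simulated inputs are illustrative assumptions, not part of the packages used in this post; the loadings are taken from the SVD of the centered predictors.</p>

```python
import numpy as np

def pcr_coefficients(X, y, k):
    """Coefficients on the original (centered) variables from a k-component PCR."""
    Xc = X - X.mean(axis=0)                      # center predictors
    yc = y - y.mean()                            # center response
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                               # p x k loading matrix
    Z = Xc @ V_k                                 # n x k component scores
    gamma = np.linalg.solve(Z.T @ Z, Z.T @ yc)   # OLS of y on the scores
    return V_k @ gamma                           # map back to the original variables

# Synthetic, correlated predictors (hypothetical data for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 0.8 * X[:, 0]
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

beta_pcr_2 = pcr_coefficients(X, y, k=2)   # drops the lowest-variance direction
beta_pcr_3 = pcr_coefficients(X, y, k=3)   # keeps every component
print(beta_pcr_2)
print(beta_pcr_3)
```

<p>When <code>k</code> equals the number of predictors nothing is discarded and the result coincides with ordinary least squares; smaller values of <code>k</code> accept some bias in exchange for lower variance. This sketch centers the columns but does not rescale them.</p>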
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-6-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-1" aria-controls="tabset-6-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-6-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-2" aria-controls="tabset-6-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-6-1" class="tab-pane active" aria-labelledby="tabset-6-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(pls)</span>
<span id="cb10-2">pcr_model <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pcr</span>(Sepal.Length <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> Sepal.Width <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Length <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Petal.Width, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> iris, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scale =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">validation =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CV"</span>)</span>
<span id="cb10-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(pcr_model)</span></code></pre></div></div>
</div>
<div id="tabset-6-2" class="tab-pane" aria-labelledby="tabset-6-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.decomposition <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> PCA</span>
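<span id="cb11-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> LinearRegression</span>
<span id="cb11-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.preprocessing <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> StandardScaler</span>
<span id="cb11-4"></span>
<span id="cb11-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Standardize the predictors first, as scale = TRUE does in the R version</span></span>
<span id="cb11-6">X_std <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> StandardScaler().fit_transform(X)</span>
<span id="cb11-7">pca <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> PCA(n_components<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb11-8">X_pca <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pca.fit_transform(X_std)</span>
<span id="cb11-9">reg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LinearRegression().fit(X_pca, y)</span>
<span id="cb11-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(pd.Series(reg.coef_, index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>pca.get_feature_names_out()))</span></code></pre></div></div>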
</div>
</div>
</div>
<hr>
</section>
<section id="least-angle-regression-lar" class="level3">
<h3 class="anchored" data-anchor-id="least-angle-regression-lar">Least Angle Regression (LAR)</h3>
<p>Least Angle Regression (LAR) is a greedy, stepwise variable selection algorithm that adds predictors to a linear model incrementally. At each step, it moves in the direction of the predictor most correlated with the current residual, just like forward selection—but with a twist: it adjusts the direction gradually as more variables become equally correlated with the residuals. How it works:</p>
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm:">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm:
</div>
</div>
<div class="callout-body-container callout-body">
<ol type="1">
<li>Start with all coefficients set to zero.</li>
<li>Find the predictor most correlated with the current residual.</li>
<li>Move the coefficient of that variable in the direction of its sign until another predictor becomes equally correlated with the residual.</li>
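<li>Continue in the “least angle” direction, the one that keeps both predictors equally correlated with the residual, until a third predictor ties with them, and so on until every predictor has entered.</li>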
</ol>
</div>
</div>
<p>The result is a sequence of models, each with one more active variable—just like in forward stepwise regression, but using geometry rather than brute force.</p>
<p>Geometrically, LAR moves along piecewise linear paths toward the least squares solution, and its trajectory closely tracks that of Lasso. In fact, with a small modification, LAR can be used to compute the entire Lasso solution path.</p>
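<p>The steps above can be sketched in a few lines of NumPy. This is a bare-bones illustration of plain LAR (without the Lasso modification), assuming predictors standardized to unit norm and no ties or degenerate steps; <code>lar_path</code> and the simulated data are hypothetical names introduced here, not part of <code>lars</code> or scikit-learn.</p>

```python
import numpy as np

def lar_path(X, y):
    """Plain LAR: coefficients after each step, plus the order in which predictors enter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)
    X = X / np.linalg.norm(X, axis=0)       # centered, unit-norm columns
    y = y - y.mean()
    n, p = X.shape
    beta, mu, active = np.zeros(p), np.zeros(n), []
    path = [beta.copy()]
    for _ in range(p):
        c = X.T @ (y - mu)                   # correlations with the current residual
        inactive = [j for j in range(p) if j not in active]
        active.append(max(inactive, key=lambda j: abs(c[j])))
        C = np.max(np.abs(c))
        s = np.sign(c[active])
        XA = X[:, active] * s                # active columns, sign-aligned with the residual
        g = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        A = 1.0 / np.sqrt(g.sum())
        w = A * g
        u = XA @ w                           # equiangular unit direction
        a = X.T @ u
        rest = [j for j in range(p) if j not in active]
        if not rest:
            gamma = C / A                    # final step reaches the least squares fit
        else:
            # Smallest positive step at which an inactive predictor ties with the active set
            steps = [v for j in rest
                     for v in ((C - c[j]) / (A - a[j]), (C + c[j]) / (A + a[j]))
                     if v > 1e-12]
            gamma = min(steps)
        mu = mu + gamma * u
        beta[active] += gamma * s * w
        path.append(beta.copy())
    return np.array(path), active

# Simulated data for illustration only
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(60, 4))
y_demo = X_demo @ np.array([3.0, 0.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=60)

path, order = lar_path(X_demo, y_demo)
print("entry order:", order)
print(path.round(3))
```

<p>Each row of <code>path</code> has one more nonzero coefficient than the previous one, and the last row coincides with the least squares fit on the standardized predictors. Coefficients are reported on that standardized scale.</p>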
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-7-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-1" aria-controls="tabset-7-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-7-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-2" aria-controls="tabset-7-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-7-1" class="tab-pane active" aria-labelledby="tabset-7-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(lars)</span>
<span id="cb12-2">lar_model <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lars</span>(X, Y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lar"</span>)</span>
<span id="cb12-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(lar_model)</span></code></pre></div></div>
</div>
<div id="tabset-7-2" class="tab-pane" aria-labelledby="tabset-7-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Lars</span>
<span id="cb13-2">lar <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Lars().fit(X, y)</span>
<span id="cb13-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(pd.Series(lar.coef_, index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>X.columns))</span></code></pre></div></div>
</div>
</div>
</div>
<hr>
</section>
<section id="scad-smoothly-clipped-absolute-deviation" class="level3">
<h3 class="anchored" data-anchor-id="scad-smoothly-clipped-absolute-deviation">SCAD (Smoothly Clipped Absolute Deviation)</h3>
<p>SCAD (Smoothly Clipped Absolute Deviation) is a non-convex penalty introduced by Fan and Li (2001) to address a key limitation of the Lasso: its tendency to over-shrink large coefficients, leading to biased estimates for important variables.</p>
<p>The SCAD penalty is designed to encourage sparsity like the Lasso for small coefficients, but to relax the penalty for larger ones. In other words, it behaves like Lasso near zero—pushing small coefficients toward zero—but reduces shrinkage as coefficients grow, effectively preserving the size of large signals.</p>
<p>Mathematically, the derivative of the SCAD penalty is defined as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AP'_%5Clambda(%5Cbeta)%20=%20%5Clambda%20%5Cleft%5B%20I(%7C%5Cbeta%7C%20%5Cleq%20%5Clambda)%20+%20%5Cfrac%7B(a%20%5Clambda%20-%20%7C%5Cbeta%7C)_+%7D%7B(a%20-%201)%5Clambda%7D%20I(%7C%5Cbeta%7C%20%3E%20%5Clambda)%20%5Cright%5D,%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?a%20%3E%202"> (typically <img src="https://latex.codecogs.com/png.latex?a%20=%203.7">) and <img src="https://latex.codecogs.com/png.latex?(x)_+%20=%20%5Cmax(0,%20x)"> denotes the positive part. This piecewise definition ensures a smooth transition:</p>
<ul>
<li>For small coefficients <img src="https://latex.codecogs.com/png.latex?%7C%5Cbeta%7C%20%5Cleq%20%5Clambda">, it behaves like the Lasso.</li>
<li>For moderate coefficients <img src="https://latex.codecogs.com/png.latex?%5Clambda%20%3C%20%7C%5Cbeta%7C%20%3C%20a%20%5Clambda">, the penalty keeps growing but at a linearly decreasing rate, so shrinkage tapers off.</li>
<li>For large coefficients <img src="https://latex.codecogs.com/png.latex?%7C%5Cbeta%7C%20%5Cge%20a%5Clambda">, the penalty becomes flat—effectively applying no further shrinkage.</li>
</ul>
<p>This adaptive behavior helps SCAD balance sparsity against unbiasedness. The penalty is continuous and piecewise smooth, so local algorithms such as coordinate descent can be used, and under certain conditions the resulting estimator has oracle-like properties. The price is non-convexity: the objective can have multiple local minima, which makes optimization more delicate and computationally intensive than with Lasso or Ridge.</p>
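<p>A small pure-Python sketch of the penalty derivative makes the three regimes easy to check by hand. The function name <code>scad_derivative</code> is an illustrative choice, and the code simply transcribes the formula above for a single coefficient.</p>

```python
def scad_derivative(beta, lam, a=3.7):
    """Derivative of the SCAD penalty at beta, for lam > 0 and a > 2."""
    b = abs(beta)
    if b > lam:
        # Linearly decaying rate between lam and a*lam, exactly zero beyond a*lam
        return max(a * lam - b, 0.0) / (a - 1)
    return lam  # same constant rate as the Lasso near zero

lam = 1.0
for beta in (0.5, 2.0, 5.0):
    print(beta, round(scad_derivative(beta, lam), 4), "Lasso rate:", lam)
```

<p>With <code>lam = 1</code>, a coefficient of 0.5 is penalized at the full Lasso rate, a coefficient of 2.0 at a reduced rate of about 0.63, and a coefficient of 5.0 (beyond 3.7) not at all.</p>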
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-8-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-1" aria-controls="tabset-8-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-8-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-2" aria-controls="tabset-8-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-8-1" class="tab-pane active" aria-labelledby="tabset-8-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ncvreg)</span>
<span id="cb14-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(iris)</span>
<span id="cb14-3">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(iris[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sepal.Width"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal.Length"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Petal.Width"</span>)])</span>
<span id="cb14-4">Y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> iris<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Sepal.Length</span>
<span id="cb14-5"></span>
<span id="cb14-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit SCAD-penalized regression</span></span>
<span id="cb14-7">scad_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ncvreg</span>(X, Y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SCAD"</span>)</span>
<span id="cb14-8"></span>
<span id="cb14-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot cross-validated error</span></span>
<span id="cb14-10">cv <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.ncvreg</span>(X, Y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SCAD"</span>)</span>
<span id="cb14-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(cv)</span>
<span id="cb14-12"></span>
<span id="cb14-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Coefficients at optimal lambda</span></span>
<span id="cb14-14"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(cv)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># by default, evaluated at the lambda with the lowest CV error</span></span></code></pre></div></div>
</div>
<div id="tabset-8-2" class="tab-pane" aria-labelledby="tabset-8-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> skglm <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> GeneralizedLinearEstimator</span>
<span id="cb15-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> skglm.datafits <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Quadratic</span>
<span id="cb15-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> skglm.penalties <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> SCAD</span>
<span id="cb15-4"></span>
<span id="cb15-5">scad <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> GeneralizedLinearEstimator(Quadratic(), SCAD(alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>, gamma<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.7</span>))</span>
<span id="cb15-6">scad.fit(X, y)</span>
<span id="cb15-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(pd.Series(scad.coef_, index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>X.columns))</span></code></pre></div></div>
</div>
</div>
</div>
<hr>
</section>
<section id="knockoffs" class="level3">
<h3 class="anchored" data-anchor-id="knockoffs">Knockoffs</h3>
<p>Knockoffs, introduced by Barber and Candès (2015), is a clever framework for variable selection with <strong>false discovery rate (FDR) control</strong>. The method constructs “knockoff copies” of each feature—artificial variables that mimic the correlation structure of the real ones but are known to be null. Then it tests whether the real variables outperform their knockoffs.</p>
<p>I have <a href="https://vyasenov.github.io/blog/flavors-multiple-testing.html">written about knockoffs</a> in more detail in previous posts, so I won’t go into the details here. Just like PCA, knockoffs have a dual nature, with one foot in the multiple testing literature (constructing knockoffs) and the other in supervised learning (variable selection).</p>
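<p>The selection step is simple enough to sketch in pure Python. The version below follows the data-dependent threshold as defined in Barber and Candès (2015), with <code>offset=1</code> giving the more conservative “knockoff+” rule; the function name <code>knockoff_threshold</code> and the example statistics are hypothetical, and this is not a port of the R package’s source.</p>

```python
import math

def knockoff_threshold(W, q=0.1, offset=1):
    """Smallest t whose estimated false discovery proportion does not exceed q."""
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        negatives = sum(1 for w in W if -w >= t)   # knockoff beat the real feature
        positives = sum(1 for w in W if w >= t)    # real feature beat its knockoff
        # Equivalent to (offset + negatives) / max(1, positives) not exceeding q
        if q * max(1, positives) >= offset + negatives:
            return t
    return math.inf                                 # no threshold meets the target

# Hypothetical W statistics: large positive values favor the real feature
W = [6.0, 5.0, 4.0, 3.0, 2.0, -1.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
t_hat = knockoff_threshold(W, q=0.2)
selected = [j for j, w in enumerate(W) if w >= t_hat]
print(t_hat, selected)

# With only four features the offset-1 ratio is at least 1/4, so q = 0.1 selects nothing
print(knockoff_threshold([3.0, 2.0, 1.5, 0.7], q=0.1))
```

<p>Under the offset-1 rule the ratio can never fall below one over the number of features. If the R function applies the same rule, an example with only four predictors and a 10% target will return an empty selection, so a looser target or a larger feature set is needed to see the method select anything.</p>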
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-9-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-9-1" aria-controls="tabset-9-1" aria-selected="true" href="">R</a></li></ul>
<div class="tab-content">
<div id="tabset-9-1" class="tab-pane active" aria-labelledby="tabset-9-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"></span>
<span id="cb16-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clear workspace</span></span>
<span id="cb16-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rm</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">list =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ls</span>())</span>
<span id="cb16-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(knockoff)</span>
<span id="cb16-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(glmnet)</span>
<span id="cb16-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb16-7"></span>
<span id="cb16-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load data</span></span>
<span id="cb16-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(iris)</span>
<span id="cb16-10"></span>
<span id="cb16-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 1: Prepare the data (binary classification)</span></span>
<span id="cb16-12">iris_binary <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> iris <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(Species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"setosa"</span>)</span>
<span id="cb16-13">X <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(iris_binary[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>])  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># numeric predictors</span></span>
<span id="cb16-14">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(iris_binary<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"virginica"</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># binary target: virginica vs versicolor</span></span>
<span id="cb16-15"></span>
<span id="cb16-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 2: Create knockoff copies</span></span>
<span id="cb16-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use fixed-X knockoffs (model-X knockoffs would use create.second_order instead)</span></span>
<span id="cb16-18">knockoffs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create.fixed</span>(X)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># returns a list with X (normalized) and Xk (knockoffs)</span></span>
<span id="cb16-19"></span>
<span id="cb16-20">X_knock <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> knockoffs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Xk</span>
<span id="cb16-21"></span>
<span id="cb16-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 3: Combine the normalized X with its knockoffs and fit a Lasso model</span></span>
<span id="cb16-23">X_combined <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(knockoffs<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>X, X_knock)</span>
<span id="cb16-24">fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(X_combined, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"binomial"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb16-25"></span>
<span id="cb16-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 4: Compute importance statistics (lasso coefficients at lambda.min)</span></span>
<span id="cb16-27">coefs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(fit, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># remove intercept</span></span>
<span id="cb16-28">p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ncol</span>(X)</span>
<span id="cb16-29"></span>
<span id="cb16-30">W <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(coefs[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>p]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(coefs[(p<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>p)])  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># feature importance W-statistic</span></span>
<span id="cb16-31"></span>
<span id="cb16-32"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 5: Apply knockoff threshold to select features</span></span>
<span id="cb16-33">threshold <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">knockoff.threshold</span>(W, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fdr =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># control FDR at 10%</span></span>
<span id="cb16-34">selected <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">which</span>(W <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> threshold)</span>
<span id="cb16-35"></span>
<span id="cb16-36"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 6: Print results</span></span>
<span id="cb16-37">feature_names <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colnames</span>(X)</span>
<span id="cb16-38"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Selected features controlling FDR at 10%:</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb16-39"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(feature_names[selected])</span></code></pre></div></div>
</div>
</div>
</div>
<hr>
</section>
<section id="foci-feature-ordering-by-conditional-independence" class="level3">
<h3 class="anchored" data-anchor-id="foci-feature-ordering-by-conditional-independence">FOCI (Feature Ordering by Conditional Independence)</h3>
<p>FOCI is a recent, model-free method that orders features by how much each one adds, given the features already selected, to a nonparametric measure of conditional dependence with the outcome. It does not assume a particular parametric form. I have also written about FOCI in a <a href="https://vyasenov.github.io/blog/foci.html">previous post</a>, so I won’t repeat the details here.</p>
<hr>
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>Lasso, Ridge, and Elastic Net are the go-to penalized regression methods, with Lasso giving sparsity, Ridge providing stability, and Elastic Net blending the two.</li>
<li>Non-convex penalties like SCAD address Lasso’s bias issue but at a computational cost.</li>
<li>PCA-based methods reduce dimensionality but don’t directly select variables.</li>
<li>Knockoffs offer strong statistical guarantees like FDR control but require careful implementation.</li>
<li>Modern approaches like FOCI expand the toolkit to nonlinear, model-free settings.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>For a great introduction to penalized regression methods, <em>The Elements of Statistical Learning</em> by Hastie, Tibshirani, and Friedman is a classic. As always, you can reach for <em>Computer Age Statistical Inference</em> or <em>All of Statistics</em> and they won’t let you down.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Barber, R. F., &amp; Candès, E. J. (2015). Controlling the false discovery rate via knockoffs. <em>Annals of Statistics</em>, 43(5), 2055–2085.</p>
<p>Efron, B., &amp; Hastie, T. (2021). <em>Computer Age Statistical Inference, Student Edition: Algorithms, Evidence, and Data Science</em> (Vol. 6). Cambridge University Press.</p>
<p>Efron, B., Hastie, T., Johnstone, I., &amp; Tibshirani, R. (2004). Least angle regression. <em>Annals of Statistics</em>, 32(2), 407–499.</p>
<p>Fan, J., &amp; Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. <em>Journal of the American Statistical Association</em>, 96(456), 1348–1360.</p>
<p>Hastie, T., Tibshirani, R., &amp; Friedman, J. (2009). <em>The Elements of Statistical Learning: Data Mining, Inference, and Prediction</em>. Springer.</p>
<p>Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. <em>Journal of the Royal Statistical Society Series B: Statistical Methodology</em>, 58(1), 267–288.</p>
<p>Wasserman, L. (2004). <em>All of Statistics: A Concise Course in Statistical Inference</em>. Springer Science &amp; Business Media.</p>


</section>

 ]]></description>
  <category>variable selection</category>
  <category>machine learning</category>
  <category>flavors</category>
  <guid>https://vyasenov.github.io/blog/flavors-var-selection.html</guid>
  <pubDate>Mon, 26 May 2025 07:00:00 GMT</pubDate>
</item>
<item>
  <title>The Many Flavors of Bootstrap</title>
  <link>https://vyasenov.github.io/blog/flavors-bootstrap.html</link>
  <description><![CDATA[ 





<div class="reading-time">8 min read</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>At its heart, the bootstrap poses a simple yet powerful question: “What if we could resample from our existing data, treating it as a stand-in for the population?” By doing so, we can estimate variability, build confidence intervals, and carry out hypothesis tests—without leaning heavily on strong parametric assumptions. It’s especially useful in situations where analytic solutions exist in theory but are too complex to derive or even implement in practice.</p>
<p>But here’s the thing: there isn’t just <em>one</em> bootstrap. Over the years, statisticians have developed many flavors of the bootstrap to address different challenges in different settings. Some handle small samples better. Some are designed for dependent data like time series. Others shine when the assumptions of classic bootstrapping crumble (think clustered data or heteroskedasticity).</p>
<p>In this article, I’ll take a tour through the zoo of bootstrap methods: from the classic nonparametric bootstrap to the jackknife, parametric bootstrap, Bayesian bootstrap, wild bootstrap, moving block bootstrap, and more. I’ll explore where each method shines, where it stumbles, and how to pick the right one for your problem. As usual, I won’t just throw formulas at you. The focus here is on understanding <em>why</em> these methods work, not just how to mechanically apply them. There is also plenty of <code>R</code> and <code>Python</code> code to illustrate each method in action.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>We have data <img src="https://latex.codecogs.com/png.latex?%5C%7BY_1,%20Y_2,%20%5Cdots,%20Y_n%5C%7D">, where <img src="https://latex.codecogs.com/png.latex?Y_i"> are independent and identically distributed (i.i.d.) random variables drawn from some unknown distribution <img src="https://latex.codecogs.com/png.latex?F">. We’re interested in estimating some parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20=%20T(F),"> like the mean, median, regression coefficients, or a more complicated functional.</p>
<p>Our estimator of <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> from the observed sample is <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%20=%20T(%5Chat%7BF%7D_n),"> where <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_n"> is the empirical distribution function that puts mass <img src="https://latex.codecogs.com/png.latex?1/n"> on each observed data point.</p>
<p>The big question is: <em>How variable is <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D">?</em> And that’s where the bootstrap comes in. Regardless of the type of bootstrap, given <img src="https://latex.codecogs.com/png.latex?B"> resampled estimates <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%5E*_1,%20%5Cdots,%20%5Chat%7B%5Ctheta%7D%5E*_B">, the variance of <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D"> is estimated as: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7BV%7D_%7B%5Ctext%7Bboot%7D%7D%20=%20%5Cfrac%7B1%7D%7BB%20-%201%7D%20%5Csum_%7Bb=1%7D%5EB%20%5Cleft(%20%5Chat%7B%5Ctheta%7D%5E*_b%20-%20%5Cbar%7B%5Ctheta%7D%5E*%20%5Cright)%5E2,%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Cbar%7B%5Ctheta%7D%5E*%20=%20%5Cfrac%7B1%7D%7BB%7D%20%5Csum_%7Bb=1%7D%5EB%20%5Chat%7B%5Ctheta%7D%5E*_b"> is the average of the bootstrap estimates.</p>
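<p>A minimal sketch of the variance formula above in Python, using only the standard library. The helper name <code>bootstrap_variance</code> and the toy numbers are illustrative assumptions, not part of any package.</p>

```python
import statistics


def bootstrap_variance(estimates):
    """Variance of B resampled estimates with the B - 1 divisor.

    Requires at least two estimates.
    """
    B = len(estimates)
    theta_bar = sum(estimates) / B  # average of the resampled estimates
    return sum((t - theta_bar) ** 2 for t in estimates) / (B - 1)


# Toy input standing in for B = 4 resampled estimates (made-up numbers).
toy_estimates = [1.0, 2.0, 3.0, 4.0]
print(bootstrap_variance(toy_estimates))
print(statistics.variance(toy_estimates))  # the standard library agrees
```

<p>In practice this is just the sample variance of the resampled estimates, which is why the code later in the post can call <code>var</code> or <code>np.var(..., ddof=1)</code> directly.</p>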
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="the-jackknife" class="level3">
<h3 class="anchored" data-anchor-id="the-jackknife">The Jackknife</h3>
<p>Let’s start with the jackknife, developed back in the 1950s by Quenouille and popularized by Tukey. The jackknife isn’t technically a bootstrap, but it’s often the gateway to resampling methods. Here is how it works:</p>
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm:">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm:
</div>
</div>
<div class="callout-body-container callout-body">
<p>For <img src="https://latex.codecogs.com/png.latex?i%20=%201,%20%5Cdots,%20n">:</p>
<ol type="1">
<li>Drop observation <img src="https://latex.codecogs.com/png.latex?i"> from the sample.</li>
<li>Recompute the estimate, <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D_%7B(i)%7D%20=%20T(%5Chat%7BF%7D_%7Bn,-i%7D)">, on the remaining <img src="https://latex.codecogs.com/png.latex?n-1"> observations.</li>
</ol>
</div>
</div>
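<p>Here <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_%7Bn,-i%7D"> is the empirical distribution leaving out the <img src="https://latex.codecogs.com/png.latex?i">-th observation. We then use the variability across these “leave-one-out” estimates to approximate the variance of <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D">. Because the leave-one-out samples share almost all of their data, their estimates vary far less than independent resamples would, so the sum of squared deviations is scaled by <img src="https://latex.codecogs.com/png.latex?(n-1)/n"> rather than divided by the number of estimates minus one as in the formula above: <img src="https://latex.codecogs.com/png.latex?%5Chat%7BV%7D_%7B%5Ctext%7Bjack%7D%7D%20=%20%5Cfrac%7Bn-1%7D%7Bn%7D%20%5Csum_%7Bi=1%7D%5En%20%5Cleft(%20%5Chat%7B%5Ctheta%7D_%7B(i)%7D%20-%20%5Cbar%7B%5Ctheta%7D_%7B(%5Ccdot)%7D%20%5Cright)%5E2,"> where <img src="https://latex.codecogs.com/png.latex?%5Cbar%7B%5Ctheta%7D_%7B(%5Ccdot)%7D"> is the average of the leave-one-out estimates.</p>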
<p>The jackknife works well for smooth statistics like the mean or regression coefficients. But it can fail miserably for non-smooth functionals like the median or quantiles.</p>
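<p>The contrast between smooth and non-smooth statistics can be sketched as follows. This is an illustration under assumptions of my own (standard normal data, <img src="https://latex.codecogs.com/png.latex?n%20=%20100">, a hypothetical helper named <code>jackknife</code>), using the usual jackknife scaling of <img src="https://latex.codecogs.com/png.latex?(n-1)/n"> times the sum of squared deviations.</p>

```python
import random
import statistics


def jackknife(data, stat):
    """Return (variance estimate, leave-one-out estimates) for `stat`."""
    n = len(data)
    # Recompute the statistic n times, each time leaving out one observation.
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    loo_bar = sum(loo) / n
    variance = (n - 1) / n * sum((t - loo_bar) ** 2 for t in loo)
    return variance, loo


rng = random.Random(1988)
y = [rng.gauss(0.0, 1.0) for _ in range(100)]

var_mean, loo_means = jackknife(y, statistics.mean)
var_median, loo_medians = jackknife(y, statistics.median)

# Smooth statistic: the jackknife reproduces the textbook s^2 / n.
print(var_mean, statistics.variance(y) / len(y))

# Non-smooth statistic: with an even sample size and no ties, the
# leave-one-out medians only ever take two distinct values.
print(len(set(loo_medians)))
```

<p>With so little variation among the leave-one-out medians, the resulting variance estimate depends almost entirely on the gap between the two middle observations, which is why the jackknife is unreliable for quantiles.</p>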
<p><strong>Strengths:</strong> Fast, easy to implement, no randomness involved.</p>
<p><strong>Weaknesses:</strong> Limited to statistics that are smooth in the data. Doesn’t handle complex dependency structures or non-smooth parameters well.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb1-2">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb1-3">jackknife_estimates <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sapply</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(y), <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(i) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(y[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>i]))</span>
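<span id="cb1-4">jackknife_variance <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(y) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(y) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>((jackknife_estimates <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(jackknife_estimates))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># (n - 1) / n times the sum of squared deviations</span></span>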
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(jackknife_variance)</span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-2">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb2-3">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.normal(size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb2-4">jackknife_estimates <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array([np.mean(np.delete(y, i)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(y))])</span>
<span id="cb2-5">jackknife_variance <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(y) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(y) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>((jackknife_estimates <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> jackknife_estimates.mean()) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># (n - 1) / n times the sum of squared deviations</span></span>
<span id="cb2-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(jackknife_variance)</span></code></pre></div></div>
</div>
</div>
</div>
<hr>
</section>
<section id="classic-nonparametric-bootstrap" class="level3">
<h3 class="anchored" data-anchor-id="classic-nonparametric-bootstrap">Classic Nonparametric Bootstrap</h3>
<p>The classic bootstrap, introduced by Bradley Efron in 1979, takes the idea of resampling and turns it up a notch. Instead of dropping one observation at a time, we repeatedly resample <em>with replacement</em> from our data to create many “new” datasets, each the same size as the original.</p>
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm:">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm:
</div>
</div>
<div class="callout-body-container callout-body">
<p>For each bootstrap sample <img src="https://latex.codecogs.com/png.latex?b%20=%201,%20%5Cdots,%20B">:</p>
<ol type="1">
<li>Sample <img src="https://latex.codecogs.com/png.latex?n"> observations with replacement from your data.</li>
<li>Compute the statistic <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%5E*_b%20=%20T(%5Chat%7BF%7D%5E*_b)">.</li>
</ol>
</div>
</div>
<p><strong>Strengths:</strong> Flexible, broadly applicable, and it also works for many non-smooth statistics such as the median.</p>
<p><strong>Weaknesses:</strong> Can struggle with small samples or dependent data (like time series), because resampling individual observations with replacement treats them as i.i.d.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb3-2">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb3-3">B <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span></span>
<span id="cb3-4">boot_means <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replicate</span>(B, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)))</span>
<span id="cb3-5">boot_variance <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">var</span>(boot_means)</span>
<span id="cb3-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(boot_variance)</span></code></pre></div></div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb4-2">B <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span></span>
<span id="cb4-3">boot_means <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.mean(np.random.choice(y, size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(y), replace<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> _ <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(B)]</span>
<span id="cb4-4">boot_variance <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.var(boot_means, ddof<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb4-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(boot_variance)</span></code></pre></div></div>
</div>
</div>
</div>
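<p>The same recipe carries over to statistics other than the mean. The sketch below, in standard-library Python, applies it to the median and also reads off a simple percentile interval from the resampled estimates. The helper name <code>bootstrap_estimates</code>, the simulated data, and the choice of a 95% percentile interval are assumptions made for illustration.</p>

```python
import random
import statistics


def bootstrap_estimates(data, stat, B=1000, seed=1988):
    """Recompute `stat` on B resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(B)]


data_rng = random.Random(1988)
y = [data_rng.gauss(0.0, 1.0) for _ in range(100)]

boot_medians = bootstrap_estimates(y, statistics.median)
boot_variance = statistics.variance(boot_medians)

# Percentile interval: the 2.5% and 97.5% quantiles of the resampled medians.
cuts = statistics.quantiles(boot_medians, n=40)  # cut points at 2.5%, 5%, ...
lower, upper = cuts[0], cuts[-1]

print(boot_variance)
print(lower, upper)
```

<p>Only the <code>stat</code> argument changes between statistics, which is much of the appeal of the nonparametric bootstrap.</p>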
<hr>
</section>
<section id="parametric-bootstrap" class="level3">
<h3 class="anchored" data-anchor-id="parametric-bootstrap">Parametric Bootstrap</h3>
<p>The parametric bootstrap is a natural extension of the classic idea with a minor twist. Instead of sampling from the empirical distribution <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_n">, you <em>assume a parametric model</em> <img src="https://latex.codecogs.com/png.latex?F_%5Ctheta"> for the data, fit it to the sample, and then generate new data from the fitted model.</p>
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm:">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm:
</div>
</div>
<div class="callout-body-container callout-body">
<p>For each bootstrap sample <img src="https://latex.codecogs.com/png.latex?b%20=%201,%20%5Cdots,%20B">:</p>
<ol type="1">
<li>Sample <img src="https://latex.codecogs.com/png.latex?n"> observations from the fitted model <img src="https://latex.codecogs.com/png.latex?F_%7B%5Chat%7B%5Ctheta%7D%7D">.</li>
<li>Compute the statistic <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%5E*_b%20=%20T(%5Chat%7BF%7D%5E*_b)">.</li>
</ol>
</div>
</div>
<p>For example, if you assume <img src="https://latex.codecogs.com/png.latex?Y_i%20%5Csim%20N(%5Cmu,%20%5Csigma%5E2)">, estimate <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cmu%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Csigma%7D%5E2">, and then generate bootstrap samples from <img src="https://latex.codecogs.com/png.latex?N(%5Chat%7B%5Cmu%7D,%20%5Chat%7B%5Csigma%7D%5E2)">.</p>
<p>The parametric bootstrap can be a good idea when you trust your parametric model (or at least trust it more than the empirical distribution) and want to leverage that structure.</p>
<p><strong>Strengths:</strong> More efficient than nonparametric bootstrap if the model is well-specified. Can handle small samples better.</p>
<p><strong>Weaknesses:</strong> Garbage in, garbage out—if the parametric model is wrong, so are your bootstrap results.</p>
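<p>To make the misspecification risk concrete, here is a hedged sketch in standard-library Python. It assumes skewed data (exponential draws) and deliberately fits the wrong model (a normal distribution), then compares the resampled medians with those from the nonparametric bootstrap. The data-generating process, sample size, and variable names are all choices made for this illustration.</p>

```python
import random
import statistics

rng = random.Random(1988)
n, B = 200, 1000

# Skewed data: Exponential(1) draws (an assumption made for illustration).
y = [rng.expovariate(1.0) for _ in range(n)]

# Parametric bootstrap under a (wrong) normal model fitted to the data.
mu_hat, sigma_hat = statistics.mean(y), statistics.stdev(y)
param_medians = [
    statistics.median([rng.gauss(mu_hat, sigma_hat) for _ in range(n)])
    for _ in range(B)
]

# Nonparametric bootstrap on the same data, for comparison.
nonparam_medians = [statistics.median(rng.choices(y, k=n)) for _ in range(B)]

print(statistics.mean(param_medians), statistics.variance(param_medians))
print(statistics.mean(nonparam_medians), statistics.variance(nonparam_medians))
```

<p>A normal model is symmetric, so its resampled medians cluster around the sample mean, whereas the median of right-skewed data sits below the mean. The two sets of resampled medians therefore describe different quantities, and the parametric results inherit the model’s error.</p>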
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb5-2">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb5-3">mu_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(y)</span>
<span id="cb5-4">sigma_hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(y)</span>
<span id="cb5-5">param_boot_means <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replicate</span>(B, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, mu_hat, sigma_hat)))</span>
<span id="cb5-6">param_boot_variance <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">var</span>(param_boot_means)</span>
<span id="cb5-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(param_boot_variance)</span></code></pre></div></div>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1">mu_hat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.mean(y)</span>
<span id="cb6-2">sigma_hat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.std(y, ddof<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb6-3">param_boot_means <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [np.mean(np.random.normal(mu_hat, sigma_hat, size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(y))) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> _ <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(B)]</span>
<span id="cb6-4">param_boot_variance <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.var(param_boot_means, ddof<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb6-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(param_boot_variance)</span></code></pre></div></div>
</div>
</div>
</div>
<hr>
</section>
<section id="bayesian-bootstrap" class="level3">
<h3 class="anchored" data-anchor-id="bayesian-bootstrap">Bayesian Bootstrap</h3>
<p>Invented by Rubin in 1981, the Bayesian bootstrap doesn’t resample data points directly. Instead, it keeps every observation and draws a random weight for each one from a <em>Dirichlet distribution</em>.</p>
<p>Whereas the classical bootstrap simulates sampling from a population by creating new samples from the observed data, the Bayesian bootstrap simulates uncertainty about the population distribution itself using the Bayesian framework—specifically by placing a nonparametric prior over the unknown distribution (implicitly, a Dirichlet process prior).</p>
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm:">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm:
</div>
</div>
<div class="callout-body-container callout-body">
<p>For each bootstrap replicate <img src="https://latex.codecogs.com/png.latex?b%20=%201,%20%5Cdots,%20B">:</p>
<ol type="1">
<li>Draw weights <img src="https://latex.codecogs.com/png.latex?(w_1%5E%7B(b)%7D,%20%5Cdots,%20w_n%5E%7B(b)%7D)%20%5Csim%20%5Ctext%7BDirichlet%7D(1,%20%5Cdots,%201)">.</li>
<li>Construct the weighted empirical distribution <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D%5E*_b%20=%20%5Csum_%7Bi=1%7D%5En%20w_i%5E%7B(b)%7D%20%5Cdelta_%7BY_i%7D">, where <img src="https://latex.codecogs.com/png.latex?%5Cdelta_%7BY_i%7D"> is a point mass at observation <img src="https://latex.codecogs.com/png.latex?Y_i">.</li>
<li>Compute the weighted statistic: <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%5E*_b%20=%20T(%5Chat%7BF%7D%5E*_b)">.</li>
</ol>
</div>
</div>
<p><strong>Strengths:</strong> Smooth, avoids ties from discrete resampling, easy to implement.</p>
<p><strong>Weaknesses:</strong> Interpretation may feel less intuitive if you’re used to classical frequentist bootstrap.</p>
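<p>The weight-drawing step in the algorithm above can be sketched without extra packages. It uses the standard fact that independent Exponential(1) draws divided by their sum are distributed as Dirichlet(1, …, 1). The helper name <code>dirichlet_uniform_weights</code> and the simulated data are assumptions made for illustration.</p>

```python
import random
import statistics


def dirichlet_uniform_weights(n, rng):
    """Draw one Dirichlet(1, ..., 1) vector by normalizing Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(1988)
y = [rng.gauss(0.0, 1.0) for _ in range(100)]
B = 1000

bayes_boot_means = []
for _ in range(B):
    w = dirichlet_uniform_weights(len(y), rng)
    # Weighted mean: every observation contributes, none is dropped.
    bayes_boot_means.append(sum(wi * yi for wi, yi in zip(w, y)))

print(statistics.variance(bayes_boot_means))
print(statistics.variance(y) / len(y))  # classical s^2 / n, for reference
```

<p>Since every weight is strictly positive, no observation is ever left out of a replicate, unlike in the classic bootstrap where a given point is missing from roughly a third of the resamples.</p>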
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(MCMCpack)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># for rdirichlet</span></span>
<span id="cb7-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb7-3">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb7-4">B <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span></span>
<span id="cb7-5">bayes_boot_means <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replicate</span>(B, {</span>
<span id="cb7-6">  weights <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rdirichlet</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(y))))</span>
<span id="cb7-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(weights <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> y)</span>
<span id="cb7-8">})</span>
<span id="cb7-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">var</span>(bayes_boot_means)</span></code></pre></div></div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy.stats <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> dirichlet</span>
<span id="cb8-2">bayes_boot_means <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb8-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> _ <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(B):</span>
<span id="cb8-4">    weights <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dirichlet.rvs([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(y))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb8-5">    bayes_boot_means.append(np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(weights <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> y))</span>
<span id="cb8-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(np.var(bayes_boot_means, ddof<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span></code></pre></div></div>
</div>
</div>
</div>
<hr>
</section>
<section id="wild-bootstrap" class="level3">
<h3 class="anchored" data-anchor-id="wild-bootstrap">Wild Bootstrap</h3>
<p>The wild bootstrap is a lifesaver when dealing with <em>heteroskedasticity</em> or few clusters. Rather than resampling entire observations (which changes the design matrix from draw to draw) or reshuffling residuals across observations (which severs the link between each error’s variance and its covariates), the wild bootstrap keeps the design matrix <img src="https://latex.codecogs.com/png.latex?X"> fixed and perturbs each residual in place, so every observation keeps its own error scale. Some versions perturb the score function instead of the residuals.</p>
<p>Suppose you’re estimating a regression model: <img src="https://latex.codecogs.com/png.latex?%0AY_i%20=%20X_i%20%5Cbeta%20+%20%5Cvarepsilon_i.%0A"></p>
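<p>First, fit the model on the original sample to obtain <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cbeta%7D"> and the residuals <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cvarepsilon%7D_i%20=%20Y_i%20-%20X_i%20%5Chat%7B%5Cbeta%7D">. Then, you proceed as follows:</p>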
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm: Wild Bootstrap">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm: Wild Bootstrap
</div>
</div>
<div class="callout-body-container callout-body">
<p>For each bootstrap replicate <img src="https://latex.codecogs.com/png.latex?b%20=%201,%20%5Cdots,%20B">:</p>
<ol type="1">
<li>Generate a new outcome variable by perturbing the residuals: <img src="https://latex.codecogs.com/png.latex?%0AY%5E*_i%20=%20X_i%20%5Chat%7B%5Cbeta%7D%20+%20v_i%20%5Chat%7B%5Cvarepsilon%7D_i,%0A"></li>
</ol>
<p>where the <img src="https://latex.codecogs.com/png.latex?v_i"> are random variables, drawn independently of the data, with mean zero and variance one (e.g., Rademacher random variables taking the values <img src="https://latex.codecogs.com/png.latex?%5Cpm1"> with probability <img src="https://latex.codecogs.com/png.latex?0.5"> each).</p>
<ol start="2" type="1">
<li>Refit the model using the perturbed outcomes and compute the statistic <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%5E*_b%20=%20T(%5Chat%7BF%7D%5E*_b)">.</li>
</ol>
</div>
</div>
<p><strong>Strengths:</strong> Handles heteroskedasticity gracefully, robust in small-sample settings.</p>
<p><strong>Weaknesses:</strong> Mostly designed for regression contexts. Choice of perturbation distribution matters.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-5-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-1" aria-controls="tabset-5-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-2" aria-controls="tabset-5-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-5-1" class="tab-pane active" aria-labelledby="tabset-5-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb9-2">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb9-3">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(x))</span>
<span id="cb9-4">model <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x)</span>
<span id="cb9-5">residuals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">resid</span>(model)</span>
<span id="cb9-6">predicted <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fitted</span>(model)</span>
<span id="cb9-7">B <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span></span>
<span id="cb9-8">wild_means <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replicate</span>(B, {</span>
<span id="cb9-9">  v <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(residuals), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb9-10">  y_star <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> predicted <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> v <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> residuals</span>
<span id="cb9-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(y_star <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> x))[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span>
<span id="cb9-12">})</span>
<span id="cb9-13"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">var</span>(wild_means)</span></code></pre></div></div>
</div>
<div id="tabset-5-2" class="tab-pane" aria-labelledby="tabset-5-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> sklearn.linear_model <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> LinearRegression</span>
<span id="cb10-2">x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.normal(size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>).reshape(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb10-3">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x.flatten() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> np.random.normal(scale<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>np.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(x.flatten()))</span>
<span id="cb10-4">model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LinearRegression().fit(x, y)</span>
<span id="cb10-5">residuals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> model.predict(x)</span>
<span id="cb10-6">predicted <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> model.predict(x)</span>
<span id="cb10-7">wild_boot_coefs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb10-8"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> _ <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(B):</span>
<span id="cb10-9">    v <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.choice([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(residuals))</span>
<span id="cb10-10">    y_star <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> predicted <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> v <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> residuals</span>
<span id="cb10-11">    coef <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> LinearRegression().fit(x, y_star).coef_[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb10-12">    wild_boot_coefs.append(coef)</span>
<span id="cb10-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(np.var(wild_boot_coefs, ddof<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span></code></pre></div></div>
</div>
</div>
</div>
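<p>Since the choice of perturbation distribution matters, here is a minimal sketch that swaps the weights while keeping everything else fixed. It compares Rademacher weights with Mammen’s two-point distribution, a common alternative that is not discussed above and that I bring in purely for illustration. The helper names (<code>rademacher</code>, <code>mammen</code>, <code>wild_bootstrap_slope_var</code>) are my own and the data are simulated as in the example above.</p>

```python
import numpy as np

rng = np.random.default_rng(1988)

def rademacher(n, rng):
    """Draw n weights equal to -1 or +1, each with probability 1/2."""
    return rng.choice([-1.0, 1.0], size=n)

def mammen(n, rng):
    """Draw n weights from Mammen's two-point distribution
    (mean 0, variance 1, third moment 1)."""
    s5 = np.sqrt(5.0)
    values = np.array([-(s5 - 1) / 2, (s5 + 1) / 2])
    probs = np.array([(s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5)])
    return rng.choice(values, size=n, p=probs)

def wild_bootstrap_slope_var(x, y, draw_weights, B=1000, rng=rng):
    """Wild-bootstrap variance of the OLS slope, holding x fixed."""
    X = np.column_stack([np.ones_like(x), x])
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta_hat
    resid = y - fitted
    slopes = np.empty(B)
    for b in range(B):
        # Perturb each residual in place; the design matrix never changes.
        y_star = fitted + draw_weights(len(y), rng) * resid
        slopes[b] = np.linalg.lstsq(X, y_star, rcond=None)[0][1]
    return slopes.var(ddof=1)

# Heteroskedastic data: the error scale grows with |x|.
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=np.abs(x))

var_rademacher = wild_bootstrap_slope_var(x, y, rademacher)
var_mammen = wild_bootstrap_slope_var(x, y, mammen)
print(var_rademacher, var_mammen)
```

<p>Both weight distributions have mean zero and variance one, so they target the same variance; they differ in higher moments, which is where the finite-sample behavior can diverge.</p>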
<hr>
</section>
<section id="cluster-bootstrap" class="level3">
<h3 class="anchored" data-anchor-id="cluster-bootstrap">Cluster Bootstrap</h3>
<p>The cluster bootstrap is essential when working with clustered data, where observations within the same group (e.g., students in schools, workers in firms) may be correlated. Unlike standard bootstrap methods that resample individuals, the cluster bootstrap resamples <em>entire clusters</em>, preserving the internal dependence structure of the data.</p>
<p>Suppose you’re estimating a model like:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AY_%7Big%7D%20=%20X_%7Big%7D%20%5Cbeta%20+%20%5Cvarepsilon_%7Big%7D,%0A"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?g"> indexes clusters and <img src="https://latex.codecogs.com/png.latex?i"> indexes observations within cluster <img src="https://latex.codecogs.com/png.latex?g">.</p>
<p>The cluster bootstrap generates resampled datasets as follows:</p>
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm: Cluster Bootstrap">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm: Cluster Bootstrap
</div>
</div>
<div class="callout-body-container callout-body">
<p>For each bootstrap sample <img src="https://latex.codecogs.com/png.latex?b%20=%201,%20%5Cdots,%20B">:</p>
<ol type="1">
<li>Sample clusters <img src="https://latex.codecogs.com/png.latex?g"> from your data with replacement and include <em>all observations</em> <img src="https://latex.codecogs.com/png.latex?i"> from each selected cluster.</li>
<li>Compute the statistic <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%5E*_b%20=%20T(%5Chat%7BF%7D%5E*_b)">.</li>
</ol>
</div>
</div>
<p><strong>Strengths:</strong> Simple to implement, preserves cluster dependence, consistent under many forms of within-cluster correlation.</p>
<p><strong>Weaknesses:</strong> Requires a reasonably large number of clusters (a common rule of thumb is <img src="https://latex.codecogs.com/png.latex?G%20%5Cgeq%2030">). Can be biased or unstable with few clusters, in which case the wild cluster bootstrap studied by Cameron, Gelbach, and Miller (2008) is a common alternative. (Luckily, the United States was broken down into 50 states. The Swiss were not as fortunate.)</p>
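<p>The two steps of the cluster bootstrap can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions of my own: the data are simulated (40 clusters of 25 observations with a shared cluster-level shock), and the helper <code>ols_slope</code> is a hypothetical name, not part of any library.</p>

```python
import numpy as np

rng = np.random.default_rng(1988)

# Simulated clustered data (an assumption for illustration):
# G clusters of m observations, with cluster-level shocks in both x and y.
G, m = 40, 25
cluster = np.repeat(np.arange(G), m)
x = rng.normal(size=G * m) + rng.normal(size=G)[cluster]
y = 2 * x + rng.normal(size=G)[cluster] + rng.normal(size=G * m)

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Row indices belonging to each cluster, computed once.
rows_by_cluster = [np.flatnonzero(cluster == g) for g in range(G)]

B = 1000
cluster_boot_slopes = np.empty(B)
for b in range(B):
    # Step 1: draw G cluster labels with replacement and keep ALL rows
    # of every drawn cluster (a cluster drawn twice appears twice).
    drawn = rng.integers(0, G, size=G)
    idx = np.concatenate([rows_by_cluster[g] for g in drawn])
    # Step 2: recompute the statistic on the resampled dataset.
    cluster_boot_slopes[b] = ols_slope(x[idx], y[idx])

print(cluster_boot_slopes.var(ddof=1))
```

<p>Resampling whole clusters is what keeps the within-cluster correlation intact; resampling individual rows here would treat the 1,000 observations as independent.</p>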
<hr>
</section>
<section id="moving-block-bootstrap" class="level3">
<h3 class="anchored" data-anchor-id="moving-block-bootstrap">Moving Block Bootstrap</h3>
<p>If your data are <em>dependent</em>, like time series, the classic bootstrap fails because it breaks the correlation structure. The moving block bootstrap fixes this by resampling blocks of adjacent observations instead of individual data points. You can easily see how this makes sense for time series: you want to maintain the local dependence structure while still resampling.</p>
<p>You choose a block length <img src="https://latex.codecogs.com/png.latex?l"> and create overlapping blocks of data: <img src="https://latex.codecogs.com/png.latex?%0A%5C%7BY_1,%20%5Cdots,%20Y_l%5C%7D,%20%5C%7BY_2,%20%5Cdots,%20Y_%7Bl+1%7D%5C%7D,%20%5Cdots,%20%5C%7BY_%7Bn-l+1%7D,%20%5Cdots,%20Y_n%5C%7D.%0A"></p>
<p>Then, you proceed as follows:</p>
<div class="callout callout-style-default callout-note callout-titled" title="Algorithm: Moving Block Bootstrap">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Algorithm: Moving Block Bootstrap
</div>
</div>
<div class="callout-body-container callout-body">
<p>For each bootstrap sample <img src="https://latex.codecogs.com/png.latex?b%20=%201,%20%5Cdots,%20B">:</p>
<ol type="1">
<li>Sample these blocks with replacement to form a new dataset.</li>
<li>Compute the statistic <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D%5E*_b%20=%20T(%5Chat%7BF%7D%5E*_b)">.</li>
</ol>
</div>
</div>
<p><strong>Strengths:</strong> Maintains local dependence within blocks.</p>
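<p><strong>Weaknesses:</strong> Choosing the block length is tricky: blocks that are too short fail to capture the dependence, while blocks that are too long leave too few distinct blocks to resample, so the bootstrap replicates vary too little.</p>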
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-6-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-1" aria-controls="tabset-6-1" aria-selected="true" href="">R</a></li><li class="nav-item"><a class="nav-link" id="tabset-6-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-2" aria-controls="tabset-6-2" aria-selected="false" href="">Python</a></li></ul>
<div class="tab-content">
<div id="tabset-6-1" class="tab-pane active" aria-labelledby="tabset-6-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(boot)</span>
<span id="cb11-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
<span id="cb11-3">y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arima.sim</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ar =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb11-4">block_length <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb11-5">B <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span></span>
<span id="cb11-6">block_boot_means <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tsboot</span>(y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">statistic =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(x), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">R =</span> B, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">l =</span> block_length, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fixed"</span>)</span>
<span id="cb11-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">var</span>(block_boot_means<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>t)</span></code></pre></div></div>
</div>
<div id="tabset-6-2" class="tab-pane" aria-labelledby="tabset-6-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> arch.bootstrap <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> MovingBlockBootstrap</span>
<span id="cb12-2">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1988</span>)</span>
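<span id="cb12-3"><span class="co" style="color: #5E5E5E; background-color: null; font-style: inherit;"># AR(1) series with coefficient 0.7, mirroring the R example</span></span>
<span id="cb12-3a">e <span class="op" style="color: #5E5E5E; background-color: null; font-style: inherit;">=</span> np.random.normal(size<span class="op" style="color: #5E5E5E; background-color: null; font-style: inherit;">=</span><span class="dv" style="color: #AD0000; background-color: null; font-style: inherit;">100</span>)</span>
<span id="cb12-3b">y <span class="op" style="color: #5E5E5E; background-color: null; font-style: inherit;">=</span> np.zeros(<span class="dv" style="color: #AD0000; background-color: null; font-style: inherit;">100</span>)</span>
<span id="cb12-3c"><span class="cf" style="color: #003B4F; background-color: null; font-weight: bold; font-style: inherit;">for</span> t <span class="kw" style="color: #003B4F; background-color: null; font-weight: bold; font-style: inherit;">in</span> <span class="bu" style="color: null; background-color: null; font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000; background-color: null; font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000; background-color: null; font-style: inherit;">100</span>):</span>
<span id="cb12-3d">    y[t] <span class="op" style="color: #5E5E5E; background-color: null; font-style: inherit;">=</span> <span class="fl" style="color: #AD0000; background-color: null; font-style: inherit;">0.7</span> <span class="op" style="color: #5E5E5E; background-color: null; font-style: inherit;">*</span> y[t <span class="op" style="color: #5E5E5E; background-color: null; font-style: inherit;">-</span> <span class="dv" style="color: #AD0000; background-color: null; font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E; background-color: null; font-style: inherit;">+</span> e[t]</span>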
<span id="cb12-4">block_length <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb12-5">bs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> MovingBlockBootstrap(block_length, y)</span>
<span id="cb12-6">boot_means <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array([np.mean(data[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> data <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> bs.bootstrap(B)])</span>
<span id="cb12-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(np.var(boot_means, ddof<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span></code></pre></div></div>
</div>
</div>
</div>
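<p>To see what the library calls are doing, here is a hand-rolled sketch of the moving block bootstrap in plain NumPy. It is an illustration under my own assumptions rather than a reimplementation of <code>tsboot</code> or <code>MovingBlockBootstrap</code>: the function name <code>moving_block_sample</code> is hypothetical, the series is a simulated AR(1) without burn-in, and the last block is simply truncated so the resampled series has the original length.</p>

```python
import numpy as np

rng = np.random.default_rng(1988)

# Simulated AR(1) series with coefficient 0.7 (no burn-in, for brevity).
n, phi = 100, 0.7
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]

def moving_block_sample(y, block_length, rng):
    """One moving-block resample of y, truncated to the original length."""
    n = len(y)
    n_blocks = int(np.ceil(n / block_length))
    # There are n - block_length + 1 overlapping blocks; draw their start
    # positions with replacement.
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    # Expand every start into block_length consecutive indices.
    idx = (starts[:, None] + np.arange(block_length)).ravel()[:n]
    return y[idx]

B = 1000
block_length = 5
mbb_means = np.array(
    [moving_block_sample(y, block_length, rng).mean() for _ in range(B)]
)
print(mbb_means.var(ddof=1))
```

<p>Within each block the observations stay in their original order, which is how the local dependence survives; the joins between blocks are where it is lost.</p>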
<hr>
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li>The bootstrap is not a single method—it’s a whole family of techniques, each with its own sweet spot.</li>
<li>The jackknife is fast and simple but struggles with non-smooth statistics.</li>
<li>The classic bootstrap works great for i.i.d. data and smooth or non-smooth statistics, but fails with dependence or small samples.</li>
<li>Specialized bootstraps (wild, cluster, block, Bayesian, subsampling) handle heteroskedasticity, clustering, dependence, and other real-world challenges that trip up the classic approach.</li>
</ul>
</section>
<section id="where-to-learn-more" class="level2">
<h2 class="anchored" data-anchor-id="where-to-learn-more">Where to Learn More</h2>
<p>Careful readers of this blog may have noticed that I frequently recommend Efron and Hastie’s <em>Computer Age Statistical Inference</em> for its modern perspective on statistical methods, including bootstrapping. While it’s an excellent and insightful text, it can be a bit too technical for many applied practitioners. If you’re looking for more approachable resources, I recommend exploring how various statistical software packages implement the bootstrap—Stata, in particular, offers great documentation and examples. You’ll also find high-quality lecture notes from advanced econometrics courses online that treat these topics with a contemporary lens. Finally, any of the references listed below will give you a solid grounding in bootstrap techniques.</p>
<hr>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Cameron, A. C., Gelbach, J. B., &amp; Miller, D. L. (2008). Bootstrap-based improvements for inference with clustered errors. <em>The Review of Economics and Statistics</em>, 90(3), 414–427.</p>
<p>Davidson, R., &amp; Flachaire, E. (2008). The wild bootstrap, tamed at last. <em>Journal of Econometrics</em>, 146(1), 162–169.</p>
<p>Davison, A. C., &amp; Hinkley, D. V. (1997). <em>Bootstrap Methods and Their Application</em>. Cambridge University Press.</p>
<p>Efron, B. (1979). Bootstrap methods: Another look at the jackknife. <em>Annals of Statistics</em>, 7(1), 1–26.</p>
<p>Efron, B., &amp; Hastie, T. (2021). <em>Computer Age Statistical Inference, Student Edition: Algorithms, Evidence, and Data Science</em> (Vol. 6). Cambridge University Press.</p>
<p>Lahiri, S. N. (2003). <em>Resampling Methods for Dependent Data</em>. Springer.</p>
<p>Rubin, D. B. (1981). The Bayesian bootstrap. <em>Annals of Statistics</em>, 9(1), 130–134.</p>


</section>

 ]]></description>
  <category>bootstrap</category>
  <category>statistical inference</category>
  <category>flavors</category>
  <guid>https://vyasenov.github.io/blog/flavors-bootstrap.html</guid>
  <pubDate>Mon, 26 May 2025 07:00:00 GMT</pubDate>
</item>
<item>
  <title>The Secret Life of Correlation: Myths and Thirteen Views</title>
  <link>https://vyasenov.github.io/blog/corr-myths-13-views.html</link>
  <description><![CDATA[ 





<div class="reading-time">7 min read</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>Statistical correlation has long captivated me—it’s probably the topic I’ve written about most on this blog. What makes it so compelling is the combination of theoretical richness and deceptive simplicity. In an age dominated by deep learning and opaque models, correlation remains a refreshingly transparent and interpretable quantity. When I encounter a new dataset, it’s often the first tool I reach for to explore relationships among variables.</p>
<p>Despite its familiarity, correlation is also one of the most frequently misunderstood and misapplied concepts in statistics. It seems straightforward: a value between –1 and 1 that quantifies the strength and direction of a relationship between two variables. But beneath that tidy number lies a complex web of assumptions, limitations, and interpretations—many of which are overlooked even by seasoned practitioners.</p>
<p>In this article, I revisit two insightful papers—van den Heuvel and Zhan (2022), and Rodgers and Nicewander (1988)—that peel back the layers of meaning surrounding correlation. My aim is to deepen our intuition and clear up common misconceptions about three of the most widely used correlation measures: Pearson’s <em>r</em>, Spearman’s <em>ρ</em>, and Kendall’s <em>τ</em>. Along the way, I’ll explore thirteen different lenses through which correlation can be understood.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Let <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> be two random variables with realizations <img src="https://latex.codecogs.com/png.latex?(x_i,%20y_i)"> for a random sample indexed by <img src="https://latex.codecogs.com/png.latex?i%20=%201,%20%5Cldots,%20n">. I assume all variables are centered (i.e., de-meaned) unless stated otherwise.</p>
<p>As a refresher, here are the three correlation coefficients I’ll focus on:</p>
<ul>
<li><strong>Pearson’s <img src="https://latex.codecogs.com/png.latex?r"></strong> is defined as: <img src="https://latex.codecogs.com/png.latex?r(X,Y)%20=%20%5Cfrac%7B%5Csum%20(x_i%20-%20%5Cbar%7Bx%7D)(y_i%20-%20%5Cbar%7By%7D)%7D%7B%5Csqrt%7B%5Csum%20(x_i%20-%20%5Cbar%7Bx%7D)%5E2%7D%20%5Csqrt%7B%5Csum%20(y_i%20-%20%5Cbar%7By%7D)%5E2%7D%7D."></li>
<li><strong>Spearman’s <img src="https://latex.codecogs.com/png.latex?%5Crho"></strong> is Pearson’s <img src="https://latex.codecogs.com/png.latex?r"> computed on the ranks of the data: <img src="https://latex.codecogs.com/png.latex?%5Crho(X,Y)=r(%5Ctext%7Brank%7D(X),%20%5Ctext%7Brank%7D(Y))."></li>
<li><strong>Kendall’s <img src="https://latex.codecogs.com/png.latex?%5Ctau"></strong> is based on the number of concordant and discordant pairs: <img src="https://latex.codecogs.com/png.latex?%5Ctau%20=%20%5Cfrac%7B%5C%23%5Ctext%7Bconcordant%7D%20-%20%5C%23%5Ctext%7Bdiscordant%7D%7D%7B%5Cbinom%7Bn%7D%7B2%7D%7D."></li>
</ul>
<p>Concordant pairs of observations refer to pairs where the ranks of both variables move in the same direction. For example, if one observation is higher than another in both variables, they are concordant. Conversely, discordant pairs occur when the ranks of the variables move in opposite directions; one observation is higher in one variable but lower in the other.</p>
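<p>The three definitions translate almost line by line into code. The sketch below is a from-scratch illustration, not a replacement for library routines: the function names are my own, and the rank helper assumes there are no ties.</p>

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r from the definition (centering done explicitly)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

def ranks(a):
    """Ranks 1..n of the entries of a; assumes no ties."""
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(1, len(a) + 1)
    return r

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's tau: (#concordant - #discordant) / (n choose 2)."""
    n = len(x)
    # sign((x_i - x_j)(y_i - y_j)) is +1 for a concordant pair, -1 for a
    # discordant one; the upper triangle counts each pair exactly once.
    s = np.sign(np.subtract.outer(x, x) * np.subtract.outer(y, y))
    return np.triu(s, k=1).sum() / (n * (n - 1) / 2)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
# Of the 10 pairs, 8 are concordant and 2 are discordant.
print(pearson_r(x, y), spearman_rho(x, y), kendall_tau(x, y))  # r = 0.8, rho = 0.8, tau = 0.6
```

<p>In this toy example the data are already ranks, which is why Pearson’s and Spearman’s coefficients coincide.</p>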
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="some-myths" class="level3">
<h3 class="anchored" data-anchor-id="some-myths">Some Myths</h3>
<p>Pearson’s <img src="https://latex.codecogs.com/png.latex?r"> is traditionally described as a measure of linear association, while Spearman’s <img src="https://latex.codecogs.com/png.latex?%5Crho"> and Kendall’s <img src="https://latex.codecogs.com/png.latex?%5Ctau"> are thought to capture monotonic relationships. This textbook distinction often leads analysts to default to rank-based methods when faced with nonlinear relationships. But as appealing as this neat categorization may be, it oversimplifies the reality.</p>
<p>Van den Heuvel and Zhan (2022) challenge this conventional wisdom. They argue that none of these three correlation coefficients is intrinsically limited to detecting “linear” or “monotonic” associations. Instead, their sensitivity depends on the underlying distributional structure, presence of heteroskedasticity, and even how the data were transformed. Through carefully constructed counterexamples, they demonstrate that Pearson’s <img src="https://latex.codecogs.com/png.latex?r"> can sometimes outperform Spearman’s <img src="https://latex.codecogs.com/png.latex?%5Crho"> and Kendall’s <img src="https://latex.codecogs.com/png.latex?%5Ctau"> even when the association is nonlinear. Conversely, rank-based methods can be more powerful than <img src="https://latex.codecogs.com/png.latex?r"> even when the association is linear, particularly in distributions outside the bivariate normal family.</p>
<p>Another persistent myth is that rank correlations are categorically “more robust.” While it’s true that <img src="https://latex.codecogs.com/png.latex?%5Crho"> and <img src="https://latex.codecogs.com/png.latex?%5Ctau"> are less sensitive to outliers in marginal distributions, this robustness has limits. Rank-based methods can still underperform or behave erratically in the presence of non-monotonic relationships or certain forms of heteroskedasticity. For instance, a <img src="https://latex.codecogs.com/png.latex?U">-shaped relationship will likely elude all three measures.</p>
</section>
<section id="new-framework-for-association" class="level3">
<h3 class="anchored" data-anchor-id="new-framework-for-association">New Framework for Association</h3>
<p>To overcome these misconceptions and some of the counterexamples previously suggested in the literature, the authors propose a more nuanced framework for understanding linear and monotonic associations. They offer the following extended definitions:</p>
<p><strong>Linear Association:</strong> <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are linearly associated if there exist known monotone functions <img src="https://latex.codecogs.com/png.latex?%5Cphi(%5Ccdot)"> and <img src="https://latex.codecogs.com/png.latex?%5Cpsi(%5Ccdot)"> such that: <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5B%5Cpsi(Y)%20%5Cmid%20%5Cphi(X)%5D%20=%20%5Cbeta_0%20+%20%5Cbeta_1%20%5Cphi(X)."></p>
<p>Similarly,</p>
<p><strong>Monotonic Association:</strong> <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are monotonically associated if there exist two potentially unknown monotonic functions <img src="https://latex.codecogs.com/png.latex?%5Cphi(%5Ccdot)"> and <img src="https://latex.codecogs.com/png.latex?%5Cpsi(%5Ccdot)"> such that <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BE%7D%5B%5Cpsi(Y)%20%5Cmid%20%5Cphi(X)%5D%20=%20%5Cphi(X)."></p>
<p>Under these updated definitions, the conventional understanding of which correlation coefficient is best suited for linear or monotonic relationships holds better ground. These definitions capture a richer set of relationships by accounting for transformations, rather than relying on raw scale comparisons. They also emphasize the importance of conditional expectation as the lens through which to define association, rather than relying solely on scatter plot geometry or regression output.</p>
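<p>Here is a small illustration of why transformations matter, under assumptions of my own choosing: <img src="https://latex.codecogs.com/png.latex?X"> is standard normal and <img src="https://latex.codecogs.com/png.latex?%5Clog%20Y"> is linear in <img src="https://latex.codecogs.com/png.latex?X"> plus noise. The rank-based coefficients are unchanged by the monotone log transform, while Pearson’s <img src="https://latex.codecogs.com/png.latex?r"> is noticeably larger on the scale where the conditional mean is linear.</p>

```python
import numpy as np
from scipy import stats

# Assumed data-generating process: log(Y) = X + noise
rng = np.random.default_rng(7)
x = rng.normal(size=2000)
y = np.exp(x + rng.normal(scale=0.3, size=2000))

# Pearson's r on the raw scale and after the log transform
r_raw, _ = stats.pearsonr(x, y)
r_log, _ = stats.pearsonr(x, np.log(y))

# Rank-based coefficients depend only on the ordering of the data
rho_raw, _ = stats.spearmanr(x, y)
rho_log, _ = stats.spearmanr(x, np.log(y))
tau_raw, _ = stats.kendalltau(x, y)
tau_log, _ = stats.kendalltau(x, np.log(y))

print(f"Pearson:  raw = {r_raw:.3f}, log scale = {r_log:.3f}")
print(f"Spearman: raw = {rho_raw:.3f}, log scale = {rho_log:.3f}")
print(f"Kendall:  raw = {tau_raw:.3f}, log scale = {tau_log:.3f}")
```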
<p>Overall, what becomes clear is that no correlation coefficient offers a complete or universally superior summary of association. Each captures different aspects of dependence. They are tools, not truths—and should be interpreted in context. Visualizations and complementary diagnostic tests remain indispensable.</p>
</section>
<section id="thirteen-ways-to-look-at-pearsons-r" class="level3">
<h3 class="anchored" data-anchor-id="thirteen-ways-to-look-at-pearsons-r">Thirteen Ways to Look at Pearson’s <img src="https://latex.codecogs.com/png.latex?r"></h3>
<p>If this wasn’t enough for you, Rodgers and Nicewander (1988) offer a brilliant framing of correlation by listing thirteen distinct ways to interpret Pearson’s <img src="https://latex.codecogs.com/png.latex?r">. Here’s a quick tour, each providing a slightly different angle:</p>
<ol type="1">
<li><strong>As a measure of standardized covariance</strong>, it tells you how two variables co-vary after accounting for their units.</li>
<li><strong>As a regression slope between standardized variables</strong>, it equals the slope of the line predicting <img src="https://latex.codecogs.com/png.latex?z">-scored <img src="https://latex.codecogs.com/png.latex?Y"> from <img src="https://latex.codecogs.com/png.latex?z">-scored <img src="https://latex.codecogs.com/png.latex?X">.</li>
<li><strong>As the centered and standardized sum</strong> of cross-product of two variables. This is merely the definition of Pearson’s <img src="https://latex.codecogs.com/png.latex?r"> shown above.</li>
<li><strong>As the cosine of the angle between two vectors</strong>, showing their geometric alignment.</li>
<li><strong>As a geometric mean of the two regression slopes</strong>. It equals the square root of the product of the slopes of the regression of <img src="https://latex.codecogs.com/png.latex?Y"> on <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?X"> on <img src="https://latex.codecogs.com/png.latex?Y">.</li>
<li><strong>As a square root of the ratio of two variances</strong>, where <img src="https://latex.codecogs.com/png.latex?r%5E2"> is the proportion of variance in <img src="https://latex.codecogs.com/png.latex?Y"> explained by <img src="https://latex.codecogs.com/png.latex?X"> by linear regression.</li>
<li><strong>As a function of the angle between the two standardized regression lines</strong>, where it equals the secant of the angle between the two lines plus or minus its tangent (for positive <img src="https://latex.codecogs.com/png.latex?r">, the secant minus the tangent).</li>
<li><strong>As an average cross-product of standardized variables</strong>, which is obtained by dividing both the numerator and the denominator by the product of the two sample standard deviations.</li>
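<li><strong>As a rescaled variance of the difference between standardized scores</strong>: <img src="https://latex.codecogs.com/png.latex?r%20=%201%20-%20%5Cfrac%7B1%7D%7B2%7Ds%5E2_%7Bz_Y%20-%20z_X%7D">, where the variance is taken over the differences between the <img src="https://latex.codecogs.com/png.latex?z">-scored variables.</li>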
<li><strong>As a balloon rule</strong>: A visual approximation of <img src="https://latex.codecogs.com/png.latex?r"> using the ellipse-shaped scatterplot “balloon” width and height.</li>
<li><strong>As a geometric property of elliptical contours (isoconcentration ellipses)</strong> in a bivariate distribution—essentially more precise versions of the “balloon” idea from the prior rule.</li>
<li><strong>As a test statistic in randomized experiments</strong>, <img src="https://latex.codecogs.com/png.latex?r"> can be computed from a t-statistic or F-statistic (e.g., from ANOVA).</li>
<li><strong>As a ratio of two means</strong> following Galton, <img src="https://latex.codecogs.com/png.latex?r"> reflects how the mean of Y changes with selected values of X.</li>
</ol>
<p>Each interpretation highlights a different trade-off or caveat. For example, the geometric view gives a great intuition, but the regression slope interpretation connects more directly to causal inference. And perhaps most importantly, several of these views are not invariant to nonlinear transformations, which matters a lot in real data.</p>
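<p>Several of these views are easy to verify numerically. The sketch below checks five of them (standardized covariance, slope between <img src="https://latex.codecogs.com/png.latex?z">-scores, cosine of the angle between centered vectors, geometric mean of the two regression slopes, and rescaled variance of the difference of <img src="https://latex.codecogs.com/png.latex?z">-scores) on simulated data. The data and variable names are illustrative assumptions of mine.</p>

```python
import numpy as np

# Illustrative data with a positive linear relationship
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.6 * x + rng.normal(size=500)

r = np.corrcoef(x, y)[0, 1]  # reference value

# Standardized covariance
r_cov = np.cov(x, y, ddof=1)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))

# Slope of the regression of z-scored Y on z-scored X
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
r_slope = np.polyfit(zx, zy, 1)[0]

# Cosine of the angle between the centered data vectors
xc, yc = x - x.mean(), y - y.mean()
r_cos = (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

# Geometric mean of the two regression slopes (sign taken from either slope)
b_yx = np.polyfit(x, y, 1)[0]
b_xy = np.polyfit(y, x, 1)[0]
r_geo = np.sign(b_yx) * np.sqrt(b_yx * b_xy)

# Rescaled variance of the difference between z-scores
r_diff = 1 - np.var(zy - zx, ddof=1) / 2

views = [r_cov, r_slope, r_cos, r_geo, r_diff]
print(np.round(views, 6), round(r, 6))
```

<p>All five numbers agree with <code>np.corrcoef</code> up to floating-point error, which is a handy sanity check when you are unsure which formula a library uses.</p>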
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li><p>Pearson’s <img src="https://latex.codecogs.com/png.latex?r">, Spearman’s <img src="https://latex.codecogs.com/png.latex?%5Crho">, and Kendall’s <img src="https://latex.codecogs.com/png.latex?%5Ctau"> measure different aspects of association—none is a catch-all indicator.</p></li>
<li><p>The “monotonic vs.&nbsp;linear” framing is a helpful heuristic, but it can break down in some real-world scenarios.</p></li>
<li><p>Rodgers and Nicewander’s thirteen perspectives on correlation reveal its multifaceted nature and limitations.</p></li>
<li><p>Always visualize your data—correlation coefficients should not replace your eyes or your understanding of the domain.</p></li>
</ul>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>van den Heuvel, E., &amp; Zhan, Z. (2022). Myths about linear and monotonic associations: Pearson’s <img src="https://latex.codecogs.com/png.latex?r">, Spearman’s <img src="https://latex.codecogs.com/png.latex?%5Crho">, and Kendall’s <img src="https://latex.codecogs.com/png.latex?%5Ctau">. <em>The American Statistician</em>, 76(1), 44–52.</p>
<p>Rodgers, J. L., &amp; Nicewander, W. A. (1988). Thirteen ways to look at the correlation coefficient. <em>The American Statistician</em>, 42(1), 59–66.</p>


</section>

 ]]></description>
  <category>statistical inference</category>
  <category>correlation</category>
  <guid>https://vyasenov.github.io/blog/corr-myths-13-views.html</guid>
  <pubDate>Sat, 24 May 2025 07:00:00 GMT</pubDate>
</item>
<item>
  <title>The Kolmogorov–Smirnov Test as a Goodness-of-fit</title>
  <link>https://vyasenov.github.io/blog/ks-test-one-sample.html</link>
  <description><![CDATA[ 





<div class="reading-time">4 min read</div>
<!-- this is for social media sharing buttons -->
<div class="sharethis-inline-share-buttons pt-5">

</div>
<section id="background" class="level2">
<h2 class="anchored" data-anchor-id="background">Background</h2>
<p>The Kolmogorov–Smirnov (KS) test is a staple in the statistical toolbox for checking how well data fit a hypothesized distribution. It comes in both a one-sample and a two-sample version. A common application in causal inference is checking covariate balance between the treatment and control groups. It’s nonparametric, straightforward to compute, and implemented in just about every statistical software package. But, and this is a big but, using the KS test naively can lead to serious misinterpretations, especially when parameters are estimated from the data.</p>
<p>This article is based on the 2024 <a href="https://www.tandfonline.com/doi/abs/10.1080/00031305.2024.2356095">paper</a> by Zeimbekakis, Schifano, and Yan, which takes a hard look at the common misuses of the <em>one-sample</em> KS test. I’ll walk through what the KS test is supposed to do, when it goes wrong, and how to think more clearly about assessing goodness-of-fit.</p>
</section>
<section id="notation" class="level2">
<h2 class="anchored" data-anchor-id="notation">Notation</h2>
<p>Let <img src="https://latex.codecogs.com/png.latex?X_1,%20%5Cdots,%20X_n"> be i.i.d. random variables with unknown distribution function <img src="https://latex.codecogs.com/png.latex?F">. We want to test whether <img src="https://latex.codecogs.com/png.latex?F%20=%20F_0">, for some known distribution function <img src="https://latex.codecogs.com/png.latex?F_0">.</p>
<p>The empirical distribution function (EDF) is: <img src="https://latex.codecogs.com/png.latex?F_n(x)%20=%20%5Cfrac%7B1%7D%7Bn%7D%20%5Csum_%7Bi=1%7D%5En%20I(X_i%20%5Cleq%20x)"></p>
<p>You are probably familiar with this. It is a step function that estimates the true cumulative distribution function of a random variable based on a sample. At any point <img src="https://latex.codecogs.com/png.latex?x">, the EDF gives the proportion of observations in the sample that are less than or equal to <img src="https://latex.codecogs.com/png.latex?x">. It is the nonparametric maximum likelihood estimator of the cumulative distribution function (CDF).</p>
<p>The KS statistic is: <img src="https://latex.codecogs.com/png.latex?D_n%20=%20%5Csup_%7Bx%20%5Cin%20%5Cmathbb%7BR%7D%7D%20%7CF_n(x)%20-%20F_0(x)%7C"></p>
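<p>Under the null hypothesis and for a continuous <img src="https://latex.codecogs.com/png.latex?F_0">, the scaled statistic <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7Bn%7D%20D_n"> converges in distribution to the Kolmogorov distribution, which has no closed-form density but a known CDF. (The unscaled <img src="https://latex.codecogs.com/png.latex?D_n"> itself shrinks to zero as the sample grows.) This result assumes that <img src="https://latex.codecogs.com/png.latex?F_0"> is fully specified, i.e., no parameters have been estimated from the data.</p>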
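<p>Because the EDF is a step function, the supremum in <img src="https://latex.codecogs.com/png.latex?D_n"> is attained at one of the sample points, just before or just after a jump. A minimal sketch, assuming a standard normal <img src="https://latex.codecogs.com/png.latex?F_0"> and simulated data of my own, computes the statistic by hand and compares it with <code>scipy.stats.kstest</code>:</p>

```python
import numpy as np
from scipy import stats

# Illustrative sample; F_0 is taken to be the standard normal CDF
rng = np.random.default_rng(3)
x = np.sort(rng.normal(size=200))
n = len(x)

cdf = stats.norm.cdf(x)   # F_0 evaluated at the ordered sample
i = np.arange(1, n + 1)

d_plus = np.max(i / n - cdf)          # EDF above F_0, right after each jump
d_minus = np.max(cdf - (i - 1) / n)   # EDF below F_0, right before each jump
d_manual = max(d_plus, d_minus)

res = stats.kstest(x, "norm")  # fully specified null: mean 0, sd 1
print(f"manual D_n = {d_manual:.6f}, scipy D_n = {res.statistic:.6f}")
```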
</section>
<section id="a-closer-look" class="level2">
<h2 class="anchored" data-anchor-id="a-closer-look">A Closer Look</h2>
<section id="a-refresher-on-ks" class="level3">
<h3 class="anchored" data-anchor-id="a-refresher-on-ks">A Refresher on KS</h3>
<p>Intuitively, the KS test statistic measures the largest vertical distance between the EDF and the hypothesized CDF <img src="https://latex.codecogs.com/png.latex?F_0">. Because it summarizes the whole discrepancy with a single maximum, it is a global measure rather than a local one, so it’s less powerful for detecting issues like tail misspecification or multimodality. This matters because tail behavior is critical in many applications, such as risk modeling or extreme value analysis.</p>
<p>A well known limitation of the KS test is that with small samples, it has limited power to detect distributional differences, while with very large samples, it may detect statistically significant but practically trivial deviations from the hypothesized distribution. This problem in the context of “big data” is obviously broader and goes beyond the KS test.</p>
</section>
<section id="the-problem" class="level3">
<h3 class="anchored" data-anchor-id="the-problem">The Problem</h3>
<p>Here’s the catch: the null distribution of the KS statistic assumes <img src="https://latex.codecogs.com/png.latex?F_0"> is fully known. But in practice, people often use the test to evaluate model fit <em>after</em> estimating parameters—e.g., fitting a normal distribution by MLE and then checking fit with KS.</p>
<p>That invalidates the test.</p>
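<p>Why? Because the sampling distribution of <img src="https://latex.codecogs.com/png.latex?D_n"> changes when parameters are estimated. The fitted distribution is tuned to the very sample it is being compared against, so it sits closer to the EDF than the true <img src="https://latex.codecogs.com/png.latex?F_0"> would, and <img src="https://latex.codecogs.com/png.latex?D_n"> tends to be smaller than the Kolmogorov distribution predicts. The standard critical values are therefore too large. This leads to a deflated Type I error rate: you reject a true null less often than the nominal level suggests. In other words, the test is too conservative, which also costs power against real departures from the null.</p>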
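<p>A quick simulation makes the conservativeness visible. This is a rough sketch with settings I picked for illustration (sample size 100, 1,000 replications, nominal level 0.05): data are drawn from a standard normal, and the KS test is run once against the true parameters and once against the parameters estimated from the same sample.</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 100, 1000, 0.05  # illustrative settings

reject_known = 0
reject_fitted = 0
for _ in range(reps):
    x = rng.normal(loc=0.0, scale=1.0, size=n)  # the null is true

    # Correct use: F_0 fully specified in advance
    p_known = stats.kstest(x, "norm", args=(0.0, 1.0)).pvalue

    # Naive use: plug in the MLEs from the same sample
    p_fitted = stats.kstest(x, "norm", args=(x.mean(), x.std())).pvalue

    reject_known += int(alpha > p_known)
    reject_fitted += int(alpha > p_fitted)

rate_known = reject_known / reps
rate_fitted = reject_fitted / reps
print(f"rejection rate, known parameters:     {rate_known:.3f}")
print(f"rejection rate, estimated parameters: {rate_fitted:.3f}")
```

<p>The first rate should land near the nominal 0.05, while the second should be far below it.</p>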
</section>
<section id="better-alternatives" class="level3">
<h3 class="anchored" data-anchor-id="better-alternatives">Better Alternatives</h3>
<p>When parameters are estimated, we need modified procedures:</p>
<ul>
<li><strong>Lilliefors test</strong>: An adaptation of the KS test that adjusts the null distribution when testing for normality with estimated parameters.</li>
<li><strong>Parametric bootstrap</strong>: Simulate the null distribution of the test statistic by repeatedly fitting the model and computing <img src="https://latex.codecogs.com/png.latex?D_n"> on simulated data.</li>
<li><strong>Other GOF tests</strong>: Anderson-Darling and Cramér-von Mises tests have versions that handle estimated parameters more gracefully.</li>
</ul>
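<p>The parametric bootstrap in particular is only a few lines of code. The sketch below is one possible implementation for a normal null, not the authors’ code: the helper names <code>ks_fitted_normal</code> and <code>bootstrap_ks_pvalue</code>, the number of bootstrap draws, and the test data are all assumptions of mine. The key step is that the parameters are re-estimated on every simulated sample, so the bootstrap statistics share the same “fitted to its own data” advantage as the observed one.</p>

```python
import numpy as np
from scipy import stats

def ks_fitted_normal(sample):
    """KS statistic against a normal with parameters estimated by MLE."""
    mu, sigma = sample.mean(), sample.std()
    d = stats.kstest(sample, "norm", args=(mu, sigma)).statistic
    return d, mu, sigma

def bootstrap_ks_pvalue(sample, n_boot=500, seed=0):
    """Parametric-bootstrap p-value for the normal goodness-of-fit test."""
    rng = np.random.default_rng(seed)
    d_obs, mu, sigma = ks_fitted_normal(sample)
    d_boot = np.empty(n_boot)
    for b in range(n_boot):
        # Simulate from the fitted model, then refit on the simulated data
        sim = rng.normal(mu, sigma, size=len(sample))
        d_boot[b] = ks_fitted_normal(sim)[0]
    # Share of simulated statistics at least as extreme as the observed one
    return (1 + np.sum(d_boot >= d_obs)) / (n_boot + 1)

rng = np.random.default_rng(11)
p_norm = bootstrap_ks_pvalue(rng.normal(loc=2.0, scale=3.0, size=200))
p_exp = bootstrap_ks_pvalue(rng.exponential(size=200))
print(f"normal data:      p = {p_norm:.3f}")
print(f"exponential data: p = {p_exp:.3f}")
```

<p>For the normal case this approximates what the Lilliefors test does with simulated critical values. The same recipe carries over to other parametric families by swapping the fitting and simulation steps.</p>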
</section>
</section>
<section id="bottom-line" class="level2">
<h2 class="anchored" data-anchor-id="bottom-line">Bottom Line</h2>
<ul>
<li><p>The KS test is a popular and flexible method for testing differences between statistical distributions.</p></li>
<li><p>It assumes no parameters are estimated—violating this leads to invalid inference.</p></li>
<li><p>Estimating parameters from the same data used in the test pushes the actual Type I error rate below the nominal level, making the test too conservative.</p></li>
<li><p>Use alternatives like the Lilliefors test or bootstrap methods when parameters are estimated.</p></li>
</ul>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p>Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. <em>Journal of the American Statistical Association</em>, 62(318), 399–402.</p>
<p>Zeimbekakis, A., Schifano, E. D., &amp; Yan, J. (2024). On Misuses of the Kolmogorov–Smirnov Test for One-Sample Goodness-of-Fit. <em>The American Statistician</em>, 78(4), 481-487.</p>


</section>

 ]]></description>
  <category>statistical inference</category>
  <category>hypothesis testing</category>
  <guid>https://vyasenov.github.io/blog/ks-test-one-sample.html</guid>
  <pubDate>Mon, 05 May 2025 07:00:00 GMT</pubDate>
</item>
</channel>
</rss>
