Vasco Yasenov
  • About Me
  • CV
  • Blog
  • Research
  • Software
  • Others
    • Methods Map
    • Kids Books
    • TV Series Ratings

On this page

  • Background
  • Notation
  • A Closer Look
    • What Does WMW Actually Test?
    • Understanding Stochastic Dominance
    • When Does It Coincide with a Median Test?
    • Alternative Tests
  • An Example
  • Bottom Line
  • Where to Learn More
  • References

The Wilcoxon-Mann-Whitney Test is Not a Test of Medians

statistical inference
hypothesis testing
Published

January 20, 2026

Background

Nonparametric tests like the Wilcoxon-Mann-Whitney (WMW) are among the most popular alternatives to the \(t\)- and \(z\)-tests in settings where normality assumptions break down. Often described as a “test of medians,” WMW is used when comparing two independent groups without making strong assumptions about the underlying distributions. It is also known as the Mann-Whitney-Wilcoxon (MWW) test or the Wilcoxon rank-sum test.

Despite this common interpretation, the WMW test is not a test of medians—at least not in general. Divine et al. (2018) dive deep into this misconception and show convincingly how the WMW test can lead you astray if you’re specifically interested in comparing medians.

This article explains why that happens, provides some intuition and math, and shows you how to think more clearly about what the WMW test actually does.

Notation

Let \(X_1, \ldots, X_m \sim F\) and \(Y_1, \ldots, Y_n \sim G\) be two independent random samples from distributions \(F\) and \(G\), respectively. The Wilcoxon-Mann-Whitney statistic is based on the probability:

\[P(X < Y) + \frac{1}{2}P(X = Y)\]

This quantity is sometimes referred to as the probability of superiority.

Let \(\theta_F\) and \(\theta_G\) denote the medians of \(F\) and \(G\). We often want to test:

\[H_0: \theta_F = \theta_G\]

But WMW does not directly test this hypothesis unless very specific conditions are met.

A Closer Look

What Does WMW Actually Test?

The WMW test assesses whether one distribution tends to produce larger values than the other. More formally, it tests:

\[H_0: P(X < Y) + \frac{1}{2}P(X = Y) = 0.5\]

This is equivalent to testing whether the distributions are stochastically equal, not whether the medians are equal.

The WMW test can be performed via rank sums. After combining both samples, we rank all observations from smallest to largest. The test statistic \(W\) is the sum of ranks assigned to the first sample:

\[W = \sum_{i=1}^m R(X_i)\]

where \(R(X_i)\) is the rank of \(X_i\) in the combined sample.

This rank-based formulation is mathematically equivalent to counting how many pairs \((X_i, Y_j)\) have \(X_i < Y_j\), which relates to the probability interpretation above. Under the null hypothesis, the expected rank sum is approximately \(m(m+n+1)/2\).

Understanding Stochastic Dominance

When we say the WMW test examines “stochastic dominance,” we mean it tests whether values from one distribution tend to exceed values from the other. Specifically, distribution \(G\) stochastically dominates distribution \(F\) if:

\[G(x) \leq F(x) \text{ for all } x\]

with strict inequality for at least some values of \(x\). Intuitively, this means a randomly selected value from \(G\) is more likely to be larger than a randomly selected value from \(F\).

This is quite different from comparing medians. Two distributions can have identical medians but exhibit stochastic dominance, or they can have different medians but neither stochastically dominates the other.

When Does It Coincide with a Median Test?

The WMW test only functions as a test of medians under symmetric distributions with equal shape and spread. If the shapes differ—say, one is skewed left and the other right—then even if the medians are the same, WMW can reject the null. Worse, it might fail to reject when the medians are different but the distributions have similar overall ranks.

Alternative Tests

If your research question specifically concerns differences in medians, more appropriate tests include:

  • Mood’s median test: A true test of median equality that uses contingency tables based on counts above and below the combined median.
  • Quantile regression: For more complex designs, quantile regression directly models the median (or other quantiles) and tests differences between groups.
  • Bootstrap confidence intervals: Calculating confidence intervals for the difference in medians via bootstrapping provides both a test and measure of uncertainty.

These approaches directly address median differences rather than the stochastic ordering tested by WMW.

An Example

Let’s see this in action with a small simulation.

  • R
  • Python
set.seed(123)
x <- rexp(100, rate = 1)         # Right-skewed
y <- rexp(100, rate = 1.5)       # Also right-skewed, different rate

median(x)  # Median of x
median(y)  # Median of y

wilcox.test(x, y)
import numpy as np
from scipy.stats import mannwhitneyu

np.random.seed(123)
x = np.random.exponential(scale=1.0, size=100)
y = np.random.exponential(scale=2/3, size=100)  # Higher scale = lower rate

print("Median x:", np.median(x))
print("Median y:", np.median(y))

res = mannwhitneyu(x, y, alternative='two-sided')
print(res)

This example demonstrates our point perfectly: The medians are clearly different (0.6334 vs. 0.4865), and the WMW test correctly rejects the null hypothesis (p = 0.004). However, this rejection occurs because the exponential distributions with different rates create a consistent stochastic ordering, not because it’s specifically testing the medians.

Despite different medians, the WMW test might not reject the null. Or it might reject it because of shape differences, not the medians.

Bottom Line

  • The Wilcoxon-Mann-Whitney test is not a general test of medians.
  • It tests for stochastic dominance or shift in distribution, not specifically median difference.
  • It behaves like a median test only under certain conditions (e.g., identical shape).
  • Be cautious interpreting WMW results as saying something about medians unless distributional assumptions are met.

Where to Learn More

For a deeper dive, read the original Divine et al. (2018) paper. You might also want to look at literature on robust location tests or permutation-based alternatives that better target the median.

References

Divine, G. W., Norton, H. J., Barón, A. E., & Juarez-Colunga, E. (2018). The Wilcoxon–Mann–Whitney procedure fails as a test of medians. The American Statistician, 72(3), 278–286.

Hollander, M., Wolfe, D. A., & Chicken, E. (2013). Nonparametric Statistical Methods (3rd ed.). Wiley.

© 2025 Vasco Yasenov

 

Powered by Quarto