Correlation is a Cosine

correlation
statistical inference
Published

February 9, 2023

Background

You might have come across the statement, “correlation is a cosine,” but never taken the time to explore its precise meaning. It certainly sounds intriguing—how can the simplest bivariate summary statistic be connected to a trigonometric function you first encountered in sixth grade? What exactly is the relationship between correlation and cosines?

A Closer Look

The Law of Cosines

The law of cosines states that in any triangle with sides \(x\), \(y\), and \(z\), where \(\theta\) is the angle between sides \(x\) and \(y\), we have:

\[ z^2 = x^2 + y^2 - 2 x y \cos(\theta). \]

In the special case when \(\theta=\frac{\pi}{2}\), the last term on the right-hand side vanishes (since \(\cos(\frac{\pi}{2}) = 0\)) and the equation reduces to the well-known Pythagorean theorem.
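As a quick sanity check, here is a minimal Python sketch (with made-up side lengths) that evaluates the identity numerically in the right-angle case:

import numpy as np

# Law of cosines: z^2 = x^2 + y^2 - 2*x*y*cos(theta)
x, y, theta = 3.0, 4.0, np.pi / 2  # right angle between sides x and y
z = np.sqrt(x**2 + y**2 - 2 * x * y * np.cos(theta))
print(z)  # 5.0 (up to floating-point error): the 3-4-5 right triangle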

The Variance of \(A+B\)

Consider two random variables \(A\) and \(B\). The variance of their sum is given by:

\[ \operatorname{var}(A+B) = \operatorname{var}(A) + \operatorname{var}(B) + 2 \operatorname{cov}(A,B), \]

where \(\operatorname{cov}(\cdot,\cdot)\) denotes covariance. Using the definition of correlation, \(\operatorname{cov}(A,B) = \operatorname{corr}(A,B) \operatorname{sd}(A) \operatorname{sd}(B)\), we can rewrite the last term:

\[ \operatorname{var}(A+B) = \operatorname{var}(A) + \operatorname{var}(B) + 2 \operatorname{corr}(A,B) \operatorname{sd}(A) \operatorname{sd}(B). \]

Next, since \(\operatorname{var}(\cdot) = \operatorname{sd}^2(\cdot)\), substituting gives:

\[ \operatorname{sd}^2(A+B) = \operatorname{sd}^2(A) + \operatorname{sd}^2(B) + 2 \operatorname{corr}(A,B) \operatorname{sd}(A) \operatorname{sd}(B). \]
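To make the identity concrete, here is a small Python sketch (simulated data with an arbitrary dependence between \(A\) and \(B\)) that checks it numerically:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=100_000)
B = 0.5 * A + rng.normal(size=100_000)  # arbitrary dependence, for illustration

# var(A+B) = var(A) + var(B) + 2*corr(A,B)*sd(A)*sd(B)
lhs = np.var(A + B)
rhs = np.var(A) + np.var(B) + 2 * np.corrcoef(A, B)[0, 1] * np.std(A) * np.std(B)
print(np.isclose(lhs, rhs))  # True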

Putting the Two Together

Setting \(x=\operatorname{sd}(A)\), \(y=\operatorname{sd}(B)\), and \(z=\operatorname{sd}(A+B)\) in the first equation gives the desired result, with one small caveat: the cosine term enters with a negative sign, while the correlation term enters with a positive sign. To reconcile the two, we can simply look at the supplementary angle \(\delta = \pi - \theta\), since \(\cos(\pi - \theta) = -\cos(\theta)\).

That is, we imagine a triangle whose sides have lengths \(\operatorname{sd}(A)\), \(\operatorname{sd}(B)\), and \(\operatorname{sd}(A+B)\). Thinking of the first two sides as vectors placed tail to tail, \(\theta\) is the angle between them and \(\operatorname{corr}(A,B) = \cos(\theta)\). When this angle is small (\(\theta < \frac{\pi}{2}\)), the two vectors point in roughly the same direction and \(A\) and \(B\) are positively correlated. The opposite is true for \(\theta > \frac{\pi}{2}\). As mentioned above, \(\theta = \frac{\pi}{2}\) kills the correlation term, meaning \(A\) and \(B\) are uncorrelated (as they would be if, for example, they were independent).
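A short Python sketch (again with simulated data) makes the correspondence explicit: the angle recovered from the correlation satisfies the law of cosines for the three standard deviations:

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=100_000)
B = 0.6 * A + rng.normal(size=100_000)  # arbitrary positive dependence

theta = np.arccos(np.corrcoef(A, B)[0, 1])  # corr(A, B) = cos(theta)
delta = np.pi - theta                       # interior angle of the triangle

# Law of cosines with sides sd(A), sd(B), sd(A+B) and angle delta between the first two
x, y, z = np.std(A), np.std(B), np.std(A + B)
print(np.isclose(z**2, x**2 + y**2 - 2 * x * y * np.cos(delta)))  # True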

Correlation as a Dot Product

There’s another way to see this connection that makes it even clearer.

If you think of \(A\) and \(B\) as vectors in \(n\)-dimensional space (e.g., \(A = (a_1, a_2, \ldots, a_n)\)), the cosine of the angle between them is given by:

\[ \cos(\theta) = \frac{A \cdot B}{\|A\| \|B\|}, \]

where \(A \cdot B\) is the dot product and \(\|\cdot\|\) denotes the Euclidean norm. When \(A\) and \(B\) are standardized (i.e., mean zero and unit variance, with the variance computed using the \(1/n\) convention), this cosine is exactly the Pearson correlation coefficient:

\[ \operatorname{corr}(A, B) = \frac{1}{n} \sum_{i=1}^n A_i^* B_i^* = \cos(\theta), \]

where \(A^*\) and \(B^*\) are the standardized versions of \(A\) and \(B\). In fact, centering alone is enough for \(\cos(\theta) = \operatorname{corr}(A,B)\): rescaling a vector does not change its direction, so it leaves the cosine unchanged.
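Here is a brief Python sketch (made-up data) illustrating that centering alone already makes the cosine match the correlation:

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(loc=5.0, size=1_000)  # nonzero mean on purpose
B = 2.0 * A + rng.normal(size=1_000)

A_c, B_c = A - A.mean(), B - B.mean()  # center only, no rescaling
cosine = A_c @ B_c / (np.linalg.norm(A_c) * np.linalg.norm(B_c))
print(np.isclose(cosine, np.corrcoef(A, B)[0, 1]))  # True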

Cosine Similarity vs. Correlation


Cosine similarity and Pearson correlation are closely related, but not always the same:

  • Cosine similarity considers only the angle between vectors. It’s scale-invariant but not shift-invariant.
  • Correlation removes both mean and scale, making it invariant to positive affine transformations.

So, while “correlation is a cosine,” the statement is strictly true only when the vectors are mean-centered, which standardizing ensures.
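The difference is easy to demonstrate in Python (made-up data): shifting one vector changes its cosine similarity with the other but leaves the correlation untouched:

import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(size=500)
b = a + rng.normal(size=500)

def cos_sim(u, v):
    # Plain cosine similarity: no centering
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cos_sim(a, b), cos_sim(a, b + 10.0))                       # different
print(np.corrcoef(a, b)[0, 1], np.corrcoef(a, b + 10.0)[0, 1])   # identical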

An Example

Let’s generate two random vectors, standardize them, compute their correlation and the angle between them, and plot them. We’ll see that the correlation equals the cosine of the angle between the vectors.

R:
rm(list=ls())
set.seed(1988)

# Generate two random vectors
A <- rnorm(100)
B <- 0.8 * A + sqrt(1 - 0.8^2) * rnorm(100)  # Correlated with A

# Standardize
A_std <- scale(A)
B_std <- scale(B)

# Correlation and cosine
correlation <- cor(A, B)
cosine <- sum(A_std * B_std) / (sqrt(sum(A_std^2)) * sqrt(sum(B_std^2)))
angle_deg <- acos(cosine) * 180 / pi

# Print results
cat("Correlation:", round(correlation, 3), "\n")
> Correlation: 0.825 
cat("Angle (degrees):", round(angle_deg, 1), "\n")
> Angle (degrees): 34.4 

# Plot vectors
plot(c(0, A_std[1]), c(0, B_std[1]), type = "n", xlab = "A (standardized)", ylab = "B (standardized)",
     main = "First Vectors from A and B")
arrows(0, 0, A_std[1], 0, col = "blue", lwd = 2)
arrows(0, 0, A_std[1], B_std[1], col = "red", lwd = 2)
legend("topright", legend = c("A", "B"), col = c("blue", "red"), lwd = 2)
Python:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1988)

# Generate two correlated vectors
A = np.random.randn(100)
B = 0.8 * A + np.sqrt(1 - 0.8**2) * np.random.randn(100)

# Standardize
A_std = (A - np.mean(A)) / np.std(A)
B_std = (B - np.mean(B)) / np.std(B)

# Correlation and cosine
correlation = np.corrcoef(A, B)[0, 1]
cosine = np.dot(A_std, B_std) / (np.linalg.norm(A_std) * np.linalg.norm(B_std))
angle = np.arccos(np.clip(cosine, -1, 1)) * 180 / np.pi

print(f"Correlation: {correlation:.3f}")
print(f"Angle (degrees): {angle:.1f}")

# Plot first two vectors
plt.figure(figsize=(5, 5))
plt.quiver(0, 0, A_std[0], 0, angles='xy', scale_units='xy', scale=1, color='blue', label='A')
plt.quiver(0, 0, A_std[0], B_std[0], angles='xy', scale_units='xy', scale=1, color='red', label='B')
plt.xlim(-3, 3)
plt.ylim(-3, 3)
plt.xlabel("A (standardized)")
plt.ylabel("B (standardized)")
plt.title("First Vectors from A and B")
plt.legend()
plt.grid(True)
plt.gca().set_aspect('equal')
plt.show()

In this example, \(\operatorname{corr}(A,B) = 0.825\), the angle between the two vectors is \(34.4^\circ\), and indeed \(\cos(34.4^\circ) = 0.825\). You can also visualize these vectors with the plotting code above, but I am not showing that graph here.

Where to Learn More

As with anything else, a Google search is your friend here, with multiple Stack Overflow posts explaining this connection from all sorts of angles. I find John D. Cook’s blog post (Cook 2010) the most helpful, and I follow his exposition closely here.

Bottom Line

  • The variance formula mirrors the law of cosines.

  • Standardizing the variables makes correlation equal the cosine of the angle.

  • So: “Correlation is a cosine” — literally!

References

Cook, John D. (2010). “Cosines and correlation.” Blog post.
