6.8. Subgaussian Distributions

In this section we review subgaussian distributions and matrices drawn from subgaussian distributions.

Examples of subgaussian distributions include

  • Gaussian distribution

  • Rademacher distribution taking values $\pm 1$

  • Any zero-mean distribution with bounded support

Definition 6.2

A random variable X is called subgaussian if there exists a constant $c > 0$ such that

$$M_X(t) = \mathbb{E}[\exp(Xt)] \leq \exp\left(\frac{c^2 t^2}{2}\right) \tag{6.1}$$

holds for all $t \in \mathbb{R}$. We use the notation $X \sim \mathrm{Sub}(c^2)$ to denote that X satisfies the constraint (6.1). We also say that X is c-subgaussian.

$\mathbb{E}[\exp(Xt)]$ is the moment generating function of X.

$\exp\left(\frac{c^2 t^2}{2}\right)$ is the moment generating function of a Gaussian random variable with variance $c^2$.

The definition means that for a subgaussian variable X, its M.G.F. is bounded by the M.G.F. of a Gaussian random variable $\mathcal{N}(0, c^2)$.

Example 6.1 (Gaussian r.v. as subgaussian r.v.)

Consider a zero-mean Gaussian random variable $X \sim \mathcal{N}(0, \sigma^2)$ with variance $\sigma^2$. Then

$$\mathbb{E}[\exp(Xt)] = \exp\left(\frac{\sigma^2 t^2}{2}\right).$$

Putting $c = \sigma$ we see that (6.1) is satisfied (with equality). Hence $X \sim \mathrm{Sub}(\sigma^2)$, i.e. X is a subgaussian r.v., or X is $\sigma$-subgaussian.

Example 6.2 (Rademacher distribution)

Consider X with

$$P_X(x) = \frac{1}{2}\delta(x - 1) + \frac{1}{2}\delta(x + 1),$$

i.e. X takes the value 1 with probability 0.5 and the value $-1$ with probability 0.5.

Then

$$\mathbb{E}[\exp(Xt)] = \frac{1}{2}\exp(t) + \frac{1}{2}\exp(-t) = \cosh t \leq \exp\left(\frac{t^2}{2}\right),$$

where the last inequality follows by comparing the Taylor series term by term, since $(2n)! \geq 2^n n!$.

Thus $X \sim \mathrm{Sub}(1)$, i.e. X is 1-subgaussian.
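As a quick numerical sanity check, the following sketch (assuming NumPy is available; the grid of t values is an arbitrary choice) compares the Rademacher M.G.F. $\cosh t$ against the bound $\exp(t^2/2)$.

```python
import numpy as np

# Compare the Rademacher M.G.F. cosh(t) with the subgaussian bound exp(t^2/2)
# over an arbitrary grid of t values.
t = np.linspace(-5, 5, 1001)
mgf = np.cosh(t)              # E[exp(Xt)] for the Rademacher distribution
bound = np.exp(t**2 / 2)      # M.G.F. of N(0, 1)

assert np.all(mgf <= bound)
print("max ratio cosh(t)/exp(t^2/2):", np.max(mgf / bound))  # never exceeds 1
```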

Example 6.3 (Uniform distribution)

Consider X uniformly distributed over the interval $[-a, a]$ for some $a > 0$, i.e.

$$f_X(x) = \begin{cases} \frac{1}{2a} & -a \leq x \leq a \\ 0 & \text{otherwise}. \end{cases}$$

Then

$$\mathbb{E}[\exp(Xt)] = \frac{1}{2a}\int_{-a}^{a}\exp(xt)\,dx = \frac{1}{2at}\left[e^{at} - e^{-at}\right] = \sum_{n=0}^{\infty}\frac{(at)^{2n}}{(2n+1)!}.$$

But $(2n+1)! \geq n!\,2^n$. Hence we have

$$\sum_{n=0}^{\infty}\frac{(at)^{2n}}{(2n+1)!} \leq \sum_{n=0}^{\infty}\frac{(at)^{2n}}{n!\,2^n} = \sum_{n=0}^{\infty}\frac{(a^2t^2/2)^n}{n!} = \exp\left(\frac{a^2t^2}{2}\right).$$

Thus

$$\mathbb{E}[\exp(Xt)] \leq \exp\left(\frac{a^2t^2}{2}\right).$$

Hence $X \sim \mathrm{Sub}(a^2)$, i.e. X is a-subgaussian.
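The same kind of numerical check works here; the sketch below (NumPy assumed; the value of $a$ and the grid are arbitrary choices) compares the uniform M.G.F. $\sinh(at)/(at)$ with $\exp(a^2t^2/2)$.

```python
import numpy as np

# Uniform[-a, a]: E[exp(Xt)] = sinh(at)/(at). Compare with exp(a^2 t^2 / 2).
a = 2.0                              # arbitrary half-width of the support
t = np.linspace(-4, 4, 1001)
t = t[t != 0]                        # the closed-form M.G.F. needs t != 0
mgf = np.sinh(a * t) / (a * t)
bound = np.exp(a**2 * t**2 / 2)

assert np.all(mgf <= bound)
print("max ratio:", np.max(mgf / bound))   # stays below 1 on the grid
```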

Example 6.4 (Random variable with bounded support)

Consider X as a zero-mean, bounded random variable, i.e.

$$\mathbb{P}(|X| \leq B) = 1$$

for some $B \in \mathbb{R}^+$ and

$$\mathbb{E}(X) = 0.$$

Then, the following upper bound holds:

$$\mathbb{E}[\exp(Xt)] = \int_{-B}^{B}\exp(xt) f_X(x)\,dx \leq \exp\left(\frac{B^2t^2}{2}\right).$$

This result can be proven with some careful calculus; it is a special case of Hoeffding's lemma. Thus $X \sim \mathrm{Sub}(B^2)$, i.e. X is B-subgaussian.
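As a sanity check, the sketch below (NumPy assumed) applies the bound to a made-up asymmetric, zero-mean, bounded two-point distribution and compares its exact M.G.F. with $\exp(B^2t^2/2)$.

```python
import numpy as np

# A made-up zero-mean two-point distribution: X = -1 w.p. 2/3 and X = 2 w.p. 1/3.
# It is bounded with B = 2, so the claim is E[exp(Xt)] <= exp(B^2 t^2 / 2).
values = np.array([-1.0, 2.0])
probs = np.array([2 / 3, 1 / 3])
B = np.max(np.abs(values))

t = np.linspace(-3, 3, 601)
mgf = (probs * np.exp(np.outer(t, values))).sum(axis=1)   # exact M.G.F. on the grid
bound = np.exp(B**2 * t**2 / 2)

assert abs(values @ probs) < 1e-12    # zero mean
assert np.all(mgf <= bound)
print("max ratio:", np.max(mgf / bound))
```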

There are some useful properties of subgaussian random variables.

Lemma 6.2 (Mean and variance of subgaussian random variables)

If $X \sim \mathrm{Sub}(c^2)$ then

$$\mathbb{E}(X) = 0$$

and

$$\mathbb{E}(X^2) \leq c^2.$$

Thus subgaussian random variables are always zero-mean. Their variance is always bounded by the variance of the bounding Gaussian distribution.

Proof. We proceed as follows:

  1. Note that

    $$\sum_{n=0}^{\infty}\frac{t^n}{n!}\mathbb{E}(X^n) = \mathbb{E}\left(\sum_{n=0}^{\infty}\frac{(Xt)^n}{n!}\right) = \mathbb{E}(\exp(Xt)).$$
  2. But since $X \sim \mathrm{Sub}(c^2)$, hence

    $$\sum_{n=0}^{\infty}\frac{t^n}{n!}\mathbb{E}(X^n) \leq \exp\left(\frac{c^2t^2}{2}\right) = \sum_{n=0}^{\infty}\frac{c^{2n}t^{2n}}{2^n n!}.$$
  3. Restating (the constant terms on both sides equal 1 and the higher order terms are $o(t^2)$):

    $$\mathbb{E}(X)t + \mathbb{E}(X^2)\frac{t^2}{2!} \leq \frac{c^2t^2}{2} + o(t^2) \quad \text{as } t \to 0.$$
  4. Dividing throughout by $t > 0$ and letting $t \to 0$ we get $\mathbb{E}(X) \leq 0$.

  5. Dividing throughout by $t < 0$ (which reverses the inequality) and letting $t \to 0$ we get $\mathbb{E}(X) \geq 0$.

  6. Thus $\mathbb{E}(X) = 0$. So $\operatorname{Var}(X) = \mathbb{E}(X^2)$.

  7. Now we are left with

    $$\mathbb{E}(X^2)\frac{t^2}{2!} \leq \frac{c^2t^2}{2} + o(t^2) \quad \text{as } t \to 0.$$
  8. Dividing throughout by $t^2$ and letting $t \to 0$ we get $\operatorname{Var}(X) \leq c^2$.
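A quick Monte Carlo illustration of the lemma, using the uniform distribution on $[-a, a]$ from Example 6.3, which is $\mathrm{Sub}(a^2)$ (NumPy assumed; the sample size, seed, and $a$ are arbitrary choices): the sample mean is close to 0 and the sample variance stays below $c^2 = a^2$.

```python
import numpy as np

# Uniform[-a, a] is Sub(a^2); Lemma 6.2 predicts E(X) = 0 and E(X^2) <= a^2.
rng = np.random.default_rng(0)
a = 2.0
samples = rng.uniform(-a, a, size=1_000_000)

print("sample mean    :", samples.mean())   # close to 0
print("sample variance:", samples.var())    # close to a^2/3, below the bound a^2
print("bound c^2      :", a**2)
```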

Subgaussian variables have a linear structure.

Theorem 6.8 (Linearity of subgaussian variables)

If $X \sim \mathrm{Sub}(c^2)$, i.e. X is c-subgaussian, then for any $\alpha \in \mathbb{R}$, the r.v. $\alpha X$ is $|\alpha| c$-subgaussian.

If $X_1, X_2$ are r.v.s such that $X_i$ is $c_i$-subgaussian, then $X_1 + X_2$ is $(c_1 + c_2)$-subgaussian.

Proof. Scalar multiplication:

  1. Let X be c-subgaussian.

  2. Then

    $$\mathbb{E}[\exp(Xt)] \leq \exp\left(\frac{c^2t^2}{2}\right).$$
  3. Now for any $\alpha \in \mathbb{R}$, we have

    $$\mathbb{E}[\exp(\alpha X t)] = \mathbb{E}[\exp(X(\alpha t))] \leq \exp\left(\frac{c^2(\alpha t)^2}{2}\right) = \exp\left(\frac{(|\alpha| c)^2 t^2}{2}\right).$$
  4. Hence $\alpha X$ is $|\alpha| c$-subgaussian.

Addition:

  1. Consider $X_1$ as $c_1$-subgaussian and $X_2$ as $c_2$-subgaussian.

  2. Thus

    $$\mathbb{E}(\exp(X_i t)) \leq \exp\left(\frac{c_i^2 t^2}{2}\right).$$
  3. Let $p, q > 1$ be two numbers s.t. $\frac{1}{p} + \frac{1}{q} = 1$.

  4. Using Hölder's inequality, we have

    $$\begin{aligned}\mathbb{E}(\exp((X_1 + X_2)t)) &\leq \left[\mathbb{E}\left(\exp(X_1 t)^p\right)\right]^{\frac{1}{p}}\left[\mathbb{E}\left(\exp(X_2 t)^q\right)\right]^{\frac{1}{q}}\\ &= \left[\mathbb{E}(\exp(p X_1 t))\right]^{\frac{1}{p}}\left[\mathbb{E}(\exp(q X_2 t))\right]^{\frac{1}{q}}\\ &\leq \left[\exp\left(\frac{(p c_1)^2 t^2}{2}\right)\right]^{\frac{1}{p}}\left[\exp\left(\frac{(q c_2)^2 t^2}{2}\right)\right]^{\frac{1}{q}}\\ &= \exp\left(\frac{t^2}{2}\left(p c_1^2 + q c_2^2\right)\right) = \exp\left(\frac{t^2}{2}\left(p c_1^2 + \frac{p}{p - 1} c_2^2\right)\right).\end{aligned}$$
  5. Since this is valid for any $p > 1$, we can minimize the r.h.s. over $p > 1$.

  6. It suffices to minimize the term

    $$r = p c_1^2 + \frac{p}{p - 1} c_2^2.$$
  7. We have

    $$\frac{\partial r}{\partial p} = c_1^2 - \frac{1}{(p - 1)^2} c_2^2.$$
  8. Equating it to 0 gives us

    $$p - 1 = \frac{c_2}{c_1} \implies p = \frac{c_1 + c_2}{c_1} \implies \frac{p}{p - 1} = \frac{c_1 + c_2}{c_2}.$$
  9. Taking the second derivative, we can verify that this is indeed a minimum.

  10. Thus

    $$r_{\min} = (c_1 + c_2)^2.$$
  11. Hence we have the result

    $$\mathbb{E}(\exp((X_1 + X_2)t)) \leq \exp\left(\frac{(c_1 + c_2)^2 t^2}{2}\right).$$
  12. Thus $X_1 + X_2$ is $(c_1 + c_2)$-subgaussian.
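The optimization over p carried out in steps 6-10 can be double-checked numerically. The sketch below (NumPy assumed; $c_1$, $c_2$ and the grid are arbitrary choices) minimizes $r(p) = pc_1^2 + \frac{p}{p-1}c_2^2$ on a grid and compares the result with the closed form.

```python
import numpy as np

# Minimize r(p) = p*c1^2 + p/(p-1)*c2^2 over p > 1 and compare with (c1 + c2)^2.
c1, c2 = 1.5, 0.7                        # arbitrary subgaussian constants
p = np.linspace(1.001, 20.0, 200_001)
r = p * c1**2 + p / (p - 1) * c2**2

print("numerical minimizer p :", p[np.argmin(r)])   # ~ (c1 + c2) / c1
print("analytical minimizer p:", (c1 + c2) / c1)
print("numerical minimum r   :", r.min())           # ~ (c1 + c2)^2
print("analytical minimum    :", (c1 + c2)**2)
```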

If $X_1$ and $X_2$ are independent, then $X_1 + X_2$ is $\sqrt{c_1^2 + c_2^2}$-subgaussian.

If X is c-subgaussian then, naturally, X is d-subgaussian for any $d \geq c$. A question arises as to what is the minimum value of c such that X is c-subgaussian.

Definition 6.3 (Subgaussian moment)

For a centered random variable X, the subgaussian moment of X, denoted by $\sigma(X)$, is defined as

$$\sigma(X) = \inf\left\{c \geq 0 \;\middle|\; \mathbb{E}(\exp(Xt)) \leq \exp\left(\frac{c^2t^2}{2}\right)\ \forall t \in \mathbb{R}\right\}.$$

X is subgaussian if and only if $\sigma(X)$ is finite.

We can also show that $\sigma(\cdot)$ is a norm on the space of subgaussian random variables, and that this normed space is complete.

For a centered Gaussian r.v. $X \sim \mathcal{N}(0, \sigma^2)$, the subgaussian moment coincides with the standard deviation: $\sigma(X) = \sigma$.
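Since the constraint in Definition 6.3 can be rewritten as $c^2 \geq \frac{2\ln \mathbb{E}(\exp(Xt))}{t^2}$ for every $t \neq 0$, the squared subgaussian moment equals the supremum of this ratio. The sketch below (NumPy assumed; the grid is an arbitrary choice) evaluates the ratio for the Rademacher distribution, where Example 6.2 and Lemma 6.2 together give $\sigma(X) = 1$.

```python
import numpy as np

# Estimate sigma(X)^2 = sup_{t != 0} 2*ln(E[exp(Xt)]) / t^2 for the Rademacher
# distribution, whose M.G.F. is cosh(t). By symmetry, t > 0 suffices.
t = np.linspace(1e-4, 10.0, 100_001)
ratio = 2 * np.log(np.cosh(t)) / t**2

print("sup of the ratio:", ratio.max())   # approaches 1, attained in the limit t -> 0
```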

Sometimes it is useful to consider a more restrictive class of subgaussian random variables.

Definition 6.4 (Strictly subgaussian distribution)

A random variable X is called strictly subgaussian if $X \sim \mathrm{Sub}(\sigma^2)$ where $\sigma^2 = \mathbb{E}(X^2)$, i.e. the inequality

$$\mathbb{E}(\exp(Xt)) \leq \exp\left(\frac{\sigma^2 t^2}{2}\right)$$

holds true for all $t \in \mathbb{R}$.

We will denote strictly subgaussian variables by $X \sim \mathrm{SSub}(\sigma^2)$.

Example 6.5 (Gaussian distribution)

If $X \sim \mathcal{N}(0, \sigma^2)$ then $X \sim \mathrm{SSub}(\sigma^2)$.

6.8.1. Characterization

We quickly review Markov’s inequality which will help us establish the results in this subsection.

Let X be a non-negative random variable and let $t > 0$. Then

$$\mathbb{P}(X \geq t) \leq \frac{\mathbb{E}(X)}{t}.$$

Theorem 6.9

For a centered random variable X, the following statements are equivalent:

  1. Moment generating function condition: there exists $c > 0$ such that

    $$\mathbb{E}[\exp(Xt)] \leq \exp\left(\frac{c^2t^2}{2}\right) \quad \forall t \in \mathbb{R}.$$
  2. Subgaussian tail estimate: there exists $a > 0$ such that

    $$\mathbb{P}(|X| \geq \lambda) \leq 2\exp(-a\lambda^2) \quad \forall \lambda > 0.$$
  3. $\psi_2$-condition: there exists some $b > 0$ such that

    $$\mathbb{E}[\exp(bX^2)] \leq 2.$$

Proof. (1) $\Rightarrow$ (2):

  1. Using Markov's inequality, for any $t > 0$ we have

    $$\mathbb{P}(X \geq \lambda) = \mathbb{P}(tX \geq t\lambda) = \mathbb{P}(e^{tX} \geq e^{t\lambda}) \leq \frac{\mathbb{E}(e^{tX})}{e^{t\lambda}} \leq \exp\left(-t\lambda + \frac{c^2t^2}{2}\right) \quad \forall t > 0.$$
  2. Since this is valid for all $t > 0$, it also holds for the value of t minimizing the r.h.s.

  3. The minimum value is obtained at $t = \frac{\lambda}{c^2}$.

  4. Thus we get

    $$\mathbb{P}(X \geq \lambda) \leq \exp\left(-\frac{\lambda^2}{2c^2}\right).$$
  5. Since X is c-subgaussian, $-X$ is also c-subgaussian.

  6. Hence

    $$\mathbb{P}(X \leq -\lambda) = \mathbb{P}(-X \geq \lambda) \leq \exp\left(-\frac{\lambda^2}{2c^2}\right).$$
  7. Thus

    $$\mathbb{P}(|X| \geq \lambda) = \mathbb{P}(X \geq \lambda) + \mathbb{P}(X \leq -\lambda) \leq 2\exp\left(-\frac{\lambda^2}{2c^2}\right).$$
  8. Thus we can choose $a = \frac{1}{2c^2}$ to complete the proof.
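The tail estimate just derived can be checked by simulation. The sketch below (NumPy assumed; the sample size, seed, $c$, and the thresholds are arbitrary choices) compares the empirical two-sided tails of $\mathcal{N}(0, c^2)$, which is $c$-subgaussian, with $2\exp(-\lambda^2/(2c^2))$.

```python
import numpy as np

# Empirical two-sided tails of N(0, c^2) versus the bound 2*exp(-lambda^2 / (2 c^2)).
rng = np.random.default_rng(1)
c = 1.0
x = rng.normal(0.0, c, size=2_000_000)

for lam in [0.5, 1.0, 2.0, 3.0]:
    empirical = np.mean(np.abs(x) >= lam)
    bound = 2 * np.exp(-lam**2 / (2 * c**2))
    print(f"lambda = {lam}: empirical = {empirical:.5f}, bound = {bound:.5f}")
```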

(2) $\Rightarrow$ (3):

TODO PROVE THIS

$$\mathbb{E}(\exp(bX^2)) = 1 + \int_0^{\infty} 2bt\exp(bt^2)\,\mathbb{P}(|X| > t)\,dt$$

Combining this identity with the tail estimate in (2) and choosing $b$ small enough (e.g. $b \leq \frac{a}{3}$) bounds the r.h.s. by 2.

(3) $\Rightarrow$ (1):

TODO PROVE THIS

6.8.2. More Properties

We also have the following result on the exponential moment of a subgaussian random variable.

Lemma 6.3

Suppose $X \sim \mathrm{Sub}(c^2)$. Then

$$\mathbb{E}\left[\exp\left(\frac{\lambda X^2}{2c^2}\right)\right] \leq \frac{1}{\sqrt{1 - \lambda}}$$

for any $\lambda \in [0, 1)$.

Proof. We are given that

$$\mathbb{E}(\exp(Xt)) \leq \exp\left(\frac{c^2t^2}{2}\right) \iff \int_{-\infty}^{\infty}\exp(tx) f_X(x)\,dx \leq \exp\left(\frac{c^2t^2}{2}\right) \quad \forall t \in \mathbb{R}.$$

Multiplying both sides by $\exp\left(-\frac{c^2t^2}{2\lambda}\right)$:

$$\int_{-\infty}^{\infty}\exp\left(tx - \frac{c^2t^2}{2\lambda}\right) f_X(x)\,dx \leq \exp\left(\frac{c^2t^2}{2}\cdot\frac{\lambda - 1}{\lambda}\right) = \exp\left(-\frac{t^2}{2}\cdot\frac{c^2(1 - \lambda)}{\lambda}\right).$$

Integrating both sides w.r.t. t we get:

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\left(tx - \frac{c^2t^2}{2\lambda}\right) f_X(x)\,dx\,dt \leq \int_{-\infty}^{\infty}\exp\left(-\frac{t^2}{2}\cdot\frac{c^2(1 - \lambda)}{\lambda}\right)dt,$$

which reduces to:

$$\frac{\sqrt{2\pi\lambda}}{c}\int_{-\infty}^{\infty}\exp\left(\frac{\lambda x^2}{2c^2}\right) f_X(x)\,dx \leq \frac{\sqrt{2\pi\lambda}}{c}\cdot\frac{1}{\sqrt{1 - \lambda}} \implies \mathbb{E}\left[\exp\left(\frac{\lambda X^2}{2c^2}\right)\right] \leq \frac{1}{\sqrt{1 - \lambda}},$$

which completes the proof.
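For $X \sim \mathcal{N}(0, c^2)$ one can compute $\mathbb{E}[\exp(\lambda X^2/(2c^2))] = 1/\sqrt{1-\lambda}$ exactly, so the bound of Lemma 6.3 is tight for Gaussians. The Monte Carlo sketch below (NumPy assumed; $\lambda$ is kept well below 1 so the estimator has finite variance; sample size, seed and $c$ are arbitrary choices) illustrates this.

```python
import numpy as np

# For X ~ N(0, c^2): E[exp(lambda * X^2 / (2 c^2))] = 1 / sqrt(1 - lambda).
rng = np.random.default_rng(2)
c, lam = 1.5, 0.3
x = rng.normal(0.0, c, size=2_000_000)

estimate = np.mean(np.exp(lam * x**2 / (2 * c**2)))
print("Monte Carlo estimate:", estimate)
print("1/sqrt(1 - lambda)  :", 1 / np.sqrt(1 - lam))
```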

6.8.3. Subgaussian Random Vectors

The linearity property of subgaussian r.v.s can be extended to random vectors as well. This is stated more formally in the following result.

Theorem 6.10

Suppose that $X = [X_1, X_2, \dots, X_N]$, where the $X_i$ are i.i.d. with $X_i \sim \mathrm{Sub}(c^2)$. Then for any $\alpha \in \mathbb{R}^N$, $\langle X, \alpha\rangle \sim \mathrm{Sub}(c^2\|\alpha\|_2^2)$. Similarly, if each $X_i \sim \mathrm{SSub}(\sigma^2)$, then for any $\alpha \in \mathbb{R}^N$, $\langle X, \alpha\rangle \sim \mathrm{SSub}(\sigma^2\|\alpha\|_2^2)$.

Norm of a subgaussian random vector

  1. Let X be a random vector whose entries $X_i$ are i.i.d. with $X_i \sim \mathrm{Sub}(c^2)$.

  2. Consider the $\ell_2$ norm $\|X\|_2$. It is a random variable in its own right.

  3. It would be useful to understand the average behavior of the norm.

  4. Suppose $N = 1$. Then $\|X\|_2 = |X_1|$.

  5. Also $\|X\|_2^2 = X_1^2$. Thus $\mathbb{E}(\|X\|_2^2) = \sigma^2$, where $\sigma^2 = \mathbb{E}(X_1^2)$.

  6. It looks like $\mathbb{E}(\|X\|_2^2)$ should be connected with $\sigma^2$ in general.

  7. The norm can be larger or smaller than its average value.

  8. A ratio-based measure between the actual value and the average value would be useful.

  9. What is the probability that the norm increases beyond a given factor?

  10. What is the probability that the norm decreases beyond a given factor?

These bounds are stated formally in the following theorem.

Theorem 6.11

Suppose that $X = [X_1, X_2, \dots, X_N]$, where the $X_i$ are i.i.d. with $X_i \sim \mathrm{Sub}(c^2)$ and $\sigma^2 = \mathbb{E}(X_i^2)$. Then

$$\mathbb{E}(\|X\|_2^2) = N\sigma^2. \tag{6.2}$$

Moreover, for any $\alpha \in (0, 1)$ and for any $\beta \in \left[\frac{c^2}{\sigma^2}, \beta_{\max}\right]$, there exists a constant $\kappa \geq 4$ depending only on $\beta_{\max}$ and the ratio $\frac{\sigma^2}{c^2}$ such that

$$\mathbb{P}(\|X\|_2^2 \leq \alpha N\sigma^2) \leq \exp\left(-\frac{N(1 - \alpha)^2}{\kappa}\right) \tag{6.3}$$

and

$$\mathbb{P}(\|X\|_2^2 \geq \beta N\sigma^2) \leq \exp\left(-\frac{N(\beta - 1)^2}{\kappa}\right). \tag{6.4}$$
  • The first equation gives the average value of the squared norm.

  • The second inequality (6.3) bounds the probability that the squared norm falls below the average by a factor $\alpha < 1$.

  • The third inequality (6.4) bounds the probability that the squared norm exceeds the average by a factor $\beta > 1$.

  • Note that if the $X_i$ are strictly subgaussian, then $c = \sigma$; hence $\beta$ ranges over $[1, \beta_{\max}]$. A quick numerical check of these bounds is sketched below, before the proof.
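As a numerical illustration (NumPy assumed; $N$, the number of trials, $\alpha$, $\beta$, and $\beta_{\max}$ are arbitrary choices), the sketch below uses standard Gaussian entries, which are strictly subgaussian with $\sigma = c = 1$ by Example 6.5, and compares the empirical probabilities with the bounds, using the value of $\kappa$ constructed in the proof that follows.

```python
import numpy as np

# Monte Carlo illustration of Theorem 6.11 with standard Gaussian entries
# (strictly subgaussian, sigma = c = 1, so gamma_max = beta_max).
rng = np.random.default_rng(3)
N, trials = 100, 100_000
alpha, beta, beta_max = 0.8, 1.2, 1.5

# kappa = max(4, f(gamma_max)) with f(g) = 2 (g - 1)^2 / ((g - 1) - ln g).
g = beta_max
kappa = max(4.0, 2 * (g - 1)**2 / ((g - 1) - np.log(g)))

sq_norms = np.sum(rng.normal(size=(trials, N))**2, axis=1)

print("mean of ||X||^2       :", sq_norms.mean(), "(theory:", N, ")")
print("P(||X||^2 <= alpha N) :", np.mean(sq_norms <= alpha * N),
      "  bound:", np.exp(-N * (1 - alpha)**2 / kappa))
print("P(||X||^2 >= beta N)  :", np.mean(sq_norms >= beta * N),
      "  bound:", np.exp(-N * (beta - 1)**2 / kappa))
```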

Proof. Since the $X_i$ are identically distributed with $\mathbb{E}(X_i^2) = \sigma^2$, linearity of expectation gives

$$\mathbb{E}[\|X\|_2^2] = \mathbb{E}\left[\sum_{i=1}^N X_i^2\right] = \sum_{i=1}^N \mathbb{E}[X_i^2] = N\sigma^2.$$

This proves the first part.

Now let us look at (6.4).

By applying Markov's inequality, for any $\lambda > 0$ we have:

$$\mathbb{P}(\|X\|_2^2 \geq \beta N\sigma^2) = \mathbb{P}\left(\exp(\lambda\|X\|_2^2) \geq \exp(\lambda\beta N\sigma^2)\right) \leq \frac{\mathbb{E}(\exp(\lambda\|X\|_2^2))}{\exp(\lambda\beta N\sigma^2)} = \frac{\prod_{i=1}^N \mathbb{E}(\exp(\lambda X_i^2))}{\exp(\lambda\beta N\sigma^2)},$$

where the last step uses the independence of the $X_i$.

Since $X_i$ is c-subgaussian, from Lemma 6.3 we have, for any $\lambda \in \left(0, \frac{1}{2c^2}\right)$,

$$\mathbb{E}(\exp(\lambda X_i^2)) = \mathbb{E}\left(\exp\left(\frac{2c^2\lambda X_i^2}{2c^2}\right)\right) \leq \frac{1}{\sqrt{1 - 2c^2\lambda}}.$$

Thus:

$$\prod_{i=1}^N \mathbb{E}(\exp(\lambda X_i^2)) \leq \left(\frac{1}{1 - 2c^2\lambda}\right)^{\frac{N}{2}}.$$

Putting it back we get:

$$\mathbb{P}(\|X\|_2^2 \geq \beta N\sigma^2) \leq \left(\frac{\exp(-2\lambda\beta\sigma^2)}{1 - 2c^2\lambda}\right)^{\frac{N}{2}}.$$

Since the above is valid for all $\lambda \in \left(0, \frac{1}{2c^2}\right)$, we can minimize the r.h.s. over $\lambda$ by setting the derivative w.r.t. $\lambda$ to 0.

Thus we get the optimum $\lambda$ as:

$$\lambda = \frac{\beta\sigma^2 - c^2}{2\beta c^2\sigma^2}.$$

Plugging this back we get:

$$\mathbb{P}(\|X\|_2^2 \geq \beta N\sigma^2) \leq \left(\frac{\beta\sigma^2}{c^2}\exp\left(1 - \frac{\beta\sigma^2}{c^2}\right)\right)^{\frac{N}{2}}.$$

Similarly proceeding for (6.3) we get

$$\mathbb{P}(\|X\|_2^2 \leq \alpha N\sigma^2) \leq \left(\frac{\alpha\sigma^2}{c^2}\exp\left(1 - \frac{\alpha\sigma^2}{c^2}\right)\right)^{\frac{N}{2}}.$$

We need to simplify these expressions; a bit of algebraic juggling follows. Consider the function

$$f(\gamma) = \frac{2(\gamma - 1)^2}{(\gamma - 1) - \ln\gamma}, \quad \gamma > 0.$$

By differentiating twice, we can show that this is a strictly increasing function. Let $\gamma \in (0, \gamma_{\max}]$. Define

$$\kappa = \max\left(4, \frac{2(\gamma_{\max} - 1)^2}{(\gamma_{\max} - 1) - \ln\gamma_{\max}}\right).$$

Clearly

$$\kappa \geq \frac{2(\gamma - 1)^2}{(\gamma - 1) - \ln\gamma} \quad \forall \gamma \in (0, \gamma_{\max}].$$

This gives us:

$$\ln\gamma \leq (\gamma - 1) - \frac{2(\gamma - 1)^2}{\kappa}.$$

Hence, exponentiating both sides, we get:

$$\gamma \leq \exp\left[(\gamma - 1) - \frac{2(\gamma - 1)^2}{\kappa}\right].$$

Multiplying both sides by $\exp(1 - \gamma)$ we get:

$$\gamma\exp(1 - \gamma) \leq \exp\left[-\frac{2(1 - \gamma)^2}{\kappa}\right].$$

We now choose

$$\gamma = \frac{\alpha\sigma^2}{c^2}.$$

Substituting, we get:

$$\mathbb{P}(\|X\|_2^2 \leq \alpha N\sigma^2) \leq (\gamma\exp(1 - \gamma))^{\frac{N}{2}} \leq \exp\left[-\frac{N(1 - \gamma)^2}{\kappa}\right].$$

Finally,

$$c \geq \sigma \implies \frac{\sigma^2}{c^2} \leq 1 \implies \gamma \leq \alpha \implies 1 - \gamma \geq 1 - \alpha.$$

Thus we get

$$\mathbb{P}(\|X\|_2^2 \leq \alpha N\sigma^2) \leq \exp\left[-\frac{N(1 - \alpha)^2}{\kappa}\right].$$

Similarly, choosing $\gamma = \frac{\beta\sigma^2}{c^2}$ proves the other bound.

We can now map $\gamma_{\max}$ to $\beta_{\max}$ by:

$$\gamma_{\max} = \frac{\beta_{\max}\sigma^2}{c^2}.$$

This result tells us that, given a vector with entries drawn from a subgaussian distribution, we can expect the squared norm of the vector to concentrate around its expected value $N\sigma^2$.
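A minimal sketch of this concentration effect (NumPy assumed; the dimensions, trial count and seed are arbitrary choices): with standard Gaussian entries ($\sigma^2 = 1$), the typical relative deviation of $\|X\|_2^2$ from $N\sigma^2$ shrinks as $N$ grows.

```python
import numpy as np

# Relative deviation of ||X||^2 from N*sigma^2 for increasing N (Gaussian entries).
rng = np.random.default_rng(4)
trials = 10_000
for N in [10, 100, 1000]:
    sq_norms = np.sum(rng.normal(size=(trials, N))**2, axis=1)
    rel_dev = np.abs(sq_norms / N - 1.0)
    print(f"N = {N:4d}  median relative deviation: {np.median(rel_dev):.4f}")
```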