A consistent way of ensuring the i.i.d. condition for infinitely many random variables

In probability theory, the term i.i.d. is loosely used to describe random variables that are independent of each other and identically distributed. This is equivalent to saying that there is a joint distribution whose marginals are identical and independent of each other. In an infinite-dimensional setting, the Daniell-Kolmogorov theorem is invoked to explore the conditions under which such a joint distribution can be established.

Main question about the i.i.d. condition

In an elementary probability class, we learn that $F(x)$ is defined to be the probability that a certain random variable is less than or equal to some value $x$. Can we reverse this? In other words, given a distribution function $F$, can we ensure that there is a corresponding random variable $X$ satisfying $\mathbb{P}(\{w \mid X(w) \le x\}) = F(x)$?

Well, this is a consequence of the existence of the Lebesgue-Stieltjes measure, which in turn follows from the renowned Carathéodory extension theorem: as long as $F$ is right-continuous, non-decreasing, $\lim_{x\to\infty} F(x) = 1$, and $\lim_{x\to -\infty} F(x) = 0$, the existence is ensured.

The Carathéodory extension theorem proceeds by initially assuming a pre-measure on an algebra, a collection of sets closed under complementation and finite unions (hence also finite intersections). This pre-measure is then extended to an outer measure that applies to all subsets of a given set $\Omega$. The outer measure qualifies as a measure on the collection of Carathéodory-measurable sets, which form a $\sigma$-algebra. The theorem additionally stipulates that $\sigma$-finiteness makes this extension unique.

Now going back to our case of finding a random variable $X$, we can simply put $(\Omega, \mathcal{F}, \mathbb{P}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_F)$ and set $X(w) = w$, where $\mu_F$ is the Lebesgue-Stieltjes measure induced by $F$. Then we have $\mathbb{P}(\{w \mid X(w) \le x\}) = \mu_F((-\infty, x]) = F(x)$.
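This construction is exactly what inverse transform sampling does in practice: for a distribution function $F$, the generalized inverse $F^{-1}(u) = \inf\{x : F(x) \ge u\}$ applied to a uniform random variable produces a random variable with distribution $F$. A minimal Python sketch, where the bisection helper and the Exp(1) example are illustrative choices, not from the text:

```python
import math
import random

def generalized_inverse(F, u, lo=-1e9, hi=1e9, tol=1e-9):
    """inf{x : F(x) >= u}, computed by bisection for a non-decreasing F."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi

def sample(F, rng=random):
    """Draw X with P(X <= x) = F(x) via X = F^{-1}(U), U ~ Uniform(0, 1)."""
    return generalized_inverse(F, rng.random())

# Illustrative choice: F is the Exp(1) distribution function.
F = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0
random.seed(0)
xs = [sample(F) for _ in range(20000)]
print(abs(sum(xs) / len(xs) - 1.0) < 0.05)  # sample mean should be near E[X] = 1
```

Any $F$ satisfying the three conditions above can be plugged in; only `generalized_inverse`'s bracketing interval may need adjusting.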

More generally, this tells us how to construct a random vector $(X_1, X_2, \ldots, X_n)$ out of a joint distribution function $F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$.

How about an entire sequence $X_1, X_2, \ldots$ for which we prescribe the finite-dimensional distribution functions $F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n F_{X_i}(x_i)$ for every $n \in \mathbb{N}$ (with a common $F_{X_i} = F$ in the i.i.d. case)?

In probability theory textbooks, this condition is loosely referred to as the “i.i.d. condition” for infinitely many random variables. To show that it is feasible, one way is to establish a probability measure $\mathbb{P}$ whose marginal distributions correspond to every random variable in the sequence.


Daniell-Kolmogorov theorem

Let $\{F_\tau\}_{\tau \in T}$ be a given family of finite-dimensional probability distribution functions, and denote by $\{\mu_\tau\}_{\tau \in T}$ the corresponding (induced) distributions. If these distributions satisfy the consistency conditions

1) If $s = (t_{i_1}, \ldots, t_{i_n})$ is a permutation of $\tau = (t_1, \ldots, t_n)$, then for any Borel sets $A_1, \ldots, A_n$ of the real line we have

$$\mu_\tau(A_1 \times \cdots \times A_n) = \mu_s(A_{i_1} \times \cdots \times A_{i_n})$$

2) If $\tau = (t_1, \ldots, t_n) \in \tau_n$ and $s = (t_1, \ldots, t_n, t_{n+1}) \in \tau_{n+1}$, then for any Borel set $B \in \mathcal{B}(\mathbb{R}^n)$ we have

$$\mu_\tau(B) = \mu_s(B \times \mathbb{R})$$

then on the “canonical” space $(\Omega, \mathcal{F})$, there exists a probability measure $\mathbb{P}$ such that

$$\mu_{(t_1, \ldots, t_n)}(A) = \mathbb{P}((X_{t_1}, \ldots, X_{t_n}) \in A)$$

with $A \in \mathcal{B}(\mathbb{R}^n)$, for every $n \in \mathbb{N}$, where $X_t(w) = w(t)$ denotes the coordinate mapping on the canonical space.
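The two consistency conditions can be checked mechanically for product measures, which is the case used below for the i.i.d. construction. A small numerical sanity check in Python, using a hypothetical discrete marginal on $\{0, 1, 2\}$ in place of $\mu_F$:

```python
import itertools

mu = {0: 0.2, 1: 0.5, 2: 0.3}  # a hypothetical marginal on {0, 1, 2}
support = list(mu)

def mu_tau(B):
    """Product measure of a finite set B of n-tuples:
    mu_tau(B) = sum over x in B of prod_i mu(x_i)."""
    total = 0.0
    for x in B:
        p = 1.0
        for xi in x:
            p *= mu[xi]
        total += p
    return total

# Condition 1: permuting the coordinates of a rectangle permutes the factors.
A = [{0}, {1, 2}, {0, 2}]
rect = set(itertools.product(*A))
perm_rect = set(itertools.product(A[2], A[0], A[1]))
print(abs(mu_tau(rect) - mu_tau(perm_rect)) < 1e-12)

# Condition 2: appending a full extra coordinate leaves the measure unchanged.
B2 = {(0, 1), (2, 2)}
B3 = {b + (s,) for b in B2 for s in support}
print(abs(mu_tau(B2) - mu_tau(B3)) < 1e-12)
```

For a product measure the factors simply commute (condition 1) and the appended factor integrates to 1 (condition 2), which is what the two checks confirm numerically.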

The proof follows the natural logic of reduction, detailed here, where the author conveniently assumes that the index set is the time domain $\mathbb{R}_{\ge 0}$. It largely follows these four steps.

1) We begin by defining $\mu(C_{\tau}(B)) := \mu_\tau(B)$ for $B \in \mathcal{B}(\mathbb{R}^n)$, where $\mathcal{A}(\mathbb{R}_{\ge 0}, \mathbb{R})$ denotes the set of all functions from $\mathbb{R}_{\ge 0}$ to $\mathbb{R}$ and the cylinder set is defined by

$$C_{\tau}(B) = \{ f \in \mathcal{A}(\mathbb{R}_{\ge 0}, \mathbb{R}) : (f(t_1), \ldots, f(t_n)) \in B\}.$$

Then we have $\mu(\mathcal{A}(\mathbb{R}_{\ge 0}, \mathbb{R})) = 1$, and $\mu$ is well-defined thanks to conditions 1 and 2 (think of the case when $B$ is a rectangular measurable set; the permutation condition is necessary because the same cylinder set admits several representations).

2) We can check that the collection of cylinder sets forms a field satisfying the set conditions of the Carathéodory extension theorem, and since $\mu$ has total mass 1, it is $\sigma$-finite. Therefore, it suffices to show that $\mu$ is countably additive on this field.

3) We reformulate the problem into showing that $\mu(D_n) \to 0$ for some sequence of “decreasing” sets $D_n$. Note that this reduces the problem to one involving Borel sets by writing

$$D_n = \{ f : (f(t_1), \ldots, f(t_n)) \in B_n \}$$

for some “decreasing” sequence of sets $B_n \in \mathcal{B}(\mathbb{R}^n)$. This makes the problem easier to handle.

4) We can approximate each $B_n$ from inside by a compact set $K_n$. This approximation allows us to use a limiting argument: assuming that $\mu(D_n)$ does not converge to zero, we can extract a sequence of points satisfying $(x_1, \ldots, x_n) \in K_n$ for every $n \in \mathbb{N}$, and this leads to a contradiction because

$$\{ f \in \mathcal{A}(\mathbb{R}_{\ge 0}, \mathbb{R}) : (f(t_1), \ldots, f(t_n)) = (x_1, \ldots, x_n)\} \subseteq D_n$$

for every $n$, and hence $\bigcap_{n \in \mathbb{N}} D_n \neq \varnothing$, contradicting the assumption that the $D_n$ decrease to the empty set.


Continuing with i.i.d.

Let $T = \mathbb{N}$ and define

$$F_{\tau}(x_1, \ldots, x_n) = F(x_1)\cdots F(x_n)$$

for $(x_1, \ldots, x_n) \in \mathbb{R}^n$ with $\tau = (t_1, \ldots, t_n) \in \tau_n$ and $n \in \mathbb{N}$. This is a probability distribution function on $\mathbb{R}^n$ and induces a probability measure $\mu_\tau = \mu_{F_\tau} = \mu \otimes \cdots \otimes \mu$, the $n$-fold product measure of $\mu = \mu_F$ with itself on $\mathcal{B}(\mathbb{R}^n)$. It is clear that this family satisfies the consistency conditions in the theorem above.

According to the theorem, there exists a probability measure $\mathbb{P}$ on $(\Omega, \mathcal{F})$ with

$$\mathbb{P}\{w \in \Omega : w(t_1) \in A_1, \ldots, w(t_n) \in A_n\} = \mu_{(t_1, \ldots, t_n)}(A_1 \times \cdots \times A_n) = \mu(A_1)\cdots \mu(A_n) = \prod_{j=1}^n \mathbb{P}\{w \in \Omega : w(t_j) \in A_j\}$$

for every $A_1, \ldots, A_n \in \mathcal{B}(\mathbb{R})$ and $n \in \mathbb{N}$. But this means that the coordinate random variables $X_n(w) := w(n)$, $n \in \mathbb{N}$, are independent and identically distributed with common distribution $\mu$.
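The factorization above can also be checked empirically: simulate the first two coordinates of the canonical space under the product measure and compare the joint frequency with the product of marginal frequencies. A Python sketch, where the standard normal marginal and the sets $A_1, A_2$ are illustrative choices:

```python
import random

random.seed(1)
N = 200_000
# Simulate the coordinates w(t_1), w(t_2) of the canonical space under the
# product measure; a standard normal marginal is an illustrative choice.
omegas = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

in_A1 = lambda w: w[0] <= 0.5       # A1 = (-inf, 0.5]  (illustrative)
in_A2 = lambda w: abs(w[1]) <= 1.0  # A2 = [-1, 1]      (illustrative)

p_joint = sum(in_A1(w) and in_A2(w) for w in omegas) / N
p1 = sum(in_A1(w) for w in omegas) / N
p2 = sum(in_A2(w) for w in omegas) / N
print(abs(p_joint - p1 * p2) < 0.01)  # factorization holds up to sampling error
```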


Application of Daniell-Kolmogorov in the construction of Brownian motion

On a side note, this is very similar to how Brownian motion is constructed.

Consider

$$C = \{ w \in \mathbb{R}^{[0,\infty)} : (w(t_1), \ldots, w(t_n)) \in A\}$$

with $A \in \mathcal{B}(\mathbb{R}^n)$, and let $\mathcal{C}$ denote the field of all such sets. Further, let the $\sigma$-algebra generated by this field be denoted by $\mathcal{B}(\mathbb{R}^{[0,\infty)})$.

Using the Daniell-Kolmogorov theorem, we can establish the probability measure $\mathbb{P}$ on $(\mathbb{R}^{[0,\infty)}, \mathcal{B}(\mathbb{R}^{[0,\infty)}))$ under which the coordinate mapping process $B_t(w) = w(t)$, $w \in \mathbb{R}^{[0,\infty)}$, $t \ge 0$, has stationary, independent increments. Further, an increment $B_t - B_s$, where $0 \le s < t$, is normally distributed with mean zero and variance $t - s$ (we can arrange this by explicitly writing down the joint density).
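These finite-dimensional distributions can be sampled directly: on a fixed grid of times, the joint law is realized by summing independent Gaussian increments. A Python sketch (the grid of times and the Monte Carlo variance check are illustrative choices):

```python
import math
import random

def brownian_path(times, rng):
    """Sample (B_{t_1}, ..., B_{t_n}) by summing independent N(0, t_i - t_{i-1})
    increments, matching the joint law of the coordinate process."""
    path, b, prev = [], 0.0, 0.0
    for t in times:
        b += rng.gauss(0.0, math.sqrt(t - prev))
        prev = t
        path.append(b)
    return path

rng = random.Random(42)
times = [0.25, 0.5, 1.0, 2.0]  # an arbitrary increasing grid of times
paths = [brownian_path(times, rng) for _ in range(100_000)]

# Var(B_t - B_s) should equal t - s; check for s = 0.5, t = 2.0.
incs = [p[3] - p[1] for p in paths]
var = sum(x * x for x in incs) / len(incs)
print(abs(var - 1.5) < 0.05)
```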

In order to establish continuity, we invoke the modification theorem of Kolmogorov and Čentsov, whose proof is detailed in Karatzas and Shreve:


Suppose that a process $X = \{X_t; 0 \le t \le T\}$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ satisfies the condition

$$\mathbb{E}(|X_t - X_s|^{\alpha}) \le C |t - s|^{1 + \beta}, \quad 0 \le s, t \le T$$

for some positive constants $\alpha, \beta$, and $C$. Then there exists a continuous modification $\tilde{X} = \{\tilde{X}_t; 0 \le t \le T\}$ of $X$, which is locally Hölder-continuous with exponent $\gamma$ for every $\gamma \in (0, \beta/\alpha)$, i.e.,

$$\mathbb{P}\Bigl\{ w : \sup_{0 \le t-s < h(w),\; s,t \in [0,T]} \frac{|\tilde{X}_t(w) - \tilde{X}_s(w)|}{|t - s|^\gamma} \le \delta \Bigr\} = 1$$

where $h(w)$ is an a.s. positive random variable and $\delta > 0$ is an appropriate constant.


In our case, the condition holds with $\alpha = 4$, $\beta = 1$, and $C = 3$, since $\mathbb{E}(|B_t - B_s|^4) = 3|t - s|^2$ for a mean-zero Gaussian increment (note that $\alpha = 2$, $\beta = 0$ would not suffice, as the theorem requires $\beta > 0$). Since Hölder continuity implies continuity, we are done with the construction of Brownian motion.
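A quick Monte Carlo check of the fourth-moment identity $\mathbb{E}|B_t - B_s|^4 = 3(t - s)^2$ for a Gaussian increment, in Python (the values of $s$, $t$ and the sample size are arbitrary choices):

```python
import math
import random

rng = random.Random(7)
s, t = 0.3, 1.1  # arbitrary pair of times with s < t
n = 500_000
# B_t - B_s ~ N(0, t - s); its fourth moment equals 3 (t - s)^2.
samples = [rng.gauss(0.0, math.sqrt(t - s)) for _ in range(n)]
m4 = sum(x ** 4 for x in samples) / n
print(abs(m4 - 3 * (t - s) ** 2) < 0.1)  # estimate close to the exact value
```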

By the way, there are other ways of constructing Brownian motion, one of which involves constructing Brownian motion on the dyadic rationals of the interval $[0,1]$ (by explicitly writing down the joint distribution), filling the gaps (using uniform continuity), and patching the rest; this is also elaborated in Karatzas and Shreve.
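The dyadic construction can be sketched in a few lines: starting from the endpoints, each midpoint is filled in with a Brownian-bridge value whose conditional mean is the average of its neighbors and whose conditional variance is a quarter of the interval length. A minimal Python sketch under these assumptions (the function name and level count are illustrative):

```python
import math
import random

def dyadic_brownian(levels, rng):
    """Construct Brownian motion on the dyadic rationals of [0, 1]:
    fix B_0 = 0 and B_1 ~ N(0, 1), then repeatedly fill in each midpoint
    with a Brownian-bridge value: given neighbors at l and r, the midpoint
    value is N((B_l + B_r) / 2, (r - l) / 4)."""
    B = {0.0: 0.0, 1.0: rng.gauss(0.0, 1.0)}
    for k in range(1, levels + 1):
        step = 1.0 / 2 ** k
        for i in range(1, 2 ** k, 2):  # odd multiples of 2^{-k} are new points
            l, r = (i - 1) * step, (i + 1) * step
            B[i * step] = rng.gauss((B[l] + B[r]) / 2, math.sqrt((r - l) / 4))
    return B

B = dyadic_brownian(4, random.Random(0))
print(len(B))  # 2**4 + 1 = 17 dyadic points
```

Uniform continuity then extends the process from the dyadic rationals to all of $[0, 1]$, as in the Karatzas-Shreve treatment.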