A consistent way of ensuring the i.i.d. condition for infinitely many random variables

In probability theory, the term i.i.d. is loosely used to describe random variables that are independent of each other and identically distributed. This is equivalent to saying that there is a joint distribution whose marginals are identical and independent of each other. In an infinite-dimensional setting, the Daniell-Kolmogorov theorem is invoked to explore the conditions under which such a joint distribution can be established.

Main question about the i.i.d. condition

In an elementary probability class, we learn that $F(x)$ is defined to be the probability that a certain random variable is less than or equal to some value $x$. Can we reverse this? In other words, given a distribution function $F$, can we ensure that there is a corresponding random variable $X$ satisfying $\mathbb{P}(\{w \mid X(w) \le x\}) = F(x)$?

Well, this is a consequence of the existence of the Lebesgue-Stieltjes measure, which in turn follows from the renowned Carathéodory extension theorem: as long as $F$ is right-continuous, non-decreasing, $\lim_{x\to\infty} F(x) = 1$, and $\lim_{x\to -\infty} F(x) = 0$, the existence is ensured.

The Carathéodory extension theorem proceeds by initially assuming a pre-measure on an algebra, a collection of sets closed under complementation and finite unions (hence also finite intersections). This pre-measure is then extended to an outer measure that applies to all subsets of a given set $\Omega$. The outer measure qualifies as a measure on the collection of Carathéodory-measurable sets, which form a $\sigma$-algebra. The theorem additionally stipulates that $\sigma$-finiteness makes this extension unique.

Now going back to our case of finding a random variable $X$, we can simply put $(\Omega, \mathcal{F}, \mathbb{P}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu_F)$ and set $X(w) = w$, where $\mu_F$ is the Lebesgue-Stieltjes measure induced by $F$. Then we have $\mathbb{P}(\{w \mid X(w) \le x\}) = \mu_F((-\infty, x]) = F(x)$.
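This construction is exactly what inverse transform sampling does in practice: for a distribution function $F$, the generalized inverse $F^{-1}(u) = \inf\{x : F(x) \ge u\}$ applied to a uniform random variable produces a random variable with distribution $F$. A minimal Python sketch, where the bisection helper and the Exp(1) example are illustrative choices, not from the text:

```python
import math
import random

def generalized_inverse(F, u, lo=-1e9, hi=1e9, tol=1e-9):
    """inf{x : F(x) >= u}, computed by bisection for a non-decreasing F."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi

def sample(F, rng=random):
    """Draw X with P(X <= x) = F(x) via X = F^{-1}(U), U ~ Uniform(0, 1)."""
    return generalized_inverse(F, rng.random())

# Illustrative choice: F is the Exp(1) distribution function.
F = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0
random.seed(0)
xs = [sample(F) for _ in range(20000)]
print(abs(sum(xs) / len(xs) - 1.0) < 0.05)  # sample mean should be near E[X] = 1
```

Any $F$ satisfying the three conditions above can be plugged in; only `generalized_inverse`'s bracketing interval may need adjusting.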

More generally, this tells us how to construct a random vector $(X_1, X_2, \ldots, X_n)$ out of a joint distribution function $F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$.

How about an entire sequence $X_1, X_2, \ldots$ for which we prescribe the finite-dimensional distribution functions $F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n F_{X_i}(x_i)$ for every $n \in \mathbb{N}$ (with a common $F_{X_i} = F$ in the i.i.d. case)?

In probability theory textbooks, this condition is loosely referred to as the “i.i.d. condition” for infinitely many random variables. To show that it is feasible, one way is to establish a probability measure $\mathbb{P}$ whose marginal distributions correspond to every random variable in the sequence.


Daniell-Kolmogorov theorem

Let $\{F_\tau\}_{\tau \in T}$ be a given family of finite-dimensional probability distribution functions, and denote by $\{\mu_\tau\}_{\tau \in T}$ the corresponding (induced) distributions. If these distributions satisfy the consistency conditions

1) If $s = (t_{i_1}, \ldots, t_{i_n})$ is a permutation of $\tau = (t_1, \ldots, t_n)$, then for any Borel sets $A_1, \ldots, A_n$ of the real line we have

$$\mu_\tau(A_1 \times \cdots \times A_n) = \mu_s(A_{i_1} \times \cdots \times A_{i_n})$$

2) If $\tau = (t_1, \ldots, t_n) \in \tau_n$ and $s = (t_1, \ldots, t_n, t_{n+1}) \in \tau_{n+1}$, then for any Borel set $B \in \mathcal{B}(\mathbb{R}^n)$ we have

$$\mu_\tau(B) = \mu_s(B \times \mathbb{R})$$

then on the “canonical” space $(\Omega, \mathcal{F})$, there exists a probability measure $\mathbb{P}$ such that

$$\mu_{(t_1, \ldots, t_n)}(A) = \mathbb{P}((X_{t_1}, \ldots, X_{t_n}) \in A)$$

with $A \in \mathcal{B}(\mathbb{R}^n)$, for every $n \in \mathbb{N}$, where $X_t(w) = w(t)$ denotes the coordinate mapping on the canonical space.
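The two consistency conditions can be checked mechanically for product measures, which is the case used below for the i.i.d. construction. A small numerical sanity check in Python, using a hypothetical discrete marginal on $\{0, 1, 2\}$ in place of $\mu_F$:

```python
import itertools

mu = {0: 0.2, 1: 0.5, 2: 0.3}  # a hypothetical marginal on {0, 1, 2}
support = list(mu)

def mu_tau(B):
    """Product measure of a finite set B of n-tuples:
    mu_tau(B) = sum over x in B of prod_i mu(x_i)."""
    total = 0.0
    for x in B:
        p = 1.0
        for xi in x:
            p *= mu[xi]
        total += p
    return total

# Condition 1: permuting the coordinates of a rectangle permutes the factors.
A = [{0}, {1, 2}, {0, 2}]
rect = set(itertools.product(*A))
perm_rect = set(itertools.product(A[2], A[0], A[1]))
print(abs(mu_tau(rect) - mu_tau(perm_rect)) < 1e-12)

# Condition 2: appending a full extra coordinate leaves the measure unchanged.
B2 = {(0, 1), (2, 2)}
B3 = {b + (s,) for b in B2 for s in support}
print(abs(mu_tau(B2) - mu_tau(B3)) < 1e-12)
```

For a product measure the factors simply commute (condition 1) and the appended factor integrates to 1 (condition 2), which is what the two checks confirm numerically.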

The proof follows the natural logic of reduction, detailed here, where the author conveniently assumes that the index set is the time domain $\mathbb{R}_{\ge 0}$. It largely follows these four steps.

1) We begin by defining $\mu(C_{\tau}(B)) := \mu_\tau(B)$ for $B \in \mathcal{B}(\mathbb{R}^n)$, where $\mathcal{A}(\mathbb{R}_{\ge 0}, \mathbb{R})$ denotes the set of all functions from $\mathbb{R}_{\ge 0}$ to $\mathbb{R}$ and the cylinder set is defined by

$$C_{\tau}(B) = \{ f \in \mathcal{A}(\mathbb{R}_{\ge 0}, \mathbb{R}) : (f(t_1), \ldots, f(t_n)) \in B\}.$$

Then we have $\mu(\mathcal{A}(\mathbb{R}_{\ge 0}, \mathbb{R})) = 1$, and $\mu$ is well-defined thanks to conditions 1 and 2 (think of the case when $B$ is a rectangular measurable set; the permutation condition is necessary because the same cylinder set admits several representations).

2) We can check that the collection of cylinder sets forms a field satisfying the set conditions of the Carathéodory extension theorem, and since $\mu$ has total mass 1, it is $\sigma$-finite. Therefore, it suffices to show that $\mu$ is countably additive on this field.

3) We reformulate the problem into showing that $\mu(D_n) \to 0$ for some sequence of “decreasing” sets $D_n$. Note that this reduces the problem to one involving Borel sets by writing

$$D_n = \{ f : (f(t_1), \ldots, f(t_n)) \in B_n \}$$

for some “decreasing” sequence of sets $B_n \in \mathcal{B}(\mathbb{R}^n)$. This makes the problem easier to handle.

4) We can approximate each $B_n$ from inside by a compact set $K_n$. This approximation allows us to use a limiting argument: assuming that $\mu(D_n)$ does not converge to zero, we can extract a sequence of points satisfying $(x_1, \ldots, x_n) \in K_n$ for every $n \in \mathbb{N}$, and this leads to a contradiction because

$$\{ f \in \mathcal{A}(\mathbb{R}_{\ge 0}, \mathbb{R}) : (f(t_1), \ldots, f(t_n)) = (x_1, \ldots, x_n)\} \subseteq D_n$$

for every $n$, and hence $\bigcap_{n \in \mathbb{N}} D_n \neq \varnothing$, contradicting the assumption that the $D_n$ decrease to the empty set.


Continuing with i.i.d.

Let $T = \mathbb{N}$ and define

$$F_{\tau}(x_1, \ldots, x_n) = F(x_1)\cdots F(x_n)$$

for $(x_1, \ldots, x_n) \in \mathbb{R}^n$ with $\tau = (t_1, \ldots, t_n) \in \tau_n$ and $n \in \mathbb{N}$. This is a probability distribution function on $\mathbb{R}^n$ and induces a probability measure $\mu_\tau = \mu_{F_\tau} = \mu \otimes \cdots \otimes \mu$, the $n$-fold product measure of $\mu = \mu_F$ with itself on $\mathcal{B}(\mathbb{R}^n)$. It is clear that this family satisfies the consistency conditions in the theorem above.

According to the theorem, there exists a probability measure $\mathbb{P}$ on $(\Omega, \mathcal{F})$ with

$$\mathbb{P}\{w \in \Omega : w(t_1) \in A_1, \ldots, w(t_n) \in A_n\} = \mu_{(t_1, \ldots, t_n)}(A_1 \times \cdots \times A_n) = \mu(A_1)\cdots \mu(A_n) = \prod_{j=1}^n \mathbb{P}\{w \in \Omega : w(t_j) \in A_j\}$$

for every $A_1, \ldots, A_n \in \mathcal{B}(\mathbb{R})$ and $n \in \mathbb{N}$. But this means that the coordinate random variables $X_n(w) := w(n)$, $n \in \mathbb{N}$, are independent and identically distributed with common distribution $\mu$.
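The factorization above can also be checked empirically: simulate the first two coordinates of the canonical space under the product measure and compare the joint frequency with the product of marginal frequencies. A Python sketch, where the standard normal marginal and the sets $A_1, A_2$ are illustrative choices:

```python
import random

random.seed(1)
N = 200_000
# Simulate the coordinates w(t_1), w(t_2) of the canonical space under the
# product measure; a standard normal marginal is an illustrative choice.
omegas = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

in_A1 = lambda w: w[0] <= 0.5       # A1 = (-inf, 0.5]  (illustrative)
in_A2 = lambda w: abs(w[1]) <= 1.0  # A2 = [-1, 1]      (illustrative)

p_joint = sum(in_A1(w) and in_A2(w) for w in omegas) / N
p1 = sum(in_A1(w) for w in omegas) / N
p2 = sum(in_A2(w) for w in omegas) / N
print(abs(p_joint - p1 * p2) < 0.01)  # factorization holds up to sampling error
```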


Application of Daniell-Kolmogorov in the construction of Brownian motion

On a side note, this is very similar to how Brownian motion is constructed.

Consider

$$C = \{ w \in \mathbb{R}^{[0,\infty)} : (w(t_1), \ldots, w(t_n)) \in A\}$$

with $A \in \mathcal{B}(\mathbb{R}^n)$, and let $\mathcal{C}$ denote the field of all such sets. Further, let the $\sigma$-algebra generated by this field be denoted by $\mathcal{B}(\mathbb{R}^{[0,\infty)})$.

Using the Daniell-Kolmogorov theorem, we can establish the probability measure $\mathbb{P}$ on $(\mathbb{R}^{[0,\infty)}, \mathcal{B}(\mathbb{R}^{[0,\infty)}))$ under which the coordinate mapping process $B_t(w) = w(t)$, $w \in \mathbb{R}^{[0,\infty)}$, $t \ge 0$, has stationary, independent increments. Further, an increment $B_t - B_s$, where $0 \le s < t$, is normally distributed with mean zero and variance $t - s$ (we can arrange this by explicitly writing down the joint density).
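These finite-dimensional distributions can be sampled directly: on a fixed grid of times, the joint law is realized by summing independent Gaussian increments. A Python sketch (the grid of times and the Monte Carlo variance check are illustrative choices):

```python
import math
import random

def brownian_path(times, rng):
    """Sample (B_{t_1}, ..., B_{t_n}) by summing independent N(0, t_i - t_{i-1})
    increments, matching the joint law of the coordinate process."""
    path, b, prev = [], 0.0, 0.0
    for t in times:
        b += rng.gauss(0.0, math.sqrt(t - prev))
        prev = t
        path.append(b)
    return path

rng = random.Random(42)
times = [0.25, 0.5, 1.0, 2.0]  # an arbitrary increasing grid of times
paths = [brownian_path(times, rng) for _ in range(100_000)]

# Var(B_t - B_s) should equal t - s; check for s = 0.5, t = 2.0.
incs = [p[3] - p[1] for p in paths]
var = sum(x * x for x in incs) / len(incs)
print(abs(var - 1.5) < 0.05)
```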

In order to establish continuity, we invoke the modification theorem of Kolmogorov and Čentsov, whose proof is detailed in Karatzas and Shreve:


Suppose that a process $X = \{X_t; 0 \le t \le T\}$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ satisfies the condition

$$\mathbb{E}(|X_t - X_s|^{\alpha}) \le C |t - s|^{1 + \beta}, \quad 0 \le s, t \le T$$

for some positive constants $\alpha, \beta$, and $C$. Then there exists a continuous modification $\tilde{X} = \{\tilde{X}_t; 0 \le t \le T\}$ of $X$, which is locally Hölder-continuous with exponent $\gamma$ for every $\gamma \in (0, \beta/\alpha)$, i.e.,

$$\mathbb{P}\Bigl\{ w : \sup_{0 \le t-s < h(w),\; s,t \in [0,T]} \frac{|\tilde{X}_t(w) - \tilde{X}_s(w)|}{|t - s|^\gamma} \le \delta \Bigr\} = 1$$

where $h(w)$ is an a.s. positive random variable and $\delta > 0$ is an appropriate constant.


In our case, the condition holds with $\alpha = 4$, $\beta = 1$, and $C = 3$, since $\mathbb{E}(|B_t - B_s|^4) = 3|t - s|^2$ for a mean-zero Gaussian increment (note that $\alpha = 2$, $\beta = 0$ would not suffice, as the theorem requires $\beta > 0$). Since Hölder continuity implies continuity, we are done with the construction of Brownian motion.
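A quick Monte Carlo check of the fourth-moment identity $\mathbb{E}|B_t - B_s|^4 = 3(t - s)^2$ for a Gaussian increment, in Python (the values of $s$, $t$ and the sample size are arbitrary choices):

```python
import math
import random

rng = random.Random(7)
s, t = 0.3, 1.1  # arbitrary pair of times with s < t
n = 500_000
# B_t - B_s ~ N(0, t - s); its fourth moment equals 3 (t - s)^2.
samples = [rng.gauss(0.0, math.sqrt(t - s)) for _ in range(n)]
m4 = sum(x ** 4 for x in samples) / n
print(abs(m4 - 3 * (t - s) ** 2) < 0.1)  # estimate close to the exact value
```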

By the way, there are other ways of constructing Brownian motion, one of which involves constructing Brownian motion on the dyadic rationals of the interval $[0,1]$ (by explicitly writing down the joint distribution), filling the gaps (using uniform continuity), and patching the rest; this is also elaborated in Karatzas and Shreve.
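The dyadic construction can be sketched in a few lines: starting from the endpoints, each midpoint is filled in with a Brownian-bridge value whose conditional mean is the average of its neighbors and whose conditional variance is a quarter of the interval length. A minimal Python sketch under these assumptions (the function name and level count are illustrative):

```python
import math
import random

def dyadic_brownian(levels, rng):
    """Construct Brownian motion on the dyadic rationals of [0, 1]:
    fix B_0 = 0 and B_1 ~ N(0, 1), then repeatedly fill in each midpoint
    with a Brownian-bridge value: given neighbors at l and r, the midpoint
    value is N((B_l + B_r) / 2, (r - l) / 4)."""
    B = {0.0: 0.0, 1.0: rng.gauss(0.0, 1.0)}
    for k in range(1, levels + 1):
        step = 1.0 / 2 ** k
        for i in range(1, 2 ** k, 2):  # odd multiples of 2^{-k} are new points
            l, r = (i - 1) * step, (i + 1) * step
            B[i * step] = rng.gauss((B[l] + B[r]) / 2, math.sqrt((r - l) / 4))
    return B

B = dyadic_brownian(4, random.Random(0))
print(len(B))  # 2**4 + 1 = 17 dyadic points
```

Uniform continuity then extends the process from the dyadic rationals to all of $[0, 1]$, as in the Karatzas-Shreve treatment.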