Convergence of Hard Thresholding Algorithm

We present a greedy approximation algorithm designed to reconstruct the vector $x$. It is applicable to matrices $A$ that satisfy the condition $\delta_{3s} \leq \frac{1}{12}$. Additionally, the article includes a proof of the algorithm's convergence.

Context

Let $A \in \mathbb{R}^{N \times p}$ be a "compressing matrix," where $N \ll p$, and let $y = Ax$. It is generally not possible to recover $x$ from $y$ without any further constraint. The field of compressed sensing adds the constraint that $x$ is $s$-sparse and asks the following two questions.

(1) What matrix $A$ should we use to ensure perfect recovery of the sparse vector $x$?

(2) How do we go about finding an algorithm that performs this recovery?

For what follows, an excellent reference is Foucart and Rauhut's A Mathematical Introduction to Compressive Sensing (see the bibliography).

It can be shown that the problem

$$\min_{z} \|z\|_1 \quad \text{subject to} \quad Az = Ax \text{ for an } s\text{-sparse vector } x$$

has the unique solution $z = x$ if and only if $A$ satisfies the restricted nullspace property. There are two sufficient conditions on $A$ that imply this property.

(1) Pairwise incoherence

$$\|A_S^T A_S - I\|_\infty \leq \frac{1}{2s}, \quad \forall\, |S| \leq s$$

(2) Restricted Isometry property

$$\delta = \max_{S : |S| = s} \|A_S^T A_S - I\|_2 \leq \frac{1}{3}$$

One way to obtain such a matrix $A$ satisfying (1) is to use a random matrix via a Johnson-Lindenstrauss-type inequality. However, the resulting dimension $N$ is generally not small enough: we want $N = O(s)$, but we only obtain $N = O(s^2)$.
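For unit-norm columns, the pairwise incoherence condition in (1) amounts to bounding the largest off-diagonal entry of the Gram matrix, often called the mutual coherence. A minimal sketch of checking it (the helper name `coherence` is mine, not from the text):

```python
import numpy as np

def coherence(A):
    """max_{i != j} |<a_i, a_j>| after normalizing the columns of A."""
    A = A / np.linalg.norm(A, axis=0)   # unit-norm columns
    G = np.abs(A.T @ A)                 # Gram matrix magnitudes
    np.fill_diagonal(G, 0.0)            # ignore the diagonal (always 1)
    return G.max()

# In the incoherence regime, s-sparse recovery is guaranteed roughly when
# coherence(A) <= 1/(2s).
```

For instance, an orthogonal matrix has coherence 0, while highly correlated columns push the coherence toward 1, shrinking the sparsity level $s$ for which recovery is guaranteed.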

For condition (2), the required dimension is more favorable, on the order of $N = O(s \cdot \text{const})$, but the difficulty lies in the fact that the only matrices known to satisfy this condition are random, e.g. with i.i.d. Gaussian entries.
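Computing the RIP constant exactly is intractable in general (it requires examining every support), but for tiny dimensions a brute-force evaluation helps build intuition. A sketch under that caveat; the function name `rip_constant` is illustrative, not from the text:

```python
import itertools
import numpy as np

def rip_constant(A, s):
    """Brute-force delta_s = max_{|S|=s} ||A_S^T A_S - I||_2 (exponential in p)."""
    _, p = A.shape
    delta = 0.0
    for S in itertools.combinations(range(p), s):
        A_S = A[:, list(S)]
        gram_err = A_S.T @ A_S - np.eye(s)
        delta = max(delta, np.linalg.norm(gram_err, 2))  # spectral norm
    return delta

# Gaussian entries scaled so each column has unit norm in expectation.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10)) / np.sqrt(50)
print(rip_constant(A, 2))  # shrinks as N grows relative to s
```

A matrix with orthonormal columns has $\delta_s = 0$ for every $s$, which is the ideal (but non-compressing) case.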

Given these theoretical challenges, I want to introduce an elegant greedy approximation algorithm that recovers $x$ for any matrix $A$ with $\delta_{3s} \leq \frac{1}{12}$, together with a proof of its convergence.


Iterative Hard Thresholding Algorithm

Algorithm: Iterative Hard Thresholding

Objective: Minimize $\|Az - y\|_2^2$ subject to $z$ being $s$-sparse.

Initialization: Choose an initial guess $z_0$ and set $t = 0$.

Repeat until convergence:

  1. Gradient Step:

    Update $a$ by:

    $$a_{t+1} = z_t - A^T (A z_t - y)$$

  2. Sparsity Constraint:

    Update $z$ by finding the closest $s$-sparse vector to $a_{t+1}$ (i.e., keep its $s$ largest-magnitude entries and zero out the rest):

    $$z_{t+1} = \arg\min_{z \in \{s\text{-sparse}\}} \|a_{t+1} - z\|_2$$

  3. Update Iteration:

    Set $t = t + 1$.

Output: Return $z_t$ as the solution.
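The loop above fits in a few lines of code. A minimal sketch, assuming the sparsity step is implemented by keeping the $s$ largest-magnitude entries (the names `hard_threshold` and `iht` are mine, not from the text):

```python
import numpy as np

def hard_threshold(a, s):
    """Closest s-sparse vector to a: keep the s largest-magnitude entries."""
    z = np.zeros_like(a)
    keep = np.argsort(np.abs(a))[-s:]
    z[keep] = a[keep]
    return z

def iht(A, y, s, iters=100):
    """Iterative hard thresholding: gradient step + projection onto s-sparse vectors."""
    z = np.zeros(A.shape[1])            # z_0 = 0
    for _ in range(iters):
        a = z - A.T @ (A @ z - y)       # gradient step on (1/2)||Az - y||^2
        z = hard_threshold(a, s)        # sparsity constraint
    return z
```

When $A$ has small restricted isometry constant, the iterates contract toward the true sparse vector; in the extreme case of orthonormal columns ($A^T A = I$), a single iteration already recovers $x$ exactly.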



Theorem. Let the RIP constant satisfy $\delta_{3s} \leq \frac{1}{12}$. Then the above algorithm converges linearly. That is,

$$\|z_{t+1} - z^*\|_2 \leq \beta \|z_t - z^*\|_2$$

for some $\beta < 1$, and consequently $\|z_t - z^*\|_2 \leq \beta^t \|z_0 - z^*\|_2$.

Proof: Let $S = \operatorname{supp}(z^*) \cup \operatorname{supp}(z_{t+1})$. We have

$$\begin{aligned} \|z_{t+1} - z^*\|_2 &= \|z_{t+1,S} - z^*_S\|_2 \\ &\leq \|z_{t+1,S} - a_{t+1,S}\|_2 + \|a_{t+1,S} - z^*_S\|_2 \\ &\leq \|z^*_S - a_{t+1,S}\|_2 + \|a_{t+1,S} - z^*_S\|_2 \\ &= 2\|z^*_S - a_{t+1,S}\|_2, \end{aligned}$$

where the second inequality holds because $z_{t+1}$ is the closest $s$-sparse vector to $a_{t+1}$ while $z^*$ is also $s$-sparse, so $\|z_{t+1} - a_{t+1}\|_2 \leq \|z^* - a_{t+1}\|_2$; since both vectors are supported in $S$, the common term $\|a_{t+1,S^C}\|_2^2$ cancels from both sides, leaving $\|z_{t+1,S} - a_{t+1,S}\|_2 \leq \|z^*_S - a_{t+1,S}\|_2$.

Now, we have

$$\begin{aligned} \|z^*_S - a_{t+1,S}\|_2 &= \|z^*_S - z_{t,S} + A_S^T A (z_t - z^*)\|_2 \\ &= \|-r_{t,S} + A_S^T A r_t\|_2 \\ &= \|-r_{t,S} + A_S^T A r_{t,S} + A_S^T A r_{t,S^C}\|_2 \\ &\leq \|(I - A_S^T A_S) r_{t,S}\|_2 + \|A_S^T A_{S^- \setminus S}\, r_{t,S^C}\|_2 \\ &\leq \delta_{3s} \|r_t\|_2 + \|A_S^T A_{S^- \setminus S}\|_2 \|r_t\|_2, \end{aligned}$$

where $r_t := z_t - z^*$ (so $r_t$ is supported on $S^- := \operatorname{supp}(z_t) \cup \operatorname{supp}(z^*)$), and $A_S$ denotes the matrix obtained from $A$ by zeroing out the columns outside $S$.

Now, it suffices to show that

$$\|A_S^T A_{S^- \setminus S}\|_2 \leq 2\delta_{3s},$$

since combining the two inequalities above would then give

$$\|z_{t+1} - z^*\|_2 \leq 2(\delta_{3s} + 2\delta_{3s}) \|r_t\|_2 = 6\delta_{3s} \|z_t - z^*\|_2 \leq \frac{1}{2} \|z_t - z^*\|_2,$$

i.e., the desired contraction with $\beta = \frac{1}{2}$ (using $\delta_{3s} \leq \frac{1}{12}$).

To this end, let $\|x\|_2 = \|y\|_2 = 1$. We have

$$\begin{aligned} x^T A_S^T A_{S^- \setminus S}\, y &= x_S^T (A^T A)\, y_{S^- \setminus S} \\ &= \frac{1}{4} \left( \|A x_S + A y_{S^- \setminus S}\|_2^2 - \|A x_S - A y_{S^- \setminus S}\|_2^2 \right) \\ &= \frac{1}{4} \Big( \|x_S + y_{S^- \setminus S}\|_2^2 + (x_S + y_{S^- \setminus S})^T (A^T A - I)(x_S + y_{S^- \setminus S}) \\ &\quad - \|x_S - y_{S^- \setminus S}\|_2^2 - (x_S - y_{S^- \setminus S})^T (A^T A - I)(x_S - y_{S^- \setminus S}) \Big) \\ &= \frac{1}{4} \Big( (x_S + y_{S^- \setminus S})^T (A^T A - I)(x_S + y_{S^- \setminus S}) \\ &\quad - (x_S - y_{S^- \setminus S})^T (A^T A - I)(x_S - y_{S^- \setminus S}) \Big) \\ &\leq 2 \delta_{3s}, \end{aligned}$$

where the squared norms cancel because $x_S$ and $y_{S^- \setminus S}$ have disjoint supports, and the last inequality is due to the fact that $x_S \pm y_{S^- \setminus S}$ is $3s$-sparse with $\|x_S \pm y_{S^- \setminus S}\|_2^2 \leq 2$ (recall that $S^- = \operatorname{supp}(r_t) \subseteq \operatorname{supp}(z^*) \cup \operatorname{supp}(z_t)$, so $S \cup S^-$ has size at most $3s$). Therefore, we have established the claim
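The second equality above is the polarization identity $u^T v = \frac{1}{4}\left(\|u + v\|_2^2 - \|u - v\|_2^2\right)$ applied to $u = A x_S$ and $v = A y_{S^- \setminus S}$; a quick numerical sanity check of the identity itself:

```python
import numpy as np

# Polarization identity: u^T v = (||u + v||^2 - ||u - v||^2) / 4
rng = np.random.default_rng(0)
u, v = rng.standard_normal(5), rng.standard_normal(5)
lhs = u @ v
rhs = (np.sum((u + v) ** 2) - np.sum((u - v) ** 2)) / 4
print(abs(lhs - rhs))  # essentially zero
```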

$$\|z_t - z^*\|_2 \leq \beta^t \|z_0 - z^*\|_2$$

with

$$\beta = \frac{1}{2}.$$

Conclusion

We have established the convergence of the iterative hard thresholding algorithm for reconstructing the vector $x$. In today's landscape, where data storage is relatively inexpensive, compressed sensing and reconstruction problems may not be immediately relevant to everyday data scientists. Nevertheless, the underlying mathematics of this field is quite fascinating, particularly in the context of exploring different trade-offs.

Bibliography

Foucart, S., & Rauhut, H. (2013). A Mathematical Introduction to Compressive Sensing. Birkhäuser.