Convergence of Hard Thresholding Algorithm

We present a greedy approximation algorithm designed to reconstruct the vector $x$. It is applicable to matrices $A$ that satisfy the condition $\delta_{3s} \leq \frac{1}{12}$. Additionally, the article includes a proof of the algorithm's convergence.

Context

Let $A \in \mathbb{R}^{N \times p}$ be a "compressing matrix," where $N \ll p$, and let $y = Ax$. It is generally not possible to recover $x$ from $y$ without any further constraint. The field of compressed sensing adds the constraint that $x$ is $s$-sparse and asks the following two questions.

(1) What matrix $A$ should we use to ensure perfect recovery of the sparse vector $x$?

(2) How do we go about finding an algorithm that performs this recovery?

For what follows, an excellent reference is Foucart and Rauhut's A Mathematical Introduction to Compressive Sensing (see the bibliography).

It can be shown that the problem

$$\min_{z} \|z\|_1 \quad \text{subject to} \quad Az = Ax \text{ for an } s\text{-sparse vector } x$$

has the unique solution $z = x$ if and only if $A$ satisfies the restricted nullspace property. There are two sufficient conditions on $A$ that imply this property.

(1) Pairwise incoherence

$$\|A_S^T A_S - I\|_\infty \leq \frac{1}{2s}, \quad \forall\, |S| \leq s$$

(2) Restricted Isometry property

$$\delta = \max_{S : |S| = s} \|A_S^T A_S - I\|_2 \leq \frac{1}{3}$$

One way to obtain such a matrix $A$ satisfying (1) is to use a random matrix via a Johnson-Lindenstrauss-type inequality. However, the resulting dimension $N$ is generally not small enough: we want $N = O(s)$, but we only obtain $N = O(s^2)$.
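For unit-norm columns, the pairwise incoherence condition in (1) amounts to bounding the largest off-diagonal entry of the Gram matrix, often called the mutual coherence. A minimal sketch of checking it (the helper name `coherence` is mine, not from the text):

```python
import numpy as np

def coherence(A):
    """max_{i != j} |<a_i, a_j>| after normalizing the columns of A."""
    A = A / np.linalg.norm(A, axis=0)   # unit-norm columns
    G = np.abs(A.T @ A)                 # Gram matrix magnitudes
    np.fill_diagonal(G, 0.0)            # ignore the diagonal (always 1)
    return G.max()

# In the incoherence regime, s-sparse recovery is guaranteed roughly when
# coherence(A) <= 1/(2s).
```

For instance, an orthogonal matrix has coherence 0, while highly correlated columns push the coherence toward 1, shrinking the sparsity level $s$ for which recovery is guaranteed.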

For condition (2), the required dimension is more favorable, on the order of $N = O(s \cdot \text{const})$, but the difficulty lies in the fact that the only matrices known to satisfy this condition are random, e.g. with i.i.d. Gaussian entries.
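Computing the RIP constant exactly is intractable in general (it requires examining every support), but for tiny dimensions a brute-force evaluation helps build intuition. A sketch under that caveat; the function name `rip_constant` is illustrative, not from the text:

```python
import itertools
import numpy as np

def rip_constant(A, s):
    """Brute-force delta_s = max_{|S|=s} ||A_S^T A_S - I||_2 (exponential in p)."""
    _, p = A.shape
    delta = 0.0
    for S in itertools.combinations(range(p), s):
        A_S = A[:, list(S)]
        gram_err = A_S.T @ A_S - np.eye(s)
        delta = max(delta, np.linalg.norm(gram_err, 2))  # spectral norm
    return delta

# Gaussian entries scaled so each column has unit norm in expectation.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10)) / np.sqrt(50)
print(rip_constant(A, 2))  # shrinks as N grows relative to s
```

A matrix with orthonormal columns has $\delta_s = 0$ for every $s$, which is the ideal (but non-compressing) case.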

Given these theoretical challenges, I want to introduce an elegant greedy approximation algorithm that recovers $x$ for any matrix $A$ with $\delta_{3s} \leq \frac{1}{12}$, together with a proof of its convergence.


Iterative Hard Thresholding Algorithm

Algorithm: Iterative Hard Thresholding

Objective: Minimize $\|Az - y\|_2^2$ subject to $z$ being $s$-sparse.

Initialization: Choose an initial guess $z_0$ and set $t = 0$.

Repeat until convergence:

  1. Gradient Step:

    Update $a$ by:

    $$a_{t+1} = z_t - A^T (A z_t - y)$$

  2. Sparsity Constraint:

    Update $z$ by finding the closest $s$-sparse vector to $a_{t+1}$ (i.e., keep its $s$ largest-magnitude entries and zero out the rest):

    $$z_{t+1} = \arg\min_{z \in \{s\text{-sparse}\}} \|a_{t+1} - z\|_2$$

  3. Update Iteration:

    Set $t = t + 1$.

Output: Return $z_t$ as the solution.
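The loop above fits in a few lines of code. A minimal sketch, assuming the sparsity step is implemented by keeping the $s$ largest-magnitude entries (the names `hard_threshold` and `iht` are mine, not from the text):

```python
import numpy as np

def hard_threshold(a, s):
    """Closest s-sparse vector to a: keep the s largest-magnitude entries."""
    z = np.zeros_like(a)
    keep = np.argsort(np.abs(a))[-s:]
    z[keep] = a[keep]
    return z

def iht(A, y, s, iters=100):
    """Iterative hard thresholding: gradient step + projection onto s-sparse vectors."""
    z = np.zeros(A.shape[1])            # z_0 = 0
    for _ in range(iters):
        a = z - A.T @ (A @ z - y)       # gradient step on (1/2)||Az - y||^2
        z = hard_threshold(a, s)        # sparsity constraint
    return z
```

When $A$ has small restricted isometry constant, the iterates contract toward the true sparse vector; in the extreme case of orthonormal columns ($A^T A = I$), a single iteration already recovers $x$ exactly.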



Theorem. Let the RIP constant satisfy $\delta_{3s} \leq \frac{1}{12}$. Then the above algorithm converges linearly. That is,

$$\|z_{t+1} - z^*\|_2 \leq \beta \|z_t - z^*\|_2$$

for some $\beta < 1$, and consequently $\|z_t - z^*\|_2 \leq \beta^t \|z_0 - z^*\|_2$.

Proof: Let $S = \operatorname{supp}(z^*) \cup \operatorname{supp}(z_{t+1})$. We have

$$\begin{aligned} \|z_{t+1} - z^*\|_2 &= \|z_{t+1,S} - z^*_S\|_2 \\ &\leq \|z_{t+1,S} - a_{t+1,S}\|_2 + \|a_{t+1,S} - z^*_S\|_2 \\ &\leq \|z^*_S - a_{t+1,S}\|_2 + \|a_{t+1,S} - z^*_S\|_2 \\ &= 2\|z^*_S - a_{t+1,S}\|_2, \end{aligned}$$

where the second inequality holds because $z_{t+1}$ is the closest $s$-sparse vector to $a_{t+1}$ while $z^*$ is also $s$-sparse, so $\|z_{t+1} - a_{t+1}\|_2 \leq \|z^* - a_{t+1}\|_2$; since both vectors are supported in $S$, the common term $\|a_{t+1,S^C}\|_2^2$ cancels from both sides, leaving $\|z_{t+1,S} - a_{t+1,S}\|_2 \leq \|z^*_S - a_{t+1,S}\|_2$.

Now, we have

$$\begin{aligned} \|z^*_S - a_{t+1,S}\|_2 &= \|z^*_S - z_{t,S} + A_S^T A (z_t - z^*)\|_2 \\ &= \|-r_{t,S} + A_S^T A r_t\|_2 \\ &= \|-r_{t,S} + A_S^T A r_{t,S} + A_S^T A r_{t,S^C}\|_2 \\ &\leq \|(I - A_S^T A_S) r_{t,S}\|_2 + \|A_S^T A_{S^- \setminus S}\, r_{t,S^C}\|_2 \\ &\leq \delta_{3s} \|r_t\|_2 + \|A_S^T A_{S^- \setminus S}\|_2 \|r_t\|_2, \end{aligned}$$

where $r_t := z_t - z^*$ (so $r_t$ is supported on $S^- := \operatorname{supp}(z_t) \cup \operatorname{supp}(z^*)$), and $A_S$ denotes the matrix obtained from $A$ by zeroing out the columns outside $S$.

Now, it suffices to show that

$$\|A_S^T A_{S^- \setminus S}\|_2 \leq 2\delta_{3s},$$

since combining the two inequalities above would then give

$$\|z_{t+1} - z^*\|_2 \leq 2(\delta_{3s} + 2\delta_{3s}) \|r_t\|_2 = 6\delta_{3s} \|z_t - z^*\|_2 \leq \frac{1}{2} \|z_t - z^*\|_2,$$

i.e., the desired contraction with $\beta = \frac{1}{2}$ (using $\delta_{3s} \leq \frac{1}{12}$).

To this end, let $\|x\|_2 = \|y\|_2 = 1$. We have

$$\begin{aligned} x^T A_S^T A_{S^- \setminus S}\, y &= x_S^T (A^T A)\, y_{S^- \setminus S} \\ &= \frac{1}{4} \left( \|A x_S + A y_{S^- \setminus S}\|_2^2 - \|A x_S - A y_{S^- \setminus S}\|_2^2 \right) \\ &= \frac{1}{4} \Big( \|x_S + y_{S^- \setminus S}\|_2^2 + (x_S + y_{S^- \setminus S})^T (A^T A - I)(x_S + y_{S^- \setminus S}) \\ &\quad - \|x_S - y_{S^- \setminus S}\|_2^2 - (x_S - y_{S^- \setminus S})^T (A^T A - I)(x_S - y_{S^- \setminus S}) \Big) \\ &= \frac{1}{4} \Big( (x_S + y_{S^- \setminus S})^T (A^T A - I)(x_S + y_{S^- \setminus S}) \\ &\quad - (x_S - y_{S^- \setminus S})^T (A^T A - I)(x_S - y_{S^- \setminus S}) \Big) \\ &\leq 2 \delta_{3s}, \end{aligned}$$

where the squared norms cancel because $x_S$ and $y_{S^- \setminus S}$ have disjoint supports, and the last inequality is due to the fact that $x_S \pm y_{S^- \setminus S}$ is $3s$-sparse with $\|x_S \pm y_{S^- \setminus S}\|_2^2 \leq 2$ (recall that $S^- = \operatorname{supp}(r_t) \subseteq \operatorname{supp}(z^*) \cup \operatorname{supp}(z_t)$, so $S \cup S^-$ has size at most $3s$). Therefore, we have established the claim
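The second equality above is the polarization identity $u^T v = \frac{1}{4}\left(\|u + v\|_2^2 - \|u - v\|_2^2\right)$ applied to $u = A x_S$ and $v = A y_{S^- \setminus S}$; a quick numerical sanity check of the identity itself:

```python
import numpy as np

# Polarization identity: u^T v = (||u + v||^2 - ||u - v||^2) / 4
rng = np.random.default_rng(0)
u, v = rng.standard_normal(5), rng.standard_normal(5)
lhs = u @ v
rhs = (np.sum((u + v) ** 2) - np.sum((u - v) ** 2)) / 4
print(abs(lhs - rhs))  # essentially zero
```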

$$\|z_t - z^*\|_2 \leq \beta^t \|z_0 - z^*\|_2$$

with

$$\beta = \frac{1}{2}.$$

Conclusion

We have established the convergence of the iterative hard thresholding algorithm for reconstructing the vector $x$. In today's landscape, where data storage is relatively inexpensive, compressed sensing and reconstruction problems may not be immediately relevant to everyday data scientists. Nevertheless, the underlying mathematics of this field is quite fascinating, particularly in the context of exploring different trade-offs.

Bibliography

Foucart, S., & Rauhut, H. (2013). A Mathematical Introduction to Compressive Sensing. Birkhäuser.