RegReg algorithms

This page gives an overview of the types of problems RegReg is designed to solve and discusses some of the algorithmic tools available to users.

Regularized regression

RegReg is a framework for solving regularized regression problems such as the LASSO

\[\mbox{minimize}_{\beta} \frac{1}{2}||y - X\beta||^{2}_{2} + \lambda||\beta||_1\]

and many others. The general problem is to minimize a combination of a differentiable function and certain non-differentiable penalties or constraints, i.e. a composite problem.

This includes many problems popular in applied statistics, such as

  • the LASSO in constraint form

    \[\mbox{minimize}_{\beta} \frac{1}{2}||y - X\beta||^{2}_{2} \ \text{subject to} \ ||\beta||_{1} \leq \delta\]
  • the group LASSO

    \[\frac{1}{2}||y - X\beta||^{2}_{2} + \lambda \sum_j \|\beta_j\|_{K_j}, \qquad \|\eta\|_{K_j} = \sqrt{\eta^T K_j \eta}\]
  • \(\ell_2\) regularized logistic regression

    \[-2\left(Y^TX\beta - \sum_i \log \left[ 1 + \exp(x_i^T\beta) \right] \right) + \lambda \|\beta\|_2^2, \qquad Y_i \in \{0,1\}\]
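To make the last of these concrete, the \(\ell_2\)-regularized logistic objective above and its gradient take only a few lines of NumPy. This is a minimal illustrative sketch with hypothetical function names, not the RegReg API:

    import numpy as np

    def logistic_objective(beta, X, Y, lam):
        # twice the negative log-likelihood plus the l2 ridge term,
        # matching the display above
        eta = X @ beta                                    # linear predictors x_i^T beta
        loglik = Y @ eta - np.logaddexp(0.0, eta).sum()   # stable log(1 + exp(.))
        return -2.0 * loglik + lam * beta @ beta

    def logistic_gradient(beta, X, Y, lam):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))             # fitted probabilities
        return -2.0 * X.T @ (Y - p) + 2.0 * lam * beta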

The generic problem

RegReg is designed to solve the generic problem

\[\mbox{minimize}_\beta \quad \mathcal{L}(\beta) + \mathcal{P}(\beta)\]

where \(\mathcal{P}\) is a convex function with a simple proximal map. That is, the problem

\[\mbox{minimize}_{\beta} \frac{\epsilon}{2} \|\beta\|^2_2 + \alpha^T\beta + \mathcal{P}(\beta)\]

is easy to solve, perhaps in closed form.
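For instance, when \(\mathcal{P}(\beta) = \lambda\|\beta\|_1\) this problem separates across coordinates and its solution is soft-thresholding. A minimal NumPy sketch (the function name is illustrative, not part of the RegReg API):

    import numpy as np

    def l1_prox(alpha, eps, lam):
        # minimize (eps/2)*||beta||_2^2 + alpha @ beta + lam*||beta||_1
        # coordinatewise, by soft-thresholding
        z = -alpha / eps               # minimizer of the smooth part alone
        return np.sign(z) * np.maximum(np.abs(z) - lam / eps, 0.0)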

Many popular seminorms fall into this framework, for example

  • the \(\ell_1\) norm

  • the \(\ell_2\) norm

  • the sum of the positive parts of a vector's entries, \(\sum_i \max(\beta_i, 0)\)

Constraints built from these seminorms composed with affine transformations can also be represented in this framework, for example,

  • \(\|D \beta + \alpha\|_1 \leq c\)

  • \(\|D\beta + \alpha\|_2 \leq c\)

  • \(\beta \geq 0\) element-wise.
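In the simplest case \(D = I\), \(\alpha = 0\), the proximal maps of these constraints are Euclidean projections: rescaling for the \(\ell_2\) ball, clipping for nonnegativity, and a short sort-based routine for the \(\ell_1\) ball. A minimal NumPy sketch (function names are illustrative):

    import numpy as np

    def project_l2_ball(beta, c):
        # projection onto {beta : ||beta||_2 <= c}: rescale if outside
        norm = np.linalg.norm(beta)
        return beta if norm <= c else beta * (c / norm)

    def project_nonnegative(beta):
        # projection onto {beta : beta >= 0}: clip negative entries
        return np.maximum(beta, 0.0)

    def project_l1_ball(beta, c):
        # projection onto {beta : ||beta||_1 <= c}, using the
        # sort-based algorithm of Duchi et al. (ICML 2008)
        if np.abs(beta).sum() <= c:
            return beta
        u = np.sort(np.abs(beta))[::-1]        # |beta| sorted in decreasing order
        css = np.cumsum(u)
        k = np.nonzero(u * np.arange(1, len(u) + 1) > css - c)[0][-1]
        tau = (css[k] - c) / (k + 1.0)         # soft-threshold level
        return np.sign(beta) * np.maximum(np.abs(beta) - tau, 0.0)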

Algorithms

The core RegReg algorithm is generalized gradient descent; in particular, the implementation is based on [FISTA]. This can be used to solve many problems directly, and also to solve subproblems within more elaborate strategies, such as the dual problem associated with the general problem described above

\[\mbox{minimize}_u \quad \frac{1}{2} \| z - \sum_{i \in \mathcal{I}} D_i^T u_i\|_2^2 \quad \mbox{s.t.} \quad u_i \in \lambda_i K_i\]

or an [ADMM] approach based on the augmented Lagrangian

\[\frac{1}{2}||y - \beta||^{2}_{2} + \sum_i \lambda_i \|z_i\|_1 + \sum_i u_i^T(z_i - D_i \beta) + \frac{\rho}{2} \sum_i \|z_i - D_i\beta\|_2^2\]
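Minimizing this augmented Lagrangian alternately over \(\beta\), the \(z_i\), and the dual variables \(u_i\) gives the standard ADMM updates: a fixed linear solve for \(\beta\), soft-thresholding for each \(z_i\), and a dual ascent step for each \(u_i\). A minimal dense NumPy sketch, not the RegReg implementation:

    import numpy as np

    def soft_threshold(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def admm(y, Ds, lams, rho=1.0, n_iter=200):
        # ADMM for (1/2)||y - beta||^2 + sum_i lam_i ||D_i beta||_1,
        # alternating the beta, z_i and u_i updates of the augmented
        # Lagrangian displayed above
        p = y.shape[0]
        zs = [np.zeros(D.shape[0]) for D in Ds]
        us = [np.zeros(D.shape[0]) for D in Ds]
        A = np.eye(p) + rho * sum(D.T @ D for D in Ds)   # beta-update system
        for _ in range(n_iter):
            rhs = y + sum(D.T @ (u + rho * z) for D, z, u in zip(Ds, zs, us))
            beta = np.linalg.solve(A, rhs)
            for i, D in enumerate(Ds):
                v = D @ beta
                zs[i] = soft_threshold(v - us[i] / rho, lams[i] / rho)  # l1 prox
                us[i] = us[i] + rho * (zs[i] - v)                       # dual ascent
        return beta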

Optimization strategies

RegReg offers support for a variety of optimization strategies including

  • Primal generalized gradient descent

    • The current implementation is based on the [FISTA] framework; a minimal proximal-gradient sketch appears after this list.

    • The primal problem is solved by solving a series of proximal problems.

    • Often the proximal problems are easy to solve, sometimes in closed form. When they are more complicated, they can be solved through their duals, again by proximal gradient descent, though each such iteration then involves solving an unpenalized version of the original problem.

  • Dual generalized gradient ascent

    • If the conjugate of the differentiable part of the objective is known, then generalized gradient descent can be used to solve the dual problem.

    • If the conjugate is not known, it can be evaluated by solving an optimization problem with proximal gradient descent.

  • ADMM

    • The general problem can be solved directly with ADMM; a minimal sketch follows the augmented Lagrangian displayed above.

    • This approach has the advantage of making it easy to parallelize many problems across features or observations.

  • Seminorm smoothing

    • The function \(\mathcal{P}(\beta)\) can be approximated by a smooth function, as in [TFOCS] or [NESTA].

    • The approximate problem can then be solved directly using simple gradient methods; see the Smoothing documentation and the sketch after this list.
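As a reference point for the primal strategy above, here is a minimal FISTA sketch for the LASSO, following the accelerated proximal gradient scheme of [FISTA]; it is illustrative only and not the RegReg implementation:

    import numpy as np

    def fista_lasso(X, y, lam, n_iter=500):
        # FISTA for (1/2)||y - X beta||^2 + lam*||beta||_1:
        # proximal gradient steps taken at an extrapolated (momentum) point
        p = X.shape[1]
        L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
        beta, w, t = np.zeros(p), np.zeros(p), 1.0
        for _ in range(n_iter):
            z = w - X.T @ (X @ w - y) / L      # gradient step at the momentum point
            beta_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # l1 prox
            t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            w = beta_new + ((t - 1.0) / t_new) * (beta_new - beta)
            beta, t = beta_new, t_new
        return beta

For the smoothing strategy, the Moreau (Huber) smoothing of the \(\ell_1\) norm used by [NESTA]-style methods has a simple closed-form gradient, so the smoothed problem can be handed to any plain gradient method. Continuing the sketch above:

    def smoothed_l1(beta, mu):
        # Huber/Moreau smoothing of ||beta||_1 with parameter mu > 0
        a = np.abs(beta)
        return np.where(a <= mu, a**2 / (2.0 * mu), a - mu / 2.0).sum()

    def smoothed_l1_grad(beta, mu):
        # gradient: beta/mu clipped to [-1, 1] coordinatewise
        return np.clip(beta / mu, -1.0, 1.0)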

ADMM

Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J. “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers” (http://www.stanford.edu/~boyd//papers/pdf/admm_distr_stats.pdf)

FISTA

Beck, A., Teboulle, M. “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems” (http://iew3.technion.ac.il/~becka/papers/71654.pdf)

NESTA

Becker, S., Bobin, J., Candes, E. “A Fast and Accurate First-order Method for Sparse Recovery” (https://statweb.stanford.edu/~candes/nesta/nesta.html)

TFOCS

Becker, S., Candes, E., Grant, M. “TFOCS: Templates for First-Order Conic Solvers” (http://tfocs.stanford.edu/)