RegReg algorithms¶
This page gives an overview of the types of problems RegReg is designed to solve and discusses some of the algorithmic tools available to users.
Regularized regression¶
RegReg is a framework for solving regularized regression problems such as the LASSO
and many others. The general problem is to minimize a combination of a differentiable function and certain non-differentiable penalties or constraints, i.e. a composite problem.
This includes many problems popular in applied statistics, such as
the LASSO in constraint form
\[\frac{1}{2}\|y - X\beta\|^{2}_{2} \ \text{subject to} \ \|\beta\|_{1} \leq \delta\]
the group LASSO
\[\frac{1}{2}\|y - X\beta\|^{2}_{2} + \lambda \sum_j \|\beta_j\|_{K_j}, \qquad \|\eta\|_{K_j} = \sqrt{\eta^T K_j \eta}\]
\(\ell_2\) regularized logistic regression
\[-2\left(Y^TX\beta - \sum_i \log \left[ 1 + \exp(x_i^T\beta) \right] \right) + \lambda \|\beta\|_2^2, \qquad Y_i \in \{0,1\}\]
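These objectives can be written down directly in NumPy. The sketch below only spells out the formulas above; the data X, y, the group index sets and matrices K_j, and the penalty parameters are placeholders, and none of this is RegReg's own code.

    import numpy as np

    def lasso_constraint_loss(beta, X, y):
        # 1/2 ||y - X beta||_2^2, minimized subject to ||beta||_1 <= delta
        return 0.5 * np.sum((y - X @ beta) ** 2)

    def group_lasso_objective(beta, X, y, groups, Ks, lam):
        # 1/2 ||y - X beta||_2^2 + lam * sum_j ||beta_j||_{K_j},
        # with ||eta||_{K_j} = sqrt(eta^T K_j eta); `groups` holds the index set of each block
        loss = 0.5 * np.sum((y - X @ beta) ** 2)
        penalty = sum(np.sqrt(beta[g] @ K @ beta[g]) for g, K in zip(groups, Ks))
        return loss + lam * penalty

    def ridge_logistic_objective(beta, X, Y, lam):
        # -2 * (Y^T X beta - sum_i log(1 + exp(x_i^T beta))) + lam * ||beta||_2^2, with Y_i in {0, 1}
        eta = X @ beta
        return -2 * (Y @ eta - np.sum(np.logaddexp(0, eta))) + lam * np.sum(beta ** 2)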
The generic problem¶
RegReg is designed to solve the generic problem
\[\text{minimize}_{\beta} \quad f(\beta) + \lambda \cdot \mathcal{P}(D\beta + \alpha)\]
where \(f\) is differentiable and \(\mathcal{P}\) is a convex function with a simple proximal map. That is, the problem
\[\text{minimize}_{z} \quad \frac{1}{2}\|z - \beta\|^{2}_{2} + \lambda \cdot \mathcal{P}(z)\]
is easy to solve, perhaps in closed form.
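For example, when \(\mathcal{P}\) is the \(\ell_1\) norm the proximal problem separates across coordinates and has the familiar soft-thresholding solution
\[\hat{z}_i = \text{sign}(\beta_i)\,\max(|\beta_i| - \lambda, 0).\]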
Many popular seminorms fall into this framework, for example
the \(\ell_1\) norm
the \(\ell_2\) norm
the element-wise sum of the positive part of a vector
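Each of these seminorms has a closed-form proximal map; the \(\ell_1\) case is the soft thresholding shown above. For the other two, here is a minimal NumPy sketch of the standard formulas (not RegReg's implementation):

    import numpy as np

    def prox_l2(beta, lam):
        # argmin_z 1/2 ||z - beta||_2^2 + lam * ||z||_2  (block soft thresholding)
        norm = np.linalg.norm(beta)
        return np.zeros_like(beta) if norm <= lam else (1 - lam / norm) * beta

    def prox_pospart(beta, lam):
        # argmin_z 1/2 ||z - beta||_2^2 + lam * sum_i max(z_i, 0)
        return np.minimum(beta, np.maximum(beta - lam, 0))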
Constraints built from these seminorms, composed with affine transforms, can also be handled: their indicator functions have simple proximal maps, namely Euclidean projections. For example,
\(\|D \beta + \alpha\|_1 \leq c\)
\(\|D\beta + \alpha\|_2 \leq c\)
\(\beta \geq 0\) element-wise.
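For constraints, the proximal map of the indicator function is the Euclidean projection onto the constraint set. The sketch below covers the simple case \(D = I\), \(\alpha = 0\); handling the affine composition is left to the solver and is not shown here.

    import numpy as np

    def project_l1_ball(beta, c):
        # Projection onto {z : ||z||_1 <= c} by the standard sorting algorithm (assumes c > 0).
        if np.abs(beta).sum() <= c:
            return beta.copy()
        u = np.sort(np.abs(beta))[::-1]
        css = np.cumsum(u)
        idx = np.arange(1, len(u) + 1)
        rho = np.nonzero(u - (css - c) / idx > 0)[0][-1]
        theta = (css[rho] - c) / (rho + 1)
        return np.sign(beta) * np.maximum(np.abs(beta) - theta, 0)

    def project_l2_ball(beta, c):
        # Projection onto {z : ||z||_2 <= c}.
        norm = np.linalg.norm(beta)
        return beta if norm <= c else (c / norm) * beta

    def project_nonnegative(beta):
        # Projection onto {z : z >= 0}, element-wise.
        return np.maximum(beta, 0)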
Algorithms¶
The core RegReg algorithm is generalized gradient descent; the implementation is based on [FISTA]. This can be used to solve many problems directly, and to solve subproblems within more complicated strategies, such as solving the dual problem associated with the general problem described above or an [ADMM] approach.
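To make the generalized gradient (FISTA) iteration concrete, here is a minimal NumPy sketch for the Lagrange-form LASSO. It illustrates the algorithm only and is not RegReg's implementation.

    import numpy as np

    def fista_lasso(X, y, lam, n_iter=200):
        # Step size from the Lipschitz constant of the gradient of 1/2 ||y - X beta||_2^2.
        L = np.linalg.norm(X, 2) ** 2
        beta = np.zeros(X.shape[1])
        z = beta.copy()
        t = 1.0
        for _ in range(n_iter):
            grad = X.T @ (X @ z - y)                                     # gradient step on the smooth loss
            w = z - grad / L
            beta_new = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0)   # proximal step (soft threshold)
            t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
            z = beta_new + ((t - 1) / t_new) * (beta_new - beta)         # Nesterov momentum
            beta, t = beta_new, t_new
        return beta

Each iteration is one gradient step on the differentiable part followed by the proximal map of the penalty, plus the momentum step that gives FISTA its accelerated rate.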
Optimization strategies¶
RegReg offers support for a variety of optimization strategies including
Primal generalized gradient descent
The current implementation is based on the [FISTA] framework.
The primal problem is solved through a sequence of proximal subproblems.
Often these proximal problems are very easy to solve. When they are more complicated, they can themselves be solved through their duals, which proximal gradient descent handles directly, though each iteration then involves solving an unpenalized version of the original problem (a sketch follows).
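As an illustration of solving a proximal problem through its dual, take \(\mathcal{P}(\beta) = \lambda\|D\beta\|_1\). Its proximal map can be evaluated by projected gradient on a box-constrained dual; the NumPy sketch below is illustrative only and is not RegReg's implementation.

    import numpy as np

    def prox_l1_analysis(beta, D, lam, n_iter=100):
        # argmin_z 1/2 ||z - beta||_2^2 + lam * ||D z||_1, computed via its dual:
        # minimize 1/2 ||beta - D^T u||_2^2 over the box ||u||_inf <= lam, then z = beta - D^T u.
        step = 1.0 / np.linalg.norm(D, 2) ** 2
        u = np.zeros(D.shape[0])
        for _ in range(n_iter):
            grad = -D @ (beta - D.T @ u)               # gradient of the dual objective
            u = np.clip(u - step * grad, -lam, lam)    # projection onto the box
        return beta - D.T @ u                          # recover the primal solution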
Dual generalized gradient ascent
If the conjugate of the differentiable part of the objective is known, then generalized gradient descent can be used to solve the dual problem.
If the conjugate is not known, it can be evaluated by solving an optimization problem with proximal gradient descent.
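Concretely, absorbing the multiplier into \(\mathcal{P}\) and writing the generic problem as \(\text{minimize}_{\beta}\, f(\beta) + \mathcal{P}(D\beta + \alpha)\), standard Fenchel duality gives the dual problem
\[\text{maximize}_{u} \quad -f^*(-D^Tu) - \mathcal{P}^*(u) + \alpha^Tu,\]
so gradient ascent on \(u\) needs the conjugate \(f^*\) (or an inner optimization to evaluate it) together with the proximal map of \(\mathcal{P}^*\).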
ADMM
The general problem can be solved directly with ADMM.
This approach has the advantage of making it easy to parallelize many problems across features or observations.
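A minimal NumPy sketch of the standard [ADMM] updates for the Lagrange-form LASSO (illustrative only, not RegReg's implementation):

    import numpy as np

    def admm_lasso(X, y, lam, rho=1.0, n_iter=200):
        # rho is the augmented-Lagrangian parameter.
        p = X.shape[1]
        Xty = X.T @ y
        # Factor once; every beta-update reuses this Cholesky factor.
        L = np.linalg.cholesky(X.T @ X + rho * np.eye(p))
        z = np.zeros(p)
        u = np.zeros(p)
        for _ in range(n_iter):
            beta = np.linalg.solve(L.T, np.linalg.solve(L, Xty + rho * (z - u)))   # ridge-type update
            z = np.sign(beta + u) * np.maximum(np.abs(beta + u) - lam / rho, 0)    # soft threshold
            u = u + beta - z                                                       # dual ascent step
        return z

In consensus and sharing variants of these updates, the subproblems split across blocks of observations or features, which is what makes the approach straightforward to parallelize.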
Seminorm smoothing
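Roughly, seminorm smoothing replaces the nonsmooth seminorm with a smooth approximation, such as its Moreau envelope, so that ordinary gradient methods apply; this is the idea behind [NESTA] and [TFOCS]. A minimal NumPy sketch for the \(\ell_1\) norm (a Huber-type smoothing, illustrative only):

    import numpy as np

    def smoothed_l1(beta, eps):
        # Moreau/Nesterov smoothing of ||beta||_1 with parameter eps.
        absb = np.abs(beta)
        value = np.where(absb >= eps, absb - eps / 2, absb ** 2 / (2 * eps)).sum()
        grad = np.clip(beta / eps, -1, 1)   # gradient of the smoothed seminorm
        return value, grad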
- ADMM
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J. “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers” (http://www.stanford.edu/~boyd//papers/pdf/admm_distr_stats.pdf)
- FISTA
Beck, A., Teboulle, M. “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems” (http://iew3.technion.ac.il/~becka/papers/71654.pdf)
- NESTA
Becker, S., Bobin, J., Candes, E. “A Fast and Accurate First-order Method for Sparse Recovery” (https://statweb.stanford.edu/~candes/nesta/nesta.html)
- TFOCS
Becker, S., Candes, E., Grant, M. “TFOCS: Templates for First-Order Conic Solvers” (http://tfocs.stanford.edu/)