Non-convex Robust PCA

Non-convex Robust PCA

Iterative Hard Thresholding for Sparse/Low-rank Linear Regression Prateek Jain Microsoft Research, India Ambuj Tewari Univ of Michigan Purushottam Kar MSR, India Praneeth Netrapalli MSR, NE

Microsoft Research India Our work Foundations Systems Applications Interplay of society and technology Academic and government outreach Our vectors of impact

Research impact Company Machine Learning and Optimization @ MSRI High-dimensional Learning & Optimization Manik Varma Prateek Jain Ravi Kannan

Amit Deshpande Navin Goyal Sundarajan S. Vinod Nair Sreangsu Acharyya Kush Bhatia Aditya Nori

Raghavendra Udupa Purushottam Kar Extreme Classification Online Learning / Multi-armed Bandits

We are Hiring! Interns Learning with Structured Losses PostDocs Distributed Machine Learning Applied Researchers Probabilistic Programming Full-time Researchers Privacy Preserving Machine Learning SVMs & Kernel Learning Learning in High-dimensions

2 2 Need to solve: can be: Set of sparse vectors Set of group-sparse vectors

Set of low-rank matrices Non-convex Comp. Complexity: NP-Hard Overview Most popular approach: convex relaxation Solvable in poly-time Guarantees under certain assumptions Slow in practice Practical Practical AlgorithmsAlgorithms

For High-d ML Problems For High-d ML Problems Theoretically Theoretically Provable Provable Algorithms Algorithms ForProblems High-d ML Problems For High-d ML

Results Sparse Regression [Garg & Khandekar. ICML 2008; J., Kar, Tewari. NIPS14] constraint Matrix Completion/Regression [J., Netrapalli, Sanghavi. STOC 2013; Hardt & Wooters. COLT 2014] Low-rank constraint Robust Regression [Loh & Wainwright. NIPS 2013 ; Bhatia, J., Kar. Submitted, 2015] Tensor Factorization and Completion [Anandkumar et al. Arxiv 2012; J., Oh. NIPS14] Low-tensor rank constraint Dictionary Learning [Agarwal et al. COLT 2014; Arora et al. COLT 2014] Non-convex bilinear form + Sparsity constraint

Phase Sensing [Netrapalli, J., Sanghavi. NIPS13 ; Candes et al. Arxiv2014] System of quadratic equations Low-rank matrix approximation [Bhojanapalli, J., Sanghavi. SODA15] 6 Outline Sparse Linear Regression Lasso Iterative Hard Thresholding Our Results

Low-rank Matrix Regression Low-rank Matrix Completion Conclusions Sparse Linear Regression = n But: sparse ( non-zeros) d

= Sparse Linear Regression : number of non-zeros NP-hard problem in general : non-convex function

Convex Relaxation Relaxed Problem: Non-differentiable Lasso Problem Known to promote sparsity Pros: a) Principled approach, b) Solid theoretical guarantees Cons: Slow to optimize Our Approach : Projected Gradient Descent

[Jain, Tewari, Kar2014] Projection onto ball? 0.1 0 .01 1 1 0.9 0.1 []

sort 1 1 0.9 0.1 0.1 0.01 [] Hard Thresholding

1 1 0.9 0 0 0 [] 3 () Convex-projections vs Non-convex Projections

convex set 1st order Optimality condition non-convex set 0-th order Optimality condition 0 order condition sufficient for convergence of Proj. Grad. Descent? In general, NO But, for certain specially structured problems, YES!!! Restricted Isometry Property (RIP) satisfies RIP if, for all sparse vectors acts as an Isometry

Formally: For all -sparse 2 2 2 (1 )| || || | | (1+ ) | | | Xw Popular RIP Ensembles

= ( 2 log ) Most popular examples: Proof under RIP

Assume: Recall: Hard Thresholding : Triangle inequality RIP [Blumensath & Davies09, Garg & Khandekar09] What if RIP is not possible?

Eigenvalues of So, doesnt hold even for infinite samples Problem is solvable for samples using standard regression Iterative Hard Thresholding: Larger Sparsity For t=1, 2, Stronger Projection Guarantee ||

2 ( ) || || ( ) | |2 dim of 2 2 Statistical Guarantees 2 min ( )=|| | |

Statistically: Known Computation Lower-bound: Same as Lasso sparse Recall, [J., Tewari, Kar2014]

General Result for Any Function satisfies RSC/RSS, i.e., IHT guarantee: After steps If and [J., Tewari, Kar2014] Extension to other Non-convex Procedures IHT-Fully Corrective HTP [Foucart12]

CoSAMP [Tropp & Neadell2008] Subspace Pursuit [Dai & Milenkovic2008] OMPR [J., Tewari, Dhillon2010] Partial hard thresholding and two-stage family [J., Tewari, Dhillon2010] Empirical Results Hard Thresholding Greedy 350x 90x

(d) =2 log ( ) , =300, =1 =2 log ( ) , =20,000, =1 More Empirical Results = 0 log Empirical Results: poor condition number

Low-rank Matrix Regression 2 2 matrix Low-rank Matrix Regression Convex relaxation: sum of singular values of Several interesting results: [Recht et al.2007, Negahban et al2009]

Projected Gradient Descent: where Statistical Guarantees [J., Tewari, Kar2014] Low-rank Matrix Completion set of known entries

Special case of low-rank matrix regression However, assumptions required by the regression analysis not satisfied Guarantees Projected Gradient Descent: Show -approximate recovery in iterations Assuming: incoherent uniformly sampled First near linear time algorithm for exact Matrix Completion with finite samples

[J., Netrapalli2015] Tale of two Lemmas Lemma 1: Perturbation bound with bounds Standard bounds only give: incoherent zero-mean with small variance Lemma 2: Davis-Kahn style result for matrix perturbation If and Empirical Results Matrix Regression

Hard Thresholding Trace-norm Number of data points (n) =100,||=5 log Summary High-dimensional problems Need to impose structure on Typical structures Sparsity

Low-rank Low-rank+Sparsity Non-convex sets but easy projection Proof of convergence (linear rate) Under suitable generative model assumptions Future Work Generalized theory for such provable non-convex optimization Performance analysis on different models Empirical comparisons on real-world datasets Questions?

Recently Viewed Presentations

  • Grammar Components to cover:  PUNCTUATIONS  CONTEXT CLUES WHAT

    Grammar Components to cover: PUNCTUATIONS CONTEXT CLUES WHAT

    WHAT IS PUNCTUATION? Punctuation is "the use of spacing, conventional signs, and certain typographical devices as aids to the understanding and correct reading, both silently and aloud, of handwritten and printed texts.
  • Brookhaven Town (Suffolk County, Long Island, N.Y.) @ 350 Years

    Brookhaven Town (Suffolk County, Long Island, N.Y.) @ 350 Years

    It extends from the villages of Stony Brook to Wading River in the north, Lake Ronkonkoma to Calverton in the center, and, with an eastward jog in the map, from Blue Point to Eastport, in the south. ... Harbor Hill...
  • Result Clauses - University of Florida

    Result Clauses - University of Florida

    Result Clauses A result clause is another use of the subjunctive mood in a dependent clause. The Result Clause will always appear subordinate to the main/independent clause in a Latin sentence. Purpose = reason for the action Result = outcome...
  • P-300 3 / 230th Military Police Company SER.#

    P-300 3 / 230th Military Police Company SER.#

    NG2VP5 1. Plugger SER.# 2. SINCGARS Radios SER.#'s 050601A/051026A 3. ... Arial Times New Roman Calibri Leaders Book Office Theme 1_Office Theme Bitmap Image PowerPoint Presentation PowerPoint Presentation PowerPoint Presentation PowerPoint Presentation PowerPoint Presentation PowerPoint ...
  • Population Ecology: Growth & Regulation Photo of introduced

    Population Ecology: Growth & Regulation Photo of introduced

    Dominant Eigenvalue of . L = Dominant Eigenvector of . L = stable stage distribution. Stage-specific . survival & fecundity. Please do not use the images in these PowerPoint slides without permission. Wikpedia "Leslie matrix" page; accessed 17-IX-2014. Note that...
  • Metals in Medicine Consortium Sydney Cancer Centre Sydney

    Metals in Medicine Consortium Sydney Cancer Centre Sydney

    Metals in Medicine Consortium Sydney Cancer Centre Sydney Cancer Centre Clinical Oncology Departments of: Royal Prince Alfred Hospital Medical Oncology Radiation Oncology Surgical Units (urology, melanoma, breast, GI, H&N, cardiothoracic, etc) Palliative Care Haematology Concord Hospital Dubbo Hospital SCC Clinical...
  • The Good Father - Braggs Church of Christ

    The Good Father - Braggs Church of Christ

    God our Father promises to be present with His family - 1 Cor. 8:6 [6] But to us there is but one God, the Father, of whom are all things, and we in him; and one Lord Jesus Christ, by...
  • Birth and early years of the SLC

    Birth and early years of the SLC

    Removed "shower drain" that was installed wrong so beam had to go through 1 mm hole. 3/2/18. ... Shielded DR bellows with sleeves to raise the longitudinal instability threshold. Multiple efforts to eliminate the "sawtooth" bunch length instability.