Tutorial on Convex Optimization Formulations for Neural Networks

We provide a short tutorial on how to understand neural networks through the lens of convex optimization. The most important theoretical contribution of this line of work is a methodology for understanding the NP-hardness of finding the global optima of the regularized training problem of two-layer ReLU fully-connected neural networks. The starting point is [1], which shows that regularized training problems of two-layer ReLU networks can be reformulated as convex programs. The underlying convex geometry is studied in detail in [3]. We further extend the analysis to multi-layer NNs [2,4,5,7]. Building on the convex optimization formulations, [6] gives an exact characterization of all global optima of two-layer ReLU networks: all globally optimal solutions of the nonconvex training problem are given by the solution set of a simple convex program, up to permutation and splitting of neurons. In other words, the set of optimal NNs for the regularized training problem can be found by solving a convex optimization problem. Fast algorithms for solving these equivalent convex programs are developed in [12]. The convex optimization formulations of NNs have also been extended to NNs with batch normalization layers [8], convolutional NNs (CNNs) [2,4], polynomial activation networks [9], transformers with self-attention layers [10], and Generative Adversarial Networks (GANs) [11], as well as to the analysis of implicit regularization [13] and of the Wasserstein gradient flow (WGF) [14]. We emphasize the importance of the zero duality gap and of the exact convex formulations of NNs.
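To make the two-layer reformulation concrete, here is a minimal sketch, assuming squared loss with weight decay and solving the resulting program with cvxpy, of the convex formulation described in [1]. The data, the sampling of ReLU activation patterns, and all names and sizes below are illustrative assumptions, not code released with any of the referenced papers.

```python
import numpy as np
import cvxpy as cp

# A minimal sketch of the convex reformulation in [1] for a two-layer ReLU
# network trained with squared loss and weight decay. All data and sizes
# below are illustrative assumptions.

rng = np.random.default_rng(0)
n, d = 40, 5                          # number of samples, input dimension
X = rng.standard_normal((n, d))       # data matrix
y = rng.standard_normal(n)            # regression targets
beta = 1e-3                           # weight-decay strength

# Sample ReLU activation patterns D_i = diag(1[X u >= 0]) from random
# directions u; exact enumeration over all hyperplane arrangements of the
# rows of X is also possible for small problems.
U = rng.standard_normal((d, 200))
D = np.unique((X @ U >= 0).astype(float), axis=1)   # one column per pattern
m = D.shape[1]

# Convex variables: a pair of d-dimensional vectors (v_i, w_i) per pattern.
V = cp.Variable((d, m))
W = cp.Variable((d, m))

# Network output under the convex parameterization: sum_i D_i X (v_i - w_i).
pred = sum(cp.multiply(D[:, i], X @ (V[:, i] - W[:, i])) for i in range(m))

# Constraints (2 D_i - I) X v_i >= 0 keep each block consistent with its
# ReLU activation pattern; likewise for w_i.
constraints = []
for i in range(m):
    sign = 2 * D[:, i] - 1
    constraints += [cp.multiply(sign, X @ V[:, i]) >= 0,
                    cp.multiply(sign, X @ W[:, i]) >= 0]

# Group-lasso objective: squared loss plus a sum of Euclidean norms, the
# convex counterpart of weight decay on the nonconvex parameters.
loss = 0.5 * cp.sum_squares(pred - y)
reg = beta * sum(cp.norm(V[:, i], 2) + cp.norm(W[:, i], 2) for i in range(m))
problem = cp.Problem(cp.Minimize(loss + reg), constraints)
problem.solve()
print("optimal value of the convex program:", problem.value)
```

The nonzero blocks (v_i, w_i) of a solution can be mapped back to ReLU neurons of the original two-layer network by a simple rescaling; the exact correspondence, including permutation and neuron splitting, is characterized in [1] and [6].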
Supplementary materials
References

[1] M. Pilanci, T. Ergen. Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-Layer Networks. ICML 2020. (paper)
[2] T. Ergen, M. Pilanci. Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time. ICLR 2021 Spotlight. (paper)
[3] T. Ergen, M. Pilanci. Revealing the Structure of Deep Neural Networks via Convex Duality. ICML 2021. (paper)
[4] T. Ergen, M. Pilanci. Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs. ICML 2021. (paper)
[5] Y. Wang, T. Ergen, M. Pilanci. Parallel Deep Neural Networks Have Zero Duality Gap. (paper)
[6] Y. Wang, J. Lacotte, M. Pilanci. The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: An Exact Characterization of the Optimal Solutions. ICLR 2022 Oral. (paper)
[7] T. Ergen, M. Pilanci. Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks. (paper)
[8] T. Ergen, A. Sahiner, B. Ozturkler, J. Pauly, M. Mardani, M. Pilanci. Demystifying Batch Normalization in ReLU Networks: Equivalent Convex Optimization Models and Implicit Regularization. ICLR 2022. (paper)
[9] B. Bartan, M. Pilanci. Neural Spectrahedra and Semidefinite Lifts: Global Convex Optimization of Polynomial Activation Neural Networks in Fully Polynomial-Time. (paper)
[10] A. Sahiner, T. Ergen, B. Ozturkler, J. Pauly, M. Mardani, M. Pilanci. Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers. (paper)
[11] A. Sahiner, T. Ergen, B. Ozturkler, B. Bartan, J. Pauly, M. Mardani, M. Pilanci. Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions. ICLR 2021. (paper)
[12] A. Mishkin, A. Sahiner, M. Pilanci. Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions. ICML 2022. (paper)
[13] Y. Wang, M. Pilanci. The Convex Geometry of Backpropagation: Neural Network Gradient Flows Converge to Extreme Points of the Dual Convex Program. ICLR 2022 Poster. (paper) (poster) (slide)
[14] Y. Wang, P. Chen, M. Pilanci, W. Li. Optimal Neural Network Approximation of Wasserstein Gradient Direction via Convex Optimization. (paper)