2022
Video
Title
[AutoMLConf'22]: GSparsity: Unifying Network Pruning and Neural Architecture Search by Group Sparsity (Teaser)
Abstract
In this paper, we propose a unified approach for network pruning and one-shot neural architecture search (NAS) via group sparsity. We first show that group sparsity via the recent Proximal Stochastic Gradient Descent (ProxSGD) algorithm achieves new state-of-the-art results for filter pruning. Then, we extend this approach to operation pruning, directly yielding a gradient-based NAS method based on group sparsity. Compared to existing gradient-based algorithms such as DARTS, the advantages of this new group sparsity approach are threefold. Firstly, instead of a costly bilevel optimization problem, we formulate the NAS problem as a single-level optimization problem, which can be optimally and efficiently solved using ProxSGD with convergence guarantees. Secondly, due to the operation-level sparsity, discretizing the network architecture by pruning less important operations can be safely done without any performance degradation. Thirdly, the proposed approach finds architectures that are both stable and performant on a variety of search spaces and datasets.
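The abstract describes enforcing group sparsity with a proximal SGD update. As a rough illustrative sketch only (not the authors' exact ProxSGD algorithm, which includes additional momentum-style estimates), the core ingredient is a group-lasso proximal step that shrinks whole parameter groups, such as filters or candidate operations, toward zero so that entire groups can later be pruned:

```python
import numpy as np

def prox_group_l2(groups, lam, lr):
    """Group-lasso proximal step with step size lr for the penalty
    lam * sum_g ||w_g||_2: each group is shrunk toward zero and is set
    exactly to zero once its l2 norm falls below lr * lam."""
    shrunk = []
    for w in groups:
        norm = np.linalg.norm(w)
        scale = max(0.0, 1.0 - lr * lam / (norm + 1e-12))
        shrunk.append(scale * w)
    return shrunk

def proxsgd_step(groups, grads, lam, lr):
    """One illustrative proximal-gradient update: an SGD step on the
    training loss followed by the group-sparsity proximal operator."""
    stepped = [w - lr * g for w, g in zip(groups, grads)]
    return prox_group_l2(stepped, lam, lr)
```

Because the proximal step drives unimportant groups exactly to zero, discretizing the architecture amounts to removing groups that are already (near-)zero, which is the intuition behind the claim that pruning can be done without performance degradation.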
Author(s)