Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better
In recent years, deep learning has made breakthrough progress in computer vision, natural language processing, speech recognition, and many other fields. However, as models keep growing in scale, their parameter counts, inference latency, and training resource consumption have grown dramatically as well. How to improve model efficiency while preserving model accuracy has become an important problem for both academia and industry.
This article surveys research progress in efficient deep learning, covering the following main topics:
1. Network Compression
Network compression aims to reduce the parameter count and computational cost of deep neural networks. It mainly includes the following techniques (a minimal code sketch illustrating each one follows this list):
- Pruning: removes unimportant connections or neurons from the network. Representative work includes:
  - Deep Compression by Han et al. [1]
  - Filter pruning based on filter importance by Li et al. [2]
  - Channel pruning by He et al. [3]
- Quantization: represents network weights and activations with low-bit values. Representative work includes:
  - Binarized networks such as BinaryConnect [4] and BinaryNet [5]
  - Incremental network quantization (INQ) by Zhou et al. [6]
- Low-rank factorization: uses matrix decomposition to reduce model complexity. Representative work includes:
  - Low-rank expansions of convolutional filters by Jaderberg et al. [7]
  - Low-rank decomposition for accelerating very deep networks by Zhang et al. [8]
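To make the three compression families concrete, the following minimal PyTorch sketch applies each idea to a single weight tensor. It is only an illustration under simplified assumptions, not the exact pipelines of the cited papers (Deep Compression [1], for instance, combines pruning with trained quantization and Huffman coding); the function names are hypothetical.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Unstructured magnitude pruning: zero out the smallest-magnitude entries."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).to(weight.dtype)
    return weight * mask

def uniform_quantize(weight: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric uniform quantization: round to 2^num_bits levels, then de-quantize."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = weight.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q * scale  # "fake-quantized" weights, back in floating point

def low_rank_factorize(weight: torch.Tensor, rank: int):
    """Low-rank factorization: approximate W (out x in) by A @ B via truncated SVD."""
    u, s, vh = torch.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank].sqrt()             # out x rank
    b = s[:rank].sqrt().unsqueeze(1) * vh[:rank]  # rank x in
    return a, b

if __name__ == "__main__":
    w = torch.randn(256, 512)
    w_pruned = magnitude_prune(w, sparsity=0.7)
    w_quant = uniform_quantize(w, num_bits=8)
    a, b = low_rank_factorize(w, rank=32)
    print((w_pruned == 0).float().mean().item())        # achieved sparsity
    print((w - w_quant).abs().max().item())             # max quantization error
    print((w - a @ b).norm().item() / w.norm().item())  # relative low-rank error
```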
2. Knowledge Distillation
Knowledge distillation compresses a model by transferring the knowledge of a large teacher model into a small student model. Representative work includes (a sketch of the standard distillation loss follows this list):
- The knowledge distillation framework by Hinton et al. [9]
- FitNets by Romero et al. [10]
- Attention transfer by Zagoruyko et al. [11]
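As a concrete illustration, the sketch below implements a soft-target distillation loss in the spirit of Hinton et al. [9]: the student is trained to match the teacher's temperature-softened output distribution in addition to the ground-truth labels. The temperature and mixing weight are illustrative choices, not values prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Soft-target distillation loss in the spirit of Hinton et al. [9]."""
    # KL divergence between temperature-softened student and teacher distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale so gradient magnitude is comparable across temperatures
    # Standard cross-entropy with the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example with random data: 8 samples, 10 classes. In practice the teacher's
# logits come from a frozen teacher model evaluated under torch.no_grad().
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```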
3. Efficient Network Architecture Design
Model efficiency can also be improved by carefully designing the network architecture. Representative work includes (a sketch of the depthwise separable convolution behind MobileNet follows this list):
- The MobileNet family [12][13]
- The ShuffleNet family [14][15]
- EfficientNet [16]
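The key building block of MobileNet [12] is the depthwise separable convolution, which splits a standard convolution into a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution, cutting the multiply-accumulate cost roughly by a factor of k^2 for a k x k kernel. Below is a minimal PyTorch sketch of a MobileNet-v1-style block; the channel sizes in the usage example are illustrative.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """MobileNet-v1-style block: 3x3 depthwise conv + 1x1 pointwise conv,
    each followed by batch normalization and ReLU."""
    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_channels).
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        # Pointwise: 1x1 convolution that mixes information across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

block = DepthwiseSeparableConv(32, 64, stride=2)
out = block(torch.randn(1, 32, 56, 56))  # -> shape (1, 64, 28, 28)
```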
4. Neural Architecture Search (NAS)
NAS applies automated methods to search for efficient network architectures. Representative work includes (a DARTS-style mixed-operation sketch follows this list):
- NASNet by Zoph et al. [17]
- DARTS by Liu et al. [18]
- MnasNet by Tan et al. [19]
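To illustrate the core idea of differentiable architecture search (DARTS) [18], the sketch below relaxes the discrete choice among candidate operations into a softmax-weighted sum governed by learnable architecture parameters; after search, only the strongest candidate is kept. The candidate set here is deliberately tiny and illustrative, not the full DARTS search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Continuous relaxation of an operation choice, in the spirit of DARTS [18]:
    the output is a softmax-weighted sum of all candidate operations."""
    def __init__(self, channels: int):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Identity(),                                                    # skip connection
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2, bias=False),
            nn.AvgPool2d(kernel_size=3, stride=1, padding=1),
        ])
        # One architecture parameter per candidate; in DARTS these are updated on
        # validation data while the operation weights are updated on training data.
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.candidates))

    def derive(self) -> nn.Module:
        """After search, keep only the candidate with the largest architecture weight."""
        return self.candidates[int(self.alpha.argmax())]

op = MixedOp(channels=16)
out = op(torch.randn(2, 16, 32, 32))  # every candidate preserves the spatial size
```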
5. Dynamic Inference
Dynamic inference adapts the amount of computation to each input. Representative work includes (an early-exit sketch follows this list):
- Multi-scale dense networks (MSDNet) by Huang et al. [20]
- SkipNet by Wang et al. [21]
- Conditional-computation networks by Yang et al. [22]
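A simple form of dynamic inference is the early-exit network: an auxiliary classifier is attached partway through the model, and at inference time inputs whose early prediction is confident enough skip the remaining layers, which is the spirit of MSDNet [20] (SkipNet [21] instead learns to skip individual layers with a gating policy). The sketch below is a toy illustration: the layer sizes and confidence threshold are arbitrary, and for simplicity the whole batch exits only when every example is confident.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    """Toy early-exit classifier: an auxiliary head after the first stage lets
    confident inputs skip the second stage at inference time."""
    def __init__(self, num_classes: int = 10, threshold: float = 0.9):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.exit1 = nn.Linear(16, num_classes)   # early (cheap) classifier
        self.stage2 = nn.Sequential(nn.Linear(16, 64), nn.ReLU())
        self.exit2 = nn.Linear(64, num_classes)   # final (full) classifier
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.stage1(x)
        early_logits = self.exit1(features)
        if not self.training:
            # Exit early only when every example in the batch is confident enough;
            # true per-example routing would require extra bookkeeping.
            confidence = F.softmax(early_logits, dim=1).max(dim=1).values
            if bool((confidence >= self.threshold).all()):
                return early_logits
        return self.exit2(self.stage2(features))

model = EarlyExitNet().eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 32, 32))
```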
In addition, this article discusses related techniques such as model quantization, federated learning, and model compilation, as well as applications of efficient deep learning on mobile and embedded devices.
Efficient deep learning is a fast-moving research area that spans machine learning, computer architecture, compilers, and other disciplines. Future research directions include designing more efficient network architectures, developing automated model compression techniques, and exploring new hardware architectures. We expect this field to bring deep learning to a much wider range of deployment scenarios and to make important contributions to the development of artificial intelligence.
[Figure: overview of efficient deep learning techniques]
References:
[1] Han S, Mao H, Dally W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding[J]. arXiv preprint arXiv:1510.00149, 2015.
[2] Li H, Kadav A, Durdanovic I, et al. Pruning filters for efficient convnets[J]. arXiv preprint arXiv:1608.08710, 2016.
[3] He Y, Zhang X, Sun J. Channel pruning for accelerating very deep neural networks[C]//Proceedings of the IEEE international conference on computer vision. 2017: 1389-1397.
[4] Courbariaux M, Bengio Y, David J P. Binaryconnect: Training deep neural networks with binary weights during propagations[J]. Advances in neural information processing systems, 2015, 28.
[5] Hubara I, Courbariaux M, Soudry D, et al. Binarized neural networks[J]. Advances in neural information processing systems, 2016, 29.
[6] Zhou A, Yao A, Guo Y, et al. Incremental network quantization: Towards lossless CNNs with low-precision weights[J]. arXiv preprint arXiv:1702.03044, 2017.
[7] Jaderberg M, Vedaldi A, Zisserman A. Speeding up convolutional neural networks with low rank expansions[J]. arXiv preprint arXiv:1405.3866, 2014.
[8] Zhang X, Zou J, He K, et al. Accelerating very deep convolutional networks for classification and detection[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 38(10): 1943-1955.
[9] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
[10] Romero A, Ballas N, Kahou S E, et al. Fitnets: Hints for thin deep nets[J]. arXiv preprint arXiv:1412.6550, 2014.
[11] Zagoruyko S, Komodakis N. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer[J]. arXiv preprint arXiv:1612.03928, 2016.
[12] Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
[13] Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.
[14] Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 6848-6856.
[15] Ma N, Zhang X, Zheng H T, et al. Shufflenet v2: Practical guidelines for efficient cnn architecture design[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 116-131.
[16] Tan M, Le Q. Efficientnet: Rethinking model scaling for convolutional neural networks[C]//International Conference on Machine Learning. PMLR, 2019: 6105-6114.
[17] Zoph B, Vasudevan V, Shlens J, et al. Learning transferable architectures for scalable image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 8697-8710.
[18] Liu H, Simonyan K, Yang Y. Darts: Differentiable architecture search[J]. arXiv preprint arXiv:1806.09055, 2018.
[19] Tan M, Chen B, Pang R, et al. Mnasnet: Platform-aware neural architecture search for mobile[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 2820-2828.
[20] Huang G, Chen D, Li T, et al. Multi-scale dense networks for resource efficient image classification[J]. arXiv preprint arXiv:1703.09844, 2017.
[21] Wang X, Yu F, Dou Z Y, et al. Skipnet: Learning dynamic routing in convolutional networks[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 409-424.
[22] Yang Z, Wang Y, Chen X, et al. Convolutional neural networks with conditional computation[J]. arXiv preprint arXiv:1904.12282, 2019.