Compressing CNN Kernels for Videos Using Tucker Decompositions: Towards Lightweight CNN Applications

Tobias Engelhardt Rasmussen, Line KH Clemmensen, Andreas Baum

Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

Convolutional Neural Networks (CNNs) are the state of the art in the field of visual computing. However, a major problem with CNNs is the large number of floating point operations (FLOPs) required to perform convolutions on large inputs. When CNNs are applied to video data, convolutional filters become even more complex due to the extra temporal dimension. This causes problems when such applications are to be deployed on mobile devices, such as smartphones, tablets, micro-controllers or similar, which have less computational power.
Kim et al. proposed using a Tucker decomposition to compress the convolutional kernels of a pre-trained network for images in order to reduce the complexity of the network, i.e. the number of FLOPs. In this paper, we generalize this method to videos (and other 3D signals) and evaluate it on a modified version of the THETIS data set, which contains videos of individuals performing tennis shots. We show that the compressed network reaches comparable accuracy while achieving a memory compression by a factor of 51. However, the actual computational speed-up (factor 1.4) does not meet our theoretically derived expectation (factor 6).
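
To make the compression idea concrete, the following minimal sketch counts the parameters of a single 3D convolutional layer before and after a Tucker-2 style factorization along the two channel modes, in the spirit of Kim et al. It is an illustration under stated assumptions, not the authors' implementation: the function names, the layer sizes and the ranks are hypothetical, and the abstract does not specify which modes the paper decomposes or how the ranks are chosen.

```python
# Hypothetical sketch: parameter count of a 3D convolution kernel versus a
# Tucker-2 style factorization over the input and output channel modes.
# All names, layer sizes and ranks are illustrative assumptions.

def conv3d_params(c_in, c_out, kernel):
    """Weights in a dense 3D convolution kernel (bias ignored)."""
    kt, kh, kw = kernel
    return c_in * c_out * kt * kh * kw


def tucker2_conv3d_params(c_in, c_out, kernel, rank_in, rank_out):
    """Weights after factorizing into three consecutive convolutions:
    a 1x1x1 conv (c_in -> rank_in), a core conv with the original
    kernel size (rank_in -> rank_out), and a 1x1x1 conv (rank_out -> c_out).
    """
    kt, kh, kw = kernel
    pointwise_in = c_in * rank_in
    core = rank_in * rank_out * kt * kh * kw
    pointwise_out = rank_out * c_out
    return pointwise_in + core + pointwise_out


def compression_ratio(c_in, c_out, kernel, rank_in, rank_out):
    """Ratio of dense to factorized parameter counts for one layer."""
    dense = conv3d_params(c_in, c_out, kernel)
    factorized = tucker2_conv3d_params(c_in, c_out, kernel, rank_in, rank_out)
    return dense / factorized


if __name__ == "__main__":
    # Illustrative layer only; these numbers are not taken from the paper.
    ratio = compression_ratio(64, 128, (3, 3, 3), rank_in=16, rank_out=32)
    print(f"per-layer parameter compression: {ratio:.1f}x")
```

A parameter or FLOP count of this kind only gives a theoretical reduction. As the abstract notes, the measured speed-up (factor 1.4) can fall well short of the theoretical expectation (factor 6).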

Original language: English
Journal: Proceedings of the Northern Lights Deep Learning Workshop
Volume: 3
Number of pages: 7
ISSN: 2703-6928
Publication status: Published - 2022
Event: Northern Lights Deep Learning Workshop 2022 - Tromsø, Norway
Duration: 10 Jan 2022 to 12 Jan 2022

Conference

Conference: Northern Lights Deep Learning Workshop 2022
Country/Territory: Norway
City: Tromsø
Period: 10/01/2022 to 12/01/2022

Keywords

  • CNN
  • Tucker Decomposition
  • Video Classification
  • Early Fusion
