Abstract
Convolutional Neural Networks (CNNs) are the state of the art in visual computing. However, a major problem with CNNs is the large number of floating-point operations (FLOPs) required to perform convolutions on large inputs. When CNNs are applied to video data, the convolutional filters become even more complex because of the additional temporal dimension. This causes problems when such applications are to be deployed on mobile devices, such as smartphones, tablets, or microcontrollers, which offer less computational power.
Kim et al. proposed using a Tucker decomposition to compress the convolutional kernels of a pre-trained image network in order to reduce the complexity of the network, i.e., the number of FLOPs. In this paper, we generalize this method to videos (and other 3D signals) and evaluate it on a modified version of the THETIS data set, which contains videos of individuals performing tennis shots. We show that the compressed network reaches accuracy comparable to the original network while achieving memory compression by a factor of 51. However, the actual computational speed-up (factor 1.4) falls short of our theoretically derived expectation (factor 6).
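To make the idea of compressing a 3D convolutional kernel concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a kernel of shape (C_out, C_in, kt, kh, kw) and applies a truncated HOSVD along the two channel modes only (a Tucker-2 decomposition), which factors one 3D convolution into a 1x1x1 convolution (C_in to R_in), a kt x kh x kw core convolution (R_in to R_out), and a 1x1x1 convolution (R_out to C_out). The kernel shape, the ranks, and all function names are hypothetical choices for illustration; rank selection and fine-tuning of the decomposed network are omitted.

```python
import numpy as np

def mode_unfold(tensor, mode):
    # Move `mode` to the front and flatten the rest: shape (I_mode, prod(other dims)).
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def tucker2_conv3d(kernel, rank_out, rank_in):
    """Truncated HOSVD on the output/input channel modes of a 3D conv kernel.

    kernel: array of shape (C_out, C_in, kt, kh, kw).
    Returns U_out (C_out, rank_out), core (rank_out, rank_in, kt, kh, kw), U_in (C_in, rank_in).
    """
    # Leading left singular vectors of each channel-mode unfolding.
    U_out = np.linalg.svd(mode_unfold(kernel, 0), full_matrices=False)[0][:, :rank_out]
    U_in = np.linalg.svd(mode_unfold(kernel, 1), full_matrices=False)[0][:, :rank_in]
    # Project the kernel onto both subspaces to obtain the core tensor.
    core = np.einsum("oithw,or,is->rsthw", kernel, U_out, U_in)
    return U_out, core, U_in

def reconstruct(U_out, core, U_in):
    # Map the core back to the full channel dimensions.
    return np.einsum("rsthw,or,is->oithw", core, U_out, U_in)

def conv3d_params(c_out, c_in, kt, kh, kw):
    # Weights of a single dense 3D convolution (bias ignored).
    return c_out * c_in * kt * kh * kw

def tucker2_params(c_out, c_in, kt, kh, kw, r_out, r_in):
    # 1x1x1 conv (C_in -> R_in) + core conv (R_in -> R_out) + 1x1x1 conv (R_out -> C_out).
    return c_in * r_in + r_out * r_in * kt * kh * kw + r_out * c_out

# Hypothetical layer: 32 -> 64 channels, 3x3x3 kernel, ranks (16, 8).
rng = np.random.default_rng(0)
kernel = rng.standard_normal((64, 32, 3, 3, 3))
U_out, core, U_in = tucker2_conv3d(kernel, rank_out=16, rank_in=8)
print(core.shape)  # (16, 8, 3, 3, 3)
ratio = conv3d_params(64, 32, 3, 3, 3) / tucker2_params(64, 32, 3, 3, 3, 16, 8)
print(f"compression: {ratio:.2f}x")  # compression: 11.68x
```

With stride 1 and "same" padding, all three factor layers operate at the same spatial and temporal resolution, so under these assumptions the reduction in multiply-accumulate operations per output position equals the parameter ratio. Such a count is a theoretical figure; measured speed-ups can be considerably lower, as the abstract reports.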
Original language | English |
---|---|
Journal | Proceedings of the Northern Lights Deep Learning Workshop |
Volume | 3 |
Number of pages | 7 |
ISSN | 2703-6928 |
Publication status | Published - 2022 |
Event | Northern Lights Deep Learning Workshop 2022, Tromsø, Norway (10 Jan 2022 → 12 Jan 2022) |
Conference
Conference | Northern Lights Deep Learning Workshop 2022 |
---|---|
Country/Territory | Norway |
City | Tromsø |
Period | 10/01/2022 → 12/01/2022 |
Keywords
- CNN
- Tucker Decomposition
- Video Classification
- Early Fusion