Abstract
Non-negative Tensor Factorization (NTF) has become a prominent tool for analyzing high-dimensional, multi-way structured data. In this paper we analyze gene expression across brain regions in multiple subjects based on data from the Allen Human Brain Atlas [1], with more than 40% of the data missing in our problem. Our analysis is based on the non-negativity constrained Canonical Polyadic (CP) decomposition, where we handle the missing data using marginalization and consider three prominent alternating least squares procedures: multiplicative updates, column-wise updating, and row-wise updating of the component matrices. We examine three gene expression prediction scenarios: data missing at random, whole genes missing, and whole areas missing within a subject. We find that the column-wise updating approach, also known as HALS, is the most efficient when fitting the model. We further observe that the non-negativity constrained CP model predicts gene expression better than the subject average when data is missing at random. When whole genes or whole areas are missing, it is in general better to predict by subject averages. However, we find that when whole genes are missing from all subjects, the model-based predictions are useful. When analyzing the structure of the components derived for one of the best-predicting model orders, the components identified in general constitute localized regions of the brain. Non-negative tensor factorization based on marginalization thus forms a promising framework for imputing missing values and characterizing gene expression in the human brain. However, care has to be taken, in particular when predicting gene expression levels for a whole missing brain region, as our analysis indicates that this requires a substantial number of subjects with data for this region in order for the model predictions to be reliable.
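To make the idea in the abstract concrete, the following is a minimal sketch of a non-negative CP model fitted to a three-way array with missing entries, using multiplicative updates (one of the three procedures named above, not HALS). It is not the authors' implementation: the function names, the mode ordering (genes × areas × subjects), the squared-error objective restricted to observed entries, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def reconstruct(A, B, C):
    """CP reconstruction: Xhat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def masked_loss(X, M, A, B, C):
    """Squared error over observed entries only (M == 1)."""
    R = M * (np.nan_to_num(X) - reconstruct(A, B, C))
    return float(np.sum(R ** 2))


def masked_ntf_mu(X, M, rank, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical helper: non-negative CP via multiplicative updates.

    X    : 3-way array (assumed genes x areas x subjects); missing entries may be NaN
    M    : array of the same shape, 1 where X is observed and 0 where it is missing
    rank : number of CP components
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    # Strictly positive initialization so the multiplicative updates can move.
    A = rng.random((I, rank)) + 0.1
    B = rng.random((J, rank)) + 0.1
    C = rng.random((K, rank)) + 0.1
    MX = M * np.nan_to_num(X)  # missing entries contribute nothing

    for _ in range(n_iter):
        # Each factor is scaled by (data term) / (model term), both masked.
        Xh = M * reconstruct(A, B, C)
        A *= np.einsum("ijk,jr,kr->ir", MX, B, C) / (
            np.einsum("ijk,jr,kr->ir", Xh, B, C) + eps)
        Xh = M * reconstruct(A, B, C)
        B *= np.einsum("ijk,ir,kr->jr", MX, A, C) / (
            np.einsum("ijk,ir,kr->jr", Xh, A, C) + eps)
        Xh = M * reconstruct(A, B, C)
        C *= np.einsum("ijk,ir,jr->kr", MX, A, B) / (
            np.einsum("ijk,ir,jr->kr", Xh, A, B) + eps)
    return A, B, C
```

Under these assumptions, missing values would be imputed by reading the corresponding entries of `reconstruct(A, B, C)`, and prediction quality could be compared against a baseline such as the per-subject mean of the observed entries.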
Original language | English |
---|---|
Title of host publication | Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2014) |
Editors | Mamadou Mboup, Tülay Adali, Éric Moreau, Jan Larsen |
Number of pages | 6 |
Publisher | IEEE |
Publication date | 2014 |
ISBN (Print) | 978-1-4799-3694-6 |
Publication status | Published - 2014 |
Event | 2014 IEEE International Workshop on Machine Learning for Signal Processing, Reims Centre des Congrès, Reims, France. Duration: 21 Sept 2014 → 24 Sept 2014. Conference number: 24. https://ieeexplore.ieee.org/xpl/conhome/6945945/proceeding |
Conference
Conference | 2014 IEEE International Workshop on Machine Learning for Signal Processing |
---|---|
Number | 24 |
Location | Reims Centre des Congrès |
Country/Territory | France |
City | Reims |
Period | 21/09/2014 → 24/09/2014 |
Keywords
- Bioengineering
- Communication, Networking and Broadcast Technologies
- Computing and Processing
- Engineering Profession
- Signal Processing and Analysis
- Abstracts
- CANDECOMP/PARAFAC
- CP
- Genetics
- Loading
- Marginalization
- Missing Values
- Noise
- Non-negative Matrix Factorization
- Non-negative Tensor Factorization
- Tensile stress
- Training
- Vectors