Non-negative Tensor Factorization with missing data for the modeling of gene expressions in the Human Brain

Søren Føns Vind Nielsen, Morten Mørup

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

Abstract

Non-negative Tensor Factorization (NTF) has become a prominent tool for analyzing high-dimensional, multi-way structured data. In this paper we analyze gene expression across brain regions in multiple subjects, based on data from the Allen Human Brain Atlas [1], with more than 40% of the data missing in our problem. Our analysis is based on the non-negativity constrained Canonical Polyadic (CP) decomposition, where we handle the missing data by marginalization and consider three prominent alternating least squares procedures: multiplicative updates, column-wise updating, and row-wise updating of the component matrices. We examine three gene expression prediction scenarios: data missing at random, whole genes missing, and whole areas missing within a subject. We find that the column-wise updating approach, also known as HALS, is the most efficient when fitting the model. We further observe that the non-negativity constrained CP model predicts gene expression better than the subject average when data are missing at random. When whole genes or whole areas are missing, it is in general better to predict by subject averages. However, we find that the model-based predictions are useful when whole genes are missing from all subjects. When analyzing the structure of the components derived for one of the best-predicting model orders, the identified components in general constitute localized regions of the brain. Non-negative tensor factorization based on marginalization thus forms a promising framework for imputing missing values and characterizing gene expression in the human brain. However, care has to be taken, in particular when predicting gene expression levels for a whole missing region of the brain: our analysis indicates that the model predictions are reliable only when a substantial number of subjects have data for that region.
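
To make the marginalization idea concrete, the following is a minimal sketch of a non-negative CP fit for a three-way tensor in which only observed entries contribute to the squared error. It uses mask-weighted multiplicative updates, one of the three procedure families named in the abstract. The function name `masked_ntf_mu`, its parameters, and the synthetic data are illustrative assumptions; the paper's exact update equations and its HALS variant are not reproduced here.

```python
import numpy as np


def masked_ntf_mu(X, M, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative CP of a 3-way tensor X, fitted on observed entries only.

    X    : array of shape (I, J, K); values where M is False are ignored
    M    : boolean array, True where X is observed
    rank : number of CP components (hypothetical parameter name)
    Returns the three factor matrices and the masked squared error per sweep.
    """
    rng = np.random.default_rng(seed)
    # Strictly positive initialization keeps multiplicative updates well defined.
    factors = [rng.random((dim, rank)) + 0.1 for dim in X.shape]
    # einsum patterns for the mode-n product with the two remaining factors.
    specs = ["ijk,jr,kr->ir", "ijk,ir,kr->jr", "ijk,ir,jr->kr"]
    Xm = np.where(M, X, 0.0)  # zero out missing entries (also handles NaNs)
    losses = []
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            Xhat = np.einsum("ir,jr,kr->ijk", *factors)
            # Missing entries are masked out of both numerator and denominator.
            num = np.einsum(specs[mode], Xm, *others)
            den = np.einsum(specs[mode], M * Xhat, *others) + eps
            factors[mode] *= num / den
        Xhat = np.einsum("ir,jr,kr->ijk", *factors)
        losses.append(float(np.sum((Xm - M * Xhat) ** 2)))
    return factors, losses


if __name__ == "__main__":
    # Synthetic stand-in for a (genes x areas x subjects) tensor, 40% missing.
    rng = np.random.default_rng(1)
    A, B, C = rng.random((30, 3)), rng.random((20, 3)), rng.random((6, 3))
    X = np.einsum("ir,jr,kr->ijk", A, B, C)
    M = rng.random(X.shape) > 0.4
    factors, losses = masked_ntf_mu(X, M, rank=3)
    Xhat = np.einsum("ir,jr,kr->ijk", *factors)
    held_out_rmse = np.sqrt(np.mean((X[~M] - Xhat[~M]) ** 2))
    print("masked loss:", losses[0], "->", losses[-1])
    print("RMSE on held-out entries:", held_out_rmse)
```

The reconstructed tensor `Xhat` supplies predictions for the entries where `M` is False, which corresponds to the missing-at-random scenario. The whole-gene and whole-area scenarios would amount to masking entire fibers or slices instead of scattered entries.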
Original language: English
Title of host publication: Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2014)
Editors: Mamadou Mboup, Tülay Adali, Éric Moreau, Jan Larsen
Number of pages: 6
Publisher: IEEE
Publication date: 2014
ISBN (Print): 978-1-4799-3694-6
Publication status: Published - 2014
Event: 24th IEEE International Workshop on Machine Learning for Signal Processing - Reims Centre des Congrès, Reims, France
Duration: 21 Sep 2014 to 24 Sep 2014
Conference number: 24
http://mlsp2014.conwiz.dk/home.htm

Conference

Conference: 24th IEEE International Workshop on Machine Learning for Signal Processing
Number: 24
Location: Reims Centre des Congrès
Country: France
City: Reims
Period: 21/09/2014 to 24/09/2014

Keywords

  • Bioengineering
  • Communication, Networking and Broadcast Technologies
  • Computing and Processing
  • Engineering Profession
  • Signal Processing and Analysis
  • Abstracts
  • CANDECOMP/PARAFAC
  • CP
  • Genetics
  • Loading
  • Marginalization
  • Missing Values
  • Noise
  • Non-negative Matrix Factorization
  • Non-negative Tensor Factorization
  • Tensile stress
  • Training
  • Vectors
