Big data analytics using semi-supervised learning methods

Flavia Dalia Frumosu*, Murat Kulahci

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

The expanding availability of complex data structures requires the development of new analysis methods for process understanding and monitoring. In manufacturing, this is primarily due to the high‐frequency and high‐dimensional data available through automated data collection schemes and sensors. However, particularly at fast production rates, data on the quality characteristics of the process output tend to be scarcer than the available process data. Considerable effort has gone into applying latent structure–based methods to complex data. This paper asks how latent structure–based methods can be used to obtain better predictions from all available data, including the process data for which there are no corresponding output measurements, i.e., unlabeled data. The research question is inspired by an industrial setting that requires prediction with extremely low tolerances. A semi‐supervised principal component regression method is compared against two benchmark latent structure–based methods, principal component regression and partial least squares, on simulated and experimental data. The analysis shows the circumstances in which semi‐supervised principal component regression becomes more advantageous than these competing methods.
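As a rough illustration of the idea in the abstract, the sketch below contrasts ordinary principal component regression (PCR) with one common semi‐supervised variant: the principal directions are estimated from all process data, labeled and unlabeled, while the regression step uses only the labeled rows. This is an assumption about the general technique, not the authors' exact formulation, which the abstract does not specify. The data are synthetic, and every name (`fit_pcr`, `predict_pcr`, `simulate`, the sample sizes, the noise levels) is hypothetical.

```python
import numpy as np

def fit_pcr(X_for_pca, X_lab, y_lab, n_components):
    """PCA on X_for_pca, then least squares of y_lab on the labeled scores."""
    mean = X_for_pca.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X_for_pca - mean, full_matrices=False)
    P = Vt[:n_components].T                      # loadings, shape (p, k)
    T = (X_lab - mean) @ P                       # labeled scores, shape (n_lab, k)
    T1 = np.column_stack([np.ones(len(T)), T])   # prepend an intercept column
    coef, *_ = np.linalg.lstsq(T1, y_lab, rcond=None)
    return mean, P, coef

def predict_pcr(model, X_new):
    """Project new rows onto the loadings and apply the fitted regression."""
    mean, P, coef = model
    return coef[0] + ((X_new - mean) @ P) @ coef[1:]

# Synthetic latent-structure data (illustrative assumption, not the paper's data).
rng = np.random.default_rng(0)
n_lab, n_unlab, n_test, p, k = 15, 500, 200, 30, 2
W = rng.normal(size=(k, p))                      # latent-to-process mapping

def simulate(n):
    Z = rng.normal(size=(n, k))                  # latent variables
    X = Z @ W + 0.5 * rng.normal(size=(n, p))    # noisy process measurements
    y = Z @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n)  # quality output
    return X, y

X_lab, y_lab = simulate(n_lab)       # few labeled observations
X_unlab, _ = simulate(n_unlab)       # many unlabeled observations (y discarded)
X_test, y_test = simulate(n_test)

# Supervised PCR: loadings estimated from the labeled rows only.
pcr = fit_pcr(X_lab, X_lab, y_lab, k)
# Semi-supervised PCR: loadings estimated from labeled and unlabeled rows together.
ss_pcr = fit_pcr(np.vstack([X_lab, X_unlab]), X_lab, y_lab, k)

def rmse(model):
    return float(np.sqrt(np.mean((predict_pcr(model, X_test) - y_test) ** 2)))

print("PCR RMSE:", rmse(pcr))
print("Semi-supervised PCR RMSE:", rmse(ss_pcr))
```

The only difference between the two fits is the matrix passed as `X_for_pca`. Whether the extra unlabeled rows help depends on the setting, which is the question the paper examines; this sketch makes no claim about which method wins in general.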

Original language: English
Journal: Quality and Reliability Engineering International
Volume: 34
Issue number: 7
Pages (from-to): 1413-1423
Number of pages: 11
ISSN: 0748-8017
Publication status: Published - 2018

Keywords

  • Dimension reduction
  • Latent structure methods
  • Multivariate data
  • Production statistics
