Abstract
Cognitive component analysis (COCA) is defined as the unsupervised grouping of data leading to a group structure well aligned with that resulting from human cognitive activity. We focus here on speech at different time scales, looking for possible hidden ‘cognitive structure’. Statistical regularities have earlier been revealed at multiple time scales corresponding to phoneme, gender, height and speaker identity. Here we show that the same simple unsupervised learning algorithm can detect these cues. Our basic features are 25-dimensional short-time Mel-frequency weighted cepstral coefficients, assumed to model the basic representation of the human auditory system.
The basic features are aggregated in time to obtain features at longer time scales. Simple energy-based filtering is used to achieve a sparse representation. Our hypothesis is basically ecological: features that are essentially independent in a reasonable ensemble can be efficiently coded using a sparse independent component representation. The representations are indeed shown to be very similar between supervised learning (invoking cognitive activity) and unsupervised learning (statistical regularities), lending additional support to our cognitive component hypothesis.
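
The aggregation and sparsification steps mentioned in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how frames are aggregated or how the energy filter is defined, so the sketch assumes aggregation by concatenating consecutive frames and sparsification by zeroing all but the largest-magnitude coefficients. The function names `stack_frames` and `energy_sparsify`, the window of 10 frames, the 5% retention level, and the random stand-in data are all hypothetical.

```python
import numpy as np


def stack_frames(frames, window):
    """Concatenate `window` consecutive frames into one longer-time-scale vector.

    Assumption: aggregation by non-overlapping stacking; trailing frames that
    do not fill a complete window are dropped.
    """
    n_frames, n_coeffs = frames.shape
    n_blocks = n_frames // window
    return frames[: n_blocks * window].reshape(n_blocks, window * n_coeffs)


def energy_sparsify(features, keep_fraction):
    """Keep roughly the `keep_fraction` largest-magnitude coefficients, zero the rest.

    Assumption: "energy based filtering" is read here as a global magnitude
    threshold; the paper may define it differently.
    """
    magnitudes = np.abs(features)
    threshold = np.quantile(magnitudes, 1.0 - keep_fraction)
    return np.where(magnitudes >= threshold, features, 0.0)


# Random stand-in for real 25-dimensional short-time cepstral frames.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((1000, 25))

long_scale = stack_frames(mfcc, window=10)           # 100 vectors of length 250
sparse = energy_sparsify(long_scale, keep_fraction=0.05)
```

Under these assumptions, the sparse matrix would then be passed to an unsupervised decomposition such as independent component analysis; that step is omitted here because the abstract gives no detail on it.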
Original language | English |
---|---|
Title of host publication | Twenty-Ninth Meeting of the Cognitive Science Society (CogSci'07) |
Publication date | 2007 |
Pages | 983-988 |
ISBN (Print) | 978-0-9768318-3-9 |
Publication status | Published - 2007 |
Event | 29th Annual Conference of the Cognitive Science Society, Nashville, United States. Duration: 1 Aug 2007 → 4 Aug 2007. Conference number: 29 |
Conference
Conference | 29th Annual Conference of the Cognitive Science Society |
---|---|
Number | 29 |
Country/Territory | United States |
City | Nashville |
Period | 01/08/2007 → 04/08/2007 |
Keywords
- Time Scales
- Unsupervised Learning
- Energy Based Sparsification
- Cognitive Component Analysis
- Statistical Regularity
- Supervised Learning