Learning to Taste: A Multimodal Wine Dataset

Thoranna Bender, Simon Moe Sørensen, Alireza Kashani, K. Eldjarn Hjorleifsson, Grethe Hyldig, Søren Hauberg, Serge Belongie, Frederik Warburg

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique vintages, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor annotations on a subset by conducting a wine-tasting experiment with 256 participants who were asked to rank wines based on their similarity in flavor, resulting in more than 5k pairwise flavor distances. We propose a low-dimensional concept embedding algorithm that combines human experience with automatic machine similarity kernels. We demonstrate that this shared concept embedding space improves upon separate embedding spaces for coarse flavor classification (alcohol percentage, country, grape, price, rating) and aligns with the intricate human perception of flavor.
Original language: English
Title of host publication: Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
Number of pages: 40
Volume: 36
Publisher: Neural Information Processing Systems Foundation
Publication status: Accepted/In press - 2024
Event: 37th Conference on Neural Information Processing Systems - New Orleans Ernest N. Morial Convention Center, New Orleans, United States
Duration: 10 Dec 2023 to 16 Dec 2023

Conference

Conference: 37th Conference on Neural Information Processing Systems
Location: New Orleans Ernest N. Morial Convention Center
Country/Territory: United States
City: New Orleans
Period: 10/12/2023 to 16/12/2023
