Towards Incentive Compatible Collaborative Machine Learning

Research output: Book/Report › Ph.D. thesis


Abstract

Just as coal powered the industrial economy, data now fuels the digital one, with machine learning emerging as its engine. In industry, firms of all shapes and sizes are discovering how predictive capabilities can sharpen their core competencies and enrich value propositions. These benefits, however, depend on access to quality data, specific to their business needs. For many firms, especially those in the early stages of machine learning adoption, acquiring such data can be difficult, making data access a key bottleneck in practice. Collaborative machine learning (or federated learning) offers a promising remedy by allowing multiple parties to train models jointly without sharing their raw data. Yet, widespread adoption remains limited, as the absence of well-structured market mechanisms means self-interested parties often lack sufficient incentives to collaborate.
The challenge in designing such mechanisms lies in valuing data, as standard machine learning metrics say little about what a model’s performance is actually worth in practice. The value of a dataset depends not just on predictive accuracy, but on how it shapes decisions and the consequences that follow. Take data from smartwatches, e.g., heart rate, movement, and sleep patterns. In tech, it may help personalize coaching and boost subscription revenue. In insurance, it could refine risk models, sometimes reducing claims via wellness incentives, other times raising premiums for high-risk individuals. In health, it might trigger earlier interventions that save lives. The data is the same; its value depends entirely on context.
To address this, recent works propose analytics markets to buy and sell data for use in machine learning applications, where gains in predictive performance, rather than the data itself, are the commodity. Buyers submit prediction tasks to a central platform, along with their value for improved accuracy; sellers propose their own data as features. The platform matches these features to tasks, determines what information to share, and sets prices. The market revenue is then allocated amongst sellers according to their contribution to the improved accuracy. Crucially, buyers receive only refined predictions, not data, enabling privacy by design, akin to standard collaborative machine learning. In this thesis, we tackle key challenges that hinder the practicability of these markets: (i) datasets contain a limited number of observations; (ii) they can be replicated freely; and (iii) buyers are unlikely to truthfully report their valuations for improved accuracy, especially in competitive environments where data is bought from competitors in downstream markets.
To this end, we first establish a way to allocate market revenue amongst features and reward each seller fairly. We do so by treating features as players in a coalitional game and applying solution concepts that satisfy fairness criteria. We analyze the resultant market properties, propose methods to mitigate the financial risks sellers face due to sampling uncertainty, and use tools from causal inference to address the replicability of features. We then shift focus to eliciting truthful valuations from buyers. We frame the seller’s choice of what information to offer, and at what price, as a product versioning problem. We model the competitive environment between buyers and sellers as a Bayesian game of incomplete information with externalities, capturing the cost to the seller of improving a competitor’s forecast. We cast this problem as one of joint information and mechanism design and characterize pricing and allocation rules that maximize the sellers’ profits.
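As an illustration of the coalitional-game view of revenue allocation described above, the following is a minimal sketch that uses the Shapley value, one widely used solution concept satisfying standard fairness axioms. The abstract does not state which solution concept the thesis adopts, so the choice of the Shapley value is an assumption, and the feature names (`x1`, `x2`, `x3`), the accuracy gains in `GAIN_TABLE`, and the buyer's valuation are all hypothetical numbers chosen only for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a coalitional game.

    players: list of hashable labels (here, features offered by sellers).
    value:   function mapping a frozenset of players to a real number
             (here, a hypothetical accuracy gain from that feature subset).
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Probability weight of a coalition of size k preceding player p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical accuracy gains: x1 and x2 carry overlapping information,
# and x3 adds nothing to any coalition (a "null player").
GAIN_TABLE = {
    frozenset(): 0.00,
    frozenset({"x1"}): 0.06,
    frozenset({"x2"}): 0.04,
    frozenset({"x1", "x2"}): 0.08,
}


def gain(coalition):
    # x3 is ignored, so only the x1/x2 part of the coalition matters.
    return GAIN_TABLE[frozenset(coalition) & {"x1", "x2"}]


PLAYERS = ["x1", "x2", "x3"]
VALUATION_PER_UNIT_GAIN = 1000.0  # hypothetical buyer valuation

phi = shapley_values(PLAYERS, gain)
# Each seller is paid in proportion to its feature's Shapley value.
payments = {p: VALUATION_PER_UNIT_GAIN * phi[p] for p in PLAYERS}
```

In this toy game the Shapley values sum to the gain of the full feature set, so the whole market revenue is distributed, and the uninformative feature `x3` receives nothing. Exact enumeration scales exponentially in the number of features, so a sketch like this is only practical for small games.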
Original language: English
Place of Publication: Kgs. Lyngby, Denmark
Publisher: DTU Wind and Energy Systems
Number of pages: 172
Publication status: Published - 2025

  • AI for Electricity Market Design

    Falconer, T. (PhD Student), Kazempour, J. (Main Supervisor), Pinson, P. (Supervisor), Cummings, R. (Examiner) & Dahleh, M. (Examiner)

    15/05/2022 → 05/10/2025

    Project: PhD
