Hunchline
Machine Learning · Apr 9, 2026

Provably Adaptive Linear Approximation for the Shapley Value and Beyond

A new algorithm cuts the cost of explaining AI decisions, with formal guarantees on accuracy — useful for anyone building explainability tools at scale.

Scrape Score: 5.5 · Academic: 5.5 · Commercial: 1.7 · Cultural: 5.0
Horizon: Mid (2-5y) · Evidence: medium

The Thesis

The Shapley value is the most widely used tool for explaining why an AI model made a specific prediction — it assigns each input feature a share of the credit. Computing it exactly is prohibitively expensive, scaling exponentially with the number of features. This paper introduces Adalina, a randomized algorithm that approximates Shapley values (and a broader family called semi-values) using linear memory and a provably near-optimal number of model queries. The catch is that the gains are theoretical: the paper validates on benchmarks but does not yet demonstrate wall-clock speedups in production-scale systems. For teams building explainability pipelines on large tabular or language models, this could reduce compute costs without sacrificing accuracy guarantees.
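To make the exponential cost concrete, here is a brute-force sketch (illustrative, not from the paper): computing exact Shapley values requires iterating over every coalition of features, so the number of utility evaluations grows as $2^n$. The function name and toy utility below are hypothetical.

```python
from itertools import combinations
from math import factorial

def exact_shapley(n, utility):
    """Exact Shapley values over n players: iterates every coalition,
    so the number of utility calls grows as 2^n -- the exponential
    cost that approximation algorithms are designed to avoid."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                phi[i] += w * (utility(s | {i}) - utility(s))
    return phi

# Toy additive game: a coalition's value is the sum of its members' weights,
# so each player's Shapley value equals its own weight.
weights = [3.0, 1.0, 2.0]
u = lambda s: sum(weights[j] for j in s)
print(exact_shapley(3, u))
```

At 3 features this is cheap; at 50 it is already intractable, which is why sampling-based approximation is the production default.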

Catalyst

Regulatory pressure — particularly the EU AI Act and financial sector model-risk rules — is forcing organizations to explain model outputs at scale, making Shapley approximation a production concern rather than a research curiosity. At the same time, models are growing larger, pushing feature counts into the thousands, which breaks older approximation methods that assumed smaller input spaces. The vector concentration inequality the authors build on is a mature mathematical tool, but its systematic application to Shapley approximation is new here.

What's New

Prior algorithms such as KernelSHAP, SHAP-IQ, and the OFA (one-for-all) estimator each approximated Shapley values using different sampling strategies, but lacked a unified theoretical framework that could bound their error simultaneously. Those approaches also generally required superlinear memory or missed opportunities to adapt sampling to the specific utility function being evaluated. This paper builds a single framework that subsumes all those methods, formally characterizes when paired sampling helps, and introduces the first algorithm that adapts its query budget per feature based on observed variance — achieving lower mean squared error with the same query count.
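A minimal sketch of the variance-adaptive idea (illustrative only; Adalina's actual sampler and allocation rule are specified in the paper): spend a small pilot budget per feature to estimate the variance of its marginal contributions, then split the remaining query budget in proportion to each feature's estimated standard deviation, so noisy features get more samples. All names and the allocation scheme below are assumptions for illustration.

```python
import random
import statistics

def marginal_sample(i, n, utility, rng):
    """One unbiased draw of feature i's marginal contribution:
    pick a coalition size uniformly, then a uniform coalition of the
    other players at that size (this matches the Shapley weighting)."""
    others = [j for j in range(n) if j != i]
    size = rng.randrange(n)
    s = set(rng.sample(others, size))
    return utility(s | {i}) - utility(s)

def adaptive_shapley(n, utility, budget, pilot=10, seed=0):
    """Illustrative two-phase scheme (not the paper's algorithm): spend
    `pilot` draws per feature to estimate variance, then allocate the
    rest of `budget` (counted in draws; each draw costs two utility
    queries) proportionally to each feature's standard deviation."""
    rng = random.Random(seed)
    samples = {i: [marginal_sample(i, n, utility, rng) for _ in range(pilot)]
               for i in range(n)}
    stds = {i: statistics.pstdev(samples[i]) + 1e-12 for i in range(n)}
    total = sum(stds.values())
    remaining = budget - pilot * n
    for i in range(n):
        extra = int(remaining * stds[i] / total)
        samples[i] += [marginal_sample(i, n, utility, rng) for _ in range(extra)]
    return [statistics.fmean(samples[i]) for i in range(n)]
```

The per-feature state is just a running sample list (or, more frugally, running mean and variance), which is what keeps memory linear in the number of features.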

The Counter

The paper's guarantees are asymptotic and worst-case, which rarely translate directly into practical speedups on the messy, correlated feature spaces found in real deployments. The experiments validate that the theory holds, but the paper does not show that Adalina beats KernelSHAP or SHAP-IQ on wall-clock time in a production setting — the gap between query complexity and actual runtime is often where theoretical wins evaporate. The adaptive component requires estimating per-feature variance on the fly, which adds overhead that could offset the savings for utility functions that are cheap to evaluate. Finally, the Shapley approximation literature is already crowded with algorithms that are 'theoretically optimal' in some regime but rarely displace well-engineered incumbents in practice.

Longs

  • FICO — explainability is core to credit-scoring compliance products
  • IBM (IBM) — Watson OpenScale and AI Fairness 360 rely on Shapley-based attribution
  • Palantir (PLTR) — AIP explainability layer for regulated enterprise AI
  • AIQ (Global X AI & Technology ETF) — broad exposure to enterprise AI tooling

Shorts

  • Existing SHAP library maintainers risk commoditization if Adalina's adaptive sampling is adopted as the default, reducing differentiation of paid explainability wrappers
  • Vendors selling fixed-query explainability APIs (flat per-call pricing) lose margin if query counts drop significantly

Enablers (Picks & Shovels)

  • SHAP open-source library (slundberg/shap on GitHub) — the dominant implementation Adalina would plug into
  • scikit-learn ecosystem — standard benchmark environment used in the paper's experiments
  • arXiv math.PR literature on vector concentration inequalities — the theoretical foundation

Private Watchlist

  • Fiddler AI — model monitoring and explainability platform
  • Arthur AI — enterprise model observability with Shapley-based attribution
  • Aporia — ML observability startup focused on production explainability

Resources

The Paper

The Shapley value, and its broader family of semi-values, has received much attention in various attribution problems. A fundamental and long-standing challenge is their efficient approximation, since exact computation generally requires an exponential number of utility queries in the number of players $n$. To meet the challenges of large-scale applications, we explore the limits of efficiently approximating semi-values under a $\Theta(n)$ space constraint. Building upon a vector concentration inequality, we establish a theoretical framework that enables sharper query complexities for existing unbiased randomized algorithms. Within this framework, we systematically develop a linear-space algorithm that requires $O(\frac{n}{\varepsilon^{2}}\log\frac{1}{\delta})$ utility queries to ensure $P(\|\hat{\boldsymbol{\varphi}}-\boldsymbol{\varphi}\|_{2}\geq\varepsilon)\leq\delta$ for all commonly used semi-values. In particular, our framework naturally bridges OFA, unbiased kernelSHAP, SHAP-IQ and the regression-adjusted approach, and definitively characterizes when paired sampling is beneficial. Moreover, our algorithm allows explicit minimization of the mean square error for each specific utility function. Accordingly, we introduce the first adaptive, linear-time, linear-space randomized algorithm, Adalina, that theoretically achieves improved mean square error. All of our theoretical findings are experimentally validated.
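As a reference point for the linear-space budget in the abstract, the classic permutation-sampling baseline (one of the unbiased randomized estimators such frameworks generalize; this sketch is not Adalina) already runs in $O(n)$ memory: each random permutation yields one marginal-contribution sample for every player using $n+1$ utility queries, and only $n$ running sums are kept. Function and variable names below are illustrative.

```python
import random

def permutation_shapley(n, utility, num_perms, seed=0):
    """Linear-space Monte Carlo baseline: maintain only n running sums.
    Each permutation's marginals telescope, so the estimates always
    satisfy efficiency: sum(phi) == u(all players) - u(empty set)."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(num_perms):
        order = list(range(n))
        rng.shuffle(order)
        coalition, prev = set(), utility(set())
        for i in order:
            coalition.add(i)
            cur = utility(coalition)
            phi[i] += cur - prev
            prev = cur
    return [v / num_perms for v in phi]

# Utility with an interaction: players 0 and 1 earn a bonus together.
u = lambda s: sum(s) + (5.0 if {0, 1} <= s else 0.0)
est = permutation_shapley(4, u, num_perms=500)
```

Frameworks like the paper's analyze when alternatives to this sampler (paired sampling, kernel-weighted sampling, regression adjustment) reduce variance for the same query count.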

Synthesized 4/27/2026, 9:22:40 PM · claude-sonnet-4-6