Introduction
Feature attribution under correlation explains how much each input feature contributes to a model's prediction when features are not independent. In quant trading and alpha modeling you often face highly correlated signals, such as momentum and industry momentum, or valuation metrics that move together. If you rely on naive attribution you will misassign credit and can make poor portfolio decisions.
This article shows you how Shapley values solve the fairness problem in attribution, what breaks when features are correlated, and practical ways to compute reliable Shapley contributions for alpha signals. You will learn conditional and interventional approaches, grouping and orthogonalization techniques, and concrete steps to avoid leakage and double counting.
What will you get from this guide? A step-by-step workflow, a worked numerical example with correlated alpha signals, and a checklist you can run before you publish attributions.
- Shapley fairness works but requires careful handling of correlated features, because permutations that ignore feature dependence produce misleading contributions.
- Use conditional expectations or generative sampling to respect feature dependence, or use group/Owen-Shapley values to allocate credit among strongly correlated blocks.
- Orthogonalization gives an alternative by sequentially removing explained variance, but be explicit about attribution order and interpretation.
- Watch for leakage and model artifacts; they can cause large, spurious Shapley allocations, especially for lookahead features used in alpha research.
- Practical toolbox: KernelSHAP with conditional sampling, TreeSHAP for tree models, KNN or copula-based sampling, and LMG/Owen values for grouped features.
Why Shapley Values Matter for Alpha Attribution
Shapley values come from cooperative game theory and give a unique, fair allocation of a model output to features by averaging marginal contributions across all feature orderings. That fairness property is attractive if you want to say how much each signal contributed to a stock-level predicted return.
But financial features are rarely independent. Momentum correlates with volatility and sector tilt. Valuation ratios are cross-sectionally correlated. When you ignore that, you attribute credit for information that another feature already carries. So the method you use to compute Shapley values must handle dependence, or else you will mislead portfolio managers and risk teams.
Core Concepts and Definitions
Shapley value, marginal contribution, and baseline
At a high level, the Shapley value for feature i is the average marginal contribution of adding feature i across all subsets of the other features, weighted by how often each subset appears in a random ordering. The baseline is the expected model output when no features are known. In practice you approximate these expectations with samples or with model-specific fast algorithms.
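For concreteness, here is the standard formula. With N the full feature set and v(S) the expected model output when only the features in S are known (so v of the empty set is the baseline), the Shapley value of feature i is:

```latex
\phi_i \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}\,\Bigl[ v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr],
\qquad
\sum_{i \in N} \phi_i \;=\; f(x) - v(\varnothing)
```

The second identity (efficiency) is the sanity check used later in this article: attributions must sum to the prediction minus the baseline.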
Interventional versus conditional
Interventional Shapley treats a missing feature by breaking its dependence on the other features and inserting values sampled from its marginal distribution. Conditional Shapley conditions on the values of the other features, preserving dependence. Which do you choose? If your goal is descriptive attribution consistent with how your model actually sees feature relationships in your data, you generally want conditional sampling.
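The two options correspond to two different value functions for a coalition S of observed features. Writing x_S for the observed values and X_{\bar S} for the remaining features:

```latex
v_{\mathrm{int}}(S) = \mathbb{E}_{X_{\bar S}}\!\left[ f(x_S, X_{\bar S}) \right]
\qquad \text{versus} \qquad
v_{\mathrm{cond}}(S) = \mathbb{E}\!\left[ f(x_S, X_{\bar S}) \,\middle|\, X_S = x_S \right]
```

Interventional sampling draws X_{\bar S} from its marginal distribution, while conditional sampling draws it given the observed x_S; the Shapley machinery on top is identical.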
Computing Shapley Values When Features Are Correlated
There are several practical approaches to compute Shapley values under correlation. Each has tradeoffs in bias, variance, and computational cost. You should pick the approach that aligns with your interpretability goal.
1. Conditional sampling (preferred for descriptive explanations)
- Estimate the conditional distribution of the features being filled in given the conditioned subset S, that is P(X_rest | X_S = x_S), where X_rest denotes the features outside S.
- Sample from that conditional and compute model predictions to get the conditional expectation of the model output.
- Average marginal contributions across permutations, using Monte Carlo draws where necessary.
For continuous features a multivariate Gaussian is a reasonable starting point, but financial data often have heavy tails and nonlinear dependence. Use nonparametric conditional models, such as conditional density estimation with gradient boosting, conditional trees, K-nearest neighbors, or copula models, to better capture the dependence structure.
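A minimal sketch of this recipe with a K-nearest-neighbor conditional sampler, assuming a fitted model exposing a `predict` method, a NumPy training matrix `X_train`, and a 1-D instance `x` (all hypothetical names); a production version would need tuning of `k`, vectorization, and careful treatment of ties and discrete features:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_conditional_sample(X_train, cond_idx, cond_values, n_draws, k=50, rng=None):
    """Draw rows of X_train whose conditioned columns lie near cond_values: a KNN proxy
    for sampling the remaining features conditional on the observed ones."""
    if rng is None:
        rng = np.random.default_rng()
    if len(cond_idx) == 0:
        # Nothing observed yet: fall back to the marginal (random training rows).
        return X_train[rng.integers(0, len(X_train), size=n_draws)]
    nn = NearestNeighbors(n_neighbors=min(k, len(X_train)))
    nn.fit(X_train[:, cond_idx])
    _, idx = nn.kneighbors(np.asarray(cond_values).reshape(1, -1))
    rows = rng.choice(idx[0], size=n_draws, replace=True)
    return X_train[rows]

def conditional_shapley(model, X_train, x, n_perm=200, n_draws=20, rng=None):
    """Monte Carlo conditional Shapley values for a single instance x."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(x)
    phi = np.zeros(n)

    def v(S):
        # Conditional value function: estimate E[f(X) | X_S = x_S] with KNN draws.
        draws = knn_conditional_sample(X_train, S, x[S], n_draws, rng=rng).copy()
        draws[:, S] = x[S]          # pin the observed features at their actual values
        return model.predict(draws).mean()

    for _ in range(n_perm):
        order = rng.permutation(n)
        S, v_prev = [], v([])
        for i in order:
            v_new = v(S + [int(i)])
            phi[i] += v_new - v_prev
            S.append(int(i))
            v_prev = v_new
    return phi / n_perm
```

The returned vector should sum, up to Monte Carlo error, to the prediction for x minus the baseline v(empty set), which is a useful first sanity check.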
2. Interventional sampling (use with caution)
Interventional Shapley samples inserted features from their marginal distribution, breaking correlations. This gives a measure of the isolated effect of a feature, which may be interesting if you want a counterfactual where signals are independent. However, it often assigns credit to a feature that in practice never appears without its correlated partners.
3. Grouping and Owen values (practical for blocks of correlated features)
If you have natural blocks, such as sector indicators or multiple momentum variants, group features and compute Shapley at the group level. Use Owen values to allocate group contribution back to members based on within-group attribution. This reduces permutation noise and reflects structural dependence.
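A sketch of the grouping step, assuming the signals live in a pandas DataFrame `X` and the model accepts a one-row DataFrame in `predict` (hypothetical names). Whole groups are swapped in and out together, which preserves within-group dependence by construction; across-group replacement here is marginal for brevity, and the group totals could then be split within each block, Owen-style, as described above:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def correlation_groups(X: pd.DataFrame, threshold: float = 0.6) -> dict:
    """Cluster columns so that features correlated above `threshold` share a group."""
    corr = X.corr().abs()
    dist = 1.0 - corr.values
    np.fill_diagonal(dist, 0.0)          # guard against floating-point residue on the diagonal
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=1.0 - threshold, criterion="distance")
    return {int(g): list(X.columns[labels == g]) for g in np.unique(labels)}

def group_shapley(model, X_background: pd.DataFrame, x_row: pd.Series,
                  groups: dict, n_perm: int = 500, rng=None) -> dict:
    """Permutation Shapley over feature groups for one instance."""
    if rng is None:
        rng = np.random.default_rng(0)
    group_ids = list(groups)
    phi = dict.fromkeys(group_ids, 0.0)
    for _ in range(n_perm):
        order = rng.permutation(group_ids)
        z = X_background.sample(1, random_state=int(rng.integers(1 << 31))).iloc[0].copy()
        v_prev = float(model.predict(z.to_frame().T)[0])
        for g in order:
            z[groups[int(g)]] = x_row[groups[int(g)]]   # switch the whole group on at once
            v_new = float(model.predict(z.to_frame().T)[0])
            phi[int(g)] += v_new - v_prev
            v_prev = v_new
    return {g: phi[g] / n_perm for g in group_ids}
```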
4. Orthogonalization and sequential regression
Orthogonalize features by regressing each one on the features earlier in a chosen order and using the residuals for attribution. This gives an ordered-contribution interpretation similar to a sequential R² decomposition. Be explicit: the order matters, so this approach answers a different question than Shapley fairness.
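A minimal sketch of the sequential residualization step with OLS, assuming `X` is a DataFrame of signals and `order` is the attribution order you have chosen and documented (hypothetical names):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def sequential_orthogonalize(X: pd.DataFrame, order: list) -> pd.DataFrame:
    """Residualize each feature against the features that precede it in `order`.
    The residuals are mutually uncorrelated in-sample and can feed a sequential
    R-squared decomposition; the result depends on the chosen order."""
    Z = pd.DataFrame(index=X.index)
    for i, col in enumerate(order):
        if i == 0:
            Z[col] = X[col] - X[col].mean()
        else:
            reg = LinearRegression().fit(Z[order[:i]], X[col])
            Z[col] = X[col] - reg.predict(Z[order[:i]])
    return Z
```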
Numerical Example: Three Correlated Alpha Signals
Consider three signals for a universe of mid-cap names: short-term momentum M1, sector momentum SM, and earnings surprise ES. Suppose M1 and SM correlate at 0.7, while ES is relatively independent. Your model predicts a 1-day return estimate f(X).
Naive interventional Shapley might compute marginal contributions by replacing one feature at a time with samples from its marginal. If you do that, M1 will get most of the credit because breaking the correlation removes SM's supportive information. Conditional Shapley, on the other hand, will sample SM conditional on M1 being at its observed level, and you will see credit shared between M1 and SM, often roughly in proportion to their predictive power once dependence is respected.
A quantitative illustration: suppose the baseline expected model output is 0.0 and the predicted return for the observed feature vector is 0.8. Interventional marginal contributions might give M1 = 0.60, SM = 0.15, ES = 0.05, while conditional sampling might yield M1 = 0.42, SM = 0.33, ES = 0.05. Both allocations sum to 0.8, as the efficiency property requires; the shift reflects that SM carries information redundant with M1, and conditional sampling shares that credit appropriately.
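If you want to reproduce this kind of comparison rather than take the illustrative numbers above at face value, a synthetic setup is enough. The snippet below builds three signals with the stated correlation structure, fits a simple model, and reuses the `conditional_shapley` sketch from earlier; the exact outputs will differ, since the figures above are purely didactic:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 5000
m1 = rng.normal(size=n)                                            # short-term momentum
sector_m = 0.7 * m1 + np.sqrt(1 - 0.7 ** 2) * rng.normal(size=n)   # corr(M1, SM) ≈ 0.7
es = rng.normal(size=n)                                            # earnings surprise, ~independent
X = np.column_stack([m1, sector_m, es])
y = 0.5 * m1 + 0.4 * sector_m + 0.2 * es + 0.1 * rng.normal(size=n)

model = LinearRegression().fit(X, y)
x0 = X[0]
phi = conditional_shapley(model, X, x0)            # sketch from the conditional sampling section
print(dict(zip(["M1", "SM", "ES"], np.round(phi, 3))))
```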
Practical Implementation Steps
Follow this checklist when you compute Shapley attributions for alpha signals so you avoid common traps.
- Define the attribution goal. Are you explaining model behavior, or estimating an isolated causal effect? Your choice guides conditional or interventional sampling.
- Inspect correlations. Compute the feature correlation matrix and hierarchical clustering. Identify blocks with correlation above a threshold such as 0.6.
- Choose a sampling strategy. Use conditional sampling for descriptive explanations. For conditional density, start with KNN or conditional trees, escalate to copulas or generative models if dependence is complex.
- Estimate baselines carefully. Use in-sample or cross-validated baselines that match the model training distribution to avoid distributional shift bias.
- Monte Carlo settings. Use enough permutations and conditional draws. For 10 features, Monte Carlo with 1,000 permutations per instance is common for stable averages. If computational budget is limited, use grouping or TreeSHAP when applicable.
- Validate attributions. Run sanity checks: the Shapley values should sum to the prediction minus the baseline within Monte Carlo error, and allocations should be stable across bootstrap resamples (see the sketch after this list).
- Document assumptions. Record whether you used interventional or conditional, how you sampled, and any grouping or orthogonalization applied.
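A minimal sketch of the two validation checks, assuming `phi` is an instances-by-features array of attributions, `preds` the corresponding predictions, `baseline` the expected model output, and `attribution_fn` a function that maps a feature matrix to such an array (all hypothetical names):

```python
import numpy as np

def check_efficiency(phi, preds, baseline, atol=1e-2):
    """Shapley values should sum, per instance, to prediction minus baseline
    within Monte Carlo error; atol is the tolerance you are willing to accept."""
    gap = phi.sum(axis=1) + baseline - preds
    return float(np.abs(gap).max()), bool(np.all(np.abs(gap) <= atol))

def bootstrap_stability(attribution_fn, X, n_boot=50, rng=None):
    """Spread of mean absolute attributions per feature across bootstrap resamples
    of the explanation set; large spreads flag allocations you should not over-read."""
    if rng is None:
        rng = np.random.default_rng(0)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))
        stats.append(np.abs(attribution_fn(X[idx])).mean(axis=0))
    stats = np.asarray(stats)
    return stats.mean(axis=0), stats.std(axis=0)
```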
Tooling tips
If your model is tree-based, use TreeSHAP, which handles feature interactions efficiently, but be cautious: standard TreeSHAP implementations default to interventional sampling when you supply a background dataset. Libraries exist to compute conditional TreeSHAP, or you can write a wrapper that supplies conditional samples.
For neural nets or generic models use KernelSHAP with conditional sampling. Many open source implementations let you plug a conditional sampler function for missing feature simulation.
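A sketch of both tooling paths with the shap library; argument names have shifted across shap versions, so treat this as an outline rather than a drop-in snippet (`model`, `X_background`, and `X_explain` are hypothetical names):

```python
import shap

# Tree models: TreeSHAP. Supplying a background dataset with
# feature_perturbation="interventional" gives marginal (correlation-breaking) replacement;
# "tree_path_dependent" uses the trees' own cover statistics and is sometimes described
# as closer to a conditional treatment, but it is not a full conditional estimate.
tree_explainer = shap.TreeExplainer(
    model, data=X_background, feature_perturbation="interventional"
)
phi_tree = tree_explainer.shap_values(X_explain)

# Generic models: KernelSHAP. Out of the box it imputes missing features from the
# background set marginally; for conditional explanations, generate conditional samples
# yourself (for example with the KNN sampler sketched earlier) and use a
# permutation-based estimator instead.
kernel_explainer = shap.KernelExplainer(model.predict, shap.sample(X_background, 100))
phi_kernel = kernel_explainer.shap_values(X_explain, nsamples=500)
```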
Detecting and Avoiding Leakage and Multicollinearity Traps
Leakage arises when a feature contains information about the target that would not be available at prediction time. Shapley values will faithfully attribute credit to leaked features, so you need to identify and remove them before attribution.
Multicollinearity inflates variance of linear coefficients and confuses simple regression-based attributions. For correlated input features, Shapley gives a theoretically fair split but only if sampling respects dependence. If you ignore dependence you will over-attribute to one member of a correlated group.
Actionable checks:
- Run forward-chaining backtests to detect lookahead leakage.
- Compute VIFs to find multicollinearity (a minimal computation is sketched after this list), and then examine correlation clusters visually.
- Run permutation tests where you permute a feature within clusters only, to see how attribution shifts.
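A minimal VIF computation with statsmodels, assuming `X` is a pandas DataFrame of candidate signals (hypothetical name):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.Series:
    """Variance inflation factor per feature; values above roughly 5-10 usually flag
    collinear blocks worth inspecting with clustering or a correlation heatmap."""
    Xc = sm.add_constant(X)                          # VIF requires an intercept column
    vifs = {col: variance_inflation_factor(Xc.values, i)
            for i, col in enumerate(Xc.columns) if col != "const"}
    return pd.Series(vifs).sort_values(ascending=False)
```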
Common Mistakes to Avoid
- Using marginal replacement with correlated features: This breaks dependency and usually misattributes credit. Use conditional sampling when you want descriptive explanations.
- Assuming Shapley equals causality: Shapley apportions predictive contribution, not causal effect. If you need causal attribution, apply causal inference methods before Shapley.
- Ignoring baseline mismatch: Using a baseline from a different distribution biases allocations. Match baseline sampling to training distribution or use cross-validation.
- Over-interpreting small, noisy differences: Monte Carlo error and sampling noise can create apparent but meaningless rank changes. Use bootstrapping to quantify uncertainty.
- Forgetting to document the sampling method: Interventional and conditional provide different stories. Report which you used and why.
FAQ
Q: When should I use interventional vs conditional Shapley?
A: Use conditional Shapley when you want a descriptive explanation consistent with feature dependence in your data, which is most common for alpha explanation. Use interventional Shapley when you want to measure isolated capability of a feature under a hypothetical independence scenario. Be explicit about which story you are telling.
Q: How do I compute conditional distributions for financial features?
A: Start with nonparametric methods like K-nearest neighbors or conditional trees. If relationships are roughly linear, use multivariate Gaussian or copula approaches. For complex joint distributions consider generative models such as conditional VAEs. Validate the fit by comparing conditional marginals and joint moments to withheld data.
Q: Can I use TreeSHAP out of the box on correlated features?
A: You can, but default TreeSHAP uses interventional sampling, which breaks correlations. Some implementations allow conditional sampling, or you can pre-sample conditionally and feed those samples into KernelSHAP. If you stick with TreeSHAP, be aware of the interpretation and test sensitivity to the sampling choice.
Q: How do I handle many correlated features without exploding compute cost?
A: Group correlated features using hierarchical clustering and compute group Shapley values, then use Owen values or within-group decomposition to split group credit. Alternatively apply dimensionality reduction such as PCA on correlated blocks and attribute to principal components, but document the transformation because interpretation changes.
Bottom Line
Shapley values give a principled way to allocate model predictions to features, but correlated alpha signals require careful handling. Conditional sampling, grouping, and orthogonalization are practical tools to produce attributions that reflect the real data generating process and avoid misleading credit assignment.
Start by deciding whether you need a descriptive or counterfactual story. Then inspect correlations, pick a conditional sampler that fits your data, and validate results with bootstraps and sanity checks. At the end of the day, transparent documentation of assumptions and methodology is as important as the numbers you report.
Next steps: run a pilot on a held-out set, compare interventional and conditional attributions, and present results to your PM and risk teams with uncertainty bands. That process will surface hidden leakage and give you robust, actionable feature attributions for your alpha signals.