Transparency and Insights into Nonlinear Factor Effects Through Market Regimes
- The recent complex market environment has challenged systematic equity investors with protracted drawdowns driven by unexpected nonlinear factor effects and interactions.
- We use a “surrogate model” to bring transparency to how the machine-learning factor in MSCI Equity Factor Models can help portfolio managers understand and exploit nonlinear relationships between factors and stock returns.
- A SHAP analysis provided the “glass box” that can help us quantify the drivers of the machine-learning factor in different market regimes and gauge areas of risk for active portfolio managers.
Traditional styles and factors have seen strong returns over the past two years, even as global equity markets have been shaped by extreme concentration and volatile forecasts of AI-driven disruption. Yet the same period also produced unusual combinations of factor outperformance and nonlinear factor interactions, the forces behind the persistent losses suffered by quantitative equity strategies in the summer of 2025. That quant wobble was a concrete reminder that complex market dynamics cannot always be neatly described by linear models. Systematic equity investors can instead use machine-learning models to exploit nonlinear effects, an approach represented by the machine-learning (ML) factor in MSCI Equity Factor Models.
The ML factor is a strong predictor of future stock returns across many equity markets, but its calculation is not simple. The approach uses neural network models to predict future (standardized) stock-specific returns based on trailing specific returns and exposures to the 22 style factors of the Barra Global Total Market Equity Trading Model (GEMTR). The model is directly trained to capture effects missed by linear models and combines 18 sub-models with lookback windows ranging from two to six years. The same complexity that makes the model robust can obscure how it works in practice and what market effects it captures.
We developed a simpler model with far fewer input variables to capture the core of the ML factor, choosing an approach that allows us to easily decompose how those drivers influenced predictions. This surrogate model uses a random forest (RF), an algorithm that averages predictions from hundreds of decision trees fitted to subsets of the data. RFs deliver strong predictive performance with limited tuning, robust low-variance forecasts and the ability to capture nonlinear effects.
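To make the bagging idea concrete, here is a minimal, pure-Python sketch of a random forest fit to toy data with a deliberately nonlinear (V-shaped) relationship between exposure and return. Depth-1 "stumps" stand in for the deeper trees the article's model uses, and all data and function names are illustrative, not the actual RF surrogate.

```python
import random
import statistics

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: choose the split threshold
    that minimizes total squared error, return a predictor."""
    best = None
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    for k in range(1, len(order)):
        left = [ys[order[i]] for i in range(k)]
        right = [ys[order[i]] for i in range(k, len(order))]
        ml, mr = statistics.mean(left), statistics.mean(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        thr = (xs[order[k - 1]] + xs[order[k]]) / 2
        if best is None or err < best[0]:
            best = (err, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr

def random_forest(xs, ys, n_trees=200, seed=0):
    """Average the predictions of many stumps, each fit to a
    bootstrap resample of the data -- the essence of bagging."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: statistics.mean(t(x) for t in trees)

# Toy nonlinear target: returns rise at both exposure extremes,
# a V-shape that no linear model can represent.
xs = [i / 10 - 1 for i in range(21)]   # exposures in [-1, 1]
ys = [abs(x) for x in xs]
model = random_forest(xs, ys)
```

No single linear coefficient could make both `model(0.9)` and `model(-0.9)` exceed `model(0.0)`, but the averaged trees capture the V-shape; a production version would use deeper trees and a library such as scikit-learn rather than hand-rolled stumps.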
We picked four input variables: momentum, residual volatility, liquidity and trailing one-month factor-model-specific return, a choice guided by the original “feature importance” analysis for the ML factor. We trained the model to predict future one-month standardized specific return.1 Despite its simplicity, the RF factor delivers risk-adjusted returns comparable to those of the ML factor over the full period. As a stand-alone signal, the annualized decile spread, measured by GEMTR residual returns, was around 6.2% between February 2000 and January 2026, confirming that it captured nonlinear relationships that evaded linear models. When included alongside GEMTR model factors in a monthly multivariate regression, the RF signal’s information ratio (IR) is close to the ML factor’s over the whole period, though with some slippage over the last five years.2
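The decile-spread evaluation used above can be sketched in a few lines: rank stocks by the signal, then take the difference between the average next-month residual returns of the top and bottom tenths. This is a simplified, equal-weighted version (the article's portfolios are square-root cap-weighted), and the function name is our own.

```python
def decile_spread(signal, returns):
    """Top-minus-bottom decile spread of next-period returns,
    with stocks ranked by the model signal. Equal-weighted for
    simplicity; deciles are defined by signal rank."""
    order = sorted(range(len(signal)), key=lambda i: signal[i])
    n = max(1, len(order) // 10)                 # stocks per decile
    bottom = sum(returns[i] for i in order[:n]) / n
    top = sum(returns[i] for i in order[-n:]) / n
    return top - bottom
```

Computed each month and annualized, this spread is the stand-alone performance measure quoted for the RF factor.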
Left: annualized decile returns calculated with specific returns from the GEMTR model. Decile portfolios are square-root cap-weighted. Right: cumulative decile spread between February 2000 and January 2026.
Left: cumulative return to the ML and RF factors when added (separately) to the GEMTR model. Right: a scatter of monthly returns to the ML and RF factors between February 2000 and January 2026.
For linear regression models, each variable’s contribution to a forecast is simply its exact beta multiplied by its current value. For nonlinear models, the SHAP3 framework serves the same purpose: It measures the time-varying influence of each predictor on a given prediction, relative to a baseline.
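For small models, SHAP values can be computed exactly by enumerating feature coalitions, which makes the link to linear betas explicit. The sketch below (our own illustrative implementation, not the `shap` library) averages "unknown" features over a background sample; for a purely linear model, each feature's SHAP value collapses to its beta times the feature's deviation from the background mean, just as the text describes.

```python
import itertools
import math
from statistics import mean

def shapley_values(f, x, background):
    """Exact SHAP values by coalition enumeration.
    f: model taking a list of features; x: the stock's features;
    background: sample rows used to average out 'unknown' features."""
    p = len(x)

    def value(S):
        # Expected prediction with features in S fixed to x's values
        # and the remaining features drawn from the background rows.
        return mean(
            f([x[j] if j in S else row[j] for j in range(p)])
            for row in background
        )

    phis = []
    for j in range(p):
        phi = 0.0
        others = [k for k in range(p) if k != j]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                # Shapley weight: |S|! (p - |S| - 1)! / p!
                w = (math.factorial(len(S)) * math.factorial(p - len(S) - 1)
                     / math.factorial(p))
                phi += w * (value(set(S) | {j}) - value(set(S)))
        phis.append(phi)
    return phis
```

Enumeration is exponential in the number of features, so it only works for small models like our four-variable surrogate; in practice tree models use the fast TreeSHAP algorithm instead.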
The SHAP value of input variables for a stock as of Dec. 31, 2025.
The above chart shows a SHAP analysis for a single stock on Dec. 31, 2025. The stock’s high liquidity exposure (2.8) and residual volatility (2.0) contribute negatively to the model’s prediction, while its momentum exposure (-0.8) contributes positively.
This analysis can be aggregated across the whole universe to show which input variables have the most impact on the model prediction. We measured the importance of an input variable at a point in time by computing its average absolute SHAP value across all stocks. High importance means the variable has had high impact on model predictions. To assess the relative importance of the input variables over time, we normalize this metric to 100%. Prior to the early 2010s, momentum and trailing one-month specific return were the two dominant variables; but recently, liquidity and residual volatility have become more influential. This may reflect the technology-driven momentum trend that has dominated markets, leaving momentum effects a smaller role within nonlinear factor returns.
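The importance metric just described reduces to a few lines of arithmetic. This sketch (function and variable names are ours) takes a matrix of per-stock SHAP values and returns each feature's share of total importance, normalized to 100%.

```python
def relative_importance(shap_matrix):
    """shap_matrix[i][j] is the SHAP value of feature j for stock i.
    Returns each feature's mean absolute SHAP value, rescaled so
    the features sum to 100%."""
    n = len(shap_matrix)
    p = len(shap_matrix[0])
    raw = [sum(abs(row[j]) for row in shap_matrix) / n for j in range(p)]
    total = sum(raw)
    return [100.0 * v / total for v in raw]
```

Recomputed monthly, these shares produce the time series of relative feature importance shown in the exhibit.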
Relative feature importance of the input variables for the RF model.
What are the nonlinear and dynamic effects the RF factor captures to generate the strong risk-adjusted return in our simulation? We can answer this question by looking at the scatter of SHAP values against factor exposures for each input variable. For example, the RF model on Dec. 31, 2000, captured the following: the outperformance of stocks with very high or low momentum, reversal of specific returns and underperformance of stocks with very high residual volatility or liquidity. Through the monthly model-update process, the RF model as of Dec. 31, 2025, had evolved to produce negative forecasts for stocks with high trailing-specific returns, residual volatility or liquidity.
Top: Dependence scatter plots show the impact of an input variable, via the SHAP value, on the RF model prediction as of the selected date. Bottom: average decile returns calculated using monthly standardized specific returns for the five years prior to the selected date. Decile portfolios are equal-weighted.
Complex factor interactions have become increasingly important for portfolio managers, and the MSCI ML factor can be an effective tool to exploit such effects. We have shown how the ML factor’s characteristics can be understood using experiments with a simple surrogate model whose speed of estimation allows its forecast sensitivities to be analyzed with SHAP values. Together, the surrogate model and SHAP-value analysis create a “glass box” into complex nonlinear return models, giving active equity managers the transparency they need to gauge and manage the risks and opportunities these dynamics present.
1 Cross-sectional standardization means stock returns from different months are comparable. We constructed 500 trees, with a maximum depth of eight, using five years of monthly data for the GEMTR estimation universe. U.S. stocks comprised roughly 20% by count.
2 The IR over the full period is 2.3 for the global ML factor versus 2.0 for the RF factor. Over the last five years, the ML factor has delivered an IR of 1.4 against 0.9 (albeit at higher volatility).
3 The SHAP (SHapley Additive exPlanations) framework was first described in Scott M. Lundberg and Su-In Lee, “A Unified Approach to Interpreting Model Predictions,” Advances in Neural Information Processing Systems, December 2017. It is based on the idea of Shapley values from economics. Expository accounts include Cheryll-Ann Wilson, “Explainable AI in Finance: Addressing the Needs of Diverse Stakeholders,” CFA Institute, August 2025. SHAP allocates contributions to forecasts across variables in a way that respects additivity and fairness.
