- The COVID-19 pandemic disrupted the operating models of many businesses and forced a shift to remote working, digitization and low-contact transactions and services, which we term “remote-operation capability” (ROC).
- Using machine learning and natural language processing, we built a ROC factor. Companies with high exposure to our ROC factor outperformed the MSCI USA IMI by around 15 percentage points YTD through June 30.
- Our hypothetical “combined” ROC portfolio, built from our three other ROC portfolios, had high exposure to the beta, growth and profitability factors and low exposure to dividend yield, value and long-term reversal.
The challenges posed to corporations by COVID-19 showed that some companies were better positioned than others to take advantage of a remote, automated and digitized operating environment. We used techniques from machine learning (ML) and natural language processing (NLP) to build a potential “remote-operation capability” (ROC) factor that seeks to estimate how likely a company was to thrive in this scenario.
Constructing the ROC Factor
Our first step was to make the theme concrete using a “topic modeling” approach. This is a research technique used in MSCI thematic indexes¹ and other products that leverage ML and NLP. We started with a set of “seed” words and phrases with strong, intuitive relations to the ROC theme (e.g., home working, remote work and telecommuting). We then used “word embedding” models² to expand the seed-word list to a larger “dictionary” of about 50 keywords and phrases. The following is the word cloud version of the keyword dictionary for ROC.
Next, this dictionary became an input to the three approaches to factor construction that we tested:
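The seed-expansion step can be illustrated with a minimal sketch. It assumes that candidate terms are ranked by their best cosine similarity to any seed term, which is one common way to use word embeddings and not necessarily the procedure used here. The function names and the three-dimensional toy vectors are hypothetical; real embeddings would come from a trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_seeds(seeds, embeddings, top_n=50):
    """Return the seeds plus the non-seed terms most similar to any seed."""
    seed_vecs = [embeddings[s] for s in seeds if s in embeddings]
    if not seed_vecs:
        return list(seeds)
    scores = {
        term: max(cosine(vec, sv) for sv in seed_vecs)
        for term, vec in embeddings.items()
        if term not in seeds
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return list(seeds) + ranked[: max(0, top_n - len(seeds))]

# Toy vectors invented for illustration only (assumption, not real data).
toy_embeddings = {
    "remote work": [0.9, 0.1, 0.0],
    "telecommuting": [0.85, 0.15, 0.05],
    "video conferencing": [0.8, 0.3, 0.1],
    "cloud computing": [0.6, 0.6, 0.1],
    "oil drilling": [0.0, 0.1, 0.95],
}
seeds = ["remote work", "telecommuting"]
dictionary = expand_seeds(seeds, toy_embeddings, top_n=4)
print(dictionary)
```

In this toy setup the unrelated term is ranked last and falls outside the four-term dictionary.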
- Word count: We counted ROC keyword matches in the business section of a company’s 10-K filing.
- Semantic search: We identified a company’s products and services from its 10-K filing’s business section using semantic-role-labeling³ techniques and then counted ROC keyword matches for those products and services.
- Concept exposure: We used a knowledge graph dataset⁴ that quantifies a company’s exposures to high-level concepts, or themes, based on the co-occurrences of companies and those concepts; the centrality of the connections; and the links to similar concepts. Then we aggregated each company’s exposure to concepts that contained any of our ROC keywords.
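As an illustration of the word-count approach, the sketch below counts case-insensitive keyword matches in a block of text. The matching rules (whole-phrase matches, with whitespace or a hyphen allowed between the words of a phrase) are assumptions, as are the function name and the sample text, which stands in for the business section of a 10-K filing.

```python
import re

def count_keyword_matches(text, keywords):
    """Count case-insensitive whole-phrase matches of each keyword in text."""
    counts = {}
    for kw in keywords:
        # Allow whitespace or a hyphen between the words of a phrase, so
        # "remote work" also matches "remote-work" (an assumption).
        words = [re.escape(w) for w in kw.split()]
        pattern = r"\b" + r"[\s-]+".join(words) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Hypothetical sample text, not taken from any real filing.
sample = (
    "We provide cloud software that supports remote work. "
    "Remote-work adoption and telecommuting increased demand."
)
counts = count_keyword_matches(sample, ["remote work", "telecommuting", "e-commerce"])
raw_exposure = sum(counts.values())  # raw ROC-factor exposure for this text
print(counts, raw_exposure)
```

The semantic-search approach would apply the same counting to only the text identified as products and services, which requires a semantic-role-labeling model and is not shown.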
In the first two approaches, the count of keyword matches became our raw ROC-factor exposure. For the concept-exposure method, the aggregated exposure to concepts that contained any of our ROC keywords became our raw ROC-factor exposure. We normalized each raw ROC-factor exposure to limit outlier influence.⁵
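The normalization can be sketched as follows, under one possible reading of footnote 5: each raw exposure is capped at 95% of the maximum raw value and then divided by that maximum, so that results stay within [0, 1]. The function name, the parameter name and the handling of an all-zero input are assumptions.

```python
def normalize_exposures(raw, trim_fraction=0.95):
    """Cap raw exposures at trim_fraction * max, then divide by the max.

    Assumes non-negative raw exposures (e.g., keyword counts).
    """
    max_raw = max(raw.values(), default=0.0)
    if max_raw <= 0:
        # No company has any exposure; return zeros (an assumption).
        return {name: 0.0 for name in raw}
    cap = trim_fraction * max_raw
    return {name: min(value, cap) / max_raw for name, value in raw.items()}

# Hypothetical raw keyword counts for three companies.
raw_counts = {"AAA": 10, "BBB": 100, "CCC": 0}
normalized = normalize_exposures(raw_counts)
print(normalized)
```

Under this reading the largest normalized exposure is 0.95 and not 1; dividing by the capped value instead would rescale the top of the range to 1.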
Finally, we constructed hypothetical portfolios of the 250 stocks with the highest exposure to each of our ROC factors. We weighted the 250 stocks by the product of their normalized ROC exposure and market cap.⁶ The weights were then normalized to sum to 100%, and each issuer was capped at 5% to reduce concentration. We also took a simple average of the normalized exposures from the three methods and constructed a fourth, “combined” portfolio in the same way as with each individual ROC-factor portfolio.
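The portfolio-construction steps above can be sketched in a few lines. The text does not say how the excess weight above the 5% cap was handled; the sketch assumes it is redistributed pro rata to the uncapped stocks, and it treats each stock as a separate issuer. All function names and the toy inputs are hypothetical.

```python
def combine_exposures(methods):
    """Simple average of normalized exposures (same tickers assumed in each)."""
    return {t: sum(m[t] for m in methods) / len(methods) for t in methods[0]}

def build_portfolio(exposure, market_cap, n_stocks=250, cap=0.05):
    """Select the top n_stocks by exposure, weight by exposure * market cap,
    normalize to 1 and cap each weight, redistributing excess pro rata."""
    top = sorted(exposure, key=exposure.get, reverse=True)[:n_stocks]
    if cap * len(top) < 1.0 - 1e-12:
        raise ValueError("cap is too tight for the number of stocks")
    raw = {t: exposure[t] * market_cap[t] for t in top}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("selected stocks have no positive raw weight")
    weights = {t: w / total for t, w in raw.items()}
    capped = set()
    while True:
        over = [t for t, w in weights.items() if t not in capped and w > cap + 1e-12]
        if not over:
            return weights
        capped.update(over)
        free = [t for t in weights if t not in capped]
        free_total = sum(weights[t] for t in free)
        remaining = 1.0 - cap * len(capped)
        for t in capped:
            weights[t] = cap
        for t in free:
            # Pro rata redistribution; equal split if free weights are all zero.
            share = weights[t] / free_total if free_total > 0 else 1.0 / len(free)
            weights[t] = remaining * share

# Toy example with 4 stocks, selecting 3 and capping at 50% (illustrative only).
toy_exposure = {"A": 1.0, "B": 0.5, "C": 0.2, "D": 0.1}
toy_market_cap = {"A": 100.0, "B": 100.0, "C": 100.0, "D": 100.0}
toy_weights = build_portfolio(toy_exposure, toy_market_cap, n_stocks=3, cap=0.5)
print(toy_weights)
```

With the defaults (`n_stocks=250`, `cap=0.05`) the same function follows the parameters described in the text; passing the output of `combine_exposures` as `exposure` gives the combined portfolio.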
Combined ROC Portfolio Outperformed the Individual Ones
We evaluated the performance of the four ROC portfolios over the year-to-date period through June 30. As we see in the exhibit below, the ROC portfolios performed similarly to one another and, in all cases, outperformed the MSCI USA Investable Market Index (IMI) benchmark portfolio. We also note that the combined ROC-factor portfolio had the largest outperformance.
ROC Portfolio Returns
Breaking Down the Combined ROC Portfolio’s Outperformance
When we examined the combined ROC portfolio’s active sector weights using the MSCI USA IMI as the benchmark, we found it overweighted the information-technology (IT) sector. This is not surprising, since many IT companies offer solutions that enable remote operations, and it makes sense that they mention related terms more often than companies in other sectors.
Combined ROC Portfolio’s Active Sector Weights
Data as of Dec. 31, 2019
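For readers less familiar with the term, an active sector weight is the portfolio's weight in a sector minus the benchmark's weight in that sector. The short sketch below shows the arithmetic; the tickers, sector assignments and weights are invented for illustration and are not the data behind the exhibit.

```python
from collections import defaultdict

def sector_weights(weights, sector_of):
    """Sum stock weights by sector."""
    totals = defaultdict(float)
    for ticker, weight in weights.items():
        totals[sector_of[ticker]] += weight
    return dict(totals)

def active_sector_weights(portfolio, benchmark, sector_of):
    """Portfolio sector weight minus benchmark sector weight, per sector."""
    p = sector_weights(portfolio, sector_of)
    b = sector_weights(benchmark, sector_of)
    return {s: p.get(s, 0.0) - b.get(s, 0.0) for s in sorted(set(p) | set(b))}

# Hypothetical three-stock universe.
sector_of = {"AAA": "Information Technology", "BBB": "Health Care", "CCC": "Energy"}
portfolio = {"AAA": 0.7, "BBB": 0.3}
benchmark = {"AAA": 0.4, "BBB": 0.3, "CCC": 0.3}
active = active_sector_weights(portfolio, benchmark, sector_of)
print(active)
```

Because both sets of weights sum to 100%, the active weights sum to zero: an overweight in one sector is offset by underweights elsewhere.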
We also examined the combined ROC portfolio’s active exposures to style factors in MSCI’s Barra US Total Market Equity Model for Long-Term Investors (USSLOW). We found it had high exposures to the beta, growth and profitability factors while it had low exposures to dividend yield, value and long-term reversal.
Combined ROC Portfolio’s Active Exposure to Style Factors
Data as of Dec. 31, 2019
While we can’t know how companies with high exposure to our ROC factor will perform in the future, we believe that the techniques and data sources we described can be utilized to capture emerging or long-term themes as the world adjusts to new realities created by the COVID-19 pandemic.
The authors thank George Bonne, Stuart Doole, Neeraj Kumar and Gaurav Trivedi for their contributions to this blog post.
¹ Kumar, N., Doole, S., Garg, K., Bhalodia, V., and Ghate, D. 2019. “Indexing Change: Understanding MSCI Thematic Indexes.” MSCI Research Insight.
² Word embeddings are language-modeling techniques in NLP where words or phrases from a text “corpus” (group of documents) are mapped to numerical vectors representing related and co-occurring words, or to vectors of linguistic contexts in which the words occur. We used word2vec, sense2vec and BERT embeddings.
³ Semantic role labeling is a technique in NLP that detects the predicate-argument structure of sentences by analyzing the semantic role of words. For a detailed description, see: “Semantic role labeling.” Wikipedia.
⁴ We used concept-exposure data from Yewno, which leverages the Yewno Knowledge Graph to extract information from various content sources, including news, company filings, conference-call transcripts and patent filings, to provide scores that quantify directional exposures from entities to concepts.
⁵ To limit outlier influence, we trimmed each raw ROC-factor exposure at 95% of the maximum value of that raw exposure. We then divided by the maximum value so that the normalized exposures resided within [0, 1]. For each method, we used data available through the end of 2019.
⁶ We also evaluated other weighting schemes (equal weight, market-cap weight and exposure weight) and obtained similar results.