Measuring Firms’ Remote-Workforce Abilities
Blog post
July 14, 2020
- The COVID-19 pandemic disrupted the operating models of many businesses and forced a shift to remote working, digitization and low-contact transactions and services. We call a company's ability to operate under these conditions its "remote-operation capability" (ROC).
- Using machine learning (ML) and natural language processing (NLP), we built a ROC factor. Companies with high exposure to our ROC factor outperformed the MSCI USA IMI by around 15 percentage points year to date through June 30, 2020.
- Our hypothetical "combined" ROC portfolio, built from our three other ROC portfolios, had high exposure to the beta, growth and profitability factors and low exposure to the dividend-yield, value and long-term-reversal factors.
Constructing the ROC Factor
Our first step was to make the theme concrete using a "topic modeling" approach, a research technique used in MSCI thematic indexes¹ and other products that leverage ML and NLP. We started with a set of "seed" words and phrases with strong, intuitive relations to the ROC theme (e.g., home working, remote work and telecommuting). We then used "word embedding" models² to expand the seed-word list into a larger "dictionary" of about 50 keywords and phrases. The exhibit accompanying this step shows the ROC keyword dictionary as a word cloud.
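As a rough illustration of the dictionary-expansion step, the sketch below grows a seed list by adding every candidate term whose embedding is close to at least one seed term. It is not the method used for the ROC factor: the three-dimensional vectors, the candidate terms, the `expand_seeds` helper and the 0.9 similarity threshold are all hypothetical, chosen only to keep the example small. Real embedding models such as word2vec or BERT produce much longer vectors learned from a large corpus.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "remote work": [0.9, 0.1, 0.0],
    "telecommuting": [0.8, 0.2, 0.1],
    "video conferencing": [0.7, 0.3, 0.1],
    "cloud collaboration": [0.6, 0.4, 0.0],
    "oil drilling": [0.0, 0.1, 0.9],
    "cruise line": [0.1, 0.0, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_seeds(seeds, embeddings, threshold=0.9):
    """Return the seeds plus every term close enough to at least one seed.

    The threshold is an assumed tuning parameter, not a value from the post.
    """
    dictionary = set(seeds)
    for term, vector in embeddings.items():
        if term in dictionary:
            continue
        best = max(cosine(vector, embeddings[seed]) for seed in seeds)
        if best >= threshold:
            dictionary.add(term)
    return sorted(dictionary)


roc_dictionary = expand_seeds(["remote work", "telecommuting"], TOY_EMBEDDINGS)
print(roc_dictionary)
```

In this toy setup the terms related to remote collaboration join the dictionary, while the unrelated ones stay out. In practice the expanded list would likely still need manual review before use.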

Next, we used this dictionary as the input to the three approaches to factor construction that we tested:
- Word count: We counted ROC keyword matches in the business section of a company's 10-K filing.
- Semantic search: We identified a company's products and services from its 10-K filing's business section using semantic-role-labeling³ techniques and then counted ROC keyword matches for those products and services.
- Concept exposure: We used a knowledge graph dataset⁴ that quantifies a company's exposures to high-level concepts, or themes, based on the co-occurrences of companies and those concepts; the centrality of the connections; and the links to similar concepts. Then we aggregated each company's exposure to concepts that contained any of our ROC keywords.
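The word-count approach, the simplest of the three, can be sketched as follows. This is a minimal illustration under stated assumptions, not the production method: the company names, the filing snippets and the helper names `count_keyword_matches` and `normalize_exposures` are hypothetical. The normalization follows one possible reading of the trimming described in the footnotes: cap each raw score at 95% of the maximum, then divide by the capped maximum so that scores fall within [0, 1].

```python
import re


def count_keyword_matches(text, keywords):
    """Count case-insensitive whole-phrase matches of the keywords in text."""
    total = 0
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


def normalize_exposures(raw, cap_fraction=0.95):
    """Cap raw scores at cap_fraction of the maximum, then rescale to [0, 1].

    This is an assumed interpretation of the trimming step, not a confirmed one.
    """
    peak = max(raw.values())
    if peak == 0:
        return {name: 0.0 for name in raw}
    cap = cap_fraction * peak
    return {name: min(score, cap) / cap for name, score in raw.items()}


# Hypothetical stand-ins for the business section of three 10-K filings.
filings = {
    "AlphaSoft": (
        "We sell video conferencing software. Our tools support remote work "
        "and telecommuting. Remote work demand grew."
    ),
    "BetaDrill": "We operate offshore drilling rigs.",
    "GammaRetail": "We run retail stores and support remote work for office staff.",
}
keywords = ["remote work", "telecommuting", "video conferencing"]

raw_exposure = {
    name: count_keyword_matches(text, keywords) for name, text in filings.items()
}
roc_exposure = normalize_exposures(raw_exposure)
print(raw_exposure)
print(roc_exposure)
```

The semantic-search and concept-exposure approaches would replace `count_keyword_matches` with richer inputs (extracted products and services, or knowledge-graph scores), while the normalization step could stay the same.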
Combined ROC Portfolio Outperformed the Individual Ones
We evaluated the performance of the four ROC portfolios (one for each construction approach, plus the combined portfolio) over the year-to-date period through June 30, 2020. As the exhibit below shows, the ROC portfolios performed similarly to one another and, in all cases, outperformed the MSCI USA Investable Market Index (IMI) benchmark. The combined ROC-factor portfolio had the largest outperformance.

Breaking Down the Combined ROC Portfolio's Outperformance
When we examined the combined ROC portfolio's active sector weights, using the MSCI USA IMI as the benchmark, we found that it overweighted the information-technology (IT) sector. This is not surprising: many IT companies offer solutions that enable remote operations, so it makes sense that their filings mention related terms more often than those of companies in other sectors.

Data as of Dec. 31, 2019
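For readers less familiar with the term, an active sector weight is the portfolio's weight in a sector minus the benchmark's weight in that sector. The sketch below shows the arithmetic on made-up numbers. The holdings, sector weights and helper names are hypothetical and do not represent the ROC portfolio or the MSCI USA IMI.

```python
from collections import defaultdict


def sector_weights(holdings):
    """Sum holding weights by sector; holdings is a list of (sector, weight)."""
    totals = defaultdict(float)
    for sector, weight in holdings:
        totals[sector] += weight
    return dict(totals)


def active_weights(portfolio, benchmark):
    """Portfolio sector weight minus benchmark sector weight, per sector."""
    port = sector_weights(portfolio)
    bench = sector_weights(benchmark)
    sectors = sorted(set(port) | set(bench))
    return {s: port.get(s, 0.0) - bench.get(s, 0.0) for s in sectors}


# Hypothetical weights, invented for illustration; each list sums to 1.
portfolio = [
    ("Information Technology", 0.30),
    ("Information Technology", 0.20),
    ("Health Care", 0.30),
    ("Energy", 0.20),
]
benchmark = [
    ("Information Technology", 0.25),
    ("Health Care", 0.25),
    ("Energy", 0.25),
    ("Financials", 0.25),
]

active = active_weights(portfolio, benchmark)
for sector, weight in active.items():
    print(f"{sector}: {weight:+.2f}")
```

A positive value is an overweight relative to the benchmark and a negative value an underweight. Because both sets of weights sum to one, the active weights sum to zero.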
We also examined the combined ROC portfolio's active exposures to style factors in MSCI's Barra US Total Market Equity Model for Long-Term Investors (USSLOW). We found it had high exposures to the beta, growth and profitability factors while it had low exposures to dividend yield, value and long-term reversal.

Data as of Dec. 31, 2019
While we can't know how companies with high exposure to our ROC factor will perform in the future, we believe that the techniques and data sources we described can be utilized to capture emerging or long-term themes as the world adjusts to new realities created by the COVID-19 pandemic.
The authors thank George Bonne, Stuart Doole, Neeraj Kumar and Gaurav Trivedi for their contributions to this blog post.
Further Reading
1. Kumar, N., Doole, S., Garg, K., Bhalodia, V., and Ghate, D. 2019. “Indexing Change: Understanding MSCI Thematic Indexes.” MSCI Research Insight.
2. Word embeddings are language-modeling techniques in NLP where words or phrases from a text “corpus” (group of documents) are mapped to numerical vectors representing related and co-occurring words, or to vectors of linguistic contexts in which the words occur. We used word2vec, sense2vec and BERT embeddings.
3. Semantic role labeling is a technique in NLP that detects the predicate-argument structure of sentences by analyzing the semantic role of words. For a detailed description, see: “Semantic role labeling.” Wikipedia.
4. We used concept-exposure data from Yewno, which leverages the Yewno Knowledge Graph to extract information from various content sources, including news, company filings, conference-call transcripts and patent filings, to provide scores that quantify directional exposures from entities to concepts.
5. To limit the influence of outliers, we trimmed each raw ROC-factor exposure at 95% of its maximum value. We normalized by dividing by the maximum value of each raw exposure, so that the normalized exposures lay within [0, 1]. For each method, we used data available through the end of 2019.
6. We also evaluated other weighting schemes (equal weight, market-cap weight and exposure weight) and obtained similar results.