Textual Factors: A Scalable, Interpretable, and Data-driven Approach to Analyzing Unstructured Information


FinTech firms leverage big, alternative, and unstructured data, in particular text, to originate loans, predict asset returns, improve customer service, and more. Moreover, interpretable textual information sheds light on key economic mechanisms and explanatory variables. We therefore develop a general framework for analyzing large-scale text data that captures complex linguistic structures while ensuring computational scalability and economic interpretability. We then demonstrate potential applications of our methodology to issues in finance and economics, such as forecasting asset returns or macroeconomic outcomes, valuing startups, and interpreting existing models. By combining the strengths of neural network language models, especially vector representations, with those of generative statistical modeling, our data-driven approach leverages high-performance computing and strikes a balance between model complexity and interpretability.