Leverage NLP to Unlock ESG Insights


1. More than half of ESG metrics are qualitative and unstructured, consisting of text and complex tables that are hard to analyze.
2. ESG data is not standardized in the type or format of the data reported.
3. Natural Language Processing (NLP) and document comprehension technology can potentially be applied to support ESG data analysis by quantifying unstructured qualitative data.

Investors are increasingly aware of the significance of environmental, social, and governance (ESG) issues when making investment decisions: the percentage of investors taking ESG into account rose from 61% to 72% between 2019 and 2021. Institutions such as the Global Reporting Initiative (GRI) and the Task Force on Climate-related Financial Disclosures (TCFD) are creating guidelines and defining materiality so that ESG metrics can be incorporated into the investment process, a sign that ESG is shaping the future of capital markets. If ESG is to play that role, its analysis must be data-driven and evidence-based.

However, the absence of standardized ESG data and the lack of a robust relationship between granular ESG information and stock performance make sustainable investing challenging to implement. Unlike financial measures, which are standardized and quantitative, more than 50% of ESG disclosures are qualitative, made up of unstructured data such as text, tables, charts, and graphics. This unstructured content covers ESG policies, commitments, actions and processes, and objectives that cannot be readily quantified. Analyzing large volumes of unstructured ESG data can unlock new insights and help investors evaluate a company’s management capabilities holistically.

NLP combines computational linguistics with statistical, machine learning, and deep learning models, enabling computers to comprehend written or spoken language in a way similar to humans. Using supervised learning, an AI training method in which a model learns from labeled examples, we can develop NLP models that automatically classify information and score ESG disclosures. Such models can be scaled to cover the various types of ESG disclosures published by listed companies in real time, significantly reducing the time and effort required to analyze unstructured information.

Our NLP models classify each statement as “E”, “S”, or “G” and then score it. For example, we have used supervised deep learning to build models that classify statements extracted from annual and ESG reports into one of the three pillars and then predict their sentiment.
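Before any classification, report text has to be broken into individual statements. The following is a minimal sketch of one simple way to do this in Python, offered as an illustrative assumption rather than Wizpresso's actual pipeline. The helper `extract_statements` and its `min_words` threshold are hypothetical, and the regex-based sentence splitting would need refinement for real reports (abbreviations, headings, bullet points).

```python
import re

# Hypothetical helper: assumes a statement ends with ".", "!" or "?" followed by
# whitespace. Real reports (abbreviations, headings, bullets) need more care.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def extract_statements(text: str, min_words: int = 4) -> list[str]:
    """Split report text into sentence-like statements of at least min_words words."""
    # Collapse line breaks and repeated spaces left over from PDF extraction.
    flat = re.sub(r"\s+", " ", text).strip()
    statements = []
    for chunk in _SENTENCE_END.split(flat):
        # Drop short fragments such as page numbers or stray labels.
        if len(chunk.split()) >= min_words:
            statements.append(chunk)
    return statements


# Hypothetical report excerpt with a line break and a page-number fragment.
report = """Vehicle weight reduction will effectively reduce energy consumption
for all types of vehicles. Page 12. The Board held four meetings during
the year."""

for statement in extract_statements(report):
    print(statement)
```

Here the fragment "Page 12." falls below the word threshold and is discarded, while the two full sentences are kept with their line breaks removed.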

The NLP model classifies statements and scores them by sentiment

Using millions of labeled examples, we have trained a model that reads a statement and classifies it into one of the predefined categories.

A sentiment model, developed with an approach similar to that of the ESG model, is then applied to predict whether the statement describes something likely to have a positive or adverse impact on the company.
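To make the two-step idea concrete, here is a minimal sketch that trains the same kind of supervised classifier twice: once on pillar labels and once on sentiment labels. It deliberately swaps the deep learning models described above for a multinomial Naive Bayes classifier written in plain Python, and the tiny training sets are made up for illustration; a real model would need far more labeled data. The class `NaiveBayesTextClassifier` and the `tokenize` helper are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a deep learning model would use a learned tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesTextClassifier":
        self.class_counts = Counter(labels)      # documents per label (for priors)
        self.word_counts = defaultdict(Counter)  # token counts per label
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the log likelihood of each known token under this label.
            score = math.log(doc_count / n_docs)
            for word in tokenize(text):
                if word in self.vocab:  # tokens never seen in training are ignored
                    score += math.log(
                        (self.word_counts[label][word] + 1) / (self.total_words[label] + v)
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data for illustration only.
topic_model = NaiveBayesTextClassifier().fit(
    [
        "We cut carbon emissions and energy consumption at our plants",
        "Water recycling reduced waste sent to landfill",
        "Employee training hours and workplace safety improved",
        "We support community health programs and staff wellbeing",
        "The board has an independent audit committee",
        "Executive remuneration is reviewed by the board",
    ],
    ["Environment", "Environment", "Social", "Social", "Governance", "Governance"],
)
sentiment_model = NaiveBayesTextClassifier().fit(
    [
        "Lower energy consumption will cut costs",
        "Safety improved and injuries fell",
        "Fines for pollution breaches increased",
        "Emissions rose and the plant failed inspection",
    ],
    ["Positive", "Positive", "Negative", "Negative"],
)

statement = "Vehicle weight reduction will effectively reduce energy consumption for all types of vehicles."
print("Classification:", topic_model.predict(statement))  # Classification: Environment
print("Sentiment:", sentiment_model.predict(statement))   # Sentiment: Positive
```

Because both models share one class, only the training labels differ between them, which mirrors the idea that the sentiment model is built with the same approach as the ESG model.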


  • Statement: Vehicle weight reduction will effectively reduce energy consumption for all types of vehicles.
  • Classification: Environment
  • Sentiment: Positive
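Scoring disclosures means turning per-statement labels like these into a summary figure. One simple option, assumed here rather than taken from the article, is a net sentiment score per pillar: positive statements minus negative statements, divided by the number of statements in that pillar, giving a value between -1 and 1. The predictions below are hypothetical examples, and `pillar_scores` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical (statement, pillar, sentiment) predictions from the two models.
predictions = [
    ("Vehicle weight reduction will effectively reduce energy consumption for all types of vehicles.",
     "Environment", "Positive"),
    ("Emissions from the new plant exceeded permitted limits.", "Environment", "Negative"),
    ("Lost-time injuries fell for the third consecutive year.", "Social", "Positive"),
    ("The board added two independent directors.", "Governance", "Positive"),
]


def pillar_scores(preds):
    """Net sentiment per pillar: (positive - negative) / statements, in [-1, 1]."""
    counts = defaultdict(lambda: {"Positive": 0, "Negative": 0, "total": 0})
    for _statement, pillar, sentiment in preds:
        c = counts[pillar]
        c["total"] += 1
        # Labels other than Positive/Negative count toward the total only.
        if sentiment in ("Positive", "Negative"):
            c[sentiment] += 1
    return {p: (c["Positive"] - c["Negative"]) / c["total"] for p, c in counts.items()}


print(pillar_scores(predictions))
# {'Environment': 0.0, 'Social': 1.0, 'Governance': 1.0}
```

Dividing by the statement count keeps companies with long and short reports on the same scale; a production scoring scheme would likely also weight statements by materiality or model confidence.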

In addition, machine learning can export ESG data in a structured, machine-readable format.

The NLP model accurately extracts the table content into a structured format
The machine-readable format generated by NLP

Whilst it’s hard to copy and paste data tables from PDF documents, our machine learning model can recognize key data tables in a document, then extract and parse them into Excel or a structured database format. This methodology can be applied to the wide range of table formats commonly found in ESG reports.
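The parsing and export step can be sketched as follows, assuming the PDF's text layer has already been extracted and that table columns are separated by runs of two or more spaces. Locating the table on the page and coping with merged or empty cells are harder problems that this sketch does not attempt. The table contents are invented, and `parse_table` is a hypothetical helper; the output goes to CSV (which Excel opens) and to an in-memory SQLite table using only the Python standard library.

```python
import csv
import io
import re
import sqlite3

# Hypothetical text of one ESG data table (made-up figures) as it might come
# out of a PDF text layer: one row per line, columns separated by 2+ spaces.
raw_table = """
Indicator                 Unit         2020     2021
Scope 1 emissions         tCO2e        1,250    1,180
Water withdrawal          megalitres   340      325
Employee turnover rate    %            8.2      7.9
"""


def parse_table(text: str) -> list[list[str]]:
    """Split each non-empty line on runs of 2+ spaces; assumes no empty cells."""
    return [re.split(r"\s{2,}", line.strip()) for line in text.splitlines() if line.strip()]


rows = parse_table(raw_table)
header, body = rows[0], rows[1:]

# Export 1: CSV text, which Excel can open directly.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(body)
print(buffer.getvalue())

# Export 2: a SQLite table for database-style queries (thousands separators removed).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE esg (indicator TEXT, unit TEXT, "2020" REAL, "2021" REAL)')
conn.executemany(
    "INSERT INTO esg VALUES (?, ?, ?, ?)",
    [(r[0], r[1], float(r[2].replace(",", "")), float(r[3].replace(",", ""))) for r in body],
)
print(conn.execute('SELECT indicator, "2021" FROM esg WHERE unit = ?', ("tCO2e",)).fetchall())
# [('Scope 1 emissions', 1180.0)]
```

Splitting on two or more spaces keeps multi-word labels such as "Employee turnover rate" in a single cell, which is why single spaces inside cells are tolerated.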

With the help of NLP, investors will have the tools they need to incorporate ESG data into their existing investment processes and, potentially, to implement sustainable investing.

Wizpresso NLP-Powered ESG solutions

If you are interested in learning how our cutting-edge NLP models can comprehend ESG reports in real time, contact hello@wizpresso.com to learn more or request a demo.