The era of digital transformation has enhanced business activities and processes, leading to an evolution of the value chain in every industry. The large amount of available data has opened up new scenarios and innovative approaches to collecting and analyzing information, turning so-called big data into smart data.

Today we will see how text mining can be used to analyze value creation in an emerging technology: blockchain.

Before starting: what does "value creation" mean? Value creation refers to the ability of companies to generate wealth or profit through their economic activity while providing new and original benefits to the customers who use their products or services. There are several approaches to achieving this goal:

  • increase the efficiency of processes to reduce costs and obtain higher profit;
  • leverage the unique attributes of products or services to increase prices and obtain higher profit;
  • leverage the price-quality ratio of products or services to increase prices or reduce costs and obtain higher profit.

Blockchain is a digital, open-source, peer-to-peer transaction system made of a list of records registered on different computers, forming a decentralized database managed by a self-governing community. The records are linked to each other using cryptography, and encryption and security procedures ensure the privacy, integrity, and validity of the data.
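To make "records linked using cryptography" concrete, here is a minimal single-machine sketch in which each record stores the SHA-256 hash of the previous one, so altering an earlier record breaks every later link. It deliberately leaves out everything that makes a real blockchain work across a network (peer-to-peer replication, consensus, digital signatures); the function names and the all-zeros "genesis" hash are illustrative assumptions, not any specific blockchain's format.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for "no previous record"

def record_hash(record: dict) -> str:
    """SHA-256 digest of a record, serialized with sorted keys for determinism."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_record(chain: list, data: str) -> None:
    """Append a record that stores the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    record = {"index": len(chain), "data": data, "prev_hash": prev_hash}
    record["hash"] = record_hash(record)  # hash computed before the "hash" key exists
    chain.append(record)

def is_valid(chain: list) -> bool:
    """Recompute every record's hash and check every link to the previous record."""
    for i, record in enumerate(chain):
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["hash"] != record_hash(body):
            return False  # record content was altered after hashing
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_HASH
        if record["prev_hash"] != expected_prev:
            return False  # link to the previous record is broken
    return True

chain = []
append_record(chain, "Alice pays Bob 5")
append_record(chain, "Bob pays Carol 2")
print(is_valid(chain))  # True
chain[0]["data"] = "Alice pays Bob 500"  # tamper with an earlier record
print(is_valid(chain))  # False
```

In a real network every participant holds a copy of the chain and can run a check like `is_valid` independently, which is what lets the community detect tampering without a central authority.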

Some applications of this technology are:

  • the distributed ledger for cryptocurrencies such as Bitcoin, where users in the network make transactions by trading electronic money;
  • supply chain management, from food to luxury goods, where records contain the transaction history of each product, tracking its origin, processing, and distribution.

Like every technological change, blockchain offers certain advantages but also hides some drawbacks, which lead to a gap between the promised value creation and the value actually generated. Indeed, blockchain aims to create trust-less, intermediary-free systems for the sharing economy thanks to decentralisation and security, but it underestimates the behavioural component of users' real-world interactions, which has so far impeded the success of entirely trust-free peer-to-peer systems [1].

This reflects a typical problem in engineering design: the gaps between the design of the offer (product or service), the user's expectations, and the user's subsequent experience, as well illustrated by the tree swing story depicted in Figure 1.

Figure 1 – The tree swing story first came out in the 1970s and explains how the interpretation and implementation of a requirement vary during the development of a tree swing, producing gaps between promise, expectation, and experience.

As anticipated, text mining and NLP techniques can be used to analyse the advantages and drawbacks of technological systems. This may help anticipate the disadvantages and failures of an emerging technology, mitigate the cognitive distortions of decision-makers, and detect performance shortcomings of a given device or system.

In this B4DS Notebook, we discuss a scientific study by our group (Filippo Chiarello, Paola Belingheri, Andrea Bonaccorsi, Gualtiero Fantoni, and Antonella Martini), published in the journal Technology Analysis & Strategic Management [2]. The authors examined a collection of 6,893 abstracts from the blockchain technological field, using sentiment analysis and topic modelling to identify the issues in this technological domain.


“Sentiment analysis is a type of data mining that measures the inclination of people’s opinions through natural language processing (NLP), computational linguistics and text analysis, which are used to extract and analyze subjective information from the Web – mostly social media and similar sources. The analyzed data quantifies the general public’s sentiments or reactions toward certain products, people or ideas and reveal the contextual polarity of the information.”

“Topic modeling is an unsupervised machine learning technique that’s capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents”.


After the usual text pre-processing steps, which we covered in earlier notebooks of this series, sentiment analysis was used to assign a polarity score (positive or negative) to each sentence of the collected abstracts.
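The idea of a sentence-level polarity score can be sketched with a simple lexicon-based scorer: split an abstract into sentences, add up the polarity of known words, and flip a word's sign when it follows a negator. The tiny lexicon below is a hypothetical stand-in written for this example; the sentiment tool and lexicon used in the study are not described here, and a real analysis would rely on a full sentiment lexicon or a trained model.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
POLARITY = {
    "secure": 1, "efficient": 1, "transparent": 1, "benefit": 1, "improve": 1,
    "cost": -1, "issue": -1, "problem": -1, "slow": -1, "risk": -1,
    "attack": -1, "lack": -1, "difficult": -1,
}
NEGATORS = {"not", "no", "never"}

def split_sentences(text: str) -> list:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sentence_polarity(sentence: str) -> int:
    """Sum word polarities, flipping the sign of a word preceded by a negator."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = POLARITY.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score

abstract = ("Blockchain makes transactions transparent and secure. "
            "However, mining has a high energy cost and scalability is a major issue. "
            "The protocol is not slow.")
for s in split_sentences(abstract):
    print(sentence_polarity(s), s)
```

Here the first sentence scores 2, the second -2, and the third 1 (the negator turns "slow" positive); only sentences with a negative score would move on to the problem analysis.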

The sentences with a negative polarity score were selected to inspect the problems of blockchain. Those sentences amounted to 218,618 tokens (single words)! It was therefore necessary to identify the relevant words that could point to an issue. In other words: how can we recognize, in such a large number of tokens, the meaningful ones that convey information about the problem a sentence describes? Filtering is an approach widely used in knowledge extraction from documents, and by using a specific lexicon and rules, "only" 64,051 final lemmas from 7,085 sentences were selected.
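The filtering stage can be sketched as a short pipeline: keep only negatively scored sentences, map tokens to lemmas, and retain only lemmas found in a domain lexicon. The lemma table, the lexicon and the sentence scores below are hypothetical placeholders; the study's actual lexicon and rules are not reproduced here, and a real pipeline would use a proper lemmatizer instead of a lookup table.

```python
import re

# Hypothetical resources for illustration only.
LEMMAS = {"issues": "issue", "problems": "problem", "costs": "cost", "nodes": "node"}
DOMAIN_TERMS = {"energy", "cost", "issue", "problem", "scalability",
                "consensus", "trust", "storage", "node", "mining"}

def tokenize(sentence):
    """Lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())

def filter_corpus(scored_sentences):
    """Keep negative sentences, lemmatize by lookup, and retain only lexicon terms."""
    kept = []
    for sentence, score in scored_sentences:
        if score >= 0:
            continue  # rule 1: discard neutral and positive sentences
        lemmas = [LEMMAS.get(tok, tok) for tok in tokenize(sentence)]  # rule 2: lookup lemmatization
        relevant = [lem for lem in lemmas if lem in DOMAIN_TERMS]      # rule 3: lexicon filter
        if relevant:
            kept.append(relevant)
    return kept

# Illustrative (sentence, polarity score) pairs.
scored = [
    ("Blockchain makes transactions transparent and secure.", 2),
    ("Mining has a high energy cost and scalability issues.", -2),
    ("Consensus problems slow down the nodes.", -3),
]
docs = filter_corpus(scored)
raw = sum(len(tokenize(s)) for s, score in scored if score < 0)
print(docs)
print(f"{raw} tokens in negative sentences -> {sum(len(d) for d in docs)} lemmas from {len(docs)} sentences")
```

On this toy input the 15 tokens of the two negative sentences shrink to 8 lemmas, the same kind of reduction (at a much smaller scale) as going from 218,618 tokens to 64,051 lemmas.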

At this point, the data were ready for the topic modelling!

Latent Dirichlet Allocation (LDA) was used to identify the relevant topics, and an evaluation by a panel of experts led to the choice of the appropriate number of topics (i.e., 9) and thus to the final list of topics representing blockchain problems.
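To show what LDA does with the filtered lemmas, here is a minimal sketch that fits LDA with collapsed Gibbs sampling: every token gets a topic, and each token's topic is repeatedly resampled in proportion to how much its document uses that topic and how much that topic uses the word. This is one standard way to fit LDA, not necessarily the implementation used in the study; the hyperparameters (`alpha`, `beta`, `iters`), the toy corpus and `k=2` are illustrative assumptions (the study settled on nine topics with expert input).

```python
import random
from collections import defaultdict

def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs: list of token lists; k: number of topics.
    Returns (assignments, doc_topic counts, topic_word counts).
    """
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})          # vocabulary size
    doc_topic = [[0] * k for _ in docs]                # n[d][t]: tokens of doc d in topic t
    topic_word = [defaultdict(int) for _ in range(k)]  # n[t][w]: word w assigned to topic t
    topic_total = [0] * k                              # n[t]: tokens assigned to topic t
    assign = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            t = rng.randrange(k)
            z_doc.append(t)
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
        assign.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # Unnormalized full conditional p(z = j | all other assignments).
                weights = [
                    (doc_topic[d][j] + alpha) * (topic_word[j][w] + beta) / (topic_total[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                assign[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return assign, doc_topic, topic_word

def top_words(topic_word, n=3):
    """Most frequent words per topic (ties broken alphabetically)."""
    return [
        [w for w in sorted(tw, key=lambda x: (-tw[x], x)) if tw[w] > 0][:n]
        for tw in topic_word
    ]

docs = [
    ["energy", "mining", "power", "energy"],
    ["power", "energy", "mining"],
    ["consensus", "trust", "node", "consensus"],
    ["trust", "node", "consensus"],
]
assign, doc_topic, topic_word = lda_gibbs(docs, k=2)
for t, words in enumerate(top_words(topic_word)):
    print(f"topic {t}: {words}")
```

LDA itself does not choose `k`; in practice several values are tried and the resulting word lists are judged for coherence, which is the role the expert panel played in the study.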

The map of the nine topics of blockchain problems is shown in Figure 2: the horizontal axis lists the topics, the vertical axis lists the keywords identified in the analyzed documents, and the size of each node is proportional to the keyword's number of occurrences in the document collection.
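The encoding of such a map can be sketched as a list of nodes, one per (topic, keyword) pair, with a size proportional to the keyword's occurrences in the collection. The topic-to-keyword mapping and the documents below are hypothetical toy data (the topic names are borrowed from the list that follows), and the drawing step is left to whatever plotting library one prefers: topic on the x axis, keyword on the y axis, `size` as the marker size.

```python
from collections import Counter

def map_nodes(topics, docs, max_size=100.0):
    """Return (topic, keyword, occurrences, size) tuples; size is proportional to occurrences."""
    occurrences = Counter(word for doc in docs for word in doc)
    nodes = [(topic, kw, occurrences[kw]) for topic, keywords in topics.items() for kw in keywords]
    largest = max((n for _, _, n in nodes), default=0) or 1  # avoid division by zero
    return [(topic, kw, n, max_size * n / largest) for topic, kw, n in nodes]

# Hypothetical toy input.
topics = {"Power": ["energy", "mining"], "Consensus": ["consensus", "node"]}
docs = [["energy", "mining", "energy"], ["consensus", "node", "energy"], ["consensus"]]
for node in map_nodes(topics, docs):
    print(node)
```

Scaling by the largest count keeps node sizes within a fixed range regardless of corpus size, while preserving the proportionality described for Figure 2.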

Figure 2 – Map of nine topics of blockchain problems. [2]

Those problems “can be interpreted as follows:

  1. Power: Blockchain has an environmental cost and indeed this topic is strongly related with topic 5.
  2. Storage: refers to the infrastructure itself, or how data is stored, transformed, and shared in […]. This creates issues to be solved in terms of trust and scalability of the systems.
  3. Design: The need to consider requirements coming from different disciplines (e.g. how to use sensors, how to certify transactions, how to scale the system).
  4. Communication: Problems related to data and meta-data exchange in the network are crucial to ensure scalability, integrity, and energy efficiency. This is particularly true for cryptocurrencies such as Ethereum and Bitcoin.
  5. Cost: The cost of the technology is still a major issue, but related problems also include
  6. Bitcoin: Bitcoin issues are discussed at length as it is the first and most notable blockchain application. Bitcoin’s main issues are related to consensus, trust, transparency and safe storage, especially in the cloud.
  7. Environment: Blockchain technology could be a game-changer for the environment if properly managed, mitigating its high energy consumption.
  8. Consensus: Consensus protocols are paramount since they ensure a common, unambiguous ordering of transactions and blocks and guarantee the integrity and consistency of the blockchain across geographically distributed nodes.
  9. Trust: The principles of encryption and distributed ledgering behind blockchain, are hard to understand and thus hard to trust for most potential users.” [2]

In conclusion, cutting-edge technologies take over the technological landscape thanks to their promised advantages. But as development progresses, problems and issues emerge, widening the gap between expectation and experience and distorting value creation. In this context, text mining and NLP techniques can help map these problems and identify potential areas of improvement.


Now the floor is yours! Which emerging technologies may hide drawbacks that could be detected using NLP tools? We look forward to your comments!

By Irene Spada and Nicola Melluso


[1] Hawlitschek, F., Notheisen, B., & Teubner, T. (2018). The limits of trust-free systems: A literature review on blockchain technology and trust in the sharing economy. Electronic Commerce Research and Applications, 29, 50-63.

[2] Chiarello, F., Belingheri, P., Bonaccorsi, A., Fantoni, G., & Martini, A. (2021). Value creation in emerging technologies through text mining: the case of blockchain. Technology Analysis & Strategic Management, 1-17.