How can generative AI improve textual data analysis in retail?

Data analysis is at the heart of the digitalization of retail. The development of LLMs (large language models) and generative AI opens up new perspectives for exploiting textual data.

In recent years, retailers have come to realize the strategic value of the data they and their customers generate. They understand that they can transform this data into useful, actionable information to optimize their processes and better serve customers.

In practice, however, the need for in-depth data analysis has long come up against operational obstacles that were difficult to overcome.

The development of new models not only facilitates the interpretation of textual data, but also makes it much more accessible.

Why is text data analysis a key issue for retailers?

In most retail verticals, competition has intensified. Traditional players have been joined by pure web players, broadening the scope of competition. Competition now takes place on several fronts (online, in-store) as strategies become omnichannel.

To remain competitive, you need to:

Keep in step with consumer needs and expectations
Understand market trends
Optimize decision-making and processes

Analysis of internal and external data is a response to these challenges. It facilitates strategic decision-making. It also helps retailers to better situate themselves in relation to their environment (customers, competitors, etc.).

In particular, text data provides customer knowledge and/or benchmark data. Players who make the subject their own will gain a competitive edge.

Here are a few examples of uses that illustrate the importance of text data analysis:

Understanding customer needs and preferences: textual data (customer reviews, product reviews, etc.) provides information on customers’ experiences and feelings about products and services. Analyzing this data helps retailers to better understand what motivates their customers, what they like and what they criticize, and to adjust their offer and strategy accordingly.
Detection of emerging trends: retailers can analyze online conversations, comments on social networks and customer reviews to quickly detect changes in consumer behavior and adapt to new trends.
E-reputation management: the analysis of customer reviews and social media comments enables retailers to monitor their online reputation and take action to improve it.
Product matching: you can also analyze textual data to identify identical or similar products in assortments, based on their names.
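As an illustration of the product matching use case, here is a minimal sketch that compares product names with Python's standard `difflib` module. The function names, the example catalogs and the 0.8 threshold are assumptions made for this example, not the method used by the authors; a production system would typically rely on richer signals (brand, attributes, embeddings).

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a product name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1 for two product names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_products(catalog_a, catalog_b, threshold=0.8):
    """Pair each name in catalog_a with its closest name in catalog_b.

    `threshold` is an arbitrary illustrative cut-off: pairs scoring
    below it are considered unmatched and dropped.
    catalog_b is assumed to be non-empty.
    """
    matches = []
    for a in catalog_a:
        best = max(catalog_b, key=lambda b: name_similarity(a, b))
        score = name_similarity(a, best)
        if score >= threshold:
            matches.append((a, best, round(score, 2)))
    return matches


if __name__ == "__main__":
    ours = ["Espresso Machine X200", "Blue Kettle"]
    theirs = ["espresso  machine x200", "Garden Hose 20m"]
    print(match_products(ours, theirs))
```

Character-level similarity is cheap and easy to explain, but it only captures surface resemblance, which is why it is presented here as a baseline rather than a complete solution.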

The main obstacles to text data analysis

The benefits of analyzing textual data are obvious. But several obstacles have long made the job difficult.

First and foremost, semantic analysis is particularly useful when applied to large datasets. If a retailer decides to analyze product reviews, for example, it will do so on all the reviews it has, or at least on a selection of strategic products. This means analyzing large volumes of text. Yet until the emergence of LLMs, analyzing such volumes was both tedious and much more costly.

The second obstacle lies in the ability to appreciate the context and semantics of words. If you use a basic model to summarize reviews, you can isolate keywords. But unless you go and find the reviews and check the context, you don’t know whether these words are used positively or negatively. The approach therefore loses its appeal. It should be noted that this is possible without LLMs: models such as BERT and, to a lesser extent, Doc2Vec are able to “understand” the context of a corpus.

The third obstacle is directly linked to the limitations of the models. Traditional models couldn’t cope with spelling mistakes. However, many reviews contain errors or have a syntax close to spoken language, which makes it even more difficult to assess the context or analyze the occurrence of words in the reviews. Here again, this is perfectly feasible without LLMs, either via pre-processing (Levenshtein distance), or by using models like Word2Vec, whose vocabulary often includes misspellings learned from the training corpus, or even models like BERT, which split words into subword tokens and are therefore less prone to this problem.
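The Levenshtein-based pre-processing mentioned above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `correct_word` helper, the example vocabulary and the `max_distance` value of 2 are assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn `a` into `b`."""
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def correct_word(word, vocabulary, max_distance=2):
    """Replace `word` with its closest vocabulary entry if it is close enough.

    `max_distance` is an illustrative threshold; words farther away
    than that are left unchanged. `vocabulary` is assumed non-empty.
    """
    best = min(vocabulary, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_distance else word


if __name__ == "__main__":
    vocabulary = ["delivery", "quality", "price"]
    print(levenshtein("kitten", "sitting"))     # 3
    print(correct_word("delivry", vocabulary))  # delivery
```

Comparing every word against the whole vocabulary is quadratic in practice, which illustrates why this kind of pre-processing becomes tedious on large review datasets.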

Finally, language handling was another obstacle. If many of the reviews are not written in French, how can they be interpreted? Most pre-trained models are monolingual (or effective only in English). To work around this problem, it is possible to use a language detection model to retrieve only reviews written in French. But how can you analyze multilingual reviews?

Most of these issues involve considerable technical complexity. It would therefore have been very time-consuming (if not impossible, in some cases) to produce customer review summaries with traditional machine learning while ensuring acceptable response times and consistent, clear results.

Why are LLMs a game-changer when it comes to interpreting textual data?

LLMs, such as the models behind ChatGPT, are “game-changers” in NLP (Natural Language Processing). They make the interpretation of textual data more accessible and greatly improve performance.

Indeed, LLMs remove almost all the obstacles we listed earlier.

An LLM such as the one behind ChatGPT is trained on a huge mass of data. This training makes it able to “understand” the context in which words are used, nuances, basic sarcasm, and so on. All these “subtleties” are difficult for traditional machine learning models to handle, or require human intervention and therefore prohibitive costs.

Language is no longer an obstacle for the new models, depending on the training data used: they can be trained on multilingual datasets, so it is no longer necessary to filter reviews by language before analysis. LLM-based solutions also understand misspelled words very well, and are able to resolve them according to context.

In addition to overcoming these traditional obstacles, LLMs offer other benefits.

It is possible to “configure” a conversational AI to perform specific tasks, such as summarizing customer reviews, by giving it a prompt. The prompt allows you to give the model very precise instructions about your expectations: language, response format, etc. You can also imagine asking it to perform a different task. The possible uses are therefore almost infinite.
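To make the idea of a prompt concrete, here is a minimal sketch of how such instructions could be assembled for a review-summarization task. The function name, the wording of the instructions and the parameters are hypothetical, and the sketch deliberately stops short of calling any model: the resulting string would be sent to whichever LLM you use.

```python
def build_summary_prompt(reviews, language="English", max_points=3):
    """Assemble a prompt asking an LLM to summarize customer reviews.

    `language` and `max_points` illustrate the kind of expectations
    (output language, response format) a prompt can carry.
    """
    numbered = "\n".join(
        f"{i}. {review.strip()}" for i, review in enumerate(reviews, start=1)
    )
    return (
        "You are an assistant that summarizes customer reviews for a retailer.\n"
        f"Answer in {language}.\n"
        f"Return at most {max_points} bullet points for strengths and "
        f"at most {max_points} bullet points for weaknesses.\n"
        "The reviews may contain spelling mistakes or be written in "
        "several languages.\n\n"
        f"Reviews:\n{numbered}"
    )


if __name__ == "__main__":
    reviews = ["Great kettle, boils fast ", "Livraison trop lente"]
    print(build_summary_prompt(reviews, language="French"))
```

Changing the task (for example, classifying reviews by topic instead of summarizing them) only requires changing the instruction text, not retraining a model.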

What are the constraints on using LLMs to exploit semantic data?

LLMs offer a number of advantages to help retailers make the most of text data.

But using them in this way also comes with a few constraints, such as:

Model weight: this can vary between 5 and 120 GB depending on the complexity of the model and its architecture. The real constraint is the need for RAM or VRAM, since the model has to be loaded into it; in itself, 120 GB is no longer a prohibitive size for hard disks.
Calculation time: with “conventional” hardware, calculation time explodes when using such heavy models.

From a technical point of view, there are ways of overcoming these constraints:

At software level, it is possible to “compress” the model without significantly reducing its performance. It is also possible to use libraries (compiled in C++ in our case) specially optimized for matrix calculations.
On the hardware side, high-end graphics processing units (GPUs), specially designed to handle a large number of tasks in parallel, speed up response times considerably.
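The link between model size, numerical precision and memory can be estimated with simple arithmetic. The sketch below assumes that the "compression" takes the form of quantization (storing each weight on fewer bits), which is one common approach; the source does not say which technique was used. The 7-billion-parameter model is a hypothetical example, and the estimate covers the weights only, not the extra memory needed at inference time.

```python
# Approximate storage cost of one parameter at common precisions.
BYTES_PER_PARAM = {
    "fp32": 4.0,  # 32-bit floating point
    "fp16": 2.0,  # 16-bit floating point
    "int8": 1.0,  # 8-bit quantization
    "int4": 0.5,  # 4-bit quantization
}


def weights_size_gb(n_params: float, precision: str) -> float:
    """Estimate the memory needed to hold the weights, in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


def fits_in_memory(n_params: float, precision: str, memory_gb: float) -> bool:
    """Check whether the weights alone fit in the given RAM or VRAM."""
    return weights_size_gb(n_params, precision) <= memory_gb


if __name__ == "__main__":
    n_params = 7e9  # hypothetical 7-billion-parameter model
    for precision in BYTES_PER_PARAM:
        print(precision, weights_size_gb(n_params, precision), "GB")
```

Under these assumptions, the same hypothetical model needs 14 GB in 16-bit precision but 3.5 GB in 4-bit precision, which is the difference between requiring a high-end GPU and fitting on more modest hardware.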

Freed from the main obstacles to its use, textual data analysis is set to become an integral part of retailers’ analytical solutions.

Indeed, the benefits are clear.

It helps industry players to better understand customer feelings and needs, and to compare customer opinions on different products. As a result, you can use it to obtain actionable information to optimize your assortment, your prices, etc.

The advent of LLMs and generative AI simplifies and amplifies this analysis. The new models make it both more accessible and more relevant. Like the growing integration of data and AI into decision-making processes, LLMs offer retailers new keys to understanding and acting.

Inventory management

Inventory optimization: AI analyzes historical and real-time data to determine ideal stock levels. This minimizes the costs associated with excess stock, while avoiding stock-outs.
Demand forecasting: Using advanced algorithms, AI can predict consumption trends and adjust stock levels accordingly. This is particularly useful for seasonal products or high-demand items.
Automated replenishment: AI can automatically trigger replenishment orders when stocks reach a certain threshold, ensuring continuous product availability.
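The threshold logic behind automated replenishment can be sketched with the classic reorder-point rule (expected demand during the supplier lead time, plus a safety stock). This is a simplified illustration: the function names and figures are assumptions, and in an AI-driven setup the `daily_demand` input would come from a forecasting model rather than a fixed number.

```python
def reorder_point(daily_demand: float, lead_time_days: float,
                  safety_stock: float = 0.0) -> float:
    """Stock level at or below which a replenishment order is triggered."""
    return daily_demand * lead_time_days + safety_stock


def replenishment_order(stock: float, daily_demand: float,
                        lead_time_days: float, safety_stock: float,
                        target_stock: float) -> float:
    """Return the quantity to order, or 0 if stock is above the threshold.

    `target_stock` is the level the retailer wants to return to
    (an illustrative "order-up-to" policy).
    """
    threshold = reorder_point(daily_demand, lead_time_days, safety_stock)
    if stock > threshold:
        return 0
    return max(target_stock - stock, 0)


if __name__ == "__main__":
    # Hypothetical item: 10 units sold per day, 5-day lead time, 20 units of safety stock.
    print(reorder_point(10, 5, 20))                 # 70
    print(replenishment_order(60, 10, 5, 20, 200))  # 140
    print(replenishment_order(100, 10, 5, 20, 200)) # 0
```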

Predictive analysis

Anticipating trends: by analyzing large quantities of data from a variety of sources (past sales, customer behavior, market trends), AI can identify emerging trends and help retailers make informed decisions.
Offer personalization: By better understanding customer preferences, retailers can tailor their offers and promotions in a more targeted way, boosting sales and customer satisfaction.
Price optimization: AI algorithms can analyze market data and adjust prices in real time to maximize margins while remaining competitive.

Many thanks to Siegfried Delannoy and Lucas Duleu, both data scientists, for their valuable time and their essential contribution to our discussions, which enriched the writing of this article.
