The central task of information retrieval (IR) in ecommerce is to find products that satisfy users’ needs. A concise product description might not cover every synonym for the product across all dialects of a language. These synonyms are essential for delivering the right search results and increasing the effectiveness of ecommerce site search.
Users typing a query might not be aware of the domain-specific terminology, so the words in the query might not match the words in the product description. The chances of this are high, and the result is that shoppers cannot find products that are in the catalog, simply because the site search system has poor synonym-matching capabilities.
Say Bob wants to gift Alice a glitter dress for Christmas. He visits an ecommerce site and types the query “glitter dress.” However, if all the glitter dresses are described as sequin dresses (sequin being the formal fashion word for glitter), the ecommerce site cannot fetch those results because nothing matches. So Bob leaves the site unsatisfied, assuming that the glitter dress he wants to gift is unavailable. It is a loss both for the ecommerce site, which loses a valued customer, and for Bob, who is unable to give Alice the gift.
This is a typical case of terminology mismatch between the product description and the customer’s query, and it happens a lot. Dialects are another source of mismatch, and languages like English have many of them. “Mobile phone” and “cell phone” mean the same thing but belong to different dialects (British: mobile phone; American: cell phone). The product description might not cover all of these cases.
Now, consider that a word like “apple” changes its meaning based on the domain. In the technology domain, Apple refers to a technology company, while in the food industry an apple is a fruit. Cases like these show why the synonyms generated must be domain-specific.
To overcome such scenarios, we use query expansion (QE), a process in information retrieval that consists of selecting and adding terms to the user’s query to minimize query-document mismatch and improve retrieval performance. For QE, we need to identify domain-specific synonyms.
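As a minimal sketch of what query expansion can look like, the snippet below adds synonym variants to a query. The synonym table `FASHION_SYNONYMS` and the function `expand_query` are hypothetical names chosen for illustration, not part of any Unbxd API, and a production system would handle phrases, weighting, and ranking as well.

```python
from typing import Dict, List, Set

# Hypothetical domain-specific synonym table; the entries are illustrative only.
FASHION_SYNONYMS: Dict[str, Set[str]] = {
    "glitter": {"sequin"},
    "sequin": {"glitter"},
}


def expand_query(query: str, synonyms: Dict[str, Set[str]]) -> List[str]:
    """Return the normalized query plus variants with one term swapped for a synonym."""
    tokens = query.lower().split()
    variants = [" ".join(tokens)]
    for i, token in enumerate(tokens):
        # Sort the synonyms so the output order is deterministic.
        for alt in sorted(synonyms.get(token, ())):
            candidate = " ".join(tokens[:i] + [alt] + tokens[i + 1:])
            if candidate not in variants:
                variants.append(candidate)
    return variants


print(expand_query("glitter dress", FASHION_SYNONYMS))
# → ['glitter dress', 'sequin dress']
```

With this expansion, Bob’s “glitter dress” query would also be matched against products described as sequin dresses.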
At Unbxd, we have broadly classified the synonym generation process into 3 categories:
Manual curation is a straightforward process. A skilled linguist, or a community of linguists, manually adds to the existing set of domain-specific synonyms. The quality of synonyms curated this way is quite high, but the major drawback is that the process is resource- and time-intensive.
Vast public knowledge is available for languages like English. For synonyms, we can leverage freely available lexical databases such as WordNet and ConceptNet. The main issue with this approach is that the synonyms available are generic (i.e., not domain-specific). Here at Unbxd, we filter for domain-specific synonyms using filtering algorithms based on clickstream data. Another major drawback is that new internet slang emerges every day, and keeping these lexical databases up to date takes a lot of work.
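The filtering algorithms themselves are not described here, so the following is only one plausible heuristic, sketched under stated assumptions: keep a generic synonym candidate only when shoppers who search for either term click on overlapping products. The names `jaccard` and `filter_candidates`, the `min_overlap` threshold, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets of clicked product IDs."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_candidates(
    candidates: List[Tuple[str, str]],
    clicks: Dict[str, Set[str]],
    min_overlap: float = 0.3,  # assumed threshold; would need tuning per catalog
) -> List[Tuple[str, str]]:
    """Keep candidate pairs whose clicked-product sets overlap enough."""
    kept = []
    for term, synonym in candidates:
        overlap = jaccard(clicks.get(term, set()), clicks.get(synonym, set()))
        if overlap >= min_overlap:
            kept.append((term, synonym))
    return kept


# Illustrative candidates, as a generic lexical database might suggest them.
candidates = [("dress", "frock"), ("dress", "garnish")]

# Illustrative clickstream for a fashion store: query -> clicked product IDs.
clicks = {
    "dress": {"p1", "p2", "p3"},
    "frock": {"p1", "p2"},
}

print(filter_candidates(candidates, clicks))
# → [('dress', 'frock')]
```

In this toy fashion catalog, “garnish” is dropped because nobody who searches for it clicks on dresses, while “frock” survives.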
Given the high volume of clickstream data, mining synonyms from it is relatively cheap and yields high-quality results. For example, if Bob is familiar with fashion terminology, he would reformulate his query to “sequin dress” and retry when “glitter dress” returns nothing. We leverage those scenarios to mine the collective intelligence from users’ reformulated searches and generate high-quality synonyms. This approach is heavily based on query chain analysis.
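A minimal sketch of the idea behind query chain analysis, assuming each session is an ordered list of (query, had_click) events: count the cases where a query with no click is immediately followed by a different query that did get a click, and keep the pairs seen often enough. The session format, the function `mine_reformulations`, and the `min_support` threshold are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, bool]  # (query, had_click): assumed log format


def mine_reformulations(
    sessions: List[List[Event]], min_support: int = 2
) -> List[Tuple[str, str]]:
    """Return (failed_query, successful_reformulation) pairs seen at least min_support times."""
    counts: Counter = Counter()
    for session in sessions:
        # Walk consecutive query pairs within a single session.
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            if q1 != q2 and not clicked1 and clicked2:
                counts[(q1, q2)] += 1
    return sorted(pair for pair, n in counts.items() if n >= min_support)


# Illustrative sessions, not real data.
sessions = [
    [("glitter dress", False), ("sequin dress", True)],
    [("glitter dress", False), ("sequin dress", True)],
    [("glitter dress", False), ("red shoes", True)],
]

print(mine_reformulations(sessions))
# → [('glitter dress', 'sequin dress')]
```

The support threshold is what separates a genuine reformulation from a shopper who simply changed their mind: the one-off jump from “glitter dress” to “red shoes” is discarded.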
Using the approaches discussed, we can generate and test high-quality, reliable synonyms in various domains such as auto parts, technology, jewelry, and fashion. For example, in the auto parts domain: “o2 compressor” and “oxygen compressor”; in an online fashion store: “sequin” and “glitter.”
In this way, we ensure that no online shopper leaves the ecommerce site unhappy and unsatisfied, and that every shopper finds what they came looking for. Ecommerce sites can’t afford to leave money on the table just because their search couldn’t understand what the shopper meant by the search query.
Book a demo if you have been a victim of poor synonym-matching capabilities in your search solution, and we'll walk you through all you need to know!