Since Zomato’s inception, our users have played an important role in helping people make informed decisions on what and where to eat.
We now house reviews and photos for about a million listings on our platform and, as a team, are constantly working to make this content easy for everyone to create and consume.
To know more about how we made reviewing seamless for everyone, give our post on Reviews 2.0 (Part 1) a read. Overall, the response has been overwhelming and we are witnessing a 35% – 40% increase in reviews being posted month on month.
Isn’t that amazing?
Below are some essential details of the product we wanted to build –
1. For the reviewer – We wanted tags to be showcased to a reviewer at the time of review creation
2. For the reader – We wanted to showcase what each restaurant is known for, in a section we call “Read what people are talking about”
But before we jump into the ‘what’ and ‘how’ of our implementation, let’s have a look at the overall architecture of Reviews 2.0.
Architecture of Reviews 2.0
Our first step was to aggregate what you could call a “knowledge bank”: a corpus of all the tags one could use in the dining and restaurant industry. From “comfortable seating” to “prompt service”, our corpus incorporates a wide variety of tags. These are shown to reviewers when they wish to point out what they liked or didn’t like about a particular restaurant. We named it the Z-Tag Corpus.
At the same time, we also wanted to design a system (an engine to process our reviews, to be specific) that provides insights into each restaurant by analyzing its reviews against the same corpus.
1. How did we create the Z-Tag Corpus?
The implementation aimed to summarize millions of reviews into broader tags, reflecting the mood and sentiment of our users.
We used good ol’ Python for all our NLP and data aggregation needs. The following libraries were at our disposal and proved quite useful in tag creation –
- spaCy + NLTK – At Zomato’s scale, we wanted a library powerful enough to process millions of reviews, with minimal dependencies and the speed of Cython. We went ahead with spaCy for stopword removal, POS tagging, tokenization, lemmatization, dependency parsing and NER. spaCy’s pipeline flow allowed us to focus on experiments whilst it took care of performance for us. SentiWordNet and the VADER sentiment analyzer from NLTK gave us simple yet effective sentiment analysers.
Phrases such as “awesome food”, “incredible service” and “pathetic staff” were clearly classified as positive or negative. NLTK’s built-in analysers therefore served as secondary, or fallback, classifiers.
- Jellyfish, FuzzyWuzzy and difflib – These were used to create a map of correctly spelt words and their likely misspellings. We also used them to detect highly similar strings.
- Gensim – We used Gensim for topic modeling, and its phraser module came in handy for phrase (collocation) extraction. We extracted bigram and trigram collocations, which helped us capture the adjectives attached to an entity. Collocations such as “courteous staff” and “quick service” carry meaning that the individual words alone do not.
- FastText – A word is of little use to us unless we know the category it belongs to. Does “live music” represent ambience? Does “kid friendly” represent service? We used FastText to classify entities into various categories. This gave us more control over showcasing every aspect of a restaurant to our users.
- Scikit-learn – It was our go-to library for modelling data, classification, clustering, etc. Its wide range of algorithms made feature engineering straightforward. Among other use cases, we used it to build a classifier that measures and classifies the contextual sentiment of the extracted key topics.
Remarks like “courteous staff” or “mouth watering food” are positive sentiments, while “pathetic service” is negative.
However, mentions like “long waiting time..” or “..the food portions were less..” are neutral in a general sense, but become contextually positive or negative in the restaurant and dining domain.
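The misspelling map mentioned in the list above can be illustrated with a minimal sketch. It uses only the standard library's difflib (not Jellyfish or FuzzyWuzzy), and the function name `build_spelling_map`, the sample vocabulary and the 0.8 similarity cutoff are assumptions made for illustration, not details from our pipeline.

```python
import difflib


def build_spelling_map(vocabulary, observed_words, cutoff=0.8):
    """Map each out-of-vocabulary word to its closest correctly spelt word.

    `cutoff` is an assumed similarity threshold (0..1); words with no
    sufficiently close match are left out of the map.
    """
    vocab = sorted(set(vocabulary))
    known = set(vocab)
    corrections = {}
    for word in observed_words:
        if word in known:
            continue  # already spelt correctly
        matches = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        if matches:
            corrections[word] = matches[0]
    return corrections


# Hypothetical sample data
vocabulary = ["biryani", "ambience", "service", "delicious"]
observed = ["biriyani", "ambiense", "sevice", "delicous", "parking", "service"]

spelling_map = build_spelling_map(vocabulary, observed)
for wrong, right in sorted(spelling_map.items()):
    print(f"{wrong} -> {right}")
```

Here “parking” is left unmapped because nothing in the vocabulary is similar enough, which is the behaviour one would want before merging tags.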
2. How did we obtain relevant content for each restaurant?
In a restaurant review, users share their experience about several aspects of the visit — food, ambience, service, etc. As the number of reviews is not limited and each one mentions a different viewpoint, grasping the overall sense of these viewpoints from hundreds of reviews is cumbersome and time-consuming. We wanted to devise a way to make this decision making process faster.
To answer this, we designed the section — “Read what people are talking about”
The approach we followed to get this right –
We mine opinions from reviews using an approach inspired by ABSA (Aspect Based Sentiment Analysis), which predicts the sentiment attached to each aspect extracted from a text.
Consider this sentence, for example: “The Chicken Whopper at Burger King was amazing but the service was slow”. Although the overall sentiment is mixed, the sentence clearly mentions two different aspects, “Chicken Whopper” and “service”, with positive and negative connotations respectively.
In a syntactic grammar-based approach, a set of grammar rules is applied to the dataset to extract aspects. A syntactic grammar is defined with a clause and a corresponding chunking rule; for example, the VBG_DESCRIBING_NN_VV clause defines the following syntactic pattern:
This clause chunks the sentence when a verb (VB) describes the opinion on a target. For example, in the sentence “The place was awesome,” the verb phrase “was awesome” describes the opinion on the target “place”.
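To make the idea of a chunking rule concrete, here is a minimal Python sketch. It does not reproduce the VBG_DESCRIBING_NN_VV clause; it uses a simplified, hypothetical rule (a noun, later a verb, later an adjective) over tokens that have been POS-tagged by hand with Penn Treebank tags. The function name `extract_aspect_opinions` and the sample sentence are assumptions for illustration; in practice the tags would come from a tagger such as spaCy's.

```python
def extract_aspect_opinions(tagged_tokens):
    """Return (aspect, opinion) pairs matching the hypothetical pattern
    NOUN ... VERB ... ADJECTIVE, scanning left to right."""
    pairs = []
    aspect = None       # most recent noun seen
    seen_verb = False   # whether a verb followed that noun
    for word, tag in tagged_tokens:
        if tag.startswith("NN"):
            aspect, seen_verb = word, False
        elif tag.startswith("VB") and aspect is not None:
            seen_verb = True
        elif tag.startswith("JJ") and aspect is not None and seen_verb:
            pairs.append((aspect, word))
            aspect, seen_verb = None, False  # reset for the next clause
    return pairs


# Hand-tagged sample (assumed tags, Penn Treebank style)
sentence = [
    ("The", "DT"), ("place", "NN"), ("was", "VBD"), ("awesome", "JJ"),
    ("but", "CC"),
    ("the", "DT"), ("service", "NN"), ("was", "VBD"), ("slow", "JJ"),
]

pairs = extract_aspect_opinions(sentence)
print(pairs)  # [('place', 'awesome'), ('service', 'slow')]
```

Each pair would then be handed to a sentiment step to decide whether the opinion word is positive or negative for that aspect.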
How ABSA works –
When a syntactic rule produces the expected chunks: The snippet above shows chunking by the syntactic grammar clause VBG_DESCRIBING_NN_VV. A relation extractor then processes the chunked list of trees to find the relationships between entities in the sentence. Though the syntactic approach is effective at parsing, it often suffers from noisy extraction.
As more rules are added to increase coverage, they begin to overlap, which results in noisy extraction. The snippet below shows the same rule interfering with a different sentence. Since the chunked part in the diagram below contains no aspect opinion, the extraction is noisy.
When a syntactic rule produces unexpected chunks: This syntactic rule was not expected to parse this sentence. It not only produces an incorrect extraction but also blocks the other syntactic rule, which would have produced a precise extraction.
To address this problem, we proposed a hybrid of rule-based and machine learning models. The rule-based model extracts the aspects and their opinion words, while the machine learning model learns how effective each rule is on different sentence structures for a given corpus.
For training the model, a dataset is prepared with the sentence and the aspect polarity extracted by each rule. A multi-label classifier is trained for syntactic rule prediction, followed by relation extraction. This classifier allows us to select the suitable syntactic rule for parsing as the first step, and reduces noisy extraction from the other, ineffective rules.
The selection process is fully automated owing to our multi-label classifier.
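As a rough illustration of how training targets for such a rule-selection classifier might be prepared, the sketch below applies two hypothetical rules to hand-tagged sentences and records, per sentence, which rules extract correctly. The rules, the rule names, the sample sentences and the definition of "effective" (the rule extracts something and everything it extracts is in the gold annotation) are all assumptions; the sketch also tracks aspect and opinion pairs rather than polarity, to keep it short.

```python
def rule_noun_verb_adj(tagged):
    """Hypothetical rule: noun, later a verb, later an adjective."""
    pairs, aspect, seen_verb = [], None, False
    for word, tag in tagged:
        if tag.startswith("NN"):
            aspect, seen_verb = word, False
        elif tag.startswith("VB") and aspect is not None:
            seen_verb = True
        elif tag.startswith("JJ") and aspect is not None and seen_verb:
            pairs.append((aspect, word))
            aspect, seen_verb = None, False
    return pairs


def rule_adj_noun(tagged):
    """Hypothetical rule: adjective immediately followed by a noun."""
    return [
        (next_word, word)
        for (word, tag), (next_word, next_tag) in zip(tagged, tagged[1:])
        if tag.startswith("JJ") and next_tag.startswith("NN")
    ]


RULES = [("NOUN_VERB_ADJ", rule_noun_verb_adj), ("ADJ_NOUN", rule_adj_noun)]


def rule_targets(tagged, gold_pairs):
    """One 0/1 target per rule: 1 if the rule extracts at least one pair
    and every pair it extracts appears in the gold annotation."""
    gold = set(gold_pairs)
    targets = []
    for _name, rule in RULES:
        extracted = set(rule(tagged))
        targets.append(int(bool(extracted) and extracted <= gold))
    return targets


# Hand-tagged sentences with assumed gold (aspect, opinion) annotations
examples = [
    ([("The", "DT"), ("place", "NN"), ("was", "VBD"), ("awesome", "JJ")],
     [("place", "awesome")]),
    ([("They", "PRP"), ("serve", "VBP"), ("delicious", "JJ"), ("biryani", "NN")],
     [("biryani", "delicious")]),
    ([("The", "DT"), ("staff", "NN"), ("was", "VBD"), ("busy", "JJ"),
      ("last", "JJ"), ("night", "NN")],
     [("staff", "busy")]),
]

dataset = [
    (" ".join(word for word, _ in tagged), rule_targets(tagged, gold))
    for tagged, gold in examples
]
for sentence, targets in dataset:
    print(sentence, targets)
```

In the third sentence the ADJ_NOUN rule fires on “last night”, which carries no opinion, so its target is 0; that is the kind of noisy extraction the classifier is meant to learn to avoid. Targets of this shape could be fed to any multi-label classifier, for example scikit-learn's OneVsRestClassifier; the features and model actually used are not covered here.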
Challenges we faced
For an extracted entity to make it into the final product, it must adhere to certain guidelines and business requirements so that it serves a varied audience –
• Quality entity extraction
• Contextual diversity
• Personalization for maximum utility
Bringing such a model to production comes with key challenges –
- Frequent Words – Some common words are widely used in reviews. Although useful, they are not helpful if they appear in every highlight presented. For example, the sentence “The food is amazing at Bellagio.” describes food, but it does not fit our presentation form. A preferred entity is more specific, such as ‘Kung Pao Soup’ in “Kung Pao Soup served was mouth watering”. It is explicitly described and is a very specific dish, which qualifies it as an entity for our use case.
We computed the frequency distribution of the words in our own corpus and used this frequency ranking to select the parent topic from among the secondary-ranked items.
- Entity Diversity – Showing multiple variations of a single dish (say Pizza) leads to a lack of diversity. We therefore select entities based on context to ensure variety: the extracted and classified topics are grouped by underlying intent, and a representative topic is chosen from each group to convey high-level information.
- Time Decay Factor – By using an exponential time decay function, which gives more weight to trending topics, we ensure that our highlights are not dominated by historical data.
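The time decay idea from the list above can be sketched as follows. The exact function and parameters are not given here, so this assumes a simple half-life form in which a mention loses half its weight every 30 days; `HALF_LIFE_DAYS`, `decay_weight`, `rank_topics` and the sample mentions are all hypothetical.

```python
from collections import defaultdict

HALF_LIFE_DAYS = 30.0  # assumed value, for illustration only


def decay_weight(age_days, half_life_days=HALF_LIFE_DAYS):
    """Exponential decay: weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def rank_topics(mentions, half_life_days=HALF_LIFE_DAYS):
    """Sum decayed weights per topic and sort topics by score, highest first.

    `mentions` is an iterable of (topic, age_in_days) pairs.
    """
    scores = defaultdict(float)
    for topic, age_days in mentions:
        scores[topic] += decay_weight(age_days, half_life_days)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical mentions: an old favourite versus a recently trending dish
mentions = [
    ("butter chicken", 400), ("butter chicken", 380), ("butter chicken", 365),
    ("cheese nachos", 3), ("cheese nachos", 10),
]

ranked = rank_topics(mentions)
for topic, score in ranked:
    print(f"{topic}: {score:.4f}")
```

Although “butter chicken” has more mentions overall, they are over a year old, so the two recent mentions of “cheese nachos” rank it first.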
Our work on Reviews 2.0 opens doors for various relation extraction methods to be incorporated in order to minimize the noise. An implicit aspect is one where the opinionated aspect is indicated only indirectly. In the sentence “The place is quite expensive”, there is no explicit mention of the aspect price, but “expensive” indirectly points to it. Currently, our system works only with explicit aspects. An immediate extension would be to incorporate implicit aspects as well.
More efficient syntactic rules and relation extractors can also be added to enhance the process further. Post launch, as a final step, we deployed web apps on a new subdomain behind an ELB and automated the entire production process using Ansible.
Have a look at our end product; it will surely blow your mind!
“The chicken biryani served here is awesome. Really loved the vibrant decor. Their cheese nachos are worth dying for. I will definitely visit this place again.”