Executive Summary
This PFS document outlines the vision of Meesho’s Search. Through this PFS exercise, we have focused on the need to shift from a query volume-driven approach to a more user-centric (long tail focus) model. We analysed the current search metrics, identified key problems and envisioned search through a new lens.
The key takeaways from this exercise are-
Vision: Make Meesho the go-to app for product searches and become a truly long tail, multi-modal search system that caters to users’ individual preferences.
Best-in-Class Metrics: Improve search aided awareness, search adoption, query market share, CTR and MRR while reducing time to first click and increasing conversion rates (Current long tail CTR is 2.1% whereas that for head is 2.8%; long tail MRR is 0.11 whereas that for head is 0.14).
Problems: Top relevance issues in Search include ineffective understanding of query and attributes (~26% sessions, ~3% NMV opp), inaccurate product type retrieval (~10% sessions, ~1.5% NMV opp), and vernacular queries (~1% sessions, ~0.5% NMV opp).
System architecture: A vision-backwards design is proposed to solve long tail query problems and to accommodate multi-modal inputs.
Align on the short term trade-offs: Since we are proposing to change the roadmap to solve the problems listed on long tail queries, we want to align on a 0.7% NMV/Vi trade-off for this cycle (reduce the goal of 1.5% this cycle to 0.8%). We plan to recover this trade-off by taking a goal of 2% NMV/Vi by Dec’25.
Objective
The objective of the document is to envision the future of search at Meesho. So far, our approach to solving for Meesho’s search has been driven by query volume. Our focus has been more on head and torso queries (~75% sessions) and less on tail, which has limited our scope to immediate needs rather than long-term growth opportunities.
Opportunity: Torso conversion is only marginally better than tail conversion, while head conversion is almost twice that of torso and tail. This presents an opportunity of ~5% platform-level conversion through improvement in ~45% of sessions (long tail).
Scope of this doc-
This document focuses on understanding the current problems in the relevance of search results at Meesho, especially on long tail queries, the metrics we look at and the opportunity that exists. This document does not detail the implementation plans but focuses on vision, problems, challenges with the current system and high-level solutions.
Why envision now?-
Unstructured User Queries: We receive around 17M unique queries every day, of which 11M are tail queries. The number of unique tail queries has increased by 10% in the last 6 months. Even though we are a long tail e-commerce platform and know that our users express themselves in an unstructured way, we do not serve tail queries very well (current head CTR 2.9% vs long tail CTR 2.4%). Our major focus has been on improving head and torso queries; it now has to shift to tail queries.
Technological Advancements: Advancements in GenAI and new machine learning techniques give us the confidence to create systems that understand the query better and improve the relevance of results.
Search Vision
Vision- Make Meesho the go-to app for product searches and become a truly long tail multi-modal search system that caters to users’ individual preferences
By envisioning a forward-thinking search structure, we aim to:
Become the go-to platform for any product search for India: When users think of a product, their first instinct should be to open the Meesho app. {Aim for high Search Aided Awareness and high query market share}
Be a truly long tail search platform: Deliver best-in-class relevance using generative AI even when a query is absolutely new on the platform. {Aim for high CTR}
Deliver a truly inclusive and multi-modal search experience: Empower users to interact with the Meesho app through voice, image, text, and video-based search, making the journey seamless regardless of input preference. {Aim for high Search adoption}
Understand and serve the user intent: Surface relevant products that are highly personalised to the users’ preferences. {Aim for high MRR}
For this PFS, the focus is on relevance of search results. We plan to cover experience and personalisation related points in separate sessions.
Search Excellence Metrics
To understand how search performs currently, we went through the metrics and the numbers we trend at today. The key insight is that we lag significantly behind in relevance metrics for long tail compared to head, especially in CTR, MRR and no-click sessions.
Metric & Current value (Jan'25)
Problems and Challenges with current system
Problems-
To understand the problems in search, we examined the queries with the lowest CTR (our relevance indicator) and the worst (numerically highest) first click position. To find the directional opportunity, we compared their CTR, MRR and conversion (orders/session) with those of head queries, where we do relatively better today.
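As a rough illustration of how these relevance indicators can be derived from session logs, the sketch below computes CTR, MRR and the share of no-click sessions. The field names and the exact definitions are assumptions made for illustration (CTR as clicks over impressions pooled across sessions, MRR as the mean reciprocal rank of the first click with no-click sessions counted as 0); they may differ from the definitions used on our dashboards.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchSession:
    """One search session. All field names are hypothetical."""
    impressions: int                     # products shown in the results feed
    clicks: int                          # products clicked
    first_click_position: Optional[int]  # 1-based rank of first click, None if no click


def ctr(sessions: List[SearchSession]) -> float:
    """Clicks divided by impressions, pooled over all sessions (assumed definition)."""
    shown = sum(s.impressions for s in sessions)
    return sum(s.clicks for s in sessions) / shown if shown else 0.0


def mrr(sessions: List[SearchSession]) -> float:
    """Mean reciprocal rank of the first click; a no-click session contributes 0."""
    if not sessions:
        return 0.0
    total = sum(1.0 / s.first_click_position
                for s in sessions if s.first_click_position)
    return total / len(sessions)


def no_click_rate(sessions: List[SearchSession]) -> float:
    """Share of sessions that ended without any click."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s.first_click_position is None) / len(sessions)


# Toy data, not real Meesho numbers.
toy_sessions = [
    SearchSession(impressions=20, clicks=1, first_click_position=2),
    SearchSession(impressions=20, clicks=0, first_click_position=None),
    SearchSession(impressions=10, clicks=2, first_click_position=1),
]

print(ctr(toy_sessions), mrr(toy_sessions), no_click_rate(toy_sessions))
```

Computing the same three functions separately over head and long tail sessions gives the kind of side-by-side comparison used in this section.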
To understand the search architecture, please refer to the diagram. L0 is the retrieval layer, L1 is the intermediate ranker and L2 is the final ranker.
Challenge with the current system
Within the scope of this doc, we focus on the relevance problems.
Alignment points
Relevance Goal for 2025 (2% NMV/Vi with Relevance Only Improvements):
By Dec 2025, we intend to bring overall Search relevance performance closer to head query performance by solving the top 2 relevance problems (while prioritising long tail queries first).
CTR, overall: 2.8% against today’s 2.65% (~5.7% increase)
CTR, long tail: 2.7% against today’s 2.4% (12.5% increase)
MRR, overall: 0.135 against today’s 0.13 (~4% increase)
MRR, long tail: 0.125 against today’s 0.11 (13.6% increase)
This should lead to an overall platform NMV increase of 2%.
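The relative increases quoted above can be re-derived with a small calculation. This is only an arithmetic check, and it assumes that the percentage pairs are CTR goals and the decimal pairs are MRR goals; the dictionary keys are illustrative labels.

```python
def relative_increase_pct(current: float, target: float) -> float:
    """Relative change from current to target, in percent."""
    return (target - current) / current * 100.0


# (today's value, Dec 2025 goal) taken from the goals listed above.
goals = {
    "CTR overall": (2.65, 2.8),
    "CTR long tail": (2.4, 2.7),
    "MRR overall": (0.13, 0.135),
    "MRR long tail": (0.11, 0.125),
}

for name, (current, target) in goals.items():
    print(f"{name}: {relative_increase_pct(current, target):.1f}% increase")
```

The computed values agree with the stated increases to within rounding.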
Short term impact trade off (0.7% NMV/Vi to be pushed out from the current cycle)
Solving for long tail needs more infra investment and more iterations before the impact is realised, so we intend to immediately repurpose the pod’s bandwidth towards long tail. As a result, our iterations on relevance solves for head queries (NER for attribute relevance) will be pushed out: we will first build for long tail and then extend that solve to head queries.
Hence, 0.7% NMV gets pushed out of the current cycle. It will be realised as part of the overall 2% NMV goal for relevance solves across both long tail and head queries.
Vision backwards proposed design-
Guiding Principles-
Relevance Over Conversion: Prioritise relevant products to help users find what they need faster.
Scalable System Design: Make decisions with long-tail scalability in mind.
Search Experience: Analyse user interactions (e.g., voice search experience like long-pressing the voice icon in WA).
Search Relevance & Ranking: Leverage implicit/explicit signals (e.g., some users prefer branded products, which current algorithms overlook).
Transparent Communication: Inform users when a product isn’t available to avoid confusion (e.g., clarifying that laptops aren’t sold).
Proposed design-
We propose the following design, which incorporates:
Query Understanding: Leveraging LLM/SLM for query correction, translation, and category/attribute understanding, moving away from manually created rules.
Retrieval Layer: Implementing multi-vector retrieval, taxonomy attributes, image analysis, and supplier-enriched data to improve catalog understanding, with future enhancements like semantic retrieval and attribute-based boosting.
Relevance Filtering: Using LLM-based evaluators to filter irrelevant catalogs, ensuring precision-focused relevance.
Ranking: Ranking catalogs based on query, attributes, and user affinity for factors like price, rating, and quality, with future plans for enhanced ranking based on gender, region, and attribute preferences.
Post-Ranker Diversification: Introducing a diversity-based ranking to increase variety in the feed, with a planned new component for further improving variety.
Catalog enrichment: Enhancing catalog data through identification of inaccuracies, imputation of missing data, supplier-provided enrichments, and AI-driven metadata generation to improve retrieval and ranking effectiveness.
This design aims to enhance catalog retrieval, filtering, and ranking through AI-driven methods, better semantic understanding, and personalised relevance adjustments.
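To make the flow of the proposed stages concrete, here is a minimal sketch of how a query could pass through query understanding, retrieval, relevance filtering, ranking and post-ranker diversification. Everything in it is a toy stand-in and an assumption: the lookup tables replace the LLM/SLM, exact attribute matching replaces the LLM-based evaluator, rating replaces the ranking model, and diversification by supplier is only one possible notion of variety. None of the names reflect the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Catalog:
    catalog_id: str
    product_type: str
    attributes: Dict[str, str]
    rating: float
    supplier: str


@dataclass
class ParsedQuery:
    product_type: str
    attributes: Dict[str, str] = field(default_factory=dict)


# Toy lookup tables standing in for an LLM/SLM (hypothetical).
SPELLING = {"sari": "saree"}
COLOURS = {"red", "blue", "black"}


def understand_query(raw: str) -> ParsedQuery:
    """Query understanding: correct tokens, split attributes from product type."""
    product_type, attributes = "", {}
    for token in raw.lower().split():
        token = SPELLING.get(token, token)
        if token in COLOURS:
            attributes["colour"] = token
        else:
            product_type = token  # last non-attribute token wins in this toy parser
    return ParsedQuery(product_type, attributes)


def retrieve(query: ParsedQuery, index: List[Catalog]) -> List[Catalog]:
    """Retrieval: fetch catalogs of the requested product type."""
    return [c for c in index if c.product_type == query.product_type]


def filter_relevant(query: ParsedQuery, candidates: List[Catalog]) -> List[Catalog]:
    """Relevance filter: drop catalogs that contradict a requested attribute.
    A catalog with the attribute missing is kept."""
    return [c for c in candidates
            if all(c.attributes.get(k, v) == v for k, v in query.attributes.items())]


def rank(query: ParsedQuery, candidates: List[Catalog]) -> List[Catalog]:
    """Ranking: explicit attribute matches first, then rating."""
    def score(c: Catalog):
        matches = sum(1 for k, v in query.attributes.items()
                      if c.attributes.get(k) == v)
        return (matches, c.rating)
    return sorted(candidates, key=score, reverse=True)


def diversify(ranked: List[Catalog]) -> List[Catalog]:
    """Post-ranker pass: round-robin across suppliers, keeping rank order within each."""
    queues: Dict[str, List[Catalog]] = {}
    for c in ranked:
        queues.setdefault(c.supplier, []).append(c)
    result, pending = [], list(queues.values())
    while pending:
        for q in pending:
            result.append(q.pop(0))
        pending = [q for q in pending if q]
    return result


def search(raw: str, index: List[Catalog]) -> List[Catalog]:
    query = understand_query(raw)
    return diversify(rank(query, filter_relevant(query, retrieve(query, index))))


toy_index = [
    Catalog("c1", "saree", {"colour": "red"}, 4.5, "A"),
    Catalog("c2", "saree", {"colour": "red"}, 4.2, "A"),
    Catalog("c3", "saree", {"colour": "blue"}, 4.9, "B"),
    Catalog("c4", "saree", {}, 4.0, "B"),
    Catalog("c5", "kurti", {"colour": "red"}, 4.8, "C"),
]
print([c.catalog_id for c in search("red sari", toy_index)])
```

The point of the sketch is the separation of stages: each one can be replaced independently (for example, swapping the toy parser for an LLM call) without changing the others. Catalog enrichment is not shown, since it runs on the catalog data rather than in the query path.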
Current exploit system design (Optional)-
Currently, we operate differently for head-torso vs tail queries. Head and torso results are pre-computed and ranked in real time. Tail results are both computed and ranked in real time. For tail queries, we do not have an L1 ranker or ES yet.
Note 2- Zonal and non-GST CGs have been left out here to reduce complexity of representation.
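The head-torso versus tail split described in this section can be sketched as a simple routing function. This is an illustration under stated assumptions, not the production code: the function and parameter names are hypothetical, and it assumes that head and torso queries pass through both the L1 and L2 rankers while tail queries go straight from live retrieval to L2.

```python
from typing import Callable, Dict, List, Tuple

Ranker = Callable[[str, List[str]], List[str]]


def serve_query(
    query: str,
    precomputed: Dict[str, List[str]],
    retrieve_live: Callable[[str], List[str]],
    l1_rank: Ranker,
    l2_rank: Ranker,
) -> Tuple[str, List[str]]:
    """Return (path taken, ranked catalog ids) for a query."""
    if query in precomputed:
        # Head/torso: candidates are pre-computed, ranking happens in real time.
        candidates = l1_rank(query, precomputed[query])
        return "head_torso", l2_rank(query, candidates)
    # Tail: candidates are computed in real time; no L1 ranker on this path yet.
    candidates = retrieve_live(query)
    return "tail", l2_rank(query, candidates)


# Toy stand-ins for the real components (all hypothetical).
toy_precomputed = {"saree": ["c3", "c1", "c2"]}
toy_retrieve = lambda q: ["t2", "t1"]
toy_l1 = lambda q, ids: ids[:2]      # intermediate ranker trims the candidate set
toy_l2 = lambda q, ids: sorted(ids)  # final ranker orders what is left

print(serve_query("saree", toy_precomputed, toy_retrieve, toy_l1, toy_l2))
print(serve_query("red cotton saree with mirror work",
                  toy_precomputed, toy_retrieve, toy_l1, toy_l2))
```

Writing the two paths side by side makes the gap visible: the tail path has one ranking stage fewer than the head-torso path.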
Next steps
Post alignment, we plan to conduct-
WS: Conduct working sessions on individual solutions, starting with the solution for the attribute-not-considered problem (~3% total NMV opp).
In-depth outside-in: Build upon outside-in solutions and architecture understanding through externally available knowledge repositories and expert calls.
Long term investment: Follow the proposed solutions for the respective streams and get back with tentative timelines.