
Problem Context -

The sort and filter service has been set up manually by business teams. Because the config is incomplete, this has led to multiple issues, both in which labels and values are visible and in the relevance of results after a filter is applied. The problem can be split as follows -
Issues with filter label comprehensiveness (Print / Pattern and Occasion are not present as filters although they are available as attributes on the backend)
Not all relevant label values are mapped to the filter label - Because the config is static and manual, label values were never refreshed when taxonomy attributes were added
Label value definitions aren’t comprehensive - Label values are currently mapped to a static list of SSCAT attributes, which does not pick up changes in taxonomy and also has manual misses from creation time, where not all relevant SSCAT attributes were mapped to the label value

Impact -

Filters are currently used in ~5% of feed opens, far lower than competition at ~25%.
The automated build would lead to -
More relevant filters being visible across static filter screens and, eventually, HVF / IF once users interact with the new filters
A more relevant selection being displayed after filter application, with a comprehensive map across all SSCAT attributes
We expect the build to make relevant filters more usable, with a compounding flywheel effect that improves net filter adoption to ~12% (would be higher with UX updates, planned iteratively)

Current flow -


Filter label and their values are currently manually setup and then persisted in -
Label metadata -
Global Filter labels are persisted in with the following metadata -
Dynamic filter position - Global priority for dynamic filters, which show up in the row between sort and the static filter CTA
Name - Label name
All Filters position - Global priority of filter labels on static / all filter screen
Show in all filters - Whether to display the filter label on all filter screen
Valid - Validating / Invalidating a filter label
Type - TBC with tech
Show in dynamic filter - Whether to show the filters in dynamic filter position
Granular filter - TBC with tech
Label value metadata - Global Filter label values are persisted in with the following metadata -
Label ID - Parent filter label ID for the filter value
Value - Name of filter value
Position - label value position within the label
Image URL - HVF v1 UI image for the label value
Valid - Validating / Invalidating a filter label value
config - JSON with the mix of attributes mapping to a filter label value
allow_in_high_visibility - HVF enablement for the filter label value
hvf_image_url_v2 - Images used to power HVF v2 UI
Mid feed filter metadata - Subset of filter values classifying for mid feed filters are persisted in with the following metadata -
Image URL - Mid feed / IF label value image link
show_in_mid_feed_filter - Filter label value classified for mid feed filters
How are these filters served on the app -
Filters are mapped to catalogs in marvel and then persisted in the serving layer / Elastic
Selection of filters at a feed -
HVF - Pushed via heuristic model tables
CLP - gold.clp_hvf_final_v2
Search - gold.search_hvf_final_v2
Collections - gold.collections_hvf_final_v2
IF - Pushed via -
Tech tables publishing IFs
Powered by -
Variant 4 - LLM for top 500 CLPs
Variant 1/2/3 - Existing legacy models
Other variants to be planned by BEx BIs
Static filters -
Decided based on the catalogs qualifying for the feed and their label mappings
Dynamic filters -
Chosen based on the “Show in dynamic filter” flag in the global filter label table
Prioritised and positioned per feed basis -
Other global checks for filter -
Filter values are not displayed if the count of qualifying catalogs per attribute is < X for the real estate. X is configurable on ZK -
CLP - 50
Collection - 150
Text Search - 150
For-You - 150
Shop - 3
Visual Search - 2
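The threshold check above can be sketched as follows. This is a minimal illustration, not the serving code: the function name `visible_filter_values` and the shape of `catalog_counts` are assumptions, and the thresholds are hard-coded here rather than read from ZK.

```python
# Thresholds copied from the list above. In production these are
# configurable on ZK; reading them from ZK is not modelled here.
MIN_CATALOGS = {
    "CLP": 50,
    "Collection": 150,
    "Text Search": 150,
    "For-You": 150,
    "Shop": 3,
    "Visual Search": 2,
}


def visible_filter_values(real_estate, catalog_counts):
    """Return the filter values that clear the threshold for a real estate.

    catalog_counts is a hypothetical mapping of filter value -> number of
    qualifying catalogs on the current feed.
    """
    threshold = MIN_CATALOGS[real_estate]
    # A value is hidden when its count is below X, so it is kept at >= X.
    return [value for value, count in catalog_counts.items() if count >= threshold]
```

For example, with counts of `{"Cotton": 120, "Jute": 12}`, only Cotton would be shown on a CLP, while both values would be shown on Shop.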


Suggested flow -

DS -
Create a DS pipeline which -
Analyses all taxonomy attributes and clusters groups of similar attributes into common filter labels, and groups of similar attribute values into common label values
Updates the config for filter values to be representative of all attribute and value combinations present on the platform
Creates a mapper of new label ID to existing label ID and new label value ID to existing label value ID - To be used to update HVF / IF models
Save the new labels and label values in the exact same schema as the base tables, with the following specification -
Filter label table -
Dynamic filter position - Retain this for existing overlapping new labels, Keep null for incremental labels
Name - New / Existing Label name
All Filters position - Give the LLM the existing static filter screen positions of existing labels as context, then fetch a logical stack rank across all filter labels
Prompt should prioritise category agnostic filters like -
Category
Gender
Price
Rating
Discount
Color
Size
Gold
Mall
Smartcoin
Combo
Material / Fabric
Followed by relevant taxonomy / category specific labels
The context of existing positions can be passed over to LLM to establish global label priority
Show in all filters - Keep this true for all newly generated labels
Valid - Keep this valid for all newly generated labels
Type - TBC
Show in dynamic filter - Enable for overlapping labels where this was true earlier, Keep false for incremental labels
Granular filter - TBC
Filter values table -
Label ID - Parent filter label ID for the filter value. Map this to the new label IDs created
Value - Name of the new filter value ID
Position - Label value position within the label. For overlapping label values, retain the existing position
For new incremental values, create a priority map from the LLM - Other heuristic approaches can also be considered
Image URL - Copy this for the newly setup value IDs which map to older values and have images present
Valid - Keep this valid for all new additions
Config - The updated config for all newly setup filter label values
allow_in_high_visibility - HVF enablement for the filter label value, Enable across all newly setup values
hvf_image_url_v2 - Copy existing images for new label value IDs which map to older values
Mid feed filter metadata - Subset of filter values classifying for mid feed filters are persisted in with the following metadata -
Image URL - Copy for existing, keep null for rest
show_in_mid_feed_filter - Enable for all newly setup filter label values
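The filter label rules in this specification can be sketched as a small row builder. This is an illustration only: the function name `build_label_row` and the snake_case field names are assumptions rather than the actual table schema, and Type and Granular filter are left out because they are still TBC.

```python
def build_label_row(name, all_filters_position, existing=None):
    """Build one filter label row following the specification above.

    existing is the current row for an overlapping label, or None for an
    incremental (brand new) label. all_filters_position is assumed to come
    from the LLM stack rank.
    """
    overlapping = existing is not None
    return {
        "name": name,
        # Retain for overlapping labels, null for incremental labels.
        "dynamic_filter_position": (
            existing["dynamic_filter_position"] if overlapping else None
        ),
        "all_filters_position": all_filters_position,
        # True / valid for all newly generated labels.
        "show_in_all_filters": True,
        "valid": True,
        # Enabled only where it was already true on the existing label.
        "show_in_dynamic_filter": (
            bool(existing["show_in_dynamic_filter"]) if overlapping else False
        ),
    }
```

An incremental label such as Occasion would get a null dynamic filter position and `show_in_dynamic_filter` set to false, while an overlapping label keeps its existing values for both.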
DS - Tech automated sync -
Create a pipeline which tech reads to update / insert new labels, label values and their config in a weekly batch job
Do a one time backfill for the first time update
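One way to picture the weekly update / insert step is the sketch below. It assumes an in-memory table keyed by a `label_id` field; the real store, key and job scheduling are not specified in this doc, so all names here are hypothetical.

```python
def upsert_labels(table, ds_rows):
    """Merge DS-produced rows into the label table.

    table is a dict keyed by label ID standing in for the real store.
    Existing rows are updated field by field and unknown IDs are inserted.
    Returns the inserted and updated IDs so a batch run can be audited.
    """
    inserted, updated = [], []
    for row in ds_rows:
        label_id = row["label_id"]
        if label_id in table:
            # Fields sent by DS overwrite the old ones; the rest are kept.
            table[label_id] = {**table[label_id], **row}
            updated.append(label_id)
        else:
            table[label_id] = dict(row)
            inserted.append(label_id)
    return inserted, updated
```

Under this sketch, the one-time backfill would be the same call made once with the full DS output, and each weekly run would pass only that week's output.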
Tech changes -
Integrate the DS pipeline to support upsert of filter labels, label values and their metadata (Eg - Config)
Run the global bootstrap job to allocate the new label and label values across all live catalogs
For all DS-created labels and label value IDs, maintain a primary identifier key {Is_ds_setup:”true”}
Integrate the serving flow to be powered via A/B or ABACUS such that -
Users who are part of the A/B and have the feature key enabled see -
New filter labels and values on static filter screen
HVF
IF
Dynamic filter screen
The filter labels should correspond to the new config and users should see the catalogs mapping to those





DS solutioning -


Attribute Key & Value Normalisation
Solution 1 (Naive Approach)
image.png

Issues:
1) Some categories have up to ~80 keys; sending all of them in a single prompt and asking GPT-4 to cluster them will lead to precision and recall issues.
2) Sending pairwise (K1, K2) comparisons will lead to too many API requests, even after batch processing.
3) Similarly for values: there are too many values to cluster at one point, and this has to be done for each (SSCatId, Key) pair, which again will lead to too many API requests.
4) A lot of prompt engineering effort is required, and there is still no guarantee that the same (temperature, top_p) param config will work for all categories.

Solution 2 (Better Optimised Approach)

image.png
Advantages:
1) Not too many API calls for either Attribute Key Normalisation or Value Normalisation
2) The ANN index takes care of recall and GPT takes care of precision
3) The Disjoint Set data structure ensures one representative root for all Keys/Values that are similar.
4) The root is decided based on weightage, i.e. the number of offers decides which of K1 and K2 is treated as the root. This ensures that the root key is the best name to show directly to users.
5) Easy to add new similar Keys/Values into the existing data on the next run.
6) Regular scheduled runs will ensure that the Filter Keys and Values are always updated.
7) Even for Cross Category, the same data structure can be used to cluster similar Keys/Values and have a single representative for all of them.
8) Requests sent to GPT are parallelised: multiple API calls are made concurrently to reduce latency.
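Points 3 and 4 can be illustrated with a minimal disjoint set in which the key with more offers becomes the root. This is a sketch under assumptions, not the DS implementation: the class name is hypothetical, the pairs passed to `union` are assumed to be ones already confirmed as similar by the ANN + GPT steps, and a merged group is compared using the root key's own offer count.

```python
class WeightedDisjointSet:
    """Disjoint set where the key with the higher offer count becomes root."""

    def __init__(self, offer_counts):
        # offer_counts: key -> number of offers (illustrative input).
        self.parent = {key: key for key in offer_counts}
        self.weight = dict(offer_counts)

    def find(self, key):
        # Walk up to the root of the cluster.
        root = key
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited key straight at the root.
        while self.parent[key] != root:
            self.parent[key], key = root, self.parent[key]
        return root

    def union(self, a, b):
        """Merge the clusters of a and b and return the representative."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return root_a
        # The root with more offers wins; ties keep root_a.
        if self.weight[root_b] > self.weight[root_a]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        return root_a
```

With made-up offer counts where Fabric has more offers than Top Fabric and Kurta Fabric, merging the three leaves Fabric as the representative name, and a new key found to be similar on a later run is added with one more `union` call.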






Merging / creating new filter labels / values -


Men t-shirt (X) - Fabric (Cotton, Pima cotton)
Men shirt (Y) - Fabric (Silk, Jute)
Men suits (Z) - Top Fabric (Silk, Jute, Cotton, Jacquard)/ bottom fabric (Silk, Jute, Cotton, Jacquard)
Men kurta sets (AA) - Kurta Fabric (Silk, Jute, Cotton, Polyester)/ Bottom fabric (Silk, Jute, Cotton, Jacquard)

With the mix of these SSCATs, Intended label and values would be -
Fabric - Mix of Fabric, top fabric and kurta fabric
Values - Cotton - Config (tshirt.fabric = cotton OR shirt.fabric = cotton OR fabric = cotton OR kurtasets.kurtafabric = cotton) - All these values get mapped under one common label, i.e. Fabric
Bottom fabric = mensuits.bottom fabric OR kurta.bottom fabric

Old JSON -
Fabric - Cotton → {sscat_id in {X,Y,Z,AA}}, {fabric in cotton, top_fabric in cotton, kurtafabric in cotton}
New JSON -
Fabric - Cotton → (sscat_id in (X,Y) AND taxonomy.fabric = 'cotton') OR (Z.top_fabric = 'cotton') OR (AA.kurtafabric = 'cotton')
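The difference between the two shapes is that the new config ties each attribute to the SSCATs it belongs to, so a suit with a cotton bottom fabric but a silk top fabric no longer matches Fabric - Cotton. A minimal sketch of evaluating such a config is below; the clause layout, the `matches` helper and the catalog shape are assumptions for illustration, not the actual config schema.

```python
# Hypothetical encoding of the new "Fabric - Cotton" config: a list of
# OR-ed clauses, each scoped to the SSCATs that own the attribute.
FABRIC_COTTON = [
    {"sscat_ids": {"X", "Y"}, "attribute": "fabric", "values": {"cotton"}},
    {"sscat_ids": {"Z"}, "attribute": "top_fabric", "values": {"cotton"}},
    {"sscat_ids": {"AA"}, "attribute": "kurtafabric", "values": {"cotton"}},
]


def matches(config, catalog):
    """Return True if the catalog satisfies any clause of the config."""
    attributes = catalog.get("attributes", {})
    for clause in config:
        # A clause only applies to catalogs in its own SSCATs.
        if catalog["sscat_id"] not in clause["sscat_ids"]:
            continue
        value = attributes.get(clause["attribute"])
        if value is not None and value.lower() in clause["values"]:
            return True
    return False
```

A catalog in Z with `top_fabric` set to Cotton matches, while a catalog in Z with only `bottom_fabric` set to Cotton does not.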


Implementation -
For SSCATs where there’s only one attribute for a particular label like



Men clothing search → Catalogs from all three SSCATs

Configs are for label values
Label value to label mapping tells the user which place to look for attributes








