
Real time DS image service


Objective - Create a real-time API which runs the DS image service.
The service classifies a product as brand or illegal based on -
OCR match
Image match
Image similarity match
Logo detection match

Why now?
Real-time gating based on image matches would bring flagging latency down from ~5 hours to near real time, which would help reduce the ~0.01% of daily platform views that currently go to non-compliant listings.
~30% of all DS counterfeit / illegal / infringement classification comes from images. These image models help us flag a product as infringing primarily in cases where the keywords don’t mention the brand or illegal class name.

Current architecture -

DS overall flow architecture -
[Figure: DS overall flow architecture - Screenshot 2023-07-05 at 11.34.02 AM.png]
Catalog deactivation service architecture flow - TBU

DS jobs run every ~4 hours for new inflow / edited listings
These jobs take the front / primary image of each product and map it to a brand name or illegal listing class using -
Image OCR - Optical character recognition identifies the keywords present in the image and matches them against the existing Suraksha list keywords (Primary + Alternative)
Image match - The product's image is similarity-matched against an updated repo of brand images curated from historical flaggings / competition scrape / Mall listings
Brand logo detection - Detects brand logos in images; currently trained for ~50 brands
Caption generation for images - Generates a caption for the image based on the objects and keywords detected in the product's images
Nearest Neighbour search - Currently live for a handful of brands. It serves a similar function to Image match but uses a different algorithm, tuned to give better recall for brands with high false negative rates. Extending it to all brands has scaling issues, since it may produce higher false positive rates, but it can easily be scaled to select brands on demand
The identified PIDs are then dumped in silver.brand_infringement or silver.profanity_filter
Tech deactivation CRON picks these up in 4-hour batches and sends them for proactive deactivation
CnT ops team then QCs these and reactivates false positives (current image model accuracy ~30%)
To start with, v1 will only cover moving the catalog deactivation service in the above flow to real time
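
As a rough illustration of the Image OCR step in the flow above, the sketch below matches OCR output against a keyword list with primary and alternative spellings. The keyword table, the function names and the normalisation rules are all assumptions for illustration; they are not the actual Suraksha list or the production DS logic.

```python
import re

# Hypothetical keyword list: brand -> primary + alternative keywords.
# The real Suraksha list is not reproduced here.
KEYWORDS = {
    "BrandA": ["branda", "brand a"],
    "BrandB": ["brandb", "b-brand"],
}


def normalise(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumeric characters into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_brand(ocr_text: str) -> list[str]:
    """Return the brands whose keywords appear as whole words or phrases in the OCR text."""
    # Padding with spaces makes the substring test respect word boundaries.
    haystack = f" {normalise(ocr_text)} "
    hits = []
    for brand, keywords in KEYWORDS.items():
        if any(f" {normalise(k)} " in haystack for k in keywords):
            hits.append(brand)
    return hits
```

Matching on whole words after normalisation is one way to keep a keyword such as "branda" from firing on an unrelated word like "brandable"; the real matcher may make a different trade-off.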

Proposed solve -
Setup of a real time API
The real-time API would run the above DS image jobs on the front image of each PID
DS API would return the associated brand name + confidence score for the brand
Confidence score methodology - TBD
Dynamic request handling - CI/CD build which supports DS-related changes with minimal changes to the API itself
Cases where the model flow changes - e.g. OCR runs after brand logo detection
Or where model considerations change - e.g. logo detection checks only the top right part of the image
API would take an image link as input, and the output would comprise -
Brand or illegal class detected
Seller has authorisation or not
Confidence score or threshold of the prediction
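
One possible way to get the dynamic request handling described above is to keep the model order and threshold in configuration, so that DS can reorder steps (for example OCR after brand logo) without touching the API code. The sketch below is only an assumption about how that could look: the step functions are placeholders rather than real models, and the "first detection at or above the threshold wins" rule is invented for illustration, since the confidence score methodology is still TBD.

```python
from typing import Callable, Optional, Tuple

Detection = Tuple[str, float]  # (brand or illegal class, confidence)
Step = Callable[[str], Optional[Detection]]


def logo_step(image_url: str) -> Optional[Detection]:
    # Placeholder standing in for the logo detection model.
    return ("BrandA", 0.9) if "logo" in image_url else None


def ocr_step(image_url: str) -> Optional[Detection]:
    # Placeholder standing in for the OCR match model.
    return ("BrandB", 0.6) if "text" in image_url else None


# Hypothetical registry: config refers to steps by name.
REGISTRY = {"logo": logo_step, "ocr": ocr_step}


def run_pipeline(image_url: str, order: list[str], threshold: float) -> Optional[Detection]:
    """Run the steps in the configured order and return the first
    detection whose confidence is at or above the threshold."""
    for name in order:
        result = REGISTRY[name](image_url)
        if result is not None and result[1] >= threshold:
            return result
    return None
```

With this shape, changing the flow is a config change: `run_pipeline(url, ["logo", "ocr"], 0.5)` and `run_pipeline(url, ["ocr", "logo"], 0.5)` run the same code with a different step order.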

API contracts to be finalised between tech and DS
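
Until the contract is finalised, a strawman can help the discussion. The sketch below simply mirrors the input and output fields listed above; every field name and type is an assumption, and the seller authorisation field would likely need a seller identifier that is not part of the input described so far.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ClassifyRequest:
    image_url: str  # link to the front image of the PID


@dataclass
class ClassifyResponse:
    detected_class: Optional[str]      # brand or illegal class; None if nothing detected
    seller_authorised: Optional[bool]  # None when authorisation could not be determined
    confidence: Optional[float]        # scale and methodology are TBD


def to_payload(response: ClassifyResponse) -> dict:
    """Convert the response into a plain dict ready for JSON serialisation."""
    return asdict(response)
```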

Cost estimate for endpoints:
Cost = Latency * QPS * (container_size / 4 GB) * 24 * DBU * (1x for CPU, 6x for GPU)
KNN (CPU) = 0.7 * 25 * 1 * 24 * 0.088 * 1 = $37/day
Logo (CPU) = 2.7 * 25 * 1 * 24 * 0.088 * 1 = $142/day
OCR (GPU) = 0.3 * 25 * 1 * 24 * 0.088 * 6 = $95/day
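
The cost formula above can be written as a small helper so the estimates are easy to rerun when latency or QPS assumptions change. The function and parameter names are illustrative, and the latency unit is assumed to be seconds; the numbers are the ones quoted in the estimate.

```python
def daily_cost(latency_s: float, qps: float, size_factor: float,
               dbu_rate: float, gpu: bool = False) -> float:
    """Daily endpoint cost in dollars.

    size_factor is container_size / 4 GB; GPU endpoints are costed at 6x CPU.
    """
    multiplier = 6 if gpu else 1
    return latency_s * qps * size_factor * 24 * dbu_rate * multiplier


# Inputs taken from the estimate above: 25 QPS, size factor 1, DBU = 0.088.
estimates = {
    "KNN (CPU)": daily_cost(0.7, 25, 1, 0.088),
    "Logo (CPU)": daily_cost(2.7, 25, 1, 0.088),
    "OCR (GPU)": daily_cost(0.3, 25, 1, 0.088, gpu=True),
}

for name, cost in estimates.items():
    print(f"{name}: ${cost:.2f}/day")
```

Unrounded, the three figures come to about $36.96, $142.56 and $95.04 per day, in line with the quoted estimates.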