Fixing Duplicate Listings for a Real Estate Website

The company is a Mumbai-based real estate search portal that lets customers search for housing by geography, number of rooms, and various other filters. It lists over 3.2 million properties for rent and sale and serves 40 cities in India, including Chennai, Mumbai, Bengaluru, Kolkata and Delhi.


The company collects properties every day through its network of data collectors. Each property, collected in the field by a data collector, is sent to a back office for review before it goes live. The company's main positioning was the amount of data it collects to inform its users.
One of the main challenges for the team is ensuring that no duplicate property goes live, because duplicates hurt the trust and user experience of potential buyers.
In 2014, the company was looking for a way to fix its duplicate listing problem before scaling up property collection from 1000 per day to 10,0000 per day.

Breakthrough Solution

The company first tried detecting duplicates from the data input values. For example, by looking at lat-long and address, it identified potential duplicates and sent them to the backend team for a manual check.
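A first-pass check of this kind might look roughly like the sketch below. The source does not describe the actual rules, so everything here is an assumption made for illustration: the listing fields (`id`, `lat`, `lon`, `address`), the helper names, the exact-match address rule, and the 50 m distance threshold.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat-long points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def normalize_address(address):
    """Lowercase and replace punctuation with spaces so minor spelling differences match."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in address.lower())
    return " ".join(cleaned.split())


def find_candidate_duplicates(new_listing, existing, max_km=0.05):
    """Return ids of existing listings that are close by AND share a normalized address.

    max_km is a hypothetical threshold (50 m), not a value from the source.
    """
    new_addr = normalize_address(new_listing["address"])
    candidates = []
    for other in existing:
        distance = haversine_km(new_listing["lat"], new_listing["lon"],
                                other["lat"], other["lon"])
        if distance <= max_km and normalize_address(other["address"]) == new_addr:
            candidates.append(other["id"])
    return candidates
```

Anything this returns would still go to the backend team for a manual check, since a coordinate-and-address match only suggests a duplicate.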
However, that did not solve the problem and was not a foolproof solution. When Vishal joined the team in 2014, resolving it was his first task.
On digging further, we found how the backend team actually decided whether two flats were duplicates: by comparing images of the flats.
The real insight came when we found that the kitchen images of a house are unique to its occupants; the kitchen leaves a blueprint of the people who live there. Even when flats on two separate floors have the same kitchen structure, the way each kitchen is set up is different. Much like a fingerprint, the kitchen carries the mark of the occupants of the house.
This insight led us to test an image matching algorithm to compare kitchen images.
When a new property is uploaded to the servers, its kitchen image is compared with the kitchen images of nearby properties within a certain mile radius.
After the kitchen images are compared, the probable duplicates go through a final manual check and duplicate properties are removed.
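The comparison step above can be sketched as follows. The source does not say which image matching algorithm was used, so this example swaps in a simple average hash (a basic perceptual hash) as a stand-in. It assumes each kitchen image has already been decoded, converted to grayscale, and downscaled to a small grid, and that the list of nearby properties has already been filtered by radius. The function names and the `max_bits` threshold are hypothetical.

```python
def average_hash(pixels):
    """Hash a small grayscale grid: one bit per pixel, set when the pixel is above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Number of bit positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def probable_duplicates(new_kitchen, nearby_kitchens, max_bits=5):
    """Return ids of nearby listings whose kitchen hash is within max_bits of the new one.

    nearby_kitchens: list of (listing_id, pixels) pairs, already limited to the search radius.
    max_bits: assumed tolerance for a 64-bit (8x8) hash; tune on real data.
    """
    new_hash = average_hash(new_kitchen)
    flagged = []
    for listing_id, pixels in nearby_kitchens:
        if hamming_distance(new_hash, average_hash(pixels)) <= max_bits:
            flagged.append(listing_id)
    return flagged
```

Because the hash compares each pixel to the image's own mean, a uniformly brighter or darker photo of the same kitchen tends to produce the same bits, while a differently arranged kitchen does not. The flagged ids are only probable duplicates and would still pass through the final manual check.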
Once we had validated that this algorithm could remove duplicate properties, the tech team at Housing created APIs to improve the algorithm's performance and built them into the data collector app, so that duplicate collection could be stopped right in the field.


This is how the company keeps its 3.2 million unique properties intact. Every day, this algorithm ensures that users see unique properties, which increases their trust.
After the algorithm was implemented, the company was able to remove duplicate properties quickly, and the algorithm is now part of its core operations.