Fixing Duplicate Listings for a Real Estate Website
Housing.com is a Mumbai-based real estate search portal that allows customers to search for housing based on geography, number of rooms, and various other filters. The company has over 3.2 million properties for rent and sale and serves 40 cities in India, including Chennai, Mumbai, Bengaluru, Kolkata and Delhi.
Housing.com collects properties every day through its network of data collectors. Each property, collected in the field by a data collector, is then sent to a back office for review before it goes live. The main positioning of
The team first tried an algorithm on the input data values as the primary way to detect duplicates. For example, by looking at latitude-longitude and address, they identified potential duplicates and sent them to the backend team for a manual check.
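The lat-long and address check described above can be sketched roughly as follows. This is a minimal illustration, not Housing.com's actual code: the record fields (`id`, `lat`, `lon`, `address`), the function names, and the 50-metre threshold are all assumptions made for the example.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat-long points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def normalize_address(address):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())


def potential_duplicates(new_listing, existing, max_km=0.05):
    """Return ids of existing listings that are nearby and share the
    same normalized address (hypothetical rule, assumed threshold)."""
    new_addr = normalize_address(new_listing["address"])
    matches = []
    for other in existing:
        distance = haversine_km(new_listing["lat"], new_listing["lon"],
                                other["lat"], other["lon"])
        if distance <= max_km and normalize_address(other["address"]) == new_addr:
            matches.append(other["id"])
    return matches
```

A rule like this depends entirely on what the data collector typed in, so two entries for the same flat with slightly different addresses slip through.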
But this did not solve the problem and was not a foolproof solution. When Vishal joined the team in 2014, resolving it was the first task he was given.
On digging further, it turned out that the backend team decided whether two flats were actually duplicates by comparing images of the flats.
The real insight came when we found that the kitchen images of a house are unique to its occupants: the kitchen carries a blueprint of the people who live there. Even flats on two separate floors may have the same kitchen structure, but the way each kitchen is set up is different. Like our fingerprints, the kitchen was effectively leaving the fingerprint of the occupants of the house.
This insight led us to test an image matching algorithm to compare kitchen images.
When a new property is uploaded to the servers, its kitchen image is compared with the kitchen images of nearby properties within a radius of a certain number of miles.
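The article does not say which image matching algorithm was used. As a stand-in, the sketch below uses a simple average hash compared by Hamming distance. It assumes each kitchen image has already been decoded to grayscale and downscaled to a small grid of the same size (for instance 8x8), and that `nearby` holds only the properties already selected by the radius filter. The function names and the `max_distance` threshold are hypothetical.

```python
def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the mean.

    pixels: 2D list of grayscale values (assumed pre-decoded, same size
    for every image being compared).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def similar_kitchens(new_pixels, nearby, max_distance=2):
    """Compare a new kitchen image against nearby properties.

    nearby: dict mapping property id -> grayscale grid.
    Returns (property id, distance) pairs within max_distance,
    closest first.
    """
    new_hash = average_hash(new_pixels)
    hits = []
    for prop_id, pixels in nearby.items():
        distance = hamming_distance(new_hash, average_hash(pixels))
        if distance <= max_distance:
            hits.append((prop_id, distance))
    return sorted(hits, key=lambda hit: hit[1])
```

An average hash only tolerates small changes in brightness and compression. Photos of the same kitchen taken from different angles would probably need a feature-based matcher, which is outside the scope of this sketch.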
After the kitchen images are compared, the probable duplicates go through a final manual check, and the duplicate properties are removed.
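The manual check keeps a person in the loop: the algorithm only proposes pairs, and a listing is removed only after a reviewer confirms it. A minimal sketch of that step, with a hypothetical `reviewer` callable standing in for the back-office decision and an assumed data layout, could look like this:

```python
def apply_review(listings, candidates, reviewer):
    """Remove listings that a reviewer confirms as duplicates.

    listings:   dict mapping listing id -> record (assumed layout)
    candidates: list of (new_id, existing_id) pairs flagged by matching
    reviewer:   callable(new_record, existing_record) -> bool, standing
                in for the human decision (hypothetical)
    Returns a new dict; the input is left unchanged.
    """
    kept = dict(listings)
    for new_id, existing_id in candidates:
        # Skip pairs where either side has already been removed.
        if new_id not in kept or existing_id not in kept:
            continue
        if reviewer(kept[new_id], kept[existing_id]):
            del kept[new_id]  # keep the older listing, drop the new one
    return kept
```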
Once it was validated that this algorithm could remove duplicate properties, the tech team at Housing created APIs to improve the algorithm's performance and built the check into the data collector app, so that duplicate collection could be stopped right in the field.