[Note: WorkBench was built by the PEXAR team for Fynd’s e-commerce platform. This is where the early seeds of the current SaaS product were sown. The new product is bigger, better, and industry-agnostic. WorkBench should give you a fair idea of the approach, tech, and vision.]
We started building the inventory platform that would, in future, allow us to fetch inventory data from hundreds of brands and thousands of stock points in near real time.
We faced a challenge that is far from unique, one that every company with large, disparate data sources runs into: data variety and velocity. The data came in from a variety of sources (CRMs, ERPs, APIs, Point-of-Sale systems) and formats. Connectors built in-house in Java pulled in all this inventory data, parsed it into one uniform schema and exported it to MongoDB, a NoSQL database. By the end of 2015 we had this working fairly reliably.
How we actually achieved all of this is a story for another time. What followed is a more interesting data problem and makes for a better story.
Our dataset came in from multiple brands and wasn't production-worthy. It had duplicates, typos and extraneous information that required cleansing, and it had to be standardized and enriched to meet the requirements of downstream systems.
Data wrangling is the process of transforming and mapping data from one “raw” form into another format, with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.
We had all this data stored in MongoDB, ready to be processed and to power the Web & App platform. Data wrangling is tedious in itself, but the challenge was not simply doing it in real time. It was also doing it right, governed by logic of which the business users are the sole experts.
Data Wrangling! Why?
Below are a few of the scenarios we set out to solve. They were highly dynamic and needed a solution that was simple to use and flexible enough to accommodate the domain rules.
Map Raw Brand Sizes to Fynd’s Standard Size, based on multiple conditions.
For example, sizes XXXL, 3 XTRA LARGE and 3XL are all to be mapped to XXXL
Remove duplicate line items based on a grouping condition
Remove line items where price < 20 OR quantity < 1
Transform Existing Attributes
Apply a 10% discount to products in the men category on weekdays and 20% on weekends, excluding products whose marked price is < 1000
Convert the product code to upper case, or trim the decimals from the price
Generate New Attributes
Calculate the discount percentage from the marked price and the effective price (this could be any mathematical formula).
Generate Product Name by merging multiple attributes.
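Hard-coding rules like these is straightforward. The sketch below is a rough Python version of a few of the scenarios above. It assumes hypothetical field names (`sku`, `store`, `size`, `price`, `quantity`, `category`), treats `price` as the marked price, and uses sku/store/size as an illustrative grouping key for de-duplication.

```python
from datetime import date

# Hypothetical lookup table: raw brand size -> standard size (values from the example above).
SIZE_MAP = {"XXXL": "XXXL", "3 XTRA LARGE": "XXXL", "3XL": "XXXL"}


def standardize_size(item):
    """Map a raw brand size to the standard size; unknown sizes pass through upper-cased."""
    raw = item["size"].strip().upper()
    return {**item, "size": SIZE_MAP.get(raw, raw)}


def is_valid(item):
    """Keep a line item unless price < 20 OR quantity < 1."""
    return not (item["price"] < 20 or item["quantity"] < 1)


def dedupe(items, key_fields):
    """Keep only the first line item of each group defined by key_fields."""
    seen, unique = set(), []
    for item in items:
        key = tuple(item[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def apply_discount(item, today):
    """10% off 'men' products on weekdays, 20% on weekends; skip marked price < 1000."""
    if item["category"] != "men" or item["price"] < 1000:
        return item
    rate = 0.20 if today.weekday() >= 5 else 0.10  # weekday(): Monday=0 ... Sunday=6
    return {**item, "effective_price": round(item["price"] * (1 - rate), 2)}


def wrangle(items, today=None):
    """Filter, standardize sizes, de-duplicate on (sku, store, size), then discount."""
    today = today or date.today()
    items = [standardize_size(i) for i in items if is_valid(i)]
    items = dedupe(items, key_fields=("sku", "store", "size"))
    return [apply_discount(i, today) for i in items]
```

In a version like this, every edit to `SIZE_MAP`, to the thresholds or to the discount rates requires a code change and a redeploy.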
Cycle of Chaos (Change)!
These rules can be coded, but they change regularly. Moreover, each such change should involve only business users, yet in practice it involves three of these:
Most of these rules come from business users, who understand both the data and the domain fairly well. Passing the rules on to developers opens a window for misinterpretation, and with it a potential bug, and so the cycle of change continues.
With this simple realization, our aim was to design a platform that would:
Be simple enough for business analysts in any domain to use
Provide easy data access, discovery, sharing and notifications
Activate business rules instantly, with zero production downtime
Require no developer involvement
A rule engine to create business rules for data wrangling
Data-driven alerts over Slack or email
Tabular exploration of the data without knowing the underlying database, handling both flat SQL and nested NoSQL data
REST APIs created at runtime by writing a query (in SQL or MongoDB), with support for parameters
A REST API scheduler
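To show nested NoSQL documents as a table, they first have to be flattened into columns. Below is a minimal sketch under several assumptions. It uses dotted column names, the same dot notation MongoDB uses for embedded fields. Arrays are left as single cell values, and missing fields become empty cells. The sample documents are hypothetical.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted column names; lists stay as single values."""
    row = {}
    for key, value in doc.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=column + "."))
        else:
            row[column] = value
    return row


def to_table(docs):
    """Return (columns, rows); the columns are the sorted union across all documents."""
    flat = [flatten(d) for d in docs]
    columns = sorted({c for r in flat for c in r})
    return columns, [[r.get(c) for c in columns] for r in flat]
```

A flat SQL row is already a dict with no nesting, so the same function passes it through unchanged. This lets one tabular view serve both kinds of source.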
We created a suite of products, the Data Workbench, which at a high level looks like this:
Data Workbench : High Level Diagram
Here is a brief summary of the products that make up the Data Workbench, all of which expose REST APIs.
Ocellus : Business Rules Engine
For Data Discovery & Wrangling
It's a rule engine built on top of MongoDB.
Business rules prepared by business users are converted in real time into MongoDB queries. Ocellus then updates the underlying data accordingly and sends notifications over Slack or email.
Ocellus : Workflow
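Ocellus's actual rule format is not described here. As a minimal sketch of the rule-to-query step, assume a hypothetical rule format in which each rule is a list of field/operator/value conditions. The conditions are joined by OR (`"match": "any"`) or AND, and the rule carries an optional set of field values to apply. The output uses MongoDB's standard query operators (`$lt`, `$in`, `$or`, `$set` and so on).

```python
# Hypothetical rule format; the operator names on the right are MongoDB query operators.
OPERATORS = {
    "<": "$lt", "<=": "$lte", ">": "$gt", ">=": "$gte",
    "==": "$eq", "!=": "$ne", "in": "$in", "not in": "$nin",
}


def condition_to_mongo(cond):
    """Translate one {'field', 'op', 'value'} condition into a MongoDB filter clause."""
    return {cond["field"]: {OPERATORS[cond["op"]]: cond["value"]}}


def rule_to_query(rule):
    """Build a MongoDB filter, plus an optional $set update, from a business rule."""
    clauses = [condition_to_mongo(c) for c in rule["conditions"]]
    if not clauses:
        raise ValueError("a rule needs at least one condition")
    combinator = "$or" if rule.get("match") == "any" else "$and"
    query = {"filter": {combinator: clauses}}
    if "set" in rule:
        query["update"] = {"$set": rule["set"]}
    return query
```

With PyMongo, the size-mapping rule from the scenarios above could then run as `collection.update_many(q["filter"], q["update"])`. A removal rule such as price < 20 OR quantity < 1 could run as `collection.delete_many(q["filter"])`.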
Hydra : SQL / Mongo Query based API Engine
Pass in a database query (SQL or Mongo) to generate an API, with support for parameters.
Hydra : Workflow
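The core of a query-backed API engine is a mapping from an API path to a stored, parameterized query. The sketch below illustrates this idea and is not Hydra's implementation. It uses Python's built-in `sqlite3` with named placeholders. The class, the path and the sample inventory table are all hypothetical.

```python
import sqlite3


class QueryApi:
    """Hypothetical query-backed API engine: each path maps to a stored SQL query."""

    def __init__(self, conn):
        self.conn = conn
        self.routes = {}

    def register(self, path, sql):
        # "Creating an API" is just storing the query under a path; nothing is redeployed.
        self.routes[path] = sql

    def call(self, path, **params):
        # Named placeholders (:brand) are bound by the driver, never string-formatted into SQL.
        cur = self.conn.execute(self.routes[path], params)
        columns = [d[0] for d in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]


def demo_api():
    """Build an in-memory inventory table with sample rows and register one endpoint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (sku TEXT, brand TEXT, store TEXT, quantity INTEGER)")
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?, ?)",
        [("A1", "acme", "S1", 4), ("A2", "acme", "S2", 0), ("B1", "zeta", "S1", 7)],
    )
    api = QueryApi(conn)
    api.register(
        "/inventory/by-brand",
        "SELECT sku, store, quantity FROM inventory "
        "WHERE brand = :brand AND quantity >= :min_qty ORDER BY sku",
    )
    return api
```

You could wrap `call` in an HTTP handler, taking the parameters from the query string, to turn each registered path into a REST endpoint. Binding parameters instead of concatenating them into the query keeps user-supplied values from changing the query itself.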
Mercury : Attribute Transformation Engine
Ocellus is limited to what MongoDB supports. Mercury was built to overcome this limitation.
Mercury : Workflow
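Mercury's internals are not described here. One way to picture attribute transformation outside the database is as a declarative list of steps run in application code, where each step can be any function rather than only what a database query can express. The transform names, the spec format and the field names below are all hypothetical. The discount percentage assumes the formula (marked - effective) / marked * 100.

```python
# Hypothetical library of named transforms.
TRANSFORMS = {
    "upper": lambda value: value.upper(),
    "trim_decimals": lambda value: int(value),  # int() truncates toward zero
    "discount_pct": lambda marked, effective: round((marked - effective) / marked * 100, 2),
    "join": lambda *parts: " ".join(str(p) for p in parts if p),
}


def apply_spec(record, spec):
    """Apply transform steps in order; each step reads its 'from' fields and writes 'to'."""
    out = dict(record)
    for step in spec:
        args = [out[field] for field in step["from"]]
        out[step["to"]] = TRANSFORMS[step["op"]](*args)
    return out


# Example spec covering the transform/generate scenarios listed earlier.
SPEC = [
    {"to": "product_code", "op": "upper", "from": ["product_code"]},
    {"to": "price", "op": "trim_decimals", "from": ["price"]},
    {"to": "discount_pct", "op": "discount_pct", "from": ["marked_price", "effective_price"]},
    {"to": "name", "op": "join", "from": ["brand", "color", "category"]},
]
```

Steps run in order, so a later step can read an attribute that an earlier step generated. Because the spec is plain data, it can be edited without touching the code.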
Looper : REST API Scheduler
Looper : Workflow
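No details are given on Looper's design. The sketch below shows one common core for an interval scheduler: keep each job's next run time in a priority queue (Python's `heapq`) and always fire the earliest job. To stay deterministic, the loop runs on a simulated clock and records the calls instead of sending HTTP requests. The job list and URLs are hypothetical.

```python
import heapq


def simulate_schedule(jobs, start, until):
    """Return (time, url) pairs in firing order for interval jobs from start to until (inclusive).

    jobs: list of (url, interval_seconds); times are plain numbers such as epoch seconds.
    """
    for url, interval in jobs:
        if interval <= 0:
            raise ValueError(f"interval for {url} must be positive")
    # The job index breaks ties so that jobs due at the same time fire in list order.
    heap = [(start, i, url, interval) for i, (url, interval) in enumerate(jobs)]
    heapq.heapify(heap)
    fired = []
    while heap and heap[0][0] <= until:
        t, i, url, interval = heapq.heappop(heap)
        fired.append((t, url))  # a real scheduler would call the REST API here
        heapq.heappush(heap, (t + interval, i, url, interval))
    return fired
```

A live scheduler would sleep until `heap[0][0]` on a real clock and issue the request at the point where the sketch appends to `fired`, for example with `urllib.request.urlopen`.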
Portus : JSON Schema Transformation
Portus is a service that transforms an incoming JSON payload from one schema to another at runtime, based on a specification passed in to it.
It's a web service built on top of the open source library
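The underlying library's specification format is not described here. The sketch below therefore uses a made-up, deliberately simple spec in which each key is a dotted path in the output and each value is a dotted path in the input. It only illustrates a spec-driven JSON-to-JSON mapping applied at runtime. The field names are hypothetical.

```python
def get_path(doc, path):
    """Read a dotted path such as 'item.code'; return None if any segment is missing."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def set_path(doc, path, value):
    """Write a value at a dotted path, creating intermediate objects as needed."""
    keys = path.split(".")
    for key in keys[:-1]:
        doc = doc.setdefault(key, {})
    doc[keys[-1]] = value


def transform(payload, spec):
    """Build a new document from payload; spec maps output paths to input paths."""
    out = {}
    for target, source in spec.items():
        value = get_path(payload, source)
        if value is not None:  # missing inputs are skipped (this also drops explicit nulls)
            set_path(out, target, value)
    return out
```

Because the spec is just a dictionary, it can arrive with each request. A new source schema then needs only a new spec, not new code.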
Each of these products was built specifically to solve a problem at hand, and that is our most important learning: as a developer, you have to be aware of the problems your business users are facing and build a product roadmap around them. Also, be lazy. The aim of these products was to empower business users and make them the owners of the data, which in turn lets us, the developers, move on to solve the next problem.
Sounds interesting? Join us if this looks promising to you!