Introducing Developer-less Data Workbench: Making Business Analysts Masters of the Data!

[Note: WorkBench was built by the PEXAR team for Fynd’s ecommerce platform. This is where the early seeds of the current SaaS product were sown. The new product is bigger, better, and industry agnostic. WorkBench should give you a fair idea of the approach, tech, and vision.]
Late in 2015, we started building the inventory platform that would, in the future, allow us to fetch inventory data from hundreds of brands and thousands of stock points in near real time.
We faced a challenge familiar to every company with large, disparate data sources: data variety and velocity. The data came in from a variety of sources (CRMs, ERPs, APIs, Point-of-Sale systems) and formats. Connectors built in-house in Java pulled in and parsed all this inventory data into one uniform schema and exported it to a NoSQL database, MongoDB. We managed to achieve this in a fairly reliable way by the end of 2015.
How we achieved all of this is a story for another time. What followed was a more interesting data problem, and it makes for a better story.

Problem

Our dataset came in from multiple brands and wasn't production-worthy. It had duplicates, typos, and extraneous information that required cleansing, and it had to be standardized and enriched to meet the requirements of the downstream systems.
The challenge at hand was Data Wrangling.
Data wrangling is the process of transforming and mapping data from one “raw” data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.
We had all this data stored in MongoDB, ready to be processed and to power the web and app platform. Data wrangling by itself is tedious. The challenge, however, was not simply doing it in real time, but also doing it right, governed by logic in which the business users are the sole experts.
Data Wrangling! Why?

Business Scenarios

Below are a few scenarios we went on to solve. They were dynamic in nature and needed a solution that was simple to use and flexible enough to accommodate the domain rules.
Mapping
Map Raw Brand Sizes to Fynd’s Standard Size, based on multiple conditions.
Sizes XXXL, 3 XTRA LARGE, and 3XL are all to be mapped to XXXL
Cleansing
Remove duplicate line items, based on a grouping condition
Remove line items where price < 20 OR quantity < 1
Transform Existing Attributes
Apply a discount of 10% on weekdays and 20% on weekends to products in the men's category, excluding those where marked price < 1000
Make product code upper case OR Trim the decimals from the Price
Generate New Attributes
Calculate Discount Percentage using the Marked Price & Effective Price. (This could be any mathematical formula).
Generate Product Name by merging multiple attributes.
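To make these scenarios concrete, here is a minimal Python sketch of three of them: size mapping, cleansing, and a derived discount percentage. The field names (`brand`, `code`, `price`, `quantity`), the grouping key, and the function names are assumptions made for illustration; they are not taken from the platform's actual schema or code.

```python
# Hypothetical alias table: raw brand sizes mapped to one standard size.
SIZE_ALIASES = {"XXXL": "XXXL", "3 XTRA LARGE": "XXXL", "3XL": "XXXL"}


def standard_size(raw_size):
    """Mapping: normalize case and whitespace, then look up the standard size."""
    key = " ".join(raw_size.upper().split())
    return SIZE_ALIASES.get(key, key)


def cleanse(items, group_by=("brand", "code")):
    """Cleansing: drop invalid line items, then keep the first item per group."""
    seen = set()
    kept = []
    for item in items:
        if item["price"] < 20 or item["quantity"] < 1:
            continue  # rule: remove where price < 20 OR quantity < 1
        key = tuple(item[field] for field in group_by)
        if key in seen:
            continue  # duplicate under the (assumed) grouping condition
        seen.add(key)
        kept.append(item)
    return kept


def discount_percentage(marked_price, effective_price):
    """New attribute: discount as a percentage of the marked price."""
    return round((marked_price - effective_price) * 100 / marked_price, 2)
```

Each function is small, but every one of them encodes a business decision (which aliases count, what the grouping key is, what the price floor is), which is why such rules change so often.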

Approach

Cycle of Chaos (Change)!
These rules can be coded, but they change regularly. Moreover, each change should ideally involve only business users, yet in practice it involves all three of these:
Business User
Developer
Server Downtime
Most of these rules come from business users, who have a fairly good understanding of both the data and the domain. Passing the rules on to developers opens a window for misinterpretation and potential bugs, and so the cycle of change continues.

Design

With this simple realization, our aim was to design a platform with the following goals:
Simple enough to be used by business analysts across any domain
Provide ease of Data Access, Discovery, Sharing & Notifications
Instant Activation of business rules with zero production downtime
No developer involvement

Features

A Rule Engine to create business rules for Data Wrangling
Data Driven Alerts over Slack or Email
Let users explore the data in a tabular view without knowing the underlying database, handling both flat SQL and nested NoSQL data
Create REST APIs at runtime, by writing a query (in SQL or MongoDB) with support for parameters
REST API Scheduler

Solution

We created a suite of products, the Data Workbench, which at a high level looks like this.
Data Workbench : High Level Diagram
Here is a brief summary of the products that make up the Data Workbench, all of which expose REST APIs.

Ocellus : Business Rules Engine

For Data Discovery & Wrangling
It's a rule engine built on top of MongoDB.
Business rules prepared by business users are converted in real time to Mongo queries, which update the underlying data and send notifications over Slack or email.
Ocellus : Workflow
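As a rough illustration of converting a business rule into a Mongo query, the sketch below translates a small rule description into MongoDB filter and update documents. The rule format (`when`, `match`, `set`) and the function names are hypothetical; the post does not describe how Ocellus represents rules internally. Only the standard MongoDB operators (`$lt`, `$in`, `$or`, `$set`, and so on) are real.

```python
# Map rule operators to standard MongoDB query operators.
OPERATORS = {
    "<": "$lt", "<=": "$lte", ">": "$gt", ">=": "$gte",
    "==": "$eq", "!=": "$ne", "in": "$in",
}


def rule_to_filter(rule):
    """Build a MongoDB filter document from the rule's conditions."""
    clauses = [
        {field: {OPERATORS[op]: value}} for field, op, value in rule["when"]
    ]
    if len(clauses) == 1:
        return clauses[0]
    # "any" means OR across conditions; anything else means AND (assumed default).
    combinator = "$or" if rule.get("match") == "any" else "$and"
    return {combinator: clauses}


def rule_to_update(rule):
    """Build a MongoDB update document from the rule's assignments."""
    return {"$set": dict(rule["set"])}


# Hypothetical rule: map raw brand sizes to the standard size XXXL.
size_rule = {
    "when": [("size", "in", ["XXXL", "3 XTRA LARGE", "3XL"])],
    "set": {"standard_size": "XXXL"},
}
size_filter = rule_to_filter(size_rule)
size_update = rule_to_update(size_rule)
```

With a driver such as PyMongo, the two documents could be passed to `collection.update_many(size_filter, size_update)`. Because the rule is plain data, a business user can change it without a code deployment.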

Hydra : SQL / Mongo Query based API Engine

Pass in a database query (SQL or Mongo) to generate an API, with support for parameters.
Add JavaScript code to transform the parameters or query results, and save a version of the response to create point-in-time snapshots (data versioning).
Hydra : Workflow
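The idea of turning a parameterized query into a callable endpoint can be sketched as follows. This is an assumption-laden illustration, not Hydra's implementation: an in-memory SQLite database stands in for the real data store, the `QueryApi` class and the `items_by_brand` endpoint are hypothetical, the transform is a Python function rather than JavaScript, and the HTTP layer is left out.

```python
import sqlite3


class QueryApi:
    """Registry that exposes stored, parameterized queries as named endpoints."""

    def __init__(self, connection):
        self.connection = connection
        self.endpoints = {}

    def register(self, name, sql, transform=None):
        # The query uses named placeholders (":brand"), never string formatting.
        self.endpoints[name] = (sql, transform)

    def call(self, name, **params):
        sql, transform = self.endpoints[name]
        cursor = self.connection.execute(sql, params)
        columns = [column[0] for column in cursor.description]
        rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
        return transform(rows) if transform else rows


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE inventory (code TEXT, brand TEXT, price REAL)")
connection.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [("a1", "acme", 1200.0), ("b2", "acme", 15.0), ("c3", "zen", 900.0)],
)

api = QueryApi(connection)
api.register(
    "items_by_brand",
    "SELECT code, price FROM inventory "
    "WHERE brand = :brand AND price >= :min_price ORDER BY code",
    # Optional result transformation: upper-case the product code.
    transform=lambda rows: [{**row, "code": row["code"].upper()} for row in rows],
)
result = api.call("items_by_brand", brand="acme", min_price=20)
```

Binding parameters through placeholders keeps user-supplied values out of the query text, which matters when the people writing queries are not the people maintaining the database.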

Mercury: Attribute Transformation Engine

Ocellus is limited to what MongoDB supports. Mercury was built to overcome this limitation.
It allows the end user to write a JavaScript snippet that is used to transform the attribute values.
Mercury : Workflow
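A minimal sketch of the attribute-transformation idea is shown below. Mercury runs user-written JavaScript; here a Python callable stands in for the snippet so the example stays in one language. The `transform_attribute` helper and the sample records are assumptions for illustration only.

```python
def transform_attribute(records, attribute, snippet):
    """Return new records with `snippet` applied to one attribute's value.

    Records that lack the attribute are copied through unchanged.
    """
    transformed = []
    for record in records:
        updated = dict(record)  # never mutate the caller's data
        if attribute in updated:
            updated[attribute] = snippet(updated[attribute])
        transformed.append(updated)
    return transformed


products = [
    {"code": "ab-101", "price": 1299.99},
    {"code": "cd-202", "price": 499.5},
    {"price": 20.0},  # no product code on this record
]

# Two of the scenarios from earlier: upper-case the code, trim price decimals.
products = transform_attribute(products, "code", str.upper)
products = transform_attribute(products, "price", int)
```

Because the snippet is just a function of the old value, anything expressible in the snippet language becomes a valid transformation, which is what frees this approach from the limits of the database's own operators.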

Looper : REST API Scheduler

It's a REST API scheduler that lets you schedule the execution of an API at predefined intervals. It borrows most of the niceties from Hydra (data versioning, JavaScript-based result transformation).
Looper : Workflow
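The scheduling behaviour can be sketched with Python's standard `sched` module. This is only an illustration of the technique, not Looper's code: `FakeClock`, `schedule_api`, and the stubbed `call_api` are hypothetical, and a simulated clock replaces real time so the example finishes instantly. Each run stores its response with a timestamp, mimicking the point-in-time snapshots mentioned above.

```python
import sched


class FakeClock:
    """Simulated clock so the example runs instantly and deterministically."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def schedule_api(scheduler, clock, interval, call_api, runs, snapshots):
    """Call `call_api` every `interval` seconds, `runs` times, saving each result."""

    def tick(remaining):
        snapshots.append({"at": clock.time(), "response": call_api()})
        if remaining > 1:
            # Re-arm the job for the next interval.
            scheduler.enter(interval, 1, tick, argument=(remaining - 1,))

    scheduler.enter(interval, 1, tick, argument=(runs,))


clock = FakeClock()
scheduler = sched.scheduler(clock.time, clock.sleep)
snapshots = []
schedule_api(
    scheduler, clock, interval=60,
    call_api=lambda: {"status": "ok"},  # stand-in for a real REST call
    runs=3, snapshots=snapshots,
)
scheduler.run()
```

In a real deployment the clock would be `time.monotonic` and `time.sleep`, and `call_api` would issue an HTTP request.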

Portus : JSON Schema Transformation

Portus is a service to transform an incoming JSON payload from one schema to another, by passing in a specification, at runtime.
It's a web service built on top of an open source library.
Portus : Workflow
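To illustrate what a specification-driven transformation means, here is a small Python sketch in which the specification is a mapping from source paths to target paths. The spec format, the dotted-path notation, and the function names are assumptions for this example; they are not the format used by Portus or by the library it builds on.

```python
def get_path(document, path):
    """Read a nested value addressed by a dotted path such as 'pricing.mrp'."""
    current = document
    for key in path.split("."):
        current = current[key]
    return current


def set_path(document, path, value):
    """Write a nested value, creating intermediate objects as needed."""
    keys = path.split(".")
    current = document
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value


def transform(payload, spec):
    """Reshape `payload` into a new document according to `spec`."""
    output = {}
    for source_path, target_path in spec.items():
        set_path(output, target_path, get_path(payload, source_path))
    return output


# Hypothetical brand payload and the spec that maps it to a uniform schema.
payload = {"sku": "A1", "pricing": {"mrp": 1200, "sale": 960}}
spec = {
    "sku": "product.code",
    "pricing.mrp": "product.price.marked",
    "pricing.sale": "product.price.effective",
}
reshaped = transform(payload, spec)
```

Since the specification is data passed in at runtime, supporting a new source schema means writing a new spec rather than new code.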

Conclusion

Each of these products was built specifically to solve a problem at hand, and that is the most important lesson for us. As a developer, you have to be aware of the problems your business users are facing and build a product roadmap around them. Also, be lazy: the aim with these products was to empower business users and make them the owners of the data, which in turn frees us, the developers, to move on to the next problem.
Sounds interesting? Join us if this looks promising to you.
