Introducing the Developer-less Data Workbench: Making Business Analysts Masters of the Data!

[Note: WorkBench was built by the PEXAR team for Fynd’s ecommerce platform. This is where the early seeds of the current SaaS product were sown. The new product is bigger, better, and industry-agnostic. WorkBench should give you a fair idea of the approach, tech, and vision.]

⁠

Late in 2015, we started building the inventory platform that would, in future, allow us to fetch inventory data from hundreds of brands and thousands of stock points in near real time.

We faced a challenge that is hardly unique: the data variety and velocity that every company with large, disparate data sources comes across. The data came in from a variety of sources (CRMs, ERPs, APIs, Point-of-Sale systems) and formats. It was pulled in by connectors built in-house in Java, which parsed all this inventory data into one uniform schema and exported it to a NoSQL database, MongoDB. We managed to achieve this in a fairly reliable way by the end of 2015.

How we achieved all of this is a story for another time. What followed was a more interesting data problem, and it makes for a better story.

Problem

Our dataset came in from multiple brands and wasn't production-worthy. It had duplicates, typos, and extraneous information that required cleansing, and it had to be standardized and enriched to meet the requirements of the downstream systems.

The challenge at hand was Data Wrangling.

Data wrangling is the process of transforming and mapping data from one “raw” data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.

We had all this data stored in MongoDB, ready to be processed and to power the Fynd web and app platform. Data wrangling by itself is tedious. The challenge, however, was not simply doing it in real time, but also doing it right, governed by logic that only the business users are experts in.

⁠

Data Wrangling! Why?

Business Scenarios

Below are a few scenarios we went on to solve. They were dynamic in nature and needed a solution that was simple to use and flexible enough to accommodate the domain rules.

Mapping

Map raw brand sizes to Fynd’s standard sizes, based on multiple conditions.

Sizes XXXL, 3 XTRA LARGE, and 3XL are to be mapped to XXXL
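To make the mapping scenario concrete, here is a minimal Python sketch of a lookup-based size mapping. The table `STANDARD_SIZES` and the function names are hypothetical and only cover the three raw values named above; the real platform expressed such rules through its rule engine rather than hand-written code.

```python
import re
from typing import Optional

# Hypothetical lookup table: normalized raw brand size -> standard size.
# Only the raw values mentioned in the text are listed.
STANDARD_SIZES = {
    "XXXL": "XXXL",
    "3 XTRA LARGE": "XXXL",
    "3XL": "XXXL",
}


def normalize(raw_size: str) -> str:
    """Trim, collapse runs of whitespace, and upper-case a raw size."""
    return re.sub(r"\s+", " ", raw_size.strip()).upper()


def to_standard_size(raw_size: str) -> Optional[str]:
    """Return the standard size, or None when the raw value is unmapped."""
    return STANDARD_SIZES.get(normalize(raw_size))
```

Returning `None` for unmapped values, instead of guessing, keeps unknown sizes visible so a business user can add a rule for them.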

Cleansing

Remove duplicate line items, based on a grouping condition

Remove line items where price < 20 OR quantity < 1
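Both cleansing rules can be sketched in a few lines of Python. The field names (`price`, `quantity`, and the default grouping keys) are assumptions made for illustration, and "remove duplicates" is interpreted here as keeping the first line item of each group.

```python
def cleanse(items, group_keys=("brand", "product_code", "size")):
    """Drop invalid line items, then keep the first item of each group."""
    seen = set()
    cleaned = []
    for item in items:
        # Rule from the text: remove items where price < 20 OR quantity < 1.
        if item["price"] < 20 or item["quantity"] < 1:
            continue
        # Grouping condition: items sharing these key values are duplicates.
        key = tuple(item[k] for k in group_keys)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(item)
    return cleaned
```

Filtering before de-duplicating means an invalid item can never shadow a valid one from the same group.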

Transform Existing Attributes

Apply a discount of 10% on weekdays and 20% on weekends to products in the men's category, excluding those where marked price < 1000

Make the product code upper case, or trim the decimals from the price
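The discount rule is a good example of why such logic is awkward to hard-code. A rough Python sketch follows; the field names `category` and `marked_price` are hypothetical, and "exclude" is read as "leave the price unchanged".

```python
from datetime import date


def discounted_price(item, on: date) -> float:
    """Return the price of `item` on the given date under the discount rule."""
    # Products outside the men's category, or with marked price < 1000,
    # are excluded from the discount (an interpretation of the rule).
    if item["category"] != "men" or item["marked_price"] < 1000:
        return float(item["marked_price"])
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    rate = 0.20 if on.weekday() >= 5 else 0.10
    return round(item["marked_price"] * (1 - rate), 2)
```

Every constant in this function (the category, the threshold, both rates) is something a business user may want to change next week, which is exactly the problem described in the next section.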

Generate New Attributes

Calculate Discount Percentage using the Marked Price & Effective Price. (This could be any mathematical formula).

Generate Product Name by merging multiple attributes.
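As an illustration of derived attributes, the Python sketch below computes a discount percentage and builds a product name. The formula shown is the usual one, (marked - effective) / marked × 100, and the attribute names used for the product name are assumptions.

```python
def discount_percentage(marked_price: float, effective_price: float) -> float:
    """Discount as a percentage of the marked price, rounded to 2 decimals."""
    if marked_price <= 0:
        raise ValueError("marked_price must be positive")
    return round((marked_price - effective_price) / marked_price * 100, 2)


def product_name(item, fields=("brand", "colour", "article_type")) -> str:
    """Join the non-empty attributes listed in `fields` into a display name."""
    return " ".join(str(item[f]).strip() for f in fields if item.get(f))
```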

Approach

⁠

Cycle of Chaos (Change)!

These rules can be coded, but they change regularly. Moreover, each such change should involve only business users, yet in practice it involves all three of these:

Business User

Developer

Server Downtime

Most of these rules come from business users, who have a fairly good understanding of both the data and the domain. Passing the rules on to developers opens a window for misinterpretation and potential bugs, and so the cycle of change continues.

Design

With this simple realization, our aim was to design a platform that:

Is simple enough to be used by business analysts across any domain

Provides ease of data access, discovery, sharing, and notifications

Activates business rules instantly, with zero production downtime

Requires no developer involvement

Features

A Rule Engine to create business rules for Data Wrangling

Data Driven Alerts over Slack or Email

Tabular data exploration that requires no knowledge of the underlying database and handles both flat SQL and nested NoSQL data

Runtime creation of REST APIs from a query (in SQL or MongoDB), with support for parameters

REST API Scheduler

Solution

We created a suite of products, the Data Workbench, which at a high level looks like this.

⁠

Data Workbench : High Level Diagram

Here is a brief summary of the products that make up the Data Workbench, all of which expose REST APIs.

Ocellus : Business Rules Engine

For Data Discovery & Wrangling

It's a rule engine built on top of MongoDB.

Business rules prepared by business users are converted in real time to Mongo queries. The engine then updates the underlying data accordingly and sends notifications over Slack or email.
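To give a feel for the rule-to-query idea, here is a small Python sketch that turns a declarative rule into a MongoDB filter and update document. The rule format (`match`, `conditions`, `set`) is invented for this example and is not Ocellus's actual schema; only the MongoDB operators (`$and`, `$or`, `$in`, `$lt`, `$set`, and so on) are real.

```python
# Rule operators mapped to MongoDB comparison operators.
OPERATORS = {
    "<": "$lt", "<=": "$lte", ">": "$gt", ">=": "$gte", "!=": "$ne", "in": "$in",
}


def condition_to_filter(cond):
    """Translate one {'field', 'op', 'value'} condition into a Mongo clause."""
    field, op, value = cond["field"], cond["op"], cond["value"]
    if op == "==":
        return {field: value}
    return {field: {OPERATORS[op]: value}}


def rule_to_mongo(rule):
    """Return (filter, update) documents for a hypothetical rule definition."""
    clauses = [condition_to_filter(c) for c in rule["conditions"]]
    # 'any' means the conditions are OR-ed; anything else means AND.
    joiner = "$or" if rule.get("match") == "any" else "$and"
    return {joiner: clauses}, {"$set": rule["set"]}


size_rule = {
    "match": "all",
    "conditions": [
        {"field": "raw_size", "op": "in", "value": ["XXXL", "3 XTRA LARGE", "3XL"]},
    ],
    "set": {"standard_size": "XXXL"},
}
query_filter, update = rule_to_mongo(size_rule)
```

With a driver such as PyMongo, the two documents could then be passed to `collection.update_many(query_filter, update)`. Because the rule is plain data, changing it needs no deployment, which is what makes instant activation possible.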

⁠

Ocellus : Workflow

Hydra : SQL / Mongo Query based API Engine

Pass in a database query (SQL or Mongo) to generate an API, with support for parameters.

Add JavaScript code to transform the parameters or query results, and save a version of the response to create point-in-time snapshots (data versioning).
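The query-to-API idea can be sketched without a web framework. In the Python example below, `QueryApi` is a hypothetical registry in which each registered query becomes a callable endpoint. SQLite stands in for the real database, and a Python callable stands in for the JavaScript transformation hook; data versioning is left out.

```python
import sqlite3


class QueryApi:
    """Registry that turns parameterized SQL queries into callable endpoints."""

    def __init__(self, conn):
        self.conn = conn
        self.endpoints = {}

    def register(self, name, sql, transform=None):
        # `transform` stands in for the optional result-transformation hook.
        self.endpoints[name] = (sql, transform)

    def call(self, name, **params):
        sql, transform = self.endpoints[name]
        # Named placeholders (:brand) are bound by the driver, so parameter
        # values are never spliced into the SQL text.
        cursor = self.conn.execute(sql, params)
        columns = [col[0] for col in cursor.description]
        rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
        return transform(rows) if transform else rows


# Usage with an in-memory database and made-up inventory rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (brand TEXT, sku TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [("acme", "A1", 4), ("acme", "A2", 0), ("zeta", "Z1", 7)],
)
api = QueryApi(conn)
api.register(
    "in_stock",
    "SELECT sku, quantity FROM inventory WHERE brand = :brand AND quantity > 0",
)
rows = api.call("in_stock", brand="acme")
```

In a real service, `call` would sit behind an HTTP route and the keyword arguments would come from the request's query string.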

⁠

Hydra : Workflow

Mercury: Attribute Transformation Engine

Ocellus is limited by the support that MongoDB provides. Mercury was built to overcome this limitation.

It allows the end user to write a JavaScript snippet that will be used to transform the attribute values.
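The shape of such a transformation can be sketched as follows. Mercury itself runs user-written JavaScript; in this Python stand-in, plain functions play the role of the snippets, and `apply_transforms` is a hypothetical helper.

```python
def apply_transforms(record, transforms):
    """Return a copy of `record` with each listed attribute transformed."""
    out = dict(record)
    for attribute, fn in transforms.items():
        # Attributes missing from the record are skipped, not created.
        if attribute in out:
            out[attribute] = fn(out[attribute])
    return out


# Per-attribute snippets: upper-case the code, trim decimals from the price.
transforms = {
    "product_code": str.upper,
    "price": int,
}
item = apply_transforms({"product_code": "ab-102", "price": 1499.99}, transforms)
```

Keeping one small function per attribute mirrors the idea of a snippet that only ever sees the value it is meant to transform.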

⁠

Mercury : Workflow

Looper : REST API Scheduler

It's a REST API scheduler that lets you schedule the execution of an API at pre-defined intervals, and it borrows most of the niceties from Hydra (data versioning, JavaScript-based result transformation).

⁠

Looper : Workflow

Portus : JSON Schema Transformation

Portus is a service that transforms an incoming JSON payload from one schema to another, driven by a specification passed in at runtime.

It's a web service built on top of the open source library JOLT.
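The following Python sketch imitates, in a much simplified form, the idea behind a JOLT "shift" operation: the specification mirrors the input structure, and each leaf names the dotted output path the value should move to. This is not JOLT itself and supports none of its wildcards or other operations; the payload and spec are made up.

```python
def shift(payload, spec):
    """Move values from `payload` to the output paths named in `spec`."""
    out = {}

    def put(path, value):
        # Create intermediate objects along the dotted path as needed.
        node = out
        parts = path.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value

    def walk(node, spec_node):
        if not isinstance(node, dict):
            return
        for key, target in spec_node.items():
            if key not in node:
                continue
            if isinstance(target, dict):
                walk(node[key], target)  # descend into nested input
            else:
                put(target, node[key])  # leaf: target is an output path

    walk(payload, spec)
    return out


payload = {"sku": "A1", "price": {"mrp": 1000, "sale": 800}}
spec = {
    "sku": "product.code",
    "price": {"mrp": "product.marked_price", "sale": "product.effective_price"},
}
result = shift(payload, spec)
```

Because the specification is itself just JSON, a new target schema can be supported by sending a different spec, without redeploying the service.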

⁠

Portus : Workflow

Conclusion

Each of these products was built specifically to solve a problem at hand, and that is the most important lesson for us. As a developer, you have to be aware of the problems your business users are facing and create a product roadmap around them. Also, be lazy: the aim with these products was to empower business users and make them the owners of the data, which in turn frees us, the developers, to move on to the next problem.

Sounds interesting? Join us if this looks promising to you.

⁠
