Interfacing between PVR and backend services of Jello

The problem

The PVR dataset is currently consumed by Accounting for:
Performing analytics using the dataset
Generating a “mec” dataset and tagging specific PVR partitions as “mec”.

The dataset is also used (and will be used) by Jello for:
Generating residual packages from specific PVR partitions, after applying adjustments and advances to the PVR data for TOWGE markets.
Surfacing residual information in the external UI.

The challenge is that the PVR data is retained for 90 days only, while the MEC data is retained effectively indefinitely (10 years). The PVR partition used to generate residual packages can differ from the MEC partition, and it needs to be retained indefinitely once the package is published/retracted.
In the current state of the system, the PVR partition used to publish packages (the last one being 2021-02-08) will expire after 90 days, resulting in the source of truth being cleaned up.
On top of this, net overclaim reports will be generated in parallel with the PVR dataset, and these are planned to be exposed for access and download via .
Hence, we need a consistent storage and retrieval strategy that both Accounting and Jello can interface with to create views and visualizations on top of the generated datasets. Ideally, the workflows should also be tied to users’ actions on the internal residual admin portal.
Proposed Solution 1 - Create a new partition analogous to “mec” for publishing PVR data pertaining to published residual packages.

As part of this solution, Jello’s backend components will write the partition information for published residual packages to a new table similar to the mec_dates table. The job responsible for creating the “mec” datasets with infinite retention will also create a new dataset covering the PVR data of published packages.
The “mec” and PVR datasets corresponding to published residual packages will be in one place for the accountants to consume.
A bidirectional dependency is created between Jello and TPA, which have to interface via the new table to track the partitions that have been published.
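As a rough sketch of how the tracking table in Solution 1 could behave, the snippet below models it in memory. All names here (PublishedPvrDates, record_published_partition, partitions_to_retain) are assumptions for illustration, not the real schema or API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PublishedPvrDates:
    """In-memory stand-in for the new table (analogous to mec_dates)
    tracking PVR partitions that back published residual packages."""
    partitions: set = field(default_factory=set)

    def record_published_partition(self, partition_date: date) -> None:
        # Called by Jello's backend when a residual package is published.
        self.partitions.add(partition_date)

    def partitions_to_retain(self) -> list:
        # Consumed by the TPA job that copies these PVR partitions
        # into a dataset with indefinite retention.
        return sorted(self.partitions)


table = PublishedPvrDates()
table.record_published_partition(date(2021, 2, 8))
print(table.partitions_to_retain())  # [datetime.date(2021, 2, 8)]
```

The key property is that Jello only writes partition dates, and the TPA retention job only reads them, so the coupling is limited to this one table.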
Proposed Solution 2 - Create a copy of the data within Jello for the published residual packages

In this proposed approach, the residual package generator pulls and stores all the data needed to generate the residual packages in Postgres. This data is removed when the packages are deleted, but retained permanently once the packages are published.
Avoids duplication of data into a new BigQuery dataset just for the sake of retaining it.
No bidirectional dependencies created between Jello and TPA. Jello will remain one of the consumers of the PVR dataset as before.

For accountants to consume the PVR data published via residual packages after 90 days in analytical tools like Tableau, they would either have to connect to the Postgres read replica, or the data would have to be copied back to BigQuery.
The “mec” data and the published package data will be decoupled, making them hard to aggregate and analyze together.
Track-level overclaims will have to be implemented using a similar architecture, and the reports will have to be copied to a different location for indefinite retention.
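The lifecycle described in Solution 2 can be sketched as below: snapshot on generation, drop on delete, retain once published. The class and method names (PackageStore, snapshot, publish, delete) are illustrative assumptions, not Jello's actual interfaces.

```python
class PackageStore:
    """Toy model of Jello's Postgres-side copy of PVR data
    backing residual packages."""

    def __init__(self):
        self._snapshots = {}   # package_id -> PVR rows the package was built from
        self._published = set()

    def snapshot(self, package_id, pvr_rows):
        # Copy the PVR data needed to generate the package.
        self._snapshots[package_id] = list(pvr_rows)

    def publish(self, package_id):
        # Published packages keep their snapshot permanently.
        self._published.add(package_id)

    def delete(self, package_id):
        # Deleting an unpublished package removes its snapshot;
        # published snapshots are retained as the source of truth.
        if package_id not in self._published:
            self._snapshots.pop(package_id, None)


store = PackageStore()
store.snapshot("pkg-1", [{"market": "TOWGE", "amount": 100}])
store.snapshot("pkg-2", [{"market": "TOWGE", "amount": 50}])
store.publish("pkg-1")
store.delete("pkg-1")   # no-op: pkg-1 is published, snapshot survives
store.delete("pkg-2")   # pkg-2 was never published, snapshot is dropped
```

This captures why the approach avoids the 90-day expiry: the retained copy lives with the package, outside the PVR dataset's retention policy.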
Proposed Solution 3 - Interface between Jello and TPA to support on publish workflow

This approach proposes the creation of “interfaces” which are a backend service on the Jello side and data workflow(s) on TPA side. On package publish, the backend service on Jello side will trigger the data workflow(s) to:
Create copies of datasets corresponding to published packages with infinite retention period.
Create a copy of the net over-claims reports with infinite retention period.
Possibly even control the generation and download of overclaim reports based on certain filters (specific countries, products, impact) from the external/internal UI.

The workflows will be triggered based on user actions.

A bidirectional dependency will be created between the two squads, but ideally it can be kept loose, e.g. by interfacing via Pub/Sub.
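One way the loose Pub/Sub coupling could look is sketched below: Jello emits an event on package publish, and a TPA-side subscriber decides which data workflows to trigger. The event name, field names, and workflow names are all hypothetical, not an agreed contract.

```python
import json


def build_publish_event(package_id, pvr_partition, markets):
    # Emitted by Jello's backend service when a residual package is published.
    return json.dumps({
        "event": "residual_package_published",
        "package_id": package_id,
        "pvr_partition": pvr_partition,  # partition to copy with indefinite retention
        "markets": markets,
    })


def handle_event(message):
    # TPA-side subscriber: decode the message and decide which
    # data workflows to kick off.
    event = json.loads(message)
    workflows = []
    if event["event"] == "residual_package_published":
        workflows.append(("retain_pvr_partition", event["pvr_partition"]))
        workflows.append(("retain_overclaim_report", event["package_id"]))
    return workflows


msg = build_publish_event("pkg-42", "2021-02-08", ["TOWGE"])
print(handle_event(msg))
```

Because the two sides only share the message schema, neither squad needs to call into the other's services directly.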

Here’s what it could look like as a diagram:
Not a lot of thought has been put into how the track level over claims will be surfaced to the customers.
Will it be from Spotify for Licensors?
Will it be via FTP?
Do we surface all of the information?
Do we surface a portion of the reports?
