Dataset: GA4 clickstream
Scope: D2C India website (web funnel analytics)
Expected effort: ~6-8 hours
Deadline: 18th Jan
Submission method: Link-only (no attachments)
Background
You’ll work with GA4 clickstream data captured on our India D2C website. The dataset includes event-level touchpoints from session start through purchase. Your goal is to (1) build a clean, reusable funnel model and (2) produce an MBR-style narrative: what happened, why it happened, and what to do next.
1) Dataset structure (what you’re getting)
Files
A ZIP containing a folder of Parquet files with one month of event-level data captured by GA4 on the website.
Download
Base event columns (high level)
user_id (GA4 pseudo user identifier)
session_id (session identifier)
page_type (categorized page type, if available)
event_params (GA4-style nested list of key/value pairs)
event_params (nested GA4 params)
event_params is a list/array of key/value objects (not a flat dict). You will typically need to flatten/unnest it into columns to use it.
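A minimal sketch of the flattening step, assuming each item is a dict with key and value entries, where value is either a plain scalar or a GA4-export-style struct of typed fields (string_value, int_value, float_value, double_value). The actual shape in the parquet files may differ, so inspect a few rows first. The helper names and the sample event are illustrative, not part of the dataset.

```python
# Typed fields as used by the GA4 BigQuery export; assumed here, verify against the files.
TYPED_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def param_value(value):
    """Return a scalar from a plain value or from a GA4-style typed struct."""
    if isinstance(value, dict):
        for field in TYPED_FIELDS:
            if value.get(field) is not None:
                return value[field]
        return None
    return value


def flatten_params(params, keys):
    """Flatten one event's list of {key, value} items into a dict keyed by `keys`.

    Keys that are absent stay None; if a key repeats, the first non-null value wins.
    """
    flat = dict.fromkeys(keys)
    if params is None or isinstance(params, float):  # null or NaN cell
        return flat
    for item in params:
        key = item.get("key")
        if key in flat and flat[key] is None:
            flat[key] = param_value(item.get("value"))
    return flat


# Hypothetical sample event mixing both value shapes.
sample_params = [
    {"key": "source", "value": {"string_value": "google", "int_value": None}},
    {"key": "medium", "value": "cpc"},
    {"key": "value", "value": {"string_value": None, "double_value": 499.0}},
]
flat = flatten_params(sample_params, ["source", "medium", "campaign", "value"])
print(flat)
```

With pandas, the same helper can be applied row by row, for example by building a DataFrame from `[flatten_params(p, keys) for p in events["event_params"]]` with the event table's index and joining it back.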
Common keys you may see (not exhaustive):
Attribution: source, medium, campaign, term, gclid (often on session_start)
Commerce: transaction_id, value, currency, payment_type, coupon, shipping, tax, discount (often on purchase)
Funnel events (relevant set)
Session start
Product view (choose definition)
Add to cart
Checkout start
gokwik_checkout_initiated
Checkout progression
Purchase
Not every journey is linear; events can be missing or out of order. Handling this sensibly (and calling out QA findings) is part of the assignment.
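One way to surface non-linear journeys is to take the first timestamp of each step per session and flag sessions where a later step appears without, or before, the step ahead of it. The sketch below assumes a flat event table with session_id, event_name and a datetime event_timestamp column; event_name, event_timestamp and the purchase event name are assumptions, and the function names are hypothetical.

```python
import pandas as pd

# Funnel order and event-name variants. Names come from this brief where it
# gives them; `purchase` is an assumed GA4 event name.
STEP_EVENTS = {
    "product_view": ["view_item", "view_product_page_loaded"],
    "add_to_cart": ["add_to_cart", "add_to_cart_custom_event"],
    "begin_checkout": ["begin_checkout", "gokwik_checkout_initiated"],
    "purchase": ["purchase"],
}
STEP_ORDER = list(STEP_EVENTS)


def first_step_times(events):
    """One row per session with any funnel event; one column per step (first timestamp or NaT)."""
    event_to_step = {name: step for step, names in STEP_EVENTS.items() for name in names}
    steps = events.assign(step=events["event_name"].map(event_to_step)).dropna(subset=["step"])
    firsts = steps.groupby(["session_id", "step"])["event_timestamp"].min().unstack("step")
    for step in STEP_ORDER:
        if step not in firsts.columns:
            firsts[step] = pd.NaT  # step never fired in this extract
    return firsts[STEP_ORDER]


def flag_nonlinear(firsts):
    """Flag sessions where a later step has no earlier step, or precedes it."""
    flags = pd.DataFrame(index=firsts.index)
    for earlier, later in zip(STEP_ORDER, STEP_ORDER[1:]):
        has_later = firsts[later].notna()
        flags[f"{later}_without_{earlier}"] = has_later & firsts[earlier].isna()
        flags[f"{later}_before_{earlier}"] = has_later & (firsts[later] < firsts[earlier])
    return flags


# Hypothetical sample: s1 is linear, s2 purchases without checkout, s3 is out of order.
rows = [
    ("s1", "view_item", "2024-01-01 10:00"),
    ("s1", "add_to_cart", "2024-01-01 10:01"),
    ("s1", "begin_checkout", "2024-01-01 10:02"),
    ("s1", "purchase", "2024-01-01 10:03"),
    ("s2", "view_item", "2024-01-01 11:00"),
    ("s2", "add_to_cart_custom_event", "2024-01-01 11:01"),
    ("s2", "purchase", "2024-01-01 11:05"),
    ("s3", "add_to_cart", "2024-01-01 12:00"),
    ("s3", "view_product_page_loaded", "2024-01-01 12:01"),
]
events = pd.DataFrame(rows, columns=["session_id", "event_name", "event_timestamp"])
events["event_timestamp"] = pd.to_datetime(events["event_timestamp"])

firsts = first_step_times(events)
flags = flag_nonlinear(firsts)
print(flags.sum())                 # sessions affected per check
print(flags[flags.any(axis=1)])    # the sessions worth a closer look
```

Whether a flagged session is kept, repaired or excluded is a modelling decision; the counts per check are the kind of QA finding worth reporting either way.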
2) Your task
Part A — Build a reusable funnel model (session-level)
Create a session-level table/dataset session_funnel across the full month.
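A minimal sketch of how such a table could be assembled with pandas, assuming event_params has already been flattened into columns. The columns event_name, event_timestamp, page_location, transaction_id and value, the purchase event name, and the output names (reached_<step>, first_<step>_ts) are assumptions made for illustration; the deduping rule shown is one option, not the required one.

```python
import pandas as pd

# Step -> event-name variants (OR logic). `purchase` is an assumed event name.
STEP_EVENTS = {
    "product_view": ["view_item", "view_product_page_loaded"],
    "add_to_cart": ["add_to_cart", "add_to_cart_custom_event"],
    "begin_checkout": ["begin_checkout", "gokwik_checkout_initiated"],
    "purchase": ["purchase"],
}


def build_session_funnel(events):
    """Collapse a flattened event table to one row per session."""
    events = events.sort_values(["session_id", "event_timestamp"])
    by_session = events.groupby("session_id")
    funnel = by_session.agg(user_id=("user_id", "first"))

    # Step flags and first timestamps; a step counts if any variant fired.
    for step, names in STEP_EVENTS.items():
        hits = events[events["event_name"].isin(names)]
        funnel[f"first_{step}_ts"] = hits.groupby("session_id")["event_timestamp"].min()
        funnel[f"reached_{step}"] = funnel[f"first_{step}_ts"].notna()

    # Attribution: first non-null value per column across session_start events.
    starts = events[events["event_name"] == "session_start"]
    dims = starts.groupby("session_id")[["source", "medium", "campaign"]].first()
    funnel = funnel.join(dims)

    # Landing page: first non-null page_location in timestamp order.
    funnel["landing_page"] = by_session["page_location"].first()

    # Orders and revenue: keep the earliest purchase event per transaction_id.
    # Purchases with a null transaction_id are excluded here; count them in QA.
    purchases = (
        events[events["event_name"] == "purchase"]
        .dropna(subset=["transaction_id"])
        .sort_values("event_timestamp")
        .drop_duplicates(subset="transaction_id")
    )
    totals = purchases.groupby("session_id").agg(
        orders=("transaction_id", "nunique"), revenue=("value", "sum")
    )
    funnel = funnel.join(totals)
    funnel["orders"] = funnel["orders"].fillna(0).astype(int)
    funnel["revenue"] = funnel["revenue"].fillna(0.0)
    return funnel.reset_index()


# Hypothetical sample: s1 purchases (with a duplicated purchase event), s2 only views.
cols = ["session_id", "user_id", "event_name", "event_timestamp",
        "source", "medium", "campaign", "page_location", "transaction_id", "value"]
rows = [
    ("s1", "u1", "session_start", "2024-01-01 10:00", "google", "cpc", "brand", "https://example.com/", None, None),
    ("s1", "u1", "view_item", "2024-01-01 10:01", None, None, None, "https://example.com/p/a", None, None),
    ("s1", "u1", "add_to_cart", "2024-01-01 10:02", None, None, None, "https://example.com/p/a", None, None),
    ("s1", "u1", "gokwik_checkout_initiated", "2024-01-01 10:03", None, None, None, None, None, None),
    ("s1", "u1", "purchase", "2024-01-01 10:04", None, None, None, None, "T1", 500.0),
    ("s1", "u1", "purchase", "2024-01-01 10:05", None, None, None, None, "T1", 500.0),
    ("s2", "u2", "session_start", "2024-01-01 11:00", None, None, None, "https://example.com/blog", None, None),
    ("s2", "u2", "view_product_page_loaded", "2024-01-01 11:01", None, None, None, "https://example.com/p/b", None, None),
]
events = pd.DataFrame(rows, columns=cols)
events["event_timestamp"] = pd.to_datetime(events["event_timestamp"])

session_funnel = build_session_funnel(events)
print(session_funnel[["session_id", "reached_purchase", "orders", "revenue"]])
```

Attributing a duplicated transaction_id to its earliest purchase event is a design choice; whichever rule you use, state it and report how many rows it removed.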
Required fields
Identifiers
Dimensions
source, medium, campaign (from session_start params; null/unknown allowed)
landing_page (first page_location in the session)
Funnel steps
For each step, include:
first timestamp (optional but encouraged)
Steps (use OR logic where there are variants):
product view (view_item OR view_product_page_loaded)
add to cart (add_to_cart OR add_to_cart_custom_event)
begin checkout (begin_checkout OR gokwik_checkout_initiated)
Purchase outputs
orders (distinct transaction_id preferred; state your approach)
revenue (sum of purchase value, with deduping logic as needed)
Data QA (must include)
A short QA section with checks like:
duplicate purchases / duplicate transaction_id
sessions with purchase but no checkout events
null spikes in source/medium
any other anomalies you notice
Part B — MBR-style analysis
Write a concise memo + supporting analysis.
Required analyses
product view rate, add-to-cart rate, checkout-start rate
purchase CVR (session → purchase)
overall step conversion rates
break down by at least two cuts, e.g.:
landing page group / page_type
Driver diagnosis (surgical)
Compare: First half vs second half or
Explain what moved and why:
traffic mix shifts (source/medium/device/market)
step conversion changes (view→ATC, ATC→checkout, checkout→purchase)
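One common way to separate the two drivers is a midpoint mix/rate decomposition: the change in an overall rate between two periods is split per segment into a mix effect (share changed) and a rate effect (conversion changed), and the two sum exactly to the total change. The sketch below is one possible approach, not a prescribed method; the column names period, medium and purchased, the H1/H2 labels and the sample numbers are all hypothetical.

```python
import pandas as pd


def mix_rate_decomposition(funnel, segment, converted, period="period", periods=("H1", "H2")):
    """Split the change in an overall conversion rate into mix and rate effects.

    Midpoint weights are used, so mix_effect + rate_effect summed over segments
    equals (overall rate in periods[1]) - (overall rate in periods[0]).
    """
    data = funnel[[period, segment]].assign(conv=funnel[converted].astype(float))
    stats = data.groupby([period, segment])["conv"].agg(n="count", rate="mean")

    # Share of sessions per segment within each period.
    share = stats["n"] / stats["n"].groupby(level=period).transform("sum")
    share = share.unstack(period).fillna(0.0)
    # A segment absent from one period gets share 0 and rate 0 there.
    rate = stats["rate"].unstack(period).fillna(0.0)

    first, second = periods
    out = pd.DataFrame({
        "mix_effect": (share[second] - share[first]) * (rate[first] + rate[second]) / 2,
        "rate_effect": (rate[second] - rate[first]) * (share[first] + share[second]) / 2,
    })
    out["total"] = out["mix_effect"] + out["rate_effect"]
    return out


def make_sessions(period, medium, n, conversions):
    """Hypothetical helper: n sessions in a segment, `conversions` of them purchasing."""
    return pd.DataFrame({
        "period": period,
        "medium": medium,
        "purchased": [True] * conversions + [False] * (n - conversions),
    })


# Hypothetical data: per-segment CVR is flat, but traffic shifts toward cpc.
funnel = pd.concat(
    [
        make_sessions("H1", "organic", 100, 10),
        make_sessions("H1", "cpc", 100, 2),
        make_sessions("H2", "organic", 50, 5),
        make_sessions("H2", "cpc", 150, 3),
    ],
    ignore_index=True,
)
result = mix_rate_decomposition(funnel, segment="medium", converted="purchased")
print(result)
print("total change in CVR:", result["total"].sum())
```

In this sample the overall CVR falls from 6% to 4% with no change in either segment's rate, so the whole move lands in mix_effect. The same function can be pointed at a step conversion by filtering to sessions that reached the earlier step and passing the later step's flag as converted. Fill null segments (for example with "unknown") beforehand, since groupby drops null keys by default, and state where you cut the month.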
3 recommendations
Prioritized actions (product/growth/tracking), with:
expected impact (direction + where it helps)
how you’d validate (experiment, tracking check, next analysis)
Output format
Supporting: 3–5 charts/tables (can be in notebook)
3) Submission instructions (Link-only)
Please submit a single link to a folder that contains your materials.
Allowed: Google Drive / OneDrive / Dropbox / Notion / GitHub repo (public or private with access granted)
Not allowed: email attachments
Folder contents (required)
README (how to run + assumptions + QA checklist)
Notebook (.ipynb) or SQL scripts + run instructions
Outputs (session_funnel as csv)
Naming convention (important)
Name the folder:
KeralaAyurveda_ProductAnalyst_<YourName>
Link permissions
Make sure the link is accessible (viewer access at minimum)
If it’s a private repo/folder, grant access to: nishant@keralaayurveda.biz
What to include in your email response (copy/paste template)
Subject: Product Analyst Assignment Submission —
Body:
Any assumptions / known limitations (3–5 bullets)
4) Evaluation criteria
Accuracy & definitions (30%)
Driver identification (25%)
Business usefulness (20%)
Pragmatism in chaos (10%)