Skip to content
Gallery
Streaming in Prefect: contributor HQ
Share
Explore

icon picker
Design

Goals

Allow Prefect to act a a streaming event consumer
Integrate out-of-the-box AT LEAST with Apache Kafka, but support arbitrary events
Provide introspection to individual events without too much hassle
Not break Cloud (under the assumption people are more likely to send more API hits, though technically now they can send a pretty arbitrary amount esp with mapping)

What is actually ‘listening’ for the next event?

is it a flow?
is it a separate process (ex “the EventClock” from PIN 14?)

Idea 1: it is a flow

The FlowRunner for a streaming type flow loops in task generation forever. The “endless flow” model.
today, once a flowrunner is instantiated, it loops one time over the tasks in a flow to submit TaskRunners for them, and then exits when it collects the states for them all
instead:
The flow somehow knows it is of ‘streaming type’ and behaves differently:
a while loop in the middle of that that calls ""some user defined thing"" that produces the events/holds the event cursor
instead of `for task in tasks` and run it, instead do for something more like `event in event generator`, THEN task in tasks
For ‘streaming type’ flows, how can we guarantee there will always be a flow run of this run running? (Lazarus?)
Where is the cursor stored? Where do we store the state of the cursor in case the flow goes down?
Can we triangulate tasks with an event id, if we do that in a way that syncs with the event input, for introspection?
How does this look in the UI?
If an event occurs in the world and no prefect worker is there to hear it, what happens?

Idea 2:

proposal:
PIN 14, but we will microbatch with a config sliding window, add a delay to not wait for a batch
essentially now we have a stateful actor because they are making the decision of when to make a batch (so where is this being stored, and what if it dies?)
what if listen just picks backed up on missed events
related: kubernetes controllers that you check what you've missed in the downtime before moving to new things
list() and a watch() API
on startup you list them, somehow you see what you already processed
then look for changes to the bucket concurrently

Idea 3:

something about streamz
potential research route: can we get Prefect into streamz? with a really slim flow
Want to print your doc?
This is not the way.
Try clicking the ⋯ next to your doc name or using a keyboard shortcut (
CtrlP
) instead.