Amazon Kinesis

Kinesis Client Library (KCL)

The Kinesis Client Library (KCL) is a Java library for reading records from a Kinesis data stream in distributed applications. It lets multiple consumers share the read workload.

Purpose and Functionality

Abstraction Layer:
Unlike the low-level Kinesis Data Streams API available in the AWS SDKs, the KCL is an abstraction layer tailored to the consumer role, so application code can focus on processing records.
Record Processing Management:
Handles tasks such as connecting to the stream, enumerating shards, and coordinating shard associations with other workers.
Instantiates a record processor for each shard it manages, facilitating the processing of data records.
Pulls data records from the stream and forwards them to the corresponding record processors for further handling.
Checkpointing:
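Tracks the progress of processed records and checkpoints it in DynamoDB, so a restarted or replacement worker resumes from the last checkpoint rather than from the start of the shard.
Records handled after the last checkpoint can be delivered again following a failure, so processing is at-least-once rather than exactly-once.
Supports reliable record processing and fault tolerance by maintaining the processing state.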
Dynamic Shard-Worker Association:
Balances shard-worker associations dynamically, adjusting the distribution of workload among multiple consumers.
Ensures that, at any given time, each shard is processed by exactly one KCL worker and has one corresponding record processor.
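
To make checkpoint-based recovery concrete, here is a minimal sketch in plain Python. It does not use the KCL API: `CheckpointStore` is a hypothetical in-memory stand-in for the DynamoDB table, and `process_shard`, `checkpoint_every`, and `crash_before` are illustrative names, not KCL parameters. The sketch assumes sequence numbers are increasing integers.

```python
class CheckpointStore:
    """Hypothetical in-memory stand-in for the DynamoDB table the KCL uses."""

    def __init__(self):
        self._last = {}  # shard_id -> last checkpointed sequence number

    def get(self, shard_id):
        return self._last.get(shard_id, -1)  # -1 means nothing checkpointed yet

    def put(self, shard_id, seq):
        self._last[shard_id] = seq


def process_shard(shard_id, records, store, handler,
                  checkpoint_every=2, crash_before=None):
    """Handle records newer than the last checkpoint, checkpointing every N records."""
    resume_after = store.get(shard_id)
    pending = 0       # records handled since the last checkpoint
    last_seq = None
    for seq, data in records:
        if seq <= resume_after:
            continue  # already covered by a checkpoint
        if seq == crash_before:
            raise RuntimeError("simulated worker crash")
        handler(seq, data)
        last_seq = seq
        pending += 1
        if pending == checkpoint_every:
            store.put(shard_id, seq)
            pending = 0
    if pending:
        store.put(shard_id, last_seq)  # final checkpoint for the tail


records = [(seq, f"event-{seq}") for seq in range(5)]
store = CheckpointStore()
seen = []  # sequence numbers in the order the handler saw them

try:
    # The first worker handles records 0-2, checkpoints at 1, then crashes.
    process_shard("shard-0", records, store,
                  lambda seq, data: seen.append(seq), crash_before=3)
except RuntimeError:
    pass

# A replacement worker resumes after the last checkpoint (record 1).
process_shard("shard-0", records, store, lambda seq, data: seen.append(seq))
```

In this run, record 2 reaches the handler twice, because the first worker handled it but crashed before checkpointing it. This is one reason record handlers are usually written to tolerate duplicates.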

Scaling and Load Balancing

Number of Consumers and Instances:
The KCL manages the number of record processors relative to the number of shards and consumers.
With a single consumer, all record processors are created within a single instance.
Each shard is processed by one KCL worker through one corresponding record processor, so the total number of record processors matches the number of shards.
Scaling Out Consumers:
The number of instances should generally not exceed the number of shards, except to keep standby instances available for failover.
Each shard is read by only one KCL instance at a time, but one instance can process multiple shards.
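
The shard-to-worker relationship above can be sketched with a simple round-robin assignment. This only illustrates the invariant (every shard has exactly one owner, and a worker may own several shards); it is not the KCL's actual balancing algorithm, and `assign_shards` and the worker names are hypothetical.

```python
def assign_shards(shard_ids, worker_ids):
    """Spread shards round-robin so each shard has exactly one owning worker."""
    assignment = {worker: [] for worker in worker_ids}
    if not worker_ids:
        return assignment
    for index, shard in enumerate(sorted(shard_ids)):
        owner = worker_ids[index % len(worker_ids)]
        assignment[owner].append(shard)
    return assignment


shards = [f"shard-{n}" for n in range(5)]

# Three workers share five shards.
before = assign_shards(shards, ["worker-a", "worker-b", "worker-c"])

# worker-c fails; its shard is redistributed among the survivors.
after = assign_shards(shards, ["worker-a", "worker-b"])
```

After the simulated failure, all five shards are still owned, each by exactly one of the two remaining workers.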

Deployment Options

Supported Environments:
KCL applications can run on Amazon EC2, AWS Elastic Beanstalk, or on-premises servers.

Record Processing

Ordering and Consistency:
Records are read in order within each shard. Ordering is not guaranteed across different shards.
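
As a rough illustration of shard-level ordering, the sketch below merges records from two shards in an arbitrary interleaving, as a consumer application might observe them, while each shard's own order is preserved. The `interleave` helper, the shard names, and the sequence numbers are made up for the example.

```python
import random


def interleave(shards, seed=0):
    """Merge per-shard record lists in arbitrary order, keeping each shard's own order."""
    rng = random.Random(seed)
    cursors = {shard_id: 0 for shard_id in shards}
    merged = []
    while True:
        # Shards that still have unread records.
        open_shards = [sid for sid, pos in cursors.items() if pos < len(shards[sid])]
        if not open_shards:
            break
        sid = rng.choice(open_shards)
        merged.append((sid, shards[sid][cursors[sid]]))
        cursors[sid] += 1
    return merged


shards = {"shard-0": [100, 101, 102], "shard-1": [200, 201]}
merged = interleave(shards)

# Filtering the merged view by shard recovers that shard's original order.
per_shard = {sid: [seq for s, seq in merged if s == sid] for sid in shards}
```

Whatever the interleaving, `per_shard` matches the original per-shard lists, which is the only ordering a consumer can rely on.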

Example

For example, if a stream has four shards, at most four KCL instances can actively process data at the same time; any additional instances stay idle as standbys.
Similarly, if a stream has six shards, at most six KCL instances can actively process data.
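
The arithmetic behind this example is small enough to write down. The helper below is a hypothetical illustration, assuming that an instance without a shard simply waits as a standby.

```python
def active_and_idle(num_shards, num_instances):
    """Return (instances actively reading shards, instances left idle)."""
    active = min(num_shards, num_instances)
    idle = num_instances - active
    return active, idle


four_by_four = active_and_idle(4, 4)  # every instance owns one shard
four_by_six = active_and_idle(4, 6)   # two instances have nothing to read
six_by_four = active_and_idle(6, 4)   # all busy; some instances own two shards
```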