Amazon Kinesis Data Streams is a service for processing and analyzing streaming data in real time. It provides a scalable, reliable platform for building custom applications tailored to specific needs.
Key Features
Real-Time Data Processing:
Enables real-time processing of streaming big data, allowing for rapid ingestion and continuous processing of data as it arrives.
Useful for applications requiring real-time analytics, accelerated log and data feed intake, and complex stream processing.
Scalability and Resilience:
Uses shards as the base throughput unit, with each shard providing 1 MB/sec of data input and 2 MB/sec of data output.
Supports scaling by adjusting the number of shards in the stream to accommodate changes in data flow rates.
Each shard accepts up to 1,000 PUT records per second, so write capacity is bounded by record count as well as by data volume.
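The per-shard limits above translate into a simple sizing calculation. The sketch below is illustrative only: the function name `shards_needed` is hypothetical, and it assumes 1 MB means 10^6 bytes. It takes the largest shard count required by any one of the three limits.

```python
import math

# Per-shard limits quoted above. Treating 1 MB as 10**6 bytes is an assumption.
SHARD_WRITE_BYTES_PER_SEC = 1_000_000
SHARD_READ_BYTES_PER_SEC = 2_000_000
SHARD_WRITE_RECORDS_PER_SEC = 1000


def shards_needed(write_bytes_per_sec, read_bytes_per_sec, records_per_sec):
    """Return the smallest shard count that satisfies all three per-shard limits."""
    by_write = math.ceil(write_bytes_per_sec / SHARD_WRITE_BYTES_PER_SEC)
    by_read = math.ceil(read_bytes_per_sec / SHARD_READ_BYTES_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write, by_read, by_records)


if __name__ == "__main__":
    # 3.5 MB/sec in, 7 MB/sec out, 2,500 records/sec
    print(shards_needed(3_500_000, 7_000_000, 2500))
```

Small records can make the record-count limit the binding one: 4,200 records per second needs five shards even if the data volume would fit in one.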
Custom Applications:
Enables developers to build custom applications to process or analyze streaming data for specialized needs.
Applications can be deployed on EC2 instances to consume data from Kinesis Data Streams and perform real-time processing.
Integration with AWS Services:
Allows consumers to store processed data using various AWS services such as Amazon DynamoDB, Amazon Redshift, or Amazon S3 for further analysis or long-term storage.
Use Cases
Real-Time Analytics: Perform real-time analysis of streaming data to gain insights and take immediate actions based on the results.
Accelerated Log and Data Feeds: Ingest and process logs and data feeds in real time for monitoring, troubleshooting, and alerting.
Real-Time Metrics and Reporting: Monitor and analyze metrics and generate real-time reports for operational and business insights.
Complex Stream Processing: Implement complex stream processing logic to filter, transform, and aggregate streaming data for various applications.
Key Components
Producers:
Generate and push data records into Kinesis Data Streams.
Can write records using the Kinesis Data Streams API, the Kinesis Producer Library (KPL), or the Kinesis Agent.
Shards:
Basic throughput units of a Kinesis data stream.
Handle data ingestion and delivery, with each shard providing a fixed capacity of 1 MB/sec for data input and 2 MB/sec for data output.
Records:
Units of data stored in a Kinesis data stream.
Composed of a sequence number, partition key, and data blob representing the actual data payload.
Streams:
Composed of one or more shards, with the total capacity of the stream determined by the sum of the capacities of its shards.
Supports resharding operations to dynamically adjust the number of shards based on changing data flow requirements.
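To make the producer and record components concrete, here is a minimal sketch of a producer writing one record through the API using boto3's `put_record` call. The stream name `example-stream`, the `device_id` field, and the helper names are hypothetical. The producer supplies the partition key and data blob, and the service assigns the sequence number.

```python
import json


def build_put_record_request(stream_name, event, partition_key):
    """Build the keyword arguments for a Kinesis PutRecord call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),  # the data blob
        "PartitionKey": partition_key,
    }


def send(event, stream_name="example-stream"):
    """Send one event. Requires boto3 and AWS credentials; not executed here."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("kinesis")
    request = build_put_record_request(stream_name, event, str(event["device_id"]))
    response = client.put_record(**request)
    # The sequence number is assigned by the service, not chosen by the producer.
    return response["ShardId"], response["SequenceNumber"]


if __name__ == "__main__":
    print(build_put_record_request("example-stream", {"device_id": 7}, "7"))
```

Keeping request construction separate from the network call is only a convenience for testing; a real producer might instead batch records or use the KPL.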
The diagram below illustrates the high-level architecture of Kinesis Data Streams.
Producers continually push data to Kinesis Data Streams.
Consumers process the data in real time.
Consumers can store their results using an AWS service such as Amazon DynamoDB, Amazon Redshift, or Amazon S3.
Kinesis Data Streams applications are consumers that typically run on EC2 instances.
Shards are uniquely identified groups of data records in a stream.
Records are the data units stored in a Kinesis data stream.
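How a record ends up in a particular shard can be sketched as follows. Kinesis hashes the partition key with MD5 to a 128-bit integer and places the record in the shard whose hash key range contains that value. The evenly divided ranges and the function names here are assumptions for illustration; a real stream's ranges come from the service.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # upper end of the 128-bit hash key space


def even_hash_ranges(shard_count):
    """Split the hash key space into contiguous, near-equal ranges (assumed layout)."""
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last range absorbs any remainder so the whole space is covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the range that contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash ranges do not cover the full key space")


if __name__ == "__main__":
    ranges = even_hash_ranges(4)
    for key in ("device-1", "device-2", "device-3"):
        print(key, "-> shard index", shard_for_key(key, ranges))
```

Records with the same partition key always land in the same shard, which is why a heavily skewed key can overload one shard while others sit idle.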
Integration and Resilience
Resharding: Supports resharding operations such as shard split and shard merge to adjust stream capacity dynamically.
Integration: Seamlessly integrates with various AWS services for storing, analyzing, and visualizing streaming data.
Resilience: Provides fault tolerance and durability by replicating data across multiple Availability Zones within a region.
Kinesis Data Streams supports resharding, which lets you adjust the number of shards in your stream to adapt to changes in the rate of data flow through the stream.
There are two types of resharding operations: shard split and shard merge.
In a shard split, you divide a single shard into two shards.
In a shard merge, you combine two shards into a single shard.
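The two operations can be modeled on hash key ranges, which is how shards divide a stream's key space. The sketch below is a local model, not a call to the service: the function names are hypothetical, and it assumes that a split is defined by the starting hash key of the second child and that only adjacent shards can be merged.

```python
def split_shard(hash_range, new_starting_hash_key=None):
    """Divide one shard's hash key range into two child ranges."""
    start, end = hash_range
    if new_starting_hash_key is None:
        # Default to an even split at the midpoint of the range.
        new_starting_hash_key = start + (end - start + 1) // 2
    if not start < new_starting_hash_key <= end:
        raise ValueError("new starting hash key must fall inside the parent range")
    return (start, new_starting_hash_key - 1), (new_starting_hash_key, end)


def merge_shards(first, second):
    """Combine two adjacent hash key ranges into a single range."""
    low, high = sorted([first, second])
    if low[1] + 1 != high[0]:
        raise ValueError("only adjacent shards can be merged")
    return (low[0], high[1])


if __name__ == "__main__":
    left, right = split_shard((0, 99))
    print(left, right)
    print(merge_shards(right, left))
```

A split doubles the capacity available to the keys in the parent range, and a merge halves it, so splits suit hot shards and merges suit underused neighbors.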