
Bladerunner Kafka Migration


HM must migrate its Bladerunner Kafka cluster to AWS MSK due to the end of support for its current Kubernetes infrastructure and an expiring SSL certificate. The migration is complex because custom Domain Name support is not yet available on AWS MSK, requiring significant changes to the Invicta Agent, DNS, firewall, and NAT settings. An interim solution involves setting up a new HM-managed Kafka cluster on Kubernetes, synchronizing data using Kafka MirrorMaker, and redirecting traffic. The final migration to AWS MSK will occur within a year, contingent on AWS's support for custom DNs. The migration process involves deploying a new Kafka cluster, using Kafka MirrorMaker for data synchronization, updating security groups, and redirecting NLB traffic to the new brokers, ensuring data consistency and performance before decommissioning the old brokers. Post-migration, data sources in receive, search, and store must be validated to ensure correct data ingestion.

Requirement:

Migration of the HM-managed Kafka cluster to AWS-managed MSK.

Migration Plan: Transitioning Kafka from HM-Managed to AWS MSK

Currently, we have a Kafka cluster managed by HM that we intended to migrate to AWS Managed Streaming for Apache Kafka (MSK). However, due to certain limitations with MSK and time constraints, this migration cannot proceed as planned. Our immediate priority is to upgrade the Kubernetes infrastructure for our Kafka cluster, because the current version will no longer be supported by AWS after July 15, 2024. Additionally, the SSL certificate for our custom Domain Name (DN) will expire on August 7, 2024.

Key Challenges

Custom Domain Name (DN):
We use a custom DN for our Kafka cluster.
Currently, AWS MSK does not officially support custom DNs, and there is no official documentation available to guide this configuration.
According to AWS support and our account manager at HM, support for custom DNs is on AWS's roadmap, with official documentation expected to be released soon.
Migration Complexity:
Migrating to AWS MSK involves significant changes on both HM's and Bladerunner's ends.
The changes include:
Invicta Agent: Modifications to the custom Filebeat/Winlogbeat agents.
DNS Configuration: Updates to DNS settings managed by Bladerunner.
Firewall Rules: Adjustments to firewall settings to accommodate the new setup.
NAT Settings: Changes to NAT configurations to ensure seamless network traffic flow.

Interim Plan

Given the constraints and the impending deadlines, we have decided to deploy a new Kafka cluster managed by HM. This will serve as an interim solution until AWS MSK is fully equipped to support our custom DN configuration.

Steps for Interim Migration:

Deploy New Kafka Cluster:
Set up a new Kafka cluster managed by HM with upgraded infrastructure on Kubernetes.
Data Migration:
Use Kafka MirrorMaker to synchronize data between the existing and new Kafka clusters.
Ensure all topics, partitions, and consumer groups are replicated accurately.
Traffic Redirection:
Gradually redirect traffic from the old Kafka cluster to the new one by updating security groups and NLB target groups.
Monitor and Validate:
Continuously monitor the new cluster to ensure stability and performance.
Validate that all producers and consumers are functioning correctly with the new Kafka setup.

Long-Term Plan

With the SSL certificate set to expire in June 2025, we have a one-year window to complete the migration to AWS MSK. This timeframe allows us to:
Monitor AWS MSK's Support for Custom DNs:
Keep track of AWS's progress in supporting custom DNs.
Review and adopt the forthcoming documentation once released.
Prepare for Final Migration:
Plan and execute the necessary changes for migrating to MSK, including:
Updating Invicta agent configurations.
Modifying DNS settings as per Bladerunner's requirements.
Adjusting firewall and NAT configurations to support the new MSK environment.
Perform Final Migration to AWS MSK:
Once AWS MSK is fully ready, carry out the migration with minimal disruption to operations.
Ensure that all systems and applications are aligned with the new setup on MSK.

Kafka Migration to Kubernetes Using Kafka MirrorMaker and Existing NLB Configuration

Migrating a Kafka cluster to Kubernetes involves ensuring seamless data transfer and minimal disruption to ongoing operations. This article outlines a comprehensive approach to migrating the Bladerunner Kafka cluster to Kubernetes by leveraging Kafka MirrorMaker for data synchronization and using the existing Network Load Balancer (NLB) configuration for a smooth transition. We'll use the same listeners and target groups while managing traffic redirection through security group updates.

Overview of the Migration Process

The migration process includes deploying a new Kafka cluster on Kubernetes, using Kafka MirrorMaker to synchronize data between the old and new clusters, and gradually shifting traffic to the new brokers. By retaining the existing NLB configuration and strategically managing security groups, we ensure minimal downtime and secure data handling.

Key Components Involved:

Kafka MirrorMaker: Used for real-time data replication from the old Kafka cluster to the new one.
Network Load Balancer (NLB): Manages and directs incoming traffic to Kafka brokers.
Listeners and Target Groups: Existing NLB configurations used for traffic routing.
Security Groups: Control access and secure communication between brokers and clients.

Step-by-Step Migration Process

1. Deploy the New Kafka Cluster on Kubernetes

Setup: Deploy the new Kafka brokers on Kubernetes. Ensure they are configured to handle the same topics and partitions as the old cluster.
Networking: Configure network settings to allow the new Kafka brokers to communicate with each other and with external clients.
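For reference, a minimal sketch of the broker settings this setup depends on is shown below; the hostnames, port, file paths, and replication defaults are assumptions and must be replaced with the actual Bladerunner values.

# server.properties (sketch) -- assumed per-broker settings for the new cluster
broker.id=0
# Listen on all pod interfaces over SSL (port is an assumption; keep it identical to the old cluster)
listeners=SSL://0.0.0.0:9093
# Advertise the custom DN that clients resolve through the NLB (hypothetical hostname)
advertised.listeners=SSL://kafka-0.bladerunner.example.com:9093
ssl.keystore.location=/etc/kafka/secrets/kafka.keystore.jks
ssl.truststore.location=/etc/kafka/secrets/kafka.truststore.jks
# Point at the Zookeeper ensemble deployed alongside the brokers
zookeeper.connect=zookeeper-0:2181,zookeeper-1:2181,zookeeper-2:2181
# Keep topic defaults aligned with the old cluster so mirrored topics match
num.partitions=3
default.replication.factor=3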

2. Configure Kafka MirrorMaker

Deploy MirrorMaker: Set up Kafka MirrorMaker on a node that can reach both clusters. This tool will mirror data from the old Kafka cluster to the new one on Kubernetes.
Mirror Configuration:
Source Cluster: Point to the old Kafka cluster.
Destination Cluster: Point to the new Kafka cluster.
Topics: Specify which topics to replicate. We can choose to replicate all topics or a subset.
kafka-mirror-maker.sh --consumer.config consumer.properties \
--producer.config producer.properties \
--whitelist=".*"

Run MirrorMaker: Start the Kafka MirrorMaker process to begin replicating data. Monitor the replication to ensure data consistency and correctness.
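The consumer.properties and producer.properties files referenced above are not reproduced in this article; the sketch below shows the minimal contents assumed for an SSL setup (hostnames, ports, group id, and truststore paths are placeholders).

# consumer.properties -- reads from the old (source) cluster
bootstrap.servers=old-kafka.bladerunner.example.com:9093
group.id=kafka-mirror-maker
security.protocol=SSL
ssl.truststore.location=/etc/kafka/secrets/truststore.jks
auto.offset.reset=earliest
exclude.internal.topics=true

# producer.properties -- writes to the new (destination) cluster
bootstrap.servers=new-kafka-0:9093,new-kafka-1:9093,new-kafka-2:9093
security.protocol=SSL
ssl.truststore.location=/etc/kafka/secrets/truststore.jks
acks=all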

3. Monitor Data Synchronization

Data Consistency: Ensure that Kafka MirrorMaker is replicating data correctly by comparing topics, partitions, and offsets between the old and new clusters.
Performance: Check the performance of the new Kafka brokers to ensure they can handle the incoming load.
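One way to spot-check both points, assuming the MirrorMaker consumer group is named kafka-mirror-maker and the client SSL settings live in a client-ssl.properties file (both assumptions):

# Compare topic and partition layout between the two clusters (hosts are placeholders)
kafka-topics.sh --bootstrap-server old-kafka.bladerunner.example.com:9093 --describe --command-config client-ssl.properties
kafka-topics.sh --bootstrap-server new-kafka.bladerunner.example.com:9093 --describe --command-config client-ssl.properties

# Check how far MirrorMaker lags behind on the source cluster (the LAG column should trend toward zero)
kafka-consumer-groups.sh --bootstrap-server old-kafka.bladerunner.example.com:9093 --describe --group kafka-mirror-maker --command-config client-ssl.properties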

4. Update Security Groups

Identify Security Groups: Identify the security group currently applied to the old Kafka brokers. This group allows traffic from specific CIDRs and the NLB IP addresses.
Reassign Security Groups:
Remove: Detach the security group from the old Kafka brokers.
Add: Attach the same security group to the new Kafka brokers on Kubernetes.
This reallocation ensures that the new Kafka brokers are recognized as healthy by the NLB, while the old brokers are marked as unhealthy due to the absence of the security group.
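If the brokers run on dedicated EC2 worker nodes, the reassignment can be scripted with the AWS CLI; the instance and security group IDs below are placeholders.

# Attach the Kafka security group to a new broker node (this replaces the instance's current SG set)
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --groups sg-0123456789abcdef0

# Detach it from an old broker node by assigning only a placeholder/default group
aws ec2 modify-instance-attribute --instance-id i-0fedcba9876543210 --groups sg-0fedcba9876543210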

5. Update Target Groups in the NLB

Add New Brokers: Add the IP addresses of the new Kafka brokers to the existing target groups in the NLB. This step integrates the new brokers into the current traffic flow managed by the NLB.
Health Checks: Verify that the NLB health checks pass for the new Kafka brokers, confirming they are ready to handle traffic.
Traffic Redirection: As the old Kafka brokers are marked unhealthy due to the security group removal, the NLB will automatically redirect traffic to the new brokers.
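A sketch of the equivalent AWS CLI calls, with a placeholder target group ARN, broker IPs, and port:

# Register the new broker IPs (IP target type) with the existing bootstrap target group
aws elbv2 register-targets \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/kafka-bootstrap/0123456789abcdef \
  --targets Id=10.0.1.11,Port=9093 Id=10.0.1.12,Port=9093 Id=10.0.1.13,Port=9093

# Confirm the new targets pass health checks before relying on them
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/kafka-bootstrap/0123456789abcdef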

6. Validate New Cluster Operation

Monitor Traffic Flow: Ensure that the NLB is routing traffic to the new Kafka brokers and that producers and consumers are operating smoothly with the new cluster.
Functional Testing: Conduct comprehensive testing to confirm that all Kafka functionalities are working as expected in the new environment.
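A simple end-to-end smoke test through the NLB endpoint; the DN, topic name, and client-ssl.properties file are assumptions.

# Produce one test record via the NLB-fronted bootstrap address
echo "migration-smoke-test-$(date +%s)" | kafka-console-producer.sh \
  --broker-list kafka.bladerunner.example.com:9093 \
  --producer.config client-ssl.properties \
  --topic migration_smoke_test

# Read it back to confirm the full produce/consume path works against the new cluster
kafka-console-consumer.sh \
  --bootstrap-server kafka.bladerunner.example.com:9093 \
  --consumer.config client-ssl.properties \
  --topic migration_smoke_test --from-beginning --max-messages 1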

7. Complete the Migration

Decommission Old Brokers: Once the new Kafka brokers are stable and handling all traffic, remove the old brokers from the target groups and decommission them.
Finalize Security Configurations: Update or remove any security configurations related to the old Kafka brokers as necessary.
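Removing the old brokers from the target groups can likewise be scripted; the ARN and IPs are placeholders.

# Deregister the old broker IPs once all traffic is confirmed on the new cluster
aws elbv2 deregister-targets \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/kafka-bootstrap/0123456789abcdef \
  --targets Id=10.0.0.21,Port=9093 Id=10.0.0.22,Port=9093 Id=10.0.0.23,Port=9093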

Diagram: Kafka Migration to Kubernetes Using Kafka MirrorMaker and Existing NLB Configuration

Below is a visual representation of the migration process:
+--------------------------------------------------+
|                    Producers                     |
+--------------------------------------------------+
                         |
                         v
+--------------------------------------------------+
|           Network Load Balancer (NLB)            |
+--------------------------------------------------+
                         |
                         v
+--------------------------------------------------+
|                    Listener/s                    |
+--------------------------------------------------+
                         |
                         v
+--------------------------------------------------+
|                  Target Group/s                  |
|         (Existing and New Kafka Brokers)         |
+--------------------------------------------------+
     |        |         |         |        |        |
     v        v         v         v        v        v
 +-------+ +-------+ +-------+ +-------+ +-------+ +-------+
 |Broker1| |Broker2| |Broker3| |Broker4| |Broker5| |Broker6|
 +-------+ +-------+ +-------+ +-------+ +-------+ +-------+
     ^        ^         ^         ^        ^        ^
     |        |         |         |        |        |
+--------------------------------------------------------------+
|                 Security Groups on Brokers                    |
|  - Old Brokers lose the security group (marked unhealthy)     |
|  - New Brokers gain the security group (marked healthy)       |
+--------------------------------------------------------------+
                     ^                         ^
                     |                         |
+--------------------------------------------------------------+
|                      Kafka MirrorMaker                        |
|  - Replicates topics, partitions, and metadata                |
|  - Synchronizes data between old and new brokers              |
+--------------------------------------------------------------+


Explanation of the Diagram:

Producers and Consumers: Applications generating and consuming data connect to Kafka via the NLB.
Network Load Balancer (NLB): Manages traffic and routes it to Kafka brokers through a single listener.
Listener/s: The listener(s) direct traffic to brokers in their respective target groups.
Target Group: Contains both old and new Kafka brokers during the migration.
Kafka Brokers:
Old Brokers: The current brokers handling traffic.
New Brokers: Deployed on Kubernetes, ready to take over traffic.
Security Groups:
Old brokers lose their security group, making them unhealthy in the NLB.
New brokers gain the security group, marking them healthy and eligible to handle traffic.
Kafka MirrorMaker: Facilitates real-time data replication between old and new brokers.

Validate Data Sources in Elasticsearch / S3

After migrating the Kafka cluster, validate that all data sources are correctly ingesting data into Elasticsearch and S3 by checking index health, file counts, and timestamps. Monitor Kafka performance by observing broker health, topic activity, and consumer lag.
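A few example spot-checks, assuming the Elasticsearch endpoint, index pattern, S3 bucket/prefix, and consumer group shown here (all placeholders):

# Elasticsearch: today's indices should be green and docs.count should keep growing
curl -s -u "$ES_USER:$ES_PASS" \
  "https://elastic.bladerunner.example.com:9200/_cat/indices/winlogbeat-*?v&h=index,health,docs.count,store.size"

# S3: the newest objects under the archive prefix should carry recent timestamps
aws s3 ls s3://bladerunner-archive/kafka/ --recursive | sort | tail -20

# Kafka: downstream sink consumer groups should show low, stable lag
kafka-consumer-groups.sh --bootstrap-server kafka.bladerunner.example.com:9093 \
  --describe --group logstash-elastic --command-config client-ssl.properties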


Technical Steps for Executing the Migration:


Set up the Kubernetes cluster and node groups
Create a Kubernetes cluster for the new Kafka and Zookeeper instances:
image.png
Create Kubernetes node groups for Zookeeper and Kafka with the same specifications as in the previous cluster:
image.png
Access the nodes from the CLI:
image.png
Deploy Zookeeper and Kafka clusters
Deploy the Kubernetes manifest files for Zookeeper and Kafka as shown in the screenshot below:
image.png
Deploy Zookeeper first, then Kafka, and ensure both are running:
image.png
Check the Zookeeper and Kafka logs to ensure they are running without errors.
Note down the IPs of all brokers along with their broker IDs (see the command sketch after the screenshot):
image.png
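Assuming the pods run in namespaces named kafka and zookeeper (an assumption), the broker pod IPs and startup logs can be checked like this:

# Pod IPs and node placement for the brokers and Zookeeper
kubectl get pods -n kafka -o wide
kubectl get pods -n zookeeper -o wide

# Confirm each broker finished startup
kubectl logs kafka-0 -n kafka | grep -i "started (kafka.server.KafkaServer)"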
Set up Kafka MirrorMaker to sync from the old to the new Kafka cluster on port 9093.
Set up the JVM settings:
image.png
Producer config (produces to the new Kafka cluster). Ensure port 8093 is reachable from the node where the MirrorMaker job runs, and that 8093 is defined in the advertised listeners on the new cluster:
image.png
Consumer config (consumes from the prod Kafka cluster):
image.png
Start the Kafka mirror for all topics. Run it in the background, as syncing the data and metadata takes some time (see the sketch after the screenshots):
image.png
Kafka mirror for a specific topic:
image.png
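A sketch of running the mirror job in the background with an explicit heap; the file names, heap size, and example topic are assumptions.

# Give the MirrorMaker JVM a fixed heap (picked up by kafka-run-class.sh)
export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"

# All topics, detached from the terminal so the sync can run unattended
nohup kafka-mirror-maker.sh --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist=".*" > mirror-maker-all.log 2>&1 &

# For a single topic (e.g. only winlogbeat), narrow the whitelist regex instead:
#   --whitelist="winlogbeat"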
Add these IPs to their respective target groups in the Network Load Balancer:
Identify the port for which the new IPs need to be defined:
image.png
Add all the IPs to the bootstrap target group:
image.png
Add each broker IP to its respective target group; for example, the kafka-0 IP goes to the kafka-0 target group, and likewise for the other brokers.
Repeat the above steps for all the ports advertised by the Kafka configuration.

The status of the registered targets will change upon removal of the security groups assigned to the EC2 instances of the old brokers.
As shown in the snippet below, the highlighted instances are the old Kafka brokers from which the security groups have been removed; the others are the newly created brokers to which those security groups are now assigned, which is why they appear healthy in the snippet above:
image.png
Connect to one of the brokers to check the topic details and the increase in volume usage (see the sketch after the screenshots):
image.png
image.png
image.png
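Disk usage on the new brokers can also be watched from outside the pods; the namespace and data path below are assumptions.

# Overall data-volume utilisation on a new broker pod
kubectl exec -n kafka kafka-0 -- df -h /var/lib/kafka/data

# Per-partition directory growth, largest first
kubectl exec -n kafka kafka-0 -- sh -c 'du -sh /var/lib/kafka/data/*' | sort -rh | head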
Verify that the topics are receiving new data from the endpoints; specify the latest epoch time to check the most recent data being ingested into the topic (see the sketch after the screenshot):
image.png
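The offset-by-timestamp check can be scripted with the GetOffsetShell tool; the hostname, topic, and epoch value are placeholders, and depending on the Kafka version the tool may require --bootstrap-server and a --command-config file for SSL instead of --broker-list.

# Offset of the first record at or after a given epoch in milliseconds -- non-empty results mean fresh data
kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list kafka.bladerunner.example.com:9093 --topic winlogbeat --time 1719792000000

# Latest offsets per partition (--time -1); rerun and compare to confirm they keep advancing
kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list kafka.bladerunner.example.com:9093 --topic winlogbeat --time -1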
The screenshot below shows that the latest data is being ingested:
image.png
The same needs to be validated for all other sources/topics using the above command, and then confirm that this data reaches Elastic / Humio / S3 correctly.

Conclusion


The decision to temporarily migrate to a new HM-managed Kafka cluster allows us to maintain operational stability while preparing for a future transition to AWS MSK. This approach provides us with the necessary time to adapt to AWS MSK’s new features and ensures that we are not rushed into a complex migration process under tight deadlines. We are committed to completing this migration seamlessly and will continue to work closely with AWS and our teams to achieve this goal.
Considering all these factors, we have planned to migrate to another newly deployed Kafka cluster managed by HM until MSK is ready to handle the custom DN.
The SSL certificate will expire next June, so we have roughly a year to complete this migration to MSK.

Migrating the Kafka cluster to Kubernetes using Kafka MirrorMaker and maintaining the existing NLB configuration offers a streamlined approach with minimal disruption. By carefully managing security groups and leveraging real-time data replication, you can ensure a secure and efficient transition to a scalable and flexible Kafka deployment on Kubernetes.
