Customizing Confluent Platform with Ansible AWS EC2 Dynamic Inventory

Sandon Jacobs
4 min read · Nov 20, 2020


How did we get here…

After using AWS CDK to create the hardware for the cluster, with that hardware tagged appropriately for the various components of the Confluent Platform, we clone the cp-ansible repository from GitHub and check out the 5.5.2-post branch (as we want to install version 5.5.2 of the Confluent Platform).

Out of the box, all is well. We have an “unsecured” cluster, with 3 zookeeper nodes, 3 kafka brokers, 2 schema registries, 2 kafka connect nodes, and Confluent Control Center — all configured to live in harmony.

Note: At this point, I’m not concerned with the “unsecured” part of this equation, as the cluster is in a subnet where access is restricted from the outside world. Moving on…

What we want…

Some points to ponder here with regards to inventory in the Ansible universe:

  1. Using aws-cdk, we have created these EC2 instances and assigned them “roles” using tagging. With the aws_ec2 inventory plugin (from Ansible’s amazon.aws collection), we can use those tags to derive our inventory for cp-ansible to install CP for us. We’d prefer this route over generating a static inventory file and chasing IP addresses/private DNS names around the cloud.
  2. A working cluster is all well and good, but how do we change the default configuration values? For instance, I want to manage the creation of topics in my CI/CD pipeline, rather than allowing producers and consumers to create them, so let’s set auto.create.topics.enable to false. Perhaps in my test clusters I don’t need to retain kafka log data for the default 7 days; maybe there we can set log.retention.hours to 48 or so. And when a topic is created, I want to ensure that its data is replicated, so let’s set default.replication.factor to 3, for starters… According to the CP documentation for ansible, we should be able to apply custom property changes to parts of our inventory via the inventory file itself. However, extensive trial-and-error proved this documented approach did not work with dynamic inventory. (At least not for ME…)

We’ll have multiple clusters to manage in our CI/CD pipeline — for functional testing of each commit, shared integration testing with other groups, and, of course, PRODUCTION. To avoid the “snowflake” infrastructure conundrum, we want the CI/CD pipeline to be the sole arbiter of our infrastructure — from provisioning, to installation, to configuration — in ALL environments. The pipeline will therefore pass along environment variables that determine the AWS stack and tags used to look up the inventory for execution. For instance, a commit to a feature branch for a pending pull request could execute a pipeline that:

  1. Provisions hardware for a CP cluster tagged as cicd.
  2. Creates topics and kafka connectors required by the data pipeline — in our case this includes Kafka Streams app(s), packaged as docker images and deployed to ECS.
  3. Deploys those applications, configured to use the newly provisioned CP cicd stack.
  4. Executes a series of functional tests.
  5. Destroys the cicd stack — for cost savings…

In a more persistent, shared environment — like integration testing or production — most of this pipeline would be the same. The differences are how the CP cluster is tagged, which tests are executed, and that the cluster would not be destroyed. However, there are steps that would “update” that cluster — perhaps new topics or kafka connectors, or even something like changing the configuration of an existing topic or connector. All of that is CODE and a part of our pipeline. Therefore we have a history of what changed, when, and by whom!

So fix it…

Turns out Ansible provides a handy mechanism known as group_vars, allowing us to apply variables to the inventory from other files in our repository.

More on Ansible group_vars

So the repo looks something like this:

├── README.md
├── ansible
│   └── confluent          # Custom ansible stuff...
├── cdk
│   ├── README.md
│   ├── __init__.py
│   ├── requirements.txt
│   ├── resources
│   ├── stacks             # Defines the AWS stacks (cicd, test, prod...)
│   └── tools

If we drill down into the ansible directory:

./ansible/
└── confluent
    ├── cicd
    │   └── kafka_broker.yml   # customizations to kafka_broker(s)
    ├── cicd_aws_ec2.yml       # Dynamic inventory for cicd CP stack...

CP is installed on the EC2 instances using the dynamic inventory defined in cicd_aws_ec2.yml…
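The original post embeds the full file; here is a minimal sketch of what such an aws_ec2 dynamic inventory could look like. The region, tag keys (Stack, ConfluentRole), and their values are assumptions for illustration — the real file would use whatever tags the CDK stacks applied:

```yaml
# cicd_aws_ec2.yml -- illustrative sketch only; the region and the
# tag names (Stack, ConfluentRole) are assumptions, not the actual file.
plugin: aws_ec2
regions:
  - us-east-1
filters:
  # Only pull instances belonging to the cicd stack
  tag:Stack: cicd
hostnames:
  - private-dns-name
keyed_groups:
  # An EC2 tag such as ConfluentRole=kafka_broker produces the
  # inventory group cp-ansible expects (kafka_broker, zookeeper, ...)
  - key: tags.ConfluentRole
    separator: ""
```

With `separator: ""` the group name is taken verbatim from the tag value, so tagging instances kafka_broker, zookeeper, schema_registry, kafka_connect, and control_center lines them up with the groups cp-ansible’s playbooks target.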

If we have a look at the template for the server.properties file in the cp-ansible project, branch 5.5.2-post…

{% for key, value in kafka_broker.properties.items() %}
{{key}}={{value}}
{% endfor %}

We can customize the kafka_broker installations using the contents of the cicd/kafka_broker.yml file as follows…
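The embedded file isn’t reproduced here, but since the server.properties template above iterates over kafka_broker.properties, a group_vars file along these lines would yield the overrides discussed earlier (the property values come from the article; treat the exact file as a sketch):

```yaml
# cicd/kafka_broker.yml -- applied to every host in the
# kafka_broker group via Ansible's group_vars mechanism
kafka_broker:
  properties:
    auto.create.topics.enable: "false"   # topics created by CI/CD, not clients
    default.replication.factor: 3        # replicate topic data by default
    log.retention.hours: 48              # shorter retention for test clusters
```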

(Note: group_vars will use these files with an extension of .yml, .yaml, or NO extension at all.)

We can now execute our ansible-playbook command, using these inventory files…

cd cp-ansible
ansible-playbook ./all.yml \
  -i ../ansible/confluent/cicd_aws_ec2.yml \
  -i ../ansible/confluent/cicd/ \
  --private-key='~/.ssh/id_rsa' --user ubuntu -b

Now let’s have a look at the server.properties on one of our broker nodes…

# Maintained by Ansible
log.dirs=/var/lib/kafka/data
broker.id=1
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=3000
log.retention.hours=48
log.segment.bytes=1073741824
num.io.threads=16
num.network.threads=8
num.partitions=1
num.recovery.threads.per.data.dir=2
offsets.topic.replication.factor=3
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
transaction.state.log.min.isr=2
transaction.state.log.replication.factor=3
zookeeper.connection.timeout.ms=18000
confluent.license.topic.replication.factor=3
confluent.metadata.topic.replication.factor=3
confluent.security.event.logger.exporter.kafka.topic.replicas=3
confluent.support.metrics.enable=true
confluent.support.customer.id=anonymous
auto.create.topics.enable=false
default.replication.factor=3

What’s next…

We still have plans to customize other parts of the cluster, both via ansible and aws-cdk. For example, I am currently working on using AWS Cloud Map for service discovery of the different CP components.

I’ll be sure to share what I learn…

