Difficulty: intermediate
Estimated Time: 30 minutes

What you will learn

In this scenario you will learn more about Debezium, a project that provides change data capture for MySQL, PostgreSQL and MongoDB databases.

You will deploy a complete end-to-end solution that captures events from database transaction logs and makes those events available for processing by downstream consumers via an Apache Kafka broker.

What is Debezium?

Debezium is a set of distributed services that capture row-level changes in your databases so that your applications can see and respond to those changes. Debezium records all row-level changes committed to a particular database table in a dedicated message topic. Each application simply reads the topics it is interested in and sees all of the events in the same order in which they occurred.

Technically, Debezium uses the Apache Kafka streaming platform to distribute events captured from the database. It is a set of plug-ins for Kafka Connect that publish messages to a Kafka broker.

The minimum components required for a skeleton deployment are:

  • Kafka broker - consisting of a single Apache ZooKeeper instance for cluster management and a single Kafka broker node
  • Kafka Connect node - containing Debezium and configured to stream data from a database
  • source database

The following diagram shows the minimal deployment:

[Diagram: minimal deployment]

In the next steps we will deploy the components and get data flowing from a MySQL database to a Kafka broker.

In this scenario you learned about the change data capture concept and how you can leverage Debezium for that purpose.

You have learned which components you need to deploy a solution based on Debezium, how to deploy an Apache Kafka broker, how to deploy a Kafka Connect instance with Debezium inside, and how to create a link between Kafka Connect and the source database.

But this is just the beginning of a long journey. Please take your time and look at these resources:

Getting Started with Debezium on OpenShift

Step 1 of 3

Deploying a Kafka broker

A fresh project named debezium has been prepared with the resources needed for the deployment. Several resources have been created for you in the home directory, in the project itself, or configured in OpenShift:

  • a cloned repository of the Strimzi project
  • the Strimzi Cluster Controller, which manages Kafka brokers
  • a MySQL instance containing a small set of data to be streamed
  • templates used to deploy the components

1. Run the following commands to switch to the debezium project and explore it.

If you click on a command, it is automatically copied into the terminal and executed.

Switch to the debezium project:

oc project debezium

Check that the MySQL instance is running:

oc get pods

and that it is exposed as a service:

oc get svc

The deployment now looks like this:

[Diagram: empty deployment]

2. Deploy a Kafka broker with ZooKeeper.

The first component to deploy is a Kafka broker.

[Diagram: broker deployment]

This task is delegated to the templates and the Cluster Controller provided by the Strimzi project. The templates are already present in the home directory in the cloned repository.

By default, the templates deploy the Kafka broker and ZooKeeper in a highly available configuration with a replication factor of 3. This is not necessary in a development environment, so we reduce the number of nodes and the replication factor for system topics to 1.

We also deploy the ephemeral variant of the broker, which does not keep its data once a pod is removed. You should use the persistent variant in production.

To deploy the broker, run the following command:

oc new-app strimzi-ephemeral -p ZOOKEEPER_NODE_COUNT=1 -p KAFKA_NODE_COUNT=1 -p KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 -p KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1
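The four parameters in the command above fall into two groups: node counts and replication factors for system topics. As a minimal sketch, the hypothetical helper below (strimzi_new_app_cmd is not part of Strimzi or oc) prints the same command for a given size so that it can be reviewed before it is run. It assumes one value is wanted for all four parameters, which may not suit every environment.

```shell
# Hypothetical helper: print (do not run) the oc new-app command for a
# cluster of the given size. Assumption: the same value is used for both
# node counts and for both system-topic replication factors.
strimzi_new_app_cmd() {
  local size="${1:-1}"
  case "$size" in
    ''|*[!0-9]*|0)
      echo "size must be a positive integer" >&2
      return 1
      ;;
  esac
  printf 'oc new-app strimzi-ephemeral'
  # printf repeats the format once per remaining argument.
  printf ' -p %s' \
    "ZOOKEEPER_NODE_COUNT=$size" \
    "KAFKA_NODE_COUNT=$size" \
    "KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=$size" \
    "KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=$size"
  printf '\n'
}
```

Calling strimzi_new_app_cmd 1 prints the single-node command used in this step.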

Now wait until both ZooKeeper and the Kafka broker are deployed:

oc get pods -w

The final list of pods should be similar to

NAME                                           READY     STATUS    RESTARTS   AGE
my-cluster-kafka-0                             1/1       Running   0          2m
my-cluster-topic-controller-4124062197-l9lq4   1/1       Running   0          1m
my-cluster-zookeeper-0                         1/1       Running   0          3m
mysql-1-84j4w                                  1/1       Running   0          10m
strimzi-cluster-controller-2044197322-cpmb9    1/1       Running   0          10m

Note: Kafka depends on ZooKeeper, so intermittent Kafka failures are expected during startup, as ZooKeeper might not yet be initialized when Kafka starts.

The new services are now available:
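Instead of watching the pod list by hand, the check can be scripted. The following is a sketch only: all_pods_ready and wait_for_pods are hypothetical helper names, the column layout is assumed to match the listing above, and the retry count and delay are arbitrary choices. Polling also tolerates the intermittent Kafka restarts mentioned in the note.

```shell
# Reads `oc get pods` output on stdin. Succeeds only if at least one pod is
# listed and every listed pod is Running with all containers ready (e.g. 1/1).
all_pods_ready() {
  awk '
    NR == 1 || NF == 0 { next }     # skip the header row and blank lines
    $3 == "Completed"  { next }     # ignore finished helper pods
    {
      seen = 1
      split($2, ready, "/")         # READY column has the form ready/total
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
    }
    END { exit !(seen && !bad) }
  '
}

# Hypothetical polling loop: re-check every 5 seconds, up to 60 times.
wait_for_pods() {
  local attempts=60
  while [ "$attempts" -gt 0 ]; do
    oc get pods | all_pods_ready && return 0
    attempts=$((attempts - 1))
    sleep 5
  done
  return 1
}
```

Running wait_for_pods in the debezium project would return once every pod reports ready, or fail after roughly five minutes.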

oc get svc -l app=strimzi-ephemeral

NAME                            CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
my-cluster-kafka                172.30.243.180   <none>        9092/TCP,9091/TCP            1m
my-cluster-kafka-headless       None             <none>        9092/TCP,9091/TCP            1m
my-cluster-zookeeper            172.30.147.222   <none>        2181/TCP                     2m
my-cluster-zookeeper-headless   None             <none>        2181/TCP,2888/TCP,3888/TCP   2m
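Clients running in the same project can typically reach the broker through the my-cluster-kafka service name and its first listed port. As a rough sketch, the hypothetical helper below (service_address is not an oc feature) extracts that address from the service listing. It assumes the five-column layout shown above; other oc versions may print additional columns, in which case the field number would need adjusting.

```shell
# Reads `oc get svc` output on stdin and prints "<service>:<first port>"
# for the service named in $1. Assumes PORT(S) is the fourth column.
service_address() {
  awk -v name="$1" '
    $1 == name {
      split($4, ports, ",")        # e.g. 9092/TCP,9091/TCP
      sub(/\/.*/, "", ports[1])    # drop the protocol suffix
      print name ":" ports[1]
      found = 1
    }
    END { exit !found }
  '
}
```

With the listing above, piping oc get svc -l app=strimzi-ephemeral into service_address my-cluster-kafka prints my-cluster-kafka:9092.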

3. Verify the broker is up and running.

To verify the broker, send a message to it

echo "Hello world" | oc exec -i my-cluster-kafka-0 -- /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

and then read the message back

oc exec my-cluster-kafka-0 -- /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --max-messages 1

If the consumer prints the message you sent, the deployed broker is available.
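The produce and consume commands of this step can be combined into a single scripted check. This is a sketch under stated assumptions: kafka_round_trip is a hypothetical helper, the pod name and script paths are the ones used in this step, and the broker is assumed to create topics automatically on first use, as the test topic in this step relies on.

```shell
# Hypothetical helper: produce one message to a topic and read it back.
# Succeeds only if the first message in the topic equals the one sent.
kafka_round_trip() {
  local topic="$1" message="$2" received
  local pod="my-cluster-kafka-0" bin="/opt/kafka/bin"

  # Send the message; the producer's prompt output is discarded.
  printf '%s\n' "$message" |
    oc exec -i "$pod" -- "$bin/kafka-console-producer.sh" \
      --broker-list localhost:9092 --topic "$topic" >/dev/null || return 1

  # Read the oldest message in the topic; status lines on stderr are dropped.
  received="$(oc exec "$pod" -- "$bin/kafka-console-consumer.sh" \
      --bootstrap-server localhost:9092 --topic "$topic" \
      --from-beginning --max-messages 1 2>/dev/null)" || return 1

  [ "$received" = "$message" ]
}
```

Because the consumer reads from the beginning of the topic, a previously unused topic name should be passed on each call, for example kafka_round_trip "roundtrip-$(date +%s)" "Hello world".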

Congratulations

You have now successfully executed the first step in this scenario.

You have successfully deployed a Kafka broker service and made it available to clients for producing and consuming messages.

In the next step of this scenario we will deploy a single instance of Debezium.