Difficulty: intermediate
Estimated Time: 30 minutes

What you will learn

In this scenario you will learn more about Debezium, a project that provides change data capture for any of the supported databases:

  • MySQL
  • PostgreSQL
  • MongoDB
  • Microsoft SQL Server
  • Oracle (incubating)
  • Apache Cassandra (incubating)

You will deploy a complete end-to-end solution that captures events from database transaction logs and makes those events available for processing by downstream consumers via an Apache Kafka broker.

What is Debezium?


Debezium is a set of distributed services that capture row-level changes in your databases so that your applications can see and respond to those changes. Debezium records all row-level changes committed to a particular database table in a dedicated message topic. Each application simply reads the topic(s) it is interested in and sees all of the events in the same order in which they occurred.

Technically, Debezium uses the Apache Kafka streaming platform to distribute events captured from the database. It is a set of plug-ins for Kafka Connect that publish messages to a Kafka broker.

The minimum components required for a skeleton deployment are:

  • Kafka broker - consisting of a single Apache ZooKeeper instance for cluster management and a single Kafka broker node
  • Kafka Connect node - containing Debezium and configured to stream data from a database
  • source database

The following diagram shows the minimal deployment

Minimal deployment

In the next steps we will deploy the components and get data flowing from a MySQL database to a Kafka broker.

In this scenario you learned about the change data capture concept and how you can leverage Debezium for that purpose.

You have learned which components you need to deploy a solution based on Debezium, how to deploy an Apache Kafka broker, and how to deploy a Kafka Connect instance with Debezium inside and link it to the source database.

But this is just the beginning of a long journey. Please take your time and look at these resources:

Getting Started with Debezium on OpenShift

Step 1 of 3

Deploying a Kafka broker

A fresh project named debezium has been prepared with the resources required to execute the deployment. Several resources have been created for you in the home directory, in the project itself, or configured in OpenShift:

  • an installed release 0.14.0 of the Strimzi project's Kafka operator
  • a Strimzi Cluster Operator managing Kafka brokers
  • a MySQL instance containing a small set of data to be streamed
  • templates used to deploy components

1. Run the following commands to switch to the debezium project and explore it.

Clicking a command copies it into the terminal and executes it.

Switch to the debezium project

oc project debezium

Check that the MySQL instance is running

oc get pods

and that it is exposed as a service

oc get svc
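The two checks above can also be combined into one small helper. This is only a sketch: the function name check_running_and_exposed is hypothetical, and it assumes that both the pod and the service names start with the same prefix (for example mysql), which this scenario does not state explicitly.

```shell
# Succeeds when a pod whose name starts with PREFIX is in the Running state
# and a service whose name starts with PREFIX exists in the current project.
# Usage: check_running_and_exposed PREFIX
check_running_and_exposed() {
  prefix="$1"
  # Columns of "oc get pods": NAME READY STATUS RESTARTS AGE
  oc get pods --no-headers | awk -v p="$prefix" \
    'index($1, p) == 1 && $3 == "Running" { found = 1 } END { exit !found }' || {
      echo "no running pod named $prefix*" >&2
      return 1
    }
  # Columns of "oc get svc": NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
  oc get svc --no-headers | awk -v p="$prefix" \
    'index($1, p) == 1 { found = 1 } END { exit !found }' || {
      echo "no service named $prefix*" >&2
      return 1
    }
  echo "$prefix is running and exposed"
}
```

Calling check_running_and_exposed mysql would then return a non-zero status if either the pod or the service is missing.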

The deployment diagram now looks like this

Empty deployment

2. Deploy a Kafka broker with ZooKeeper.

The first component to deploy is a Kafka broker.

Broker deployment

This task is delegated to templates and the Cluster Operator provided by the Strimzi project. The templates are already present in the cloned repository in the home directory.

By default, the templates deploy the Kafka broker and ZooKeeper in a highly available configuration with a replication factor of 3. This is not necessary in a development environment, so we reduce the number of nodes and the replication factor for system topics to 1.

We also deploy an ephemeral variant of the broker. You should use the persistent variant in production.

To deploy the broker issue a command
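
The exact command is not reproduced here. As a minimal sketch of the general pattern, an OpenShift template can be rendered with overridden parameters and piped into oc apply. The helper name deploy_from_template, the template name strimzi-ephemeral (guessed from the app=strimzi-ephemeral label that appears on the services later) and the parameter names in the usage comment are assumptions, not taken from this scenario; oc process --parameters <template> lists the parameters a template really accepts.

```shell
# Render an OpenShift template with overridden parameters and create
# the resulting resources in the current project.
# Usage: deploy_from_template TEMPLATE [NAME=VALUE ...]
#
# Hypothetical call (template and parameter names are assumptions):
#   deploy_from_template strimzi-ephemeral ZOOKEEPER_NODE_COUNT=1 KAFKA_NODE_COUNT=1
deploy_from_template() {
  template="$1"
  shift
  # Rewrite the argument list in place: each NAME=VALUE becomes "-p NAME=VALUE".
  for param in "$@"; do
    set -- "$@" -p "$param"
    shift
  done
  # "oc process" prints the rendered resource list, "oc apply -f -" reads it from stdin.
  oc process "$template" "$@" | oc apply -f -
}
```

Passing the parameters as plain NAME=VALUE arguments keeps the call readable when several defaults, such as node counts and replication factors, have to be lowered at once.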


Now wait until both ZooKeeper and the Kafka broker are deployed

oc get pods -w

The final list of pods should be similar to

NAME                                          READY     STATUS    RESTARTS   AGE
my-cluster-entity-operator-798b74565c-bkjwh   3/3       Running   1          32s
my-cluster-kafka-0                            2/2       Running   0          1m
my-cluster-zookeeper-0                        2/2       Running   0          1m
mysql-1-w7shk                                 1/1       Running   0          9m
strimzi-cluster-operator-5658b55c84-89mf5     1/1       Running   0          9m

Note: Kafka depends on ZooKeeper, so intermittent Kafka failures are expected because ZooKeeper might not be initialized yet when Kafka starts.
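
Because of these intermittent failures, a script may prefer to poll for readiness instead of watching the pod list by hand. The following is a rough sketch, not part of the scenario: the function name wait_for_pod and its default retry values are hypothetical, and it assumes that the pod name (for example my-cluster-kafka-0 from the listing above) is known in advance.

```shell
# Poll until every container of a pod reports ready, or give up.
# Usage: wait_for_pod POD [ATTEMPTS] [DELAY_SECONDS]
wait_for_pod() {
  pod="$1"
  attempts="${2:-60}"
  delay="${3:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Prints one "true"/"false" per container, e.g. "true true".
    ready=$(oc get pod "$pod" \
      -o jsonpath='{.status.containerStatuses[*].ready}' 2>/dev/null)
    case "$ready" in
      ""|*false*) ;;   # pod not created yet, or a container is not ready
      *) echo "$pod is ready"; return 0 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$pod not ready after $attempts attempts" >&2
  return 1
}
```

A call such as wait_for_pod my-cluster-zookeeper-0 && wait_for_pod my-cluster-kafka-0 would block until both pods are ready or the retries are exhausted.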

New services are available

oc get svc -l app=strimzi-ephemeral

NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
my-cluster-kafka-bootstrap    ClusterIP   <none>        9091/TCP,9092/TCP,9093/TCP   2m
my-cluster-kafka-brokers      ClusterIP   None            <none>        9091/TCP,9092/TCP,9093/TCP   2m
my-cluster-zookeeper-client   ClusterIP   <none>        2181/TCP                     3m
my-cluster-zookeeper-nodes    ClusterIP   None            <none>        2181/TCP,2888/TCP,3888/TCP   3m

3. Verify the broker is up and running.

Note: The complete initialization of all components can take a couple of minutes. Please make sure that all pods are in Running state and are Ready before you try the next steps.

Successfully sending a message to the deployed broker and receiving it back indicates that the broker is available.

Send a message (no output is expected here)

echo "Hello world" | oc exec -i -c kafka my-cluster-kafka-0 -- /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

and receive it

oc exec -c kafka my-cluster-kafka-0 -- /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --max-messages 1
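
The round trip can also be wrapped in a single check that returns a status code. This is a sketch under assumptions: the function name kafka_smoke_test is hypothetical, the pod and topic defaults are the ones used in this step, and the console consumer's --timeout-ms option is used instead of --max-messages so that older messages already in the topic do not hide the new one.

```shell
# Send a unique message to a topic and read it back through the same broker pod.
# Usage: kafka_smoke_test [POD] [TOPIC]
kafka_smoke_test() {
  pod="${1:-my-cluster-kafka-0}"
  topic="${2:-test}"
  # Unique payload so that earlier messages cannot produce a false positive.
  message="smoke-test-$$-$(date +%s)"

  echo "$message" | oc exec -i -c kafka "$pod" -- \
    /opt/kafka/bin/kafka-console-producer.sh \
    --broker-list localhost:9092 --topic "$topic" || return 1

  # Read the topic from the beginning and look for the exact payload;
  # --timeout-ms makes the consumer exit once no more messages arrive.
  oc exec -c kafka "$pod" -- \
    /opt/kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server localhost:9092 --topic "$topic" \
    --from-beginning --timeout-ms 10000 2>/dev/null | grep -qxF "$message"
}
```

Running kafka_smoke_test && echo "broker is available" prints the confirmation only if the message was both produced and consumed.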


You have now successfully executed the first step in this scenario.

You have successfully deployed the Kafka broker service and made it available to clients to produce and consume messages.

In the next step of this scenario, we will deploy a single instance of Debezium.