What you will learn
In this scenario you will learn more about Debezium, a project that provides change data capture for any of supported databases
- MySQL
- PostgreSQL
- MongoDB
- Microsoft SQL Server
- Oracle (incubating)
- Apache Cassandra (incubating)
You will deploy a complete end-to-end solution that will capture events from database transaction logs and make those events available to processing by downstream consumers via an Apache Kafka broker.
What is Debezium?
Debezium is a set of distributed services capture row-level changes in your databases so that your applications can see and respond to those changes. Debezium records all row-level changes committed to a particular database table in a dedicated message topic. Each application simply reads the topic(s) they are interested in, and they see all of the events in the same order in which they occurred.
Technically Debezium utilizes the Apache Kafka streaming platform to distribute events captured from database. It is a set of plug-ins for Kafka Connect that publish messages to a Kafka broker.
The minimum components required for skeleton deployment are
- Kafka broker - consisting of a single Apache ZooKeeper instance for cluster management and a single node of Kafka broker
- Kafka Connect node - containing and configured to stream data from a database
- source database
The following diagram shows the minimal deployment
In the next steps we will deploy the components and get dataflow running from a MySQL database to a Kafka broker.
In this scenario you learned about the change data capture concept and how you can leverage Debezium for that purpose.
You have learnt what components you need to deploy a solution based on Debezium, how to deploy an Apache Kafka broker and how to deploy a Kafka Connect instance with Debezium inside and create a link between the Kafka Connect and source database.
But this is just a beginning of a long journey. Please take your time and look at these resources:
- Debezium website
- Extensive Debezium tutorial
- Debezium examples repository
- How to run Debezium on Minishift
- The Strimzi project for running Apache Kafka on OpenShift

Steps
Getting Started with Debezium on OpenShift
Deploying a Kafka broker
A fresh project named debezium
is prepared with the necessary resources required to execute the deployment.
There are multiple resources created for you in the home directory, the project itself or configured in OpenShift
- an installed release 0.14.0 of Strimzi project Kafka operator
- Strimzi Cluster Operator managing Kafka brokers
- MySQL instance containing a small set of data to be streamed
- templates used to deploy components
1. Run the following commands to switch to debezium
project and explore it.
If you click on command it gets automatically copied it into the terminal and is executed
Switch to debezium
project
oc project debezium
Check that MySQL instance is running
oc get pods
and that it is exposed as a service
oc get svc
The diagram of deployment now looks like
2. Deploy Kafka broker with ZooKeeper.
The first component to deploy is a Kafka broker.
This task is delegated to templates and Cluster Controller provided by Strimzi project. The templates are already present in the home directory in the cloned repository.
The templates by default deploy Kafka broker and ZooKeeper in a high-available configuration with replication factor 3
.
This is not necessary in the development environment so we reduce the number of nodes and replication factor for system topics to 1
.
We also deploy an ephemeral variant of the broker. You should use persistent variant in production.
To deploy the broker issue a command
oc new-app strimzi-ephemeral -p ZOOKEEPER_NODE_COUNT=1 -p KAFKA_NODE_COUNT=1 -p KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 -p KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1
Now let's wait till both ZooKeeper and Kafka broker are deployed
oc get pods -w
The final list of pods should be similar to
NAME READY STATUS RESTARTS AGE
my-cluster-entity-operator-798b74565c-bkjwh 3/3 Running 1 32s
my-cluster-kafka-0 2/2 Running 0 1m
my-cluster-zookeeper-0 2/2 Running 0 1m
mysql-1-w7shk 1/1 Running 0 9m
strimzi-cluster-operator-5658b55c84-89mf5 1/1 Running 0 9m
Note: Kafka depends on ZooKeeper so intermittent Kafka failures are expected as ZooKeeper might not be initialized at the time of Kafka start.
New services are available
oc get svc -l app=strimzi-ephemeral
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-cluster-kafka-bootstrap ClusterIP 172.30.136.36 <none> 9091/TCP,9092/TCP,9093/TCP 2m
my-cluster-kafka-brokers ClusterIP None <none> 9091/TCP,9092/TCP,9093/TCP 2m
my-cluster-zookeeper-client ClusterIP 172.30.82.207 <none> 2181/TCP 3m
my-cluster-zookeeper-nodes ClusterIP None <none> 2181/TCP,2888/TCP,3888/TCP 3m
3. Verify the broker is up and running.
Note: The complete initialization of all components can take a couple of minutes. Please make sure that all pods are in Running state and are Ready before you try the next steps.
A successful attempt to send a message to (no output expected here)
echo "Hello world" | oc exec -i -c kafka my-cluster-kafka-0 -- /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
and receive a message from
oc exec -c kafka my-cluster-kafka-0 -- /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --max-messages 1
the deployed broker indicates that it is available.
Congratulations
You have now successfully executed the first step in this scenario.
You have successfully deployed Kafka broker service and made it available to clients to produce and consume messages.
In the next step of this scenario, we will deploy a single instance of Debezium.