Skip to main content

How can I use the new JSON Data Type with Kafka?

Learn how to load JSON messages from Apache Kafka directly into a single JSON column in ClickHouse using the Kafka table engine and JSON data type.

Kafka and the JSON Data Type

With the introduction of the new JSON data type, ClickHouse is now a good choice of database for doing JSON analytics. In this guide, we're going to learn how to load JSON messages from Apache Kafka directly into a single JSON column in ClickHouse.

Setup Kafka

Let's start by running a Kafka broker on our machine. We're also going to map port 9092 to port 9092 on our host operating system so that it's easier to interact with Kafka:

Ingest data into Kafka

Once that's running, we need to ingest some data. The Wikimedia recent changes feed is a good source of streaming data, so let's ingest that into the wiki_events topic:

We can check that the data is being ingested by running the following command:

So far, so good.

Ingest data into ClickHouse

Next, we're going to ingest the data into ClickHouse. First, let's enable the JSON type (which is currently experimental), by setting the following property:

Now, we'll create the wiki_queue table, which uses the Kafka table engine.

Note that we're using the JSONAsObject format, which will ensure that incoming messages are made available as a JSON object. This format can only be parsed into a table that has a single column with the JSON type.

Next, we'll create the underlying table to store the Wiki data:

Finally, let's create a materialized view to populate the wiki table:

Querying JSON data in ClickHouse

We can then write queries against the wiki table. For example, we could count the number of bots that have committed changes:

Or we could find out the users that make the most changes on en.wikipedia.org:

· 4 min read