How to parse complex nested JSON from Kafka and load it into ClickHouse

To parse complex nested JSON data in Kafka and insert it into ClickHouse, you can follow the steps below:

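  1. Create a Kafka engine table to read data from Kafka. For example:
CREATE TABLE my_kafka_table (
    value String  -- holds the whole JSON message as text
) ENGINE = Kafka()
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list = 'my_topic',
         kafka_group_name = 'my_consumer_group',
         kafka_format = 'JSONAsString'

In the example above, we create a table named "my_kafka_table" to read data from Kafka. The JSONAsString format stores each message's entire JSON object in the single String column "value", so it can be parsed later with the JSON functions; this format requires the table to have exactly one column. The message key remains available through the Kafka engine's virtual column "_key". The consumer group name "my_consumer_group" is a placeholder; replace it with your own.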

  2. Create a new table to store the data read from Kafka. In the new table, you need to specify a data schema that matches the structure of the JSON data. For example, if the JSON data contains fields such as id, name, and address, you can create a new table as follows:
CREATE TABLE my_clickhouse_table (
    id Int64,
    name String,
    address String
) ENGINE = MergeTree()
ORDER BY id

In the example above, we create a table named "my_clickhouse_table" to store the parsed JSON data. The data schema includes fields such as id, name, and address, where name and address are string types.

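  3. Use a SELECT statement to select data from the Kafka engine table, and use the JSONExtract family of functions to parse the JSON data. For example:
SELECT
    JSONExtractInt(value, 'id') AS id,
    JSONExtractString(value, 'name') AS name,
    JSONExtractString(value, 'address') AS address
FROM my_kafka_table

In the example above, we use the typed functions JSONExtractInt and JSONExtractString to extract the values of fields such as id, name, and address from the JSON data. The generic JSONExtract function also works, but it requires the return type as its last argument, for example JSONExtract(value, 'id', 'Int64'). Note that selecting directly from a Kafka engine table consumes the messages it returns, and recent ClickHouse versions only allow it when the setting stream_like_engine_allow_direct_select is enabled.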
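For nested objects, the JSON functions accept a sequence of keys that acts as a path into the document. The query below is a minimal sketch that runs against a literal string rather than Kafka, so it can be tried on its own; the sample document and its "city" and "zip" fields are invented for illustration.

```sql
-- Hypothetical sample message; replace the literal with the `value` column
-- of the Kafka engine table in a real pipeline.
SELECT
    JSONExtractInt(js, 'id')                 AS id,
    JSONExtractString(js, 'name')            AS name,
    -- Each additional key descends one level into the nested object.
    JSONExtractString(js, 'address', 'city') AS city,
    JSONExtractString(js, 'address', 'zip')  AS zip,
    -- JSONExtractRaw returns the nested object as unparsed JSON text,
    -- which is useful when a whole sub-document should be stored as a String.
    JSONExtractRaw(js, 'address')            AS address_raw
FROM
(
    SELECT '{"id": 1, "name": "Alice", "address": {"city": "Berlin", "zip": "10115"}}' AS js
);
```

JSONExtractString returns an empty string when the value at the given path is missing or is not a string, so a nested object such as "address" here needs either a deeper key path or JSONExtractRaw.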

  4. Insert the SELECT query results into the new table. For example:
INSERT INTO my_clickhouse_table (id, name, address)
SELECT
    JSONExtractInt(value, 'id') AS id,
    JSONExtractString(value, 'name') AS name,
    JSONExtractString(value, 'address') AS address
FROM my_kafka_table

In the example above, we insert the SELECT query results into the new table named "my_clickhouse_table". This statement copies only the messages that are available when it runs; it does not keep consuming the topic afterwards.
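A Kafka engine table is a consumer rather than a store, so a common pattern for continuous ingestion is to attach a materialized view that parses each batch of messages and writes it to the target table. The following is a sketch of that pattern using the two tables from the steps above; the view name "my_kafka_mv" is an assumption chosen for this example.

```sql
-- Hypothetical view name; the source and target tables are the ones
-- created in the earlier steps.
CREATE MATERIALIZED VIEW my_kafka_mv TO my_clickhouse_table AS
SELECT
    JSONExtractInt(value, 'id')         AS id,
    JSONExtractString(value, 'name')    AS name,
    JSONExtractString(value, 'address') AS address
FROM my_kafka_table;
```

Once the view exists, ClickHouse reads from the topic in the background and inserts the parsed rows into "my_clickhouse_table". Dropping or detaching the view stops consumption without touching the rows already stored.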

Note that if the JSON data contains arrays, you select an element by passing its index as an additional argument after the key; ClickHouse does not support a "[]" suffix inside the key string. Indexes start at 1. For example, if the JSON data contains an array named "phone_numbers", you can use the following syntax to select the first element from the array:

JSONExtractString(value, 'phone_numbers', 1)
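To make the array handling concrete, here is a small self-contained sketch against a literal string. The phone numbers are made-up sample data, and the column aliases are chosen only for this example.

```sql
-- Hypothetical sample message with an array field.
SELECT
    -- Positive indexes count from the start of the array, beginning at 1.
    JSONExtractString(js, 'phone_numbers', 1)         AS first_phone,
    -- Negative indexes count from the end of the array.
    JSONExtractString(js, 'phone_numbers', -1)        AS last_phone,
    -- The generic JSONExtract takes the return type as its last argument
    -- and can return the whole array at once.
    JSONExtract(js, 'phone_numbers', 'Array(String)') AS all_phones,
    -- JSONLength reports the number of elements in the array.
    JSONLength(js, 'phone_numbers')                   AS phone_count
FROM
(
    SELECT '{"phone_numbers": ["555-0100", "555-0101"]}' AS js
);
```

Extracting the whole array as Array(String) is convenient when the target table has an Array column, or when the elements should later be expanded into one row each with arrayJoin.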

In summary, by using the JSONExtract functions in ClickHouse and a table schema that matches the data, you can parse complex nested JSON data from Kafka and store it in ClickHouse.