Skip to main content

How to calculate the ratio of empty/zero values in every column in a table

Learn how to calculate the ratio of empty or zero values in every column of a ClickHouse table to optimize sparse column serialization.

How to calculate the ratio of empty/zero values in every column in a table

If a column is sparse (empty or contains mostly zeros), ClickHouse can encode it in a sparse format and automatically optimize calculations - the data does not require full decompression during queries. In fact, if you know how sparse a column is, you can define its ratio using the ratio_of_defaults_for_sparse_serialization setting to optimize serialization.

This handy query can take a while, but it analyzes every row in your table and determines the ratio of values that are zero (or the default) in every column in the specified table:

For example, we ran this query above on the environmental sensors dataset table named sensors which has over 20B rows and 19 columns:

Here is response:

From the results above:

  • the sensor_id columns is not sparse at all. In fact, every row has a non-zero value
  • the sensor_type is only sparse about 15.9% of the time
  • the P0 column is very sparse: 99.9% of the values are zero
  • the pressure column is quite sparse at 83%
  • and temperature column has 53.2% of its values missing or zero

Like we said, it's a handy query for computing how sparse your columns are in a ClickHouse table!

· 2 min read