How to calculate the ratio of empty/zero values in every column in a table
How to calculate the ratio of empty/zero values in every column in a table
If a column is sparse (empty or contains mostly zeros), ClickHouse can encode it in a sparse format and automatically optimize calculations - the data does not require full decompression during queries. In fact, if you know how sparse a column is, you can define its ratio using the ratio_of_defaults_for_sparse_serialization
setting to optimize serialization.
This handy query can take a while, but it analyzes every row in your table and determines the ratio of values that are zero (or the default) in every column in the specified table:
For example, we ran this query above on the environmental sensors dataset table named sensors
which has over 20B rows and 19 columns:
Here is response:
From the results above:
- the
sensor_id
columns is not sparse at all. In fact, every row has a non-zero value - the
sensor_type
is only sparse about 15.9% of the time - the
P0
column is very sparse: 99.9% of the values are zero - the
pressure
column is quite sparse at 83% - and
temperature
column has 53.2% of its values missing or zero
Like we said, it's a handy query for computing how sparse your columns are in a ClickHouse table!