APPROX_COUNT_DISTINCT

The approximate number of distinct values in the time window. Use this function instead of COUNT_DISTINCT to improve performance. This should be used when there are relatively few (under 1 million) rows, and where the total number of distinct values is high.

Syntax

APPROX_COUNT_DISTINCT(VALUE)

Arguments

VALUE: Can be of any type

Returns

An integer.

Notes

Duplicate rows are not counted. NULLs are not counted.

Example

Data

[
   {
      "GROUP_ID":"G1",
      "USER_ID":"U1",
      "CONNECTION_TIME":"2020-06-26 02:31:29,573"
   },
   {
      "GROUP_ID":"G1",
      "USER_ID":"U2",
      "CONNECTION_TIME":"2020-06-26 18:11:45,783"
   },
   {
      "GROUP_ID":"G2",
      "USER_ID":"Z1",
      "CONNECTION_TIME":"2020-06-26 23:54:27,687"
   },
   {
      "GROUP_ID":"G2",
      "CONNECTION_TIME":"2020-07-26 23:54:27,687"
   },
   {
      "GROUP_ID":"G1",
      "USER_ID":"U2",
      "CONNECTION_TIME":"2021-07-01 02:31:29,573"
   },
   {
      "GROUP_ID":"G1",
      "USER_ID":"U1",
      "CONNECTION_TIME":"2021-07-01 18:11:45,783"
   },
   {
      "GROUP_ID":"G2",
      "USER_ID":"Z1",
      "CONNECTION_TIME":"2021-07-01 23:54:27,687"
   }
]

Query

Count the number of distinct entries for each unique entry in a field:

Results

COUNT_DISTINCT APPROX_COUNT_DISTINCT_EACH

Dialog

Last updated

Was this helpful?