APPROX_COUNT_DISTINCT_EACH

The approximate count of distinct values per group in the time window.

Syntax

APPROX_COUNT_DISTINCT_EACH([MAX VALUES,] GROUP, VALUE) ‌

Arguments

MAX VALUES: The maximum number of groups. Groups beyond this amount will be discarded. In some cases, discarding groups via MAX VALUES may cause some data to be discarded from groups that would otherwise have been kept. MAX VALUES is optional. If it is omitted there is, in effect, no limit. GROUP: The field in which to group the results. VALUE: The field to count distinct values relative to each group.

Returns

A number

Notes

NULLs are not counted.

‌ Example

Data

[
   {
      "GROUP_ID":"G1",
      "USER_ID":"U1",
      "CONNECTION_TIME":"2020-06-26 02:31:29,573"
   },
   {
      "GROUP_ID":"G1",
      "USER_ID":"U2",
      "CONNECTION_TIME":"2020-06-26 18:11:45,783"
   },
   {
      "GROUP_ID":"G2",
      "USER_ID":"Z1",
      "CONNECTION_TIME":"2020-06-26 23:54:27,687"
   },
   {
      "GROUP_ID":"G2",
      "CONNECTION_TIME":"2020-07-26 23:54:27,687"
   },
   {
      "GROUP_ID":"G1",
      "USER_ID":"U2",
      "CONNECTION_TIME":"2021-07-01 02:31:29,573"
   },
   {
      "GROUP_ID":"G1",
      "USER_ID":"U1",
      "CONNECTION_TIME":"2021-07-01 18:11:45,783"
   },
   {
      "GROUP_ID":"G2",
      "USER_ID":"Z1",
      "CONNECTION_TIME":"2021-07-01 23:54:27,687"
   }
]

Query

Count the number of distinct USER_IDs for each unique GROUP_ID:

SET
   DATE_TIME = TO_DATE(data.CONNECTION_TIME);
SELECT
   APPROX_COUNT_DISTINCT_EACH(data.GROUP_ID, data.USER_ID) 
       AS approx_count_distinct_each_data_group_id__data_user_id 
FROM
   "SAMPLE_DATA_G1U1 - json"

Results:

{
   "approx_count_distinct_each_data_group_id__data_user_id":[
      {
         "key":"G1",
         "value":2
      },
      {
         "key":"G2",
         "value":1
      }
   ]
}

APPROXIMATE_COUNT_DISTINCT

Dialog

Last updated