This workbook contains all the SQL concepts required to effectively query GA4 data in Google BigQuery.
Analytic functions
Introduced in topic Identify The Exit Page Of A Single Session
Analytic functions take a function expression (e.g. FIRST_VALUE), and they run it over a partition of the data, returning a single result for each row being queried in the source table.
The partition is the window of rows around the current row that is being queried.
If you look at the image above, you can see how this works.
FIRST_VALUE specifies that we want the function to return the first event_timestamp value in the partition.
The OVER keyword specifies that what follows is the window we’ll query.
When the query engine runs the PARTITION BY user_pseudo_id, it creates a “temporary table”, where it orders all hits from the current user_pseudo_id (that’s what the data is partitioned by) by event_timestamp in ascending order.
Finally, FIRST_VALUE picks the first value from this partition, which would be the event with the lowest event_timestamp value for the current user_pseudo_id.
Here are what the partitions look like:
-- How the data is aligned with "PARTITION BY user_pseudo_id ORDER BY event_timestamp"
+----------------+------------+-----------------+
| user_pseudo_id | event_name | event_timestamp |
+----------------+------------+-----------------+
| user123        | page_view  | 12345           |
| user123        | page_view  | 12346           |
| user123        | page_view  | 12348           |
| user234        | page_view  | 12349           |
| user234        | page_view  | 12350           |
+----------------+------------+-----------------+
As you can see, it’s different from the source table. Hits are “grouped” by user, and ordered by event timestamp in an ascending order. Then, when FIRST_VALUE is evaluated, it fetches the first value of the partition where the user_pseudo_id matches the current row.
Thus, all rows that have user_pseudo_id of user123 will have the analytic function return 12345 (the event_timestamp of the first row in the partition), and all rows that have user_pseudo_id of user234 will have the analytic function return 12349.
It’s a difficult concept. But try to look past the syntax into what you’re trying to achieve here. You want to explore the data beyond the current row. In this example, you want to look for all hits from the user_pseudo_id of any given row, and fetch the first event_timestamp for this user.
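Putting the pieces together, a minimal query might look like this (the GA4 column names are real, but `<table>` is a placeholder for your events table):

```sql
-- For each event row, fetch the first event_timestamp of that user
select
  user_pseudo_id,
  event_name,
  event_timestamp,
  first_value(event_timestamp) over (
    partition by user_pseudo_id
    order by event_timestamp asc
  ) as first_timestamp
from <table>
```

Every row for a given user_pseudo_id gets the same first_timestamp value: the smallest event_timestamp in that user's partition.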
ARRAY_AGG
Introduced in topic Fuzzy Matching With UDFs
ARRAY_AGG is another aggregation function. It lets you create a nested structure from a range of given inputs.
The type of this nested structure is an array.
ARRAY_AGG is particularly useful when you want to create a list of all related values for some column for further analysis. You can use ARRAY_AGG to replace a window function, for example, by first creating the array and then accessing its first item (replaces FIRST_VALUE).
-- Gets the first source for all user_pseudo_id
select
  user_pseudo_id,
  array_agg(source order by date)[offset(0)] as first_source
from source_table
group by user_pseudo_id
AS
Introduced in topic Count Sessions Of A Single Day
Use AS to “rename” a column in the SELECT list. This is useful if there are duplicate names (e.g. when SELECTing a handful of columns from different tables) or if you just want to name your columns in a more meaningful way.
-- Rename the event_timestamp_micros column as event_timestamp
select event_timestamp_micros as event_timestamp
BETWEEN … AND
Introduced in topic Count Sessions Across A Static Date Range
Use BETWEEN and AND to specify a range for the values. The values can be strings, numbers, or dates.
When used with the _TABLE_SUFFIX column, it’s a powerful way of querying data present in only certain (contiguous) date partitions.
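For example, a sketch of a date-range query against the daily GA4 export tables (the project and dataset names are placeholders):

```sql
-- Query all daily tables from 1st to 7th June, 2021
select count(*) as events
from `project.analytics_123456789.events_*`
where _table_suffix between '20210601' and '20210607'
```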
CASE WHEN
Introduced in topic Count Sessions By Part Of Week
The CASE WHEN statement is similar to IF…THEN in many programming languages.
You can add multiple WHEN statements for different mappings. Use the THEN keyword to indicate what happens if the WHEN condition evaluates to true.
You can provide a fallback with ELSE, and you need to close the CASE WHEN with the END keyword.
-- Returns "h1" as the current date is within the first half of the year select case when extract(quarter from current_date()) in (1, 2) then 'h1' else 'h2' end
Code comments
Introduced in topic Count Sessions Of A Single Day
You can add comments in the SQL editor (these will be ignored in the query) by prefixing the row with two dashes:
-- this is a comment
CONCAT
Introduced in topic Group Users By Acquisition Campaign
Use CONCAT to join two values together. CONCAT joins any values that can be cast into string.
Note! CONCAT was originally used in the topic video, but in a later draft we decided to drop it from this topic to avoid adding extra complication. The example below is what was in the original query, and you can use it if you want to familiarize yourself with CONCAT.
-- The CONCAT joins source and medium together, resulting in e.g. "google / cpc"
select
  concat(traffic_source.source, " / ", traffic_source.medium) as source_medium,
  count(distinct user_pseudo_id) as users
from `simoahava-com.analytics_206575074.events_20210601`
group by source_medium
order by users desc
COUNT
Introduced in topic Count Sessions Of A Single Day
COUNT is an aggregation function, which means that it can be used to count the occurrences of something across rows. In the video, Johan shows you how to count all the occurrences of a specific event name. This works because the count is the only result returned by the query.
-- Count all rows where event name is "session_start"
select count(*) as sessions
from <table>
where event_name = 'session_start'
COUNTIF
Introduced in topic Combine User Properties With Other User Data
COUNTIF is an aggregate function that takes a condition expression as its argument. If the condition evaluates to true, then the counter is incremented.
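A minimal sketch, using the standard GA4 export columns with a placeholder table name:

```sql
-- Count page_view events per user in a single pass
select
  user_pseudo_id,
  countif(event_name = 'page_view') as page_views
from <table>
group by user_pseudo_id
```

This is handy when you want several conditional counts side by side without writing a separate filtered query for each.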
CROSS JOIN
Introduced in topic Unnest All User Properties
A CROSS JOIN combines every row in the source table with every row in the joined table.
In the case of CROSS JOIN with an unnested structure, related values are kept together. So the CROSS JOIN only unites rows of the unnested table with the rows where the unnested structure was originally embedded in.
Note that you can replace the words “CROSS JOIN” with a comma.
-- These two queries do exactly the same thing, first:
select props.key
from table
cross join unnest(user_properties) as props

-- And second:
select props.key
from table, unnest(user_properties) as props
In the example above, iphone12 and nokia3310 are joined together only with user123, because that was the only user out of the two who had those devices. Similarly, iphone11 is only joined with user234, because that’s the row where the iphone11 was originally embedded in.
CURRENT_DATE
Introduced in topic Count Sessions Across A Dynamic Date Range
Returns the current date in the format YYYY-MM-DD (e.g. 2021-05-21 for 21st May, 2021).
Use FORMAT_DATE to format it into another type of date string.
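For example, to turn the current date into the same YYYYMMDD shape the GA4 export tables use:

```sql
-- Returns e.g. "20210521" if run on 21st May, 2021
select format_date('%Y%m%d', current_date()) as today
```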
DATE_DIFF
Introduced in topic Calculate User Retention 2/2
DATE_DIFF takes three arguments: two date objects and a date part specifier. The function returns the difference (as an integer) between the first two dates (date_1 – date_2) in intervals of the given date part.
-- Returns 1, as that's the difference in YEAR. date_diff returns the number of
-- date part *boundaries* between the two dates. There's a year boundary between
-- 1st Jan, 2021 and 31st Dec, 2020.
select date_diff(date '2021-01-01', date '2020-12-31', year)
DATE_SUB
Introduced in topic Count Sessions Across A Dynamic Date Range
This function can be used to subtract days from any given date. It’s used together with the INTERVAL keyword.
The function takes two arguments: the date itself and the INTERVAL expression.
The INTERVAL expression takes a number and then a “date part”, which can be one of DAY, WEEK, MONTH, QUARTER or YEAR.
select date_sub(current_date(), interval 4 week) as four_weeks_ago
DISTINCT
Introduced in topic Count Unique Users
When you run COUNT(DISTINCT column), only unique instances of each value in column are counted, and duplicates are ignored.
Note that counting distinct values exactly can get expensive over large spreads of values. COUNT(DISTINCT) in standard SQL is exact, but BigQuery also offers APPROX_COUNT_DISTINCT, which uses statistical approximation to save on performance when an estimate is close enough.
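A minimal sketch with a placeholder table name:

```sql
-- Count unique users rather than rows
select count(distinct user_pseudo_id) as users
from <table>
```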
EXCEPT
Introduced in topic Flatten GA4 Schema For Relational Databases
The EXCEPT keyword is very simple but very useful.
With it, you can specify which columns to exclude from your SELECT list.
Any column names you add to the parentheses after EXCEPT will be excluded from the columns included in the query data. Thus, it’s a great way to run the controversial SELECT * while still leaving out the most query-intensive columns, for example.
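For example, a sketch that leaves out the two heavily nested GA4 columns (the table name is a placeholder):

```sql
-- Select all columns except the heavily nested ones
select * except (event_params, user_properties)
from <table>
```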
EXTRACT
Introduced in topic Count Sessions By Part Of Week
Use this to EXTRACT a date part FROM a date expression. Valid date parts are listed in the BigQuery documentation.
-- Returns the quarter (1, 2, 3, or 4) for the current date
select extract(quarter from current_date())
FORMAT
Introduced in topic Calculate User Retention By Cohort
FORMAT can be used to format any BigQuery data type into a string.
It takes two arguments: a format specifier, and the value you want to format.
In the example above, the specifier %03d is used. This is read as:
– % is always required as a prefix in format specifiers.
– 0 means that the padding is done with zeroes rather than spaces
– 3 means that the padded length of the string should always be at least 3 characters in length. Strings of three or more characters in length will not be padded.
– d means that you’re working with integers.
Thus if the input is 1, then the output is 001. If the input is 15, the output is 015. If the input is 1554, the output is 1554.
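The padding example from above as a runnable query:

```sql
-- Returns the string "015"
select format('%03d', 15) as padded
```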
FORMAT_DATE
Introduced in topic Count Sessions Across A Dynamic Date Range
You can use this function to translate any date type into a different string format. See the BigQuery documentation for the full list of available format elements.
-- Outputs "Monday, 25 Jun, 2021" if current_date is June 25th, 2021. select format_date('%A, %d %b, %Y', current_date())
FROM
Introduced in topic The First SQL Query
A required clause in all SQL queries. FROM specifies the source table(s) which is/are being queried by the SELECT statement.
GROUP BY
Introduced in topic Count Sessions Across A Static Date Range
Use GROUP BY to group values in a given column. Without GROUP BY, an SQL query goes over the data source row-by-row, and returns each row that matches the SELECT query.
However, if you want to do aggregations such as COUNT all instances of a given value, or SUM all values for a given instance together, you need to group the values first.
To group a result by a specific column, you need to specify the column name (or its ordinal number, e.g. 1 for the first column in the SELECT statement) after GROUP BY.
If there are multiple columns that are not already aggregated (e.g. due to using COUNT), you need to group ALL of them once you start using GROUP BY.
select
  user_pseudo_id,
  event_name,
  count(*) as number_of_events
from table
group by user_pseudo_id, event_name
The example above returns a count of all events from all users, but it groups the results first by user, then by event name.
You could achieve the same result by using group by 1,2 as user_pseudo_id is the first (1) column, and event_name is the second (2) column in the SELECT statement.
If you tried to group by number_of_events, you’d get an error as COUNT() is already an aggregation.
HAVING
Introduced in topic Group Users By Year Of Their First Visit
The HAVING expression lets you add conditions for displaying rows in the results table. It must always come after the GROUP BY statement.
You can use similar checks that you’d use with WHERE, such as is not null or value > 5. As long as the check is a statement that evaluates to true or false, it’s valid in HAVING.
Unlike WHERE, you can reference column aliases in HAVING.
-- Would not work:
select user_pseudo_id as user
from table
where user is not null

-- Would work:
select user_pseudo_id as user, count(*) as hits
from table
group by user
having user is not null
IFNULL
Introduced in topic Explore Session Traffic Sources
The IFNULL function takes two arguments. The first is the value whose nullness is checked, and the second is what is returned if the first is null.
This is a great way to “cast” those null values into something useful.
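For example, a sketch that mirrors how GA traditionally labels missing traffic sources (the table name is a placeholder):

```sql
-- Replace a null source with "(direct)"
select ifnull(traffic_source.source, '(direct)') as source
from <table>
```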
IN
Introduced in topic Count Sessions By Part Of Week
IN can be used to compare a source value against a set of values. It returns true if the source value is among the values in the parentheses and false otherwise.
-- Returns true
select 2021 in (2019, 2020, 2021)
LAG and LEAD
Introduced in topic Identify Page Pathing For A Single Session
As mentioned above, LAG fetches the preceding row from the partition relative to the current row, and LEAD fetches the following row from the partition.
The function takes two arguments: first, the column whose value is pulled from the partition, and second, the offset.
The offset is 1 by default, meaning the previous (LAG) or next (LEAD) row is fetched. If you specify offset to be 2, for example, then LAG fetches the row two behind from the current one, and LEAD fetches the row two ahead from the current one.
If you look at the image above, you can see LAG and LEAD in action. For any given row in the source table, LAG fetches the value of event_timestamp of the previous row. Because the first row (of each user_pseudo_id respectively) doesn’t have a previous row, null is returned instead.
LEAD does the same thing but in the opposite direction. And because the last row (of each user_pseudo_id respectively) doesn’t have a next row, null is returned instead.
Look closely at the example and try to understand why the result table is populated as it is.
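The pattern described above as a query sketch (GA4 column names are real, `<table>` is a placeholder):

```sql
-- For each event, fetch the timestamps of the previous and next events of the same user
select
  user_pseudo_id,
  event_timestamp,
  lag(event_timestamp) over (
    partition by user_pseudo_id order by event_timestamp
  ) as previous_timestamp,
  lead(event_timestamp) over (
    partition by user_pseudo_id order by event_timestamp
  ) as next_timestamp
from <table>
```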
LEFT JOIN
Introduced in topic Flatten GA4 Schema For Relational Databases
Because BigQuery often has nested data, we’ve mainly been concerned with CROSS JOINs in this course. If you remember, CROSS JOIN takes all the rows in both tables and combines them together.
Well, it’s arguable that LEFT JOIN is still the most popular join in SQL.
With a LEFT JOIN, two tables are joined together, just as with CROSS JOIN. However, with LEFT JOIN, all the rows in the left table (i.e. literally the table name to the left of the “LEFT JOIN” operator) are kept, whereas only those rows in the right table that join with the rows in the left table (using a join key) are preserved.
In other words, a LEFT JOIN will include all the rows in the first table, and then null values for columns of the right table where there wasn’t a match between the two tables. If there are rows in the right table that don’t have a match in the left table, they are discarded.
-- Table_1                     Table_2
+------+----------------+      +------+-------------+
| user | transaction_id |      | user | year_active |
+------+----------------+      +------+-------------+
| simo | t12345         |      | simo | 2021        |
| mari | t23456         |      | mari | 2021        |
| john | t34567         |      | anne | 2018        |
+------+----------------+      +------+-------------+

-- Query
select
  table_1.user,
  table_1.transaction_id,
  table_2.year_active
from Table_1 as table_1
left join Table_2 as table_2
  on table_1.user = table_2.user

-- Result
+------+----------------+-------------+
| user | transaction_id | year_active |
+------+----------------+-------------+
| simo | t12345         | 2021        |
| mari | t23456         | 2021        |
| john | t34567         | null        |
+------+----------------+-------------+
As you can see, Table_1 is the left table, and Table_2 is the right table. In the result, the year_active column is populated with data from the second table, having been joined with the first table using the user name as the join key.
In Table_1, the row with john doesn’t have a corresponding join key in Table_2. Because it’s a LEFT JOIN, the row is still kept in the output, but the column populated from the right table (Table_2) is set to null.
In Table_2, the row with anne doesn’t have a corresponding join key in Table_1. However, because it’s a LEFT JOIN, the unmatched data in the right table (Table_2) is ignored in the output. If you used RIGHT JOIN, you could have reversed the process and kept anne while discarding john.
LIMIT
Introduced in topic The First SQL Query
Add LIMIT followed by an integer (e.g. LIMIT 10) to return only that many rows in the query result. NOTE! This does not reduce query costs, as the query is still run against the full source table.
MAX
Introduced in topic Access A User Property With A Scalar Subquery
MAX returns the largest (“maximum”) non-null value for the input. As an aggregate function, it combines nicely with GROUP BY, returning either a valid value or a null, but never a null if there’s also a valid value available.
In the image above, MAX selects high as the income_level for user123, as that’s the largest non-null value that user had in the source table. However, for user234 it returns null, because user234 did not have a single non-null value for income_level in the source table.
Note that in the last row user123 doesn’t have income either. However, as MAX is used together with GROUP BY, it fetches the largest non-null value from all of user123‘s hits in the table and not just the last one.
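The pattern described above as a sketch, reusing the income_level column from the example (`<table>` is a placeholder):

```sql
-- Return the largest non-null income_level per user
select
  user_pseudo_id,
  max(income_level) as income_level
from <table>
group by user_pseudo_id
```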
NULLIF
Introduced in topic Group Users By Geographical Location
NULLIF is a simple (but powerful) utility function, which sets the value to null if the expression is a match. It’s a clean way to set values to null under certain circumstances, as BigQuery has lots of methods for handling null values and relatively few (or quite complex) ways to handle other primitives.
-- Returns null if user_id is 'admin'
select nullif(user_id, 'admin') as user
from table
ORDER BY
Introduced in topic Count Sessions Across A Static Date Range
ORDER BY is as simple as it sounds. It lets you arrange the results by the values of specific columns in either ascending or descending order.
To use ORDER BY, add it after a possible GROUP BY, then specify the column(s) by which you want to sort the results (in order of importance), and finally whether the results should be in ascending (ASC) or descending (DESC) order.
select
  user_pseudo_id,
  event_name,
  count(*) as number_of_events
from table
group by user_pseudo_id, event_name
order by number_of_events desc
PARSE_DATE
Introduced in topic Count Sessions By Part Of Week
Convert any string (such as event_date in the Google Analytics 4 export) into a date object, so that you can run date formatting and extracting functions against it.
PARSE_DATE takes two arguments: the format of the input string and the string itself. For example, the Google Analytics 4 event_date object is always YYYYMMDD (e.g. 20210521), so the date format would be %Y%m%d.
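For example (the table name is a placeholder, event_date is the real GA4 export column):

```sql
-- Turn the GA4 event_date string (e.g. "20210521") into a date object
select parse_date('%Y%m%d', event_date) as event_date_parsed
from <table>
```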
PIVOT
Introduced in topic Count Events Per User With A Pivot Table
While the operator itself is fairly simple to use (once you get the gist), it has some gotchas.
First, you can’t add PIVOT after a regular SELECT…FROM query. PIVOT is technically part of FROM in the main query, and thus it needs to come after a table or subquery result. That’s why in both the video and in the example above we have SELECT * FROM (subquery).
The IN keyword needs an enumerated list of values to build columns out of. So you can’t just say FOR event_name or FOR event_name IN * or something. You need to list all the possible values that you want to pivot with.
Keeping these in mind, PIVOT follows a very simple logic.
First, provide the source table/query with the columns you want to include in the output AND the columns you want to pivot with.
Then, provide the aggregation function in the PIVOT clause. This can be anything that aggregates, such as COUNT or SUM or MAX.
With FOR, specify the column whose values you want to turn into columns, and finally provide the list of the values you want to pivot with in the IN expression.
-- Main query that selects everything in the subquery
select * from (
  -- Subquery that selects user_pseudo_id and event_name from the source_table
  select user_pseudo_id, event_name
  from source_table
)
-- Pivot
pivot(
  -- Aggregation that counts all instances of each event name
  count(*)
  -- Specify that event_name is what you want to count and pivot with
  for event_name
  -- Specify the values of event_name you want to count and pivot with
  in ('page_view', 'session_start', 'user_engagement')
)
-- Only include users that had page views
where page_view != 0
-- Sort by page views descending
order by page_view desc
Remember that WHERE and ORDER BY can be used to reference columns (and aliases) in the source table. Since PIVOT is part of the FROM clause of the main query, it actually establishes what the source table is, and thus WHERE and ORDER BY work with the pivoted column names, too!
QUALIFY
Introduced in topic Custom Engaged Sessions
QUALIFY is a fairly new addition to BigQuery. It’s similar to WHERE and HAVING, in that it can be used to filter the query output.
WHERE works with plain, ungrouped data, and HAVING works with grouped data. And QUALIFY? It works with the results of analytic functions.
Thus, if you look at the image above, you can see how in the end we’re using QUALIFY to only include those rows that returned 1 for the ROW_NUMBER function. This way we exclude all duplicates, as they would have a ROW_NUMBER of 2 or more.
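A deduplication sketch along those lines (`<table>` is a placeholder; note that BigQuery requires QUALIFY to be accompanied by a WHERE, GROUP BY, or HAVING clause, hence the `where true`):

```sql
-- Keep only the first row per user and transaction_id
select *
from <table>
where true
qualify row_number() over (
  partition by user_pseudo_id, transaction_id
  order by event_timestamp
) = 1
```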
REGEXP_EXTRACT
Introduced in topic Count Sessions Across Event And Intraday Tables
This function extracts the given regular expression pattern from the input string. If a capturing group is used (a pattern within parentheses), then the capturing group match alone is returned.
If there is no match, it returns NULL.
-- Returns "example.com" due to the capturing group in the regex select regexp_extract('email@example.com', '@(.*)')
ROW_NUMBER
Introduced in topic Custom Engaged Sessions
ROW_NUMBER is an analytic function, which returns, unsurprisingly, the row number of the current row when compared to all the rows in the partition.
In the screenshot above, you can see how user123 has two transactions with the same transaction ID: t12345.
When a partition is created for user_pseudo_id and transaction_id, it groups all data in the source table by these two dimensions. Then, the ROW_NUMBER function checks the position of the current row processed in the query against all the rows in the partition.
All other transaction IDs simply return 1, as they are the first (and only) row in the partition. But t12345 returns 1 for the first row and 2 for the second row with this transaction ID.
This is an easy way to deduplicate the results. Only include those rows that are the first in the partition.
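A sketch of that deduplication approach using a subquery (`<table>` is a placeholder):

```sql
-- Number each row within its user/transaction partition,
-- then keep only the first row of each partition
select * except (rn)
from (
  select
    user_pseudo_id,
    transaction_id,
    event_timestamp,
    row_number() over (
      partition by user_pseudo_id, transaction_id
      order by event_timestamp
    ) as rn
  from <table>
)
where rn = 1
```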
SAFE_DIVIDE
Introduced in topic Add Refund, Tax, And Shipping Data
SAFE_DIVIDE is a function that lets you do division without pesky errors that are thrown if you try to divide by zero, for example.
Upon an otherwise erroneous division, it returns a null value instead and lets the query proceed.
The function takes two arguments: the dividend (what is being divided) and the divisor (what is it being divided by).
-- This would throw a "division by zero" error when running the query
select 5/0 as division

-- This would return null and not throw an error
select safe_divide(5, 0) as division
Scalar subquery
Introduced in topic Access A User Property With A Scalar Subquery
Scalar subqueries can be run within the SELECT list. Instead of SELECTing a single column, the scalar subquery SELECTs whatever is returned by the subquery within the parentheses.
-- Select the "age" column as well as the average of all "age" values in the table select age, (select avg(age) from user_stats) as avg_age from table
SELECT
Introduced in topic The First SQL Query
A required statement in all SQL queries. It specifies the columns that are returned by the query. Typically SELECT chooses which columns to return from the source table, but it can also run transformations and aggregations against this information.
SUM
Introduced in topic Calculate Ecommerce Metrics By Date
SUM is an aggregation function that is probably even easier to understand than COUNT.
It returns, logically, the total sum of the values in the given column.
There are some nuances, such as how it only returns the sum of non-null values and how there might be some quirks if using floating point numbers, but the general gist is just as simple as you’d think.
SUM is also useful in analytic functions, where it can be used to calculate a cumulative sum.
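As a sketch of the cumulative case (the revenue column is illustrative, `<table>` is a placeholder; the inner SUM aggregates per date, and the outer SUM runs as an analytic function over the grouped results):

```sql
-- Daily revenue plus a running (cumulative) total over time
select
  event_date,
  sum(revenue) as daily_revenue,
  sum(sum(revenue)) over (order by event_date) as cumulative_revenue
from <table>
group by event_date
```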
_TABLE_SUFFIX
Introduced in topic Count Sessions Across A Static Date Range
If you replace any part of the partitioned table name with a wildcard, the _TABLE_SUFFIX will contain the part that was replaced.
For example, if the table partition name is 20210521, and you replace it with a wildcard like 202105*, the _TABLE_SUFFIX column will contain 21.
The _TABLE_SUFFIX column will not be available if no part of the partition is replaced with the wildcard.
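A sketch of the example above (project and dataset names are placeholders):

```sql
-- The wildcard replaces the day, so _TABLE_SUFFIX contains e.g. "21"
select
  _table_suffix as day,
  count(*) as events
from `project.analytics_123456789.events_202105*`
group by day
order by day
```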
TIMESTAMP_MICROS
Introduced in topic Count Sessions By Part Of Day
This function takes an integer as the input, interpreted as a Unix timestamp in microseconds. Well, that's exactly what the event_timestamp column is!
The end result is a TIMESTAMP object, containing both the date and the time (d'oh) when the timestamp occurred.
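For example (`<table>` is a placeholder):

```sql
-- Convert the microsecond event_timestamp into a human-readable timestamp value
select timestamp_micros(event_timestamp) as event_time
from <table>
```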
UNION
Introduced in topic Custom Engaged Sessions
UNION combines the results of multiple input queries into a single output. It’s most often used to combine multiple source tables of data into one big chunk.
UNION requires that all the input queries return the same number of columns with compatible data types. The column names of the output are taken from the first query.
UNION takes two possible keywords. UNION ALL combines everything regardless of duplicates. UNION DISTINCT ignores rows that are absolutely identical across the joined sets.
-- source_table_1                 source_table_2
+------+----------------+         +------+----------------+
| user | transaction_id |         | user | transaction_id |
+------+----------------+         +------+----------------+
| simo | t12345         |         | simo | t34567         |
| mari | t23456         |         | mari | t23456         |
+------+----------------+         +------+----------------+

-- Query
select * from source_table_1
union distinct
select * from source_table_2

-- Result
+------+----------------+
| user | transaction_id |
+------+----------------+
| simo | t12345         |
| mari | t23456         |
| simo | t34567         |
+------+----------------+
UNNEST
Introduced in topic Unnest All User Properties
UNNEST takes a single argument: the array or repeated field that you want to unpack.
It returns a table with all the items in the array on its own row.
Take a look at the example image above. The unnested structure is aliased to a table named unnested_dev. In the RESULT table, the data that used to be embedded in a single row of the source table is now spread out so that each value is on its own row.
The user_pseudo_id column is combined with the unnested_dev data thanks to the CROSS JOIN, which combines each row of the unnested table with each row of the original source table.
-- Source table (devices is a repeated field)
+-----+----------------+---------+
| ROW | user_pseudo_id | devices |
+-----+----------------+---------+
| 1   | user123        | iphone  |
|     |                | android |
+-----+----------------+---------+

-- After UNNESTing devices
+-----+---------+
| ROW | devices |
+-----+---------+
| 1   | iphone  |
| 2   | android |
+-----+---------+
WHERE
Introduced in topic Count Sessions Of A Single Day
The WHERE clause lets you filter the results after the columns have been selected. The clause requires a boolean expression (returns true or false), and only rows that return a true result are returned in the query result.
The WHERE clause must always follow the FROM clause, and it can only reference columns that are present in the FROM table (so it can’t reference columns generated in SELECT, for example).
-- Select all columns for rows where the event_name is "session_start"
select *
from <table>
where event_name = 'session_start'
WITH…AS
Introduced in topic Count Engaged Sessions Using session_engaged
The WITH clause is primarily designed to improve readability of the query. You can achieve everything WITH…AS does with regular subqueries. But having rows and rows of complicated nesting logic in your main query can make it really difficult to understand what the query does.
With WITH…AS, you create a named subquery that can be referenced in a FROM clause anywhere where FROM can be used.
-- my_query is the named subquery, and it's referenced in the main query with FROM
with my_query as (
  select *
  from my_source_table
  order by kpi desc
)
select kpi_id
from my_query
limit 1