The Definitive Guide to Advanced Google Analytics for 2025

By Naoki Shibayama
Cover image of the article

How to export Google Analytics data to BigQuery and integrate it with your company's data for advanced analysis? We cover schema details, data integration methods, and practical techniques for visualizing the complete picture of customer behavior.

The Importance of Exporting Google Analytics to BigQuery

Google Analytics 4 (GA4) is a standard analytics tool used by many websites and applications. However, the standard GA4 interface has limitations when it comes to deeper data analysis and flexible data utilization. This is where the BigQuery export function becomes extremely important.

Use Cases

Here are specific benefits and use cases for GA4 data exported to BigQuery:

  1. Integrated Data Analysis

    : Integrate data from sites, apps, and internal systems in one place to comprehensively understand customer behavior.

  2. Long-term User Behavior Analysis

    : While the GA4 UI has period limitations, BigQuery allows you to retain and analyze user behavior history over extended periods.

  3. Advanced Segment Analysis

    : Leverage BigQuery's SQL to extract user segments with complex conditions that would be difficult through the GA4 UI.

  4. Custom Metric Creation

    : Freely define and calculate complex metrics based on your unique business logic.

  5. Integrated Dashboard Building

    : Connect with various BI tools to build a single dashboard integrating GA4 data with internal data.

  6. Detailed Conversion Analysis

    : Analyze complex purchase paths and exit points in detail to optimize conversions.

Schema

The data exported from GA4 to BigQuery has a unique schema structure. Here we'll explain the schema related primarily to web events.

Event Types

GA4 records various events, but here are the main event types that are particularly useful:

  • page_view

    : Event that occurs when a user views a page

  • session_start

    : Event recorded at the start of a user session

  • first_visit

    : Event recorded during a user's first visit

  • form_submit

    : Event that occurs when a form is submitted

  • user_engagement

    : Event recorded when a user interacts with content

Key Columns

event_name

A string indicating the type of event. The event types explained above (page_view, session_start, etc.) are stored here.

event_date

Indicates the date the event occurred. Note that this column is stored as a STRING type (YYYYMMDD format). Conversion is necessary when computing as a date:

SELECT
  PARSE_DATE("%Y%m%d", event_date) as event_date 
FROM ...

event_timestamp

An INT64 type value that records the exact time the event occurred in microseconds. It represents the elapsed time from the UNIX epoch (January 1, 1970) in microseconds.

event_params

A complex structure that stores detailed parameters related to the event. The structure of this field is somewhat special, with the following structure:

ARRAY<STRUCT<
        key STRING,
        value STRUCT<
            string_value STRING,
            int_value INT64,
            float_value FLOAT64,
            double_value FLOAT64
        >
    >
>

Each parameter is a key-value pair, with values stored in different fields (string_value, int_value, etc.) depending on the type. This structure contains important properties, so we'll explain how to extract them in the "Data Handling Techniques" section.

user_pseudo_id

A pseudo-ID value used to identify users. This ID is generated for each device and browser and functions as a cookie-based identifier. A new ID is generated when users delete cookies or use a different device or browser.

user_id

The actual user ID explicitly set when a site or app implements user authentication. This allows for more accurate user analysis by tracking the same user across devices. It can also be used for data integration with other internal systems.

Other Important Columns

  • device

    : Information about device type, browser, operating system, etc.

  • geo

    : Geographic information such as country, region, city, etc.

  • traffic_source

    : Information about the traffic source

There are other columns that could be used, but they are not well-documented, so you need to discover them empirically by examining the data.

This site is partially paid, but it provides a somewhat comprehensive explanation of the schema.

Data Handling Techniques

GA4's BigQuery export data has a unique structure, so here are techniques for efficient analysis.

Using TABLE_SUFFIX

GA4 data is stored in separate tables for each date (with naming conventions like events_20250401). Using TABLE_SUFFIX to specify a specific period narrows the scan range and improves query performance. It's recommended to always include this:

SELECT
  *
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20201101' AND '20201201'

Parsing event_params

The event_params field has a complex structure. For frequently used parameters, it's convenient to extract them using subqueries that filter by key.

Here's an example of extracting the page_location (URL at the time of event sending) from a commonly used page_view event:

SELECT
  (
    SELECT
      param.value.string_value,
    FROM
      UNNEST(event_params) AS param
    WHERE
      param.key = 'page_location'
  )
  AS page_location,
  *
FROM
WHERE
  event_name = 'page_view'

Parsing domain and path

To analyze URL information, you can extract various parts (domain, path, query parameters, etc.) from the page_location parameter using regular expressions. This enables detailed analysis based on specific URL patterns.

The following extracts domain, path, and query parameters from page_location. Additionally, the path is divided into an array in path_segments since path may have a tree structure where the top directory may hold important meaning:

SELECT
  -- page info
  REGEXP_EXTRACT(page_location, r'^https?://([^/]+)') AS domain,
  REGEXP_EXTRACT(page_location, r'^https?://[^/]+(/[^?]*)') AS path,
  REGEXP_EXTRACT(page_location, r'\?(.*)$') AS query_params,
  SPLIT(SUBSTR(REGEXP_EXTRACT(page_location, r'^https?://[^/]+(/[^?]*)'), 2), '/') AS path_segments,
  -- device info
  device.category AS device_category,
  device.operating_system AS device_os,
  device.language AS device_language,
  device.web_info.browser AS device_browser,
  -- geo info
  geo.continent AS geo_continent,
  geo.country AS geo_country,
  geo.region AS geo_region,
  geo.city AS geo_city,
  -- source info
  traffic_source.medium AS source_medium,
  traffic_source.source AS source_name,
  *,
FROM
...

Official Sample Code (Query Cookbook)

Google's official documentation provides examples of basic queries. Here's a brief introduction to what's available:

Basic Queries

  • Specific Period Queries

    : Queries to extract data for a specific date range

  • User Count and New User Count

    : Queries to aggregate total users and new users

  • Average Transactions per Purchasing User

    : Queries to calculate the average number of purchases per user

  • Values for Specific Event Names

    : Queries to aggregate the frequency of event types

  • Top 10 Items Added to Cart

    : Queries to analyze popular products by cart additions

  • Average Page Views by Purchaser Type

    : Queries to compare page-viewing behavior of purchasers and non-purchasers

Advanced Event Queries

  • Products Purchased by Users Who Purchased a Specific Product

    : Queries for cross-sell analysis

  • Average Purchase Amount per Session for Each User

    : Queries for session-based purchase analysis

  • Latest Session ID and Session Number for Users

    : Queries to track the latest user behavior

I won't go into detail on individual queries here, but they mainly cover basics and e-commerce cases, so check them out if you're interested.

I think seeing queries along with their execution results helps create a better image, so for example, if you run the 'Specific Period Query' mentioned above in Codatum, it would look like this:

Basic Analysis Example

Basic View-related Dashboard

Using the techniques and sample code explained so far, you can create a simple dashboard like the one shown.

E-commerce Analysis Reports & Dashboards

Additionally, we have templated an e-commerce analysis dashboard using BigQuery's official sample dataset, complete with SQL code and conceptual approach. Please check this out as well.

Advanced Analysis Example

Now let's look at advanced analysis methods using GA4 data with specific examples.

Integration with Company Data (CRM, etc.)

By connecting online user behavior data obtained from GA4 with internal CRM data or sales management system data, a more comprehensive understanding of customers becomes possible.

Basic Concept of Data Integration

To link GA4 data with internal data, a common key is needed. In many cases, integration is done through the following methods:

  • Link GA4's user_id with the company system's customer_id

    Set the user ID in the internal system as the user_id field in GA4 for logged-in users.

  • Join in the Data Warehouse

    Join GA4 data exported to BigQuery with internal data.

Here's an example of hypothetical SQL (for illustration purposes only):

SELECT
  ga.user_id,
  crm.customer_id,
  crm.customer_segment,
  crm.membership_level,
  ga.event_name,
  ga.event_timestamp
FROM
  `project_id.ga4_dataset.events_*` AS ga
JOIN
  `project_id.crm_dataset.customer_master` AS crm
ON
  ga.user_id = crm.web_user_id
WHERE
  _TABLE_SUFFIX BETWEEN '20250301' AND '20250331'
  AND ga.user_id IS NOT NULL

Detailed Analysis through Customer Attribute Data Integration

By connecting customer attribute data stored in your CRM with GA4 data, it becomes possible to analyze user behavior by different user attributes.

SELECT
  crm.customer_segment,
  COUNT(DISTINCT ga.user_id) AS unique_users,
  COUNT(*) AS total_events,
  COUNT(CASE WHEN ga.event_name = 'page_view' THEN 1 END) AS pageviews,
  COUNT(CASE WHEN ga.event_name = 'purchase' THEN 1 END) AS purchases,
  SUM(CASE 
      WHEN ga.event_name = 'purchase' THEN (
        SELECT param.value.double_value
        FROM UNNEST(ga.event_params) AS param
        WHERE param.key = 'value'
      )
      ELSE 0
    END) AS total_revenue
FROM
  `project_id.ga4_dataset.events_*` AS ga
JOIN
  `project_id.crm_dataset.customer_master` AS crm
ON
  ga.user_id = crm.web_user_id
WHERE
  _TABLE_SUFFIX BETWEEN '20250301' AND '20250331'
GROUP BY
  crm.customer_segment
ORDER BY
  total_revenue DESC

This integration allows you to analyze online behavior data measured in GA4 together with customer attribute information stored in CRM (member rank, purchase history, customer segments, etc.).

This enables more detailed and valuable analysis, such as "pages most viewed by gold members" or "recent site behavior patterns of users who purchased in the last 3 months."

Notes on Numerical Consistency with Google Analytics UI

There are often discrepancies between the figures displayed in the GA4 UI and those calculated from BigQuery export data. These differences can arise from sampling, approximations, and other factors that should be kept in mind.

When there are differences between the data your company is viewing in the Google Analytics UI and your own data, comparing with the detailed data from BigQuery export can be helpful.

Conclusion

We've explained how exporting Google Analytics data to BigQuery enables data analysis and utilization that isn't possible with Google Analytics alone.

While there are initial setup and learning costs, the potential value outweighs these costs. In many cases, the data obtainable through GA4 is only a superficial part of your company's site, and without matching it with your company's data outside GA4, you might not even be measuring conversions in a meaningful sense.

Relying on superficial analysis can lead to continued wrong decisions based on incorrect data.

Codatum provides comprehensive support from building an analytics foundation using GA4 and BigQuery to connecting with your company's data and developing BI dashboards. Please feel free to contact us as a partner to maximize the power of your data and support your business growth.

Start using Codatum for free

Unlock the potential of your data
with Codatum