
How to export Google Analytics data to BigQuery and integrate it with your company's data for advanced analysis? We cover schema details, data integration methods, and practical techniques for visualizing the complete picture of customer behavior.
The Importance of Exporting Google Analytics to BigQuery
Google Analytics 4 (GA4) is a standard analytics tool used by many websites and applications. However, the standard GA4 interface has limitations when it comes to deeper data analysis and flexible data utilization. This is where the BigQuery export function becomes extremely important.
Use Cases
Here are specific benefits and use cases for GA4 data exported to BigQuery:
Integrated Data Analysis
: Integrate data from sites, apps, and internal systems in one place to comprehensively understand customer behavior.
Long-term User Behavior Analysis
: While the GA4 UI has period limitations, BigQuery allows you to retain and analyze user behavior history over extended periods.
Advanced Segment Analysis
: Leverage BigQuery's SQL to extract user segments with complex conditions that would be difficult through the GA4 UI.
Custom Metric Creation
: Freely define and calculate complex metrics based on your unique business logic.
Integrated Dashboard Building
: Connect with various BI tools to build a single dashboard integrating GA4 data with internal data.
Detailed Conversion Analysis
: Analyze complex purchase paths and exit points in detail to optimize conversions.
Schema
The data exported from GA4 to BigQuery has a unique schema structure. Here we'll explain the schema related primarily to web events.
Event Types
GA4 records various events, but here are the main event types that are particularly useful:
page_view
: Event that occurs when a user views a page
session_start
: Event recorded at the start of a user session
first_visit
: Event recorded during a user's first visit
form_submit
: Event that occurs when a form is submitted
user_engagement
: Event recorded when a user interacts with content
Key Columns
event_name
A string indicating the type of event. The event types explained above (page_view, session_start, etc.) are stored here.
event_date
Indicates the date the event occurred. Note that this column is stored as a STRING type (YYYYMMDD format). Conversion is necessary when computing as a date:
SELECT
PARSE_DATE("%Y%m%d", event_date) as event_date
FROM ...
event_timestamp
An INT64 type value that records the exact time the event occurred in microseconds. It represents the elapsed time from the UNIX epoch (January 1, 1970) in microseconds.
event_params
A complex structure that stores detailed parameters related to the event. The structure of this field is somewhat special, with the following structure:
ARRAY<STRUCT<
key STRING,
value STRUCT<
string_value STRING,
int_value INT64,
float_value FLOAT64,
double_value FLOAT64
>
>
>
Each parameter is a key-value pair, with values stored in different fields (string_value, int_value, etc.) depending on the type. This structure contains important properties, so we'll explain how to extract them in the "Data Handling Techniques" section.
user_pseudo_id
A pseudo-ID value used to identify users. This ID is generated for each device and browser and functions as a cookie-based identifier. A new ID is generated when users delete cookies or use a different device or browser.
user_id
The actual user ID explicitly set when a site or app implements user authentication. This allows for more accurate user analysis by tracking the same user across devices. It can also be used for data integration with other internal systems.
Other Important Columns
device
: Information about device type, browser, operating system, etc.
geo
: Geographic information such as country, region, city, etc.
traffic_source
: Information about the traffic source
There are other columns that could be used, but they are not well-documented, so you need to discover them empirically by examining the data.
This site is partially paid, but it provides a somewhat comprehensive explanation of the schema.
Data Handling Techniques
GA4's BigQuery export data has a unique structure, so here are techniques for efficient analysis.
Using TABLE_SUFFIX
GA4 data is stored in separate tables for each date (with naming conventions like events_20250401). Using TABLE_SUFFIX to specify a specific period narrows the scan range and improves query performance. It's recommended to always include this:
SELECT
*
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20201101' AND '20201201'
Parsing event_params
The event_params field has a complex structure. For frequently used parameters, it's convenient to extract them using subqueries that filter by key.
Here's an example of extracting the page_location
(URL at the time of event sending) from a commonly used page_view
event:
SELECT
(
SELECT
param.value.string_value,
FROM
UNNEST(event_params) AS param
WHERE
param.key = 'page_location'
)
AS page_location,
*
FROM
WHERE
event_name = 'page_view'
Parsing domain and path
To analyze URL information, you can extract various parts (domain, path, query parameters, etc.) from the page_location parameter using regular expressions. This enables detailed analysis based on specific URL patterns.
The following extracts domain, path, and query parameters from page_location. Additionally, the path is divided into an array in path_segments since path may have a tree structure where the top directory may hold important meaning:
SELECT
-- page info
REGEXP_EXTRACT(page_location, r'^https?://([^/]+)') AS domain,
REGEXP_EXTRACT(page_location, r'^https?://[^/]+(/[^?]*)') AS path,
REGEXP_EXTRACT(page_location, r'\?(.*)$') AS query_params,
SPLIT(SUBSTR(REGEXP_EXTRACT(page_location, r'^https?://[^/]+(/[^?]*)'), 2), '/') AS path_segments,
-- device info
device.category AS device_category,
device.operating_system AS device_os,
device.language AS device_language,
device.web_info.browser AS device_browser,
-- geo info
geo.continent AS geo_continent,
geo.country AS geo_country,
geo.region AS geo_region,
geo.city AS geo_city,
-- source info
traffic_source.medium AS source_medium,
traffic_source.source AS source_name,
*,
FROM
...
Official Sample Code (Query Cookbook)
Google's official documentation provides examples of basic queries. Here's a brief introduction to what's available:
Basic Queries
Specific Period Queries
: Queries to extract data for a specific date range
User Count and New User Count
: Queries to aggregate total users and new users
Average Transactions per Purchasing User
: Queries to calculate the average number of purchases per user
Values for Specific Event Names
: Queries to aggregate the frequency of event types
Top 10 Items Added to Cart
: Queries to analyze popular products by cart additions
Average Page Views by Purchaser Type
: Queries to compare page-viewing behavior of purchasers and non-purchasers
Advanced Event Queries
Products Purchased by Users Who Purchased a Specific Product
: Queries for cross-sell analysis
Average Purchase Amount per Session for Each User
: Queries for session-based purchase analysis
Latest Session ID and Session Number for Users
: Queries to track the latest user behavior
I won't go into detail on individual queries here, but they mainly cover basics and e-commerce cases, so check them out if you're interested.
I think seeing queries along with their execution results helps create a better image, so for example, if you run the 'Specific Period Query' mentioned above in Codatum, it would look like this:
Basic Analysis Example
Basic View-related Dashboard
Using the techniques and sample code explained so far, you can create a simple dashboard like the one shown.
E-commerce Analysis Reports & Dashboards
Additionally, we have templated an e-commerce analysis dashboard using BigQuery's official sample dataset, complete with SQL code and conceptual approach. Please check this out as well.
Advanced Analysis Example
Now let's look at advanced analysis methods using GA4 data with specific examples.
Integration with Company Data (CRM, etc.)
By connecting online user behavior data obtained from GA4 with internal CRM data or sales management system data, a more comprehensive understanding of customers becomes possible.
Basic Concept of Data Integration
To link GA4 data with internal data, a common key is needed. In many cases, integration is done through the following methods:
Link GA4's user_id with the company system's customer_id
Set the user ID in the internal system as the user_id field in GA4 for logged-in users.
Join in the Data Warehouse
Join GA4 data exported to BigQuery with internal data.
Here's an example of hypothetical SQL (for illustration purposes only):
SELECT
ga.user_id,
crm.customer_id,
crm.customer_segment,
crm.membership_level,
ga.event_name,
ga.event_timestamp
FROM
`project_id.ga4_dataset.events_*` AS ga
JOIN
`project_id.crm_dataset.customer_master` AS crm
ON
ga.user_id = crm.web_user_id
WHERE
_TABLE_SUFFIX BETWEEN '20250301' AND '20250331'
AND ga.user_id IS NOT NULL
Detailed Analysis through Customer Attribute Data Integration
By connecting customer attribute data stored in your CRM with GA4 data, it becomes possible to analyze user behavior by different user attributes.
SELECT
crm.customer_segment,
COUNT(DISTINCT ga.user_id) AS unique_users,
COUNT(*) AS total_events,
COUNT(CASE WHEN ga.event_name = 'page_view' THEN 1 END) AS pageviews,
COUNT(CASE WHEN ga.event_name = 'purchase' THEN 1 END) AS purchases,
SUM(CASE
WHEN ga.event_name = 'purchase' THEN (
SELECT param.value.double_value
FROM UNNEST(ga.event_params) AS param
WHERE param.key = 'value'
)
ELSE 0
END) AS total_revenue
FROM
`project_id.ga4_dataset.events_*` AS ga
JOIN
`project_id.crm_dataset.customer_master` AS crm
ON
ga.user_id = crm.web_user_id
WHERE
_TABLE_SUFFIX BETWEEN '20250301' AND '20250331'
GROUP BY
crm.customer_segment
ORDER BY
total_revenue DESC
This integration allows you to analyze online behavior data measured in GA4 together with customer attribute information stored in CRM (member rank, purchase history, customer segments, etc.).
This enables more detailed and valuable analysis, such as "pages most viewed by gold members" or "recent site behavior patterns of users who purchased in the last 3 months."
Notes on Numerical Consistency with Google Analytics UI
There are often discrepancies between the figures displayed in the GA4 UI and those calculated from BigQuery export data. These differences can arise from sampling, approximations, and other factors that should be kept in mind.
When there are differences between the data your company is viewing in the Google Analytics UI and your own data, comparing with the detailed data from BigQuery export can be helpful.
Conclusion
We've explained how exporting Google Analytics data to BigQuery enables data analysis and utilization that isn't possible with Google Analytics alone.
While there are initial setup and learning costs, the potential value outweighs these costs. In many cases, the data obtainable through GA4 is only a superficial part of your company's site, and without matching it with your company's data outside GA4, you might not even be measuring conversions in a meaningful sense.
Relying on superficial analysis can lead to continued wrong decisions based on incorrect data.
Codatum provides comprehensive support from building an analytics foundation using GA4 and BigQuery to connecting with your company's data and developing BI dashboards. Please feel free to contact us as a partner to maximize the power of your data and support your business growth.