Skip to main content

GA4-Shopify Data Linking

How to connect Google Analytics 4 data with Shopify customer data in BigQuery.


Summary

Link MethodCoverageMatch RateUse Case
transaction_idorder.id100% of purchases94%Purchase attribution
user_idcustomer.id0.59% of events98.8%User behavior tracking

Architecture Overview


Method 1: Transaction ID (Purchases)

Best for: Purchase attribution, revenue analysis, conversion tracking

GA4 FieldShopify FieldExample
event_params.transaction_idorders.id6062491893805

Query Pattern

-- Step 1: Export GA4 transactions (US region)
SELECT DISTINCT
(SELECT value.string_value FROM UNNEST(event_params)
WHERE key = "transaction_id") as transaction_id,
user_pseudo_id,
event_timestamp
FROM `eli-health-prod.analytics_361776673.events_*`
WHERE event_name = "purchase"
AND _TABLE_SUFFIX >= FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))

-- Step 2: Match with Shopify orders (northamerica-northeast1 region)
SELECT
o.id as order_id,
o.order_number,
o.email,
JSON_EXTRACT_SCALAR(o.customer, '$.id') as customer_id,
JSON_EXTRACT_SCALAR(o.customer, '$.first_name') as first_name,
o.total_price,
o.created_at
FROM `eli-health-prod.eli_health_shopify.orders` o
WHERE o.id IN (/* transaction_ids from step 1 */)

Coverage Stats (Last 30 Days)

MetricValue
GA4 purchase events1,834
Matched to Shopify orders94%
Unmatched (test orders, etc.)6%

Method 2: User ID (Site Behavior)

Best for: User journey analysis, cross-session behavior

GA4 FieldShopify FieldExample
user_idcustomers.id8614891192365

How user_id Gets Set

The user_id is only populated when:

  1. User is logged into their Shopify account on eli.health
  2. The website calls gtag('set', 'user_id', customerId)

Coverage by Event Type

EventWith user_idCoverage
download_app_link_click3113.2%
form_submit1368.3%
contact_us_form_submit206.4%
scroll_depth_90%2312.4%
page_view780.1%
session_start1230.1%
purchase00%
add_to_cart00%

Key insight: E-commerce events have 0% user_id because Shopify checkout doesn't pass it.

Query Pattern

-- Step 1: Get GA4 user_ids (US region)
SELECT DISTINCT SAFE_CAST(user_id AS INT64) as customer_id
FROM `eli-health-prod.analytics_361776673.events_*`
WHERE user_id IS NOT NULL AND user_id != ""
AND _TABLE_SUFFIX >= FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))

-- Step 2: Match with Shopify customers (northamerica-northeast1 region)
SELECT id, email, first_name, last_name
FROM `eli-health-prod.eli_health_shopify.customers`
WHERE id IN (/* customer_ids from step 1 */)

Match Rate

MetricValue
Unique GA4 user_ids (30 days)259
Matched to Shopify customers256 (98.8%)

Cross-Region Sync

BigQuery cannot directly JOIN tables across regions. GA4 exports to the US multi-region, but Shopify, Klaviyo, and LoopWork data lives in northamerica-northeast1 (Montreal).

Solution: GA4 Cross-Region Sync Cloud Function

A Cloud Function syncs GA4 event tables from US to Montreal daily, enabling direct SQL JOINs.

Sync Details

SettingValue
Cloud Functionga4-cross-region-sync
ScheduleDaily at 6 AM UTC
MethodExport → GCS (Parquet) → Import to Montreal
Tables per runUp to 10
Source datasetanalytics_361776673 (US)
Target datasetanalytics_ga4 (northamerica-northeast1)
Terraform moduletf/modules/global/ga4-sync/

Sync Status

Check which tables are synced:

SELECT MIN(table_name) as oldest, MAX(table_name) as newest, COUNT(*) as total
FROM `eli-health-prod.analytics_ga4.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE 'events_%';

Use Cases

1. Purchase Attribution

Question: Which marketing channels drive purchases?

-- GA4 purchase events with UTM parameters
SELECT
(SELECT value.string_value FROM UNNEST(event_params)
WHERE key = "transaction_id") as transaction_id,
traffic_source.source,
traffic_source.medium,
traffic_source.name as campaign
FROM `eli-health-prod.analytics_361776673.events_*`
WHERE event_name = "purchase"

Then join with Shopify to get customer details and LTV.

2. User Journey Before Purchase

Question: What pages do users view before buying?

-- Get user journey for purchasers (limited by user_id coverage)
SELECT
user_id,
event_name,
(SELECT value.string_value FROM UNNEST(event_params)
WHERE key = "page_location") as page,
event_timestamp
FROM `eli-health-prod.analytics_361776673.events_*`
WHERE user_id IN (/* purchaser user_ids */)
ORDER BY user_id, event_timestamp

3. Cart Abandonment Analysis

Question: Who added to cart but didn't purchase?

This is already handled by the native Klaviyo-Shopify integration. No GA4 linking needed.


Improving Coverage

To increase user_id coverage on eli.health:

Option 1: Set user_id on Login

// When user logs in via Shopify customer account
gtag('set', 'user_id', shopifyCustomerId);

Option 2: Use Shopify Customer Events API

Shopify's Customer Events API can pass customer ID to GA4.

Option 3: First-Party Data Mode

Use GA4's enhanced conversions to match users post-hoc.


Data Volumes

DatasetRecordsUpdate Frequency
GA4 events (30 days)642,213Real-time
Shopify customers43,628Daily (Airbyte)
Shopify orders8,704Daily (Airbyte)
GA4 user_ids (30 days)259Real-time

Marketing Attribution Dashboard

The GA4-Shopify linking powers the Marketing Attribution Dashboard in eli-kpi, available at /marketing?range=7|30|60|90.

Sections (all loaded via AJAX, responding to date range selection)

SectionData SourcePurpose
Summary StatsGA4 + ShopifyUsers, purchasers, revenue, AOV, conversion rate
Channel AttributionGA4 + ShopifyRevenue by source/medium
Conversion FunnelGA4Session → cart → checkout → purchase by channel
New vs ReturningShopifyRevenue split by customer type
Time to PurchaseGA4 + ShopifyDays from first visit to conversion, with AOV per window
US vs CanadaGA4 + ShopifyGeographic revenue breakdown
CAC & ROASAds + ShopifyCost per acquisition, return on ad spend
Sessions to ConversionGA4How many sessions before purchase
Bounce vs ReturnGA4Single-session vs multi-session conversion rates
High-Intent AbandonersGA4Users who reached cart/checkout but didn't purchase
Lost VisitorsGA4Users who left without engaging
Pages Per SessionGA4Depth of engagement vs conversion
Multi-Touch AttributionGA4First-touch, last-touch, and assisted attribution
New Users by SourceGA4First-time visitor acquisition channels
Cart AbandonmentGA4 + ShopifyDaily abandoned cart trends and recovery
Campaign MatrixAdsSpend, impressions, clicks, CTR, CPC by campaign
User Flow (Sankey)GA4 + ShopifyVisual flow: source → landing page → cart → purchase

URL Parameters

The dashboard supports direct linking via URL parameters:

/marketing?range=7     # Last 7 days
/marketing?range=30 # Last 30 days (default)
/marketing?range=60 # Last 60 days
/marketing?range=90 # Last 90 days

Attribution Model

How User Tracking Works

The dashboard uses two mechanisms to track users:

  1. GA4 First-Party Cookie (_gauser_pseudo_id): A first-party cookie set by GA4's gtag.js on eli.health. This identifies a browser across sessions. It is not a third-party cookie and is not affected by ITP/Safari restrictions (first-party cookies persist for 7 days in Safari, indefinitely in Chrome).

  2. UTM Parameters: When users arrive via paid campaigns or email links, UTM parameters (utm_source, utm_medium, utm_campaign) are captured by GA4 and stored in traffic_source.source, traffic_source.medium, and traffic_source.name.

Attribution Models Available

ModelHow It WorksDashboard Section
Last-ClickCredits the traffic source of the session where the purchase happenedChannel Attribution
First-TouchCredits the traffic source of the user's very first sessionMulti-Touch Attribution
AssistedCredits all channels the user touched between first visit and purchaseMulti-Touch Attribution

The Channel Attribution table uses last-click by default (GA4's traffic_source on the purchase event). The Multi-Touch Attribution section shows all three models side-by-side for comparison.

Revenue Attribution

Revenue figures are real Shopify order totals, not GA4 estimated revenue. The join is:

GA4 purchase event → transaction_id → Shopify orders.id → total_price

This gives 94% match rate. The 6% unmatched are typically test orders or orders that didn't fire GA4 events.


Behavioral Insights

Time to Purchase vs. Average Order Value

Analysis of 30-day data (January 2026) reveals that users who take longer to decide tend to spend more:

Conversion WindowPurchasers% of TotalAOV
Same session (< 1 hour)1,41481.6%$133.94
Same day (1-24 hours)1176.8%$137.76
Same week (1-7 days)1287.4%$152.82
Same month (7-30 days)734.2%$145.45

Key findings:

  • 82% of purchasers convert in the same session -- the product drives strong impulse conversion
  • AOV increases 14% from same-session ($134) to same-week ($153) buyers
  • Deliberate buyers (1-7 days) have the highest AOV, suggesting they may be selecting higher-value bundles
  • Same-month buyers drop slightly to $145, possibly due to discount-driven re-engagement

Implications for Marketing Strategy

  • Top of funnel (ads, landing pages): Should target emotional, fast decision-making. 82% of buyers convert immediately, so the first impression matters most.
  • Retargeting (1-7 day window): These users are deliberating and spending more. Retargeting creative should emphasize product value and bundle options rather than urgency.
  • Lifecycle communications: Once subscribed, the product experience is analytical (hormone data, trends). Messaging should shift from emotional to informational.


Document History

DateAuthorChanges
2026-01-27ChipAdd attribution model docs, behavioral insights (AOV by time-to-purchase), Cloud Function sync, marketing dashboard sections
2026-01-26ChipInitial investigation and documentation