Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md
ga4_documentation_anomaly_detection.sql	ga4_documentation_anomaly_detection.sql

Anomaly Detection

GA4 has anomaly detection, but it will also report anomalies based on changes in traffic. This anomaly detection tries to avoid reporting anomalies caused by session fluctuations.

This functionality is in BETA
Anomaly detection flags Events or Parameters with significant spikes or drops that aren’t explained by session fluctuations.
- Event anomalies are detected independently across platforms.
- Parameter anomalies are detected independently across scopes, platforms and events.
  - Parameter anomalies are only flagged if their anomaly aren’t explained by an Event.
New Events and Parameters detected are flagged independently.

This helps you identify if something is potentially broken or has changed.

Anomalies are detected using Standard Deviation
- You can choose between adjusted for day of week (dayofweek), or standard (recommended).

Anomaly Detection Setup

Create 1 Scheduled Query
Create 1 Logs Router
Create 1 Cloud Run Function

Anomaly Settings

Anomaly settings can be adjusted in Google Sheet in Advanced Settings.

Anomaly Query Periods settings

Declaration	Default	Comment
Day Interval Short	1	Number of days to check for anomalies (e.g., last 1 day). Declared in query as day_interval_short.
Day Interval Extended	28	Number of days to query the first time to get some event & parameter count data. If you have lot's of data, cost may occour if you are selecting a long period. For anomaly detection you need at least 28 days of data, but longer is better. Declared in query as day_interval_extended.
Minimum Number of Days before Anomaly Detection	28	Minimum number of days of data collected before running anomaly detection. Declared in query as days_before_anomaly_detection.
Rolling Statistics Interval	90	Number of Days for rolling statistics (e.g., last 56 days). With standard deviation model, 56 days (as minimum) is recommended. With day of week adjustment, 84 (as minimum) is recommended. Declared in query as day_interval_large.

Anomaly Detection Settings

Declaration	Default	Comment
Minimum Expected Count Threshold	10	Minimum expected count threshold for anomaly detection. If expected count is equal to or lower than this number, no anomaly detection will be run. Delcared in query as min_expected_count.
Standard Deviation Multiplier	3	Multiplier for standard deviation. Standard deviation for events and parameters. Scale goes from 1 to 3. Default setting is 3; lower sensitivity, fewer false positives. Declared in query as stddev_multiplier.
Events Explained by Sessions Threshold	0.1	If an event anomaly is reported, and should have been explained by changes in sessions, increase the number (going higher than 0.2 may hide real anomalies). Decrease the number for the opposite scenario. Declared in query as events_explained_by_sessions_threshold.
Parameters Explained by Sessions Threshold	0.1	If an parameter anomaly is reported, and should have been explained by changes in sessions, increase the number (going higher than 0.2 may hide real anomalies). Decrease the number for the opposite scenario. Declared in query as parameters_explained_by_sessions_threshold.
Standard Deviation Model Setting	standard	Standard Deviation model can either be standard or dayofweek. dayofweek = adjusted for day of week. standard = not adjusted for day of week. Day of Week: More accurate in detecting true anomalies by considering natural day-of-week fluctuations, but may fail if the day-of-week effect varies seasonally or due to external factors. Standard: Works well for detecting overall trends and anomalies unrelated to weekly patterns. May work better with seasonally changes. Declared in query as stddev_model_setting.

Scheduled queries logic

All scheduled queries have this logic:

If events_fresh_ table exist (GA 360 only), query only this table including today.
- If events_fresh_ doesn't exist, query events_ table until yesterday.
  - If yesterday doesn't exist in events_ table, query events_intraday_ (if the table exist), between yesterday and today.
    - Else query events_intraday_ only for today.

Scheduled queries settings

Replace your-project.analytics_XXX with your project and data set
Settings can be edited in Google Sheet in the Advanced Settings sheet.

Scheduled query	Comment
ga4_documentation_anomaly_detection	This query creates 2 tables: ga4_documentation_anomaly_detection ga4_documentation_anomaly_detection_session_counts

Scheduled queries should use On-demand Repeat frequency
Do NOT tick the checkbox Set a destination table for query results. That logic is handled within the SQL query
Decide if you need to specify Location type (Ex. Multi-region and EU)
Click Save

The complete setup works like this: When a INSERT is made to the ga4_documentation_parameters_daily_counts table, a Cloud Run Function will automatically run the scheduled query.

Create the Logs Router

Go to Logs Router, and click the CREATE SINK button.
- Give the sink a name, ex. ga4_documentation_anomaly_update.
- Choose Google Cloud Pub/Sub topic as the destination.
- From the list of available Pub/Sub topics, click to create a new topic.
- Create a Topic ID, ex. ga4_documentation_anomaly_update.
- In the Build inclusion filter, copy the filter below, but replace analytics_XXX with your Dataset ID.

Build inclusion filter

protoPayload.methodName="jobservice.jobcompleted"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.datasetId="analytics_XXX"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.statementType="INSERT"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.tableId="ga4_documentation_parameters_daily_counts"

Google Cloud Run Functions

You have to create 1 Cloud Run Function.

Go to Cloud Run, and click Write a function.

Configuration page
- Keep environment as 1st gen.
- Give the function a descriptive name.
- Choose Cloud Pub/Sub as the trigger type.
- Select the Pub/Sub topic you created in the previous chapter as the trigger.
- Check the box Retry on failure.
- Click Save to save the trigger settings.
- Ignore the Runtime, build, connections and security settings accordion and click Next to continue.
Code page
- Keep Node.js as the runtime (choose the latest non-Preview version).
- Click package.json to edit its contents.
  - Add the following code:
```
 {
 	"dependencies": {
 	"@google-cloud/bigquery-data-transfer": "^5.0.1",
 	"@google-cloud/functions-framework": "^3.5.1"
 	}
 }
```
- Select index.js from the list of files to activate the code editor.
  - Edit the Entry point field on top of the editor to be runScheduledQuery.
  - Copy-paste the index.js code below into the editor.
  - Change the projectId value to match your Google Cloud Platform project ID.
  - To get values for region and configId, browse to scheduled queries, open your scheduled query, and click the Configuration tab to view its details.
  - region value should be the Google Cloud region of the Destination dataset, so click through to that to check if you don’t remember what it was.
  - configId is the GUID at the end of the Resource name of the scheduled query.

index.js

const functions = require('@google-cloud/functions-framework');
const bigqueryDataTransfer = require('@google-cloud/bigquery-data-transfer');

functions.cloudEvent('runScheduledQuery', async (cloudEvent) => {
  // Update configuration options.
  const projectId = 'REPLACE-THIS';
  const region = 'REPLACE-THIS';
  const configId = 'REPLACE-THIS';

  // Create the run time directly in code.
  // For example, use today's date at 12:00 UTC.
  const now = new Date();
  const runTime = new Date(Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate(),
    12  // 12:00 UTC — adjust if necessary
  ));

  // Create a proto-buffer Timestamp using the client's protos.
  const requestedRunTime = bigqueryDataTransfer.protos.google.protobuf.Timestamp.fromObject({
    seconds: Math.floor(runTime.getTime() / 1000),
    nanos: (runTime.getTime() % 1000) * 1e6
  });

  // Create the BigQuery Data Transfer client.
  const client = new bigqueryDataTransfer.v1.DataTransferServiceClient();
  const parent = client.projectLocationTransferConfigPath(projectId, region, configId);

  // Build the request.
  const request = {
    parent,
    requestedRunTime
  };

  // Trigger the manual transfer run.
  const response = await client.startManualTransferRuns(request);
  console.log(`Scheduled query triggered at ${runTime.toISOString()} for config ${configId}`);
  return response;
});

};

Testing scheduled queries setup

To test the setup, simply go to the Google Sheet, select the 📈 GA4 Documentation Menu at the top of the sheet, and select BigQuery -> Export Event & Parameter Documentation.

This will run a BigQuery query using Apps Script. If this is completed without errors, you should now see 2 anomaly tables in BigQuery.

Overview over tables created in BigQuery

ga4_documentation_anomaly_detection is the BigQuery table that you will use in Looker Studio.

ga4_documentation_anomaly_detection
ga4_documentation_anomaly_detection_session_counts

Table: ga4_documentation_anomaly_detection

Field name	Type	Comment
event_date	DATE	Event Date
platform	STRING	Platform can be WEB, IOS or ANDROID
event_or_parameter_name	STRING	event_name or parameter_name
event_or_parameter_type	STRING	Can be either event or parameter
actual_count	INTEGER	Actual Count for the Event or Parameter
expected_count	FLOAT	Standard Deviation Expected Count
anomaly_description	STRING	Anomaly described as text
net_change_percentage	FLOAT	Anomaly change expressed as percent in the format 0.1 = 10%, 1 = 100% etc.
parameter_scope	STRING	Parameter Scope if the anomaly is for a parameter
event_name	STRING	Event Name. Relevant for parameter anomaly
upper_bound	FLOAT64	Upper Bound deviation from expected value. This can help with post-analysis, debugging, and tuning the detection sensitivity.
lower_bound	FLOAT64	Lower Bound deviation from expected value. This can help with post-analysis, debugging, and tuning the detection sensitivity.

Table: ga4_documentation_anomaly_detection_session_counts

Field name	Type	Comment
event_date	DATE	Event Date
platform	STRING	Platform can be WEB, IOS or ANDROID
session_count_total	INTEGER	Total count of sessions for platform

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Anomaly Detection

Anomaly Detection Setup

Anomaly Settings

Anomaly Query Periods settings

Anomaly Detection Settings

Scheduled queries logic

Scheduled queries settings

Create the Logs Router

Build inclusion filter

Google Cloud Run Functions

index.js

Testing scheduled queries setup

Overview over tables created in BigQuery

Table: ga4_documentation_anomaly_detection

Table: ga4_documentation_anomaly_detection_session_counts

FilesExpand file tree

Anomaly-Detection

Directory actions

More options

Directory actions

More options

Latest commit

History

Anomaly-Detection

Folders and files

parent directory

README.md

Anomaly Detection

Anomaly Detection Setup

Anomaly Settings

Anomaly Query Periods settings

Anomaly Detection Settings

Scheduled queries logic

Scheduled queries settings

Create the Logs Router

Build inclusion filter

Google Cloud Run Functions

index.js

Testing scheduled queries setup

Overview over tables created in BigQuery

Table: ga4_documentation_anomaly_detection

Table: ga4_documentation_anomaly_detection_session_counts