July 09, 2024 | Tools & Tips

Navigating Data Differences Between Weld and Google Analytics 4

by Pedro Prazeres

With Weld you can integrate your Google Analytics 4 data for easy transformation, modelling, and combination with other data sources to build the valuable business insights you need.

However, as is normal when combining data from different platforms, you might see some differences between the values in your Google Analytics dashboard and the values imported through their Reporting API into WELD. Don't worry, your data is safe and correct! This is a common occurrence, and it's all due to Google's mechanisms for handling data in their GA4 processes.

In this post, we look into the reasons behind the data discrepancies you might encounter, and what to keep in mind when you need to balance precision and efficiency. Let's start by taking a look at some of the approaches Google Analytics takes to handle your data analysis.


Data Sampling

Google Analytics 4 uses data sampling, a process in which only a subset of a dataset is used to estimate the characteristics of the entire dataset. This allows faster data retrieval and processing, due to the smaller amounts of data involved.
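To make the idea concrete, here is a minimal Python sketch of sampling in general. It is not how GA4 samples internally; the event list, the `channel` field, and the 10% sample rate are all made up for illustration. It counts matching events in a random subset and scales the result back up to the full dataset.

```python
import random

def estimate_total_from_sample(events, sample_rate, seed=0):
    """Estimate how many events come from Organic Search using only a random subset."""
    rng = random.Random(seed)
    # Keep each event with probability `sample_rate`.
    sample = [e for e in events if rng.random() < sample_rate]
    matches_in_sample = sum(1 for e in sample if e["channel"] == "Organic Search")
    # Scale the sampled count back up to the size of the full dataset.
    return matches_in_sample / sample_rate

# Hypothetical dataset: 100,000 events, 40% of them from Organic Search.
events = [
    {"channel": "Organic Search" if i % 5 < 2 else "Direct"}
    for i in range(100_000)
]

exact = sum(1 for e in events if e["channel"] == "Organic Search")
estimate = estimate_total_from_sample(events, sample_rate=0.1)
print(exact, round(estimate))
```

The estimate only has to look at roughly a tenth of the events, which is where the speed comes from, but it will usually land near the exact count rather than on it.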

> In Google Analytics, data sampling may occur when the number of events used to create a report, exploration, or request exceeds the quota limit for your property.

[[GA4] About data sampling - Analytics Help]

The quota limits for event-level queries are, as of the writing of this post, 10 million for standard Google Analytics properties and 1 billion for Google Analytics 360 properties. If data sampling is being used, this will be indicated by the *data quality* icon in the top right of the different cards and explorations in your Google Analytics 4 dashboard.


The higher the percentage of data used, the more accurate and better quality your results will be.
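As a rough rule of thumb, you can compare the number of events behind a query with the quota limits quoted above. The sketch below is only an illustration: `QUOTA_LIMITS` and `may_be_sampled` are hypothetical names, not part of any Google or Weld API, and the limits themselves may change over time.

```python
# Event-level query quotas quoted in this post (as of the time of writing).
QUOTA_LIMITS = {
    "standard": 10_000_000,   # standard Google Analytics properties
    "360": 1_000_000_000,     # Google Analytics 360 properties
}

def may_be_sampled(event_count, property_type="standard"):
    """Return True when a query touches more events than the property's quota."""
    return event_count > QUOTA_LIMITS[property_type]

# A hypothetical exploration covering 25 million events.
print(may_be_sampled(25_000_000, "standard"))
print(may_be_sampled(25_000_000, "360"))
```

In this example the same query would be a candidate for sampling on a standard property, but not on a 360 property.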


HyperLogLog

Performing an exact count of distinct items (the *cardinality*) in a large dataset requires significant amounts of memory and computing resources. Therefore, to reduce memory usage and provide fast results, Google Analytics 4 utilizes the HyperLogLog++ (HLL++) algorithm, an augmented version of the HyperLogLog algorithm.

The HLL++ algorithm estimates the cardinality of several metrics in GA4, giving an approximation of the total. What this means in practice is that the values in your Google Analytics 4 dashboard are provided in a quick and efficient manner, but they are *approximations*. For most cases, the approximation is quite accurate, with a low error rate.
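For the curious, here is a minimal Python sketch of the plain HyperLogLog algorithm. It is a simplified illustration, not the HLL++ variant and not Google's implementation: the class name, the choice of SHA-256 as the hash, and the register count (`p=12`, so 4096 registers) are assumptions made for this example. The key idea is that the sketch keeps a small, fixed number of registers instead of remembering every item it has seen.

```python
import hashlib
import math

class HyperLogLog:
    """Plain HyperLogLog (not HLL++): estimates distinct counts in fixed memory."""

    def __init__(self, p=12):
        self.p = p                     # number of hash bits used to pick a register
        self.m = 1 << p                # number of registers (4096 for p=12)
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Derive a 64-bit hash value for the item.
        digest = hashlib.sha256(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits choose the register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw

# Hypothetical data: 100,000 distinct user IDs, each seen twice.
hll = HyperLogLog()
for _ in range(2):
    for user_id in range(100_000):
        hll.add(f"user-{user_id}")

approx = hll.count()
print(round(approx))
```

With 4096 registers the typical relative error of plain HyperLogLog is about 1.04 / sqrt(4096), or roughly 1.6%, so the printed value should be close to 100,000 without matching it exactly. Duplicated IDs do not inflate the estimate, because a repeated item always hashes to the same register and rank.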

However, when you connect Google Analytics 4 to your WELD account, the values of the same distinct counts will likely differ. This is due to where and how your data is stored and processed through WELD: the destinations we offer have the time and resources to perform the necessary calculations and, consequently, will give you precise results on the distinct counts of session metrics.

You can see the results of HLL++ in your own GA4 dashboard: the totals presented for some of the metrics do not equal the sum of the values in the corresponding columns.


The values also differ when the same data is explored through WELD's SQL editor. For example, your total session count from the Organic Search channel might show a value of 3959 in GA4 and a total of 3955 in your WELD account.
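To put that gap in perspective, a quick calculation of the relative difference helps. The snippet below uses the two example figures from this post and treats the WELD value as the exact count; `relative_difference` is just a helper name chosen for this illustration.

```python
def relative_difference(approximate, exact):
    """Size of the gap between two values, relative to the exact one."""
    return abs(approximate - exact) / exact

ga4_sessions = 3959    # example value shown in the GA4 dashboard
weld_sessions = 3955   # example value computed in WELD

diff = relative_difference(ga4_sessions, weld_sessions)
print(f"{diff:.4%}")  # prints 0.1011%
```

A difference of about 0.1% is in line with the low error rate you would expect from an approximation algorithm.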



Considerations

Whenever you are in need of a quick look at your Google Analytics data, the GA4 dashboard will give you fast results, albeit slightly inaccurate, due to the use of both Data Sampling and the HyperLogLog++ algorithm. But, if precision is what you need, having your data connected through WELD will allow you to use the full power of any of our destinations to easily calculate the values of all the metrics you need.

References

- [Data Sampling in GA4]

- [GA4 Session Definition]

- [HyperLogLog Algorithm]

- [Unique Count Approximation]
