Can Ozcer

May 21, 2023

E-commerce businesses rely heavily on data analysis to make informed decisions about their operations. One of the most important metrics for any online retailer is the Average Order Value (also called average basket size). AOV measures the average value of each customer transaction. By understanding this metric, businesses can optimize their pricing strategies, promotional campaigns, free shipping tactics and product offerings to increase revenue. It is also one of the main levers of a popular E-Commerce Revenue equation: Sessions * E-Commerce Conversion Rate * Average Order Value.

However, just looking at your AOV can be misleading as averages tend to hide important details. In this blog post, we will explore a better way to conduct this type of analysis for e-commerce businesses using data from Google Analytics 4.

It can be easy to forget that averages are very prone to the impact of outliers. A single high-value order can completely skew your AOV number, potentially giving you the wrong insight.

Consider your store having the following 10 transactions for a given time range:

With only 10 values, it is easy to observe that one of these is not like the others!

If you get your AOV number from this set, you will end up with £98. Obviously, that is not very helpful (and actually, plain misleading!).

Transaction #10 has a much bigger value than the rest and is driving the AOV up to a level where none of the other transactions are close to (£73 is the second highest value in the dataset - more than 25% below the reported AOV).

Looking at your plain AOV number from Google Analytics (or any reporting environment) is potentially leading you to the wrong conclusion.

No, you should not increase your free shipping tier to £100 to increase your AOV - 90% of your transactions aren’t even close to this.

By the way, the AOV when Transaction #10 is excluded is £47. Much more sensible considering the 9 other transaction values. However, even this is not very useful. Let me explain…

Another problem here is the overall lack of actionability from a single AOV number. What does this mean? Do you get any tangible insights regarding how you can increase your AOV?

- Should you change your free shipping limit based on this observation?
- Should you increase your unit prices, or should you try to get more people add more products to their baskets?

AOV may be one of the main drivers of your store’s revenue, but by itself it is surprisingly useless in terms of providing insights and driving you to action. You should observe it. But you cannot really take actions with it.

There are many avenues you can explore to make better sense of this. Today, I will suggest a neat trick (almost a superpower, really) to completely overhaul your AOV reporting: **look at the distribution**!

Looking at the distribution of a dataset is a powerful tool for effective communication with data. The output is almost immediately intuitive and you don’t have to be a trained statistician to make sense of things and derive action points. The distribution reveals important stuff, like:

- How many transactions were below this average?
- How many were above it?
- What is the basket size that most people have on your website?

Answers to these questions can be a lot more action-oriented than just observing your AOV.

Oh, finally, it is worth knowing that this doesn’t only impact averages, but a wide array of statistical summary elements.

It has been a while since Francis Anscombe came up with his neat quartet showcasing how data points with very different distributions can have the exact same statistical summary - 50 years to be exact, since his publication in 1973!

I seriously suggest you read up on Anscombe’s quartet if you aren’t familiar with it.

While you are at it, another cautionary tale about the flaw of averages is how the U.S. air force used its pilots’ average measurements to design the ideal fighter jet cockpit (and failed miserably). If you haven’t heard of this one, give it a read - it will drive the point home.

For our showcase, we have a dataset with 24,805 transactions. They add up to about £1.26M. When we look at the AOV of these transactions, we end up with £50.68.

Let’s say that we are looking for ways to improve this AOV in order to increase our overall revenue. Obviously you can (and should) look at which product categories are driving this average up (and down). You should probably also look at the Units per Transaction (UPT) and Average Unit Price (AUP) metrics to reveal the composition of “big” transactions: do they include more items, or do they include more expensive items?

However, before these arguably cumbersome analyses, one of the immediately useful things you should do is to create a histogram of from this transaction dataset, revealing the distribution of transaction counts over transaction value intervals. Check the below chart:

Immediately, you can observe that many transactions are grouped in the £5-£35 range, with the peak being the £15-£20 interval.

There are certain step changes that are obvious, even without strict statistical analysis. The transaction count experiences high % drops and never go back to the levels before after these limits: £20, £35, £50, £95, £145.

There is a not-insignificant number of high-value orders (£200+, which is 4+ * AOV), driving the AOV up.

Now you know:

- Most transactions are between £15-20 (10% of all orders, in fact).
- Half the transactions are valued £40 and below.
- There seems to be certain order values that act as limits for different transaction segments.
- A small number of transactions are driving significant revenue with an AOV of £285 (the right-most £200+ column).

These insights can fuel potential actions that are much more useful than “let’s increase our AOV”. For really informed plans, you should go ahead and dive deeper in these segments, like understanding their item compositions. However, now you can form a more informed plan:

- Focus on the £15-£20 segment and push AOV-increasing initiatives
- Analyse £90-£95 and £140-£145 segments to figure out what makes the big step changes after these segments
- Figure out the details of high-value transactions - are they composed of more items or more expensive items? Can you create a personalisation plan for the customers conducting these transactions?

Remember, if you are able to move more volume from the left of the distribution to its right, your AOV is going to improve. You should still measure it. But the distribution is what gives you the more action-oriented insights (or at least, shows you where you need to focus your analysis on).

I am hoping you are now convinced about the usefulness of this tactic. Now, let’s take a look at how to build it with GA4 and Google Sheets.

Exploration is the custom reporting module of GA4. It is the ideal place to get the data we need, because our analysis requires a dataset with each individual transaction and its value. For more information on Exploration, you can check the official Google documentation.

If you have properly implemented e-commerce measurement with GA4, you are sending purchase events with successful transactions. These events have a transaction identifier called Transaction ID which is available as a dimension in Exploration.

GA4 reports different revenue types. For our purposes, using any of the following metrics is fine: E-commerce revenue, Total revenue, Purchase revenue.

In the end, your report configuration should look like this:

Now, we will export this dataset to a spreadsheet environment to continue our analysis as the GA4 interface does not allow for the custom calculations we need to make to create our histogram.

You can find the export options on the top right of the Explore report. You can download it as a CSV or simply take it to Google Sheets. We will do the latter:

When your data is in Sheets, you will end up with something like this:

Now, you need to clean up this sheet a bit and add the most important bit for our analysis: the transaction value intervals (otherwise known as bin size for a distribution).

Normally, it would be advised to deploy a proper statistical method such as the Freedman-Diaconis rule to decide the bin sizes for our dataset. However, we will go with a quick rule of thumb: Size your bins to be:

- 10-fold multiples of 5 or 10 (depending on your values this could be 5;10 or 50;100 or 500;1000, etc.)
- decide whether it should be 5 or 10 based on your AOV. Take 10% of your AOV and round it to the nearest 10-fold multiple of 5 or 10

In our case, the AOV is £50.68 - therefore we will take 10% of this and round it to 5 for our bin size (if our AOV were £506.80, we would decide our bin size to be 50).

To get this to our dataset, we will utilise the ROUNDDOWN function (both Excel and Google Sheets have this) on all transaction values. This formula configuration gives us the to find the nearest interval value that fits the bottom of the transaction value.

- 7.5 becomes 5
- 140.25 becomes 140
- 24.75 becomes 20

Essentially, we find the nearest multiple of our bin size that is smaller than the transaction value.

I hope the mathematical logic of this formula is clear. We simply:

- divide our value by our bin size
- round the result down to the nearest integer
- multiply it back by the bin size to get clean bin bottom values

Now, you need to create a pivot to summarise your data by turning transaction-level dataset into a bin-level dataset. Go ahead, select your transaction rows and use the Insert menu option to get a Pivot table.

Configure your Pivot to have *Revenue bin bottom *in your rows, and get count of Transaction ID as your values:

You are basically done - but let’s make sure we have a crystal clear data visualisation. Copy the contents of this pivot table to a new sheet and add a new column that features the actual bin label (*lower bound - upper bound*). For this, simply leverage the string concatenation shorthand (”&”) and concatenate the lower bound value with the upper bound (which is lower bound + bin size):

All that is left is to create a bar chart with your Interval & Transaction Count columns:

You can optionally (highly recommended!) make this a bit more user-friendly by grouping the long tail of high-value transactions (like in our initial example where transactions worth £200+ are grouped together, making the bulk distribution more “zoomed in”):

Congratulations! In 5 easy(ish) steps, you have built your transaction distribution histogram!

Building a histogram to understand the value distribution of your transactions is a valuable tactic, empowering you in building an action plan to improve your store’s AOV.

AOV should still be a KPI you monitor - but without the distribution view, you will have very limited data-driven opportunities to improve it.

Your web analytics tool (which is most likely Google Analytics) already collects the data you need to achieve this.

If you already have a team of stats-savvy data analysts, hopefully you are already doing are lot more than this analysis. But for a lot of e-commerce executives looking for simple and better ways, I hope this guide proves useful in making data-driven decisions

There are many ways to improve this. For example, here we were concerned with only a single transactional dataset for a single date range. What about the evolution of this distribution over time? You probably know if your AOV has decreased or increased and by how much. But what about visualising the distribution? In a subsequent post, we will explore how to leverage the power of generative AI in order to get some Python guidance to visualise this basket size evolution across several months.

Grow your business.

Today is the day to build the business of your dreams. Share your mission with the world — and blow your customers away.

Start Now