
Understanding Product Tradeoffs through Modelling – Step 2


In the previous post we built a static P/L for a product, modelled out over a five year period. The key to understanding that model is moving away from what I would call "first level" numbers and towards making assumptions about the factors that drive them. In this post we'll take a closer look at one of those factors in particular: the conversion percentage. Previously we assumed a static 2%. Here we'll move away from that single number and start working with distributions. Because this introduces dynamic elements into the model, we'll also need a way of understanding and summarising the variability in outcomes.

Estimates are hardly ever accurate, and usually not very precise either, so don’t use them as if they were

There is again an Excel file, here, to accompany this post. The sheet is currently set to a mode that does not automatically recalculate formula results when you make changes. I've done this because the sheet uses random variables and data tables, which slow it down a fair bit, so for browsing just leave it in this mode. If you want to see it update on Windows, go to the options menu (click the Office button in newer versions), open the "Formulas" category and find the calculation options; under workbook calculation select "Automatic", which recalculates everything and is the default in Excel. For Mac users this is under Preferences (Cmd+,), then Calculation, where you'll find the same options under "Calculate sheets".

Building the distribution

There are several techniques you can use to estimate the distribution of a variable, from a purely mathematical approach based on a mean and standard deviation to simply relying on historical data from similar products. In this particular case I ended up discussing confidence levels with the client, basically stating "we are X% sure we will hit this conversion rate". I have overlaid this confidence curve on the distribution in the chart below.

[Figure: estimated conversion rate distribution with confidence levels overlaid]

As you can see, in this case we're 90% sure to hit a conversion rate of 1.5% or better, dropping to 50% for a conversion rate of at least 1.8%. I modelled a longer tail on one end to illustrate that it doesn't have to be a perfect normal distribution. In fact, it can be any kind of distribution you like.
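To make that concrete, here is a minimal Python sketch of such a discrete distribution. The individual rates and probabilities below are hypothetical; only the two anchor points from the chart are respected (90% confidence at 1.5% or better, 50% at 1.8% or better):

    # Hypothetical discrete distribution for the conversion rate.
    # Only the 90%-at-1.5% and 50%-at-1.8% anchors from the post are respected;
    # everything else is made up for illustration.
    distribution = [          # (conversion rate, probability)
        (0.010, 0.02), (0.012, 0.03), (0.014, 0.05),   # longer tail on the low end
        (0.015, 0.10), (0.016, 0.12), (0.017, 0.18),
        (0.018, 0.20), (0.019, 0.15), (0.020, 0.10), (0.022, 0.05),
    ]

    def confidence_at_least(rate):
        # "We are X% sure to hit this conversion rate or better."
        return sum(p for r, p in distribution if r >= rate)

    print(confidence_at_least(0.015))   # ~0.90
    print(confidence_at_least(0.018))   # ~0.50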

Using the distribution

It's time to start integrating this variation into our existing model, and I will do so in two ways. First, we'll allow the conversion rate to change from one year to the next. Second, we'll make sure each year's conversion rate is actually drawn from the distribution we just created.

The first is relatively easy to do. Where before we multiplied "valid customers" by our fixed conversion percentage, we will now use a separate conversion percentage for every year.
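In code the change is equally small; a rough sketch with made-up customer numbers (the real model of course lives in the spreadsheet):

    # One conversion rate per year instead of a single fixed 2%.
    # The customer numbers are made up purely for illustration.
    valid_customers = [10_000, 12_000, 14_000, 16_000, 18_000]   # years 1-5
    conversion_rates = [0.018, 0.020, 0.017, 0.019, 0.021]       # one per year

    paying_customers = [c * r for c, r in zip(valid_customers, conversion_rates)]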

The second is a combination of two Excel functions: rand() and vlookup().

A vlookup (vertical lookup) in Excel lets you look for a certain value in the first column of a table and then pick a corresponding value from another column. There are a few caveats: the first column of the lookup table needs to be sorted in ascending order and, by default, vlookup does not search for an exact match but takes the closest value below or equal to the target. We'll be using both of these features to our benefit. You'll notice in the spreadsheet that I created the first column by summing the individual probabilities; this running total is the trick to building a vlookup table based on your desired distribution.
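Reusing the hypothetical distribution sketched earlier, the running total looks like this in code; the first column of the lookup table is simply the cumulative probability of every row before it:

    # Build the cumulative first column of the lookup table, mirroring the
    # running-total column in the spreadsheet (assumes `distribution` from the
    # earlier sketch).
    cumulative_table = []        # rows of (cumulative lower bound, conversion rate)
    running_total = 0.0
    for rate, probability in distribution:
        cumulative_table.append((running_total, rate))
        running_total += probability

    # The first column now runs 0.00, 0.02, 0.05, 0.10, ... 0.95:
    # sorted ascending, exactly what vlookup's approximate match needs.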

[Figure: estimated distribution compared with 100,000 random draws]

Rand() generates a random number between 0 and 1 (but never 1 itself), with every value equally likely. So when we look up the result of rand() in our cumulative distribution (which neatly runs to 100%, i.e. 1), we get an outcome according to our desired distribution. To prove that point, the graph above compares the estimated distribution with the result of feeding 100,000 random numbers through a vlookup on that distribution.
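In code, the same trick (inverse transform sampling, to give it its textbook name) is just a uniform random number looked up in that cumulative column. A sketch, continuing from the table built above:

    import random
    from collections import Counter

    def draw_conversion_rate():
        # Like RAND() + VLOOKUP: take a uniform number in [0, 1) and pick the
        # last row whose cumulative lower bound does not exceed it.
        u = random.random()
        chosen = cumulative_table[0][1]
        for lower_bound, rate in cumulative_table:
            if lower_bound <= u:
                chosen = rate
            else:
                break
        return chosen

    # Rough equivalent of the 100,000-draw comparison in the graph above.
    counts = Counter(draw_conversion_rate() for _ in range(100_000))
    print(sorted(counts.items()))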

Multiple experiments

We now have a way to do a single run, but because we pick a value from this distribution each time, our model clearly no longer generates a single answer. To illustrate this I simply put five different P/L outcomes in one chart, and as expected you'll see variation between them.

[Figure: five P/L outcomes in one chart]

Knowing there is variation leads to two observations.

First, we need an easy way to run a lot of these experiments and, second, we need some way to summarise the results.

In order to tackle those two problems, we first need a sidestep: condensing all this information into a single number. Allow me to introduce the "lifetime value" of the product, simply defined as the sum of the yearly profit/loss over the lifetime of the product. If you're familiar with Net Present Value (NPV) calculations, this is the same as an NPV with a 0% discount rate (i.e. in constant $, an assumption we made to keep things simple). So going forward I'll be talking about this number as the result of an experiment.
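In formula terms that is nothing more than a sum, so the sketch is a one-liner (the yearly figures in the example call are invented):

    def lifetime_value(yearly_profit_loss):
        # Sum of yearly P/L over the product's life, i.e. an NPV at a 0% discount rate.
        return sum(yearly_profit_loss)

    print(lifetime_value([-120_000, 30_000, 150_000, 250_000, 300_000]))   # made-up yearly P/L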

We can now solve the first part of the problem: how do we run a lot of experiments quickly? The answer is what's called a Monte Carlo simulation. If that sounds like too much theoretical statistics to handle, think of it as doing what we just did, but many times over, writing down the result after each experiment. That's exactly how we'll do it in Excel, using a data table. I'll go into detail on data tables in the next post, so for now just construct it without worrying too much about what exactly you're doing.

[Figure: data table setup in Excel]

Constructing a data table is relatively straightforward. Label the experiments you want to run in the first column (for us 1 to 100 in column A), label the next column for results (cell B2) and put the outcome (the result from a single experiment, i.e. the value you want to store) in the top left corner (cell A2). Then select the whole lot (from cell A2 to B102 in my spreadsheet), click data —> data table in the menu and select two empty cells (outside the selection) for the two inputs. For every result cell in your selection Excel will now recalculate your entire sheet and write the outcome value into that result cell. So you've just run 100 experiments with random variables and have the results in a table: exactly what we wanted, and the first part of the problem is taken care of.
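Under the hood the data table is doing the equivalent of the little loop below, continuing from the earlier sketches. The `single_experiment` function here is only a stand-in for recalculating the whole P/L sheet once; the revenue-per-customer and cost figures in it are invented:

    # Monte Carlo by hand: recalculate the model with fresh random draws and
    # store the lifetime value, 100 times (like the 100-row data table).
    def single_experiment():
        yearly_pl = []
        for customers in valid_customers:               # from the earlier sketch
            rate = draw_conversion_rate()               # fresh draw per year
            revenue = customers * rate * 300            # hypothetical $300 per paying customer
            yearly_pl.append(revenue - 40_000)          # hypothetical fixed cost per year
        return lifetime_value(yearly_pl)

    results = [single_experiment() for _ in range(100)]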

Summarising the results

You now have a number of different possible results (100 in our case). That's nice, but a bit clumsy to work with, so we need a way to summarise them. I will show three parts that together form a good summary of this data: base statistics, a standard deviation table and a histogram showing the distribution of the results. Current versions of Excel on Windows provide standard functionality to generate these, but I am using the underlying formulas and techniques so that Mac users can follow along, or so you can find the equivalent formulas/techniques in Google Drive, OpenOffice, … .

The first table is what I consider base statistics: if you are going to look at this kind of dataset, these are the first, and minimum, numbers you should generate. Mean is what most people refer to as the average: the sum of all values divided by the number of values. Standard deviation is a measure of how far the results sit from the mean; the higher the standard deviation, the further the numbers drift away from it. I hope I don't need to explain min and max… . Next are the upper and lower bounds of the 95% confidence interval. The "95% confidence interval" is the range in which you are 95% sure the average result will be. Please do not confuse this with being 95% sure that a result will fall in this range! The mean is an exact number and people tend to stare blindly at it, but it is pretty useless without knowing the standard deviation. Because the interval is built from a fraction of the standard deviation above and below the mean, the range widens or narrows with larger or smaller standard deviations, and you therefore get a range that reflects your uncertainty in the results.

The 95% interval combines the mean and standard deviation into a range that is easier to digest
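As a rough cross-check of those base statistics, here is a minimal sketch computing them over the `results` list from the loop above; the 1.96 factor is the usual normal approximation for a 95% interval around the mean:

    import statistics

    mean = statistics.mean(results)
    stdev = statistics.stdev(results)                 # sample standard deviation
    lowest, highest = min(results), max(results)

    # 95% confidence interval of the *mean*, not of individual results.
    margin = 1.96 * stdev / (len(results) ** 0.5)
    ci_lower, ci_upper = mean - margin, mean + margin

    print(mean, stdev, lowest, highest, ci_lower, ci_upper)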

The next part is to graph the distribution of the results. The first thing to do is define the intervals ("bins" as Excel calls them); in fact you only define the upper bound of each bin. Because we have min and max in the previous table, you can pretty easily make a judgement call on how many bins you want and where the bounds sit. In our case I started at $150,000 and incremented each bin by $50,000 up to 1 million. Next, select the cells where you want the results to be and type "=FREQUENCY(", then select the data (i.e. the results from the data table) first and the bins (which you just defined) second. Press Ctrl+Shift+Enter to enter this as an array formula and the frequencies will appear. It basically counts the number of results that fall in each bin; you could do the same with the COUNTIF() function if you wanted, but that's a bit more work. All that's left is to plot that in a column chart like I did here.
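In code the same binning could look roughly like this, again continuing from the sketches above, with the bin bounds from the post ($150,000 up to $1,000,000 in $50,000 steps):

    # Count results per bin, like Excel's FREQUENCY() (values above the last
    # bound are simply ignored in this sketch).
    bin_upper_bounds = list(range(150_000, 1_000_001, 50_000))

    frequencies = [0] * len(bin_upper_bounds)
    for value in results:
        for i, upper in enumerate(bin_upper_bounds):
            if value <= upper:
                frequencies[i] += 1
                break

    for upper, count in zip(bin_upper_bounds, frequencies):
        print(f"<= {upper:>9,}: {count}")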

[Figure: histogram of the results distribution]

From the graph you can already tell that this particular result set is rather close to a normal distribution. Some people will have heard of the 68, 95, 99.7 rule: in an experiment with normally distributed results, 68.3% of results fall within one standard deviation above or below the mean, 95.4% fall within two standard deviations and 99.7% within three. Many conclusions get drawn on the back of people assuming their data is normally distributed, and that can have some fairly serious ramifications; the standard deviation table helps you understand how normally your results are (or are not) distributed. Constructing it is easy because we already have the mean and standard deviation from our base statistics, so it's just a bit of standard arithmetic, as you can see below (and in the Excel file). It's clear that our result distribution has slightly more body (inside two standard deviations) but fewer outliers. Again, this is based on 100 experiments; you probably want to ramp that up quite significantly if you are running real scenarios.

Standard Deviation Table

                   ±1 std dev      ±2 std dev      ±3 std dev
lower bound        $416,993.33     $296,531.05     $176,068.76
upper bound        $657,917.89     $778,380.17     $898,842.46
actual count       68              97              100
theory count       68              95              99.7
actual %           68.00%          97.00%          100.00%
theory %           68%             95%             99.70%
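The same check in code only needs the mean and standard deviation computed earlier; a rough sketch of the arithmetic behind the table:

    # Count how many results fall within 1, 2 and 3 standard deviations of the
    # mean and compare with what a normal distribution would predict.
    for k in (1, 2, 3):
        lower, upper = mean - k * stdev, mean + k * stdev
        inside = sum(lower <= value <= upper for value in results)
        print(f"within {k} std dev: {inside} of {len(results)} "
              f"({100 * inside / len(results):.1f}%)")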

In conclusion

This has been a fairly Excel-heavy post in which we introduced a couple of key concepts and related techniques.

First, we have a way of modelling uncertainty about a variable by applying any kind of distribution to it, which we can easily incorporate into our model.

Second, we have a way of running a lot of trials very quickly in Excel. Because the law of large numbers applies to these scenarios, I would encourage you to go way beyond the 100 runs I used in the download; that was just a demo and it keeps the Excel file relatively light.

Third, we have a way of summarising the results from these runs, which will be useful later as we start to introduce more and more variation into the model.

In the next step we’ll take a more detailed look at using data tables to run various scenarios and how to interpret those results.

