Doing Statistical Analysis in Excel

Pages: 12 (3520 words) Published: June 19, 2013
Doing Statistical Analysis
Excel provides a large number of statistical functions. These and the tools in the Analysis ToolPak cover most types of statistical analysis that you would need to do in finance and include in your models.

Many people look at functions as "black boxes." You input the arguments, and the functions provide the answers. In a way this view is true, but it is also dangerous - you may use a function or tool incorrectly or misinterpret the results. So take the time to know the underlying statistical theory before using any of these functions or tools.

We will be using will use the monthly return data for the S&P 500 from Feb 1950 to Feb 2009. In column C 1 have calculated the monthly ordinary (uncompounded) returns using formulas such as = (B2-B3)/B3*100 in C2 and in column D the monthly continuously compounded returns using formulas such as =LN(B2/B3)*100 in D2. For financial work we generally use continuously compounded returns, although for the purpose of this chapter either return will be fine.

Obviously, we can calculate monthly returns starting one month after the date of the first data point, that is, starting in March 1950.

Descriptive Statistics
For any data set you often want to calculate certain descriptive statistics such as mean, standard deviation, and median. The Analysis ToolPak offers a tool called Descriptive Statistics that calculates in one step many of the most common descriptive statistics for a data set.

Example 1: Refer to the excel file: “Doing Statistical Analysis.xlsx” SP 500 Monthly Returns Worksheet. Perform Descriptive Statistical Analysis. Show the 5th largest and smallest data values in the set also.

Data Analysis Data Analysis Descriptive Statistics. In the Input Range enter the range for returns using the pointing method. Under Output options select New Worksheet Ply. Further down select all the four boxes, enter 95% next to Confidence Level for Mean and 5 in each of the two boxes below it. Click OK.

You are probably already familiar with the descriptive statistics calculated by this tool. Some more detail is as follows:

AVERAGE
Returns the arithmetic average (mean) of its arguments that are numbers. Includes zeros in the calculations but ignores text and other values.

GEOMEAN
Returns the geometric mean of an array or range of positive numbers. If there are 10 numbers in the dataset, they are multiplied and then the 10th root is taken (which is the same as raising it to the power of 1/10).

Example 2: If the annual return for an investment for the past three years were 15%, -20%, and 5%, calculate the arithmetic mean and geometric mean. Which one is correct?

As per arithmetic mean, if you had originally invested \$100 in this asset, you would have \$100 at the end of the three years. But this is not true. To determine the true average return, you have to calculate the compounded average return over the period by taking the geometric mean of 1.15, 0.8, and 1.05 and subtracting 1. The result will be -1.15%, which correctly reflects the fact that at the end of the 3-year period, the value of your investment would be \$96.60 and not \$100.

MEDIAN
Returns the median of the given numbers. Ignores texts, logical values, and empty cells but includes zeros. The median is the number in the middle of the data set, that is, half the numbers are greater than the median and half are smaller.

MODE
Returns the most frequently occurring, or repetitive, value in an array or range of data. Ignores texts, logical values, and empty cells but includes zeros.

STDEV
Estimates the population standard deviation assuming the data provided is a random sample. Logical values (such as TRUE and FALSE) and text are ignored. Standard deviation measures how widely the data is dispersed around its mean.

VAR
Estimates (population) variance based on a sample. Logical values (such as TRUE and FALSE) and text are ignored. This function parallels STDEV...