Simple Statistics with Excel and Minitab

Elementary Concepts in Statistics

Multiple Regression

ANOVA

Elementary Concepts in Statistics

Overview of Elementary Concepts in Statistics. In this introduction, we briefly discuss the elementary statistical concepts that provide the necessary foundations for more specialized expertise in any area of statistical data analysis. The selected topics illustrate the basic assumptions of most statistical methods, or have been demonstrated in research to be necessary components of one's general understanding of the "quantitative nature" of reality (Nisbett et al., 1987), or both. Because of space limitations, we focus mostly on the functional aspects of the concepts discussed, and the presentation is very short. Further information on each of these concepts can be found in the Introductory Overview and Examples sections of this manual and in statistical textbooks. Recommended introductory textbooks are Kachigan (1986) and Runyon and Haber (1976); for a more advanced discussion of elementary theory and assumptions of statistics, see the classic books by Hays (1988) and Kendall and Stuart (1979).

• What are variables?
• Correlational vs. experimental research
• Dependent vs. independent variables
• Measurement scales
• Relations between variables
• Why relations between variables are important
• Two basic features of every relation between variables
• What is "statistical significance" (p-value)
• How to determine that a result is "really" significant
• Statistical significance and the number of analyses performed
• Strength vs. reliability of a relation between variables
• Why stronger relations between variables are more significant
• Why significance of a relation between variables depends on the size of the sample
• Example: "Baby boys to baby girls ratio"
• Why small relations can be proven significant only in large samples
• Can "no relation" be a significant result?
• How to measure the magnitude (strength) of relations between variables
• Common "general format" of most statistical tests
• How the "level of statistical significance" is calculated
• Why the "Normal distribution" is important
• Illustration of how the normal distribution is used in statistical reasoning (induction)
• Are all test statistics normally distributed?
• How do we know the consequences of violating the normality assumption?

Use of Excel for Statistical Analysis

Neil Cox, Statistician, AgResearch Ruakura

Private Bag 3123, Hamilton, New Zealand

16 May 2000

This article gives an assessment of the practical implications of deficiencies reported by McCullough and Wilson (1999) in Excel’s statistical procedures. I outline what testing was done, discuss what deficiencies were found, assess the likely impact of the deficiencies, and give my opinion on the role of Excel in the analysis of data. My overall assessment is that, while Excel uses algorithms that are not robust and can lead to errors in extreme cases, the errors are very unlikely to arise in typical scientific data analysis in AgResearch.

THE DEFICIENCIES OF EXCEL’S STATISTICAL ALGORITHMS

What Aspects Were Examined?

The following aspects of Excel were scrutinised: calculation of distributions (tail probabilities), mean and standard deviation calculations, analysis of variance, linear regression, non-linear regression (using Solver) and random number generation. The tests used data sets designed to reveal shortcomings in the numerical procedures of statistics packages. The distributions were tested by Knüsel (1998), the other aspects by McCullough and Wilson (1999). McCullough (1998, 1999) describes the methodology and reports the performance of SAS, SPSS and S-Plus.
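To give a feel for what "data sets designed to reveal shortcomings" can mean, the following Python sketch compares two ways of computing a sample standard deviation on data with a small spread and a very large offset. It is an illustration only: the data values and the function names `one_pass_sd` and `two_pass_sd` are assumptions made for this example, not the benchmark data sets used in the cited tests, and the sketch makes no claim about which formula Excel itself uses.

```python
import math


def one_pass_sd(xs):
    """Sample SD from the 'calculator' formula: (sum of squares - n * mean^2) / (n - 1)."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    var = (total_sq - total * total / n) / (n - 1)
    # Cancellation can push var to zero, a large value, or below zero;
    # clamp at zero so sqrt does not raise an error.
    return math.sqrt(max(var, 0.0))


def two_pass_sd(xs):
    """Sample SD from deviations about the mean (numerically safer)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


small = [1.0, 2.0, 3.0]               # true sample SD is 1
shifted = [x + 1e9 for x in small]    # same spread, very large offset (hypothetical data)

for name, data in (("small", small), ("shifted", shifted)):
    print(name, "one-pass:", one_pass_sd(data), "two-pass:", two_pass_sd(data))
```

Both functions agree on the small values. On the shifted values the two-pass version still returns 1, while the one-pass version cannot, because the two large quantities it subtracts are each rounded far more coarsely than their true difference. Offsets this large relative to the spread are the kind of extreme case in which non-robust algorithms break down.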

How Did Excel Rate?

Generally, Excel performed worse than the three statistics packages also examined (SAS, SPSS, S-Plus), particularly in the non-linear regression problems. See below for...