A Synopsis of How to Lie with Statistics by Darrell Huff
When most people hear or read a statistic, they have to decide quickly whether the numbers are valid. It is usually assumed that the author of the statistic is knowledgeable in the field to which it pertains. On many occasions, however, the statistic is misleading because of the way the author has worded it. Darrell Huff’s book How to Lie with Statistics is a manual that can help individuals catch these lies. It helps readers see through marketing ploys and dismiss faulty statistics.

The first chapter focuses on bias. The book states that statistics are based on samples, and that these samples are biased. This means that, no matter what, the reader will form a biased opinion. The bias arises from respondents replying dishonestly, the author choosing a sample that gives better results, and the availability of data. Huff uses a survey of the readership of two magazines, which produced contradictory results because the readers, influenced by their personal biases, answered the survey dishonestly. This example closes the chapter, teaching readers always to assume that a sample has a bias.
The second chapter focuses on averages. It states that there are actually three types of average: mean, median, and mode. The mean is the arithmetic average. The median is the midpoint of the data. Finally, the mode is the value that occurs most often in the data. Thus, the type of average used can alter the results of the statistics.
The next chapter explains how sample data is chosen to prove certain results. Many marketing campaigns use this technique: they choose sample sizes that give the results they want. Huff’s solution is that one must determine whether the information is a discrete quantity or whether a range is involved.
The following chapter discusses errors in measurement. It explains two ways of expressing error: the probable error and the standard error. The probable error uses the error in...
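The point about the three averages can be illustrated with a short sketch. The salary figures below are hypothetical values chosen for illustration and are not taken from Huff’s book; the example only shows how the mean, median, and mode of the same data can tell three different stories.

```python
import statistics

# Hypothetical annual salaries at a small firm (illustrative values only).
salaries = [20_000, 20_000, 20_000, 25_000, 30_000, 35_000, 150_000]

mean_salary = statistics.mean(salaries)      # arithmetic average, pulled up by the one large salary
median_salary = statistics.median(salaries)  # midpoint of the sorted data
mode_salary = statistics.mode(salaries)      # value that occurs most often

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
print(f"mode:   {mode_salary:,.0f}")
```

All three numbers could honestly be reported as "the average salary", which is why the type of average matters.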

...How to Lie with Statistics Summary
Some people rely heavily on the statistical information provided by the media, government, and other research groups in order to form opinions or come to a conclusion about a particular idea or product. However, they fail to realize that much of the time the data is manipulated in a way that leads them to believe something that is not actually the case. Statistics can lie in many ways. The first is by using a sample that has a bias. For instance, the data might be collected from only one particular group of people but presented as if it described the whole population. Another way data is manipulated is through averages. The data will be presented as the average, but the type of average is not given: is it the arithmetic mean, the median, or the mode? The choice can completely skew the data one way or another. Furthermore, the presenter can lie by leaving out certain things, an omission that will usually go unnoticed by the reader. In addition, many people make a big deal about something that doesn’t matter, which leads the reader to believe that whatever they made a big deal about is actually significant. There could be a difference that is so tiny that it doesn’t have importance; however, leaving out the range of error could also be a way of...

...How to Lie with Statistics Book Summary
The book How to Lie with Statistics, written by Darrell Huff, shows you how statistics are used to mislead, sometimes unintentionally and other times on purpose. It gives readers the knowledge necessary to question intelligently and understand the story behind the numbers. In other words, it shows the tricks the crooks use, so that honest men can use this knowledge for self-defense.
I think it’s particularly useful for a manager or an executive to read and understand this book, because they are usually presented with a lot of numbers, graphs and charts and are expected to make decisions based on these numbers. People collecting and presenting the numbers to management could employ some of the tricks explained in this book and therefore, we should be careful when basing our decisions on those numbers.
It’s interesting that although this book was written in 1954, the concepts explained are just as pertinent today. Some salary figures seem to be outdated but the tricks remain pretty much the same.
The book starts with explaining the importance of sample selection and built-in bias. Sampling is critical in statistics because we can’t always count or observe every item in a population and therefore have to base our judgments on a selected sample. However, a sample with a built-in bias could...
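A minimal sketch of built-in bias, under stated assumptions: the population of incomes, the 70,000 cut-off, and the sample size are all hypothetical and chosen only to make the effect visible. A sample drawn from the part of the population that the survey can reach is compared with a sample drawn from everyone.

```python
import random
import statistics

# Hypothetical population: 1,000 household incomes spread evenly from 20,000 to 119,900.
population = [20_000 + 100 * i for i in range(1000)]

# Built-in bias (assumed for illustration): the survey only reaches
# households with an income of at least 70,000.
reachable = [income for income in population if income >= 70_000]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
biased_sample = rng.sample(reachable, 100)   # drawn only from the reachable group
random_sample = rng.sample(population, 100)  # drawn from the whole population

population_mean = statistics.mean(population)
biased_mean = statistics.mean(biased_sample)
random_mean = statistics.mean(random_sample)

print(f"population mean:    {population_mean:,.0f}")
print(f"biased sample mean: {biased_mean:,.0f}")
print(f"random sample mean: {random_mean:,.0f}")
```

However large the biased sample is made, its mean stays above the population mean, because the households that would pull it down were never eligible to be selected.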

...sound good.
Keep in mind that a statistic is only worthwhile when it satisfies the assumptions of the test. Knowing whether the assumptions are met depends on the competence of the person running the test.
When two things seem to be related, ask whether the relationship could have arisen by pure chance. An apparent association does not by itself establish cause and effect; the two variables may have no effect on each other at all.
Chapter 9 – How to Statisticulate
Statisticulation is the process of misleading people using statistics. It is also described as misinforming with figures, or statistical manipulation, and it is not necessarily the mathematician’s purpose.
Lying with statistics – is this dishonesty or incompetence? Mostly dishonesty.
The author lists various tricks: measuring profit on cost price, drawing a graph with a finer Y-axis scale just to make growth look steep, and income calculations that mislead by counting the children in a family as individuals when averaging.
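The first trick on this list, choosing the base on which profit is measured, can be sketched with a small calculation. The cost and price below are hypothetical and the function names are made up for this illustration; the same sale yields two different percentages depending on the base.

```python
def profit_on_cost(cost: float, price: float) -> float:
    """Profit as a percentage of what the item cost the seller."""
    return (price - cost) * 100 / cost


def profit_on_price(cost: float, price: float) -> float:
    """Profit as a percentage of the selling price."""
    return (price - cost) * 100 / price


# Hypothetical item: bought for 80, sold for 100.
cost, price = 80.0, 100.0

print(f"profit on cost price:    {profit_on_cost(cost, price):.1f}%")
print(f"profit on selling price: {profit_on_price(cost, price):.1f}%")
```

Both figures describe the same profit of 20; whoever presents the number can pick whichever base makes it look larger or smaller.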
Chapter 10 – How to Talk Back to a Statistic
To talk back to a statistic, ask yourself whether the statistic being presented is genuine or not.
There are five simple questions that show, in Huff’s own words, “how to look a phoney statistic in the eye and face it down” (page **).
Question 1 – Who says so?
Find...

...scheduling and network models.
Chapter 1 illustrates a number of ways to summarise the information in data sets, also known as
descriptive statistics. It includes graphical and tabular summaries, as well as summary measures
such as means, medians and standard deviations.
Uncertainty is a key aspect of most business problems. To deal with uncertainty, we need a basic
understanding of probability. Chapter 2 covers basic rules of probability and in Chapter 3 we
discuss the important concept of probability distributions in some generality.
In Chapter 4 we discuss statistical inference (estimation), where the basic problem is to estimate
one or more characteristics of a population. Since it is too expensive to obtain the population
information, we instead select a sample from the population and then use the information in the
sample to infer the characteristics of the population.
In Chapter 5 we look at the topic of regression analysis which is used to study relationships
between variables.
In Chapter 6 we study another type of decision making called decision analysis where costs and
profits are considered to be important. The problem is not whether to accept or reject a statement
but to select the best alternative from a list of several possible decisions. Usually no statistical
data are available. Decision analysis is the study of how people make decisions, particularly
when faced with imperfect information or uncertainty....

...of 1000 flights and the proportions of the three routes in the sample. He divides them into different sub-groups such as satisfaction, refreshments and departure time, and then selects proportionally to highlight specific subgroups within the population. Mr Kwok used this sampling method because it may reduce the cost per observation in the survey and can increase accuracy at a given cost.
TABLE 1: Data Summaries of Three Routes
Summary Statistics   Route 1                  Route 2                   Route 3
Distribution         Normal(88.532,5.07943)   Normal(97.1033,5.04488)   Normal(107.15,5.15367)
Mean                 88.532                   97.103333                 107.15
Std Dev              5.0794269                5.0448811                 5.1536687
Std Err Mean         0.2271589                0.2912663                 0.3644194
Upper 95% Mean       88.978306                97.676525                 107.86862
Lower 95% Mean       88.085694                96.530142                 106.43138
N                    500                      300                       200
Sum                  44266                    29131                     21430
From the table above, the total number of passengers is 44,266 for route 1, 29,131 for route 2 and 21,430 for route 3, giving 94,827 passengers across the three routes.
Although route 1 has the highest number of passengers and flights, it has the lowest mean number of passengers per flight among the three routes. From...
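The standard errors in Table 1 can be checked from the means, standard deviations, and sample sizes with a short sketch. The interval below uses a normal approximation (z of about 1.96), which is an assumption made here; the software that produced the table most likely used a t quantile, so the limits agree with the table only to about two decimal places.

```python
import math
from statistics import NormalDist

# Summary statistics copied from Table 1: (mean, standard deviation, N).
routes = {
    "Route 1": (88.532, 5.0794269, 500),
    "Route 2": (97.103333, 5.0448811, 300),
    "Route 3": (107.15, 5.1536687, 200),
}

# Two-sided 95% critical value under a normal approximation (assumed here).
z = NormalDist().inv_cdf(0.975)


def mean_interval(mean: float, std_dev: float, n: int) -> tuple[float, float, float]:
    """Return (standard error, lower limit, upper limit) for the mean."""
    std_err = std_dev / math.sqrt(n)
    return std_err, mean - z * std_err, mean + z * std_err


for name, (mean, std_dev, n) in routes.items():
    std_err, lower, upper = mean_interval(mean, std_dev, n)
    print(f"{name}: std err {std_err:.7f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

The standard error shrinks as N grows, which is why route 1, with the most flights, has the narrowest interval.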

...Organization of Terms
Experimental Design: Descriptive, Inferential, Population, Parameter, Sample, Random, Bias, Statistic
Types of Variables: Measurement scales (Nominal, Ordinal, Interval, Ratio), Qualitative, Quantitative, Independent, Dependent
Types of Graphs: Bar Graph, Histogram, Box plot, Scatterplot
Measures of Center: Mean, Median, Mode
Measures of Spread: Range, Variance, Standard deviation
Measures of Shape: Skewness, Kurtosis
Tests of Association: Correlation, Regression (Slope, y-intercept)
Tests of Inference: Central Limit Theorem, Chi-Square, t-test (Independent samples, Correlated samples), Analysis-of-Variance
Glossary of Terms
Statistics - a set of concepts, rules, and procedures that help us to:
organize numerical information in the form of tables, graphs, and charts;
understand statistical techniques underlying decisions that affect our lives and well-being; and
make informed decisions.
Data - facts, observations, and information that come from investigations.
Measurement data, sometimes called quantitative data -- the result of using some instrument to measure something (e.g., test score, weight);
Categorical data, also referred to as frequency or qualitative data -- things are grouped according to some common property(ies) and the number of members of each group is recorded (e.g., males/females, vehicle type).
Variable - property of an object or event that can take on different values. For example, college major...

...Professor Dumonceaux
Descriptive Statistics Paper
2 June 2014
Finding a New Home
According to Trochim, “Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data” (Trochim, 2006). Over the years, a great deal of research has been done on the real estate market. Buyers need to conduct research to decide which house they will purchase. Buyers’ concerns include the price of the house, the number of bedrooms, and location. Real estate agents need to gather all the necessary information to provide their services to buyers. Additionally, the agents must be able to predict what types of houses are most likely to sell. In this paper, I will summarize what I have been studying. The paper will include measures of central tendency, dispersion, and skew for the data. In addition, it will contain graphical as well as tabular data to demonstrate my findings. Finally, the conclusion will state whether my research findings answered the problem statement or whether more research is needed.
Having examined the data collected on what the current real estate market wants, I present the following conclusions based on the findings. There are many key factors to consider when purchasing a home. Some of the factors include interest...

...central tendency of the sample.
6. Measures of dispersion: range, the interquartile range, the variance, and the standard deviation. What do these measures tell you about the “spread” of the data? Why is it important to spend time performing basic descriptive statistics prior to conducting inferential statistical tests?
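Variance of a sample: $S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard deviation of a sample: $S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$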
Range is the difference between the highest and the lowest values: (275-100) = 175
The interquartile range takes into consideration the fact that data extremes affect the range. In the data above, most of the values are close to the median, but two values (250 and 275) are extremes. In this scenario, the interquartile range is a better indication of the dispersion of the distribution.
100 100 103 104 105 Q1 107 110 110 114 115 M 115 115 115 115 117 Q3 117 118 120 250 275
• Q1 = (105+107)/2 = 106
• Q3 = (117+117)/2 = 117
• IQR = 117-106 = 11
It is important to evaluate data and look at the entire picture to determine whether something fits or does not. The fact that two of the measurements are extreme might be an indication that something has gone wrong. Descriptive statistics in such a case become instrumental in our analysis.
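The dispersion measures discussed in this answer can be computed for the twenty values listed above with a short sketch. The quartiles are taken as the medians of the lower and upper halves of the sorted data, which matches the hand calculation in this answer; other quartile conventions give slightly different values.

```python
import statistics

# The twenty observations listed in the answer above.
data = [100, 100, 103, 104, 105, 107, 110, 110, 114, 115,
        115, 115, 115, 115, 117, 117, 118, 120, 250, 275]

ordered = sorted(data)
n = len(ordered)

data_range = ordered[-1] - ordered[0]             # highest minus lowest value
median = statistics.median(ordered)
q1 = statistics.median(ordered[: n // 2])         # median of the lower half
q3 = statistics.median(ordered[(n + 1) // 2:])    # median of the upper half
iqr = q3 - q1                                     # interquartile range

variance = statistics.variance(ordered)           # sample variance (divides by n - 1)
std_dev = statistics.stdev(ordered)               # square root of the sample variance

print(f"range = {data_range}, median = {median}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```

Comparing the range with the interquartile range shows how much of the apparent spread comes from the two extreme values alone.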
Type I and Type II Error: The concept of Type I and Type II Error is critical and will come into play with each statistical test you perform. Discuss the implications of...

