There are four basic sampling methods we have learned so far: simple random sampling, stratified sampling, cluster sampling, and systematic sampling. When we design an experiment, we need to choose the right sampling method in order to make the experiment useful and successful. First, simple random sampling: a sample is selected in a way that gives every possible sample of size n an equal chance of being chosen. Second, stratified sampling: the population is divided into subgroups (strata), and a separate random sample is taken from each stratum. Next, cluster sampling: the population is divided into subgroups (clusters), a sample of clusters is selected at random, and every individual or object in the selected clusters is included in the sample. The last one is systematic sampling: a sample is selected from an ordered arrangement of the population by choosing a starting point at random from the first k individuals on the list and then selecting every kth individual thereafter. Suppose that in an experiment we want to determine the sex of the fish in a pond. We cannot take a census, so we need to use sampling; here, simple random sampling is a natural choice.
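The fish-pond scenario can be sketched in a few lines of Python. The pond size and sample size below are assumptions for illustration, not figures from the text:

```python
import random

# Hypothetical pond: 1000 tagged fish, identified by IDs 1..1000.
population = list(range(1, 1001))

random.seed(42)  # fixed seed so the sketch is reproducible
sample = random.sample(population, k=50)  # simple random sample of size 50

print(len(sample))       # 50
print(len(set(sample)))  # 50 -- sampling is without replacement
```

`random.sample` draws without replacement, so every sample of size 50 is equally likely, which is exactly the defining property of simple random sampling.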

Stem plot
A stemplot (or stem-and-leaf display) is a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution. Unlike histograms, stemplots retain the original data to at least two significant digits and put the data in order, thereby easing the move to order-based inference and non-parametric statistics. To construct a stem plot, the observations must first be sorted in ascending order; when working by hand, this is done most easily by constructing a draft stem-and-leaf plot with the leaves unsorted and then sorting the leaves to produce the final plot. Here is the sorted set of data values that will be used in the following example: 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106. Next, it must be determined what the stems will...
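For the data set above, the construction can be sketched in a few lines of Python, taking the tens (and hundreds) digits as stems and the units digit as the leaf:

```python
from collections import defaultdict

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]

# Split each value into stem (value // 10) and leaf (value % 10).
stems = defaultdict(list)
for value in sorted(data):
    stems[value // 10].append(value % 10)

# Print every stem from smallest to largest, including empty ones,
# so gaps in the distribution stay visible.
for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

For this data the display runs from stem 4 to stem 10, with empty rows at 5 and 9, which makes the gap below 63 and the high outlier 106 easy to see.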

...
Contents
Question 1
Question 2a
Question 2b
Question 2c
Question 3a
Question 3b
Question 3c
Question 3d
Question 4
Question 5
References
Question 1
The sampling method that Mr. Kwok is using is the stratified random sampling method. In this case study, Mr. Kwok collected a random sample of 1000 flights, with the three routes represented in proportion to their share of the population. He divides the flights into subgroups, such as satisfaction, refreshments, and departure time, and then selects proportionally in order to highlight specific subgroups within the population. The reasons Mr. Kwok used this sampling method are that the cost per observation in the survey may be reduced and that it can increase accuracy at a given cost.
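Proportional allocation of the 1000-flight sample across routes can be sketched as follows. The route population sizes are hypothetical (the case study does not state them); they are chosen only to show the arithmetic:

```python
# Hypothetical numbers of flights per route in the population.
route_populations = {"Route 1": 5000, "Route 2": 3000, "Route 3": 2000}
total = sum(route_populations.values())
sample_size = 1000

# Each stratum gets a share of the sample proportional to its size.
allocation = {route: round(sample_size * n / total)
              for route, n in route_populations.items()}
print(allocation)  # {'Route 1': 500, 'Route 2': 300, 'Route 3': 200}
```

With these assumed population shares the allocation matches the per-route sample sizes reported in Table 1.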
TABLE 1: Data Summaries of Three Routes
| Summary Statistics | Route 1 | Route 2 | Route 3 |
| Distribution | Normal(88.532, 5.07943) | Normal(97.1033, 5.04488) | Normal(107.15, 5.15367) |
| Mean | 88.532 | 97.103333 | 107.15 |
| Std Dev | 5.0794269 | 5.0448811 | 5.1536687 |
| Std Err Mean | 0.2271589 | 0.2912663 | ... |
| Upper 95% Mean | 88.978306 | 97.676525 | ... |
| Lower 95% Mean | 88.085694 | 96.530142 | ... |
| N | 500 | 300 | ... |
| Sum | 44266 | 29131 | ... |

...
Sampling methodologies
Sampling
It may be defined as the process of selecting units (which may be people, organizations, etc.) from a larger whole, i.e., from a population of interest, so that by studying the sample we can draw conclusions about the general characteristics of the entire population under consideration.
Types of sampling methods:
Probability sampling
Probability sampling is a type of sampling that involves random selection. To achieve random selection, it must be ensured that the different units of the population have an equal probability of being chosen.
Some relevant terms:
N = the number of cases in the sampling frame
n = the number of cases in the sample
f = n/N = the sampling fraction
I] Simple Random Sampling
It is the simplest type of probability sampling: every element of the population has the same probability of being selected, and every possible sample of size n is equally likely. For example, if elements are drawn using a random number generator, each element has the same chance of selection, namely f = n/N.
It may be the simplest method, but it is not considered the most statistically efficient.
II] Systematic Random sampling:
In systematic sampling, we...
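The passage breaks off, but the usual systematic procedure (choose a random start within the first k units, then take every kth unit) can be sketched as follows; the population and sample sizes are assumptions for illustration:

```python
import random

# Population of 1000 units, desired sample of 50,
# so the sampling interval is k = 1000 // 50 = 20.
population = list(range(1, 1001))
n = 50
k = len(population) // n

random.seed(7)
start = random.randrange(k)    # random start within the first k units
sample = population[start::k]  # every k-th unit thereafter

print(k, len(sample))  # 20 50
```

Because 1000 is an exact multiple of 20, every possible start yields exactly 50 units, each 20 positions apart.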

...Trajico, Maria Liticia D.
BSEd III-A2
REFLECTION
The first thing that pops into my mind when I hear the word STATISTICS is that it is a very hard subject, because it is another branch of mathematics that will make my head or brain bleed from thinking about how I will handle it. I have learned that statistics is a branch of mathematics concerned with the study of information that is expressed in numbers, for example information about the number of times something happens. As I examined what the statement says, the phrase "number of times something happens" really caught my attention, because my subconscious said, "here we go again, the non-stop solving and analyzing of problems," and I was right. This course in basic statistics has provided me with the analytical skills to crunch numerical data and to make inferences from it. At first I thought that I would be all right with this subject all along, but only some parts of it went that way, maybe because I did not pay much attention to it; still, I have learned many things. I have learned my lesson.
During every session in this subject before our midterm examination, I really had hard and bad times coping with it. When we had our very first quiz I thought that I would fail it, but that did not happen; after that, though, I failed the next quizzes I took. I always felt down whenever I failed a quiz, because even though I don't like this...

...1. A density curve consists of a straight line segment that begins at the origin (0, 0) and has
slope of 1.
a. Sketch this density curve. What are the coordinates of the right endpoint of the segment?
(Note that the right endpoint should be fixed so that the total area under the curve is 1.
This is required for a valid density curve.)
b. Determine the median, the first quartile (Q1), and the third quartile (Q3).
c. Relative to the median, where would you expect the mean of the distribution to lie?
Explain briefly.
The distribution is skewed left, so the mean will be left of the median.
d. What percent of the observations lie below 0.5? Above 1.5?
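As a sketch of the computations behind problem 1: with density f(x) = x on [0, √2], the CDF is F(x) = x²/2, so each quantile solves x²/2 = p. This can be checked numerically:

```python
import math

# Density f(x) = x on [0, sqrt(2)]; CDF F(x) = x**2 / 2.
# Right endpoint is (sqrt(2), sqrt(2)), so the triangle's area is
# (1/2) * sqrt(2) * sqrt(2) = 1, as a density curve requires.
endpoint = math.sqrt(2)

def quantile(p):
    # Solve x**2 / 2 = p  =>  x = sqrt(2p)
    return math.sqrt(2 * p)

q1, median, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
below_half = 0.5**2 / 2  # area under the curve to the left of x = 0.5

print(round(median, 3), round(q1, 3), round(q3, 3), below_half)
# 1.0 0.707 1.225 0.125  (nothing lies above 1.5, since 1.5 > sqrt(2))
```

The low median relative to the endpoint, with Q3 close to the right edge, reflects the left skew noted in part (c).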
2. The following are the salaries of employees in a small business:
15,000 15,000 17,500 23,500 23,750 25,000 26,000 27,500 29,000 45,000
a. Before using your calculator, do you think that the mean or the median for these values
will be higher? Why do you think so?
b. What are the mean and median of the salaries? Was your answer to (a) correct?
c. Find the standard deviation and the IQR of the data
d. Describe the shape of the distribution in terms of its overall shape, including outliers and
skewness. Be sure to give the locations of the main features.
e. Speculate on the reasons for why the distribution is shaped as it is.
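The summary measures asked for in problem 2 can be computed directly with Python's standard library (note that quartile conventions differ between calculators and software, so the IQR may vary slightly by method):

```python
import statistics

salaries = [15000, 15000, 17500, 23500, 23750, 25000,
            26000, 27500, 29000, 45000]

mean = statistics.mean(salaries)      # pulled upward by the 45,000 outlier
median = statistics.median(salaries)
stdev = statistics.stdev(salaries)    # sample standard deviation (n - 1)

# Quartiles with the default 'exclusive' method; other conventions
# (e.g. a TI calculator) can give slightly different cut points.
q1, q2, q3 = statistics.quantiles(salaries, n=4)
iqr = q3 - q1

print(mean, median)  # 24725 24375.0 -> mean > median, right-skewed
```

The mean exceeding the median confirms the expected answer to part (a): the single large salary drags the mean up while leaving the median untouched.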
3. The following represents the outcomes of rolling a die 600 times (actually, it was
simulated on the TI-83 as follows: randInt(1,6,600)→L1).
Face
1
2
3 ...

...population with a specific distribution.
The Kolmogorov-Smirnov (K-S) test is based on the empirical distribution function (ECDF). Given N ordered data points Y1, Y2, ..., YN, the ECDF is defined as
\[ E_{N} = n(i)/N \]
where n(i) is the number of points less than Yi, and the Yi are ordered from smallest to largest value. This is a step function that increases by 1/N at the value of each ordered data point.
A typical illustration plots the empirical distribution function of 100 normal random numbers together with the normal cumulative distribution function; the K-S test is based on the maximum distance between these two curves.
Characteristics and Limitations of the K-S Test
An attractive feature of this test is that the distribution of the K-S test statistic itself does not depend on the underlying cumulative distribution function being tested. Another advantage is that it is an exact test (the chi-square goodness-of-fit test depends on an adequate sample size for the approximations to be valid). Despite these advantages, the K-S test has several important limitations:
1. It only applies to continuous distributions.
2. It tends to be more sensitive near the center of the distribution than at the tails.
3. Perhaps the most serious limitation is that the distribution must be fully specified. That is, if location, scale, and shape parameters are estimated from the data, the critical region of the K-S test is no longer valid. It typically must be determined by simulation....
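The K-S statistic itself is simple enough to compute by hand against a fully specified CDF; here is a minimal sketch for the standard normal case (a library routine such as scipy's `kstest` computes the same quantity plus a p-value):

```python
import math
import random

def normal_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def ks_statistic(data, cdf):
    """Maximum distance between the ECDF of `data` and `cdf`.

    The ECDF jumps at each ordered point, so the CDF is compared against
    both the value just before the jump (i/N) and just after ((i+1)/N).
    """
    ys = sorted(data)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys):
        f = cdf(y)
        d = max(d, f - i / n, (i + 1) / n - f)
    return d

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
print(round(ks_statistic(sample, normal_cdf), 4))
```

Note that the normal CDF here is fully specified (mean 0, standard deviation 1), not estimated from the data; as limitation 3 above says, estimating the parameters from the sample would invalidate the standard critical values.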

...descriptive statistics to summarize the training time data for each method. What similarities or differences do you observe from the sample data?
Descriptive analysis in excel has been used to come up with relevant figures of the given data samples which is tabulated below:
Descriptive Statistics | Current | Proposed |
Mean | 75.06557 | 75.42623 |
Standard Error | 0.505094 | 0.32091 |
Median | 76 | 76 |
Mode | 76 | 76 |
Standard Deviation | 3.944907 | 2.506385 |
Sample Variance | 15.5623 | 6.281967 |
Kurtosis | -0.06933 | 0.58694 |
Skewness | -0.22053 | -0.28749 |
Range | 19 | 13 |
Minimum | 65 | 69 |
Maximum | 84 | 82 |
Sum | 4579 | 4601 |
Count | 61 | 61 |
Analysis of the descriptive statistics shows that the current and the proposed plans have almost the same mean completion time, 75.07 and 75.43 hours respectively. Both plans have exactly the same median and mode. However, the standard deviation in the current plan (3.94) is higher than that in the proposed plan (2.51), which leads to the higher variance in the current plan. This suggests that completion times are more dispersed around the mean in the current plan, so the mean alone does not give a true picture of the data distribution, whereas in the proposed plan the completion times are comparatively more concentrated.
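The Excel descriptive-statistics output can be reproduced in Python. The 61 actual observations are not listed in the text, so the sample below is a small hypothetical stand-in used only to show the computations:

```python
import statistics

# Hypothetical completion times (hours); NOT the real 61 observations.
current = [76, 74, 78, 71, 76, 80, 73, 76, 69, 77]

summary = {
    "Mean": statistics.mean(current),
    "Median": statistics.median(current),
    "Mode": statistics.mode(current),
    "Standard Deviation": statistics.stdev(current),  # sample (n - 1) form
    "Sample Variance": statistics.variance(current),
    "Range": max(current) - min(current),
    "Count": len(current),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

Excel's `STDEV.S` and `VAR.S` use the same n − 1 denominators as `statistics.stdev` and `statistics.variance`, so the two tools agree on these figures.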
2. Use the methods of Chapter 10 to...

...committed when
a. a true alternative hypothesis is not accepted
b. a true null hypothesis is rejected
c. the critical value is greater than the value of the test statistic
d. sample data contradict the null hypothesis
ANSWER: b

3. In determining an interval estimate of a population mean when σ is unknown, we use a t distribution with
a. √n − 1 degrees of freedom
b. √n degrees of freedom
c. n − 1 degrees of freedom
d. n degrees of freedom
ANSWER: c

4. The purpose of statistical inference is to provide information about the
a. sample based upon information contained in the population
b. population based upon information contained in the sample
c. population based upon information contained in the population
d. mean of the sample based upon the mean of the population
ANSWER: b
FMBA SQA Final Exam
Prof. Kihoon Kim Oct. 10, 2012
2. (10 points) A researcher is interested in estimating the average number of years employees of a company stay with the company. If past information shows a standard deviation of 7 months, what size sample should be taken so that at 95% confidence the margin of error will be 2 months or less? ANSWER: 48
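The stated answer follows from the margin-of-error formula n = (zσ/E)², with z = 1.96 for 95% confidence, σ = 7 months, and E = 2 months:

```python
import math

z = 1.96      # critical value for 95% confidence
sigma = 7     # standard deviation, in months
margin = 2    # desired margin of error, in months

# (1.96 * 7 / 2)**2 = 47.06; round UP so the margin is guaranteed.
n = math.ceil((z * sigma / margin) ** 2)
print(n)  # 48
```

Rounding up rather than to the nearest integer is essential here: a sample of 47 would leave the margin of error slightly above 2 months.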
3. (10 points) The average lifetime of a light bulb is 3,000 hours with a standard deviation of 696 hours. A simple random sample of 36 bulbs is taken. Please describe the sampling distribution of the average life in a sample of 36 bulbs. Ans. it follows a normal distribution with the mean of 3,000 and...
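The missing number in the answer is the standard error of the sample mean, σ/√n, which by the central limit theorem (and n = 36 being a simple random sample) gives the spread of the sampling distribution:

```python
import math

mu = 3000     # population mean lifetime, hours
sigma = 696   # population standard deviation, hours
n = 36

# Standard deviation of the sampling distribution of the mean.
standard_error = sigma / math.sqrt(n)
print(standard_error)  # 116.0
```

So the sample mean of 36 bulbs is approximately normal with mean 3,000 hours and standard deviation 116 hours.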

...Cluster Sampling
Cluster sampling is also called block sampling. In cluster sampling, the population being sampled is divided into groups called clusters. Instead of these subgroups being homogeneous based on selected criteria, as in stratified sampling, each cluster is made as heterogeneous as possible so that it matches the population. A random sample is then taken from within one or more selected clusters. For example, if an organization has 30 small projects currently under development, an auditor looking for compliance with the coding standard might use cluster sampling to randomly select 4 of those projects as representatives for the audit and then randomly sample code modules for auditing from just those 4 projects. Cluster sampling can tell us a lot about a particular cluster, but unless the clusters are selected randomly and many clusters are sampled, generalizations cannot always be made about the entire population. For example, sampling only from the source code modules written during the previous week, or from all the modules in a particular subsystem, or from all modules written in a particular language may introduce biases into the sample that would not allow statistically valid generalization.
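The audit example can be sketched as a two-stage selection: clusters first, then modules within the chosen clusters. The project and module names below are hypothetical placeholders:

```python
import random

random.seed(1)

# 30 projects (the clusters), each with 20 hypothetical code modules.
projects = {f"project_{i}": [f"module_{i}_{j}" for j in range(20)]
            for i in range(1, 31)}

# Stage 1: randomly select 4 projects as the clusters to audit.
selected_projects = random.sample(list(projects), k=4)

# Stage 2: randomly sample 5 modules from each selected project.
audit_sample = []
for project in selected_projects:
    audit_sample.extend(random.sample(projects[project], k=5))

print(len(selected_projects), len(audit_sample))  # 4 20
```

Because only 4 of the 30 clusters are examined, the audit findings describe those projects well but, as noted above, generalize to the whole organization only to the extent that the clusters were chosen randomly.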
Advantages
- There is no need to have a sampling frame for the whole population.
- It is usually less costly compared to random sampling...