1. What is the difference between R2 and adjusted R2?

R2, the coefficient of determination, is a statistic that measures the goodness of fit of a model. In regression, it indicates how well the regression line approximates the observed data points; an R2 of 1.0 means that the regression line fits the data perfectly. Adjusted R2 is a modification of R2 that adjusts for the number of explanatory terms in a model. Unlike R2, which never decreases when a term is added, the adjusted R2 increases only if the new term improves the model more than would be expected by chance. The adjusted R2 can be negative and is always less than or equal to R2. Because adjusted R2 does not have the same interpretation as R2 (it is not simply the proportion of variance explained), care must be taken in interpreting and reporting it. Adjusted R2 is particularly useful in the feature selection stage of model building. Adjusted R2 is not always more informative than R2, however: it is more useful only when R2 is calculated from a sample rather than the entire population. For example, if the population of interest is the counties of one state and we have data for all of them, then adjusted R2 yields no more useful information than R2.
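As a rough illustration of the adjustment described above, the sketch below uses the common textbook formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of explanatory terms. The function name and the numbers are hypothetical and chosen only to show that a small gain in R2 can still lower the adjusted value.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R2 for a model with p explanatory terms (plus an intercept)
    fitted to n observations, using the common textbook formula."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters (n > p + 1)")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical example: adding a fourth term raises R2 only slightly.
base = adjusted_r2(0.750, n=50, p=3)
bigger = adjusted_r2(0.752, n=50, p=4)
print(round(base, 4), round(bigger, 4))  # the second value is the smaller one

# With a weak fit and few observations the adjusted value can go negative.
print(adjusted_r2(0.01, n=10, p=3) < 0)
```

In this made-up case R2 rises from 0.750 to 0.752, yet the adjusted value falls, which is the behaviour that makes the statistic useful for comparing models of different sizes.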

2. How does testing the significance of the entire multiple regression model differ from testing the contribution of each independent variable?

When testing the significance of the entire multiple regression (an F-test), we are testing the joint effect of all the regressors (predictors) taken together. On the other hand, when testing the contribution of each independent variable (a t-test on its coefficient), we are testing the effect of that specific variable on the dependent variable.
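To make the distinction concrete, here is a minimal sketch of the two test statistics. It assumes an ordinary least squares model with an intercept, n observations and p predictors; the function names and input values are hypothetical, and p-values (which need the F and t distributions) are left out.

```python
def overall_f_statistic(r2: float, n: int, p: int) -> float:
    """F statistic for the joint test that all p slope coefficients are zero."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def coefficient_t_statistic(estimate: float, std_error: float) -> float:
    """t statistic for the test that one particular coefficient is zero."""
    return estimate / std_error


# Hypothetical model: R2 = 0.75 with 50 observations and 3 predictors.
print(overall_f_statistic(0.75, n=50, p=3))

# Hypothetical coefficient: estimate 1.2 with standard error 0.4.
print(coefficient_t_statistic(1.2, 0.4))
```

The F statistic is compared against an F distribution with p and n - p - 1 degrees of freedom, while each t statistic is compared against a t distribution with n - p - 1 degrees of freedom.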

3. Why and how do you use dummy variables?

The use of dummy variables allows you to include categorical independent variables as part of the regression model. If a given categorical independent variable has two categories, then you need only one dummy variable to represent the two categories.
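The coding scheme described above can be sketched as follows. The helper name, the column naming and the sample categories are hypothetical; the sketch also assumes the usual generalisation that a variable with k categories gets k - 1 dummies, with one category held back as the reference.

```python
def dummy_encode(values, reference):
    """Return one dict of 0/1 dummy variables per observation.

    Every category except `reference` gets its own dummy, so a variable
    with two categories produces exactly one dummy column.
    """
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]


# Two categories -> a single dummy variable (reference category: "rural").
area = ["urban", "rural", "urban"]
print(dummy_encode(area, reference="rural"))
# [{'is_urban': 1}, {'is_urban': 0}, {'is_urban': 1}]
```

The reference category is the one for which all dummies equal 0, so each dummy coefficient in the regression is read as a difference from that baseline.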

...Trajico, Maria Liticia D.
BSEd III-A2
REFLECTION
The first thing that pops into my mind when I hear the word STATISTICS is that it is a very hard subject, because it is another branch of mathematics that will make my head bleed from thinking about how I will handle it. I have learned that statistics is a branch of mathematics concerned with the study of information that is expressed in numbers, for example information about the number of times something happens. As I examined what the statement says, the phrase “number of times something happens” really caught my attention, because my subconscious said “here we go again, the non-stop solving and analyzing of problems,” and I was right. This course in basic statistics has provided me with the analytical skills to crunch numerical data and to make inferences from it. At first I thought that I would be all right with this subject all along, but that turned out to be true for only some parts of it, maybe because I did not pay much attention to it. Still, I have learned many things. I have learned my lesson.
In every session of this subject before our midterm examination, I had a really hard time coping with it. When we had our very first quiz I thought that I would fail it, but I did not; after that, however, I failed the next quizzes I took. I always felt down every time I failed a quiz because even though I don’t like this...

...QUESTION 21
The finishing process on new furniture leaves slight blemishes. The table below displays a manager's probability assessment of the number of blemishes on one piece of new furniture.
Number of Blemishes | Probability |
0 | 0.34 |
1 | 0.25 |
2 | 0.19 |
3 | 0.11 |
4 | 0.07 |
5 | 0.04 |
1. On average, how many blemishes do we expect on one piece of new furniture?
2. What is the variance of blemishes on one piece of new furniture? (round to the nearest hundredth)
QUESTION 22
The probability that a person catches a cold during the cold-and-flu season is 0.4. Assume that 10 people are chosen at random.
On average, how many of these ten people would you expect to catch a cold?
What is the standard deviation of the number of people who catch a cold? (round to the nearest hundredth)
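The calculations asked for in Questions 21 and 22 can be sketched as below. This is one possible working, assuming that the blemish counts follow the probability table in Question 21 and that the ten people in Question 22 catch colds independently (a binomial model); the function names are illustrative only.

```python
import math

# Probability table from Question 21.
blemishes = [0, 1, 2, 3, 4, 5]
probs = [0.34, 0.25, 0.19, 0.11, 0.07, 0.04]


def discrete_mean_var(values, probabilities):
    """Mean and variance of a discrete random variable."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(v * p for v, p in zip(values, probabilities))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probabilities))
    return mean, var


def binomial_mean_sd(n, p):
    """Mean and standard deviation of a binomial(n, p) count."""
    return n * p, math.sqrt(n * p * (1 - p))


mean_b, var_b = discrete_mean_var(blemishes, probs)
print(round(mean_b, 2), round(var_b, 2))    # 1.44 2.05

mean_c, sd_c = binomial_mean_sd(10, 0.4)
print(round(mean_c, 2), round(sd_c, 2))     # 4.0 1.55
```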
QUESTION 23
The number of nails in a five-pound box is normally distributed with a mean of 566 and a standard deviation of 33.
What is the probability that there are fewer than 500 nails in a randomly selected five-pound box of nails? (express as a decimal, not a percentage)
The probability is 0.99 that a randomly selected five-pound box of nails contains at least approximately how many nails?
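A minimal sketch of how Question 23 could be worked, using `statistics.NormalDist` from the Python standard library (available from Python 3.8) and treating the number of nails as continuous, as the question does. The variable names are illustrative.

```python
from statistics import NormalDist

# Normal model stated in Question 23: mean 566, standard deviation 33.
nails = NormalDist(mu=566, sigma=33)

# P(X < 500): 500 lies two standard deviations below the mean.
p_below_500 = nails.cdf(500)
print(round(p_below_500, 4))   # 0.0228

# "At least c nails with probability 0.99" means P(X < c) = 0.01,
# so c is the 1st percentile of the distribution.
cutoff = nails.inv_cdf(0.01)
print(round(cutoff, 1))        # 489.2
```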
QUESTION 24
You are the owner of a small casino in Las Vegas and you would like to reward the high-rollers who come to your casino. In particular, you want to give free accommodations to no more than 10% of your patrons....

...Practice Problems 1-KEY
1. The closing stock price of Ahmadi, Inc. for a sample of 10 trading days is shown below.
Day | Stock Price |
1 | 84 |
2 | 87 |
3 | 84 |
4 | 88 |
5 | 85 |
6 | 90 |
7 | 91 |
8 | 83 |
9 | 82 |
10 | 86 |
For the above sample, compute the following measures.
a. The mean = ∑X/n = 860/10 = 86
b. The median = (85+86)/2 = 85.5
c. The variance = ∑(X - X̄)²/(n - 1) = {(84-86)² + (87-86)² + (84-86)² + (88-86)² + (85-86)² + (90-86)² + (91-86)² + (83-86)² + (82-86)² + (86-86)²} / (10 - 1) = 8.89
d. The standard deviation = √8.89 = 2.98
f. The coefficient of variation = 2.98/86 * 100% = 3.47%
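The figures in Problem 1 can be checked with the Python standard library. This is only a verification sketch: `statistics.variance` and `statistics.stdev` use the sample (n - 1) denominator, which matches the key, and the coefficient of variation is computed by hand because the module has no function for it.

```python
import statistics

# Closing prices for the 10 trading days in Problem 1.
prices = [84, 87, 84, 88, 85, 90, 91, 83, 82, 86]

mean = statistics.mean(prices)
median = statistics.median(prices)
variance = statistics.variance(prices)   # sample variance, divides by n - 1
sd = statistics.stdev(prices)            # square root of the sample variance
cv = sd / mean * 100                     # coefficient of variation, in percent

print(mean, median)                                 # 86 85.5
print(round(variance, 2), round(sd, 2), round(cv, 2))   # 8.89 2.98 3.47
```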
2. In 2008, the average age of students at GUST was 22 with a standard deviation of 3.96. In 2009, the average age was 24 with a standard deviation of 4.08. In which year do the ages show a more dispersed distribution? Show your complete work and support your answer.
CV2008 = 3.96/22 * 100% = 18%
CV2009 = 4.08/24 * 100% = 17%
So, 2008 shows the more dispersed distribution.
3. A local university administers a comprehensive examination to the recipients of a B.S. degree in Business Administration. A sample of examinations is selected at random and scored. The results are shown below.
Grade: 93, 65, 80, 97, 85, 87, 97, 60
For the above data, determine
a. The mean = ∑X/n = 664/8 = 83
b. The median = (85+87)/2 = 86
c. The standard deviation = √variance
Variance = ∑(X - X̄)²/(n - 1) = {(93-83)² + (65-83)² + (80-83)² + (97-83)² + (85-83)² + (87-83)² + (97-83)² + (60-83)²} / (8 - 1) = 196.29
So, standard deviation = √196.29 = 14.01
d. The coefficient of...

...DETERMINANTS IN A BUSINESS STATISTICS COURSE AT A LARGE URBAN INSTITUTION
CIS 3300
November 30, 2012
INTRODUCTION
This research paper discusses the effects of several factors on a student’s success in a Business Statistics course. The variables cover the student’s academic situation as well as the student’s personal life. The academic variables are: course of study, study hours per week, semester credit hours, GPA, class year, semester and class time. The personal life variable is work hours per week. All of the variables listed above are closely related to a student’s ability to succeed in a Business Statistics course.
The main purpose of this research paper is to determine which factors are most significant in predicting a student’s success in a Business Statistics course. This study will provide valuable information to both students and professors by helping them adjust certain factors to produce a higher success rate in this course. Students could use this information to decide what time of day is best for taking a statistics course, how many study hours are needed, or which school year is best for enrolling in the course. Professors could use this information to schedule their statistics courses at the peak hours of the day that are the best times to take a...

...Organization of Terms
Experimental Design: Descriptive, Inferential, Population, Parameter, Sample, Random, Bias, Statistic
Types of Variables: Qualitative, Quantitative, Independent, Dependent
Types of Graphs: Bar Graph, Histogram, Box plot, Scatterplot
Measurement scales: Nominal, Ordinal, Interval, Ratio
Measures of Center: Mean, Median, Mode
Measures of Spread: Range, Variance, Standard deviation
Measures of Shape: Skewness, Kurtosis
Tests of Association: Correlation, Regression, Slope, y-intercept
Tests of Inference: Central Limit Theorem, Chi-Square, t-test (Independent samples, Correlated samples), Analysis-of-Variance
Glossary of Terms
Statistics - a set of concepts, rules, and procedures that help us to:
organize numerical information in the form of tables, graphs, and charts;
understand statistical techniques underlying decisions that affect our lives and well-being; and
make informed decisions.
Data - facts, observations, and information that come from investigations.
Measurement data, sometimes called quantitative data -- the result of using some instrument to measure something (e.g., test score, weight);
Categorical data, also referred to as frequency or qualitative data -- things are grouped according to some common property or properties, and the number of members in each group is recorded (e.g., males/females, vehicle type).
Variable - property of an object or event that can take on different values. For example, college major...

...Worksheet 1 - Basic Concepts
1. What is Inferential statistics?
Inferential statistics uses observations of past occurrences or available data i.e. descriptive statistics to make decisions about future possibilities and/or the nature of the entire body of data. Inferential statistics draws conclusions or makes interpretations, predictions and inferences about a population based upon an analysis of a sample.
2. Give 2 different techniques which are used in descriptive statistics to represent the data.
Tables or graphs (histograms, boxplots, etc) or numerical summaries
3. Define each of the following terms:
a) Variable
The topics/issues under investigation in statistical analysis. The variable is a characteristic or property of the members of the population which may vary e.g. height, weight, perception etc.
b) Population
The total group about which information is being sought. If information is sought about voting intentions, the population is all those people eligible to vote in an electorate, or a state or the nation.
c) Sample
A sample is a group taken from the population. Most statistical situations do not allow an entire population to be used for analysis (usually because of its size, the geographical dispersion of subjects, logistical issues, funding, time constraints, etc.), so a sample must be used. The sample chosen should be representative of and reflect all of the...

...The history of statistics can be said to start around 1749 although, over time, there have been changes to the interpretation of the word statistics. In early times, the meaning was restricted to information about states. This was later extended to include all collections of information of all types, and later still it was extended to include the analysis and interpretation of such data. In modern terms, "statistics" means both sets of collected information, as in national accounts and temperature records, and analytical work which requires statistical inference.
Statistical activities are often associated with models expressed using probabilities, and require probability theory for them to be put on a firm theoretical basis: see History of probability.
A number of statistical concepts have had an important impact on a wide range of sciences. These include the design of experiments and approaches to statistical inference such as Bayesian inference, each of which can be considered to have its own sequence in the development of the ideas underlying modern statistics.
The term statistics is ultimately derived from the New Latin statisticum collegium ("council of state") and the Italian word statista ("statesman" or "politician"). The German Statistik, first introduced by Gottfried Achenwall (1749), originally designated the analysis of data about the state, signifying the "science of state" (then called political...

...descriptive statistics to summarize the training time data for each method. What similarities or differences do you observe from the sample data?
Descriptive analysis in Excel has been used to produce the relevant figures for the given data samples, which are tabulated below:
Descriptive Statistics | Current | Proposed |
Mean | 75.06557 | 75.42623 |
Standard Error | 0.505094 | 0.32091 |
Median | 76 | 76 |
Mode | 76 | 76 |
Standard Deviation | 3.944907 | 2.506385 |
Sample Variance | 15.5623 | 6.281967 |
Kurtosis | -0.06933 | 0.58694 |
Skewness | -0.22053 | -0.28749 |
Range | 19 | 13 |
Minimum | 65 | 69 |
Maximum | 84 | 82 |
Sum | 4579 | 4601 |
Count | 61 | 61 |
Analysis of the descriptive statistics shows that the current and proposed plans have almost the same mean completion time: 75.07 and 75.43 hours, respectively. Both plans have exactly the same median and mode (76). However, the standard deviation of the current plan (3.94) is higher than that of the proposed plan (2.51), and the sample variance is correspondingly higher (15.56 versus 6.28). This suggests that completion hours are more dispersed around the mean under the current plan, so the mean alone is a less reliable summary of that distribution, whereas under the proposed plan completion hours are more tightly clustered.
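The rows of the table are linked by standard identities: the sample variance is the square of the standard deviation, and the standard error is the standard deviation divided by the square root of the count. The short sketch below re-derives those rows from the reported standard deviations as a consistency check; the helper name is illustrative.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean from a sample standard deviation."""
    return sd / math.sqrt(n)


# Standard deviations and count as reported in the table above.
n = 61
sd_current, sd_proposed = 3.944907, 2.506385

print(round(standard_error(sd_current, n), 4))    # 0.5051
print(round(standard_error(sd_proposed, n), 4))   # 0.3209
print(round(sd_current ** 2, 2), round(sd_proposed ** 2, 2))   # 15.56 6.28
```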
2. Use the methods of Chapter 10 to...