Assignment #2
EC1204 Economic Data Collection and Analysis
Student No.
110393693

Part 1:
Question 2
From the scatter plot, the relationship between the GDP and the population of Great Britain from 1999-2009 appears to be a moderate positive correlation. Both variables increase at a similar rate and follow a similar pattern, which indicates this relationship. The relationship would be expected to be positive, as a larger population means more people are available to work.

Question 3

The correlation between the GDP and the population is a strong positive one, at 0.897922049. This indicates that the two indicators are closely related and that a change in either indicator tends to be accompanied by a similar change in the other. The figure is close to 1, which would represent a perfect positive correlation; a coefficient of exactly 1 would mean population was a perfect linear predictor of Great Britain's GDP.

Question 4

The coefficient of determination is 80.6%. This means that 80.6% of the variation in Great Britain's GDP can be attributed to variation in the nation's population. This is a large percentage and reflects the strong relationship between Great Britain's population and GDP. The figure gives the proportion of the total variation in the dependent variable, GDP, that is explained by variation in the independent variable, population. It is easier to interpret than the correlation coefficient because of its percentage format.

Question 5

The slope coefficient for these two indicators is 77.038. This means that for every one-person addition to Great Britain's population, GDP is predicted to increase by roughly £77.04. This further reflects the close relationship between Great Britain's population and GDP. The intercept coefficient is -3375.39; this is the GDP the model would predict if Great Britain's population fell to zero. Since a zero population is impossible, the intercept is an extrapolation with no direct practical meaning....
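The statistics discussed in Questions 3-5 can be reproduced with a few lines of code. The sketch below uses made-up population and GDP values, not the actual 1999-2009 series, purely to show how the correlation coefficient, coefficient of determination, slope and intercept are computed:

```python
import numpy as np

# Hypothetical population (millions) and GDP (billions of pounds),
# standing in for the 1999-2009 Great Britain series in the assignment.
population = np.array([58.7, 58.9, 59.1, 59.4, 59.6, 59.9,
                       60.2, 60.6, 61.0, 61.4, 61.8])
gdp = np.array([990, 1030, 1070, 1110, 1160, 1220,
                1280, 1350, 1420, 1430, 1400])

r = np.corrcoef(population, gdp)[0, 1]             # correlation coefficient
r_squared = r ** 2                                 # coefficient of determination
slope, intercept = np.polyfit(population, gdp, 1)  # least-squares line

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"GDP ~ {slope:.2f} * population + {intercept:.2f}")
```

The slope and intercept printed here are the direct analogues of the 77.038 and -3375.39 figures reported above.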

...
The Political Economy of Government Responsiveness: Theory and Evidence from India
Self-Study Assignment – Media Economics
Contents
1. Introduction
2. Theory, Propositions and Empirical Strategy
3. Results
4. Evaluation of Empirical Strategy
5. Conclusion
6. References
1. Introduction
Extensive research has been conducted on how media circulation affects political accountability and government policy. Theory predicts that where the share of media receivers is higher, political accountability and hence government expenditures increase. Besley & Burgess (2002) give additional insight into this topic by analyzing the impact of media circulation on government responsiveness to falls in food production and crop flood damage in Indian states. The authors use the extent of public food distribution and calamity relief as proxies for government responsiveness. In addition to media factors, political and economic factors are introduced as potential determinants of policies. The predictions of the theory are borne out by the results of the paper: government responsiveness increases with a higher number of media users within a state. Political factors are also relevant determinants, whereas economic factors are of little importance.
In the following, the paper is critically assessed section by section. First, the theory, the propositions, as well as the empirical...

...Introduction:
Data analysis is an attempt by the researcher to summarize collected data either quantitative or qualitative. Generally, quantitative analysis is simply a way of measuring things but more specifically it can be considered as a systematic approach to investigations. In this approach numerical data is collected or the researcher transforms collected or observed data into numerical data. It is ideal for finding out when and where, who and what and any relationships and patterns between variables. This is research which involves measuring or counting attributes (i.e. quantities). It can be defined as:
“The numerical representation and manipulation of observations for the purpose of describing and explaining the phenomena that those observations reflect is called quantitative analysis”
Quantitative analysis underpins quantitative geography and is considered one of the important parts of geographical research, as the subject matter of quantitative geography is captured by the following key issues:
Collection of empirical data
Analysis of numerical spatial data
Development of spatial methods for measurements, theories and hypothesis
Construction and testing of mathematical models of spatial theory
Concisely, all the above-mentioned activities develop an understanding of spatial processes. Quantitative geography is not bound by a deep-rooted philosophical stance, as its most...

...Statistical Techniques for Handling Missing Data Dr. John M. Cavendish
4. Part a1
Data were collected from 430 undergraduate college students for the purpose of examining the relationship between student personality characteristics and their preference for personality styles in their lecturers. Table 1 below presents a summary of the data collected. Data collection was attempted for 430 subjects, of whom 5 provided no data. Of the 425 subjects included in the data analysis, 307 were female, 117 were male, and 1 failed to indicate their gender. With the exception of Age and Student wants Extroversion in lecturers, the coefficients of skewness and kurtosis are within normal limits. In the instance of Age, the lower outliers are obviously mistakes, since it would be impossible to have students aged 0 and 2 years in the study. However, even when they are eliminated, this variable does not approach normality. It is also apparent from Table 1 that all variables have missing values. While most have less than 5% of their values missing, the Student wants Extroversion in lecturers variable has 147 (34%) of its values missing.
Table 1. Summary of Lecture Preference Study Data

Variable             Valid N   Mean
lecturE              283       12.96
Age                  404       19.60
studentN             420       23.63
studentE             418       30.10...
studentO             418
studentA             413
studentC             416
lectureN             417
lecturO              420
lecturA              417
lecturC              417
Valid N (listwise)   265
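A summary of the kind shown in Table 1 (valid N, percentage missing, mean, skewness, kurtosis) can be produced with a short script. A minimal sketch, using synthetic stand-in data rather than the actual study data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the lecture-preference data: 430 rows,
# with values knocked out at random to mimic missingness.
df = pd.DataFrame({
    "Age": rng.normal(20, 2, 430),
    "studentE": rng.normal(30, 5, 430),
})
df.loc[rng.choice(430, 26, replace=False), "Age"] = np.nan
df.loc[rng.choice(430, 12, replace=False), "studentE"] = np.nan

summary = pd.DataFrame({
    "valid_n": df.count(),            # non-missing count per variable
    "pct_missing": df.isna().mean() * 100,
    "mean": df.mean(),
    "skew": df.skew(),
    "kurtosis": df.kurt(),
})
print(summary.round(2))
print("Valid N (listwise):", df.dropna().shape[0])
```

Flagging variables whose missing percentage exceeds 5%, as done in the text, is then a one-line filter on the `pct_missing` column.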

...output we generated for each row of the test data. The regression model and scoring summary are shown below.
3. a) The data for purchasers only is in the "Purchasers_only" sheet.
b) The partition is shown in the "Data_Partition2" sheet.
c) The multiple linear regression output can be seen in "MLR_Output1". The target variable is "spending". We select every variable except sequence_number (a meaningless identifier), source_w (dropped from the set of "source" dummy variables because it is redundant), and purchase (all values are 1 here).
d) To select the best subset, the first criterion we consider is adjusted R-squared, looking for the point where the value stops improving, which is around 8 coefficients. Next we check the Cp value. Since Cp does not come close to the number of coefficients until more than 20 coefficients are included, and Cp is only our second criterion, we decided to choose 8 coefficients for our regression model; this keeps the model simple and avoids over-fitting.
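Adjusted R-squared and Mallows' Cp, the two subset-selection criteria used above, are straightforward to compute by hand. A minimal sketch (the data and model here are synthetic, not the mailing data):

```python
import numpy as np

def adjusted_r2(y, y_hat, n_coeffs):
    """Adjusted R-squared for a fit with n_coeffs coefficients (incl. intercept)."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_coeffs)

def mallows_cp(y, y_hat, n_coeffs, sigma2_full):
    """Mallows' Cp; sigma2_full is the residual variance of the full model.
    A good subset has Cp close to n_coeffs."""
    ss_res = np.sum((y - y_hat) ** 2)
    return ss_res / sigma2_full - (len(y) - 2 * n_coeffs)

# Toy check on a well-specified 4-coefficient linear model.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 3))])
beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta + rng.normal(scale=0.5, size=200)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef
sigma2_full = np.sum((y - y_hat) ** 2) / (200 - 4)
print(adjusted_r2(y, y_hat, 4), mallows_cp(y, y_hat, 4, sigma2_full))
```

For the full model, Cp equals the number of coefficients by construction; the diagnostic value of Cp comes from evaluating candidate subsets against the full model's residual variance, which is how the "8 coefficients" choice above is justified.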
We applied the regression model to the testing and validation datasets (output in "MLR_Output2" and "MLR_ValidLiftChart2"). The table below shows the regression model.
4. Score Analysis
The outputs of these steps are shown in the "Testdatascored" sheet from column AJ to AQ; the regression model on the testing data is shown in "MLR_Output3", "MLR_NewScore3", and "MLR_Stored3". The table below is part of the score analysis output; the lift chart is displayed in Exhibit 1.
Estimation of gross profit from mailing based on...

...Experiment 1
PENNY PINCHING: STATISTICAL TREATMENT OF DATA
Objective
Interpretation is one of the most important steps in a chemical analysis. Upon receiving raw data, anyone, scientist or non-scientist, can offer some thoughts about the results, such as the similarity or difference between values or the connection between measurements. Scientists are expected to give a better interpretation because they can recognize when a difference between raw data and final results is significant. These results, which are based mainly on mean values and standard deviations, can nevertheless be biased or misinterpreted without appropriate statistical tools such as the Q test and the Student's t test. The Q test allows us to determine whether a value should be discarded or retained, while the Student's t test is used to determine the uncertainty and confidence associated with the assignment of a value. These tools prove very helpful, as they allow the data and results to be interpreted in a less biased manner.
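As a concrete illustration of the Q test described above, the sketch below applies it to a hypothetical set of penny masses. The masses are invented, not taken from this experiment, and the critical value Q(95%, n = 10) ≈ 0.466 is taken from a standard Q-test table:

```python
def dixon_q(values):
    """Dixon's Q test statistics for the lowest and highest values.
    Q = gap between the suspect value and its nearest neighbour,
    divided by the range of the data."""
    data = sorted(values)
    spread = data[-1] - data[0]
    return (data[1] - data[0]) / spread, (data[-1] - data[-2]) / spread

Q_CRIT = 0.466  # critical value for n = 10 at 95% confidence (standard table)

# Hypothetical penny masses in grams; 2.31 g is the suspect low value.
masses = [2.31, 2.48, 2.49, 2.50, 2.50, 2.51, 2.51, 2.52, 2.53, 2.55]
q_low, q_high = dixon_q(masses)
print(f"Q_low = {q_low:.3f} ({'reject' if q_low > Q_CRIT else 'retain'})")
print(f"Q_high = {q_high:.3f} ({'reject' if q_high > Q_CRIT else 'retain'})")
```

A value is rejected only when its Q statistic exceeds the tabulated critical value for the given sample size and confidence level.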
In this experiment each group of students obtained a sample of 20 pennies. Each penny was weighed and its mass recorded along with the year of the penny. The Q test was performed to determine whether any values should be rejected. The whole-class data set was used to construct a frequency histogram, and two distinct distributions in the penny masses were noticed. The Student's t...

...Mid Term Exam
15.062 Data Mining
Problem 1 (25 points)
For the following questions please give a True or False answer with one or two sentences in
justification.
1.1 A linear regression model will be developed using a training data set. Adding variables to the
model will always reduce the sum of squared residuals measured on the validation set.
1.2 Although forward selection and backward elimination are fast methods for subset selection in
linear regression, only step-wise selection is guaranteed to find the best subset.
1.3 An analyst computes classification functions using discriminant analysis for a data set with
three classes C1, C2 and C3. She assumes that all three classes are equally likely to arise in the
application. She later learns that the probability of C1 is twice that of C2 and C3. The
probabilities for C2 and C3 are equal. If she re-computes the classification functions using this
information, the value of the classification function for C1 will increase for every data point.
1.4 A classification model's misclassification rate on the validation set is a better measure of the
model's predictive ability on new data than its misclassification rate on the training set.
1.5 A neural net classifier for two classes constructs a separating boundary between the classes
that is linear in weighted sums of the input values.
Problem 2 (10 points)
A dataset of 1000 cases was...

...Appalachian Coal Mining believes that it can increase labor productivity and, therefore, net revenue by reducing air pollution in its mines. It estimates that the marginal cost function for reducing pollution by installing additional capital equipment is MC = 40P, where P represents a reduction of one unit of pollution in the mines. It also feels that for every unit of pollution reduction the marginal increase in revenue (MR) is MR = 1,000 - 10P. How much pollution reduction should Appalachian Coal Mining undertake?
The installation of additional capital equipment will reduce pollution and increase labor productivity. But the additional cost must be weighed against this benefit: beyond some point it no longer offsets it.
So the level of pollution reduction should be fixed at its optimal value. The optimal P is where
Marginal cost = marginal revenue
Set MC = MR and solve for P
MC = 40P, MR = 1000 - 10P
That is
40P = 1000 - 10P
Moving the -10P term to the left-hand side:
40P + 10P = 1000
50P = 1000
P = 1000/50
P = 20
So a pollution reduction of 20 units must be undertaken by Appalachian Coal Mining.
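The optimisation above can be checked numerically; the snippet below simply re-solves MC = MR for the functions given in the problem:

```python
# Numerical check of the MC = MR solution above.
def mc(p):  # marginal cost of reducing pollution by p units
    return 40 * p

def mr(p):  # marginal revenue from reducing pollution by p units
    return 1000 - 10 * p

p_star = 1000 / 50  # from 40P + 10P = 1000
assert mc(p_star) == mr(p_star) == 800
print(p_star)  # 20.0
```

At P = 20 both marginal cost and marginal revenue equal 800, confirming the optimum.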