Regression Analysis of Pricing of IPL Players
Project Report




Pricing of Players in the Indian Premier League
Executive Summary
In this project, the prices of IPL players are analysed against various factors. Not all the factors that drove a player's price were directly related to on-field performance, but certain factors had a direct impact on a player's remuneration. These factors ranged from performance measures, such as strike rate (in the case of a batsman), to physical attributes, such as age. We applied multiple linear regression to determine which factors were significant in pricing the players.
Best Regression model(s)
The independent variables retained in the final regression model are defined as follows:
* Country = 1 for India, 0 for non-India
* Age_3=1 for age < 25 years, 0 for others
* T_Runs is runs scored in Test matches
* Runs_ODI is runs scored in ODIs
* ODI_Wickets is wickets taken in ODIs
* RUN_S is runs scored
* BASE_PRICE is base price
* YEAR = 0 for year <2011, 0 for others
The linear regression model was developed using the backward variable selection method, with the removal criterion Probability of F-to-remove >= 0.100.
As seen from the table above, the model's 'R Square' value is 0.618 and its 'Adjusted R Square' value is 0.592. The Team variable was removed.
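The gap between the two figures reflects the penalty adjusted R² applies for each extra predictor. A minimal sketch of the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), for n observations and k predictors plus an intercept. The sample size n in the usage line is hypothetical, since it is not stated here; k = 8 simply counts the variables listed above.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with n observations and k predictors (plus intercept)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    # Penalise R^2 by the ratio of total to residual degrees of freedom.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical usage: R^2 = 0.618 as reported, k = 8 predictors, assumed n = 100.
print(round(adjusted_r2(0.618, n=100, k=8), 3))
```

Adjusted R² is always below R² when k ≥ 1 and R² < 1, so a drop from 0.618 to 0.592 is expected rather than a sign of a problem.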
Cricket in the T20 format is considered a young man’s sport; is there evidence that a player’s price is influenced by age?
From our analysis, the price of a player is higher if the player is younger than 25 (Age_3 = 1).
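To see what a 0/1 dummy such as Age_3 measures, note that in a regression on that dummy alone, the slope equals the difference between the two groups' mean prices and the intercept equals the mean of the reference group (age 25 or over). The sketch below uses made-up ages and prices, not the IPL data; in the full multiple regression the Age_3 coefficient is instead the price difference after holding the other predictors constant.

```python
from statistics import mean


def ols_simple(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Hypothetical ages and prices (illustrative only, not the IPL data).
ages = [22, 24, 23, 28, 31, 27]
prices = [900, 1100, 1000, 600, 500, 700]
age_3 = [1 if a < 25 else 0 for a in ages]  # Age_3 = 1 for age < 25, else 0

b0, b1 = ols_simple(age_3, prices)
print(b0, b1)  # 600.0 400.0: mean price of 25+ group, and the under-25 premium
```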
...associated with a β1 change in Y.
(iii) The interpretation of the slope coefficient in the model ln(Yi) = β0 + β1 ln(Xi) + ui is as follows:
(a) a 1% change in X is associated with a β1 % change in Y.
(b) a change in X by one unit is associated with a β1 change in Y.
(c) a change in X by one unit is associated with a 100β1 % change in Y.
(d) a 1% change in X is associated with a change in Y of 0.01β1 .
(iv) To decide whether Yi = β0 + β1 X + ui or ln(Yi) = β0 + β1 X + ui fits the data better, you cannot consult the regression R² because
(a) ln(Y) may be negative for 0 < Y < 1.
(b) the TSS are not measured in the same units between the two models.
(c) the slope no longer indicates the effect of a unit change of X on Y in the log-linear model.
(d) the regression R² can be greater than one in the second model.
(v) The exponential function
(a) is the inverse of the natural logarithm function.
(b) does not play an important role in modeling nonlinear regression functions in econometrics.
(c) can be written as exp(eˣ).
(d) is eˣ, where e is 3.1415...
(vi) The following are properties of the logarithm function with the exception of
(a) ln(1/x) = −ln(x).
(b) ln(a + x) = ln(a) + ln(x).
(c) ln(ax) = ln(a) + ln(x).
(d) ln(xᵃ) = a ln(x).
(vii) In the log-log model, the slope coefficient indicates
(a) the effect that a unit change in X has on Y.
(b) the elasticity of Y with respect to X.
(c) ∆Y/∆X.
(d) (∆Y/∆X) × (Y/X).
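A quick numerical check of the log-log interpretation behind questions (iii) and (vii): if ln(Y) = β0 + β1 ln(X), then Y = e^β0 · X^β1, so a 1% rise in X multiplies Y by 1.01^β1, which is approximately a β1 percent rise. The coefficient values below are hypothetical.

```python
import math

# Hypothetical log-log model: ln(Y) = b0 + b1 * ln(X)
b0, b1 = 0.5, 1.8


def predict(x: float) -> float:
    """Y implied by the log-log model at a given X."""
    return math.exp(b0 + b1 * math.log(x))


x = 40.0
pct_change_y = 100 * (predict(1.01 * x) / predict(x) - 1)
print(f"1% rise in X -> {pct_change_y:.3f}% rise in Y (b1 = {b1})")
```

The result does not depend on the starting value of X, which is why β1 is read as a constant elasticity.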
(viii) In the...
...
Unit 5 – Regression Analysis
Mikeja R. Cherry
American InterContinental University
Abstract
In this brief, I will present selected perceptions of Nordstrom, Inc., a fashion apparel retailer with over 12 billion dollars in sales last year. I will research, review, and analyze perceptions of the company, create graphs to support qualitative and quantitative analysis, and summarize my findings.
Introduction
Nordstrom, Inc., founded in 1901, is a retailer that specializes in fashion apparel for men, women, and children. The company is headquartered in Seattle, Washington, and had over 61,000 employees worldwide as of February 2, 2013 (Business Wire, 2014).
Nordstrom, Inc. offers its consumers an online store, e-commerce, retail stores, mobile commerce, and catalogs. It operates 117 full-line stores in the United States and 1 store in Canada, 167 Nordstrom Rack stores, 1 clearance store under the Last Chance banner, 1 philanthropic treasure & bond store called Trunk Club, and 2 Jeffrey boutiques. Customers can also shop online at www.nordstrom.com and through the online private sale subsidiary Hautelook. Its warehouses, also called fulfillment centers, which handle the majority of its shipping needs, are located in Cedar Rapids, Iowa (Business Source Premier, 2014).
Nordstrom, Inc. continues to make investments in their e-commerce...
...Regression Analysis Exercises
1 A farmer wanted to find the relationship between the amount of fertilizer used and the yield of corn. He selected seven acres of his land on which he used different amounts of fertilizer to grow corn. The following table gives the amount (in pounds) of fertilizer used and the yield (in bushels) of corn for each of the seven acres.
Fertilizer Used (pounds)    Yield of Corn (bushels)
120                         138
80                          112
100                         129
70                          96
88                          119
75                          104
110                         134
a. With the amount of fertilizer used as an independent variable and yield of corn as a...
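With fertilizer (x) as the independent variable and yield (y) as the dependent variable, the least squares line ŷ = a + bx uses b = SSxy/SSxx and a = ȳ − b·x̄. A minimal sketch applying these textbook formulas to the seven acres in the table:

```python
fertilizer = [120, 80, 100, 70, 88, 75, 110]     # x, pounds
yield_corn = [138, 112, 129, 96, 119, 104, 134]  # y, bushels


def least_squares(x, y):
    """Intercept a and slope b of the least squares line y = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sx * sy / n
    ss_xx = sum(xi * xi for xi in x) - sx * sx / n
    b = ss_xy / ss_xx
    a = sy / n - b * sx / n
    return a, b


a, b = least_squares(fertilizer, yield_corn)
print(f"yhat = {a:.2f} + {b:.4f} x")
```

For these data SSxy = 12088/7 and SSxx = 14734/7, which works out to about ŷ = 43.50 + 0.8204x: each extra pound of fertilizer is associated with roughly 0.82 more bushels.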
...
Mortality Rates
Regression Analysis of Multiple Variables
Neil Bhatt
993569302
Sta 108 P. Burman
11 total pages
The question posed in this experiment is whether pollution has an impact on the mortality rate. Data are taken from 60 cities (n = 60), with response variable Y = mortality rate per 100,000 population and predictor variables education, percentage of the population that is nonwhite, percentage of the population deemed poor, precipitation, amount of sulfur dioxide, and amount of nitrogen dioxide.
Data:
60 Standard Metropolitan Statistical Areas (SMSAs) in the United States, obtained for the years 1959–1961. [Source: GC McDonald and JS Ayers, “Some applications of the ‘Chernoff Faces’: a technique for graphically representing multivariate data”, in Graphical Representation of Multivariate Data, Academic Press, 1978.]
Using the data, we can construct a matrix plot to get a visual sense of whether a correlation exists before doing any calculations.
Data Distribution:
Scatter Plot Matrix
As one can observe, the data appear to cluster along patterns that suggest a relationship between Y = mortality rate and the X variables that may influence Y.
From this we construct a correlation matrix to express these relationships in matrix form....
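A correlation matrix holds the Pearson correlation r = Sxy/√(Sxx·Syy) for every pair of variables. A minimal pure-Python sketch; the four "cities" and their values are hypothetical placeholders, not the McDonald and Ayers SMSA data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each (row_name, col_name) pair to its Pearson correlation."""
    names = list(columns)
    return {(r, c): pearson(columns[r], columns[c]) for r in names for c in names}


# Hypothetical values for four cities (NOT the SMSA data set).
data = {
    "Mortality": [920.0, 1010.0, 880.0, 970.0],
    "SO2": [30.0, 110.0, 10.0, 60.0],
    "Education": [11.5, 10.2, 12.1, 10.8],
}
R = correlation_matrix(data)
for r in data:
    print(r.ljust(10), " ".join(f"{R[(r, c)]:6.2f}" for c in data))
```

The matrix is symmetric with ones on the diagonal, so only the entries above the diagonal carry new information.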
...REGRESSION ANALYSIS
Correlation only indicates the degree and direction of the relationship between two variables. It does not necessarily connote a cause-and-effect relationship. Even when there are grounds to believe a causal relationship exists, correlation does not tell us which variable is the cause and which is the effect. For example, the demand for a commodity and its price will generally be found to be correlated, but correlation will not answer whether demand depends on price or vice versa.
The dictionary meaning of ‘regression’ is the act of returning or going back. The term ‘regression’ was first used by Francis Galton in 1877 while studying the relationship between the heights of fathers and sons.
“Regression is the measure of the average relationship between two or more variables in terms of the original units of data.”
The line of regression is the line that gives the best estimate of one variable for any specific value of the other variable(s).
In regression analysis of two variables, there are two regression lines: one for the regression of x on y and the other for the regression of y on x.
These two regression lines show the average relationship between the two variables. The regression line of y on x gives the most probable...
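The two lines generally differ: the slope of y on x is b_yx = Sxy/Sxx, the slope of x on y is b_xy = Sxy/Syy, and their product is r². The lines therefore coincide only when |r| = 1. A minimal sketch with hypothetical father and son heights (not Galton's data):

```python
from math import sqrt


def two_regression_lines(x, y):
    """Return (b_yx, b_xy, r): slope of y on x, slope of x on y, and Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    b_yx = sxy / sxx  # line y = a + b_yx * x, used to predict y from x
    b_xy = sxy / syy  # line x = c + b_xy * y, used to predict x from y
    r = sxy / sqrt(sxx * syy)
    return b_yx, b_xy, r


# Hypothetical heights in inches (illustrative only).
fathers = [65, 67, 68, 70, 72, 74]
sons = [66, 68, 67, 70, 71, 72]
b_yx, b_xy, r = two_regression_lines(fathers, sons)
print(b_yx, b_xy, r * r, b_yx * b_xy)
```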
...Regression Analysis (Tom’s Used Mustangs)
Irving Campus
GM 533: Applied Managerial Statistics
04/19/2012
Memo
To:
From:
Date: April 19, 2012
Re: Statistical Analysis of Price Settings
Several hypothesis tests and multiple regressions were compared in order to identify the factors that affect the selling price of used Ford Mustangs. The data contain observations on 35 used Mustangs and 10 different characteristics.
Of the hypothesis tests conducted, the test of whether price depends on the car being a convertible was the strongest: it produced the smallest p-value.
The data used in this regression analysis to find a suitable model for the relationship between price, age, and mileage come from Bryant/Smith Case 7, Tom’s Used Mustangs. As described in the case, Tom sets his asking prices for used cars largely by gut feeling.
The hypothesis test that shows the strongest relationship with mean price is whether the car is a convertible. The regression analysis examines whether price is related to mileage, color, owner, age, and GT. After running several models with different independent...
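A multiple regression of price on age and mileage solves the normal equations (XᵀX)b = Xᵀy. The pure-Python sketch below does this with Gaussian elimination; the six cars are hypothetical and were generated exactly from price = 20000 − 900·age − 40·(mileage in thousands), so the fit should recover those coefficients. It is not the Bryant/Smith case data.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(rows, y):
    """Least squares coefficients [b0, b1, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    XtX = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)] for i in range(k)]
    Xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical cars: age in years, mileage in thousands of miles, price in dollars.
age = [2, 4, 5, 7, 3, 6]
miles_k = [20, 45, 50, 90, 40, 60]
price = [17400, 14600, 13500, 10100, 15700, 12200]
b0, b_age, b_miles = ols(list(zip(age, miles_k)), price)
print(f"price = {b0:.1f} + ({b_age:.1f})*age + ({b_miles:.2f})*miles_k")
```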
...Quantitative Methods Project
Regression Analysis for the pricing of players in the
Indian Premier League
Executive Summary
The selling price of players at IPL auctions is affected by more than one factor. Many of these factors affect each other, and others affect the selling price only indirectly. When performing a multiple regression analysis on more than 25 independent variables, where no clear relationship can be obtained, the challenge is to build the regression model as carefully as possible.
Of the various tools available, we used SPSS software for our regression analysis. One reason for preferring SPSS over the others was the ease with which extraneous independent variables can be eliminated. The two methodologies used for choosing the best model in this project are:
* Forward Model Building:
Independent variables are added to the model one at a time, in order of their significance, until we achieve the optimum model.
* Backward Elimination:
The complete set of independent variables is regressed and the least significant predictors are eliminated in order to arrive at the optimum model.
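Backward elimination can be sketched in a few lines of NumPy. This is an approximation of SPSS's rule, not its implementation: for a single variable the F-to-remove statistic equals t², so the sketch drops the predictor with the smallest |t| until every remaining |t| reaches a cutoff. The default 1.645 (roughly a two-sided 10% level for large samples) is an assumption standing in for SPSS's exact degrees-of-freedom-based p-value. The demo data are hypothetical.

```python
import numpy as np


def backward_eliminate(X, y, names, t_crit=1.645):
    """Drop the least significant predictor until all |t| >= t_crit.

    An intercept column is always included and never removed.
    t_crit = 1.645 is a large-sample approximation to p = 0.10 (two-sided).
    """
    keep = list(range(X.shape[1]))
    while keep:
        A = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        dof = len(y) - A.shape[1]
        sigma2 = resid @ resid / dof  # residual variance estimate
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
        t = beta[1:] / se[1:]  # t statistics, skipping the intercept
        worst = int(np.argmin(np.abs(t)))
        if abs(t[worst]) >= t_crit:
            break  # every remaining predictor is significant enough
        del keep[worst]
    return [names[j] for j in keep]


# Hypothetical demo: x0 and x1 drive y, x2 is irrelevant.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H8 = np.kron(H2, np.kron(H2, H2))  # 8x8 matrix with mutually orthogonal +/-1 columns
x0, x1, x2 = H8[:, 1], H8[:, 2], H8[:, 4]
y = 3.0 * x0 - 2.0 * x1 + 0.1 * H8[:, 3]  # small "noise" orthogonal to every predictor
kept = backward_eliminate(np.column_stack([x0, x1, x2]), y, ["x0", "x1", "x2"])
print(kept)
```

Forward selection is the mirror image: start from the intercept alone and add the candidate with the largest |t| while it clears the cutoff.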
Our analysis has shown that the following variables are the most...
...deviation of the data can influence the overall result, including the confidence level. He thinks that is why we should use a large enough sample, if possible, to strengthen our conclusion.
2. Regression Analysis
Jake recently learned a very interesting statistical topic: regression analysis. Although he can tell that the investment returns on the DJIA and AT&T are somewhat dependent, he cannot tell how much one influences the other. He is also not sure whether there is any significant time trend in the DJIA and AT&T returns, so he is now going to run a regression analysis on the collected data.
a) Time Trend in AT&T
To carry out a hypothesis test on the time trend in the investment return on AT&T, he first numbered the data from March 2008 to September 2012 from 1 to 54. He then set the time trend as the independent variable (denoted x) and the investment return on AT&T as the dependent variable (denoted y), and used the Excel “Data Analysis” tool to obtain the regression result below.
Table 2. Regression result with the investment return on AT&T against time trend
Cohort 2 Team 5
He can now write the estimated regression line ŷ = b0 + b1x, where b0 and b1 are the least squares point estimates of β0 and β1. From the Coefficients column of Table 2, b0 is 2.276 and b1 is 0.1043. Therefore the estimated regression line is ŷ = 2.276...
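The slope test behind a time-trend regression uses t = b1 / s(b1), where s(b1) = s/√Stt and s² = SSE/(n − 2). Excel's regression output reports these as the Coefficients, Standard Error, and t Stat columns. A minimal sketch; the 54-period return series is hypothetical, not the AT&T data.

```python
from math import sqrt


def trend_regression(y):
    """Regress y on a time index t = 1..n; return b0, b1 and the t statistic for b1."""
    n = len(y)
    t = list(range(1, n + 1))
    tbar, ybar = sum(t) / n, sum(y) / n
    stt = sum((ti - tbar) ** 2 for ti in t)
    sty = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
    b1 = sty / stt
    b0 = ybar - b1 * tbar
    sse = sum((yi - b0 - b1 * ti) ** 2 for ti, yi in zip(t, y))
    s = sqrt(sse / (n - 2))  # standard error of the estimate
    se_b1 = s / sqrt(stt)  # standard error of the slope
    return b0, b1, b1 / se_b1


# Hypothetical series: upward trend plus a small alternating wiggle.
returns = [1 + 0.5 * t + 0.1 * (-1) ** t for t in range(1, 55)]
b0, b1, t_stat = trend_regression(returns)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, t = {t_stat:.1f}")
```

The trend is judged significant when |t| exceeds the t critical value with n − 2 degrees of freedom.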