Colonial Broadcasting Co (CBC), a major American television network, must determine which factors play a key role in maximizing the ratings of its movies. The following report presents a statistical analysis of the relationships between the factors that influence ratings.
The Regression Model
For a detailed description of the variables and the statistical terms used in this report, see [ Annex 1 ]. Based on the sample data provided and the statistical analysis, the following regression equation has been derived:
Ratings = 13.729 - 1.540*BBS + 1.281*Winter + 1.164*Sunday + 1.593*Monday + 1.854*Fact + 0.910*SQRT(Stars) + 8.413*Log(Previous Rating) - 10.206*Log(Competition)
This equation accounts for 44.3% of the observed variation in ratings, with a standard error of 1.897 (see [ Annex 3 ] for full details). The assumptions for this model can also be found in the same annex.
Methodology
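Before the individual steps, the fitted equation reported above can be made concrete with a short sketch. The function below simply evaluates that equation for one movie. The function name, the argument names and the choice of base-10 logarithms are assumptions made for illustration; the report does not state which logarithm base was used, so the base is left as a parameter.

```python
import math

# Coefficients copied from the regression equation reported above.
COEFFICIENTS = {
    "intercept": 13.729,
    "bbs": -1.540,
    "winter": 1.281,
    "sunday": 1.164,
    "monday": 1.593,
    "fact": 1.854,
    "sqrt_stars": 0.910,
    "log_previous_rating": 8.413,
    "log_competition": -10.206,
}


def predict_rating(bbs, winter, sunday, monday, fact, stars,
                   previous_rating, competition, log_base=10.0):
    """Evaluate the fitted equation for a single movie.

    ASSUMPTION: "Log" is taken to mean base 10; pass log_base=math.e
    if natural logarithms were used instead.
    bbs, winter, sunday, monday and fact are 0/1 dummy variables.
    previous_rating and competition must be strictly positive.
    """
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["bbs"] * bbs
        + c["winter"] * winter
        + c["sunday"] * sunday
        + c["monday"] * monday
        + c["fact"] * fact
        + c["sqrt_stars"] * math.sqrt(stars)
        + c["log_previous_rating"] * math.log(previous_rating, log_base)
        + c["log_competition"] * math.log(competition, log_base)
    )


# Hypothetical inputs, chosen only to illustrate the shape of the equation.
gain_10_to_20 = (predict_rating(0, 0, 0, 0, 0, 1, 20, 15)
                 - predict_rating(0, 0, 0, 0, 0, 1, 10, 15))
gain_20_to_30 = (predict_rating(0, 0, 0, 0, 0, 1, 30, 15)
                 - predict_rating(0, 0, 0, 0, 0, 1, 20, 15))
```

Under these assumptions, the gain in predicted rating from raising the previous rating from 10 to 20 is larger than the gain from 20 to 30, which is what the logarithmic term in the equation implies.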
Set up the model, choose the data
The sample of 88 observations exceeds the common rule-of-thumb minimum of 30 and was therefore considered large enough to be representative of the entire population. Ratings was chosen as the most appropriate dependent variable, since the success of a network depends on how many people watch a particular program or movie. An initial multiple regression was then run with all the remaining non-transformed variables against ratings. This resulted in an adjusted R² value of 36%, meaning the regression equation accounts for 36% of the observed variation in ratings. The standard error was 2.04, and the t-stats showed that every explanatory variable was statistically significant except ABN, Month and Day (see [ Annex 7 ]). Intuitively, some variables could have a non-linear relationship with ratings, so further tests were performed to identify the form of these relationships. Several non-linear relationships were found, and the reasoning behind them is explained later in this report.
Look at the scatter plots
Scatter plots for both the initial and final models (both Residual and Line Fit) were produced for each of the explanatory variables in relation to ratings (see [ Annex 4 ]). As no relationships were apparent from the plots, the variables were tested further to see whether some had a non-linear relationship with ratings. In the context of the multiple regressions, the logarithms of both the previous rating and the competition gave better results and were therefore included in the equation. This also makes intuitive sense: a higher previous rating has a positive impact, but the number of viewers who stay for the next show grows at a decelerating rate. Simply put, a previous rating of 30 will bring in a larger audience than a previous rating of 20, but with diminishing marginal returns. The reverse applies to competition: increased competition has a negative effect, but its power to draw the audience away diminishes as competition rises. For stars, the square root was used because adding more stars to a film that already features many popular actors will not have a large effect on the movie’s popularity. In practical terms, going from 9 stars to 10 makes far less difference than giving a no-name movie its first star.
Outlier
Since the data set consists of more than 30 observations, the few standard residuals (from the residual output) that fell slightly outside the acceptable region were retained in the regression. The descriptive statistics (see Annex 6) did not reveal any outliers either. Hence, all 88 observations were included in the final model.
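The residual check described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the report: the function names are hypothetical, the cut-off of 2 is an assumed stand-in for the "acceptable region" (the report does not state its bounds), and the residual values are invented rather than taken from the CBC data. Each residual is divided by the sample standard deviation of all residuals, and observations beyond the cut-off are flagged for review.

```python
import statistics


def standardized_residuals(residuals):
    """Divide each residual by the sample standard deviation of all residuals."""
    spread = statistics.stdev(residuals)
    return [r / spread for r in residuals]


def flag_outliers(residuals, threshold=2.0):
    """Return (index, standardized residual) pairs whose magnitude exceeds threshold.

    ASSUMPTION: threshold=2.0 stands in for the report's unstated
    "acceptable region".
    """
    return [
        (i, z)
        for i, z in enumerate(standardized_residuals(residuals))
        if abs(z) > threshold
    ]


# Hypothetical residuals, invented for illustration only (not CBC data).
example_residuals = [0.5, -1.2, 0.8, -0.3, 1.1, -0.9, 4.0, -1.8, -1.5, -0.7]
flagged = flag_outliers(example_residuals)
```

A flagged observation is a candidate for closer inspection rather than automatic removal, which matches the decision above to keep all 88 observations.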
Check the correlations
A correlation table was produced and no multicollinearity was observed (see [ Annex 2 ]).
Carry out the regression
In the initial regression model (see Annex 7), the variables BBS, ABN, Autumn and SQRT(Stars) did not pass the t-test, p-value and confidence-interval tests. BBS and SQRT(Stars) were near the...