Numerical measures allow the motion picture industry to be analyzed more precisely. Descriptive statistics help analysts measure data in terms of location, variability, and the association between two variables, and support exploratory analysis of distribution shape, relative location, and the identification of outliers. The data presented offer a look at four data sets (opening gross income, total gross income, number of theaters, and weeks in the top 60) for a sample of 100 movies. These data sets reveal numerous findings about the motion picture industry that are useful to an analyst.
Measures of location describe where the data are centered and how the values are positioned across the sample. Locational measures include the mean, the median, the mode, percentiles, and quartiles. The mean is the average of the data set. The median is the middle value when the data are arranged in ascending order. The mode is the value that appears most frequently in the sample; there can be more than one mode. Percentiles and quartiles describe how the values are positioned throughout the sample. A percentile marks the value at or below which a given percentage of the observations fall, while quartiles split the data into four parts, each containing approximately 25% of the observations (Anderson, Sweeney, & Williams, 2012).
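The location measures described above can be sketched in a few lines of Python. This is only an illustration: the `opening_gross` values are made up (they are not taken from the 100-movie sample), and the "inclusive" quantile method is one of several conventions, so the quartiles may differ slightly from those produced by the textbook's percentile procedure.

```python
import statistics

# Hypothetical opening gross figures in $ millions.
# These values are invented for illustration, not drawn from the movie sample.
opening_gross = [2.0, 4.0, 4.0, 6.0, 9.0, 12.0, 15.0, 20.0]

# Mean: the arithmetic average of the values.
mean = statistics.mean(opening_gross)

# Median: the middle value of the sorted data (average of the two
# middle values when the count is even).
median = statistics.median(opening_gross)

# Mode(s): multimode returns every value tied for most frequent,
# so it handles data with more than one mode.
modes = statistics.multimode(opening_gross)

# Quartiles: n=4 returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(opening_gross, n=4, method="inclusive")

print("mean:", mean)
print("median:", median)
print("modes:", modes)
print("quartiles:", q1, q2, q3)
```

With data like these, the mean sits above the median because the few large values pull the average upward, while the second quartile coincides with the median.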
Measures of variability describe the dispersion of the values in a sample. Variability is measured through the range, interquartile range, variance, standard deviation, and coefficient of variation. The range is calculated by subtracting the smallest value from the largest value; it is not commonly used on its own because it is influenced significantly by outliers. The interquartile range is the difference between the third quartile and the first quartile. This allows analysts to see the spread of the middle 50% of the data; therefore, it is less likely to be influenced by the outliers that significantly affect the range. Next, variability is measured through variance. Variance uses every value in the sample and is based on the squared differences between each value and the mean. Standard deviation is the positive square root of the variance and illustrates how much the values typically vary from the mean, in the same units as the data. This measure is useful because it shows the analyst how spread out the data are: a low standard deviation indicates little dispersion within the sample. Finally, the coefficient of variation expresses the standard deviation as a percentage of the mean (Anderson, Sweeney, & Williams, 2012).
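As a rough sketch of how the variability measures above relate to one another, the following Python example computes each of them. The `total_gross` values are hypothetical (not the actual movie data), and the sample formulas (dividing by n - 1) are assumed, since the data are described as a sample.

```python
import statistics

# Hypothetical total gross figures in $ millions.
# These values are invented for illustration, not drawn from the movie sample.
total_gross = [2.0, 4.0, 4.0, 6.0, 9.0, 12.0, 15.0, 20.0]

# Range: largest value minus smallest value (sensitive to outliers).
data_range = max(total_gross) - min(total_gross)

# Interquartile range: third quartile minus first quartile,
# i.e. the spread of the middle 50% of the data.
q1, _, q3 = statistics.quantiles(total_gross, n=4, method="inclusive")
iqr = q3 - q1

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = statistics.variance(total_gross)

# Sample standard deviation: positive square root of the sample variance.
std_dev = statistics.stdev(total_gross)

# Coefficient of variation: standard deviation as a percentage of the mean.
mean = statistics.mean(total_gross)
coeff_of_variation = std_dev / mean * 100

print("range:", data_range)
print("IQR:", iqr)
print("variance:", round(variance, 2))
print("standard deviation:", round(std_dev, 2))
print("coefficient of variation (%):", round(coeff_of_variation, 1))
```

Because the coefficient of variation is unit-free, it can be used to compare dispersion across variables measured on different scales, such as gross income and number of theaters.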
In statistical analysis, it is also important to analyze a data set's distribution shape and relative location, and to detect outliers. Distribution shape can be visualized with a histogram, and skewness is a numerical measure of that shape. Skewness can be positive, negative, or zero. It indicates how the values are distributed around the mean. For example, a sample that is highly skewed to the right has positive skewness, and its mean is typically greater than its median. Thus, we can conclude that a number of large data points are pulling the mean upward. Another important measure is the relative location of values within the data set. The z-score is the number of standard deviations that a value lies from the mean. Finally, it is important for any analyst to identify the outliers that exist within the data set. This is important because it allows one to identify possible errors within that set and analyze each outlier as needed. Chebyshev’s Theorem and the Empirical Rule provide guidelines on dispersion: Chebyshev’s Theorem applies to any distribution, the Empirical Rule applies to bell-shaped distributions, and each states what percentage of the data values should lie within a given number of standard deviations of the mean, or...