Individuals are the objects described by a set of data. Individuals may be people, but they may also be animals or things. A variable is any characteristic of an individual. A variable can take different values for different individuals. A categorical variable places an individual into one of several groups or categories. A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense. The distribution of a variable tells us what values the variable takes and how often it takes these values.

A time plot of a variable plots each observation against the time at which it was measured. Always mark the time scale on the horizontal axis and the variable of interest on the vertical axis. If there are not too many points, connecting the points by lines helps show the pattern of changes over time.

Standard Deviation s – The variance s^2 of a set of observations is the average of the squares of the deviations of the observations from their mean. In symbols, the variance of n observations x1, x2, …, xn is s^2 = [(x1 - x bar)^2 + (x2 - x bar)^2 + … + (xn - x bar)^2] / (n - 1). The standard deviation s is the square root of the variance s^2.

CLASS 1:
Two basic types of data: 1) Quantitative – response is a #; 2) Qualitative (categorical) – response to the original question being asked is not a number, usually a word; summarized by asking "what % fell into each category?"

Different scales of measurement:
1. Ratio – has a point of origin; e.g. on the Kelvin scale the 0 means a complete absence of the thing being measured. A ratio scale has a logical zero value. In measuring distance around the track, the starting line is the 0 point and halfway around the mile-long outer track would be 2,640 feet. A horse that has run 100 yards has run twice as far as a horse that has run 50 yards. One can say that the outer track is three times as long as the inner.
2. Ordinal – you can say this one is higher than that one; can also be used to put people into groups.
3. Interval – interval scales measure distance but do not have a logical zero point that makes absolute magnitudes measurable; scale is consistent throughout; usually required to make numerical comparisons. One can say that the orange hat jockey is ahead of the green hat jockey by 1 length, and the green hat ahead of the white hat by 4 lengths. But since we don't know the exact magnitude of a "length," we can't say exactly how far ahead each horse is.
4. Nominal – just puts things into groups.

Three important measures of central tendency: 1) Mode (most frequent value); 2) Mean (arithmetic: sum / total #); 3) Median – middle observation (of the sorted data).
Sample mean vs. population mean: n vs. N; x bar vs. mu.
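
A quick sketch of the three measures of central tendency, using Python's standard `statistics` module. The data values are made up for illustration only; they are not from the class.

```python
import statistics

# Hypothetical sample of n = 7 observations (illustrative values only)
data = [2, 3, 3, 5, 7, 8, 14]

mode = statistics.mode(data)      # most frequent value
mean = statistics.mean(data)      # sum of the observations / number of observations
median = statistics.median(data)  # middle observation of the sorted data

print("mode:", mode)
print("mean:", mean)
print("median:", median)
```

Here the one large value (14) pulls the mean above the median, while the median stays at the middle observation.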
Variance – how spread out are these numbers, i.e. how far is each obs. from the mean; calculated as the mean squared dist. from the mean: population – divide by N; sample – divide by n – 1 (one degree of freedom is lost: once the mean and n – 1 of the observations are known, the nth number is already determined). Sample variance (S^2) vs. pop. variance (lowercase sigma^2).

St. dev. – absolute dispersion; used to describe the dist.; just the SQRT of VAR.

Coeff. Var. – relative dispersion, i.e. spread relative to the mean. Sample: S / x bar; Population: sigma / mu.

Range – highest minus lowest. InterQ range – how far the 3rd Q is from the 1st Q.

/\--- positively skewed (e.g. income); --/\-- symmetrical; ---/\ negatively skewed
*If data is skewed, the mean is likely not the best description of central tendency.

Bimodal distribution: -/\--/\- ; treat the two groups separately.

Empirical Rule (normal rule), assuming data is normally dist.: P(M ± 1 SD) ≈ 68%; P(M ± 2 SD) ≈ 95%; P(M ± 3 SD) ≈ 99.7%
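
The dispersion measures above (variance, st. dev., coeff. var., range, InterQ range) can be sketched in Python as follows. The data values are hypothetical. Quartile conventions differ between textbooks and software, so the IQR computed here, with the default method of `statistics.quantiles`, may not match a hand calculation done with another convention.

```python
import statistics

# Hypothetical observations (illustrative values only)
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Squared deviation of each observation from the mean
squared_devs = [(x - mean) ** 2 for x in data]

pop_var = sum(squared_devs) / n            # population: divide by N
sample_var = sum(squared_devs) / (n - 1)   # sample: divide by n - 1

pop_sd = pop_var ** 0.5        # SD is just the square root of the variance
sample_sd = sample_var ** 0.5

cv_pop = pop_sd / mean         # relative dispersion: sigma / mu
cv_sample = sample_sd / mean   # relative dispersion: S / x bar

data_range = max(data) - min(data)         # highest minus lowest

# Quartiles via the statistics module's default method (an assumption;
# other quartile conventions give slightly different cut points)
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("population variance:", pop_var, "sample variance:", sample_var)
print("population SD:", pop_sd, "sample SD:", sample_sd)
print("CV (population):", cv_pop, "CV (sample):", cv_sample)
print("range:", data_range, "IQR:", iqr)
```

The sample variance comes out larger than the population variance for the same numbers, because the same sum of squared deviations is divided by n – 1 instead of N.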
Z score = (x – Mean)/SD, i.e. the # of SDs that x lies below or above the mean. Standardized data (i.e. z scores) will have a mean of 0 and an SD of 1.
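
A small sketch of standardizing data, with hypothetical scores. It uses the population SD (`statistics.pstdev`) as an assumption; the notes do not say which SD to use, and the z scores have mean 0 and SD 1 as long as the same kind of SD is used throughout.

```python
import statistics

# Hypothetical test scores (illustrative values only)
scores = [55, 60, 65, 70, 75, 80, 85]

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population SD (assumption, see note above)

# z = (x - mean) / SD: number of SDs each score lies above (+) or below (-) the mean
z_scores = [(x - mean) / sd for x in scores]

print("z scores:", z_scores)
print("mean of z scores:", statistics.mean(z_scores))
print("SD of z scores:", statistics.pstdev(z_scores))
```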
Density curve: a curve that 1) is always on or above the horizontal axis; and 2) has area exactly 1 underneath it.

The 68-95-99.7 Rule: In the normal distribution with mean µ and standard deviation σ, approximately: 1) 68% of the observations fall within 1 SD of the mean; 2) 95% of the observations fall within 2 SD of the mean; 3) 99.7% of the observations fall within 3 SD of the mean.

Standard Normal Distribution:...
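
As a rough numerical check of the 68-95-99.7 rule, the sketch below computes the area under the standard normal curve (mean 0, SD 1) within k SDs of the mean. The helper names `normal_cdf` and `prob_within` are made up for this example; the calculation relies on the standard identity relating the normal CDF to the error function `math.erf`.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal variable Z (mean 0, SD 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within(k):
    """Area under the standard normal curve between -k and +k SDs."""
    return normal_cdf(k) - normal_cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {prob_within(k):.4f}")
```

The three areas come out close to 0.68, 0.95, and 0.997, which is where the rule's rounded percentages come from.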