Table 1.001. The data set of phones.sav for year 2010
Variable name     Description
cell_subs         the number of cellular phone subscriptions
fixed_line_subs   the number of fixed-line telephone subscriptions
GDP_percap        the GDP per capita in US$
pop_total         the total population (1000s)
pop_15_24         the population aged 15 to 24 (1000s)
pop_15_59         the population aged 15 to 59 (1000s)
pop_60plus        the population aged 60 and over (1000s)
area              the total area of the country (in km²)
urbfrac           the urban population as a fraction of the total
comped            the number of years of compulsory education
tert_enrol        the number of students in tertiary education
tert_femfrac      the fraction of female students in tertiary education
From Table 1.001, we can see that pop_total, pop_15_24, pop_15_59, pop_60plus and urbfrac are population variables. The education variables comped, tert_enrol and tert_femfrac give the number of years of compulsory education, the number of students in tertiary education and the fraction of female students in tertiary education, respectively. The variable area is simply the total area of the country, GDP_percap is the GDP per capita in US$ in 2010, and fixed_line_subs is the number of fixed-line telephone subscriptions. The variable cell_subs will be the dependent variable predicted by the regression model. Next, we opened Variable View in SPSS, found the Missing column and entered 0 as a discrete missing value. We then returned to Data View, where we found that the data file contains 196 cases, sorted in ascending alphabetical order of the country variable.
Afterward, we used Transform -> Recode into Same Variables -> Old and New Values to recode the value 0 as a missing value. Lastly, we created a new variable called ID that stores each case's current sequence number, because the order of the cases may change during the following steps. To recover the original order, we can right-click the ID variable and select Sort Ascending.
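The recoding and ID steps above can be sketched outside SPSS as follows. This is a minimal Python illustration under stated assumptions, not the SPSS procedure itself: the rows, country labels and values are hypothetical, None stands in for an SPSS missing value, and the helper names recode_zero_to_missing and add_id are made up for this example.

```python
import random

# Hypothetical toy cases standing in for phones.sav; the column names come
# from Table 1.001, but the country labels and values are invented.
rows = [
    {"country": "Country A", "cell_subs": 3100000, "comped": 9},
    {"country": "Country B", "cell_subs": 0, "comped": 10},
    {"country": "Country C", "cell_subs": 2700000, "comped": 0},
]


def recode_zero_to_missing(rows, columns):
    """Like Recode into Same Variables with old value 0 -> missing (None here)."""
    for row in rows:
        for col in columns:
            if row[col] == 0:
                row[col] = None
    return rows


def add_id(rows):
    """Store each case's current position, like the ID variable created in SPSS."""
    for position, row in enumerate(rows, start=1):
        row["ID"] = position
    return rows


rows = add_id(recode_zero_to_missing(rows, ["cell_subs", "comped"]))
random.shuffle(rows)                      # later steps may reorder the cases
rows.sort(key=lambda row: row["ID"])      # "Sort Ascending" on ID restores the order
print([row["country"] for row in rows])   # ['Country A', 'Country B', 'Country C']
```

The ID has to be assigned before any reordering; sorting on it afterwards recovers the original case order no matter how the cases were shuffled in between.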
(a) First, we checked the data for extreme points using box-plots. Before the plots, SPSS displays a summary of the cases: 34 of the 196 cases (about 17%) have missing data, and the remaining 162 cases (about 83%) are valid. The counts are identical for all 12 variables, which indicates that cases were excluded listwise: a case with a missing value on any variable is dropped for every variable. These counts are shown in Table 1.002.
Table 1.002. The summary of the data set
Variable          Valid N   Valid prop.   Missing N   Missing prop.   Total N   Total prop.
GDP_percap        162       .8            34          .2              196       1.0
cell_subs         162       .8            34          .2              196       1.0
fixed_line_subs   162       .8            34          .2              196       1.0
pop_total         162       .8            34          .2              196       1.0
pop_15_24         162       .8            34          .2              196       1.0
pop_15_59         162       .8            34          .2              196       1.0
pop_60plus        162       .8            34          .2              196       1.0
area              162       .8            34          .2              196       1.0
urbfrac           162       .8            34          .2              196       1.0
comped            162       .8            34          .2              196       1.0
tert_enrol        162       .8            34          .2              196       1.0
tert_femfrac      162       .8            34          .2              196       1.0
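The counts in Table 1.002 are what listwise exclusion produces, and the arithmetic can be sketched as below. The function case_processing_summary is hypothetical, None marks a missing value, and the 196 synthetic rows only reproduce the counts in the table (162 complete cases, 34 with at least one missing value); they are not the real data.

```python
def case_processing_summary(rows, variables):
    """Count cases listwise: a case is valid only if none of `variables` is None."""
    total = len(rows)
    valid = sum(all(row[v] is not None for v in variables) for row in rows)
    missing = total - valid
    return {
        "Valid": (valid, round(100 * valid / total, 1)),
        "Missing": (missing, round(100 * missing / total, 1)),
        "Total": (total, 100.0),
    }


# Synthetic rows that only reproduce the counts of Table 1.002.
variables = ["cell_subs", "GDP_percap"]
rows = [{"cell_subs": 1.0, "GDP_percap": 1.0} for _ in range(162)]
rows += [{"cell_subs": 1.0, "GDP_percap": None} for _ in range(34)]
print(case_processing_summary(rows, variables))
# {'Valid': (162, 82.7), 'Missing': (34, 17.3), 'Total': (196, 100.0)}
```

Note that the 34 synthetic cases are missing only GDP_percap, yet they are also counted as missing for cell_subs; that is why every row of Table 1.002 shows the same numbers. The exact percentages, 82.7% and 17.3%, round to the proportions .8 and .2 in the table.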
Figure 1.001 shows the box-plots of all 12 variables together, followed by the box-plot of each variable on its own.
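SPSS box-plots mark a value as an outlier (circle) when it lies between 1.5 and 3 box lengths beyond the edge of the box, and as an extreme value (asterisk) when it lies more than 3 box lengths beyond it. The sketch below applies that rule to a hypothetical list of numbers. It assumes the box edges are Tukey's hinges; SPSS's own quartile computation may differ slightly for some sample sizes, so the flagged points are not guaranteed to match the SPSS plots exactly. The function names tukey_hinges and flag_points are made up for this example.

```python
from statistics import median


def tukey_hinges(values):
    """Return (lower hinge, upper hinge): the medians of the lower and upper halves.

    When n is odd, the middle value belongs to both halves.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[:half]), median(xs[n - half:])


def flag_points(values):
    """Split values into box-plot outliers (1.5 to 3 box lengths) and extremes (> 3)."""
    lower, upper = tukey_hinges(values)
    box_length = upper - lower
    outliers, extremes = [], []
    for x in values:
        # Distance beyond the nearer edge of the box (0 if inside the box).
        distance = max(lower - x, x - upper, 0)
        if distance > 3 * box_length:
            extremes.append(x)
        elif distance > 1.5 * box_length:
            outliers.append(x)
    return outliers, extremes


# Hypothetical values, not taken from phones.sav.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 60]
print(flag_points(sample))  # ([25], [60])
```

Here the hinges are 3.5 and 9.5, so the box length is 6: the value 25 lies 15.5 above the box (between 9 and 18) and is an outlier, while 60 lies 50.5 above it and is extreme.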
Figure 1.001. The box-plot of 12 variables
Figure 1.002. The box-plot of cell_subs
Figure 1.003. The box-plot of fixed_line_subs
Figure 1.004. The box-plot of GDP_percap
Figure 1.005. The box-plot of pop_total
Figure 1.006. The box-plot of pop_15_24
Figure 1.007. The box-plot of pop_15_59
Figure 1.008. The box-plot of pop_60plus
Figure 1.009. The box-plot of area
Figure 1.010. The box-plot of urbfrac
Figure 1.011. The box-plot of comped
Figure 1.012. The box-plot...