Descriptive analysis of statistical data

INTRODUCTION

Crime has always existed, from treachery to assassination. It happens in every country you can think of, and every government has to deal with it. Understanding the nature of crime, why it is committed and where it might happen next, is a difficult and pressing problem. Out of this concern, we looked for studies that gathered data from communities, and we focused on one specific crime: murder. In many communities, the murder rate is thought to be related to several factors; for instance, it is common to hear that it depends on poverty and unemployment. Starting from this hypothesis, the dataset we used for this analysis relates the number of murders per year per 1,000,000 inhabitants to the number of inhabitants, the percentage of families with incomes below $5,000, and the percentage of people unemployed.

OBJECTIVE OF THE STUDY

Estimating how many murders will happen in a given place in a year is difficult, but not impossible. This is why we use the dataset described above: with these variables we should be able to find a formula. After this project, if we want to know how many murders to expect in a city such as Monterrey, we would simply plug in that city’s data (number of inhabitants, percentage of families with income below $5,000, and percentage unemployed) and obtain a number: the predicted murder rate (murders per 1,000,000 inhabitants) for that year.
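A linear formula of this kind has the general form below. This is only a sketch of the model's structure in our own notation: ŷ is the predicted number of murders per 1,000,000 inhabitants, x1 is the number of inhabitants, x2 the percentage of families with income below $5,000, x3 the percentage unemployed, and b0 to b3 are coefficients that must be estimated from the data by the regression procedure.

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3
```

Predicting for a particular city then amounts to substituting that city's values of x1, x2 and x3 once the coefficients are known.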

DESCRIPTION OF THE PROCEDURES USED

For this study, we used Minitab, a statistical software package that performs both basic and advanced analyses, together with Microsoft Excel. First, in Excel we made a scatterplot (a graph that plots each X value against its paired Y value) of each variable paired with murders, and we added a linear trend line to each graph to see whether the points followed a linear pattern, which would suggest a linear relationship. In other words, we wanted to see whether the murder rate rose (or fell) steadily as each independent variable rose. The three graphs, each with a line marking the trend the data follow, are described below:

Scatterplot relating inhabitants (Y) to murders (X). (No linear trend is visible, which most likely means that the number of inhabitants is not linearly related to the number of murders.)

Scatterplot relating % of families with income below $5,000 (Y) to murders (X).

Scatterplot relating the % of people unemployed (Y) to murders (X).
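Excel's linear trend line is a least-squares line. As a hedged illustration of what that line and the strength of a linear trend amount to, the sketch below computes the slope, intercept and Pearson correlation r for one predictor against the murder rate. The numbers are made up for illustration (they are not the study's data), the function name trend_line is hypothetical, and murders are treated as the response (y), which is how the regression later in the study uses them.

```python
from statistics import mean


def trend_line(x, y):
    """Least-squares line y = intercept + slope * x, plus Pearson's r."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5  # strength and direction of the linear trend
    return slope, intercept, r


# Hypothetical values, NOT the study's data:
# % unemployed (x) and murders per 1,000,000 inhabitants (y).
unemployed = [5.0, 6.0, 7.0, 8.0, 9.0]
murders = [10.0, 14.0, 15.0, 20.0, 21.0]

slope, intercept, r = trend_line(unemployed, murders)
print(f"murders = {intercept:.2f} + {slope:.2f} * unemployed (r = {r:.3f})")
```

An r close to 0, as the inhabitants graph appears to suggest, would indicate no linear trend, while values close to +1 or -1 indicate a strong one.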

Using Minitab, we wanted to test whether the factors (% unemployed, % of families with income below $5,000, and number of inhabitants), taken together, really influence the murder rate; that is, whether the three variables help us estimate the number of murders in a year. The first thing we did was run a regression, a procedure that finds the best-fitting linear equation (in the least-squares sense) between the murder rate and the variables observed. This equation is the one that can later be used to predict the murder rate for a given percentage of unemployment, number of inhabitants, and percentage of families with income below $5,000.

Minitab gave us the equation, but also much other information, and some of it suggested that some of the factors (variables) were not helpful. For the variable inhabitants, the P-value (the probability of getting a coefficient at least as far from zero as the one observed if the variable’s true coefficient were zero, that is, if it were not useful) was large, meaning we had little evidence against the hypothesis “inhabitants is not useful.” We therefore tested whether inhabitants is a significant variable in estimating the murder rate. We fitted a second regression without inhabitants and compared the two models (with and without inhabitants) using a partial F test, and we found that the number of inhabitants was not an important determinant of the murder rate; in other words, the murder rate was not related to the...
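To make the regression and the partial F test more concrete, here is a minimal pure-Python sketch. It is not Minitab's procedure or output: it fits ordinary least squares by solving the normal equations, computes the error sum of squares (SSE) of the full model and of the reduced model without inhabitants, and forms the partial F statistic F = ((SSE_reduced - SSE_full) / q) / (SSE_full / (n - p - 1)), where q is the number of dropped variables and p the number of predictors in the full model. All data values, and the helper names solve, fit_ols, partial_f and predict, are hypothetical illustrations, not the study's dataset or results.

```python
# Hypothetical helpers for illustration only; not Minitab functions.


def solve(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        beta[i] = (m[i][n] - sum(m[i][j] * beta[j] for j in range(i + 1, n))) / m[i][i]
    return beta


def fit_ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ...; returns (coefficients, SSE)."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(x[0])
    xtx = [[sum(xi[a] * xi[b] for xi in x) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    sse = sum((yi - sum(bj * xj for bj, xj in zip(beta, xi))) ** 2
              for xi, yi in zip(x, y))
    return beta, sse


def partial_f(sse_reduced, sse_full, q, n, p_full):
    """Partial F statistic for dropping q of the p_full predictors."""
    return ((sse_reduced - sse_full) / q) / (sse_full / (n - p_full - 1))


def predict(beta, row):
    """Plug one city's values into the fitted equation."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Hypothetical values, NOT the study's data.
inhabitants = [500, 650, 800, 900, 1200, 1500, 700, 1000, 600, 1800]  # thousands
income_below_5000 = [15, 18, 22, 17, 20, 16, 24, 19, 21, 14]          # % of families
unemployed = [6.0, 6.5, 7.5, 5.5, 7.0, 6.2, 8.0, 6.8, 7.2, 5.8]       # %
murders = [11.0, 15.5, 24.0, 10.5, 18.0, 12.0, 27.0, 17.0, 20.0, 9.0]  # per 1,000,000

beta_full, sse_full = fit_ols(list(zip(inhabitants, income_below_5000, unemployed)), murders)
beta_red, sse_red = fit_ols(list(zip(income_below_5000, unemployed)), murders)
f_stat = partial_f(sse_red, sse_full, q=1, n=len(murders), p_full=3)

print("full model coefficients:", [round(b, 4) for b in beta_full])
print(f"partial F for inhabitants = {f_stat:.3f} with (1, {len(murders) - 4}) df")
print("prediction for a hypothetical city:", round(predict(beta_red, (18, 6.5)), 2))
```

The F value would then be compared with an F table with (1, n - 4) degrees of freedom; a small F (equivalently, a large P-value) is consistent with dropping inhabitants and predicting from the reduced model. Statistical packages such as Minitab report the corresponding P-value directly.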