# 2.9 Assignment 2

## Purposes

This assignment has two parts. Part A assesses your ability to choose proper measures of the centre and spread (variation) of a distribution; calculate the mean, median, mode, standard deviation, range, and interquartile range where applicable; explain the meaning of a [latex]z[/latex]-score; calculate the quartiles and five-number summary; and draw and interpret a box plot. Part B assesses your skills in using R Commander to create a box plot and a side-by-side box plot, and to obtain descriptive measures of a given data set.

## Resources

## Instructions

**Part A**

Complete the following:

- In one Winter Olympics, Michelle Kwan competed in the women’s singles short program. The nine judges gave her the following scores for the technical component, on a scale from 1 (poor) to 6 (perfect):

  5.8 5.7 5.9 5.7 5.5 5.7 5.7 5.7 5.6 with [latex]\sum x_{i} = 51.3[/latex]

  - (a) Find the mean, median, and mode of the data. (6 marks)
  - (b) Calculate the standard deviation of the data using the defining formula

    [latex]s = \sqrt{\frac{\sum_{i = 1}^{n}(x_{i} - \bar{x})^{2}}{n - 1}}.[/latex]

    Interpret the number obtained. (6 marks: 4+2)
  - (c) Calculate the standard deviation of the data using the computing formula

    [latex]s = \sqrt{\frac{\sum x_{i}^{2} - \frac{(\sum x_{i})^{2}}{n}}{n - 1}}.[/latex]

    Compare the result with the one obtained in part (b). Which formula needs less calculation, part (b) or part (c)? (6 marks)

- A random sample of 21 patients yielded the following data on length of stay (in days) in a hospital:

  4 4 12 18 9 6 12 3 6 15 7 3 55 1 10 12 5 7 1 12 9

  - (a) Obtain and interpret the five-number summary. (10 marks)
  - (b) Obtain and interpret the interquartile range. (4 marks)
  - (c) Are there any potential outliers? Justify your answer by calculation. (3 marks)
  - (d) Draw a box plot based on the data. What can you tell about the distribution of the data? (5 marks)
  - (e) Choose the proper measures for the centre and spread (variation) of the data. Justify your choice. (3 marks)

- Complete the following sentences.
  - (a) A standardized variable [latex]Z = \frac{X - \mu}{\sigma}[/latex] always has mean ____ and standard deviation ____. (2 marks)
  - (b) A positive [latex]z[/latex]-score indicates that the observation is ____ the mean, whereas a negative [latex]z[/latex]-score indicates that the observation is ____ the mean. (2 marks)

- Suppose that you score 350 out of 400 on an exam for which the mean score is 280 and the standard deviation is 20. Did you do well on the exam? Explain why. (3 marks)
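As a quick self-check on the formulas used in Part A, the following minimal Python sketch evaluates the defining and computing formulas for the sample standard deviation, together with a [latex]z[/latex]-score. The function names (`sd_defining`, `sd_computing`, `z_score`) and all numbers are hypothetical illustrations, not the assignment data, and the course itself uses R Commander rather than Python.

```python
import math


def sd_defining(xs):
    """Sample standard deviation via squared deviations from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def sd_computing(xs):
    """Sample standard deviation via the sum of x and the sum of x squared."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


def z_score(x, mean, sd):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / sd


# Hypothetical data, NOT taken from the assignment.
sample = [2, 4, 4, 5, 7, 8]

s_def = sd_defining(sample)
s_comp = sd_computing(sample)
print(round(s_def, 4), round(s_comp, 4))  # the two formulas agree

# Hypothetical observation 70 from a distribution with mean 60 and SD 8.
print(z_score(70, 60, 8))
```

Both functions return the same value for the same data. The computing formula needs only the running totals of [latex]x_{i}[/latex] and [latex]x_{i}^{2}[/latex], whereas the defining formula needs the mean before any deviation can be calculated.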

**Part B**

**Complete the following questions using R and R Commander**:

Read the data set “**M01_SaleHome.xlsx**” into R and use R Commander to complete the following tasks. **For each task, copy or take a screenshot of the R Commander output (referred to later as computer output) and paste it into the space below the question**. To save space, copy and paste only what the question asks for; you may sometimes need to shrink the image.

- Choose the most proper measure for the centre of each of the 10 variables and provide the values of the selected measures. (20 marks)
- Use proper numerical summaries to compare the prices of homes with a tile roof and a non-tile roof. Briefly explain your findings based on the numerical summaries. (5 marks)
- Draw a side-by-side box plot to compare the prices of homes with a tile roof and a non-tile roof. Briefly explain your findings based on the graph. (5 marks)
- Use proper descriptive statistics and graphs to compare the prices of homes with a swimming pool and without a swimming pool. Briefly summarize your findings. (10 marks)
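
The numbers behind a side-by-side box plot are the five-number summary and the [latex]1.5 \times \text{IQR}[/latex] fences of each group. The Python sketch below computes them for two groups. It assumes that quartiles are taken as the medians of the lower and upper halves of the sorted data, excluding the overall median when the sample size is odd. Software, including R, may use a different quartile convention by default, so its output can differ slightly. The group labels, prices, and function names are hypothetical and are not taken from the assignment data set.

```python
from statistics import median


def five_number_summary(xs):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data; the overall median is left out of both
    halves when n is odd. Requires at least two observations.
    """
    data = sorted(xs)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


def outlier_fences(q1, q3):
    """Lower and upper limits: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr)


# Hypothetical prices (in thousands of dollars), NOT the assignment data.
groups = {
    "group A": [210, 225, 240, 250, 265, 280, 410],
    "group B": [150, 180, 195, 205, 220, 230],
}

for name, values in groups.items():
    minimum, q1, med, q3, maximum = five_number_summary(values)
    low, high = outlier_fences(q1, q3)
    # Observations outside the fences are potential outliers.
    flagged = [v for v in values if v < low or v > high]
    print(name, (minimum, q1, med, q3, maximum),
          "IQR =", q3 - q1, "potential outliers:", flagged)
```

Comparing the medians and interquartile ranges of the two groups gives the same information that the boxes of a side-by-side box plot display, and any flagged values correspond to points drawn beyond the whiskers.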