{"id":97,"date":"2020-06-22T17:38:33","date_gmt":"2020-06-22T21:38:33","guid":{"rendered":"https:\/\/openbooks.macewan.ca\/rcommander\/?post_type=chapter&#038;p=97"},"modified":"2025-05-07T17:35:01","modified_gmt":"2025-05-07T21:35:01","slug":"1-2-data-collection","status":"publish","type":"chapter","link":"https:\/\/openbooks.macewan.ca\/introstats\/chapter\/1-2-data-collection\/","title":{"raw":"1.2 Data Collection","rendered":"1.2 Data Collection"},"content":{"raw":"Most decision-making is based on facts and evidence, and data collection is an excellent means of obtaining evidence. For example,\r\n<ul type=\"disc\">\r\n \t<li>If you wanted to determine whether three cars provide sufficient capacity for the LRT train running between the Health Sciences and MacEwan stations, you could stand at the entrance of the MacEwan Station and count the number of people entering and exiting the station during peak hours on a regular school day. The data would help to determine whether more or fewer cars are needed.<\/li>\r\n<\/ul>\r\nAnother example is in medical science.\r\n<ul type=\"disc\">\r\n \t<li>If you want to confirm whether a new drug is more effective than the old one, you would need to obtain a sample of patients with a similar condition and randomly assign them to two groups: subjects in one group receive the new drug, and subjects in the other group receive the old drug. After each drug has taken effect, you would compare the outcomes of the two groups.<\/li>\r\n<\/ul>\r\nCommon ways to collect data are sampling surveys, observational studies, and designed experiments. For sampling methods, students in this course are only required to understand simple random sampling, which will be introduced in the next section. 
The main difference between an observational study and a designed experiment is that a designed experiment involves <strong>manipulations<\/strong> of the subjects, while an observational study does not.\r\n<h2>1.2.1 Sampling Methods<\/h2>\r\nIn statistics, a sampling survey describes the process of selecting a sample of individuals\/items from a target population in order to conduct a survey. A <strong>census<\/strong> is a type of survey in which the researcher samples the entire population. Typically, a census requires a reasonably small population; otherwise, the process of data collection can be expensive, time-consuming and, in some cases, impossible. In fact, the population is usually too large for the researcher to survey all of its members. For this reason, a small, carefully chosen sample is often used to represent the population.\r\n\r\nThe logic behind sampling is this: if the population is well mixed, we can learn about the entire population by examining a sample. For example, suppose you are cooking a pot of soup and would like to taste it. You do not need to eat the whole pot; a well-stirred spoonful is enough. How can we \"stir\" people or items in sampling? We adopt the idea of randomization, which means we select individuals or items randomly.\r\n<h3><strong>Simple Random Sampling<\/strong><\/h3>\r\nSampling methods are classified as either probability or nonprobability. In probability samples, each member of the population has a known, non-zero probability of being selected. In STAT 151, we only cover one particular sampling procedure, i.e., <strong>simple random sampling<\/strong>, in which each possible sample of a given size has an equal chance of being selected. As a consequence, each individual is equally likely to be selected. For example, when four cards are picked at random from a well-shuffled deck, each card has the same chance of being picked. 
A sample obtained by simple random sampling is called a <strong>simple random sample (SRS)<\/strong>.\r\n\r\nThere are two types of simple random sampling. One is simple random sampling <strong>with replacement<\/strong>, whereby individuals of the population can be selected more than once; the other is simple random sampling <strong>without replacement<\/strong>, whereby any individuals of the population can be selected at most once.\r\n\r\nThere are several ways to obtain a simple random sample:\r\n<ul>\r\n \t<li>Picking slips of paper out of a box<\/li>\r\n \t<li>Generating by computer<\/li>\r\n \t<li>Using a random numbers table<\/li>\r\n<\/ul>\r\nObtaining a simple random sample by picking slips of paper out of a box is impractical, especially when the population is large. We can either use a computer or the random-number tables (see one below) to generate a simple random sample. The arrows are used to show our example that follows.<a id=\"rettab1.1\"><\/a>\r\n\r\n[caption id=\"attachment_3021\" align=\"aligncenter\" width=\"1500\"]<img class=\"wp-image-3021 size-full\" src=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2022\/06\/TableI.png\" alt=\"A table of randomly generated numbers. The numbers are divided into sets by rows and columns. Image description available.\" width=\"1500\" height=\"997\" \/> <strong>Table 1.1<\/strong>: Random Number Table. 
[<a href=\"https:\/\/openbooks.macewan.ca\/introstats\/back-matter\/image-description\/#tab1.1\">Image Description (See Appendix D Table 1.1)<\/a>][\/caption]\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Example: Simple Random Sampling Using the Random Number Table<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nLet us use the table above to show how to obtain a simple random sample without replacement.\r\n\r\nSuppose our section consists of 60 students, from which we would like to obtain a random sample of size 10. First, we number the students from 1 to 60. To select 10 random numbers between 1 and 60, we first pick a random starting point. We can close our eyes, point a finger at a random spot in the table, and use the first two digits of the number we point to as the starting point. For example, suppose the number we pick is 82 (line number 05, columns 00-01). Since 82 is greater than 60, it is discarded. Going down the column, the second number is 65, which is greater than 60 and hence is discarded; the third is 57, which is kept; the next is 67, which is greater than 60 and hence is discarded; the following ones are 68 (discarded), 18 (keep), 91 (discarded), 97 (discarded), 71 (discarded), 74 (discarded), 75 (discarded), 78 (discarded), 16 (keep), 02 (keep), and 74 (discarded). Having reached the end of columns 00-01, we move to the bottom of columns 02-03 and go up. The numbers are 81 (discarded), 42 (keep), 90 (discarded), 06 (keep), 62 (discarded), 43 (keep), 95 (discarded), 32 (keep), 67 (discarded), 42 (already in the list, discarded), 22 (keep), and 35 (keep). 
As a result, the 10 random numbers are: 57, 18, 16, 02, 42, 06, 43, 32, 22, 35.\r\n\r\n<strong>In this course, a simple random sample means a simple random sample without replacement by default<\/strong>.\r\n\r\n<\/div>\r\n<\/div>\r\nIn practice, the most popular way to obtain a simple random sample is to use a computer.\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Example: Simple Random Sampling Using R Commander<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nSuppose our section consists of 60 students. Use R Commander to generate a simple random sample of size [latex]n=10[\/latex] without replacement. To reproduce the sample reported below, use \"4061\" as the random seed.\r\n\r\nSimilar to picking a random starting point, we first need to set a random seed using the function \"<strong>set.seed()<\/strong>\". A simple random sample can be obtained using R Commander in two steps:\r\n<ol>\r\n \t<li>Type \"<strong>set.seed(4061)<\/strong>\" in the R Script window, then press \"<strong>Submit<\/strong>\". Setting the seed ensures that we obtain the same sample whenever we rerun both commands with the same seed.<\/li>\r\n \t<li>Type \"<strong>sample(1:60, 10)<\/strong>\" in the R Script window, and then press \"<strong>Submit<\/strong>\". The function \"<strong>sample()<\/strong>\" is used to generate a simple random sample. The first input argument specifies what the sample is taken from. In this example, it is \"1:60\" which means [latex]1, 2, 3, \\cdots, 59, 60[\/latex]. 
The second input argument specifies the sample size. By default, \"<strong>sample()<\/strong>\" draws without replacement, so the command line \"<strong>sample(1:60, 10)<\/strong>\" means we would like to randomly take 10 different numbers between 1 and 60.<\/li>\r\n<\/ol>\r\nThe resulting sample is 53, 14, 57, 13, 8, 45, 11, 50, 59, 25 (see the snapshot below).<a id=\"retsnap1.1\"><\/a>\r\n\r\n[caption id=\"attachment_3024\" align=\"aligncenter\" width=\"1024\"]<img class=\"wp-image-3024 size-large\" src=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander-1024x596.png\" alt=\"An image of the R-commander workspace and output. Image Description available.\" width=\"1024\" height=\"596\" \/> <strong>Snapshot 1.1<\/strong>: Generate a simple random sample of size 10 using R Commander. [<a href=\"https:\/\/openbooks.macewan.ca\/introstats\/back-matter\/image-description\/#snap1.1\">Image Description (See Appendix D Snapshot 1.1)<\/a>][\/caption]<\/div>\r\n<\/div>\r\n<div style=\"height: 55px; margin-top: 2.1428571429em;\">\r\n\r\n<img class=\"size-full wp-image-99 alignleft\" src=\"https:\/\/openbooks.macewan.ca\/rcommander\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png\" alt=\"\" width=\"250\" height=\"50\" \/>\r\n\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">Exercise: Generate Simple Random Sample Using R Commander<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nSuppose that a class consists of 100 students. Use R Commander to generate a simple random sample of size [latex]n=5[\/latex] without replacement. Use \"6194\" as the random seed.
\r\n\r\n&nbsp;\r\n\r\n<details><summary>Show\/Hide Answer<\/summary>\r\n<ol>\r\n \t<li>Type \"<strong>set.seed(6194)<\/strong>\" in the R Script window, then press \"<strong>Submit<\/strong>\".<\/li>\r\n \t<li>Type \"<strong>sample(1:100, 5)<\/strong>\" in the R Script window, and then press \"<strong>Submit<\/strong>\".<\/li>\r\n<\/ol>\r\nThe resulting sample is 59, 9, 1, 40, 77.<a id=\"retsnap1.2\"><\/a>\r\n\r\n[caption id=\"attachment_3026\" align=\"aligncenter\" width=\"1037\"]<img class=\"wp-image-3026 size-full\" src=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander_exercise.png\" alt=\"An image of the R-commander workspace and output. Image Description available.\" width=\"1037\" height=\"596\" \/> <strong>Snapshot 1.2<\/strong>: Generate a simple random sample of size 5 using R Commander. [<a href=\"https:\/\/openbooks.macewan.ca\/introstats\/back-matter\/image-description\/#snap1.2\">Image Description (See Appendix D Snapshot 1.2)<\/a>][\/caption]<\/details><\/div>\r\n<\/div>\r\n<h3><strong><span style=\"font-size: 1.266em; font-style: italic; text-align: initial;\">Some Other Sampling Methods (not required, extra reading)<\/span><\/strong><\/h3>\r\n<div id=\"input01\">\r\n<div id=\"answer01\" class=\"hidden\">\r\n\r\nBesides simple random sampling, some other good sampling methods include:\r\n<ul>\r\n \t<li><strong>Stratified sampling<\/strong>: The population is divided into homogeneous groups called strata, and then simple random sampling is applied within each stratum. For example, 65% of students at MacEwan are female students. A stratified sample of 100 students can be obtained by drawing a simple random sample of 65 female students and a simple random sample of 35 male students.<\/li>\r\n \t<li><strong>Cluster sampling<\/strong>: Split the population into clusters, select one or several clusters at random, and then conduct a census within each selected cluster. 
Each cluster should represent the full population fairly. For example, Edmonton is geographically divided into 375 neighborhoods. A cluster sample of Edmonton residents can be obtained by taking a simple random sample of 20 neighborhoods and then including all residents of those 20 selected neighborhoods.<\/li>\r\n \t<li><strong>Systematic sampling<\/strong>: Select every kth individual from the sampling frame, e.g., choose every 5th person on an alphabetical list of students. If we start from the 4th individual and choose every 5th person, the resulting list will be [latex]4, 4+5=9, 9+5=14, 14+5=19[\/latex], and so on. Therefore, the 4th, 9th, 14th, 19th, and every 5th individual thereafter on the list will be in the sample.<\/li>\r\n<\/ul>\r\nSome convenient but less desirable sampling methods are:\r\n<ul>\r\n \t<li><strong>Voluntary response sampling<\/strong>: A large group of individuals is invited to respond and all who do respond are counted, e.g., an online survey.<\/li>\r\n \t<li><strong>Convenience sampling<\/strong>: This includes individuals who are convenient to sample. 
For example, an interviewer stops people in a mall and asks them questions.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<div style=\"height: 55px; margin-top: 2.1428571429em;\">\r\n\r\n<img class=\"size-full wp-image-99 alignleft\" src=\"https:\/\/openbooks.macewan.ca\/rcommander\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png\" alt=\"\" width=\"250\" height=\"50\" \/>\r\n\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">Exercise: Sampling Methods<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nSuppose we want to know the percentage of residents in Edmonton who have taken at least one statistics course like STAT 151. Identify each of the following sampling methods and explain its advantages and disadvantages.\r\n<ol>\r\n \t<li>Stop 100 people at the entrance of MacEwan and ask for their response.<\/li>\r\n \t<li>Get the phone numbers of all residents in Edmonton from the census data, randomly pick 100 people, call them and ask for their response.<\/li>\r\n \t<li>Send invitations to fill out an online survey.<\/li>\r\n<\/ol>\r\n<details><summary>Show\/Hide Answer<\/summary>\r\n<ol>\r\n \t<li>This is a convenience sample. The advantage is that the sample is convenient to take. The disadvantage is that the estimate is biased and might overestimate the percentage.<\/li>\r\n \t<li>This is a simple random sample. However, we will miss those whose phone numbers are not listed and those who do not answer the phone.<\/li>\r\n \t<li>This is a voluntary response sample. It is cost-effective to send invitations. However, the response rate might be low and we will miss those who have no access to a computer.<\/li>\r\n<\/ol>\r\nNote: In practice, there might be no sampling method that will correspond exactly to simple random sampling, but some will be better than others.\r\n\r\n<\/details><\/div>\r\n<\/div>\r\n<h2><strong>1.2.2 Designed Experiments<\/strong><\/h2>\r\nDesigned experiments are another method of data collection. 
In a designed experiment, investigators randomly assign subjects to different experimental groups (called treatments), observe the outcomes, and test whether treatment differences are statistically significant. A designed experiment is ideal for investigating a cause-effect relationship.\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Example: A Designed Experiment<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nResearchers in a pharmaceutical company want to test whether their new pain killer is effective in reducing pain. Forty females with similar conditions (e.g., age, diet) are randomly assigned to two groups: 20 take a vitamin and the other 20 take the pain killer. The subjects are then asked to report their pain score on a scale of 0 to 10, four hours after taking the pill. Here is how the experiment is designed:\r\n<ul>\r\n \t<li>The 40 female participants should be in similar conditions to minimize the effect of other factors, such as age and diet.<\/li>\r\n \t<li>It is also important to ensure that the vitamin and the pain killer look similar, so that participants do not know which group they are in. Here, the vitamin is called the <strong>placebo<\/strong>. It is well known that people tend to feel better after they receive some kind of treatment, even if the treatment does not have any physical effect; this is called the <strong>placebo effect<\/strong>. 
By having both a treatment group and a placebo group, the researchers are able to minimize the placebo effect and hence more accurately measure the effectiveness of the pain killer.<\/li>\r\n \t<li>Randomly assign the individuals to the two different groups by collecting a simple random sample of 20 out of the 40 individuals, assigning them to the vitamin (placebo) group, and assigning the remaining 20 individuals to the group receiving the pain killer.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<h2>1.2.3 Observational Studies<\/h2>\r\nIn some studies, it may not be possible to randomly assign the subjects to different treatment groups due to ethical or practical reasons. For example, we cannot randomly assign people into smoker and non-smoker groups. In those cases, we might have to conduct observational studies.\r\n\r\nIn an observational study, the investigator observes the characteristics of individuals in samples from a population of interest to discover trends and possible relationships between variables.\r\n<div style=\"height: 55px; margin-top: 2.1428571429em;\">\r\n\r\n<img class=\"size-full wp-image-99 alignleft\" src=\"https:\/\/openbooks.macewan.ca\/rcommander\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png\" alt=\"\" width=\"250\" height=\"50\" \/>\r\n\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">Exercise: An Observational Study<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nIn order to study the association between breast cancer and smoking, can we randomly assign the participants to the smoker or non-smoker group? 
Which of the following two studies do you think is better?\r\n<ol>\r\n \t<li>Follow 20 smokers and 20 non-smokers who have similar conditions, and compare the occurrences of breast cancer in the two groups at the end of the study.<\/li>\r\n \t<li>From one hospital, sample 20 breast cancer patients and another 20 patients without breast cancer by matching, i.e., we try to make the two groups as similar as possible except for the cancer status. Determine the smoking status for each subject and compare the percentages of smokers in both groups. This is called a case-control study. A breast cancer patient is a case, while a patient without breast cancer is a control.<\/li>\r\n<\/ol>\r\n<details><summary>Show\/Hide Answer<\/summary>&nbsp;\r\n\r\nWe cannot randomly assign participants to smoke or not to smoke, so an observational study is needed. The second study plan is better. Not everyone will develop breast cancer by the end of the first study; we might not observe any cases of cancer in either group. The first study plan could potentially end up with no useful results. In the second study, however, we are always able to compare the percentage of smokers between groups and therefore assess the association between smoking and breast cancer.\r\n\r\n<\/details><\/div>\r\n<\/div>\r\n<div style=\"height: 55px; margin-top: 2.1428571429em;\">\r\n\r\n<img class=\"size-full wp-image-99 alignleft\" src=\"https:\/\/openbooks.macewan.ca\/rcommander\/wp-content\/uploads\/sites\/8\/2020\/06\/instructornote.png\" alt=\"\" width=\"250\" height=\"50\" \/>\r\n\r\n<\/div>\r\n<ol>\r\n \t<li>For a rare disease, it is better to first recruit participants with the condition (called cases) and then recruit participants without the condition (controls) by matching other characteristics.<\/li>\r\n \t<li>It is easier to establish a causal relationship with experimental studies than with observational studies. 
Whenever possible, experimental studies are preferred.<\/li>\r\n<\/ol>","rendered":"<p>Most decision-making is based on facts and evidence, and data collection is an excellent means of obtaining evidence. For example,<\/p>\n<ul type=\"disc\">\n<li>If you wanted to determine whether three cars provide sufficient capacity for the LRT train running between the Health Sciences and MacEwan stations, you could stand at the entrance of the MacEwan Station and count the number of people entering and exiting the station during peak hours on a regular school day. The data would help to determine whether more or fewer cars are needed.<\/li>\n<\/ul>\n<p>Another example is in medical science.<\/p>\n<ul type=\"disc\">\n<li>If you want to confirm whether a new drug is more effective than the old one, you would need to obtain a sample of patients with a similar condition and randomly assign them to two groups: subjects in one group receive the new drug, and subjects in the other group receive the old drug. After each drug has taken effect, you would compare the outcomes of the two groups.<\/li>\n<\/ul>\n<p>Common ways to collect data are sampling surveys, observational studies, and designed experiments. For sampling methods, students in this course are only required to understand simple random sampling, which will be introduced in the next section. The main difference between an observational study and a designed experiment is that a designed experiment involves <strong>manipulations<\/strong> of the subjects, while an observational study does not.<\/p>\n<h2>1.2.1 Sampling Methods<\/h2>\n<p>In statistics, a sampling survey describes the process of selecting a sample of individuals\/items from a target population in order to conduct a survey. A <strong>census<\/strong> is a type of survey in which the researcher samples the entire population. 
Typically, a census requires a reasonably small population; otherwise, the process of data collection can be expensive, time-consuming and, in some cases, impossible. In fact, the population is usually too large for the researcher to survey all of its members. For this reason, a small, carefully chosen sample is often used to represent the population.<\/p>\n<p>The logic behind sampling is this: if the population is well mixed, we can learn about the entire population by examining a sample. For example, suppose you are cooking a pot of soup and would like to taste it. You do not need to eat the whole pot; a well-stirred spoonful is enough. How can we &#8220;stir&#8221; people or items in sampling? We adopt the idea of randomization, which means we select individuals or items randomly.<\/p>\n<h3><strong>Simple Random Sampling<\/strong><\/h3>\n<p>Sampling methods are classified as either probability or nonprobability. In probability samples, each member of the population has a known, non-zero probability of being selected. In STAT 151, we only cover one particular sampling procedure, i.e., <strong>simple random sampling<\/strong>, in which each possible sample of a given size has an equal chance of being selected. As a consequence, each individual is equally likely to be selected. For example, when four cards are picked at random from a well-shuffled deck, each card has the same chance of being picked. A sample obtained by simple random sampling is called a <strong>simple random sample (SRS)<\/strong>.<\/p>\n<p>There are two types of simple random sampling. 
One is simple random sampling <strong>with replacement<\/strong>, whereby individuals of the population can be selected more than once; the other is simple random sampling <strong>without replacement<\/strong>, whereby any individuals of the population can be selected at most once.<\/p>\n<p>There are several ways to obtain a simple random sample:<\/p>\n<ul>\n<li>Picking slips of paper out of a box<\/li>\n<li>Generating by computer<\/li>\n<li>Using a random numbers table<\/li>\n<\/ul>\n<p>Obtaining a simple random sample by picking slips of paper out of a box is impractical, especially when the population is large. We can either use a computer or the random-number tables (see one below) to generate a simple random sample. The arrows are used to show our example that follows.<a id=\"rettab1.1\"><\/a><\/p>\n<figure id=\"attachment_3021\" aria-describedby=\"caption-attachment-3021\" style=\"width: 1500px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-3021 size-full\" src=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2022\/06\/TableI.png\" alt=\"A table of randomly generated numbers. The numbers are divided into sets by rows and columns. 
Image description available.\" width=\"1500\" height=\"997\" srcset=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2022\/06\/TableI.png 1500w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2022\/06\/TableI-300x199.png 300w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2022\/06\/TableI-1024x681.png 1024w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2022\/06\/TableI-768x510.png 768w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2022\/06\/TableI-65x43.png 65w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2022\/06\/TableI-225x150.png 225w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2022\/06\/TableI-350x233.png 350w\" sizes=\"auto, (max-width: 1500px) 100vw, 1500px\" \/><figcaption id=\"caption-attachment-3021\" class=\"wp-caption-text\"><strong>Table 1.1<\/strong>: Random Number Table. [<a href=\"https:\/\/openbooks.macewan.ca\/introstats\/back-matter\/image-description\/#tab1.1\">Image Description (See Appendix D Table 1.1)<\/a>]<\/figcaption><\/figure>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Example: Simple Random Sampling Using the Random Number Table<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>Let us use the table above to show how to obtain a simple random sample without replacement.<\/p>\n<p>Suppose our section consists of 60 students, from which we would like to obtain a random sample of size 10. First, we can number the students from 1 to 60. To select 10 random numbers between 1 and 60, we first pick a random starting point. We can close our eyes and randomly point our finger into the table and use the first two digits of the number we point on as the starting point. 
For example, suppose the number we pick is 82 (line number 05, columns 00-01). Since 82 is greater than 60, it is discarded. Going down the column, the second number is 65, which is greater than 60 and hence is discarded; the third is 57, which is kept; the next is 67, which is greater than 60 and hence is discarded; the following ones are 68 (discarded), 18 (keep), 91 (discarded), 97 (discarded), 71 (discarded), 74 (discarded), 75 (discarded), 78 (discarded), 16 (keep), 02 (keep), and 74 (discarded). Having reached the end of columns 00-01, we move to the bottom of columns 02-03 and go up. The numbers are 81 (discarded), 42 (keep), 90 (discarded), 06 (keep), 62 (discarded), 43 (keep), 95 (discarded), 32 (keep), 67 (discarded), 42 (already in the list, discarded), 22 (keep), and 35 (keep). As a result, the 10 random numbers are: 57, 18, 16, 02, 42, 06, 43, 32, 22, 35.<\/p>\n<p><strong>In this course, a simple random sample means a simple random sample without replacement by default<\/strong>.<\/p>\n<\/div>\n<\/div>\n<p>In practice, the most popular way to obtain a simple random sample is to use a computer.<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Example: Simple Random Sampling Using R Commander<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>Suppose our section consists of 60 students. Use R Commander to generate a simple random sample of size [latex]n=10[\/latex] without replacement. To reproduce the sample reported below, use &#8220;4061&#8221; as the random seed.<\/p>\n<p>Similar to picking a random starting point, we first need to set a random seed using the function &#8220;<strong>set.seed()<\/strong>&#8221;. A simple random sample can be obtained using R Commander in two steps:<\/p>\n<ol>\n<li>Type &#8220;<strong>set.seed(4061)<\/strong>&#8221; in the R Script window, then press &#8220;<strong>Submit<\/strong>&#8221;. 
Setting the seed ensures that we obtain the same sample whenever we rerun both commands with the same seed.<\/li>\n<li>Type &#8220;<strong>sample(1:60, 10)<\/strong>&#8221; in the R Script window, and then press &#8220;<strong>Submit<\/strong>&#8221;. The function &#8220;<strong>sample()<\/strong>&#8221; is used to generate a simple random sample. The first input argument specifies what the sample is taken from. In this example, it is &#8220;1:60&#8221; which means [latex]1, 2, 3, \\cdots, 59, 60[\/latex]. The second input argument specifies the sample size. By default, &#8220;<strong>sample()<\/strong>&#8221; draws without replacement, so the command line &#8220;<strong>sample(1:60, 10)<\/strong>&#8221; means we would like to randomly take 10 different numbers between 1 and 60.<\/li>\n<\/ol>\n<p>The resulting sample is 53, 14, 57, 13, 8, 45, 11, 50, 59, 25 (see the snapshot below).<a id=\"retsnap1.1\"><\/a><\/p>\n<figure id=\"attachment_3024\" aria-describedby=\"caption-attachment-3024\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-3024 size-large\" src=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander-1024x596.png\" alt=\"An image of the R-commander workspace and output. 
Image Description available.\" width=\"1024\" height=\"596\" srcset=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander-1024x596.png 1024w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander-300x175.png 300w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander-768x447.png 768w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander-65x38.png 65w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander-225x131.png 225w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander-350x204.png 350w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander.png 1050w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption id=\"caption-attachment-3024\" class=\"wp-caption-text\"><strong>Snapshot 1.1<\/strong>: Generate a simple random sample of size 10 using R commander. 
[<a href=\"https:\/\/openbooks.macewan.ca\/introstats\/back-matter\/image-description\/#snap1.1\">Image Description (See Appendix D Snapshot 1.1)<\/a>]<\/figcaption><\/figure>\n<\/div>\n<\/div>\n<div style=\"height: 55px; margin-top: 2.1428571429em;\">\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-99 alignleft\" src=\"https:\/\/openbooks.macewan.ca\/rcommander\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png\" alt=\"\" width=\"250\" height=\"50\" srcset=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png 250w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity-65x13.png 65w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity-225x45.png 225w\" sizes=\"auto, (max-width: 250px) 100vw, 250px\" \/><\/p>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">Exercise: Generate Simple Random Sample Using R Commander<\/header>\n<div class=\"textbox__content\">\n<p>Suppose that a class consists of 100 students. Use R Commander to generate a simple random sample of size [latex]n=5[\/latex] without replacement. Use &#8220;6194&#8221; as the random seed.<\/p>
\n<p>&nbsp;<\/p>\n<details>\n<summary>Show\/Hide Answer<\/summary>\n<ol>\n<li>Type &#8220;<strong>set.seed(6194)<\/strong>&#8221; in the R Script window, then press &#8220;<strong>Submit<\/strong>&#8221;.<\/li>\n<li>Type &#8220;<strong>sample(1:100, 5)<\/strong>&#8221; in the R Script window, and then press &#8220;<strong>Submit<\/strong>&#8221;.<\/li>\n<\/ol>\n<p>The resulting sample is 59, 9, 1, 40, 77.<a id=\"retsnap1.2\"><\/a><\/p>\n<figure id=\"attachment_3026\" aria-describedby=\"caption-attachment-3026\" style=\"width: 1037px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-3026 size-full\" src=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander_exercise.png\" alt=\"An image of the R-commander workspace and output. Image Description available.\" width=\"1037\" height=\"596\" srcset=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander_exercise.png 1037w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander_exercise-300x172.png 300w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander_exercise-1024x589.png 1024w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander_exercise-768x441.png 768w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander_exercise-65x37.png 65w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander_exercise-225x129.png 225w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/SRS_Rcommander_exercise-350x201.png 350w\" sizes=\"auto, (max-width: 1037px) 100vw, 1037px\" \/><figcaption id=\"caption-attachment-3026\" class=\"wp-caption-text\"><strong>Snapshot 1.2<\/strong>: Generate a simple 
random sample of size 5 using R Commander. [<a href=\"https:\/\/openbooks.macewan.ca\/introstats\/back-matter\/image-description\/#snap1.2\">Image Description (See Appendix D Snapshot 1.2)<\/a>]<\/figcaption><\/figure>\n<\/details>\n<\/div>\n<\/div>\n<h3><strong><span style=\"font-size: 1.266em; font-style: italic; text-align: initial;\">Some Other Sampling Methods (not required, extra reading)<\/span><\/strong><\/h3>\n<div id=\"input01\">\n<div id=\"answer01\" class=\"hidden\">\n<p>Besides simple random sampling, other good sampling methods include:<\/p>\n<ul>\n<li><strong>Stratified sampling<\/strong>: The population is divided into homogeneous groups called strata, and then simple random sampling is applied within each stratum. For example, 65% of students at MacEwan are female students. A stratified sample of 100 students can be obtained by drawing a simple random sample of 65 female students and a simple random sample of 35 male students.<\/li>\n<li><strong>Cluster sampling<\/strong>: Split the population into clusters, select one or several clusters at random, and then conduct a census within each selected cluster. Each cluster should represent the full population fairly. For example, Edmonton is geographically divided into 375 neighborhoods. A cluster sample of Edmonton residents can be obtained by taking a simple random sample of 20 neighborhoods and then taking all residents in those 20 selected neighborhoods.<\/li>\n<li><strong>Systematic sampling<\/strong>: Select every kth individual from the sampling frame, e.g., choose every 5th person on an alphabetical list of students. If we start from the 4th individual and choose every 5th person, the resulting positions will be [latex]4, 4+5=9, 9+5=14, 14+5=19[\/latex], and so on. 
Therefore, the 4th, 9th, 14th, and 19th individuals on the list, and every 5th individual after that, will be in the sample.<\/li>\n<\/ul>\n<p>Some convenient but less desirable sampling methods are:<\/p>\n<ul>\n<li><strong>Voluntary response sampling<\/strong>: A large group of individuals is invited to respond, and all who do respond are counted, e.g., an online survey.<\/li>\n<li><strong>Convenience sampling<\/strong>: Individuals who are convenient to reach are sampled. For example, stopping people in a mall and asking them questions.<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<div style=\"height: 55px; margin-top: 2.1428571429em;\">\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-99 alignleft\" src=\"https:\/\/openbooks.macewan.ca\/rcommander\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png\" alt=\"\" width=\"250\" height=\"50\" srcset=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png 250w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity-65x13.png 65w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity-225x45.png 225w\" sizes=\"auto, (max-width: 250px) 100vw, 250px\" \/><\/p>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">Exercise: Sampling Methods<\/header>\n<div class=\"textbox__content\">\n<p>Suppose I want to know the percentage of residents in Edmonton who have taken at least one statistics course like STAT 151. Identify and explain the advantages and disadvantages of the following sampling methods.<\/p>\n<ol>\n<li>Stop 100 people at the entrance of MacEwan and ask for their response.<\/li>\n<li>Get the phone numbers of all residents in Edmonton from the census data, randomly pick 100 people, call them, and ask for their response.<\/li>\n<li>Send invitations to fill out an online survey.<\/li>\n<\/ol>\n<details>\n<summary>Show\/Hide Answer<\/summary>\n<ol>\n<li>This is a convenience sample. 
The advantage is that the sample is convenient to take. The disadvantage is that the estimate is biased and might overestimate the percentage, because people at a university entrance are more likely than other residents to have taken a statistics course.<\/li>\n<li>This is a simple random sample. However, we will miss those whose phone numbers are not listed and those who do not answer the phone.<\/li>\n<li>This is a voluntary response sample. It is cost-effective to send invitations. However, the response rate might be low, and we will miss those who have no access to a computer.<\/li>\n<\/ol>\n<p>Note: In practice, there might be no sampling method that will correspond exactly to simple random sampling, but some methods will be better than others.<\/p>\n<\/details>\n<\/div>\n<\/div>\n<h2><strong>1.2.2 Designed Experiments<\/strong><\/h2>\n<p>Designed experiments are another method of data collection. In a designed experiment, investigators randomly assign subjects to different experimental groups (called treatments), observe the outcomes, and test whether treatment differences are statistically significant. A designed experiment is ideal for investigating a cause-effect relationship.<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Example: A Designed Experiment<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>Researchers in a pharmaceutical company want to test whether their new pain killer is effective in reducing pain. Forty females with similar conditions (e.g., age, diet) are randomly assigned to two groups: 20 take a vitamin and another 20 take the pain killer. The subjects are then asked to report their pain score on a scale of 0 to 10, four hours after taking the pill. Here is how the experiment is designed:<\/p>\n<ul>\n<li>The 40 female participants should be in similar conditions to minimize the effect of other factors, such as age and diet.<\/li>\n<li>It is also important to ensure that the vitamin and the pain killer look similar, so that participants do not know which group they are in. 
Here, the vitamin is called the <strong>placebo<\/strong>. It is well known that people tend to feel better after they receive some kind of treatment, even if the treatment does not have any physical effect; this is called the <strong>placebo effect<\/strong>. By comparing a treatment group with a placebo group, the researchers are able to separate the placebo effect from the effect of the drug itself and hence more accurately measure the effectiveness of the pain killer.<\/li>\n<li>Randomly assign the individuals to the two different groups by drawing a simple random sample of 20 out of the 40 individuals, assigning them to the vitamin (placebo) group, and assigning the remaining 20 individuals to the group receiving the pain killer.<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h2>1.2.3 Observational Studies<\/h2>\n<p>In some studies, it may not be possible to randomly assign the subjects to different treatment groups for ethical or practical reasons. For example, we cannot randomly assign people to smoker and non-smoker groups. 
In those cases, we might have to conduct observational studies.<\/p>\n<p>In an observational study, the investigator observes the characteristics of individuals in samples from a population of interest to discover trends and possible relationships between variables.<\/p>\n<div style=\"height: 55px; margin-top: 2.1428571429em;\">\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-99 alignleft\" src=\"https:\/\/openbooks.macewan.ca\/rcommander\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png\" alt=\"\" width=\"250\" height=\"50\" srcset=\"https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity.png 250w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity-65x13.png 65w, https:\/\/openbooks.macewan.ca\/introstats\/wp-content\/uploads\/sites\/8\/2020\/06\/activity-225x45.png 225w\" sizes=\"auto, (max-width: 250px) 100vw, 250px\" \/><\/p>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">Exercise: An Observational Study<\/header>\n<div class=\"textbox__content\">\n<p>In order to study the association between breast cancer and smoking, can we randomly assign the participants to the smoker or non-smoker group? Which of the following two studies do you think is better?<\/p>\n<ol>\n<li>Follow 20 smokers and 20 non-smokers who have similar conditions, and compare the occurrences of breast cancer in the two groups at the end of the study.<\/li>\n<li>From one hospital, sample 20 breast cancer patients and another 20 patients without breast cancer by matching, i.e., we try to make the two groups as similar as possible except for the cancer status. Determine the smoking status for each subject and compare the percentages of smokers in the two groups. This is called a case-control study. 
A breast cancer patient is a case, while a patient without breast cancer is a control.<\/li>\n<\/ol>\n<details>\n<summary>Show\/Hide Answer<\/summary>\n<p>&nbsp;<\/p>\n<p>We cannot randomly assign participants to smoke, so an observational study is needed. The second study plan is better. Not everyone will develop breast cancer by the end of the first study; we might not observe any cases of cancer in either group. The first study plan could potentially end up with no useful results. In the second study, however, we are always able to compare the percentage of smokers between groups and therefore assess the association between smoking and breast cancer.<\/p>\n<\/details>\n<\/div>\n<\/div>\n<div style=\"height: 55px; margin-top: 2.1428571429em;\">\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-99 alignleft\" src=\"https:\/\/openbooks.macewan.ca\/rcommander\/wp-content\/uploads\/sites\/8\/2020\/06\/instructornote.png\" alt=\"\" width=\"250\" height=\"50\" \/><\/p>\n<\/div>\n<ol>\n<li>For a rare disease, it is better to first recruit participants with the condition (called cases) and then recruit participants without the condition (controls) by matching other characteristics.<\/li>\n<li>It is easier to establish a causal relationship with experimental studies than with observational studies. 
Whenever possible, experimental studies are preferred.<\/li>\n<\/ol>\n","protected":false},"author":19,"menu_order":2,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-97","chapter","type-chapter","status-publish","hentry"],"part":34,"_links":{"self":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters\/97","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/users\/19"}],"version-history":[{"count":68,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters\/97\/revisions"}],"predecessor-version":[{"id":5482,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters\/97\/revisions\/5482"}],"part":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/parts\/34"}],"metadata":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapters\/97\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/media?parent=97"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/pressbooks\/v2\/chapter-type?post=97"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/contributor?post=97"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/introstats\/wp-json\/wp\/v2\/license?post=97"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}