{"id":49,"date":"2023-09-14T18:57:23","date_gmt":"2023-09-14T22:57:23","guid":{"rendered":"https:\/\/openbooks.macewan.ca\/researchmethods\/?post_type=chapter&#038;p=49"},"modified":"2024-07-16T15:56:48","modified_gmt":"2024-07-16T19:56:48","slug":"chapter-4-research-design-and-measurement","status":"publish","type":"chapter","link":"https:\/\/openbooks.macewan.ca\/researchmethods\/chapter\/chapter-4-research-design-and-measurement\/","title":{"raw":"Chapter 4: Research Design and Measurement","rendered":"Chapter 4: Research Design and Measurement"},"content":{"raw":"<p style=\"padding-left: 40px;\"><em>In any discussion about improving measurement, it is important to begin with basic questions. What exactly are we trying to measure, and why? <\/em><\/p>\r\n<p class=\"hanging-indent\" style=\"padding-left: 80px;\">\u2014 Christine Bachrach, 2007, p. 435<\/p>\r\n\r\n<div class=\"textbox textbox--learning-objectives\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Learning Objectives<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nAfter reading this chapter, students should be able to do the following:\r\n<ol>\r\n \t<li>Describe the main components of a research design.<\/li>\r\n \t<li>Explain what conceptualization and operationalization processes entail.<\/li>\r\n \t<li>Explain how the purpose of a variable is directly related to how it is measured in research.<\/li>\r\n \t<li>Outline the main techniques used to assess reliability and validity.<\/li>\r\n \t<li>Distinguish between random and systematic errors.<\/li>\r\n \t<li>Explain how rigour is achieved in qualitative research.<\/li>\r\n<\/ol>\r\n<\/div>\r\n<\/div>\r\n<h2>INTRODUCTION<\/h2>\r\nAfter carefully considering research foundations, the importance of theory, and the ethics involving research with humans, you are almost ready to delve into the techniques used for obtaining answers to social research questions. 
But before you can start collecting data, you need to develop a research plan that outlines who or what it is you are studying and how you will go about measuring and evaluating the attitudes, behaviours, or processes that you want to learn more about.\r\n<h2>MAIN COMPONENTS OF A RESEARCH DESIGN<\/h2>\r\nLinked to a specific research question, a <strong>[pb_glossary id=\"526\"]research design[\/pb_glossary]<\/strong> \u201cis the plan or blueprint for a study and includes the who, what, where, when, why, and how of an investigation\u201d (Hagan, 2021, Chapter 3, para. 1). Beginning with the \u201cwho,\u201d researchers in the social sciences most often study people, so individuals are usually the focus of investigation, called the <strong>[pb_glossary id=\"632\"]unit of analysis[\/pb_glossary]<\/strong>. Individuals are often studied as part of a collective or group. For example, the unit of analysis could be students, employees, single-parent-headed families, low-income earners, or some other group of interest. Researchers also compare groups of individuals along particular variables of interest, such as a sample of individuals with less than a high school education versus those who have a grade 12 diploma, single- versus dual-income families, or patients who completed a treatment program versus ones who dropped out. Social institutions and organizations that guide individuals and groups can also be the units of analysis for research, such as the university you are attending, a not-for-profit agency such as the Canadian Red Cross, or a healthcare organization such as the Canadian Medical Association. Finally, social researchers are sometimes interested in artifacts created by people rather than people themselves. 
For example, researchers might examine news articles, television shows, motion pictures, profiles on an online dating site, YouTube videos, Facebook postings, or X tweets.\r\n\r\nThe \u201cwhat\u201d component of research design refers to whatever is specifically examined and measured in a study, such as attitudes, beliefs, views, behaviours, outcomes, and\/or processes. The measured component is usually referred to as the \u201cunit of observation\u201d since this is what the data is collected on. For example, a researcher might be interested in factors affecting instructors\u2019 views on published ratings of instruction. Instructors at a university who take part in the study constitute the unit of analysis from whom the views on published ratings are obtained (i.e., the more specific focus of the research that comprises the data). Similarly, a researcher might be interested in dating preferences of individuals who use online matchmaking sites such as eharmony or EliteSingles. The individuals seeking dates online are the units of analysis, while their posted profiles containing the characteristics of interest in the study are the units of observation.\r\n\r\nThe \u201cwhere\u201d pertains to the location for a study. The possibilities are endless, from research conducted first-hand in the field to studies conducted in a public setting such as a coffee shop or a private one such as the home of a participant. The location for a study is closely linked to the unit of analysis since the researcher often needs to go to the individuals, groups, or organizations to collect information from them or about them. For example, a researcher interested in interviewing couples who met online might set up appointments to visit dating couples in restaurants or coffee shops near to where the couples reside, at the discretion of the participants. 
Alternatively, a researcher interested in online dating relationships might choose to gather information in a virtual environment by posting a survey on the internet.\r\n\r\nThe \u201cwhen\u201d relates to an important time consideration in the research design. Some studies are conducted once at a single point in time, and others are carried out at multiple points over time. Most often, when social researchers carry out a study, it takes place at a single point in time or within a single time frame, such as when a researcher develops a questionnaire about internet dating and administers it to a group of people attending a singles\u2019 social function. This is called <strong>[pb_glossary id=\"269\"]cross-sectional research[\/pb_glossary]<\/strong> because the study is made up of a cross-section of the dating population taken at a point in time, like taking a photo that captures a person or group at a single point in time. Alternatively, social researchers sometimes study individuals or groups at multiple points in time using what is called <strong>[pb_glossary id=\"412\"]longitudinal research[\/pb_glossary]<\/strong>. For example, a sample of dating couples might be surveyed shortly after they meet and again after dating for one year to capture their initial viewpoints and to see how their perceptions change once they get to know each other better. Four longitudinal designs are discussed below.\r\n\r\nA <strong>[pb_glossary id=\"458\"]panel study[\/pb_glossary]<\/strong> is a longitudinal study in which a researcher observes the same people, group, or organization at multiple points in time. A new phase of a large-scale panel study called Alberta\u2019s Tomorrow Project was launched in 2017 to learn more about the causes of cancer. The study, active since 2000, includes 55,000 cancer-free Albertans (Alberta\u2019s Tomorrow Project, 2024b). 
Researchers from Alberta\u2019s Tomorrow Project collect information on participants\u2019 health and lifestyle through surveys and the occasional collection of specimens, such as blood samples. Going back to the dating example, by collecting information on the same dating couples at various intervals, a researcher could use a panel study to examine how relations evolve or change over time.\r\n\r\nA <strong>[pb_glossary id=\"218\"]cohort study[\/pb_glossary]<\/strong> is like a panel study, except, rather than focusing on the same people, it focuses on comparable people over time. \u201cComparable\u201d refers to a general category of people who are deemed to share a similar life experience in a specified period. The most common example of this is a birth cohort. Think about the classmates you went through school with. Although there were individual differences between you and your classmates, you also shared some common life events, such as the music that topped the charts at that time, the clothing fads and hairstyles, and the political events that occurred during that era. Following the earlier example, a researcher might take as a unit of analysis people who met online and married their dating partner in 2020, as Covid-19 was unfolding. Several couples might be studied in 2020 shortly after they were married, several other couples who also married in 2020 might be studied in 2021, another group in 2022, another group in 2023, and so on over a period of five years.\r\n\r\nA <strong>[pb_glossary id=\"622\"]time-series study[\/pb_glossary]<\/strong> is a longitudinal study in which a researcher examines different people at multiple points in time. Every year, Statistics Canada gathers information from thousands of Canadians using what is called the General Social Survey. Although the participants are different each time, similar forms of information are gathered over time to detect patterns and trends. 
For example, we can readily discern that Canadians today are delaying marriage (i.e., getting married for the first time at a later age) and they are having fewer children relative to 20 years ago. Going back to the online dating example used throughout this section, a researcher might use a time-series study in order to gather information from new dating couples at an online site each year for five years. Looking at the data over time, a researcher would be able to determine if there are changes in the overall profiles of online daters at that site. For example, the use of the site might be increasing for groups such as highly educated women or single individuals over the age of 60 years.\r\n\r\nLastly, a <strong>[pb_glossary id=\"204\"]case study[\/pb_glossary]<\/strong> is a research method in which a researcher focuses on a small number of individuals or even a single person over an extended period. You\u2019ll learn more about this method in Chapter 11. For now, you can think of a case study as a highly detailed study of a single person, group, or organization over time. For example, a researcher might study an alcoholic to better understand the progression of the disease over time, or a researcher might join a subculture, such as a Magic card club that meets every Friday night, to gain an insider\u2019s perspective of the group. Similarly, a researcher could examine the experiences of a frequent online dater over time to get a sense of how online dating works at various sites. See figure 4.1 for an overview of time considerations in relation to units of analysis.<a id=\"retfig4.1\"><\/a>\r\n\r\n[caption id=\"attachment_1100\" align=\"aligncenter\" width=\"1022\"]<a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1.png\"><img class=\"wp-image-1100 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1-1022x1024.png\" alt=\"Figure 4.1. 
Time Considerations by Units of Analysis. Image description available.\" width=\"1022\" height=\"1024\" \/><\/a> Figure 4.1. Time Considerations by Units of Analysis [Image description - <a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/back-matter\/appendix-c-figure-descriptions\/#fig4.1\">See Appendix C Figure 4.1<\/a>][\/caption]\r\n<div class=\"textbox textbox--key-takeaways\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Research in Action<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p class=\"indent\"><strong>Longitudinal Panel Study on Smokers\u2019 Perceptions of Health Risks<\/strong><\/p>\r\n<p class=\"indent\">Cho and colleagues (2018) examined changes in smokers\u2019 perceptions about health risks following amendments to warning labels on cigarette packages using a longitudinal panel study. Four thousand six hundred and twenty-one smokers from Canada, Australia, Mexico, and the United States were surveyed every four months for a total of five times to determine if their knowledge of health risks increased after toxic constituents were added to pictorial health warning labels. 
Results showed that knowledge of toxic constituents such as cyanide and benzene increased over time and was associated with a stronger perceived risk of vulnerability to smoking-related diseases, including bladder cancer and blindness, for participants in Canada, Australia, and Mexico, but not the United States (Cho et al., 2018).<\/p>\r\n\r\n\r\n[caption id=\"attachment_1087\" align=\"aligncenter\" width=\"1024\"]<img class=\"wp-image-1087 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.1-1024x683.jpg\" alt=\"Cigarettes showing warning labels and ashtray with butts.\" width=\"1024\" height=\"683\" \/> Image 4.1 Warning labels are designed to increase smokers\u2019 knowledge of health risks.[\/caption]\r\n\r\n<\/div>\r\n<\/div>\r\nIn addition to the time dimension, a research design also includes the \u201cwhy\u201d and the \u201chow\u201d plan for a study. \u201cWhy\u201d relates to the purpose of a study as discussed in chapter 1. A study on internet dating could be focused on exploring ways in which people represent themselves in posted profiles; describing who uses internet dating sites; explaining the importance of including certain types of information, such as appearance or personality factors, for attracting dates; or evaluating the effectiveness of a given site in matching suitable partners. The nature of the study really depends on the interests of the researcher and the nature of the issue itself.\r\n\r\n\u201cHow\u201d refers to the specific method or methods used to gather the data examined in a study. With an interest in the merit of including certain items in a dating profile, a researcher might opt for an experimental design to explain, for example, whether certain characteristics in posted dating profiles are better than others for eliciting potential dates (see chapter 6 for a discussion of different types of experiments). 
An experiment is a special type of quantitative research design in which a researcher changes or alters something to see what effect it has. In the dating example, an experimenter might compare the number of responses to two profiles posted at an internet dating site. If the profiles are identical except for one trait\u2014for example, one profile might contain an additional statement saying the person has a great sense of humour\u2014it would be possible to determine the importance of that trait in attracting potential partners.\r\n\r\nRecall from chapter 1 that the specific methods or techniques differ depending on whether a researcher adopts a qualitative or quantitative orientation, and the approach itself links back to the research interest. A researcher interested in learning more about why people make dates online or how people define themselves online is more apt to use a qualitative approach. Qualitative researchers tend to engage in research that seeks to better understand or explain some phenomenon using field research and in-depth interviews, as well as strategies involving discourse analyses and content analyses that can be used to help to uncover meaning. Regardless of the approach taken and specific methods used, all researchers must work through various design considerations and measurement issues in their quest to carry out scientific research. 
Similarly, both qualitative and quantitative researchers undertake a process of conceptualization and measurement\u2014it just occurs differently and at different stages within the overall research process, as discussed in the next section.\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Test Yourself<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ul>\r\n \t<li>What does a research design inform us about?<\/li>\r\n \t<li>What is the main difference between cross-sectional and longitudinal research?<\/li>\r\n \t<li>What is the name for research on the same unit of analysis carried out at multiple points in time?<\/li>\r\n \t<li>In what ways will a quantitative research design differ from a qualitative one?<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<h2>CONCEPTUALIZATION AND OPERATIONALIZATION<\/h2>\r\nResearchers in the social sciences frequently study social issues, social conditions, and social problems that affect individuals, such as environmental disasters, legal policy, crime, healthcare, poverty, divorce, marriage, or growing social inequality between the rich and poor, within a conceptual framework. Their quest is to explore, describe, explain, or evaluate the experiences of individuals or groups. A conceptual framework is the more concentrated area of interest used to study the social issue or problem, which includes the main objects, events, or ideas, along with the relevant theories that spell out relationships. Terms like <em>family<\/em> or <em>crime<\/em> are broad <strong>[pb_glossary id=\"228\"]concepts[\/pb_glossary] <\/strong>that refer to abstract mental representations used to signify important elements of our social world. Concepts are very important because they form the basis of social theory and research in the social sciences. However, as often-abstract representations, concepts like family can be vague since they mean different things to different people. 
Consider what the concept of family means to you. Does your notion of a family include aunts and uncles or second cousins? What about close friends? How about pets? People have very divergent views about what a family is or is not. The concept of family also has very different meanings depending on the context in which it is applied. For example, there are rules about who is or is not considered eligible to be sponsored under family status for potential immigration to Canada, or who may or may not be deemed family for visitation rights involving prisoners under the supervision of the Correctional Service of Canada. Because concepts are broad notions that can take on various meanings, researchers need to carefully define the concepts (or constructs) that underlie their research interests as part of the conceptualization and operationalization process.\r\n\r\n[caption id=\"attachment_2458\" align=\"aligncenter\" width=\"1024\"]<img class=\"wp-image-2458 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/4.2-replacement-_families-pexels-mikhail-nilov-6530733-1024x752.jpg\" alt=\"Couple with a dog.\" width=\"1024\" height=\"752\" \/> Image 4.2 The concept of a family differs depending on the context in which it is applied.[\/caption]\r\n\r\n<strong>[pb_glossary id=\"230\"]Conceptualization[\/pb_glossary]<\/strong>, in research, is the process where a researcher explains what a concept or construct means in terms of a research project. 
For example, a researcher studying the impact of education cutbacks on the family, with an interest in the negative implications for children, might adopt the broad conceptualization of family provided by Statistics Canada (2021):\r\n<blockquote>Census family is defined as a married couple and the children, if any, of either and\/or both spouses; a couple living common law and the children, if any, of either and\/or both partners; or a parent of any marital status in a one-parent family with at least one child living in the same dwelling and that child or those children. All members of a particular census family live in the same dwelling. Children may be biological or adopted children regardless of their age or marital status as long as they live in the dwelling and do not have their own married spouse, common-law partner or child living in the dwelling. Grandchildren living with their grandparent(s) but with no parents present also constitute a census family. (para. 1)<\/blockquote>\r\nConceptualization is essential since it helps us understand what is or is not being examined in a study. For example, based on the conceptualization provided above, children being raised by their grandparents or living in homes headed by single parents would be included in the study, but they might have otherwise been missed if a more traditional definition of family were employed.\r\n\r\nThe term <em>concept<\/em> is often used interchangeably with a similar term, <em>construct<\/em>. Both refer to mental abstractions; however, concepts derive from tangible or factual observations (e.g., a family exists), while <strong>[pb_glossary id=\"239\"]construct[\/pb_glossary]s<\/strong> are more hypothetical or \u201cconstructed\u201d ideas that are part of our thinking but do not exist in readily observable forms (e.g., love, honesty, intelligence, motivation). 
While we can readily measure and observe crime through different acts that are deemed criminal (the concept crime), we infer intelligence through measures such as test scores (the construct intelligence). Both concepts and constructs are the basis of theories and are integral components underlying social research. Like concepts, constructs undergo a process of conceptualization when they are used in research. Suppose a researcher is interested in studying social inequality, which we commonly understand to mean differences among groups of people in terms of their access to resources such as education or healthcare. We know some people can afford private schools, while others cannot. We understand that healthcare benefits differ depending on factors such as how old a person is, how much one pays for a health plan, what type of job one has, and where one lives. Social inequality is a construct that conjures up a wide range of examples and notions. To examine social inequality in a specific study, a researcher might opt to use personal finances as an <strong>[pb_glossary id=\"382\"]indicator[\/pb_glossary]<\/strong> of social inequality. An indicator is \u201ca measurable quantity which \u2018stands in\u2019 or substitutes, in some sense, for something less readily measurable\u201d (Sapsford, 2006, para. 1). Personal finances can be further specified as employment income, since most of the working population is able to state how much they earn in dollars.\r\n\r\nNote that in the case of employment income, people earn a certain amount of money (i.e., gross pay), but they receive a different amount as their take-home pay after taxes and various deductions come off (i.e., net pay). The process of taking an abstract construct, such as social inequality, and progressively refining it into something tangible and precisely measurable, such as net income, is known as operationalization. 
<strong>[pb_glossary id=\"450\"]Operationalization[\/pb_glossary]<\/strong> is the process whereby a concept or construct is defined so precisely that it can be measured. In this example, financial wealth was operationalized as net yearly employment income in dollars. Note that once a construct such as social inequality has been clarified as financial wealth and then measured in net yearly income dollars, we are now working with a variable, since net yearly income is something that can change and differ between people. Quantitative researchers examine variables. A researcher interested in implications of social inequality might test the hypothesis that among individuals who work full time, those with low net yearly incomes will report poorer health compared to people with high net yearly incomes. In this example, health (the second variable) might be operationalized into the self-reported ratings of very poor, poor, fair, good, or very good.\r\n\r\nIn contrast, qualitative researchers tend to define concepts based on the users\u2019 own frameworks. Qualitative researchers are less concerned with proving that certain variables affect the individual. Instead, they are more concerned with <em>how<\/em> individuals make sense of their own social situations and <em>what<\/em> the broader social factors are for such framing.\r\n<div class=\"textbox textbox--key-takeaways\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Research in Action<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n<strong>Online Dating and Variables<\/strong>\r\n\r\nStudents routinely have difficulty understanding what variables are, let alone how to explore research questions that contain them. A starting point is to examine what is contained in a typical \u201cprofile\u201d posted on any internet dating site, such as EliteSingles or eharmony. 
Registered members list certain attributes they feel best describe themselves and that may also be helpful in attracting a compatible dating partner, such as their age, physical features, and certain personality traits. For example, someone might advertise as a single male, 29 years of age, 5'11\" tall, in great shape, with a good sense of humour, seeking men between the ages of 25 and 35 for friendship. These attributes are variables, many of which are routinely used in social research! Variables are often defined using categories. For example, the word <em>single<\/em> refers to a category of the variable \u201cmarital status.\u201d Marital status is a variable since it is a property that can differ between individuals and change over time, as in single, common-law, married, separated, widowed, or divorced. Other variables listed on dating sites include age (in years), gender and preference (e.g., man interested in men), height (in feet and inches), body type (slim, fit, average), eye colour (e.g., green, blue, or brown), and astrological sign (e.g., Virgo, Libra, Scorpio). Even the purpose or intent of the posting constitutes a variable, since a person may state whether they are seeking fun, friendship, dating, or a long-term relationship.\r\n\r\n<\/div>\r\n<\/div>\r\n\r\n[caption id=\"attachment_1085\" align=\"aligncenter\" width=\"1024\"]<img class=\"wp-image-1085 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.3-1024x683.jpg\" alt=\"Couple making heart hand sign.\" width=\"1024\" height=\"683\" \/> Image 4.3 Personal profiles on dating sites are largely made up of variables.[\/caption]\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Test Yourself<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ul>\r\n \t<li>What is a concept? 
Provide an example of one that is of interest to social researchers.<\/li>\r\n \t<li>Why is conceptualization important to researchers?<\/li>\r\n \t<li>What is an indicator? Provide an example of one that could be used in a study on aggression.<\/li>\r\n \t<li>What are three variables you could examine in a study of online dating relationships?<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<h2>MEASURING VARIABLES<\/h2>\r\nNow that you better appreciate what is meant by a variable, you can also start to see how variables are operationalized in different ways. Some variables are numerical, such as age or income, while others pertain more to categories and descriptors, such as marital status or perceived health status. Decisions made about how to clarify a construct such as health have important implications for other stages of the research process, including what kind of analyses are possible and what kind of interpretations can be made. For example, the categories for the self-reported health variable described above can tell us whether someone perceives their health to be better or worse than someone else\u2019s (e.g., \u201cvery good\u201d is better than \u201cgood,\u201d while \u201cfair\u201d is worse than \u201cgood\u201d but better than \u201cpoor\u201d). However, from how this variable is measured, we are unable to ascertain how much worse or how much better someone\u2019s health is relative to another.\r\n<h3>Levels of Measurement<\/h3>\r\nVariables mean different things and can be used in different ways, depending on how they are measured. At the lowest level, called the <strong>[pb_glossary id=\"436\"]nominal level[\/pb_glossary]<\/strong>, we can classify or label cases, such as persons, according to marital status, eye colour, religion, or the presence of children. These are all qualitative variables. 
Even if we assign numbers to the categories for marital status, where 1 = single, 2 = common-law, 3 = married, 4 = separated, 5 = widowed, and 6 = divorced, we have not quantified the variable. This is because the numbers serve only to identify the categories so that a <em>6<\/em> now represents anyone who is currently \u201cdivorced.\u201d The numbers themselves are arbitrary; however, they serve the function of classification, which simply indicates that members of one category are different from another category.\r\n\r\nAt the next level of measurement, called the <strong>[pb_glossary id=\"454\"]ordinal level[\/pb_glossary]<\/strong>, we can classify and order categories of the variables of interest, such as people\u2019s perceived health into levels, job satisfaction into ratings, or prestige into rankings. Note that these variables are measured as more or less, or higher or lower amounts, of some dimension of interest. The variable health, then, measured as very good, good, fair, poor, and very poor, is an ordered variable since we know that very good is higher than good and therefore indicates better health. However, as noted earlier, we cannot determine precisely what that means in terms of how much healthier someone is who reports very good health. Ordinal variables are also qualitative in nature.\r\n\r\nAt the next highest level of measurement, called the <strong>[pb_glossary id=\"400\"]interval level[\/pb_glossary]<\/strong>, we can classify, order, and examine differences between the categories of the variables of interest. This is possible because the assigned scores include equal intervals between categories. For example, with temperature as a main variable, we know that 28\u00b0C is exactly one degree higher than 27\u00b0C, which is 7 degrees higher than 20\u00b0C, and so on.\r\n\r\nStatisticians sometimes make a further distinction between an interval and <strong>[pb_glossary id=\"516\"]ratio level[\/pb_glossary]<\/strong> of measurement. 
Both levels include meaningful distance between categories, as well as the properties from the lower levels. However, a true zero exists only at the ratio level of measurement, and this constitutes the additional property. In the case of temperature, 0\u00b0C cannot be taken to mean the absence of temperature (an absolute or true zero). At the ratio level, however, there is a true zero in the case of time, where a stopwatch can count down from two minutes to zero, and zero indicates no time left. Most variables that include the property of score assignment have a true zero. One way to determine if a variable is measured at the ratio level is to consider if its categories can adhere to the logic of \u201ctwice as.\u201d For example, an assessment variable where a person can achieve twice the score of someone else or net employment income where one employee can earn twice that of another are both measured at the ratio level. Interval- and ratio-level variables are quantitative, and they are amenable to statistical analyses such as tests for associations between variables. The properties of each level of measurement are summarized in figure 4.2.<a id=\"retfig4.2\"><\/a>\r\n\r\n[caption id=\"attachment_2279\" align=\"aligncenter\" width=\"1024\"]<a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.2-1.png\"><img class=\"wp-image-2279 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.2-1-1024x716.png\" alt=\"Figure 4.2. Properties and Functions of Levels of Measurement. Image description available.\" width=\"1024\" height=\"716\" \/><\/a> Figure 4.2. 
Properties and Functions of Levels of Measurement [Image description - <a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/back-matter\/appendix-c-figure-descriptions\/#fig4.2\">See Appendix C Figure 4.2<\/a>][\/caption]\r\n<h4>Activity: Sorting Levels of Measurement<\/h4>\r\n[h5p id=\"23\"]\r\n<h3>Indexes versus Scales<\/h3>\r\nInstead of using the response to a single statement as a measure of some construct, indexes and scales combine several responses to create a composite measure of a construct. An <strong>[pb_glossary id=\"380\"]index[\/pb_glossary]<\/strong> is a composite measure of a construct comprising several different indicators that produce a shared outcome (DeVellis &amp; Thorpe, 2021). For example, in a study on gambling, Wood and Williams (2007) examined the extent to which internet gamblers manifest a \u201cpropensity for problem gambling\u201d (i.e., the outcome). The propensity for problem gambling was assessed using the nine-item Canadian Problem Gambling Index (CPGI). The CPGI consists of nine questions, each prompting respondents to consider only the preceding 12 months, including \u201cThinking about the last 12 months \u2026 have you bet more than you could really afford to lose?\u201d; \u201cStill thinking about the last 12 months, have you needed to gamble with larger amounts of money to get the same feeling of excitement?\u201d; and \u201cWhen you gambled, did you go back another day to try and win back the money you lost?\u201d The response categories for all nine items are sometimes (awarded a score of 1), most of the time (awarded a score of 2), almost always (scored as 3), or don\u2019t know (scored as zero).
The scores are added up for the nine items, generating an overall score for propensity for problem gambling that ranges between 0 and 27, where 0 = non-problem gamblers, 1\u20132 = a low risk for problem gambling, 3\u20137 = a moderate risk, and 8 or more = problem gamblers (Ferris &amp; Wynne, 2001). Although the index has several items, they all independently measure the same thing: the propensity to become a problem gambler. And while there are expected relationships between the items\u2014for instance, spending more than one can afford to lose is assumed to be associated with needing to gamble with larger amounts of money to get the same feeling of excitement\u2014the indicators are not derived from a single cause (DeVellis &amp; Thorpe, 2021). That is, a person may gamble more than they can afford to lose due to a belief in luck, while the same person might need to gamble larger amounts due to a thrill-seeking tendency. Regardless of their origins, the indicators result in a common outcome: the tendency to become a problem gambler. The higher the overall score on an index, the more of that trait or propensity the respondent has.\r\n\r\nIn contrast, a <strong>[pb_glossary id=\"554\"]scale[\/pb_glossary] <\/strong>is a composite measure of a construct consisting of several different indicators that stem from a common cause (DeVellis &amp; Thorpe, 2021). For example, the Eysenck Personality Questionnaire\u2014Revised (EPQR) is a 48-item questionnaire designed to measure an individual\u2019s personality (the construct) via extroversion and neuroticism (Eysenck et al., 1985). Extroversion and neuroticism are two underlying potential causes for certain behavioural tendencies. Extroversion is the propensity to direct one\u2019s attention outward toward the environment. Thus, extroverts tend to be people who are outgoing or talkative.
Sample yes\/no forced-choice items on the EPQR measuring extroversion include the statements \u201cAre you a talkative person?\u201d; \u201cDo you usually take the initiative in making new friends?\u201d; and \u201cDo other people think of you as being very lively?\u201d Neuroticism refers to emotional instability. For example, a neurotic person is someone who worries excessively or who might be described as moody. Sample questions include \u201cDoes your mood often go up and down?\u201d; \u201cAre you an irritable person?\u201d; and \u201cAre you a worrier?\u201d\r\n\r\nNote that there are some similarities between indexes and scales and that these terms are often used interchangeably (albeit incorrectly) in research! Both indexes and scales measure constructs\u2014for example, dimensions of personality, risk for problem gambling\u2014using nominal variables with categories such as yes\/no and presence\/absence or ordinal variables depicting intensity such as very dissatisfied or dissatisfied. In addition, both are composite measures, meaning they are made up of multiple items. However, there are also some important differences (see figure 4.3).\r\n\r\nWhile an index is always an accumulation of individual scores based on items that have no expected common cause, a scale is based on the assignment of scores to items that are believed to derive from a common cause. In addition, scales often comprise items that have logical relationships between them. Namely, someone who indicates on the EPQR that they \u201calways take the initiative in making new friends\u201d and \u201calways get a party going\u201d is also very likely to \u201cenjoy meeting new people,\u201d but is very unlikely to be \u201cmostly quiet when with other people.\u201d In addition, specific items in a scale can indicate varying intensity or magnitude of a construct in a manner that is accounted for by the scoring.
For example, the Bogardus social distance scale measures respondents\u2019 willingness to participate with members of other racial and ethnic groups (Bogardus, 1933). The items in the scale have different intensity, meaning certain items show more unwillingness to participate with members of other groups than others. For example, an affirmative response to the item \u201cI would be willing to marry outside group members\u201d indicates very low social distance (akin to low prejudice) and scores 1 point, whereas an affirmative response to \u201cI would have (outside group members) merely as speaking acquaintances\u201d scores 5, indicating more prejudice. A scale takes advantage of differences in intensity or the magnitude between indicators of a construct and weights them accordingly when it comes to scoring. In contrast, an index assumes that all items are different but of equal importance.\r\n\r\nSometimes it can be difficult to determine if an instrument is better classified as a scale or an index. For example, the Eating Attitudes Test (EAT-26) was developed by Garner and Garfinkel (1979) as a self-report measure designed to help identify those at risk for an eating disorder such as anorexia nervosa. Taken as a whole, it can be considered an index because it is based on items that do not share a single underlying cause; eating disorders can result from many different individual causes. Importantly, as required by an index, the indicators are used to derive a composite score for a common outcome (risk of anorexia) by summing up the scores obtained for all 26 independent items.
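This unweighted, equal-importance summation can be sketched in a few lines of Python. The responses below are invented for illustration; the item scores and risk cut-offs are the ones quoted above from Ferris and Wynne (2001):

```python
# Sketch of index scoring: every item contributes equally, and the
# overall score is a simple sum. Responses here are hypothetical.
CPGI_SCORES = {"don't know": 0, "sometimes": 1, "most of the time": 2, "almost always": 3}

def cpgi_total(responses):
    """Sum the nine item scores into an overall 0-27 propensity score."""
    return sum(CPGI_SCORES[r] for r in responses)

def cpgi_category(total):
    """Risk categories quoted from Ferris and Wynne (2001)."""
    if total == 0:
        return "non-problem gambler"
    if total <= 2:
        return "low risk for problem gambling"
    if total <= 7:
        return "moderate risk"
    return "problem gambler"

responses = ["sometimes"] * 4 + ["don't know"] * 5  # four items scored 1 each
total = cpgi_total(responses)
print(total, cpgi_category(total))  # prints: 4 moderate risk
```

A scale would differ only in the scoring step: instead of each item counting equally, items would carry different weights reflecting their intensity, as in the Bogardus example above.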
However, the EAT-26 also contains three sub-scales, where certain items can be used to examine dimensions of anorexia that are believed to be the result of dieting, bulimia and food preoccupation, and oral control (common causes).<a id=\"retfig4.3\"><\/a>\r\n\r\n[caption id=\"attachment_1102\" align=\"aligncenter\" width=\"1024\"]<a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.3-e1720022273151.png\"><img class=\"wp-image-1102 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.3-e1720022273151-1024x425.png\" alt=\"Figure 4.3. Comparing Index and Scale. Image description available.\" width=\"1024\" height=\"425\" \/><\/a> Figure 4.3. Comparing Index and Scale [Image description - <a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/back-matter\/appendix-c-figure-descriptions\/#fig4.3\">See Appendix C Figure 4.3<\/a>][\/caption]\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Test Yourself<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ul>\r\n \t<li>What property distinguishes the ordinal level of measurement from the nominal?<\/li>\r\n \t<li>What are the main properties and functions of the interval level of measurement?<\/li>\r\n \t<li>What special property does the ratio level have that distinguishes it from the interval level of measurement?<\/li>\r\n \t<li>In what ways are indexes and scales similar and different?<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<h2>CRITERIA FOR ASSESSING MEASUREMENT<\/h2>\r\nMeasurement often involves obtaining answers to questions posed by researchers. In some cases, the answers to questions might be very straightforward, as would be your response to the question \u201cHow old are you?\u201d But what if you were instead asked, \u201cWhat is your ethnicity?\u201d Would your answer be singular or plural? 
Would your answer reflect the country you were born in and\/or the one you currently reside in? Did you consider the origin of your biological father, mother, or both parents\u2019 ancestors (e.g., grandparents or great-grandparents)? Did you think about languages you speak other than English or any cultural practices or ceremonies you engage in? Ethnicity is a difficult concept to measure because it has different dimensions; it reflects ancestry in terms of family origin as well as identity in the case of more current personal practices. According to Statistics Canada (2017), if the intent of the study is to examine identity, then a question such as \u201cWith which ethnic group do you identify?\u201d is probably the best choice, since it will steer respondents to that dimension by having them focus on how they perceive themselves. To assess whether measures are \u201cgood\u201d ones, you can evaluate their reliability and validity.\r\n<h3>RELIABILITY<\/h3>\r\nAs a quantitative term,<strong> [pb_glossary id=\"522\"]reliability[\/pb_glossary] <\/strong>refers to the consistency in measurement. A measurement procedure is reliable if it can provide the same data at different points in time, assuming there has been no change in the variable under consideration. For example, a weigh scale is generally considered to be a reliable measure for weight. That is, if a person steps on a scale and records a weight, the person could step off and back on the scale and it should indicate the same weight a second time. Similarly, a watch or a clock\u2014barring the occasional power outage or worn-out battery\u2014is a dependable measure for keeping track of time on a 24-hour cycle (e.g., your alarm wakes you up for work at precisely 6:38 a.m. on weekdays). Finally, a specialized test can provide a reliable measure of a child\u2019s intelligence in the form of an intelligence quotient (IQ). 
IQ is a numerical score determined using an instrument such as the Wechsler Intelligence Scale for Children\u2014Fifth Edition (WISC-V), released by Pearson in 2014. The test consists of questions asked of a child by a trained psychologist who records the answers and then calculates scores to determine an overall IQ (Wechsler, 2003). IQ is considered a reliable indicator of intelligence because it is stable over time. The average IQ for the general population is 100. A child who obtained an IQ score of 147 on the WISC-V at age eight would be classified as highly gifted. If that same person took an IQ test several years later, the results should also place the person in the highly gifted range. While a child could have a \u201cbad test\u201d day if they felt ill, were distracted, and so on, it is not reasonable to assume that the child guessed their way to a score of 147! Four ways to determine if a measure is reliable or unreliable are discussed below, including test-retest, split-half, inter-rater, and inter-item reliability.\r\n<h4>Test-Retest Reliability<\/h4>\r\nDemonstrating that a measure of some phenomenon, such as intelligence, does not change when the same instrument is administered at two different points in time is known as <strong>[pb_glossary id=\"610\"]test-retest reliability[\/pb_glossary]<\/strong>. Test-retest reliability is usually assessed using the Pearson product-moment correlation coefficient (represented by the symbol <em>r<\/em>). The coefficient ranges from \u20131.00 to +1.00, representing the degree of association between two variables. The closer the value of <em>r<\/em> is to +1.00 or \u20131.00, the greater the degree or strength of the association between the variables. For example, an <em>r <\/em>of +.80 indicates a stronger association than one of +.64; a value of 0 indicates no relationship between the two variables, and +1.00 or \u20131.00 indicates a perfect relationship. The positive or negative sign indicates the <em>direction<\/em> of a relationship.
A plus sign indicates a positive relationship, where both variables go in the same direction. For example, an <em>r <\/em>of +.60 for the relationship between education and income tells us that as education increases, so does income. In the case of negative correlations, the variables go in opposite directions, such as an <em>r <\/em>= \u2013.54 for education and prejudice. With increased education, we can expect decreased prejudice.\r\n\r\nTo evaluate test-retest reliability, the correlation coefficient denotes the relationship between the same variable measured at time 1 and time 2. The correlation coefficient (also called a reliability coefficient) should have a value of .80 or greater to indicate good reliability (Cozby et al., 2020). Test-retest reliability is especially important for demonstrating the accuracy of new measurement instruments. Currently, the identification of gifted children is largely restricted to outcomes determined by standardized IQ tests administered by psychologists (Pfeiffer et al., 2008). Expensive IQ tests are only funded by the school system for a small fraction of students, usually identified early on as having special needs. This means most students are never tested, and many gifted children are never identified as such. An alternative instrument, the Gifted Rating Scales (GRS), published by PsychCorp\/Harcourt Assessment, is based on teacher ratings of various abilities, such as student intellectual ability, academic ability, artistic talent, leadership ability, and motivation. Test-retest reliability coefficients for this assessment tool\u2019s various scales were high, as reported in the test manual.
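The computation behind such a coefficient can be sketched in Python. The two sets of scores below are invented for illustration and are not GRS data:

```python
# Sketch of a test-retest reliability check: correlate the same measure
# administered to the same respondents at two points in time.
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

time1 = [98, 112, 105, 127, 89, 140]   # hypothetical scores at time 1
time2 = [101, 110, 107, 125, 92, 138]  # same respondents, retested later
r = pearson_r(time1, time2)
print(round(r, 2))  # compare against the .80 benchmark for good reliability
```

In practice such coefficients are produced by statistical software, but the logic is the same: stable scores across the two administrations push r toward +1.00.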
For example, the coefficient for the Academic Ability scale used by teachers on a sample of 160 children aged 12.00 to 13.11 years old and reapplied approximately a week later was .97 (Pfeiffer et al., 2008).\r\n<h4>Split-Half Reliability<\/h4>\r\nAn obvious critique of test-retest reliability is that because participants receive the same test twice, or observers rate the same phenomenon at close intervals in time, the similarity in results could have more to do with memory for the items than with the construct of interest. An alternative to the test-retest method that provides a more independent assessment of reliability is the <strong>[pb_glossary id=\"586\"]split-half reliability[\/pb_glossary]<\/strong> approach. Using this method, a researcher splits the items from a single administration into two halves (e.g., the odd-numbered items versus the even-numbered ones, or two random halves of the questions on a survey) and then compares the two halves for their degree of association.\r\n<h4>Inter-Rater Reliability<\/h4>\r\nAnother way to test for the reliability of a measure is by comparing the results obtained on one instrument provided by two different observers. This is called <strong>[pb_glossary id=\"398\"]inter-rater reliability[\/pb_glossary]<\/strong> (also called inter-judge, inter-coder, or inter-observer reliability). Inter-rater reliability is the overall percentage of times two raters agree after examining each pair of results. Using the giftedness example above, two different teachers would provide assessments of the students on the various indicators of giftedness, and then the two sets of responses would be compared.
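Because agreement is tallied over each pair of ratings, this comparison is easy to sketch. The ratings below are invented; alongside raw percent agreement, the sketch also shows Cohen's (1960) chance-corrected kappa, computed from the observed agreement and the agreement expected by chance given each rater's category frequencies:

```python
# Sketch of inter-rater reliability for two raters assigning nominal labels.
# Ratings are hypothetical placeholders, not GRS data.
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of cases on which the two raters gave the same label."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(r1) | set(r2))
    return (p_o - p_e) / (1 - p_e)

rater1 = ["gifted", "gifted", "not", "not", "gifted", "not", "not", "not"]
rater2 = ["gifted", "gifted", "not", "gifted", "gifted", "not", "not", "not"]
print(percent_agreement(rater1, rater2))      # prints: 0.875
print(round(cohens_kappa(rater1, rater2), 2)) # prints: 0.75
```

Note how kappa (.75) is lower than raw agreement (.875) once chance agreement is removed, which is why kappa is the more conservative statistic.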
If two different teachers agree most of the time that certain children exhibit signs of giftedness, we can be more confident that the scales are identifying gifted children as opposed to showing the biases of a teacher toward their students.\r\n\r\nA statistical test called Cohen\u2019s kappa is usually employed to test inter-rater reliability because it takes into account the percentage of agreement as well as the number of times raters could be expected to agree just by chance alone (Cohen, 1960).[footnote] Cohen\u2019s kappa is generally used only with nominal variables. If the variables of interest are at the ordinal or interval\/ratio level, Krippendorff\u2019s alpha is recommended (Lombard et al., 2002).[\/footnote]<strong>\u00a0<\/strong>Given the conservative nature of this test, Landis and Koch (1977) recommend considering coefficients of between .61 and .80 as substantial and .81 and over as indicative of near-to-perfect agreement. Building on the earlier example, the test manual for the GRS reported an inter-rater reliability of .79 for the academic ability of children aged 6.00 to 9.11 years old, based on the ratings of two different teachers for 152 students (Pfeiffer et al., 2008).\r\n<h4>Inter-Item Reliability<\/h4>\r\nLastly, when researchers use instruments that contain multiple indicators of a single construct, it is also possible to assess <strong>[pb_glossary id=\"392\"]inter-item reliability[\/pb_glossary]<\/strong>. Inter-item reliability (also called internal-consistency reliability) refers to demonstrated associations among multiple items representing a single construct, such as giftedness. First, there should be close correspondence between items evaluating a single dimension. For example, students who score well above average on an item indicating intellectual ability (e.g., verbal comprehension) should also score well above average on other items making up the intellectual ability scale (e.g., memory, abstract reasoning). 
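Correspondence among items of this kind is commonly summarized with a single internal-consistency statistic, Cronbach's alpha. A minimal sketch, using invented item scores for five respondents (real analyses would use statistical software):

```python
# Sketch of inter-item (internal-consistency) reliability via Cronbach's
# alpha: alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
# The item scores below are hypothetical.
def variance(xs):
    """Sample variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one list of scores per item, all for the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    item_var = sum(variance(it) for it in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

items = [
    [4, 5, 3, 5, 2],  # item 1 scores for five respondents
    [4, 4, 3, 5, 2],  # item 2
    [5, 5, 2, 4, 2],  # item 3
]
print(round(cronbach_alpha(items), 2))  # prints: 0.93
```

Here the items rise and fall together across respondents, so alpha is high; if responses to the items were unrelated, alpha would fall toward zero.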
The internal consistency of a dimension such as intellectual ability can be assessed using Cronbach\u2019s alpha, a coefficient ranging between 0 and 1.00, which considers how pairs of items relate to one another (co-vary), the variance in the overall measure, and how many items there are (Cronbach, 1951).\r\n\r\nIn addition, since giftedness is a broad-ranging, multidimensional construct that is usually defined to mean more than just intellectual ability, students who score high on the dimension of intellectual ability should also score high on other dimensions of giftedness, such as academic ability (e.g., math and reading proficiency) and creativity (e.g., novel problem solving). Pfeiffer et al. (2008) reported a correlation coefficient of .95 between intellectual ability and academic ability and one of .88 between intellectual ability and creativity using the GRS. The four approaches for assessing reliability that were discussed in this section are summarized in figure 4.4.<a id=\"retfig4.4\"><\/a>\r\n\r\n[caption id=\"attachment_1103\" align=\"aligncenter\" width=\"1024\"]<a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.4.png\"><img class=\"wp-image-1103 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.4-1024x465.png\" alt=\"Figure 4.4. Distinguishing among Techniques Used to Assess Reliability. Image description available.\" width=\"1024\" height=\"465\" \/><\/a> Figure 4.4. 
Distinguishing Among Techniques Used to Assess Reliability [Image description - <a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/back-matter\/appendix-c-figure-descriptions\/#fig4.4\">See Appendix C Figure 4.4<\/a>][\/caption]\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Research on the Net<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n<strong>Inter-Rater Reliability<\/strong>\r\n\r\nFor more information on inter-rater reliability and what Cohen's kappa is and how it is calculated, check out this video by DATAtab: <a href=\"https:\/\/youtu.be\/z4CiQPV0Mgw?si=PSL5ui_fFw1Mid5W\">Cohen's Kappa (Inter-Rater-Reliability)<\/a>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Test Yourself<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ul>\r\n \t<li>What is reliability? Provide an example of a reliable measure used in everyday life.<\/li>\r\n \t<li>What is the main difference between test-retest reliability and split-half reliability?<\/li>\r\n \t<li>What type of reliability renders the same findings provided by two different observers?<\/li>\r\n \t<li>What type of reliability refers to demonstrated associations among multiple items representing a single construct?<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<h3>VALIDITY<\/h3>\r\nPerhaps even more important than ensuring consistency in measurement, we need to be certain that we are measuring the intended construct of interest. <strong>[pb_glossary id=\"638\"]Validity[\/pb_glossary]<\/strong> is a term used by quantitative researchers to refer to the extent to which a study examines what it intends to. Not all reliable measures are valid. We might reliably weigh ourselves with a scale that consistently tells us the wrong weight because the dial was set two kilograms too high.
Similarly, we may depend upon an alarm clock that is consistently ahead of schedule by a few minutes because it was incorrectly programmed. In this section, you will learn about four methods for evaluating the extent to which a given measure is measuring what it is intended to: face validity, content validity, construct validity, and criterion validity.\r\n<h4>Face Validity<\/h4>\r\nFirst, in trying to determine if a measure is a good indicator of an intended construct, we can assess the measure\u2019s face validity. [pb_glossary id=\"343\"]<strong>Face validity<\/strong>[\/pb_glossary] refers to the extent to which an instrument or variable appears on the surface or \u201cface\u201d to be a good measure of the intended construct. Grade point average, for example, appears to be a fairly good measure of a student\u2019s scholastic ability, just as net yearly income seems like a valid measure of financial wealth. Your criterion for determining whether something has face validity is whether the operationalization used is logical. For example, in the case of giftedness, most teachers would agree that children who exhibit very superior intellectual ability (i.e., the ability to reason at high levels) also tend to exhibit very superior academic ability (e.g., the ability to function at higher than normal levels in specific academic areas, such as math or reading).\r\n<h4>Content Validity<\/h4>\r\n<strong>[pb_glossary id=\"245\"]Content validity[\/pb_glossary]<\/strong> refers to the extent to which a measure includes the full range or meaning of the intended construct. To adequately assess your knowledge of the general field of psychology, for example, a test should include a broad range of topics, such as how psychologists conduct research, the brain and mental states, sensation, perception, and learning.
While a person might not score evenly across all areas of psychology (e.g., a student might score 20 out of 20 on the questions related to sensation and perception and only 15 out of 20 on items about research methods), the test result (35\/40) should provide a general measure of knowledge regarding introductory psychology. Similarly, the Gifted Rating Scale discussed earlier is an instrument designed to identify giftedness that includes not only items related to intellectual ability but also content pertaining to the dimensions of academic ability, creativity, leadership, and motivation. This is not to say that a person scoring in the gifted range will achieve the same ratings on all items. For example, it is possible for a gifted child to score in the very superior range for intellectual and academic ability as well as creativity but score only superior for motivation and average for leadership. However, when taken together, the overall (i.e., full-scale) IQ score is 130 or greater for a gifted individual.\r\n<h4>Construct Validity<\/h4>\r\nAnother way to assess validity is through <strong>[pb_glossary id=\"241\"]construct validity[\/pb_glossary]<\/strong>, which examines how closely a measure is associated with other measures that are expected to be related, based on prior theory. For example, Gottfredson and Hirschi\u2019s (1990) general theory of crime rests on the assumption that a failure to develop self-control is at the root of most impulsive and even criminal behaviours. Impulsivity, as measured by school records, such as report cards, should then correspond with other impulsive behaviours, such as deviant and\/or criminal acts. 
If a study fails to show the expected association (e.g., perhaps children who fail to complete assignments or follow rules in school as noted on report cards do not engage in higher levels of criminal or deviant acts relative to children who appear to have more self-control in the classroom), then the measures of missed assignments and an inability to follow rules may not be valid indicators of the construct. That is, the items noted on a report card (for example, incomplete assignments) may be measuring something other than impulsivity, such as academic aptitude, health issues, or attention-deficit problems. In this case, a better school indicator of impulsivity might be self-reported ratings of disruptive behaviour by the students themselves or teachers\u2019 ratings of student impulsivity rather than the behavioural measures listed on a report card. Alternatively, behavioural measures from other areas of a person\u2019s life, such as a history of unstable relationships or a lack of perseverance in employment, may be better arenas for assessing low self-control than the highly monitored and structured early school environment.\r\n<h4>Criterion Validity<\/h4>\r\nFinally, a measure of some construct of interest can be assessed against an external standard or benchmark to determine its worth using what is called <strong>[pb_glossary id=\"265\"]criterion validity[\/pb_glossary]<\/strong>. We can readily anticipate that students who are excelling are also more likely to achieve academic awards, such as scholarships, honours, or distinction, and go on to higher levels of study. Academic ability as measured by grades or grade point averages is predictive of future school and scholastic success. Similarly, consider how most research-methods courses at a university or college have a prerequisite, such as a minimum grade of C\u2013 in a 200-level course. The prerequisite indicates basic achievement in the course.
It is the cut-off for predicting future success in higher-level courses in the same discipline. The prerequisite has criterion validity if most students with the prerequisite end up successfully navigating their way through research methods. All four types of validity are summarized in figure 4.5.<a id=\"retfig4.5\"><\/a>\r\n\r\n[caption id=\"attachment_1104\" align=\"aligncenter\" width=\"1024\"]<a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.5.png\"><img class=\"wp-image-1104 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.5-1024x534.png\" alt=\"Figure 4.5. Distinguishing among Techniques Used to Assess Validity. Image description available.\" width=\"1024\" height=\"534\" \/><\/a> Figure 4.5. Distinguishing among Techniques Used to Assess Validity [Image description - <a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/back-matter\/appendix-c-figure-descriptions\/#fig4.5\">See Appendix C Figure 4.5<\/a>][\/caption]\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Test Yourself<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ul>\r\n \t<li>Are all reliable measures valid? Explain your answer.<\/li>\r\n \t<li>What does it mean to say a measure has face validity?<\/li>\r\n \t<li>What does content validity assess?<\/li>\r\n \t<li>Which type of validity is based on the prediction of future events?<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<h3>Activity: Reliability and Validity<\/h3>\r\n[h5p id=\"24\"]\r\n<h2>RANDOM AND SYSTEMATIC ERRORS<\/h2>\r\nResearchers and research participants are potential sources of measurement error.
Think about the last time you took a multiple-choice test and accidentally entered a response of <em>d <\/em>when you intended to put <em>e<\/em>, or when you rushed to finish an exam and missed one of the items in your answer because you didn\u2019t have time to re-read the instructions or your answers before handing in the test. Similarly, errors occur in research when participants forget things, accidentally miss responses, and otherwise make mistakes completing research tasks. Also, researchers produce inconsistencies in any number of ways, including by giving varied instructions to participants, by missing something relevant during an observation, and by entering data incorrectly into a spreadsheet (where a 1 might become an 11). Errors that result in unpredictable mistakes due to carelessness are called <strong>[pb_glossary id=\"512\"]random errors[\/pb_glossary]<\/strong>. Random errors made by participants can be reduced by simplifying the procedures (e.g., participants make fewer mistakes if instructions are clear and easy to follow and if the task is short and simple). Even researchers\u2019 and observers\u2019 unintentional mistakes can be reduced by using standardized procedures, simplifying the task as much as possible, training observers, and using recording devices or apparatus other than people to collect first-hand data (e.g., replaying an audio recording for verification following an interview). Random errors mostly influence reliability since they work against consistency in measurement.\r\n\r\nIn contrast to random errors, <strong>[pb_glossary id=\"602\"]systematic errors[\/pb_glossary]<\/strong> refer to ongoing inaccuracies in measurement that come about through deliberate effort. For example, a researcher who expects or desires a finding might behave in a manner that encourages such a response in participants.
Expecting a treatment group to perform better than a control group, a researcher might interpret responses more favourably in the treatment group and unjustifiably rate them higher. The use of standardized procedures, such as scripts and objective measures that are less open to interpretation, can help reduce researcher bias. In addition, it might be possible to divide participants into two groups without the researcher being aware of the groups until after the performance scores are recorded.\r\n\r\nStudy participants make other types of intentional errors, including ones resulting from a social desirability bias. Respondents sometimes provide untruthful answers to present themselves more favourably. Just as people sometimes underestimate the number of cigarettes they smoke when asked by a family physician at an annual physical examination, survey respondents exaggerate the extent to which they engage in socially desirable practices (e.g., exercising, healthy eating) and minimize their unhealthy practices (e.g., overuse of non-prescription pain medicine, binge drinking). Researchers using a questionnaire to measure a construct sometimes build in a lie scale along with the other dimensions of interest. For example, in the Eysenck Personality Questionnaire\u2014Revised (EPQR), there are 12 lie-detection items, including the statements \u201cIf you say you will do something, do you always keep your promise no matter how inconvenient it might be?\u201d; \u201cAre all your habits good and desirable ones?\u201d; and \u201cHave you ever said anything bad or nasty about anyone?\u201d A score of 5 or more indicates social desirability bias (Eysenck et al., 1985).\r\n\r\nSimilarly, participants in experimental research sometimes follow what Martin Orne (1962) called demand characteristics or environmental cues, meaning they pick up on hints about what a study is about and then try to help along the researchers and the study by behaving in ways that support the hypothesis. 
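A lie scale such as the one described above is scored by counting answers given in the socially desirable direction. The sketch below is illustrative only: the keyed directions are hypothetical placeholders rather than the actual EPQR keying (which comes from the published instrument), while the threshold of 5 is the one reported above from Eysenck et al. (1985):

```python
# Sketch of lie-scale scoring: count items answered in the socially
# desirable (keyed) direction and flag scores at or above the threshold.
def lie_scale_score(responses, keyed):
    """Count items where the response matches the keyed direction."""
    return sum(resp == key for resp, key in zip(responses, keyed))

# Hypothetical keyed directions for a 12-item lie scale (placeholders).
keyed = ["yes"] * 6 + ["no"] * 6
# One respondent's hypothetical answers.
responses = ["yes", "yes", "yes", "no", "no", "yes",
             "no", "no", "yes", "yes", "no", "no"]

score = lie_scale_score(responses, keyed)
flagged = score >= 5  # threshold reported in the text
print(score, flagged)  # prints: 8 True
```

A flagged respondent's other answers would then be interpreted with caution, since they appear to be presenting themselves in an unrealistically favourable light.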
Systematic errors influence validity since they reduce the odds that a measure gauges what it is truly intended to measure.\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Test Yourself<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ul>\r\n \t<li>Who is a potential source of error in measurement?<\/li>\r\n \t<li>Which main form of error can be reduced by simplifying the procedures in a study?<\/li>\r\n \t<li>What is the term for the bias that results when respondents try to answer in the manner that makes them look the most favourable?<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<h2>RIGOUR IN QUALITATIVE RESEARCH<\/h2>\r\nWhile it is important for anyone learning about research to understand the centrality of reliability and validity criteria for assessing measurement instruments, it is also imperative to note that much of what has been discussed in this chapter pertains mainly to quantitative research that is based in the positivist paradigm. Qualitative research, largely based in the interpretative and critical paradigms, is aimed at understanding socially constructed phenomena in the contexts in which they occur at a specific point in time. It is therefore less concerned with the systematic reproducibility of data. In many cases, statements provided by research participants or processes studied cannot be replicated to assess reliability. Similarly, if we are to understand events from the point of view of those experiencing them, validity is really in the eyes of the individual actor for whom that understanding is real. That is not to say reliability and validity are not relevant in qualitative research; in fact, if we conclude that these constructs are not applicable to qualitative research, then we run the risk of suggesting that qualitative inquiry is without <strong>[pb_glossary id=\"542\"]rigour[\/pb_glossary]<\/strong>. As defined by Gerard A. Tobin and Cecily M.
Begley (2004), \u201crigour is the means by which we show integrity and competence; it is about ethics and politics, regardless of the paradigm\u201d (p. 390). This helps to legitimize the qualitative research process.\r\n\r\nJust as various forms of reliability and validity are used to gauge the merit of quantitative research, other criteria such as rigour, credibility, and dependability can be used to establish the trustworthiness of qualitative research. <strong>[pb_glossary id=\"263\"]Credibility[\/pb_glossary] <\/strong>(comparable to validity) has to do with how well the research tells the story it is designed to tell. For example, in the case of interview data, this pertains to the goodness of fit between a respondent\u2019s actual views of reality and a researcher\u2019s representations of them (Tobin &amp; Begley, 2004). Credibility can be enhanced through the thoroughness of a literature review and open coding of data. For example, in the case of a qualitative interview, the researcher should provide evidence of how conclusions were reached. <strong>[pb_glossary id=\"291\"]Dependability[\/pb_glossary]<\/strong>, the qualitative counterpart to reliability, \u201cis achieved through a process of auditing\u201d (Tobin &amp; Begley, 2004, p. 392). Qualitative researchers ensure their research processes, decisions, and interpretations can be examined and verified by other interested researchers through <strong>[pb_glossary id=\"190\"]audit trails[\/pb_glossary]<\/strong>. Audit trails are carefully documented paper trails of an entire research process, including research decisions such as theoretical clarifications made along the way.
Transparency, detailed rationale, and justifications all help to establish the reliability and dependability of findings (Liamputtong, 2013).\r\n\r\nSimilarly, while questions of measurement and the operationalization of variables may not apply to qualitative research, questions concerning how the research process was undertaken are essential. For example, in a study using in-depth interviews, were the questions posed to the respondents in a culturally sensitive manner that was readily understood by them? Did the interview continue until all important issues were fully examined (i.e., saturation was reached)? Were the researchers appropriately reflective in considering their own subjectivity and how it may have influenced the questions asked, the impressions they formed of the respondents, and the conclusions they reached from the findings (Hennink et al., 2011)? Qualitative researchers acknowledge subjectivity and accept researcher bias as an unavoidable aspect of social research. Certain topics are examined specifically because they interest the researchers! To reconcile biases with empirical methods, qualitative researchers openly acknowledge their preconceptions and remain transparent and reflective about the ways in which their own views may influence research processes.\r\n<h3>Achieving Rigour through Triangulation<\/h3>\r\nOne of the main ways rigour is achieved in qualitative research is by using <strong>[pb_glossary id=\"630\"]triangulation[\/pb_glossary]<\/strong>. Triangulation is the use of multiple methods to establish what can be considered the qualitative equivalent of reliability and validity (Willis, 2007). For example, we can be more confident in data collected on aggressive behavioural displays in children if data obtained from field notes taken during observations closely corresponds with interview statements made by the children themselves.
We can also be more confident in the findings when multiple sources converge (i.e., <strong>[pb_glossary id=\"280\"]data triangulation[\/pb_glossary]<\/strong>), as might be the case if the children, teachers, and parents all say similar things about the behaviour of those being studied. Since the data comes from various sources with different perspectives, the data itself can also exist in a variety of forms, from comments made by parents and teachers, to actions undertaken by children, to school records and other documents, such as report cards.\r\n<h3>Other Means for Establishing Rigour<\/h3>\r\nVarious alternative strategies to triangulation that help to establish rigour in qualitative studies include the use of member checks, prolonged time spent with research participants in a research setting, peer debriefing, and audit checking (Liamputtong, 2013; Willis, 2007). <strong>[pb_glossary id=\"420\"]Member checks[\/pb_glossary]<\/strong> are attempts by a researcher to validate emerging findings by testing out their accuracy with the originators of that data while still in the field. For example, researchers might share observational findings with the group being studied to see if the participants concur with what is being said. It helps to validate the data if the participants agree that their perspective is being appropriately conveyed by the data. Whenever I conduct interviews with small groups (called focus groups, as discussed in chapter 9), I share the preliminary findings with the group and ask them whether the views I am expressing capture what they feel is important and relevant, given the research objectives. 
I also ask whether the statements I\u2019ve provided are missing any information that they feel should be included to more fully explain their views or address their concerns about the topic.\r\n\r\nQualitative researchers also gain a more informed understanding of the individuals, processes, or cultures they are studying if they spend prolonged periods of time in the field. Consider how much more you know about your fellow classmates at the end of term compared to what you know about the group on the first day of classes. Similarly, over time, qualitative researchers learn more and more about the individuals and processes of interest once they gain entry to a group, establish relationships, build trust, and so on. Time also aids in triangulation as researchers are better able to verify information provided as converging sources of evidence are established.\r\n\r\nIn addition to spending long periods of time in the field and testing findings via their originating sources, qualitative researchers also substantiate their research by opening it up to the scrutiny of others in their field.<strong> [pb_glossary id=\"468\"]Peer debriefing[\/pb_glossary] <\/strong>involves attempts to authenticate the research process and findings through an external review provided by another qualitative researcher who is not directly involved in the study (Creswell &amp; Creswell, 2018). This process helps to verify the procedures undertaken and substantiate the findings, lending overall credibility to the study. 
Note that reflexivity and other features underlying ethnographic research are discussed in detail in chapter 10, while multiple methods and mixed-methods approaches are the subject matter of chapter 11.\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Test Yourself<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ul>\r\n \t<li>What is the qualitative term for validity?<\/li>\r\n \t<li>How do qualitative researchers ensure their research processes and conclusions reached can be verified by other researchers?<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<h2>CHAPTER SUMMARY<\/h2>\r\n<ol>\r\n \t<li><strong>Describe the main components of a research design.\r\n<\/strong>A research design details the main components of a study, including who (the unit of analysis), what (the attitudes or behaviours under investigation), where (the location), when (at one or multiple points in time), why (e.g., to explain), and how (the specific research method used).<\/li>\r\n \t<li><strong>Explain what conceptualization and operationalization processes entail.<\/strong>\r\nConceptualization is the process whereby a researcher explains what a concept, such as family, or a construct, like social inequality, means within a research project. Operationalization is the process whereby a concept or construct is defined so precisely it can be measured in a study. For example, financial wealth can be operationalized as net yearly income in dollars.<\/li>\r\n \t<li><strong>Explain how the purpose of a variable is directly related to how it is measured in research.\r\n<\/strong>Variables are measured at the nominal, ordinal, interval, and ratio levels. The nominal level of measurement is used to classify cases, while the ordinal level has the properties of classification and rank order.
The interval level provides the ability to classify, order, and make precise comparisons as a function of equal intervals. The ratio level includes the previous properties plus a true zero. An index is a composite measure of a construct comprising several different indicators that produce a shared outcome, while a scale is a composite measure of a construct consisting of several different indicators that stem from a common cause.<\/li>\r\n \t<li><strong>Outline the main techniques used to assess reliability and validity.\r\n<\/strong>Reliability refers to consistency in measurement. <em>Test-retest<\/em> reliability examines consistency between the same measures for a variable at two different times using a correlation coefficient. <em>Inter-rater<\/em> reliability examines consistency between the same measures for a variable of interest provided by two different raters, often using Cohen\u2019s kappa. <em>Split-half<\/em> reliability examines consistency between the two halves of the measures for a variable of interest. <em>Inter-item<\/em> reliability involves demonstrated associations among multiple items representing a single construct. Validity refers to the extent to which a measure is a good indicator of the intended construct. <em>Face <\/em>validity refers to the extent to which an instrument appears to be a good measure of the intended construct. <em>Content <\/em>validity assesses the extent to which an instrument contains the full range of content pertaining to the intended construct. <em>Construct<\/em> validity assesses the extent to which an instrument is associated with other logically related measures of the intended construct.
<em>Criterion<\/em> validity assesses the extent to which an instrument holds up to an external standard, such as the ability to predict future events.<\/li>\r\n \t<li><strong>Distinguish between random and systematic errors.\r\n<\/strong>Random errors are unintentional and usually result from careless mistakes, while systematic errors result from intentional bias. Sources of both types of errors include participants, researchers, and observers in a study. Errors can be reduced through training, the use of standardized procedures, and the simplification of tasks.<\/li>\r\n \t<li><strong>Explain how rigour is achieved in qualitative research.\r\n<\/strong>Rigour refers to a means for demonstrating integrity and competence in qualitative research. Rigour can be achieved using triangulation, member checks, extended experience in an environment, peer review, and audit trails.<\/li>\r\n<\/ol>\r\n<h2>RESEARCH REFLECTION<\/h2>\r\n<ol>\r\n \t<li>Suppose you want to conduct a quantitative study on the success of students at the post-secondary institution you are currently attending. List five variables that you think would be relevant for inclusion in the study. Generate one hypothesis you could test using two of the variables you\u2019ve listed above. Operationalize the variables you included in your proposed hypothesis.<\/li>\r\n \t<li>Studies on the health of individuals often operationalize health as self-reported health using these five fixed response categories: poor, fair, good, very good, and excellent. What level of measurement is this? Provide an example of health operationalized into two categories measured at the nominal level and three categories at the ordinal level. Is it possible to measure health at the interval level? Justify your answer.<\/li>\r\n \t<li>Consider some of the variables that can be used to examine the construct of scholastic ability (e.g., grades, awards, and overall grade point average). 
Which measure do you think best represents scholastic ability? Is the measure reliable and\/or valid? Defend your answer with examples that reflect student experiences.<\/li>\r\n \t<li>Define the construct of honesty and come up with an indicator that could be used to gauge honesty. Compare your definition and indicator with those of at least three other students in the class. Are the definitions similar? Consider how each definition reflects a prior conceptualization process.<\/li>\r\n<\/ol>\r\n<h2>LEARNING THROUGH PRACTICE<\/h2>\r\nObjective: To construct an index for students at risk for degree incompletion\r\n\r\nDirections:\r\n<ol>\r\n \t<li>Item selection: Develop 10 statements that can be answered with a forced-choice response of <em>yes <\/em>or <em>no<\/em>, where <em>yes<\/em> responses will receive 1 point and <em>no<\/em> responses will be awarded 0 points. Select items that would serve as good indicators of students at risk for failing to complete their program of study. Think of behaviours or events that would put a student at risk for dropping out or being asked to leave a program, such as failing a required course. Make sure your items are one-dimensional (i.e., they only measure one behaviour or attitude).<\/li>\r\n \t<li>Try out your index on a few of your classmates to see what scores you obtain for them. Is there any variability in the responses? Do some students score higher or lower than others?<\/li>\r\n \t<li>Come up with a range of scores you feel represent no risk, low risk, moderate risk, and high risk. Justify your numerical scoring.<\/li>\r\n<\/ol>\r\n<h2>RESEARCH RESOURCES<\/h2>\r\n<ol>\r\n \t<li>For more information on the four types of validity discussed in this chapter, see Middleton, F. (2023, June 22). <a href=\"https:\/\/web.archive.org\/web\/20240712181655\/https:\/\/www.scribbr.com\/methodology\/types-of-validity\/\">The 4 types of validity in research. Definitions and examples<\/a>. 
<em>Scribbr.<\/em><\/li>\r\n \t<li>To learn more about rigour in research, refer to <a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/2022-12-07_feast-centre_rigour-in-research.pdf\">\"The Feast Centre Learning Series: 'Rigour' in Research Proposals\"<\/a> at McMaster University.<\/li>\r\n \t<li>For an in-depth look at scale development, see DeVellis, R. F. and Thorpe, C. T. (2022). <em><a href=\"https:\/\/search.worldcat.org\/en\/title\/1342443149\">Scale development: Theory and applications<\/a> <\/em>(5th ed.). Sage.<\/li>\r\n \t<li>To learn about a new online gambling index based on 12 items, see Auer, M. et al. (2024). <a href=\"https:\/\/doi.org\/10.1177\/01632787231179460\">Development of the Online Problem Gaming Behavior Index.<\/a> <em>Evaluation &amp; the Health Professions, 47<\/em>(1), 81-92.<\/li>\r\n<\/ol>","rendered":"<p style=\"padding-left: 40px;\"><em>In any discussion about improving measurement, it is important to begin with basic questions. What exactly are we trying to measure, and why? <\/em><\/p>\n<p class=\"hanging-indent\" style=\"padding-left: 80px;\">\u2014 Christine Bachrach, 2007, p.
435<\/p>\n<div class=\"textbox textbox--learning-objectives\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Learning Objectives<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>After reading this chapter, students should be able to do the following:<\/p>\n<ol>\n<li>Describe the main components of a research design.<\/li>\n<li>Explain what conceptualization and operationalization processes entail.<\/li>\n<li>Explain how the purpose of a variable is directly related to how it is measured in research.<\/li>\n<li>Outline the main techniques used to assess reliability and validity.<\/li>\n<li>Distinguish between random and systematic errors.<\/li>\n<li>Explain how rigour is achieved in qualitative research.<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<h2>INTRODUCTION<\/h2>\n<p>After carefully considering research foundations, the importance of theory, and the ethics involving research with humans, you are almost ready to delve into the techniques used for obtaining answers to social research questions. But before you can start collecting data, you need to develop a research plan that outlines who or what it is you are studying and how you will go about measuring and evaluating the attitudes, behaviours, or processes that you want to learn more about.<\/p>\n<h2>MAIN COMPONENTS OF A RESEARCH DESIGN<\/h2>\n<p>Linked to a specific research question, a <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_526\">research design<\/a><\/strong> \u201cis the plan or blueprint for a study and includes the who, what, where, when, why, and how of an investigation\u201d (Hagan, 2021, Chapter 3, para 1.). Beginning with the \u201cwho,\u201d researchers in the social sciences most often study people, so individuals are usually the focus of investigation, called the <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_632\">unit of analysis<\/a><\/strong>. 
Individuals are often studied as part of a collective or group. For example, the unit of analysis could be students, employees, single-parent-headed families, low-income earners, or some other group of interest. Researchers also compare groups of individuals along particular variables of interest, such as a sample of individuals with less than a high school education versus those who have a grade 12 diploma, single- versus dual-income families, or patients who completed a treatment program versus ones who dropped out. Social institutions and organizations that guide individuals and groups can also be the units of analysis for research, such as the university you are attending, a not-for-profit agency such as the Canadian Red Cross, or a healthcare organization such as the Canadian Medical Association. Finally, social researchers are sometimes interested in artifacts created by people rather than people themselves. For example, researchers might examine news articles, television shows, motion pictures, profiles on an online dating site, YouTube videos, Facebook postings, or X tweets.<\/p>\n<p>The \u201cwhat\u201d component of research design refers to whatever is specifically examined and measured in a study, such as attitudes, beliefs, views, behaviours, outcomes, and\/or processes. The measured component is usually referred to as the \u201cunit of observation\u201d since this is what the data is collected on. For example, a researcher might be interested in factors affecting instructors\u2019 views on published ratings of instruction. Instructors at a university who take part in the study constitute the unit of analysis from whom the views on published ratings are obtained (i.e., the more specific focus of the research that comprises the data). Similarly, a researcher might be interested in dating preferences of individuals who use online matchmaking sites such as eharmony or EliteSingles. 
The individuals seeking dates online are the units of analysis, while their posted profiles containing the characteristics of interest in the study are the units of observation.<\/p>\n<p>The \u201cwhere\u201d pertains to the location for a study. The possibilities are endless, from research conducted first-hand in the field to studies conducted in a public setting such as a coffee shop or a private one such as the home of a participant. The location for a study is closely linked to the unit of analysis since the researcher often needs to go to the individuals, groups, or organizations to collect information from them or about them. For example, a researcher interested in interviewing couples who met online might set up appointments to visit dating couples in restaurants or coffee shops near to where the couples reside, at the discretion of the participants. Alternatively, a researcher interested in online dating relationships might choose to gather information in a virtual environment by posting a survey on the internet.<\/p>\n<p>The \u201cwhen\u201d relates to an important time consideration in the research design. Some studies are conducted once at a single point in time, and others are carried out at multiple points over time. Most often, when social researchers carry out a study, it takes place at a single point in time or within a single time frame, such as when a researcher develops a questionnaire about internet dating and administers it to a group of people attending a singles\u2019 social function. This is called <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_269\">cross-sectional research<\/a><\/strong> because the study is made up of a cross-section of the dating population taken at a point in time, like taking a photo that captures a person or group at a single point in time. 
Alternatively, social researchers sometimes study individuals or groups at multiple points in time using what is called <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_412\">longitudinal research<\/a><\/strong>. For example, a sample of dating couples might be surveyed shortly after they meet and again after dating for one year to capture their initial viewpoints and to see how their perceptions change once they get to know each other better. Four longitudinal designs are discussed below.<\/p>\n<p>A <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_458\">panel study<\/a><\/strong> is a longitudinal study in which a researcher observes the same people, group, or organization at multiple points in time. A new phase of a large-scale panel study called Alberta\u2019s Tomorrow Project was launched in 2017 to learn more about the causes of cancer. The study, active since 2000, includes 55,000 cancer-free Albertans (Alberta\u2019s Tomorrow Project, 2024b). Researchers from Alberta\u2019s Tomorrow Project collect information on participants\u2019 health and lifestyle through surveys and the occasional collection of other specimens, such as blood samples. Going back to the dating example, by collecting information on the same dating couples at various intervals, a researcher could use a panel study to examine how relations evolve or change over time.<\/p>\n<p>A <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_218\">cohort study<\/a><\/strong> is like a panel study, except, rather than focusing on the same people, it focuses on comparable people over time. \u201cComparable\u201d refers to a general category of people who are deemed to share a similar life experience in a specified period. The most common example of this is a birth cohort. Think about the classmates you went through school with. 
Although there were individual differences between you and your classmates, you also shared some common life events, such as the music that topped the charts at that time, the clothing fads and hairstyles, and the political events that occurred during that era. Following the earlier example, a researcher might treat people who met online and married their dating partner in 2020, as Covid-19 was unfolding, as a unit of analysis. Several couples might be studied in 2020 shortly after they were married, several other couples who also married in 2020 might be studied in 2021, another group in 2022, another group in 2023, and so on over a period of five years.<\/p>\n<p>A <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_622\">time-series study<\/a><\/strong> is a longitudinal study in which a researcher examines different people at multiple points in time. Every year, Statistics Canada gathers information from thousands of Canadians using what is called the General Social Survey. Although the participants are different each time, similar forms of information are gathered over time to detect patterns and trends. For example, we can readily discern that Canadians today are delaying marriage (i.e., getting married for the first time at a later age) and they are having fewer children relative to 20 years ago. Going back to the online dating example used throughout this section, a researcher might use a time-series study in order to gather information from new dating couples at an online site each year for five years. Looking at the data over time, a researcher would be able to determine if there are changes in the overall profiles of online daters at that site.
For example, the use of the site might be increasing for groups such as highly educated women or single individuals over the age of 60 years.<\/p>\n<p>Lastly, a<strong> <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_204\">case study<\/a><\/strong> is a research method in which a researcher focuses on a small number of individuals or even a single person over an extended period. You\u2019ll learn more about this method in Chapter 11. For now, you can think of a case study as a highly detailed study of a single person, group, or organization over time. For example, a researcher might study an alcoholic to better understand the progression of the disease over time, or a researcher might join a subculture, such as a Magic card club that meets every Friday night, to gain an insider\u2019s perspective of the group. Similarly, a researcher could examine the experiences of a frequent online dater over time to get a sense of how online dating works at various sites. See figure 4.1 for an overview of time considerations in relation to units of analysis.<a id=\"retfig4.1\"><\/a><\/p>\n<figure id=\"attachment_1100\" aria-describedby=\"caption-attachment-1100\" style=\"width: 1022px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1100 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1-1022x1024.png\" alt=\"Figure 4.1. Time Considerations by Units of Analysis. 
Image description available.\" width=\"1022\" height=\"1024\" srcset=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1-1022x1024.png 1022w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1-300x300.png 300w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1-150x150.png 150w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1-768x769.png 768w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1-1533x1536.png 1533w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1-2044x2048.png 2044w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1-65x65.png 65w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1-225x225.png 225w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.1-350x351.png 350w\" sizes=\"auto, (max-width: 1022px) 100vw, 1022px\" \/><\/a><figcaption id=\"caption-attachment-1100\" class=\"wp-caption-text\">Figure 4.1. 
Time Considerations by Units of Analysis [Image description &#8211; <a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/back-matter\/appendix-c-figure-descriptions\/#fig4.1\">See Appendix C Figure 4.1<\/a>]<\/figcaption><\/figure>\n<div class=\"textbox textbox--key-takeaways\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Research in Action<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p class=\"indent\"><strong>Longitudinal Panel Study on Smokers\u2019 Perceptions of Health Risks<\/strong><\/p>\n<p class=\"indent\">Cho and colleagues (2018) examined changes in smokers\u2019 perceptions about health risks following amendments to warning labels on cigarette packages using a longitudinal panel study. Four thousand six hundred and twenty-one smokers from Canada, Australia, Mexico, and the United States were surveyed every four months for a total of five times to determine if their knowledge of health risks increased after toxic constituents were added to pictorial health warning labels. 
Results showed that knowledge of toxic constituents such as cyanide and benzene increased over time and was associated with a stronger perceived risk of vulnerability to smoking-related diseases, including bladder cancer and blindness, for participants in Canada, Australia, and Mexico, but not the United States (Cho et al., 2018).<\/p>\n<figure id=\"attachment_1087\" aria-describedby=\"caption-attachment-1087\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1087 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.1-1024x683.jpg\" alt=\"Cigarettes showing warning labels and ashtray with butts.\" width=\"1024\" height=\"683\" srcset=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.1-1024x683.jpg 1024w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.1-300x200.jpg 300w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.1-768x512.jpg 768w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.1-1536x1024.jpg 1536w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.1-2048x1365.jpg 2048w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.1-65x43.jpg 65w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.1-225x150.jpg 225w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.1-350x233.jpg 350w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption id=\"caption-attachment-1087\" class=\"wp-caption-text\">Image 4.1 Warning labels are designed to increase smokers\u2019 knowledge of health risks.<\/figcaption><\/figure>\n<\/div>\n<\/div>\n<p>In 
addition to the time dimension, a research design also includes the \u201cwhy\u201d and the \u201chow\u201d plan for a study. \u201cWhy\u201d relates to the purpose of a study as discussed in chapter 1. A study on internet dating could be focused on exploring ways in which people represent themselves in posted profiles; describing who uses internet dating sites; explaining the importance of including certain types of information, such as appearance or personality factors, for attracting dates; or evaluating the effectiveness of a given site in matching suitable partners. The focus of the study depends on the interests of the researcher and the nature of the issue itself.<\/p>\n<p>\u201cHow\u201d refers to the specific method or methods used to gather the data examined in a study. With an interest in the merit of including certain items in a dating profile, a researcher might opt for an experimental design to explain, for example, whether certain characteristics in posted dating profiles are better than others for eliciting potential dates (see chapter 6 for a discussion of different types of experiments). An experiment is a special type of quantitative research design in which a researcher deliberately alters something to see what effect it has. In the dating example, an experimenter might compare the number of responses to two profiles posted at an internet dating site. If the profiles are identical except for one trait\u2014for example, one profile might contain an additional statement saying the person has a great sense of humour\u2014it would be possible to determine the importance of that trait in attracting potential partners.<\/p>\n<p>Recall from chapter 1 that the specific methods or techniques differ depending on whether a researcher adopts a qualitative or quantitative orientation, and the approach itself links back to the research interest. 
A researcher interested in learning more about why people make dates online or how people define themselves online is more apt to use a qualitative approach. Qualitative researchers tend to engage in research that seeks to better understand or explain some phenomenon using field research and in-depth interviews, as well as strategies such as discourse analysis and content analysis that help uncover meaning. Regardless of the approach taken and specific methods used, all researchers must work through various design considerations and measurement issues in their quest to carry out scientific research. Similarly, both qualitative and quantitative researchers undertake a process of conceptualization and measurement\u2014it just occurs differently and at different stages within the overall research process, as discussed in the next section.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Test Yourself<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<ul>\n<li>What does a research design inform us about?<\/li>\n<li>What is the main difference between cross-sectional and longitudinal research?<\/li>\n<li>What is the name for research on the same unit of analysis carried out at multiple points in time?<\/li>\n<li>In what ways will a quantitative research design differ from a qualitative one?<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h2>CONCEPTUALIZATION AND OPERATIONALIZATION<\/h2>\n<p>Researchers in the social sciences frequently study social issues, social conditions, and social problems that affect individuals, such as environmental disasters, legal policy, crime, healthcare, poverty, divorce, marriage, or growing social inequality between the rich and poor, within a conceptual framework. Their quest is to explore, describe, explain, or evaluate the experiences of individuals or groups. 
A conceptual framework is the narrower, more concentrated area of interest used to study the social issue or problem; it includes the main objects, events, or ideas, along with the relevant theories that spell out the relationships among them. Terms like <em>family<\/em> or <em>crime<\/em> are broad <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_228\">concepts<\/a> <\/strong>that refer to abstract mental representations used to signify important elements of our social world. Concepts are very important because they form the basis of social theory and research in the social sciences. However, as abstract representations, concepts like family can be vague since they mean different things to different people. Consider what the concept of family means to you. Does your notion of a family include aunts and uncles or second cousins? What about close friends? How about pets? People have very divergent views about what a family is or is not. The concept of family also has very different meanings depending on the context in which it is applied. For example, there are rules about who is or is not considered eligible to be sponsored under family status for potential immigration to Canada, or who may or may not be deemed family for visitation rights involving prisoners under the supervision of the Correctional Service of Canada. 
Because concepts are broad notions that can take on various meanings, researchers need to carefully define the concepts (or constructs) that underlie their research interests as part of the conceptualization and operationalization process.<\/p>\n<figure id=\"attachment_2458\" aria-describedby=\"caption-attachment-2458\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2458 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/4.2-replacement-_families-pexels-mikhail-nilov-6530733-1024x752.jpg\" alt=\"Couple with a dog.\" width=\"1024\" height=\"752\" srcset=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/4.2-replacement-_families-pexels-mikhail-nilov-6530733-1024x752.jpg 1024w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/4.2-replacement-_families-pexels-mikhail-nilov-6530733-300x220.jpg 300w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/4.2-replacement-_families-pexels-mikhail-nilov-6530733-768x564.jpg 768w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/4.2-replacement-_families-pexels-mikhail-nilov-6530733-1536x1128.jpg 1536w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/4.2-replacement-_families-pexels-mikhail-nilov-6530733-65x48.jpg 65w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/4.2-replacement-_families-pexels-mikhail-nilov-6530733-225x165.jpg 225w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/4.2-replacement-_families-pexels-mikhail-nilov-6530733-350x257.jpg 350w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/4.2-replacement-_families-pexels-mikhail-nilov-6530733.jpg 2014w\" 
sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption id=\"caption-attachment-2458\" class=\"wp-caption-text\">Image 4.2 The concept of a family differs depending on the context in which it is applied.<\/figcaption><\/figure>\n<p><strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_230\">Conceptualization<\/a><\/strong>, in research, is the process where a researcher explains what a concept or construct means in terms of a research project. For example, a researcher studying the impact of education cutbacks on the family, with an interest in the negative implications for children, might adopt the broad conceptualization of family provided by Statistics Canada (2021):<\/p>\n<blockquote><p>Census family is defined as a married couple and the children, if any, of either and\/or both spouses; a couple living common law and the children, if any, of either and\/or both partners; or a parent of any marital status in a one-parent family with at least one child living in the same dwelling and that child or those children. All members of a particular census family live in the same dwelling. Children may be biological or adopted children regardless of their age or marital status as long as they live in the dwelling and do not have their own married spouse, common-law partner or child living in the dwelling. Grandchildren living with their grandparent(s) but with no parents present also constitute a census family. (para. 1)<\/p><\/blockquote>\n<p>Conceptualization is essential since it helps us understand what is or is not being examined in a study. 
For example, based on the conceptualization provided above, children being raised by their grandparents or living in homes headed by single parents would be included in the study, but they might have otherwise been missed if a more traditional definition of family were employed.<\/p>\n<p>The term <em>concept<\/em> is often used interchangeably with a similar term, <em>construct<\/em>. Both refer to mental abstractions; however, concepts derive from tangible or factual observations (e.g., a family exists), while <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_239\">construct<\/a>s<\/strong> are more hypothetical or \u201cconstructed\u201d ideas that are part of our thinking but do not exist in readily observable forms (e.g., love, honesty, intelligence, motivation). While we can readily measure and observe crime through different acts that are deemed criminal (the concept crime), we infer intelligence through measures such as test scores (the construct intelligence). Both concepts and constructs are the basis of theories and are integral components underlying social research. Like concepts, constructs undergo a process of conceptualization when they are used in research. Suppose a researcher is interested in studying social inequality, which we commonly understand to mean differences among groups of people in terms of their access to resources such as education or healthcare. We know some people can afford private schools, while others cannot. We understand that healthcare benefits differ depending on factors such as how old a person is, how much one pays for a health plan, what type of job one has, and where one lives. Social inequality is a construct that conjures up a wide range of examples and notions. 
To examine social inequality in a specific study, a researcher might opt to use personal finances as an <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_382\">indicator<\/a><\/strong> of social inequality. An indicator is \u201ca measurable quantity which \u2018stands in\u2019 or substitutes, in some sense, for something less readily measurable\u201d (Sapsford, 2006, para. 1). Personal finances can be further specified as employment income, since most of the working population is able to state how much they earn in dollars.<\/p>\n<p>Note that in the case of employment income, people earn a certain amount of money (i.e., gross pay), but they receive a different amount as their take-home pay after taxes and various deductions come off (i.e., net pay). The process of taking an abstract construct, such as social inequality, and refining it into something tangible that can be measured with precision, such as net income, is known as operationalization. <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_450\">Operationalization<\/a><\/strong> is the process whereby a concept or construct is defined so precisely that it can be measured. In this example, financial wealth was operationalized as net yearly employment income in dollars. Note that once a construct such as social inequality has been clarified as financial wealth and then measured in net yearly income dollars, we are now working with a variable, since net yearly income is something that can change and differ between people. Quantitative researchers examine variables. A researcher interested in the implications of social inequality might test the hypothesis that among individuals who work full time, those with low net yearly incomes will report poorer health compared to people with high net yearly incomes. 
In this example, health (the second variable) might be operationalized into the self-reported ratings of very poor, poor, fair, good, or very good.<\/p>\n<p>In contrast, qualitative researchers tend to define concepts based on the users\u2019 own frameworks. Qualitative researchers are less concerned with proving that certain variables affect the individual. Instead, they are more concerned with <em>how<\/em> individuals make sense of their own social situations and <em>what<\/em> the broader social factors are for such framing.<\/p>\n<div class=\"textbox textbox--key-takeaways\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Research in Action<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p><strong>Online Dating and Variables<\/strong><\/p>\n<p>Students routinely have difficulty understanding what variables are, let alone how to explore research questions that contain them. A starting point is to examine what is contained in a typical \u201cprofile\u201d posted on any internet dating site, such as EliteSingles or eharmony. Registered members list certain attributes they feel best describe themselves and that may also be helpful in attracting a compatible dating partner, such as their age, physical features, and certain personality traits. For example, someone might advertise as a single male, 29 years of age, 5&#8217;11&#8221; tall, in great shape, with a good sense of humour, seeking men between the ages of 25 and 35 for friendship. These attributes are variables, many of which are routinely used in social research! Variables are often defined using categories. For example, the word <em>single<\/em> refers to a category of the variable \u201cmarital status.\u201d Marital status is a variable since it is a property that can differ between individuals and change over time, as in single, common-law, married, separated, widowed, or divorced. 
Other variables listed on dating sites include age (in years), gender and preference (e.g., man interested in men), height (in feet and inches), body type (slim, fit, average), eye colour (e.g., green, blue, or brown), and astrological sign (e.g., Virgo, Libra, Scorpio). Even the purpose or intent of the posting constitutes a variable, since a person may state whether they are seeking fun, friendship, dating, or a long-term relationship.<\/p>\n<\/div>\n<\/div>\n<figure id=\"attachment_1085\" aria-describedby=\"caption-attachment-1085\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1085 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.3-1024x683.jpg\" alt=\"Couple making heart hand sign.\" width=\"1024\" height=\"683\" srcset=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.3-1024x683.jpg 1024w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.3-300x200.jpg 300w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.3-768x512.jpg 768w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.3-1536x1024.jpg 1536w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.3-65x43.jpg 65w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.3-225x150.jpg 225w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.3-350x233.jpg 350w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Photo-4.3.jpg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption id=\"caption-attachment-1085\" class=\"wp-caption-text\">Image 4.3 Personal profiles on dating sites are 
largely made up of variables.<\/figcaption><\/figure>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Test Yourself<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<ul>\n<li>What is a concept? Provide an example of one that is of interest to social researchers.<\/li>\n<li>Why is conceptualization important to researchers?<\/li>\n<li>What is an indicator? Provide an example of one that could be used in a study on aggression.<\/li>\n<li>What are three variables you could examine in a study of online dating relationships?<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h2>MEASURING VARIABLES<\/h2>\n<p>Now that you better appreciate what is meant by a variable, you can also start to see how variables are operationalized in different ways. Some variables are numerical, such as age or income, while others pertain more to categories and descriptors, such as marital status or perceived health status. Decisions made about how to clarify a construct such as health have important implications for other stages of the research process, including what kind of analyses are possible and what kind of interpretations can be made. For example, the categories for the self-reported health variable described above can tell us whether someone perceives their health to be better or worse than someone else\u2019s (e.g., \u201cvery good\u201d is better than \u201cgood,\u201d while \u201cfair\u201d is worse than \u201cgood\u201d but better than \u201cpoor\u201d). However, from how this variable is measured, we are unable to ascertain how much worse or how much better someone\u2019s health is relative to another person\u2019s.<\/p>\n<h3>Levels of Measurement<\/h3>\n<p>Variables mean different things and can be used in different ways, depending on how they are measured. 
At the lowest level, called the <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_436\">nominal level<\/a><\/strong>, we can classify or label cases, such as persons, according to marital status, eye colour, religion, or the presence of children. These are all qualitative variables. Even if we assign numbers to the categories for marital status, where 1 = single, 2 = common-law, 3 = married, 4 = separated, 5 = widowed, and 6 = divorced, we have not quantified the variable. This is because the numbers serve only to identify the categories so that a <em>6 <\/em>now represents anyone who is currently \u201cdivorced.\u201d The numbers themselves are arbitrary; however, they serve the function of classification, which simply indicates that members of one category are different from another category.<\/p>\n<p>At the next level of measurement, called the <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_454\">ordinal level<\/a><\/strong>, we can classify and order categories of the variables of interest, such as people\u2019s perceived health into levels, job satisfaction into ratings, or prestige into rankings. Note that these variables are measured as more or less, or higher or lower amounts, of some dimension of interest. The variable health, then, measured as very good, good, fair, poor, and very poor, is an ordered variable since we know that very good is higher than good and therefore indicates better health. However, as noted earlier, we cannot determine precisely what that means in terms of how much healthier someone is who reports very good health. 
Ordinal variables are also qualitative in nature.<\/p>\n<p>At the next highest level of measurement, called the <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_400\">interval level<\/a><\/strong>, we can classify, order, and examine differences between the categories of the variables of interest. This is possible because the assigned scores include equal intervals between categories. For example, with temperature as a main variable, we know that 28\u00b0C is exactly one degree higher than 27\u00b0C, which is 7 degrees higher than 20\u00b0C, and so on.<\/p>\n<p>Statisticians sometimes make a further distinction between an interval and <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_516\">ratio level<\/a><\/strong> of measurement. Both levels include meaningful distance between categories, as well as the properties from the lower levels. However, a true zero exists only at the ratio level of measurement, and this constitutes the additional property. In the case of temperature, 0\u00b0C cannot be taken to mean the absence of temperature (an absolute or true zero). At the ratio level, however, there is a true zero in the case of time, where a stopwatch can count down from two minutes to zero, and zero indicates no time left. Most variables that include the property of score assignment have a true zero. One way to determine if a variable is measured at the ratio level is to consider if its categories can adhere to the logic of \u201ctwice as.\u201d For example, an assessment variable where a person can achieve twice the score of someone else or net employment income where one employee can earn twice that of another are both measured at the ratio level. Interval- and ratio-level variables are quantitative, and they are amenable to statistical analyses such as tests for associations between variables. 
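<\/p>
<p>As a hypothetical sketch (illustrative code, not part of the chapter\u2019s materials), the four levels can be contrasted by which comparisons are meaningful at each: nominal codes only label, ordinal codes rank, interval scores support differences, and ratio scores support \u201ctwice as\u201d statements.<\/p>

```python
# Illustrative sketch of the four levels of measurement (hypothetical values).

# Nominal: numbers are arbitrary labels; only classification is meaningful.
marital_status = {1: "single", 2: "common-law", 3: "married",
                  4: "separated", 5: "widowed", 6: "divorced"}
# A code of 6 identifies "divorced"; it does not mean "more" of anything.

# Ordinal: categories can be ranked, but distances between ranks are unknown.
health_rank = {"very poor": 0, "poor": 1, "fair": 2, "good": 3, "very good": 4}
assert health_rank["very good"] > health_rank["good"]  # better, but not "how much" better

# Interval: equal distances between scores are meaningful; zero is arbitrary.
assert (28 - 27) == 1 and (27 - 20) == 7  # temperature differences in degrees Celsius

# Ratio: a true zero makes "twice as" comparisons meaningful.
income_a, income_b = 40_000, 80_000  # hypothetical net yearly incomes in dollars
assert income_b == 2 * income_a  # one employee earns twice as much as the other
```

<p>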
The properties of each level of measurement are summarized in figure 4.2.<a id=\"retfig4.2\"><\/a><\/p>\n<figure id=\"attachment_2279\" aria-describedby=\"caption-attachment-2279\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.2-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2279 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.2-1-1024x716.png\" alt=\"Figure 4.2. Properties and Functions of Levels of Measurement. Image description available.\" width=\"1024\" height=\"716\" srcset=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.2-1-1024x716.png 1024w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.2-1-300x210.png 300w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.2-1-768x537.png 768w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.2-1-1536x1073.png 1536w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.2-1-2048x1431.png 2048w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.2-1-65x45.png 65w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.2-1-225x157.png 225w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.2-1-350x245.png 350w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption id=\"caption-attachment-2279\" class=\"wp-caption-text\">Figure 4.2. 
Properties and Functions of Levels of Measurement [Image description &#8211; <a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/back-matter\/appendix-c-figure-descriptions\/#fig4.2\">See Appendix C Figure 4.2<\/a>]<\/figcaption><\/figure>\n<h4>Activity: Sorting Levels of Measurement<\/h4>\n<div id=\"h5p-23\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-23\" class=\"h5p-iframe\" data-content-id=\"23\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"Sorting Levels of Measurement\"><\/iframe><\/div>\n<\/div>\n<h3>Indexes versus Scales<\/h3>\n<p>Instead of using the response to a single statement as a measure of some construct, indexes and scales combine several responses together to create a composite measure. An <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_380\">index<\/a><\/strong> is a composite measure of a construct comprising several different indicators that produce a shared outcome (DeVellis &amp; Thorpe, 2021). For example, in a study on gambling, Wood and Williams (2007) examined the extent to which internet gamblers manifest a \u201cpropensity for problem gambling\u201d (i.e., the outcome). The propensity for problem gambling was assessed using nine items composing the Canadian Problem Gambling Index (CPGI). 
The CPGI consists of nine questions, each prompting respondents to consider only the preceding 12 months, including \u201cThinking about the last 12 months \u2026 have you bet more than you could really afford to lose?\u201d; \u201cStill thinking about the last 12 months, have you needed to gamble with larger amounts of money to get the same feeling of excitement?\u201d; and \u201cWhen you gambled, did you go back another day to try and win back the money you lost?\u201d The response categories for all nine items are sometimes (scored 1), most of the time (scored 2), almost always (scored 3), or don\u2019t know (scored 0). The scores are added up for the nine items, generating an overall score for propensity for problem gambling that ranges between 0 and 27, where 0 = non-problem gamblers, 1\u20132 = a low risk for problem gambling, 3\u20137 = a moderate risk, and 8 or more = problem gamblers (Ferris &amp; Wynne, 2001). Although the index has several items, they all independently measure the same thing: the propensity to become a problem gambler. And while there are expected relationships between the items\u2014for instance, spending more than one can afford to lose is assumed to be associated with needing to gamble with larger amounts of money to get the same feeling of excitement\u2014the indicators are not derived from a single cause (DeVellis &amp; Thorpe, 2021). That is, a person may gamble more than they can afford to lose due to a belief in luck, while the same person might need to gamble larger amounts due to a thrill-seeking tendency. Regardless of their origins, the indicators result in a common outcome: the tendency to become a problem gambler. 
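<\/p>
<p>The additive scoring just described can be sketched as follows (a simplified, hypothetical illustration of CPGI-style index scoring; the item scores and cut-offs follow the description above):<\/p>

```python
# Simplified sketch of CPGI-style index scoring (illustration only).
# Per the description above: don't know = 0, sometimes = 1,
# most of the time = 2, almost always = 3, for each of the nine items.
ITEM_SCORES = {"don't know": 0, "sometimes": 1, "most of the time": 2, "almost always": 3}

def cpgi_total(responses):
    """Sum the item scores for the nine questions (total ranges from 0 to 27)."""
    assert len(responses) == 9, "the CPGI has nine items"
    return sum(ITEM_SCORES[r] for r in responses)

def classify(total):
    """Map a total score onto the risk categories (Ferris & Wynne, 2001)."""
    if total == 0:
        return "non-problem gambler"
    elif total <= 2:
        return "low risk"
    elif total <= 7:
        return "moderate risk"
    return "problem gambler"

# A respondent answering "sometimes" to four items and "don't know" to five
# accumulates a total of 4, placing them in the moderate-risk range (3-7).
responses = ["sometimes"] * 4 + ["don't know"] * 5
risk = classify(cpgi_total(responses))  # "moderate risk"
```

<p>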
The higher the overall score on an index, the more of that trait or propensity the respondent has.<\/p>\n<p>In contrast, a <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_554\">scale<\/a> <\/strong>is a composite measure of a construct consisting of several different indicators that stem from a common cause (DeVellis &amp; Thorpe, 2021). For example, the Eysenck Personality Questionnaire\u2014Revised (EPQR) is a 48-item questionnaire designed to measure an individual\u2019s personality (the construct) via extroversion and neuroticism (Eysenck et al., 1985). Extroversion and neuroticism are two underlying potential causes for certain behavioural tendencies. Extroversion is the propensity to direct one\u2019s attention outward toward the environment. Thus, extroverts tend to be people who are outgoing or talkative. Sample yes\/no forced-choice items on the EPQR measuring extroversion include \u201cAre you a talkative person?\u201d; \u201cDo you usually take the initiative in making new friends?\u201d; and \u201cDo other people think of you as being very lively?\u201d Neuroticism refers to emotional instability. For example, a neurotic person is someone who worries excessively or who might be described as moody. Sample questions include \u201cDoes your mood often go up and down?\u201d; \u201cAre you an irritable person?\u201d; and \u201cAre you a worrier?\u201d<\/p>\n<p>Note that there are some similarities between indexes and scales and that these terms are often used interchangeably (albeit incorrectly) in research! Both indexes and scales measure constructs\u2014for example, dimensions of personality, risk for problem gambling\u2014using nominal variables with categories such as yes\/no and presence\/absence or ordinal variables depicting intensity such as very dissatisfied or dissatisfied. In addition, both are composite measures, meaning they are made up of multiple items. 
However, there are also some important differences (see figure 4.3).<\/p>\n<p>While an index is always an accumulation of individual scores based on items that have no expected common cause, a scale is based on the assignment of scores to items that are believed to derive from a common cause. In addition, scales often comprise items that have logical relationships between them. For instance, someone who indicates on the EPQR that they \u201calways take the initiative in making new friends\u201d and \u201calways get a party going\u201d is also very likely to \u201cenjoy meeting new people,\u201d but is very unlikely to be \u201cmostly quiet when with other people.\u201d In addition, specific items in a scale can indicate varying intensity or magnitude of a construct in a manner that is accounted for by the scoring. For example, the Bogardus social distance scale measures respondents\u2019 willingness to participate with members of other racial and ethnic groups (Bogardus, 1933). The items in the scale have different intensity, meaning certain items show more unwillingness to participate with members of other groups than others. For example, an affirmative response to the item \u201cI would be willing to marry outside group members\u201d indicates very low social distance (akin to low prejudice) and scores 1 point, whereas an affirmative response to \u201cI would have (outside group members) merely as speaking acquaintances\u201d scores 5, indicating more prejudice. A scale takes advantage of differences in intensity or magnitude between indicators of a construct and weights them accordingly when it comes to scoring. In contrast, an index assumes that all items are different but of equal importance.<\/p>\n<p>Sometimes it can be difficult to determine if an instrument is better classified as a scale or an index. 
For example, the Eating Attitudes Test (EAT-26) was developed by Garner and Garfinkel (1979) as a self-report measure designed to help identify those at risk for an eating disorder such as anorexia nervosa. Taken as a whole, it can be considered an index, since it is based on items that do not share a single underlying cause; eating disorders can result from many different individual causes. Importantly, as required by an index, the indicators are used to derive a composite score for a common outcome (risk of anorexia) by summing up the scores obtained for all 26 independent items. However, the EAT-26 also contains three sub-scales, where certain items can be used to examine dimensions of anorexia that are believed to be the result of dieting, bulimia and food preoccupation, and oral control (common causes).<a id=\"retfig4.3\"><\/a><\/p>\n<figure id=\"attachment_1102\" aria-describedby=\"caption-attachment-1102\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.3-e1720022273151.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1102 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.3-e1720022273151-1024x425.png\" alt=\"Figure 4.3. Comparing Index and Scale. 
Image description available.\" width=\"1024\" height=\"425\" srcset=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.3-e1720022273151-1024x425.png 1024w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.3-e1720022273151-300x124.png 300w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.3-e1720022273151-768x319.png 768w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.3-e1720022273151-1536x637.png 1536w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.3-e1720022273151-65x27.png 65w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.3-e1720022273151-225x93.png 225w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.3-e1720022273151-350x145.png 350w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.3-e1720022273151.png 1989w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption id=\"caption-attachment-1102\" class=\"wp-caption-text\">Figure 4.3. 
Comparing Index and Scale [Image description &#8211; <a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/back-matter\/appendix-c-figure-descriptions\/#fig4.3\">See Appendix C Figure 4.3<\/a>]<\/figcaption><\/figure>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Test Yourself<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<ul>\n<li>What property distinguishes the ordinal level of measurement from the nominal?<\/li>\n<li>What are the main properties and functions of the interval level of measurement?<\/li>\n<li>What special property does the ratio level have that distinguishes it from the interval level of measurement?<\/li>\n<li>In what ways are indexes and scales similar and different?<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h2>CRITERIA FOR ASSESSING MEASUREMENT<\/h2>\n<p>Measurement often involves obtaining answers to questions posed by researchers. In some cases, the answers to questions might be very straightforward, as would be your response to the question \u201cHow old are you?\u201d But what if you were instead asked, \u201cWhat is your ethnicity?\u201d Would your answer be singular or plural? Would your answer reflect the country you were born in and\/or the one you currently reside in? Did you consider the origin of your biological father, mother, or both parents\u2019 ancestors (e.g., grandparents or great-grandparents)? Did you think about languages you speak other than English or any cultural practices or ceremonies you engage in? Ethnicity is a difficult concept to measure because it has different dimensions; it reflects ancestry in terms of family origin as well as identity in the case of more current personal practices. 
According to Statistics Canada (2017), if the intent of the study is to examine identity, then a question such as \u201cWith which ethnic group do you identify?\u201d is probably the best choice, since it will steer respondents to that dimension by having them focus on how they perceive themselves. To assess whether measures are \u201cgood\u201d ones, you can evaluate their reliability and validity.<\/p>\n<h3>RELIABILITY<\/h3>\n<p>As a quantitative term,<strong> <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_522\">reliability<\/a> <\/strong>refers to the consistency in measurement. A measurement procedure is reliable if it can provide the same data at different points in time, assuming there has been no change in the variable under consideration. For example, a weigh scale is generally considered to be a reliable measure for weight. That is, if a person steps on a scale and records a weight, the person could step off and back on the scale and it should indicate the same weight a second time. Similarly, a watch or a clock\u2014barring the occasional power outage or worn-out battery\u2014is a dependable measure for keeping track of time on a 24-hour cycle (e.g., your alarm wakes you up for work at precisely 6:38 a.m. on weekdays). Finally, a specialized test can provide a reliable measure of a child\u2019s intelligence in the form of an intelligence quotient (IQ). IQ is a numerical score determined using an instrument such as the Wechsler Intelligence Scale for Children\u2014Fifth Edition (WISC-V), released by Pearson in 2014. The test consists of questions asked of a child by a trained psychologist who records the answers and then calculates scores to determine an overall IQ (Wechsler, 2003). IQ is considered a reliable indicator of intelligence because it is stable over time. The average IQ for the general population is 100.
A child who obtained an IQ score of 147 on the WISC-V at age eight would be classified as highly gifted. If that same person took an IQ test several years later, the results should also place the person in the highly gifted range. While a child could have a \u201cbad test\u201d day if they felt ill, were distracted, and so on, it is not reasonable to assume that the child guessed their way to a score of 147! Four ways to determine if a measure is reliable or unreliable are discussed below, including test-retest, split-half, inter-rater, and inter-item reliability.<\/p>\n<h4>Test-Retest Reliability<\/h4>\n<p>Demonstrating that a measure of some phenomenon, such as intelligence, does not change when the same instrument is administered at two different points in time is known as <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_610\">test-retest reliability<\/a><\/strong>. Test-retest reliability is usually assessed using the Pearson product-moment correlation coefficient (represented by the symbol <em>r<\/em>). The correlation ranges between 0 and +1.00 or \u20131.00, representing the degree of association between two variables. The closer the value of <em>r<\/em> is to +1.00, the greater the degree or strength of the association between the variables. For example, an <em>r <\/em>of +.80 indicates a stronger association than one of +.64; a value of 0 indicates no relationship between the two variables, and a value of +1.00 or \u20131.00 indicates a perfect relationship. The positive or negative sign indicates the <em>direction<\/em> of a relationship. A plus sign indicates a positive relationship, where both variables go in the same direction. For example, an <em>r <\/em>of +.60 for the relationship between education and income tells us that as education increases, so does income. In the case of negative correlations, the variables go in opposite directions, such as an <em>r <\/em>= \u2013.54 for education and prejudice.
With increased education, we can expect decreased prejudice.<\/p>\n<p>To evaluate test-retest reliability, a correlation coefficient is computed between the same variable measured at time 1 and time 2. The correlation coefficient (also called a reliability coefficient) should have a value of .80 or greater to indicate good reliability (Cozby et al., 2020). Test-retest reliability is especially important for demonstrating the accuracy of new measurement instruments. Currently, the identification of gifted children is largely restricted to outcomes determined by standardized IQ tests administered by psychologists (Pfeiffer et al., 2008). Expensive IQ tests are only funded by the school system for a small fraction of students, usually identified early on as having special needs. This means most students are never tested, and many gifted children are never identified as such. An alternative instrument, the Gifted Rating Scales (GRS), published by PsychCorp\/Harcourt Assessment, is based on teacher ratings of various abilities, such as student intellectual ability, academic ability, artistic talent, leadership ability, and motivation. Test-retest reliability coefficients for this assessment tool\u2019s various scales were high, as reported in the test manual. For example, the coefficient for the Academic Ability scale used by teachers on a sample of 160 children aged 12.00 to 13.11 years old and reapplied approximately a week later was .97 (Pfeiffer et al., 2008).<\/p>\n<h4>Split-Half Reliability<\/h4>\n<p>An obvious critique of test-retest reliability is that because participants receive the same test twice, or observers rate the same phenomenon at close intervals in time, the similarity in results could have more to do with memory for the items than with the construct of interest.
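The reliability coefficient described in this section is simply the Pearson correlation between scores from the two administrations. A minimal sketch in Python follows; the scores below are made up for illustration, and any statistics package would compute the same <em>r</em>:

```python
# Test-retest reliability as a Pearson correlation between the same
# measure administered at time 1 and time 2 (hypothetical IQ scores).

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [98, 112, 105, 130, 121, 99, 147, 110]   # first administration
time2 = [101, 110, 107, 128, 124, 97, 145, 112]  # retest, same children

print(round(pearson_r(time1, time2), 2))  # prints a value close to +1.00
```

By convention, a coefficient of .80 or greater indicates good reliability; split-half reliability applies the same calculation to the two halves of an instrument.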
An alternative to the test-retest method that provides a more independent assessment of reliability is the <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_586\">split-half reliability<\/a><\/strong> approach. Using this method, a researcher provides exactly half of the items at time 1 (e.g., only the odd-numbered items or a random sample of the questions on a survey) and the remaining half at time 2. In this case, the researcher compares the two halves for their degree of association.<\/p>\n<h4>Inter-Rater Reliability<\/h4>\n<p>Another way to test for the reliability of a measure is by comparing the results obtained on one instrument provided by two different observers. This is called <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_398\">inter-rater reliability<\/a><\/strong> (also called, interchangeably, inter-judge, inter-coder, or inter-observer reliability). Inter-rater reliability is the overall percentage of times two raters agree after examining each pair of results. Using the giftedness example above, two different teachers would provide assessments of the students on the various indicators of giftedness and then the two sets of responses would be compared. If two different teachers agree most of the time that certain children exhibit signs of giftedness, we can be more confident that the scales are identifying gifted children as opposed to showing the biases of a teacher toward their students.<\/p>\n<p>A statistical test called Cohen\u2019s kappa is usually employed to test inter-rater reliability because it takes into account the percentage of agreement as well as the number of times raters could be expected to agree just by chance alone (Cohen, 1960).<a class=\"footnote\" title=\"Cohen\u2019s kappa is generally used only with nominal variables.
If the variables of interest are at the ordinal or interval\/ratio level, Krippendorff\u2019s alpha is recommended (Lombard et al., 2002).\" id=\"return-footnote-49-1\" href=\"#footnote-49-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a><strong>\u00a0<\/strong>Given the conservative nature of this test, Landis and Koch (1977) recommend considering coefficients of between .61 and .80 as substantial and .81 and over as indicative of near-to-perfect agreement. Building on the earlier example, the test manual for the GRS reported an inter-rater reliability of .79 for the academic ability of children aged 6.00 to 9.11 years old, based on the ratings of two different teachers for 152 students (Pfeiffer et al., 2008).<\/p>\n<h4>Inter-Item Reliability<\/h4>\n<p>Lastly, when researchers use instruments that contain multiple indicators of a single construct, it is also possible to assess <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_392\">inter-item reliability<\/a><\/strong>. Inter-item reliability (also called internal-consistency reliability) refers to demonstrated associations among multiple items representing a single construct, such as giftedness. First, there should be close correspondence between items evaluating a single dimension. For example, students who score well above average on an item indicating intellectual ability (e.g., verbal comprehension) should also score well above average on other items making up the intellectual ability scale (e.g., memory, abstract reasoning). 
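Such item-to-item agreement is what internal-consistency statistics quantify. As a rough illustration, here is the standard Cronbach's alpha formula computed directly in Python; the item ratings below are hypothetical (rows are students, columns are items):

```python
# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance
# of total scores). The ratings are hypothetical illustration data.

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(rows):
    k = len(rows[0])                               # number of items
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

ratings = [  # e.g., verbal comprehension, memory, abstract reasoning
    [5, 5, 4],
    [3, 3, 3],
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
]
print(round(cronbach_alpha(ratings), 2))  # above .80 for these related items
```

Values near 1.00 indicate that the items co-vary strongly, as expected when they all tap a single construct.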
The internal consistency of a dimension such as intellectual ability can be assessed using Cronbach\u2019s alpha, a coefficient ranging between 0 and 1.00, which considers how pairs of items relate to one another (co-vary), the variance in the overall measure, and how many items there are (Cronbach, 1951).<\/p>\n<p>In addition, since giftedness is a broad-ranging, multidimensional construct that is usually defined to mean more than just intellectual ability, students who score high on the dimension of intellectual ability should also score high on other dimensions of giftedness, such as academic ability (e.g., math and reading proficiency) and creativity (e.g., novel problem solving). Pfeiffer et al. (2008) reported a correlation coefficient of .95 between intellectual ability and academic ability and one of .88 between intellectual ability and creativity using the GRS. The four approaches for assessing reliability that were discussed in this section are summarized in figure 4.4.<a id=\"retfig4.4\"><\/a><\/p>\n<figure id=\"attachment_1103\" aria-describedby=\"caption-attachment-1103\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1103 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.4-1024x465.png\" alt=\"Figure 4.4. Distinguishing among Techniques Used to Assess Reliability. 
Image description available.\" width=\"1024\" height=\"465\" srcset=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.4-1024x465.png 1024w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.4-300x136.png 300w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.4-768x349.png 768w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.4-1536x697.png 1536w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.4-2048x930.png 2048w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.4-65x30.png 65w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.4-225x102.png 225w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.4-350x159.png 350w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption id=\"caption-attachment-1103\" class=\"wp-caption-text\">Figure 4.4. 
Distinguishing Among Techniques Used to Assess Reliability [Image description &#8211; <a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/back-matter\/appendix-c-figure-descriptions\/#fig4.4\">See Appendix C Figure 4.4<\/a>]<\/figcaption><\/figure>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Research on the Net<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p><strong>Inter-Rater Reliability<\/strong><\/p>\n<p>For more information on inter-rater reliability and what Cohen&#8217;s Kappa is and how it is calculated, check out this video by DATAtab: <a href=\"https:\/\/youtu.be\/z4CiQPV0Mgw?si=PSL5ui_fFw1Mid5W\">Cohen&#8217;s Kappa (Inter-Rater-Reliability)<\/a><\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Test Yourself<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<ul>\n<li>What is reliability? Provide an example of a reliable measure used in everyday life.<\/li>\n<li>What is the main difference between test-retest reliability and split-half reliability?<\/li>\n<li>What type of reliability renders the same findings provided by two different observers?<\/li>\n<li>What type of reliability refers to demonstrated associations among multiple items representing a single construct?<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h3>VALIDITY<\/h3>\n<p>Perhaps even more important than ensuring consistency in measurement, we need to be certain that we are measuring the intended construct of interest. <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_638\">Validity<\/a><\/strong> is a term used by quantitative researchers to refer to the extent to which a study examines what it intends to. Not all reliable measures are valid. We might reliably weigh ourselves with a scale that consistently tells us the wrong weight because the dial was set two kilograms too high.
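This reliable-but-not-valid situation is easy to simulate: the readings barely vary (consistent, hence reliable) yet centre on the wrong value (hence invalid). The numbers in this sketch are made up:

```python
import random

random.seed(0)        # reproducible "measurements"
TRUE_WEIGHT = 70.0    # kilograms
BIAS = 2.0            # dial set two kilograms too high

# Ten weigh-ins: a constant bias plus a little random jitter.
readings = [TRUE_WEIGHT + BIAS + random.gauss(0, 0.1) for _ in range(10)]

mean_reading = sum(readings) / len(readings)
spread = max(readings) - min(readings)

print(f"mean reading: {mean_reading:.1f} kg (true weight: {TRUE_WEIGHT} kg)")
print(f"spread across weigh-ins: {spread:.2f} kg")
# Small spread -> consistent (reliable); mean about 2 kg off -> not valid.
```

The same logic applies to any measure: consistency (reliability) says nothing by itself about whether the readings centre on the true value (validity).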
Similarly, we may depend upon an alarm clock that is consistently ahead of schedule by a few minutes because it was incorrectly programmed. In this section, you will learn about four methods for evaluating the extent to which a given measure captures what it is intended to: face validity, content validity, construct validity, and criterion validity.<\/p>\n<h4>Face Validity<\/h4>\n<p>First, in trying to determine if a measure is a good indicator of an intended construct, we can assess the measure\u2019s face validity. <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_343\"><strong>Face validity<\/strong><\/a> refers to the extent to which an instrument or variable appears on the surface or \u201cface\u201d to be a good measure of the intended construct. Grade point average, for example, appears to be a pretty good measure of a student\u2019s scholastic ability, just as net yearly income seems like a valid measure of financial wealth. Your criterion for determining whether something has face validity is whether the operationalization used is logical. For example, in the case of giftedness, most teachers would agree that children who exhibit very superior intellectual ability (i.e., the ability to reason at high levels) also tend to exhibit very superior academic ability (e.g., the ability to function at higher than normal levels in specific academic areas, such as math or reading).<\/p>\n<h4>Content Validity<\/h4>\n<p><strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_245\">Content validity<\/a><\/strong> refers to the extent to which a measure includes the full range or meaning of the intended construct.
To adequately assess your knowledge of the general field of psychology, for example, a test should include a broad range of topics, such as how psychologists conduct research, the brain and mental states, sensation, perception, and learning. While a person might not score evenly across all areas of psychology (e.g., a student might score 20 out of 20 on the questions related to sensation and perception and only 15 out of 20 on items about research methods), the test result (35\/40) should provide a general measure of knowledge regarding introductory psychology. Similarly, the Gifted Rating Scale discussed earlier is an instrument designed to identify giftedness that includes not only items related to intellectual ability but also content pertaining to the dimensions of academic ability, creativity, leadership, and motivation. This is not to say that a person scoring in the gifted range will achieve the same ratings on all items. For example, it is possible for a gifted child to score in the very superior range for intellectual and academic ability as well as creativity but score only superior for motivation and average for leadership. However, when taken together, the overall (i.e., full-scale) IQ score is 130 or greater for a gifted individual.<\/p>\n<h4>Construct Validity<\/h4>\n<p>Another way to assess validity is through <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_241\">construct validity<\/a><\/strong>, which examines how closely a measure is associated with other measures that are expected to be related, based on prior theory. For example, Gottfredson and Hirschi\u2019s (1990) general theory of crime rests on the assumption that a failure to develop self-control is at the root of most impulsive and even criminal behaviours. Impulsivity, as measured by school records, such as report cards, should then correspond with other impulsive behaviours, such as deviant and\/or criminal acts. 
If a study fails to show the expected association (e.g., perhaps children who fail to complete assignments or follow rules in school as noted on report cards do not engage in higher levels of criminal or deviant acts relative to children who appear to have more self-control in the classroom), then the measures of missed assignments and an inability to follow rules may not be valid indicators of the construct. That is, the items stated on a report card, for example, incomplete assignments, may be measuring something other than impulsivity, such as academic aptitude, health issues, or attention-deficit problems. In this case, a better school indicator of impulsivity might be self-reported ratings of disruptive behaviour by the students themselves or teachers\u2019 ratings of student impulsivity rather than the behavioural measures listed on a report card. Alternatively, behavioural measures from other areas of a person\u2019s life, such as a history of unstable relationships or a lack of perseverance in employment, may be better arenas for assessing low self-control than the highly monitored and structured early school environment.<\/p>\n<h4>Criterion Validity<\/h4>\n<p>Finally, a measure of some construct of interest can be assessed against an external standard or benchmark to determine its worth using what is called <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_265\">criterion validity<\/a><\/strong>. We can readily anticipate that students who are excelling are also more likely to achieve academic awards, such as scholarships, honours, or distinction, and go on to higher levels. Academic ability as measured by grades or grade point averages is predictive of future school and scholastic success. Similarly, consider how most research-methods courses at a university or college have a prerequisite, such as a minimum grade of C\u2013 in a 200-level course.
The prerequisite indicates basic achievement in the course and serves as the cut-off for predicting future success in higher-level courses in the same discipline. The prerequisite has criterion validity if most students with the prerequisite end up successfully navigating their way through research methods. All four types of validity are summarized in figure 4.5.<a id=\"retfig4.5\"><\/a><\/p>\n<figure id=\"attachment_1104\" aria-describedby=\"caption-attachment-1104\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1104 size-large\" src=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.5-1024x534.png\" alt=\"Figure 4.5. Distinguishing among Techniques Used to Assess Validity. Image description available.\" width=\"1024\" height=\"534\" srcset=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.5-1024x534.png 1024w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.5-300x157.png 300w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.5-768x401.png 768w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.5-1536x802.png 1536w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.5-65x34.png 65w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.5-225x117.png 225w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.5-350x183.png 350w, https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/Figure4.5.png 1581w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption
id=\"caption-attachment-1104\" class=\"wp-caption-text\">Figure 4.5. Distinguishing among Techniques Used to Assess Validity [Image description &#8211; <a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/back-matter\/appendix-c-figure-descriptions\/#fig4.5\">See Appendix C Figure 4.5<\/a>]<\/figcaption><\/figure>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Test Yourself<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<ul>\n<li>Are all reliable measures valid? Explain your answer.<\/li>\n<li>What does it mean to say a measure has face validity?<\/li>\n<li>What does content validity assess?<\/li>\n<li>Which type of validity is based on the prediction of future events?<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h3>Activity: Reliability and Validity<\/h3>\n<div id=\"h5p-24\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-24\" class=\"h5p-iframe\" data-content-id=\"24\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"Reliability and Validity\"><\/iframe><\/div>\n<\/div>\n<h2>RANDOM AND SYSTEMATIC ERRORS<\/h2>\n<p>Researchers and research participants are potential sources of measurement error. Think about the last time you took a multiple-choice test and accidently entered a response of <em>d <\/em>when you intended to put <em>e<\/em>, or when you rushed to finish an exam and missed one of the items in your answer because you didn\u2019t have time to re-read the instructions or your answers before handing in the test. Similarly, errors occur in research when participants forget things, accidently miss responses, and otherwise make mistakes completing research tasks. Also, researchers produce inconsistencies in any number of ways, including by giving varied instructions to participants, by missing something relevant during an observation, and by entering data incorrectly into a spreadsheet (where a 1 might become an 11). 
Unpredictable mistakes that result from carelessness are called <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_512\">random errors<\/a><\/strong>. Random errors made by participants can be reduced by simplifying the procedures (e.g., participants make fewer mistakes if instructions are clear and easy to follow and if the task is short and simple). Even researchers\u2019 and observers\u2019 unintentional mistakes can be reduced by using standardized procedures, simplifying the task as much as possible, training observers, and using recording devices or apparatus other than people to collect first-hand data (e.g., replaying an audio recording for verification following an interview). Random errors mostly influence reliability since they work against consistency in measurement.<\/p>\n<p>In contrast to random errors, <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_602\">systematic errors<\/a><\/strong> refer to ongoing inaccuracies in measurement that come about through deliberate effort. For example, a researcher who expects or desires a finding might behave in a manner that encourages such a response in participants. Expecting a treatment group to perform better than a control group, a researcher might interpret responses more favourably in the treatment group and unjustifiably rate them higher. The use of standardized procedures, such as scripts and objective measures that are less open to interpretation, can help reduce researcher bias. In addition, it might be possible to divide participants into two groups without the researcher being aware of the groups until after the performance scores are recorded.<\/p>\n<p>Study participants make other types of intentional errors, including ones resulting from a social desirability bias. Respondents sometimes provide untruthful answers to present themselves more favourably.
Just as people sometimes underestimate the number of cigarettes they smoke when asked by a family physician at an annual physical examination, survey respondents exaggerate the extent to which they engage in socially desirable practices (e.g., exercising, healthy eating) and minimize their unhealthy practices (e.g., overuse of non-prescription pain medicine, binge drinking). Researchers using a questionnaire to measure a construct sometimes build in a lie scale along with the other dimensions of interest. For example, in the Eysenck Personality Questionnaire\u2014Revised (EPQR), there are 12 lie-detection items, including the statements \u201cIf you say you will do something, do you always keep your promise no matter how inconvenient it might be?\u201d; \u201cAre all your habits good and desirable ones?\u201d; and \u201cHave you ever said anything bad or nasty about anyone?\u201d A score of 5 or more indicates social desirability bias (Eysenck et al., 1985).<\/p>\n<p>Similarly, participants in experimental research sometimes follow what Martin Orne (1962) called demand characteristics or environmental cues, meaning they pick up on hints about what a study is about and then try to help along the researchers and the study by behaving in ways that support the hypothesis. 
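To make the lie-scale idea concrete, here is an illustrative sketch. The two example items and the cut-off of 5 come from the EPQR description above, but the scoring key assigned to each item is hypothetical:

```python
# Illustrative lie-scale scoring. key=True means a "yes" answer earns a
# lie point (an implausibly perfect self-presentation); key=False means
# a "no" answer earns the point. The keying here is hypothetical.
LIE_ITEMS = [
    ("Are all your habits good and desirable ones?", True),
    ("Have you ever said anything bad or nasty about anyone?", False),
    # ... the remaining items of the 12-item lie scale would follow
]

def lie_score(answers, items=LIE_ITEMS):
    """Count socially desirable answers; `answers` maps question -> True/False."""
    return sum(1 for question, key in items if answers.get(question) == key)

def flags_social_desirability(answers, threshold=5):
    """A total of 5 or more suggests social desirability bias (Eysenck et al., 1985)."""
    return lie_score(answers) >= threshold
```

With only two items keyed here, no respondent could reach the real threshold; the full 12-item list would be needed in practice.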
Systematic errors influence validity since they reduce the odds that a measure is gauging what it is truly intended to.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Test Yourself<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<ul>\n<li>Who is a potential source of error in measurement?<\/li>\n<li>Which main form of error can be reduced by simplifying the procedures in a study?<\/li>\n<li>What is the term for the bias that results when respondents try to answer in the manner that makes them look the most favourable?<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h2>RIGOUR IN QUALITATIVE RESEARCH<\/h2>\n<p>While it is important for anyone learning about research to understand the centrality of reliability and validity criteria for assessing measurement instruments, it is also imperative to note that much of what has been discussed in this chapter pertains mainly to quantitative research that is based in the positivist paradigm. Qualitative research, largely based in the interpretative and critical paradigms, is aimed at understanding socially constructed phenomena in the contexts in which they occur at a specific point in time. It is therefore less concerned with the systematic reproducibility of data. In many cases, statements provided by research participants or processes studied cannot be replicated to assess reliability. Similarly, if we are to understand events from the point of view of those experiencing them, validity is really in the eyes of the individual actor for whom that understanding is real. That is not to say reliability and validity are not relevant in qualitative research; in fact, if we conclude that these constructs are not applicable to qualitative research, then we run the risk of suggesting that qualitative inquiry is without <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_542\">rigour<\/a><\/strong>. As defined by Gerard A.
Tobin and Cecily M. Begley (2004), \u201crigour is the means by which we show integrity and competence; it is about ethics and politics, regardless of the paradigm\u201d (p. 390). This helps to legitimize a qualitative research process.<\/p>\n<p>Just as various forms of reliability and validity are used to gauge the merit of quantitative research, other criteria such as rigour, credibility, and dependability can be used to establish the trustworthiness of qualitative research. <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_263\">Credibility<\/a> <\/strong>(comparable to validity) has to do with how well the research tells the story it is designed to. For example, in the case of interview data, this pertains to the goodness of fit between a respondent\u2019s actual views of reality and a researcher\u2019s representations of it (Tobin &amp; Begley, 2004). Credibility can be enhanced through the thoroughness of a literature review and open coding of data. For example, in the case of a qualitative interview, the researcher should provide evidence of how conclusions were reached. <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_291\">Dependability<\/a><\/strong> is a qualitative replacement for reliability and this \u201cis achieved through a process of auditing\u201d (Tobin &amp; Begley, 2004, p. 392). Qualitative researchers ensure their research processes, decisions, and interpretation can be examined and verified by other interested researchers through <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_190\">audit trails<\/a><\/strong>. Audit trails are carefully documented paper trails of an entire research process, including research decisions such as theoretical clarifications made along the way. 
Transparency, detailed rationale, and justifications all help to establish the reliability and dependability of findings (Liamputtong, 2013).<\/p>\n<p>Similarly, while questions of measurement and the operationalization of variables may not apply to qualitative research, questions concerning how the research process was undertaken are essential. For example, in a study using in-depth interviews, were the questions posed to the respondents in a culturally sensitive manner that was readily understood by them? Did the interview continue until all important issues were fully examined (i.e., saturation was reached)? Were the researchers appropriately reflective in considering their own subjectivity and how it may have influenced the questions asked, the impressions they formed of the respondents, and the conclusions they reached from the findings (Hennink et al., 2011)? Qualitative researchers acknowledge subjectivity and accept researcher bias as an unavoidable aspect of social research. Certain topics are examined specifically because they interest the researchers! To reconcile biases with empirical methods, qualitative researchers openly acknowledge their preconceptions and remain transparent and reflective about the ways in which their own views may influence research processes.<\/p>\n<h3>Achieving Rigour through Triangulation<\/h3>\n<p>One of the main ways rigour is achieved in qualitative research is by using <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_630\">triangulation<\/a><\/strong>. Triangulation is the use of multiple methods to establish what can be considered the qualitative equivalent of reliability and validity (Willis, 2007). For example, we can be more confident in data collected on aggressive behavioural displays in children if data obtained from field notes taken during observations closely corresponds with interview statements made by the children themselves.
We can also be more confident in the findings when multiple sources converge (i.e., <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_280\">data triangulation<\/a><\/strong>), as might be the case if the children, teachers, and parents all say similar things about the behaviour of those being studied. Since the data comes from various sources with different perspectives, the data itself can also exist in a variety of forms, from comments made by parents and teachers, to actions undertaken by children, to school records and other documents, such as report cards.<\/p>\n<h3>Other Means for Establishing Rigour<\/h3>\n<p>Various alternative strategies to triangulation that help to establish rigour in qualitative studies include the use of member checks, prolonged time spent with research participants in a research setting, peer debriefing, and audit checking (Liamputtong, 2013; Willis, 2007). <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_420\">Member checks<\/a><\/strong> are attempts by a researcher to validate emerging findings by testing out their accuracy with the originators of that data while still in the field. For example, researchers might share observational findings with the group being studied to see if the participants concur with what is being said. It helps to validate the data if the participants agree that their perspective is being appropriately conveyed by the data. Whenever I conduct interviews with small groups (called focus groups, as discussed in chapter 9), I share the preliminary findings with the group and ask them whether the views I am expressing capture what they feel is important and relevant, given the research objectives. 
I also ask whether the statements I\u2019ve provided are missing any information that they feel should be included to more fully explain their views or address their concerns about the topic.<\/p>\n<p>Qualitative researchers also gain a more informed understanding of the individuals, processes, or cultures they are studying if they spend prolonged periods of time in the field. Consider how much more you know about your fellow classmates at the end of term compared to what you know about the group on the first day of classes. Similarly, over time, qualitative researchers learn more and more about the individuals and processes of interest once they gain entry to a group, establish relationships, build trust, and so on. Time also aids in triangulation as researchers are better able to verify information provided as converging sources of evidence are established.<\/p>\n<p>In addition to spending long periods of time in the field and testing findings via their originating sources, qualitative researchers also substantiate their research by opening it up to the scrutiny of others in their field.<strong> <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_49_468\">Peer debriefing<\/a> <\/strong>involves attempts to authenticate the research process and findings through an external review provided by another qualitative researcher who is not directly involved in the study (Creswell &amp; Creswell, 2018). This process helps to verify the procedures undertaken and substantiate the findings, lending overall credibility to the study. 
Note that reflexivity and other features underlying ethnographic research are discussed in detail in chapter 10, while multiple methods and mixed-methods approaches are the subject matter of chapter 11.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Test Yourself<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<ul>\n<li>What is the qualitative term for validity?<\/li>\n<li>How do qualitative researchers ensure their research processes and conclusions reached can be verified by other researchers?<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h2>CHAPTER SUMMARY<\/h2>\n<ol>\n<li><strong>Describe the main components of a research design.<br \/>\n<\/strong>A research design details the main components of a study, including who (the unit of analysis), what (the attitudes or behaviours under investigation), where (the location), when (at one or multiple points in time), why (e.g., to explain), and how (the specific research method used).<\/li>\n<li><strong>Explain what conceptualization and operationalization processes entail.<\/strong><br \/>\nConceptualization is the process whereby a researcher explains what a concept, such as family, or a construct, like social inequality, means within a research project. Operationalization is the process whereby a concept or construct is defined so precisely it can be measured in a study. For example, financial wealth can be operationalized as net yearly income in dollars.<\/li>\n<li><strong>Explain how the purpose of a variable is directly related to how it is measured in research.<br \/>\n<\/strong>Variables are measured at the nominal, ordinal, interval, and ratio levels. The nominal level of measurement is used to classify cases, while the ordinal level has the property of classification and rank order. The interval level provides the ability to classify, order, and make precise comparisons as a function of equal intervals.
The ratio level includes the previous properties plus a true zero. An index is a composite measure of a construct comprising several different indicators that produce a shared outcome, while a scale is a composite measure of a construct consisting of several different indicators that stem from a common cause.<\/li>\n<li><strong>Outline the main techniques used to assess reliability and validity.<br \/>\n<\/strong>Reliability refers to consistency in measurement. <em>Test-retest<\/em> reliability examines consistency between the same measures for a variable at two different times using a correlation coefficient. <em>Inter-rater<\/em> reliability examines consistency between the same measures for a variable of interest provided by two different raters, often using Cohen\u2019s kappa. <em>Split-half<\/em> reliability examines consistency between both halves of the measures for a variable of interest. <em>Inter-item<\/em> reliability involves demonstrated associations among multiple items representing a single construct. Validity refers to the extent to which a measure is a good indicator of the intended construct. <em>Face <\/em>validity refers to the extent to which an instrument appears to be a good measure of the intended construct. <em>Content <\/em>validity assesses the extent to which an instrument contains the full range of content pertaining to the intended construct. <em>Construct<\/em> validity assesses the extent to which an instrument is associated with other logically related measures of the intended construct. <em>Criterion<\/em> validity assesses the extent to which an instrument holds up to an external standard, such as the ability to predict future events.<\/li>\n<li><strong>Distinguish between random and systematic errors.<br \/>\n<\/strong>Random errors are unintentional and usually result from careless mistakes, while systematic errors result from consistently inaccurate measures or intentional bias.
Sources of both types of errors include participants, researchers, and observers in a study. Errors can be reduced through training, the use of standardized procedures, and the simplification of tasks.<\/li>\n<li><strong>Explain how rigour is achieved in qualitative research.<br \/>\n<\/strong>Rigour refers to a means for demonstrating integrity and competence in qualitative research. Rigour can be achieved using triangulation, member checks, extended experience in an environment, peer debriefing, and audit trails.<\/li>\n<\/ol>\n<h2>RESEARCH REFLECTION<\/h2>\n<ol>\n<li>Suppose you want to conduct a quantitative study on the success of students at the post-secondary institution you are currently attending. List five variables that you think would be relevant for inclusion in the study. Generate one hypothesis you could test using two of the variables you\u2019ve listed above. Operationalize the variables you included in your proposed hypothesis.<\/li>\n<li>Studies on the health of individuals often operationalize health as self-reported health using these five fixed response categories: poor, fair, good, very good, and excellent. What level of measurement is this? Provide an example of health operationalized into two categories measured at the nominal level and three categories at the ordinal level. Is it possible to measure health at the interval level? Justify your answer.<\/li>\n<li>Consider some of the variables that can be used to examine the construct of scholastic ability (e.g., grades, awards, and overall grade point average). Which measure do you think best represents scholastic ability? Is the measure reliable and\/or valid? Defend your answer with examples that reflect student experiences.<\/li>\n<li>Define the construct of honesty and come up with an indicator that could be used to gauge honesty. Compare your definition and indicator with those of at least three other students in the class. Are the definitions similar?
Consider how each definition reflects a prior conceptualization process.<\/li>\n<\/ol>\n<h2>LEARNING THROUGH PRACTICE<\/h2>\n<p>Objective: To construct an index for students at risk for degree incompletion<\/p>\n<p>Directions:<\/p>\n<ol>\n<li>Item selection: Develop 10 statements that can be answered with a forced-choice response of <em>yes <\/em>or <em>no<\/em>, where <em>yes<\/em> responses will receive 1 point and <em>no<\/em> responses will be awarded 0 points. Select items that would serve as good indicators of students at risk for failing to complete their program of study. Think of behaviours or events that would put a student at risk for dropping out or being asked to leave a program, such as failing a required course. Make sure your items are one-dimensional (i.e., they only measure one behaviour or attitude).<\/li>\n<li>Try out your index on a few of your classmates to see what scores you obtain for them. Is there any variability in the responses? Do some students score higher or lower than others?<\/li>\n<li>Come up with a range of scores you feel represent no risk, low risk, moderate risk, and high risk. Justify your numerical scoring.<\/li>\n<\/ol>\n<h2>RESEARCH RESOURCES<\/h2>\n<ol>\n<li>For more information on the four types of validity discussed in this chapter, see Middleton, F. (2023, June 22). <a href=\"https:\/\/web.archive.org\/web\/20240712181655\/https:\/\/www.scribbr.com\/methodology\/types-of-validity\/\">The 4 types of validity in research. Definitions and examples<\/a>. 
<em>Scribbr.<\/em><\/li>\n<li>To learn more about rigour in research, refer to <a href=\"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-content\/uploads\/sites\/30\/2023\/09\/2022-12-07_feast-centre_rigour-in-research.pdf\">&#8220;The Feast Centre Learning Series: &#8216;Rigour&#8217; in Research Proposals&#8221;<\/a> at McMaster University.<\/li>\n<li>For an in-depth look at scale development, see DeVellis, R. F. and Thorpe, C. T. (2022). <em><a href=\"https:\/\/search.worldcat.org\/en\/title\/1342443149\">Scale development: Theory and applications<\/a> <\/em>(5th ed.). Sage.<\/li>\n<li>To learn about a new online gambling index based on 12 items, see Auer, M. et al. (2024). <a href=\"https:\/\/doi.org\/10.1177\/01632787231179460\">Development of the Online Problem Gaming Behavior Index.<\/a> <em>Evaluation &amp; the Health Professions, 47<\/em>(1), 81-92.<\/li>\n<\/ol>\n<hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-49-1\"> Cohen\u2019s kappa is generally used only with nominal variables. If the variables of interest are at the ordinal or interval\/ratio level, Krippendorff\u2019s alpha is recommended (Lombard et al., 2002).
<a href=\"#return-footnote-49-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><\/ol><\/div><div class=\"glossary\"><span class=\"screen-reader-text\" id=\"definition\">definition<\/span><template id=\"term_49_526\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_526\"><div tabindex=\"-1\"><p>The plan or blueprint for a study, outlining the who, what, where, when, why, and how of an investigation.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_632\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_632\"><div tabindex=\"-1\"><p>The object of investigation.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_269\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_269\"><div tabindex=\"-1\"><p>Research conducted at a single point in time.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_412\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_412\"><div tabindex=\"-1\"><p>Research conducted at multiple points in time.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_458\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_458\"><div tabindex=\"-1\"><p>Research on the same unit of analysis carried out at multiple points in time.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_218\"><div class=\"glossary__definition\" 
role=\"dialog\" data-id=\"term_49_218\"><div tabindex=\"-1\"><p>Research on the same category of people carried out at multiple points in time.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_622\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_622\"><div tabindex=\"-1\"><p>Research on different units of analysis carried out at multiple points in time.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_204\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_204\"><div tabindex=\"-1\"><p>Research on a small number of individuals or an organization carried out over an extended period.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_228\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_228\"><div tabindex=\"-1\"><p>Abstract mental representations of important elements in our social world.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_230\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_230\"><div tabindex=\"-1\"><p>The process where a researcher explains what a concept means in terms of a research project.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_239\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_239\"><div tabindex=\"-1\"><p>Intangible idea that does not exist independent of our thinking.<\/p>\n<\/div><button><span 
aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_382\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_382\"><div tabindex=\"-1\"><p>A measurable quantity that in some sense stands for or substitutes for something less readily measurable.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_450\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_450\"><div tabindex=\"-1\"><p>The process whereby a concept or construct is defined so precisely that it can be measured.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_436\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_436\"><div tabindex=\"-1\"><p>A level of measurement used to classify cases.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_454\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_454\"><div tabindex=\"-1\"><p>A level of measurement used to order cases along some dimension of interest.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_400\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_400\"><div tabindex=\"-1\"><p>A level of measurement in which the distance between categories of the variable of interest is meaningful.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_516\"><div 
class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_516\"><div tabindex=\"-1\"><p>An interval level of measurement with an absolute zero.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_380\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_380\"><div tabindex=\"-1\"><p>A composite measure of a construct comprising several different indicators that produce a shared outcome.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_554\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_554\"><div tabindex=\"-1\"><p>A composite measure of a construct consisting of several different indicators that stem from a common cause.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_522\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_522\"><div tabindex=\"-1\"><p>Consistency in measurement.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_610\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_610\"><div tabindex=\"-1\"><p>Consistency between the same measures for a variable of interest taken at two different points in time.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_586\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_586\"><div tabindex=\"-1\"><p>Consistency between both halves of the measure for a variable of interest.<\/p>\n<\/div><button><span 
aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_398\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_398\"><div tabindex=\"-1\"><p>Consistency between the same measures for a variable of interest provided by two independent raters.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_392\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_392\"><div tabindex=\"-1\"><p>Demonstrated associations among multiple items representing a single concept.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_638\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_638\"><div tabindex=\"-1\"><p>The extent to which a study examines what it intends to.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_343\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_343\"><div tabindex=\"-1\"><p>Assesses the extent to which an instrument appears to be a good measure of the intended construct.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_245\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_245\"><div tabindex=\"-1\"><p>Assesses the extent to which an instrument contains the full range of content pertaining to the intended construct.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template 
id=\"term_49_241\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_241\"><div tabindex=\"-1\"><p>Assesses the extent to which an instrument is associated with other logically related measures of the intended construct.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_265\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_265\"><div tabindex=\"-1\"><p>Assesses the extent to which an instrument holds up to an external standard, such as the ability to predict future events.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_512\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_512\"><div tabindex=\"-1\"><p>Measurement miscalculation due to unpredictable mistakes.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_602\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_602\"><div tabindex=\"-1\"><p>Miscalculation due to consistently inaccurate measures or intentional bias.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_542\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_542\"><div tabindex=\"-1\"><p>A means for demonstrating integrity and competence in qualitative research.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_263\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_263\"><div tabindex=\"-1\"><p>An assessment of the 
goodness of fit between the respondent\u2019s view of reality and a researcher\u2019s representation of it.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_291\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_291\"><div tabindex=\"-1\"><p>An assessment of whether the researcher\u2019s process is well documented and verifiable.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_190\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_190\"><div tabindex=\"-1\"><p>Attempts made by a researcher to carefully document the research process in its entirety.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_630\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_630\"><div tabindex=\"-1\"><p>The use of multiple methods or sources to help establish rigour.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_280\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_280\"><div tabindex=\"-1\"><p>The reliance on multiple data sources in a single study.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_420\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_420\"><div tabindex=\"-1\"><p>Attempts made by a researcher to validate findings by testing them with the original sources of the data.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span
class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_49_468\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_49_468\"><div tabindex=\"-1\"><p>Attempts made by a researcher to authenticate the research process and findings through an external review provided by an independent researcher.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><\/div>","protected":false},"author":4,"menu_order":4,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-49","chapter","type-chapter","status-publish","hentry"],"part":3,"_links":{"self":[{"href":"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-json\/pressbooks\/v2\/chapters\/49","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-json\/wp\/v2\/users\/4"}],"version-history":[{"count":60,"href":"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-json\/pressbooks\/v2\/chapters\/49\/revisions"}],"predecessor-version":[{"id":2480,"href":"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-json\/pressbooks\/v2\/chapters\/49\/revisions\/2480"}],"part":[{"href":"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-json\/pressbooks\/v2\/parts\/3"}],"metadata":[{"href":"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-json\/pressbooks\/v2\/chapters\/49\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-json\/wp\/v2\/media?parent=49"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/openbooks
.macewan.ca\/researchmethods\/wp-json\/pressbooks\/v2\/chapter-type?post=49"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-json\/wp\/v2\/contributor?post=49"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/openbooks.macewan.ca\/researchmethods\/wp-json\/wp\/v2\/license?post=49"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}