In RW Connect’s new Quant Essentials series, we discuss critical methodological skills in simple, jargon-free language. The first article, What Is Quantitative Research? gives some more background about the series. Our second article was about research design. Our topic in this article is sampling.
Though it doesn’t excite most people, sampling is fundamental to nearly any kind of research. It’s probably easiest to explain sampling by using an illustration. Let’s imagine that you work in the human resources (HR) department of a large company. Your databases include information on all company employees, and represent the population of employees working at your company. If you wanted to see what percentage of your employees are female or what their average age is, you could query the appropriate databases to obtain this information. This would be analogous to looking up data from a national census.
On the other hand, say you are a data scientist working in HR and have been asked to conduct statistical modeling – predicting employee churn, for instance. This kind of modeling is often sophisticated and may demand a substantial amount of computing power. Therefore, it might be more practical for you to draw a sample of records of current and former employees and use that sample for your modeling. Since you would be using a sample and not the population, there would be some loss of precision. Most statistical procedures have been developed for small samples, however, and originally that was all most statisticians had to work with.
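To give a rough feel for the loss of precision mentioned above, here is a minimal sketch that uses the textbook approximation for the standard error of a proportion under simple random sampling. The 20% churn rate and the sample sizes are hypothetical figures chosen only for illustration, and the formula ignores the finite population correction.

```python
import math


def standard_error_of_proportion(p, n):
    """Approximate standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical churn rate of 20%, used only for illustration.
assumed_churn_rate = 0.20

for sample_size in (100, 1_000, 10_000):
    se = standard_error_of_proportion(assumed_churn_rate, sample_size)
    print(f"n={sample_size:>6}: standard error is roughly {se:.4f}")
```

Under these assumptions, making the sample 100 times larger reduces the standard error by a factor of only 10, which is one reason a well-drawn sample of modest size is often precise enough.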
In marketing research, we are almost never in a position where a census of consumers is feasible. We must use samples because of budget or time constraints. Fortunately, it is seldom necessary to interview millions of consumers. Sample quality does matter, though, and one sample is not just as good as another.
There are two basic kinds of samples: probability and non-probability. With probability sampling, each unit (e.g., consumer) has a known, non-zero chance of being selected. This is often called random sampling, since some form of random (or systematic) selection mechanism is employed. Fieldwork staff do not choose who participates in the research and who does not. With non-probability sampling, some elements of the population have no chance of selection, or the probability of their selection cannot be precisely determined. In the case of mall and street intercepts, fieldwork staff do choose who participates in the research and who does not. These are non-probability samples.
Simple random sampling (SRS) and systematic (“every nth”) sampling are the sampling procedures most of us would probably think of when we hear “random sample.” There are also stratified and cluster samples, among many other kinds. In cluster (or multi-stage) sampling, we take samples of smaller units within larger units. For example, we might sample geographic areas, and then housing units and, finally, individuals within these housing units. Stratification entails breaking down the target population into segments, such as age group or gender, before sampling and then taking independent samples within each stratum. Quota sampling is a non-probability method that resembles stratified sampling except that selection of units is at least partly judgmental.
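The three probability designs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 1,000 records, the "age_group" field, the sample sizes, and the function names are all hypothetical and are not taken from any particular survey package.

```python
import random


def simple_random_sample(units, n, rng):
    """Draw n units without replacement; every unit has the same chance of selection."""
    return rng.sample(units, n)


def systematic_sample(units, step, rng):
    """Pick a random start within the first interval, then take every step-th unit."""
    start = rng.randrange(step)
    return units[start::step]


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Split units into strata, then draw an independent random sample within each."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {
        name: rng.sample(members, n_per_stratum[name])
        for name, members in strata.items()
    }


# Hypothetical population: 1,000 records, 60% "under_40" and 40% "40_plus".
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [
    {"id": i, "age_group": "under_40" if i % 5 < 3 else "40_plus"}
    for i in range(1000)
]

srs = simple_random_sample(population, 50, rng)
every_20th = systematic_sample(population, 20, rng)
by_age = stratified_sample(
    population,
    lambda unit: unit["age_group"],
    {"under_40": 30, "40_plus": 20},
    rng,
)
```

In each case the selection is driven by the random number generator rather than by a person's judgment, which is the defining feature of a probability design. In this sketch every unit's chance of selection is known: 50 in 1,000 for the simple random sample, 1 in 20 for the systematic sample, and 30 in 600 or 20 in 400 within the two strata.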
There is an important but often overlooked distinction between probability sampling and a probability sample. A research agency may utilize a probability sampling procedure for a consumer survey, for instance, but because many people invited to join the survey refuse to do so, the sample the research agency obtains is not a true probability sample. If the differences between the respondents and those who refused to participate are small, then this self-selection won’t matter much and the respondents can be treated as a probability sample with negligible risk. If the research agency knew how the two groups differed, however, there would be no need to conduct the survey in the first place.
Post-survey weight adjustments can be used in many situations to make the actual sample represent the target population more closely. Marketing researchers often weight survey data by age, gender, region or other variables for which national census data are available. Weighting cannot transform a non-probability sample into a probability sample, though, and we can only try to make our sample more representative of the population. Weighting can be tricky and is not a panacea.
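As a rough illustration of the weighting idea, the sketch below computes simple post-stratification weights for a single variable by dividing each group's population share by its share of the sample. All of the figures (sample counts, population shares, and group means) are hypothetical, and real weighting schemes often involve several variables at once and more elaborate methods.

```python
def post_stratification_weights(sample_counts, population_shares):
    """Weight for each group = population share / sample share."""
    n = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }


def weighted_mean(group_means, sample_counts, weights):
    """Mean across groups after applying each group's weight to its respondents."""
    total_weight = sum(weights[g] * sample_counts[g] for g in sample_counts)
    weighted_sum = sum(
        weights[g] * sample_counts[g] * group_means[g] for g in sample_counts
    )
    return weighted_sum / total_weight


# Hypothetical survey: women are under-represented relative to the census.
sample_counts = {"female": 300, "male": 700}
population_shares = {"female": 0.5, "male": 0.5}  # assumed census figures
group_means = {"female": 0.6, "male": 0.4}  # e.g. share who prefer a brand

weights = post_stratification_weights(sample_counts, population_shares)
unweighted = sum(
    sample_counts[g] * group_means[g] for g in sample_counts
) / sum(sample_counts.values())
weighted = weighted_mean(group_means, sample_counts, weights)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

Note that the adjustment only rebalances groups that are actually present in the sample. A group with no respondents cannot be weighted up at all, and weighting does nothing about differences between respondents and non-respondents within a group, which is part of why it is not a panacea.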
Something else to bear in mind is that we very often use online panels in marketing research. The quality of these panels can vary enormously. Internet access is not ubiquitous in any country, and rare in some, so there will always be some skew in any panel. The panelists may be different in important ways from the population of consumers we’re studying but, judging from sales figures and other data, they often are close enough overall for most marketing research purposes. Online Panel Research: A Data Quality Perspective (Callegaro et al.) is a good source for more information about this topic, as is the Public Opinion Quarterly (AAPOR).
Marketing research textbooks introduce the fundamental concepts of sampling, but I would encourage you to learn about the total survey error framework developed by survey methodologists. If you’d like to study sampling in depth, Survey Sampling (Kish), Sampling Techniques (Cochran) and Model Assisted Survey Sampling (Särndal et al.) are classic, if technical, references. Sampling: Design and Analysis (Lohr) and Practical Tools for Designing and Weighting Survey Samples (Valliant et al.) are recent and also excellent. The Research Methods Knowledge Base is a handy online reference. Hard-to-Survey Populations (Tourangeau et al.) is a fascinating read and will be of particular interest to those of you who study specialized populations.
We hope you’ve found this brief overview of sampling interesting and helpful!
Kevin Gray, Marketing Research, Statistics and Data Science Subcontracting and Consulting
Kevin Gray is President of Cannon Gray, a marketing science and analytics consultancy. He also co-hosts the audio podcast series MR Realities.