Back to the Future: How Do Statisticians Make Predictions?
Naira Musallam, PhD • 7 Feb 2017
Have you ever wondered how statisticians are able to make predictions about the future?
In our previous piece, When Science Gets Involved in Politics, we discussed the importance of adhering to scientific sampling techniques as a solid first step. Now that we have our well-defined sample, the natural next question is: how do we find answers about the population?
Welcome to inferential statistics!
Estimation in statistics refers to the process by which statisticians make relatively accurate inferences about a population based on information obtained from a sample.
To understand how we make that move, it’s important to distinguish three distributions:
- The population distribution of the variable of interest (be it customer satisfaction or product popularity) is empirical, but in practice unknown, because it is extremely difficult to survey your entire population.
- The sample consists of a set of data selected from the population of interest, ideally a representative one (more on this in our last blog post).
- The sampling distribution is a theoretical, probabilistic distribution of a statistic (such as the mean) across all possible samples of a given size.
It’s important to understand that the sampling distribution is theoretical: the researcher never actually obtains it. Even so, it is critical for estimation.
Thanks to the laws of probability, a great deal is known about the sampling distribution: its shape, central tendency, and dispersion. Its shape is a normal curve, which you may know as the “bell curve”: a theoretical distribution of scores that is symmetrical and bell-shaped.
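To make this concrete, here is a minimal sketch in Python (using NumPy, with made-up satisfaction scores) that approximates a sampling distribution by brute force: draw many samples of the same size and collect their means.

```python
import numpy as np

rng = np.random.default_rng(42)

# A hypothetical population: 100,000 satisfaction scores from 1 to 10.
population = rng.integers(1, 11, size=100_000)

# Draw many samples of the same size and record each sample's mean.
# The collection of those means approximates the sampling distribution.
sample_size = 50
sample_means = [rng.choice(population, size=sample_size).mean()
                for _ in range(10_000)]

print(f"Population mean:          {population.mean():.3f}")
print(f"Mean of the sample means: {np.mean(sample_means):.3f}")
print(f"Spread of the sample means (standard error): {np.std(sample_means):.3f}")
```

Plot a histogram of `sample_means` and the bell shape emerges, centered on the population mean.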
The standard normal curve always has a mean of 0 and a standard deviation of 1. Because we can assume the shape of the sampling distribution is normal, we can calculate the probabilities of various outcomes. We also know that the mean of the sampling distribution equals the mean of the population.
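For instance, once a result is expressed as a z-score (its distance from the mean, measured in standard deviations), the standard normal curve gives its probability directly. A quick sketch using SciPy:

```python
from scipy.stats import norm

# Standard normal: mean 0, standard deviation 1.
# Probability that a z-score falls within ±1.96 of the mean:
p = norm.cdf(1.96) - norm.cdf(-1.96)
print(f"P(-1.96 < Z < 1.96) = {p:.4f}")  # ~0.95

# Probability of observing a z-score above 2:
print(f"P(Z > 2) = {1 - norm.cdf(2):.4f}")  # ~0.023
```

That 1.96 figure will return shortly, when we build a 95% confidence interval.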
Building on this is the Central Limit Theorem, a result from probability theory which states that if a random sample of size N is drawn from any population with a finite mean and standard deviation, then as N grows, the sampling distribution of the sample mean will approach normality.
With a larger sample size, the mean of the sampling distribution stays centered on the population mean, while the standard error of the mean (the population standard deviation divided by the square root of N) shrinks, so sample estimates vary less and less from sample to sample. Now you can start to see how researchers gain more and more confidence in their results.
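You can watch the theorem at work even on a population that looks nothing like a bell curve. The sketch below (illustrative only, using a skewed exponential population) shows the sample means staying centered on the population mean while the standard error shrinks roughly like 1/√N:

```python
import numpy as np

rng = np.random.default_rng(0)

# A heavily skewed population: exponential, with a true mean of 1.0.
population = rng.exponential(scale=1.0, size=1_000_000)

for n in (5, 50, 500):
    means = [rng.choice(population, size=n).mean() for _ in range(5_000)]
    print(f"N={n:>3}: mean of sample means = {np.mean(means):.3f}, "
          f"standard error = {np.std(means):.3f}")
# The sample means stay centered near 1.0 (the population mean),
# and their spread shrinks by about sqrt(10) for each tenfold jump in N.
```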
But with estimation, there is always a chance of error.
The width of a confidence interval is a function of the risk we are willing to take of being wrong and of the sample size. The larger the sample, the narrower the interval.
The confidence level refers to the probability that a specified interval will contain the population parameter. A 95% confidence level means that 95 out of every 100 intervals constructed this way will contain the population mean; accordingly, there are 5 chances out of 100 that the interval does not contain it.
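As a sketch of how this works in practice: for a large enough sample, a 95% interval for the mean is the sample mean plus or minus about 1.96 estimated standard errors. The data below are hypothetical survey scores generated purely for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
sample = rng.normal(loc=7.2, scale=1.5, size=200)  # hypothetical survey scores

confidence = 0.95
z = norm.ppf(1 - (1 - confidence) / 2)          # ~1.96 for 95%
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))  # estimated standard error

lo, hi = mean - z * se, mean + z * se
print(f"{confidence:.0%} CI for the mean: ({lo:.2f}, {hi:.2f})")
# Quadrupling the sample size halves se, and therefore halves the width.
```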
When the purpose of the statistical inference is to draw a conclusion about a population, the significance level measures how frequently the conclusion will be wrong. For example, a 5% significance level means that our conclusion will be wrong 5% of the time. It is always the case that Confidence Level + Significance Level = 1.
It is possible to make inferences about a population from a carefully selected sample. The sampling distribution, a theoretical one, links the known sample to the larger population through estimation. And because of the properties of the sampling distribution, we can attach a probability, and therefore a level of confidence, to any sample statistic.
Whether you realize it or not, this is under your nose every day in the news!
Keep an eye out: the next time someone at a cocktail party talks about who is ahead in the polls, you’ll be armed with a healthy dose of skepticism.