The general idea that underlies statistical inference is the comparison of particular statistics from an observational data set (e.g. the mean, the standard deviation, the differences among the means of subsets of the data), with an appropriate reference distribution in order to judge the significance of those statistics. When various assumptions are met, and specific hypotheses about the values of those statistics that should arise in practice have been specified, then statistical inference can be a powerful approach for drawing scientific conclusions that efficiently uses existing data or those collected for the specific purpose of testing those hypotheses. Even in a context when a formal experimental design is not possible, or when the objective is to explore the data, significance evaluation can be useful.
As a consequence of the central limit theorem, we know that the sample mean is approximately normally distributed when the sample is reasonably large, and so we can use the normal distribution to describe the uncertainty of a sample mean.
Characterization of samples
Once a sample has been obtained, and descriptive statistics calculated, attention may then turn to the significance (representativeness as opposed to unusualness) of the sample or of the statistics. This information may be gained by comparing the specific value of a statistic with an appropriate reference distribution, and by the calculation of additional statistics that describe the level of uncertainty a particular statistic may have.
In the case of the sample mean, the appropriate reference distribution is the normal distribution, which is implied by the Central Limit Theorem.
Standard error of the mean and confidence interval for the mean
Uncertainty in the mean can be described by the standard error of the mean or by the confidence interval for the mean. The standard error of the mean can be thought of as the standard deviation of a set of mean values from repeated samples.
Definition of the standard error of the mean
Here is a demonstration using simulated data and repeated samples of different sizes
Set the number of replications and the (maximum) sample size
Create several matrices to hold the individual replication results.
Generate means for a range of sample sizes (1:max.sample.size)
Take a look at the means and the standard errors. Note that means remain essentially constant across the range of sample sizes, while the standard errors decrease rapidly (at first) with increasing sample size.
Verify that the standard error of the mean is sigma/sqrt(n)
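The demonstration described above can be sketched in a few lines. This is a minimal Python sketch using only the standard library, not the original R code from these notes; the population mean and standard deviation (`MU`, `SIGMA`), the number of replications (`N_REPS`), and the function name `simulated_standard_error` are all assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

MU, SIGMA = 10.0, 1.0   # assumed population mean and standard deviation
N_REPS = 2000           # assumed number of replications per sample size


def simulated_standard_error(n, n_reps=N_REPS):
    """Standard deviation of n_reps sample means, each from a normal sample of size n."""
    means = [
        statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(n_reps)
    ]
    return statistics.stdev(means)


# Compare the simulated standard error with sigma / sqrt(n) for a few sample sizes.
for n in (4, 16, 64):
    simulated = simulated_standard_error(n)
    theoretical = SIGMA / math.sqrt(n)
    print(f"n={n:3d}  simulated SE={simulated:.3f}  sigma/sqrt(n)={theoretical:.3f}")
```

The two columns should agree closely, and both shrink as the sample size grows.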
Generate some data values, this time from a uniform distribution
Rescale these values so that they have the same mean and standard deviation as in the previous example,
Repeat the demonstration
This demonstrates that the standard error of the mean is insensitive to the underlying distribution of the data.
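The same check can be sketched for uniformly distributed data. This is a hedged Python illustration rather than the original R code: the target mean and standard deviation (`MU`, `SIGMA`) are assumed values, and the rescaling here uses the theoretical moments of a uniform(0, 1) variable (mean 1/2, standard deviation 1/sqrt(12)), which may differ from how the original example rescaled its data.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 10.0, 1.0   # assumed target mean and standard deviation


def rescaled_uniform():
    """One uniform(0, 1) draw, shifted and scaled to have mean MU and sd SIGMA."""
    u = random.random()
    return MU + SIGMA * (u - 0.5) * math.sqrt(12)


def standard_error_uniform(n, n_reps=2000):
    """Standard deviation of n_reps sample means of rescaled uniform data."""
    means = [
        statistics.fmean(rescaled_uniform() for _ in range(n))
        for _ in range(n_reps)
    ]
    return statistics.stdev(means)


for n in (4, 25, 100):
    print(f"n={n:3d}  simulated SE={standard_error_uniform(n):.3f}  "
          f"sigma/sqrt(n)={SIGMA / math.sqrt(n):.3f}")
```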
The confidence interval provides a verbal or graphical characterization, based on the information in a sample, of the likely range of values within which the “true” or population mean lies. This example uses an artificial data set [cidat.csv]
The data set is a data frame that can be generated as follows
Attach and summarize the data set.
The idea here is to imagine that each group of 100 observations represents one possible sample of some underlying process or information set that might occur in practice. These hypothetical samples (which are each equally likely) provide a mechanism for illustrating the range of values of the mean that could occur simply due to natural variability of the data, and the “confidence interval” is the range of values of the mean that encloses about 95% of the possible mean values.
Get the means and standard errors of each group.
Plot the individual samples (top plot) and then the means, and their standard errors (bottom plot). Note the different scales on the plots.
The bottom plot shows that out of the 40 mean values (red dots), 2 (0.05 or 5 percent) have intervals (defined to be twice the standard error on either side of the mean, black tick marks) that do not enclose the “true” value of the mean (10.0).
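The counting exercise behind that plot can be sketched without the original data file. The Python sketch below simulates 40 groups of 100 observations with a true mean of 10.0, as in the text, but the standard deviation (`ASSUMED_SD`) is a hypothetical value, since the spread of the original data is not stated here; the simulated count of misses will therefore not necessarily equal the 2 reported above.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

TRUE_MEAN = 10.0     # the "true" mean quoted in the text
ASSUMED_SD = 1.0     # hypothetical: the spread of the original data is not given
N_GROUPS, GROUP_SIZE = 40, 100


def interval_for(sample):
    """Mean plus or minus twice the standard error of the mean."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - 2 * se, m + 2 * se


groups = [
    [random.gauss(TRUE_MEAN, ASSUMED_SD) for _ in range(GROUP_SIZE)]
    for _ in range(N_GROUPS)
]
intervals = [interval_for(g) for g in groups]

# Count the intervals that fail to enclose the true mean.
misses = sum(1 for lo, hi in intervals if not (lo <= TRUE_MEAN <= hi))
print(f"{misses} of {N_GROUPS} intervals do not enclose the true mean")
```

With intervals of two standard errors, roughly 5 percent of groups are expected to miss in the long run, so a handful of misses out of 40 is typical.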
Set the graphics window back to normal and detach the data set.
Simple inferences based on the standard error of the mean
The standard error of the mean, along with the knowledge that the sample mean is normally distributed, allows inferences about the mean to be made. For example, questions of the following kind can be answered:
- What is the probability of occurrence of an observation with a particular value?
- What is the probability of occurrence of a sample mean with a particular value?
- What is the “confidence interval” for a sample mean with a particular value?
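The three questions above can be answered numerically once a population mean, standard deviation, and sample size are fixed. The Python sketch below uses assumed values for all three (`MU`, `SIGMA`, `N`) and for the value of interest (`value`), and it interprets "the probability of a particular value" as the probability of a value at least that large, since a single exact value of a continuous variable has probability zero.

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 10.0, 1.0, 25   # assumed population mean, sd, and sample size
value = 10.5                   # assumed value of interest

standard_error = SIGMA / math.sqrt(N)
observation_dist = NormalDist(MU, SIGMA)      # distribution of single observations
mean_dist = NormalDist(MU, standard_error)    # distribution of sample means

# 1. Probability of a single observation at least as large as `value`.
p_obs = 1 - observation_dist.cdf(value)

# 2. Probability of a sample mean (n = N) at least as large as `value`.
p_mean = 1 - mean_dist.cdf(value)

# 3. 95% confidence interval around an observed sample mean equal to `value`.
z = NormalDist().inv_cdf(0.975)
ci = (value - z * standard_error, value + z * standard_error)

print(f"P(observation >= {value}) = {p_obs:.4f}")
print(f"P(sample mean >= {value}) = {p_mean:.4f}")
print(f"95% CI for the mean: ({ci[0]:.3f}, {ci[1]:.3f})")
```

A value that is unremarkable for a single observation can be quite unusual for a mean of 25 observations, because the standard error is much smaller than the standard deviation.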
Here’s a short discussion of simple inferential statistics:
The next step toward statistical inference is the more formal development and testing of specific hypotheses (as opposed to the rather informal inspection of descriptive plots, confidence intervals, etc.)
“Hypothesis” is a word used in several contexts in data analysis or statistics:
- the research hypothesis is the general scientific issue that is being explored by a data analysis. It may take the form of quite specific statements, or just general speculations.
- the null hypothesis (Ho) is a specific statement whose truthfulness can be evaluated by a particular statistical test. An example of a null hypothesis is that the means of two groups of observations are identical.
- the alternative hypothesis (Ha) is, as its name suggests, an alternative statement of what situation is true, in the event that the null hypothesis is rejected. An example of an alternative hypothesis to a null hypothesis that the means of two groups of observations are identical is that the means are not identical.
A null hypothesis is never “proven” by a statistical test. Tests may only reject, or fail to reject, a null hypothesis.
There are two general approaches toward setting up and testing specific hypotheses: the “classical approach” and the “p-value” approach.
The steps in the classical approach:
- define or state the null and alternative hypotheses.
- select a test statistic.
- select a significance level, or a specific probability level, which if exceeded, signals that the test statistic is large enough to consider significant.
- delineate the “rejection region” under the pdf of the appropriate distribution for the test statistic (i.e. determine the specific value of the test statistic that, if exceeded, would be grounds to consider it significant).
- compute the test statistic.
- depending on the particular value of the test statistic, either a) reject the null hypothesis (Ho) and accept the alternative hypothesis (Ha), or b) fail to reject the null hypothesis.
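The classical steps can be traced in a small example. The Python sketch below uses a two-sided one-sample z-test with a known population standard deviation purely as an illustration; the function name `classical_z_test` and the numbers passed to it are hypothetical, and other test statistics follow the same sequence of steps.

```python
import math
from statistics import NormalDist


def classical_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test following the classical steps.

    Returns (z, z_crit, reject), where reject is True if H0 is rejected.
    """
    # Steps 1-2: H0: mean == mu0, Ha: mean != mu0; the test statistic is z.
    # Steps 3-4: the significance level alpha fixes the rejection region |z| > z_crit.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: compute the test statistic.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Step 6: reject H0 only if z falls in the rejection region.
    return z, z_crit, abs(z) > z_crit


z, z_crit, reject = classical_z_test(sample_mean=10.5, mu0=10.0, sigma=1.0, n=25)
decision = "reject Ho" if reject else "fail to reject Ho"
print(f"z = {z:.2f}, critical value = {z_crit:.2f}: {decision}")
```

Note that the rejection region is fixed before the test statistic is computed; only the final comparison depends on the data.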
The steps in the “p-value” approach are:
- define or state the null and alternative hypotheses.
- select and compute the test statistic.
- refer the test statistic to its appropriate reference distribution.
- calculate the probability that a value of the test statistic at least as large as that observed would occur by chance if the null hypothesis were true (this probability is the p-value, sometimes called the observed significance level).
- if the p-value is small, the tested hypothesis (Ho) is discredited, and we assert that a “significant result” or “significant difference” has been observed.
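The p-value steps can be sketched in the same setting. The Python example below again assumes a two-sided one-sample z-test with a known standard deviation; the function name `z_test_p_value` and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z-test; returns (z, p_value)."""
    # Compute the test statistic under H0: mean == mu0.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Refer it to the standard normal distribution: the p-value is the
    # probability of a statistic at least this far from zero in either tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = z_test_p_value(sample_mean=10.5, mu0=10.0, sigma=1.0, n=25)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

Here no rejection region is set in advance; the reader judges how small the reported probability is.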
An illustration of a hypothesis test that is frequently used in practice is provided by the t-test, one of several “difference-of-means” tests. The t-test (or more particularly Student’s t-test, after the pseudonym of its author, W.S. Gosset) provides a mechanism for the simple task of testing whether there is a significant difference between two groups of observations, as reflected by differences in the means of the two groups. In the t-test, two sample mean values, or a sample mean and a theoretical mean value, are compared as follows:
- the null hypothesis is that the two mean values are equal
- the alternative hypothesis is that the means are not equal (or that one is greater than or less than the other)
- the test statistic is the t-statistic
- the significance level or p-value is determined using the t-distribution
The shape of the t distribution can be visualized as follows (for df=30):
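In place of a plot, the shape can be tabulated directly from the density formula. This Python sketch is an illustration, not the original R code; the function name `t_pdf` is hypothetical, and the standard normal density is printed alongside for comparison.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


standard_normal = NormalDist()
print("   x   t(df=30)   normal")
for x in (0.0, 1.0, 2.0, 3.0):
    print(f"{x:4.1f}   {t_pdf(x, 30):.4f}     {standard_normal.pdf(x):.4f}")
```

With 30 degrees of freedom the t distribution is already close to the normal, with a slightly lower peak and slightly heavier tails.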
You can read about the origin of Gosset’s pseudonym (and his contributions to brewing) here.
The t-test for assessing differences in group means
[Details of the t-test]
There are two ways the t-test is implemented in practice, depending on the nature of the question being asked and hence on the nature of the null hypothesis:
- one-sample t-test (for testing the hypothesis that a sample mean is equal to a “known” or “theoretical” value), or the
- two-sample t-test (for testing the hypothesis that the means of two groups of observations are identical).
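The two forms of the test differ mainly in how the standard error in the denominator is built. The Python sketch below computes both t-statistics from scratch, assuming equal group variances for the two-sample case; the function names and the small data lists are hypothetical and are not the example data sets used in these notes. Converting a t-statistic to a p-value requires the t distribution's cumulative distribution function, which is left to a statistics package.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t-statistic and degrees of freedom for H0: mean == mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - mu0) / se, n - 1


def two_sample_t(a, b):
    """Pooled-variance t-statistic and degrees of freedom for H0: mean(a) == mean(b)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se, na + nb - 2


group_0 = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical values
group_1 = [2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical values

t1, df1 = one_sample_t(group_0, mu0=2.0)
t2, df2 = two_sample_t(group_0, group_1)
print(f"one-sample: t = {t1:.4f}, df = {df1}")
print(f"two-sample: t = {t2:.4f}, df = {df2}")
```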
Example data sets:
Attach the example data, and get a boxplot of the data by group:
Two-tailed t-test (are the means different in a general way?)
The t-statistic is -0.2071 and the p-value = 0.8367, which indicates that the t-statistic is not significant, i.e. that there is little support for rejecting the null hypothesis that there is no difference between the mean of group 0 and the mean of group 1.
Two one-tailed t-tests (each evaluates whether the means are different in a specific way?)
Notice that for each example, the statistics (t-statistic, means of each group) are identical, while the p-values and confidence intervals differ. The smallest p-value is obtained for the test of the hypothesis that the mean of group 0 is less than the mean of group 1 (which is the observed difference). But that difference is not significant (the p-value is greater than 0.05).
A second example
Here the t-statistic is relatively large and the p-value very small, lending support for rejecting the null hypothesis of no significant difference in the means (and accepting the alternative hypothesis that the means do differ). Remember, we haven’t “proven” that they differ, we’ve only rejected the idea that they are identical.
Differences in group variances
One assumption that underlies the t-test is that the variances (or dispersions) of the two samples are equal. A modification of the basic test allows cases in which the variances differ somewhat to be handled, but large differences in variability between the two groups can have an impact on the interpretability of the test results:
Example data: [foursamples.csv]
t-tests among groups with different variances
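One widely used modification for unequal variances is Welch's t-test, which keeps each group's variance separate and adjusts the degrees of freedom. Whether this is the exact modification meant above is an assumption; the Python sketch below, with a hypothetical function name `welch_t` and made-up data, only illustrates how the statistic and its degrees of freedom respond when one group is far more variable than the other.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va = statistics.variance(a) / na   # squared standard error of mean(a)
    vb = statistics.variance(b) / nb   # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


narrow = [1.0, 2.0, 3.0, 4.0, 5.0]       # hypothetical low-variance group
wide = [0.0, 5.0, 10.0, 15.0, 20.0]      # hypothetical high-variance group

t, df = welch_t(narrow, wide)
print(f"Welch t = {t:.4f}, df = {df:.2f}")
```

When the variances are very different, the degrees of freedom fall toward those of the smaller, noisier contribution, which widens the reference distribution and makes the test more conservative.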
There is a formal test for equality of group variances that will be described with analysis of variance.
- Owen (The R Guide): section 7.1
By Science Buddies on February 23, 2010 9:23 AM
"If _____[I do this] _____, then _____[this]_____ will happen."
Sound familiar? It should. This formulaic approach to making a statement about what you "think" will happen is the basis of most science fair projects and much scientific exploration.
You can see from the basic outline of the Scientific Method below that writing your hypothesis comes early in the process:
- Ask a Question
- Do Background Research
- Construct a Hypothesis
- Test Your Hypothesis by Doing an Experiment
- Analyze Your Data and Draw a Conclusion
- Communicate Your Results
Following the scientific method, we come up with a question that we want to answer, we do some initial research, and then before we set out to answer the question by performing an experiment and observing what happens, we first clearly identify what we "think" will happen.
We make an "educated guess."
We write a hypothesis.
We set out to prove or disprove the hypothesis.
What you "think" will happen, of course, should be based on your preliminary research and your understanding of the science and scientific principles involved in your proposed experiment or study. In other words, you don't simply "guess." You're not taking a shot in the dark. You're not pulling your statement out of thin air. Instead, you make an "educated guess" based on what you already know and what you have already learned from your research.
If you keep in mind the format of a well-constructed hypothesis, you should find that writing your hypothesis is not difficult to do. You'll also find that in order to write a solid hypothesis, you need to understand what your variables are for your project. It's all connected!
If I never water my plant, it will dry out and die.
That seems like an obvious statement, right? The above hypothesis is too simplistic for most middle- to upper-grade science projects, however. As you work on deciding what question you will explore, you should be looking for something for which the answer is not already obvious or already known (to you). When you write your hypothesis, it should be based on your "educated guess" not on known data. Similarly, the hypothesis should be written before you begin your experimental procedures—not after the fact.
Our staff scientists offer the following tips for thinking about and writing good hypotheses.
- The question comes first. Before you make a hypothesis, you have to clearly identify the question you are interested in studying.
- A hypothesis is a statement, not a question. Your hypothesis is not the scientific question in your project. The hypothesis is an educated, testable prediction about what will happen.
- Make it clear. A good hypothesis is written in clear and simple language. Reading your hypothesis should tell a teacher or judge exactly what you thought was going to happen when you started your project.
- Keep the variables in mind. A good hypothesis defines the variables in easy-to-measure terms, like who the participants are, what changes during the testing, and what the effect of the changes will be. (For more information about identifying variables, see: Variables in Your Science Fair Project.)
- Make sure your hypothesis is "testable." To prove or disprove your hypothesis, you need to be able to do an experiment and take measurements or make observations to see how two things (your variables) are related. You should also be able to repeat your experiment over and over again, if necessary.
To create a "testable" hypothesis make sure you have done all of these things:
- Thought about what experiments you will need to carry out to do the test.
- Identified the variables in the project.
- Included the independent and dependent variables in the hypothesis statement. (This helps ensure that your statement is specific enough.)
- Do your research. You may find many studies similar to yours have already been conducted. What you learn from available research and data can help you shape your project and hypothesis.
- Don't bite off more than you can chew! Answering some scientific questions can involve more than one experiment, each with its own hypothesis. Make sure your hypothesis is a specific statement relating to a single experiment.
Putting it in Action
To help demonstrate the above principles and techniques for developing and writing solid, specific, and testable hypotheses, Sandra and Kristin, two of our staff scientists, offer the following good and bad examples.
| Good Hypothesis | Poor Hypothesis |
| --- | --- |
| When there is less oxygen in the water, rainbow trout suffer more lice. Kristin says: "This hypothesis is good because it is testable, simple, written as a statement, and establishes the participants (trout), variables (oxygen in water, and numbers of lice), and predicts effect (as oxygen levels go down, the numbers of lice go up)." | Our universe is surrounded by another, larger universe, with which we can have absolutely no contact. Kristin says: "This statement may or may not be true, but it is not a scientific hypothesis. By its very nature, it is not testable. There are no observations that a scientist can make to tell whether or not the hypothesis is correct. This statement is speculation, not a hypothesis." |
| Aphid-infected plants that are exposed to ladybugs will have fewer aphids after a week than aphid-infected plants which are left untreated. Sandra says: "This hypothesis gives a clear indication of what is to be tested (the ability of ladybugs to curb an aphid infestation), is a manageable size for a single experiment, mentions the independent variable (ladybugs) and the dependent variable (number of aphids), and predicts the effect (exposure to ladybugs reduces the number of aphids)." | Ladybugs are a good natural pesticide for treating aphid infected plants. Sandra says: "This statement is not 'bite size.' Whether or not something is a 'good natural pesticide' is too vague for a science fair project. There is no clear indication of what will be measured to evaluate the prediction." |
Hypotheses in History
Throughout history, scientists have posed hypotheses and then set out to prove or disprove them. Staff Scientist Dave reminds us that scientific experiments become a dialogue between and among scientists and that hypotheses are rarely (if ever) "eternal." In other words, even a hypothesis that is proven true may be displaced by the next set of research on a similar topic, whether that research appears a month or a hundred years later.
A look at the work of Sir Isaac Newton and Albert Einstein, more than 100 years apart, shows good hypothesis-writing in action.
As Dave explains, "A hypothesis is a possible explanation for something that is observed in nature. For example, it is a common observation that objects that are thrown into the air fall toward the earth. Sir Isaac Newton (1643-1727) put forth a hypothesis to explain this observation, which might be stated as 'objects with mass attract each other through a gravitational field.'"
Newton's hypothesis demonstrates the techniques for writing a good hypothesis: It is testable. It is simple. It is universal. It allows for predictions that will occur in new circumstances. It builds upon previously accumulated knowledge (e.g., Newton's work explained the observed orbits of the planets).
"As it turns out, despite its incredible explanatory power, Newton's hypothesis was wrong," says Dave. "Albert Einstein (1879-1955) provided a hypothesis that is closer to the truth, which can be stated as 'objects with mass cause space to bend.' This hypothesis discards the idea of a gravitational field and introduces the concept of space as bendable. Like Newton's hypothesis, the one offered by Einstein has all of the characteristics of a good hypothesis."
"Like all scientific ideas and explanations," says Dave, "hypotheses are all partial and temporary, lasting just until a better one comes along."
That's good news for scientists of all ages. There are always questions to answer and educated guesses to make!