In statistics, a confidence interval (CI) is a type of estimate computed from the statistics of the observed data. It proposes a range of plausible values for an unknown parameter (for example, the population mean). The interval has an associated confidence level, which describes how often the true parameter falls in the proposed range. Given observations x_1, ..., x_n and a confidence level γ, a valid confidence interval has probability γ of containing the true underlying parameter. The level of confidence can be chosen by the investigator. In general terms, a confidence interval for an unknown parameter is based on the sampling distribution of a corresponding estimator.
More strictly speaking, the confidence level represents the long-run frequency (i.e., the proportion) of possible confidence intervals that contain the true value of the unknown population parameter. In other words, if confidence intervals are constructed at a given confidence level from an infinite number of independent samples, the proportion of those intervals that contain the true value of the parameter will equal the confidence level. For example, if the confidence level (CL) is 90%, then under hypothetically indefinite repeated data collection, 90% of the interval estimates will contain the true population parameter.
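This frequency interpretation can be checked by simulation. The sketch below uses illustrative values (a normal population with known mean 5 and standard deviation 2, sample size 30, and the 90% z-value 1.645), builds many intervals, and counts how often they cover the true mean:

```python
import random
import statistics
import math

random.seed(0)

TRUE_MEAN = 5.0   # population mean (known here only because we simulate)
SIGMA = 2.0       # known population standard deviation
N = 30            # sample size
Z_90 = 1.645      # z-value for a 90% confidence level
TRIALS = 10_000

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(sample)
    half_width = Z_90 * SIGMA / math.sqrt(N)
    # Does this realized interval contain the true parameter?
    if xbar - half_width < TRUE_MEAN < xbar + half_width:
        covered += 1

print(covered / TRIALS)  # proportion of intervals covering the true mean
```

The printed proportion comes out close to 0.90, matching the stated confidence level.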
The confidence level is designated prior to examining the data. Most commonly, the 95% confidence level is used. However, confidence levels of 90% and 99% are also often used in analysis.
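For normal-based intervals, each confidence level corresponds to a two-sided z-value. A minimal way to compute it with Python's standard library, assuming a standard normal pivot statistic:

```python
from statistics import NormalDist

def z_for_level(gamma):
    # Two-sided z such that P(-z < Z < z) = gamma for a standard normal Z.
    return NormalDist().inv_cdf((1 + gamma) / 2)

print(z_for_level(0.90))  # ≈ 1.645
print(z_for_level(0.95))  # ≈ 1.960
print(z_for_level(0.99))  # ≈ 2.576
```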
Factors affecting the width of the confidence interval include the size of the sample, the confidence level, and the variability in the sample. A larger sample will tend to produce a better estimate of the population parameter, when all other factors are equal. A higher confidence level will tend to produce a broader confidence interval.
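The effect of sample size and confidence level on width can be read off directly from the half-width formula z·σ/√n for a mean with known standard deviation (σ = 2 below is an illustrative value):

```python
import math

SIGMA = 2.0  # illustrative known population standard deviation

def interval_width(n, z):
    # Width of a z-interval for the mean with known sigma: 2 * z * sigma / sqrt(n)
    return 2 * z * SIGMA / math.sqrt(n)

# Larger sample -> narrower interval (same 95% level, z ≈ 1.96)
print(interval_width(25, 1.96))
print(interval_width(100, 1.96))

# Higher confidence level -> wider interval (same n = 25)
print(interval_width(25, 1.645))  # 90% level
print(interval_width(25, 2.576))  # 99% level
```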
Many confidence intervals are of the form (x̄ − cσ, x̄ + cσ), where x̄ is the mean of the realization of the dataset, c is a constant, and σ is the standard deviation of the dataset. Another way to express the form of a confidence interval is:
(point estimate − error bound, point estimate + error bound)
or, symbolically, (x̄ − EBM, x̄ + EBM)
where x̄ (the point estimate) serves as an estimate for μ (the population mean) and EBM is the error bound for a population mean.
The margin of error (EBM) depends on the confidence level.
A thorough, general definition:
Suppose a dataset x_1, ..., x_n is given, modeled as a realization of random variables X_1, ..., X_n. Let θ be the parameter of interest, and γ a number between 0 and 1. If there exist sample statistics L_n = g(X_1, ..., X_n) and U_n = h(X_1, ..., X_n) such that:
P(L_n < θ < U_n) = γ for every value of θ,
then (l_n, u_n), where l_n = g(x_1, ..., x_n) and u_n = h(x_1, ..., x_n), is called a 100γ% confidence interval for θ. The number γ is called the confidence level.
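This definition can be instantiated for the mean of normal data with a known standard deviation: taking g and h as X̄ ∓ z·σ/√n yields a 95% interval when z ≈ 1.96. A minimal sketch (σ = 2 and the data values are illustrative):

```python
import math
import statistics

SIGMA = 2.0   # assumed known population standard deviation
Z_95 = 1.960  # z-value for gamma = 0.95

def g(*xs):
    # Lower sample statistic L_n = X̄ - z * σ / √n
    return statistics.fmean(xs) - Z_95 * SIGMA / math.sqrt(len(xs))

def h(*xs):
    # Upper sample statistic U_n = X̄ + z * σ / √n
    return statistics.fmean(xs) + Z_95 * SIGMA / math.sqrt(len(xs))

# A realization x_1, ..., x_n of the dataset
data = (4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.2, 4.7)
l_n, u_n = g(*data), h(*data)
print(l_n, u_n)  # the realized 95% confidence interval (l_n, u_n)
```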