Populations and Samples
Descriptive and Inferential Statistics
Illustrated with the "Immediate Memory Span" Project
GFE Questions:
- What are the three basics of measurement?
- In this reading, what is meant by "inference"?
- Is "inference" similar to inductive or deductive reasoning? You may recall that deductive reasoning is going from the general to the particular; inductive reasoning is going from the particular to the general.
- Which is more correct: to say the Monte Carlo technique verifies the validity, or the reliability, of the value measured by the sample?
- What's the difference between a statistic and a parameter?
- How is the use of the term "population" different when used by a mathematician/statistician than when used by a scientist/researcher?
- When using inferential statistical theory to generalize from the sample to the population, what is the scientist's goal?
- How can a researcher be sure the prediction/generalization they make is correct?
What I (a scientist) want to know:
- What is a typical memory span? What's a long span? What's a short span?
- Does memory span depend upon the type of material (letters, digits, figures, etc.) presented?
These questions are answered scientifically by observing (empiricism).
What do I observe?
- Operationally define the phenomenon of interest.
- Validity: Does the measurement procedure measure what it says it measures?
- To be valid the measurement procedure must be reliable.
- Reliable: I get the "same" results when I repeat the measurement.
- Measure digit span several times using different sequences of numbers.
- Each time should give me similar digit spans.
- To be reliable, the measurement procedure must be objective.
- Two observers would independently assign the same number.
- Multiple choice tests are objective.
- (Objectivity, reliability, and validity are the basics of measurement.)
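As a rough illustration of what an objective operational definition might look like, here is a minimal sketch in Python. The scoring rule (span = longest sequence recalled perfectly), the function name `digit_span`, and the trial data are all assumptions made up for this example; they are not the project's actual procedure.

```python
from typing import List, Tuple

# Hypothetical trial record: (digits presented, digits recalled).
Trial = Tuple[str, str]


def digit_span(trials: List[Trial]) -> int:
    """Return the longest sequence length recalled perfectly (0 if none).

    This scoring rule is an assumption chosen for illustration only.
    """
    correct_lengths = [len(shown) for shown, recalled in trials if shown == recalled]
    return max(correct_lengths, default=0)


# Two made-up sessions for the same person, using different digit sequences.
session_a = [("4827", "4827"), ("39185", "39185"),
             ("726401", "726401"), ("5093817", "5093871")]
session_b = [("6152", "6152"), ("80436", "80436"),
             ("917253", "917253"), ("2846019", "2846091")]

span_a = digit_span(session_a)
span_b = digit_span(session_b)
print(span_a, span_b)  # prints: 6 6
```

Because the rule is mechanical, two observers scoring the same trials would assign the same number (objectivity). The two sessions agreeing is what reliability would look like. Neither shows that the number really measures memory span (validity).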
I can't measure everyone in order to find out the answer to my question. It's absurd to think that I could. Can I measure relatively few people and use that information as a substitute for measuring everyone?
Perhaps if I measure the memory span on a small set of people (a sample), that will tell me something useful about the memory span of everyone (the population).
- How will I know the sample results accurately reflect the population values? After all, I can't measure everyone to show the sample results tell me anything about people in general.
- What I can do is repeat the sampling process and show that the measurement results are similar from one sample to the next to the next, etc. I can empirically demonstrate the reliability of the information obtained by repeating the measurement process on many samples.
- Repeating the measurement process on many samples to determine how results change from sample to sample is known as the Monte Carlo technique of demonstrating (or determining) inferential reliability. [Monte Carlo, in Monaco, is home to a well-known gambling casino where many people continually demonstrate, sample after sample, that the house always wins.]
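The repeated-sampling idea can be sketched in a few lines of Python. This is only a simulation under stated assumptions: the "population" is artificial, and its mean (7 digits), standard deviation (1.5), the sample size (25), and the number of repetitions (1,000) are all made-up values chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population of digit spans (assumed mean 7.0, SD 1.5).
population = [random.gauss(7.0, 1.5) for _ in range(100_000)]


def sample_mean(n: int) -> float:
    """Draw n people at random (without replacement) and return their mean span."""
    return statistics.mean(random.sample(population, n))


# Repeat the sampling process many times and keep each sample's mean.
sample_means = [sample_mean(25) for _ in range(1_000)]

print("population mean:      ", round(statistics.mean(population), 2))
print("smallest sample mean: ", round(min(sample_means), 2))
print("largest sample mean:  ", round(max(sample_means), 2))
print("SD of the sample means:", round(statistics.stdev(sample_means), 2))
```

If the sample means cluster tightly around one value, the sampling process is giving reliable information. In a real study the population mean is unknown, so only the agreement among samples can be checked, not how close they are to the true value.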
Inferential statistics is an area of mathematics where the process of inferring population characteristics from sample values has been formalized and made rigorous.
The development of inferential statistics has led to two sets of specialists using similar terms but for different reasons.
- Mathematicians speak statistic-ese. It is the formal model.
- Sample values (statistics) reflect population values (parameters) in a completely determined manner.
- We can describe how big the difference is between a statistic and a parameter (aka "sampling error") probabilistically.
- About 68% of sample means (each mean is a statistic) are within one standard error (the standard deviation of the sample means) of the population mean (the parameter; the "true" value).
- That also means about 32% of sample means differ from the true value by more than one standard error.
- I can choose an "error rate" and make predictions knowing that sometimes I'll be right and sometimes I'll be wrong. ("The margin of error in the poll is 3%.")
- Scientists and researchers speak experimenter-ese. It's the applied process.
- As a scientist/researcher I want to be able to predict what the mean of future samples will be based on the measurements I take on this sample.
- If I follow established procedures (e.g., using randomly drawn samples) then the sample of subjects will give results I can use to generalize to people in general (the population). The generalization (prediction) won't be "correct" every time because "some samples are closer than others". But it will always be "correct" in a probabilistic manner.
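The probabilistic claims in the list above can be checked by simulation. The sketch below assumes spans are normally distributed with a made-up population mean of 7.0 and standard deviation of 1.5, and uses samples of 25; none of these numbers come from the project. It counts how often a sample mean lands within one or two standard errors of the true value, where the standard error is the population standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 7.0, 1.5   # assumed "true" parameters (made up)
N, REPEATS = 25, 10_000       # sample size and number of samples (made up)

# Standard deviation of the sample means, from theory.
standard_error = POP_SD / math.sqrt(N)


def draw_sample_mean() -> float:
    """Draw one random sample of N spans and return its mean (a statistic)."""
    return statistics.fmean([random.gauss(POP_MEAN, POP_SD) for _ in range(N)])


means = [draw_sample_mean() for _ in range(REPEATS)]

# Proportion of sample means within one and two standard errors of the parameter.
within_one_se = sum(abs(m - POP_MEAN) <= standard_error for m in means) / REPEATS
within_two_se = sum(abs(m - POP_MEAN) <= 2 * standard_error for m in means) / REPEATS

print("standard error:", round(standard_error, 2))  # prints: standard error: 0.3
print("within 1 SE:", within_one_se)
print("within 2 SE:", within_two_se)
```

The first proportion should come out near 0.68 and the second near 0.95. Choosing the wider band is one way of choosing a smaller "error rate": the prediction is less precise, but it is wrong less often.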
By measuring IMS for digits and letters from a small sample of subjects, a scientist can estimate memory span for "people in general." However, the scientist, in using inferential statistics, knows the generalizations (predictions) will be wrong a certain proportion of the time because some samples are "further away" [have a larger sampling error] than others.
Is there any way to make predictions (generalizations) without error? Yes, there is, and it's really very, very simple. Never test your predictions. Never state predictions (theoretical statements) in a way in which they can be disproven. You'll never be wrong.
© 2002 by Burrton Woodruff. All rights reserved. Modified Friday, June 7, 2002