
EDF 6938 Fall, 2011 Exam #1 Test Blueprint

Exam 1 will be administered the weekend of October 8th and 9th. The exam is in the format of short-answer responses. You will have two hours to complete the exam. The topics for exam 1 are given below.


 * 1) ** Review of Descriptive Statistics **

**a. Measures of Central Tendency** - a statistical measure that identifies a single value as representative of the entire distribution: the typical score, or center, of the distribution.

**Mean:** Arithmetic average of the scores. Pros/Cons: uses all the data, is easy to work with mathematically, and is stable; works poorly in a small sample with extreme scores and requires interval or ratio data.

**Median:** Middle piece of data once the data have been ranked from smallest to largest; when there is an even number of data points, the average of the two middle values. The median divides a set of scores into two equal halves, so 50% of the scores fall above the median and 50% below it. Pros/Cons: always the 50th percentile, works well for small samples with extreme scores, can be used with ordinal data; unstable.

**Mode:** The point or score that occurs with the greatest frequency. Pros/Cons: can be used with all types of data (nominal, ordinal, interval, or ratio) and is often viewed as what is most typical for the data set; interpretations can be misleading, it is unstable, and a set of data may have one mode, more than one mode, or no mode.

**b. Measures of Variability** - a quantitative measure of the degree to which scores in a distribution are spread out or clustered together.

**Range:** difference between the largest (max) piece of data and the smallest (min) piece of data. Range = max - min.

**Interquartile range (IQR):** distance between the first quartile and the third quartile. Measures the range covered by the middle 50% of the distribution.

**Semi-interquartile range:** one-half of the IQR; it measures the distance from the middle of the distribution (median) to the boundaries that define the middle 50% of the data.

**Variance:** the mean (or average) of the squared deviations from the mean. The variance is in the metric of scores squared, so it is not useful for direct interpretation, but it can be decomposed into sources of explanation (e.g., ANOVA and other statistical procedures).

**Standard deviation:** the square root of the variance; roughly the average amount by which scores deviate from the mean. A deviation score gives a score's distance and direction from the mean. The standard deviation is in the metric of the scores, which allows for interpretation and the formation of confidence intervals.
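The measures defined above can be checked by hand on a small data set. The sketch below uses Python's standard `statistics` module on made-up scores (the data are an assumption, not course data). Quartile conventions differ between textbooks and software, so the IQR shown here reflects only the "inclusive" method chosen for illustration.

```python
import statistics

# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(scores)      # sum of scores divided by the number of scores
median = statistics.median(scores)  # middle score after ranking
mode = statistics.mode(scores)      # most frequent score

# Measures of variability
data_range = max(scores) - min(scores)

# Quartiles: the "inclusive" method is one of several conventions.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1
semi_iqr = iqr / 2

# Variance as the mean of the squared deviations (divides by N).
# A sample variance would divide by N - 1 instead (statistics.variance).
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)  # square root of the variance

print(mean, median, mode, data_range, iqr, semi_iqr)
print(round(variance, 3), round(std_dev, 3))
```

The standard deviation comes out in the same units as the scores, while the variance is in squared units, which is why the standard deviation is the one used for interpretation.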


 * 2) ** Correlation **

**a. Purpose of correlation:** to establish whether there is a relationship between two variables, how the two variables are related, and the strength of the relationship.

**b. Interpret a correlation coefficient:** it cannot be interpreted as a measure of causality; the interpretation depends on context and purpose.

**c. Given a printout, identify the correlation coefficient**

**d. Assumptions**

 * Linearity: the pattern of association between the two variables is assumed to be a linear pattern (resembles a line).
 * Normality: both variables are assumed to be from distributions that approximate a normal curve (bell-shaped distribution).
 * Homoscedasticity: equal variance.
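To make the correlation coefficient concrete, here is a minimal sketch of the Pearson correlation computed from deviation scores. The function name `pearson_r` and the data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: sum of cross-products of deviations divided by
    the square root of the product of the two sums of squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sum_products / math.sqrt(ss_x * ss_y)

# Hypothetical paired observations (e.g., hours studied and quiz score).
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = pearson_r(hours, score)
print(round(r, 3))
```

The sign of r gives the direction of the relationship and its absolute value gives the strength. Neither says anything about causality.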


 * 3) ** Introduction to Inferential Statistics **

a. Sampling distribution of the mean

1. Definition: the collection of sample means for all possible random samples of a particular size (n) that can be obtained from a population.

2. Shape, mean, standard error:

 * Shape: **bell shaped** (approximately normally distributed) when the population is normal or the sample size is large.
 * Mean: the sample means tend to pile up around the population mean, thus "the **mean of the distribution of sample means will be equal to the population mean and is called the expected value of the sample means**."
 * Standard error: the standard deviation of the distribution of the sample means is called the **standard error of the mean**. It measures the average distance between the sample means and the population mean.
 * The larger the sample, the smaller the standard error of the mean and the more accurately a sample mean estimates the population mean.
 * The smaller the population standard deviation, the smaller the standard error of the mean.
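The last two points follow directly from the usual formula for the standard error of the mean. The notation below is an assumption, since the notes do not fix one: σ is the population standard deviation and n is the sample size.

```latex
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
```

Because n is in the denominator, a larger sample gives a smaller standard error. Because σ is in the numerator, a smaller population standard deviation does too. For example, σ = 25 and n = 100 give 25/10 = 2.5.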

b. Type I error: the null hypothesis is true, but the researcher rejects it.

c. Type II error: the null hypothesis is false, but the researcher fails to reject it.

d. Power: the probability of rejecting a false null hypothesis. This probability is 1 − β, where β is the probability of a Type II error.

e. Significance level: defines what counts as an unlikely sample, assuming the null hypothesis is true. The symbol is alpha (α). The most common level used is 0.05.

f. P value: the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one calculated from the sample. It is compared with the significance level (usually 0.05): if the p-value is less than or equal to the significance level, it is concluded that the sample is statistically unusual, i.e., we reject the null hypothesis.

g. Set up hypotheses: State a value for your parameter; this is the null hypothesis, indicated by H0.

Create an alternative hypothesis. There are two options: a non-directional alternative, tested with a two-tailed test (the population mean is not equal to some value), or a directional alternative, tested with a one-tailed test (the population mean is less than some value or is greater than some value).

1. One Sample Test: Possibly she means "One-tailed test," which is a directional alternative hypothesis: we say that the population mean is either greater than or less than some value.

Refer to Single Sample t test in Week 4: Use to determine whether a sample is representative of, or belongs to, a larger population. Use when the population variance is unknown and must be estimated from the sample (when it is known, a z test applies). Use the sampling distribution of the t statistic.

2. Two Sample Test: Possibly she means "Two-tailed test," which is a non-directional alternative hypothesis: we say that the population mean is NOT equal to some value.


 * 4) ** Two-Sample Test (Independent and Dependent) **

a. Know the purpose of each: The independent samples t-test is used when you have two samples and each subject is independently assigned to one and **only** one group. The goal is to obtain two independent samples and use them to make inferences about the populations from which they were drawn.

Dependent samples t-tests are used when an observation in one group is related to an observation in the other group. Examples are related persons, matched subjects, and pretest/posttest designs.

b. Recognize which one (independent or dependent) is the appropriate one to use

c. Calculate degrees of freedom

 * For independent: df = n1 + n2 - 2
 * For dependent: df = n - 1, where n is the number of pairs

d. Conduct the test

 * 1. Set up hypotheses
 * 2. Given a printout, identify the test statistic and p-value
 * 3. Make a decision
 * 4. Draw conclusions
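The degrees-of-freedom formulas in item c can be written as two small helpers. The function names are hypothetical. The sample sizes in the usage lines are the ones from practice problems 11 and 9 (two groups of 15, and 8 paired technicians).

```python
def df_independent(n1: int, n2: int) -> int:
    """Degrees of freedom for an independent samples t-test: n1 + n2 - 2."""
    return n1 + n2 - 2

def df_dependent(n_pairs: int) -> int:
    """Degrees of freedom for a dependent samples t-test: number of pairs - 1."""
    return n_pairs - 1

print(df_independent(15, 15))  # two groups of 15 patients
print(df_dependent(8))         # 8 technicians, each observed twice
```

For the dependent test, n counts pairs (or subjects measured twice), not the total number of scores.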


 * 5) ** Assumptions for Independent and Dependent Samples t-test **

a. Know the assumptions for each test

For independent: The assumptions are that the dependent variable is approximately normally distributed (normality), that each population has the same variance (homogeneity of variance), and that the outcomes are statistically independent of one another (independence).

For dependent: Normality (the distribution of each group's population is normal) and independence of difference scores (the difference scores for the subjects in the experiment are independent of one another).

b. Know the consequences of violating the assumptions

 * 6) ** Regression **

a. Simple Regression

A linear function used to summarize a set of points.

A mathematical function used to make predictions.

1. Purpose

We can use a linear function in statistics to summarize the relationship between two variables X and Y.

The linear function can provide the BEST summary of the relationship between two sets of points.

If the line is drawn down the middle of the set of points, it is the BEST summary. Means are used to create the line.

A statistical summary attempts to capture the essence of a data set without referring to individual data points.

2. Interpret the slope and intercept

** Slope ** :

The slope of the line is represented by //b//. It indicates the amount of change in Y that is associated with a change in X.

The slope indicates how many units Y increases/decreases for every one-unit increase in X.

The sign of the slope indicates the direction of the line.


 * Positive: Y increases as X increases
 * Negative: Y decreases as X increases

Slope Example:

Y = 100 - 3X, where X is the number of pills and Y is the number of pounds gained per day.

The slope of -3 indicates that each additional pill taken is associated with an average decrease of 3 pounds in the predicted daily weight gain.

** Intercept (//a//) **

The intercept indicates where the line crosses the Y axis.

It indicates the value of Y when X is zero.

In many cases the intercept is not interpretable.

3. Given a printout, identify the slope and Intercept

The y-intercept is usually listed in the row labeled (Constant).

The slope is listed next to the name of the variable.

4. Given a printout, construct the linear regression equation

Y = //a// + //b//X

Remember a = y intercept, b = slope
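Once a and b have been read off a printout, the equation can be used to make predictions. The sketch below reuses the slope example above (a = 100, b = -3). The helper name `make_regression_line` is hypothetical.

```python
def make_regression_line(intercept: float, slope: float):
    """Return a function that computes the predicted Y = a + bX."""
    def predict(x: float) -> float:
        return intercept + slope * x
    return predict

# Values from the slope example: Y = 100 - 3X.
predict_gain = make_regression_line(intercept=100, slope=-3)

print(predict_gain(0))   # the intercept: predicted Y when X is zero
print(predict_gain(10))  # ten pills: 100 - 3 * 10
# One extra unit of X changes the prediction by exactly the slope.
print(predict_gain(11) - predict_gain(10))
```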

5. Interpret R2 and standard error

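R2: the proportion of the variance in Y that is accounted for by the regression on X.

Standard error of estimate: the average discrepancy of the observed scores from the regression line. It indicates the magnitude of error made in estimating Y from X.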
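Both quantities can be computed from the residuals of a fitted line. The sketch below fits a least-squares line to made-up data (the data and the helper name `fit_line` are assumptions) and uses the common definition of the standard error of estimate, which divides the residual sum of squares by n - 2.

```python
import math

def fit_line(x, y):
    """Least-squares intercept (a) and slope (b) for Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x  # the line passes through the point of means
    return a, b

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
predicted = [a + b * xi for xi in x]
mean_y = sum(y) / len(y)

ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_total = sum((yi - mean_y) ** 2 for yi in y)

r_squared = 1 - ss_residual / ss_total                   # proportion of variance explained
std_error_estimate = math.sqrt(ss_residual / (len(x) - 2))  # typical size of a residual

print(round(a, 2), round(b, 2), round(r_squared, 2), round(std_error_estimate, 3))
```

A larger R2 and a smaller standard error of estimate both indicate that the line summarizes the points more closely.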

6. Given a printout, identify R2 and standard error

See example at end of simple regression PP.

b. Multiple Regression

1. Purpose

2. Interpret the slopes and intercept

3. Given a printout, identify the slopes and intercept

4. Given a printout, construct the multiple linear regression equation

5. Interpret adjusted R2

6. Given a printout, identify the adjusted R2

c. Assumptions

EDF 6938 Test I – Practice Problems

1. During the 1980s the Department of Defense, in conjunction with each of the services, conducted a massive study of military job performance. The study sampled tasks from military jobs and observers’ evaluation of job incumbents’ performance. The mean for the population of performance scores is 50 and the standard deviation is 100. One finding in the analysis of data on both Navy machinist mates and Marine Corps infantry was that the distribution of performance was normal. A random sample of 64 performance scores for Navy machinist mates was selected. The mean for the sample of scores was 60. Suppose a researcher claims that the sample comes from a population with a mean greater than 50. State the null and alternative hypotheses associated with the researcher’s claim.

H0: The mean for the population is 50.

Ha: The mean for the population is greater than 50 (one-tailed), which matches the researcher's directional claim. A two-tailed alternative would instead be Ha: The mean for the population ≠ 50.

2. Suppose you have a positively skewed population distribution. The mean of this distribution is 125. The standard deviation of this distribution is 25. Suppose one draws one sample at n=100.

a. What is the shape of the sampling distribution of the means? Positively skewed means the population has a big tail to the right, but with a sample size as large as n = 100 the sampling distribution of the means is approximately normal (bell shaped).

b. What is the mean of the sampling distribution of the means? “The mean of the sampling distribution of the mean is the mean of the population from which the scores were sampled. Therefore, if a population has a mean, μ, then the mean of the sampling distribution of the mean is also μ.” Therefore it is 125. From… []

c. What is the standard deviation of the sampling distribution of the means? It is the standard error of the mean: σ/√n = 25/√100 = 25/10 = 2.5.
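The same arithmetic as a short sketch, assuming the usual formula σ/√n for the standard error of the mean (the function name `standard_error` is hypothetical):

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / math.sqrt(n)

# Problem 2: population standard deviation 25, sample size 100.
se = standard_error(25, 100)
print(se)

# Quadrupling the sample size halves the standard error.
print(standard_error(25, 400))
```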

3. Which measure of central tendency (mean, median, or mode) will always represent the 50th percentile? The Median

4. What is the relationship between the standard deviation and the variance? Which measure is in the units of the original data? The standard deviation is the square root of the variance. The standard deviation would have the units of the original data.

5. What are the assumptions of regression?

__Independence__: scores of any subject are independent of the scores of all other subjects.

__Normality__: scores of the dependent variable are normally distributed at each value of the independent variable(s).

__Homoscedasticity__: In the population, the variances of the dependent variable at each value of X are equal.

__Linearity__: In the population, the relationship between the dependent variable and an independent variable is linear.

6. Define: type I error, type II error, significance level and power

__Type I Error__: rejecting a true H0

__Type II Error__: failing to reject a false H0

__Significance level__: the probability of making a type I error.

__Power__: the probability of rejecting a false null hypothesis is called the power of the test.

7. What is the difference between a non-directional test of hypothesis and a directional test of hypothesis? How is the p-value reported on a printout treated differently for a directional test of hypothesis?

A non-directional test of hypothesis is a two-tailed test. It indicates, for example, that the population mean is NOT EQUAL to a specified value.

A directional test of hypothesis is a one-tailed test. For example, the population mean is greater than some value or it is less than some value.

Printouts typically report the two-tailed p-value. For a directional test, that p-value is divided by two, provided the sample result falls in the direction stated in the alternative hypothesis.

8. For each description provided below identify the statistical method that best addresses the researcher’s needs. Possible methods are


 * 1) Dependent samples t-test
 * 2) Independent samples t-test
 * 3) Correlation
 * 4) Regression (Simple and Multiple)

 * _____ Brothers and sisters were each paired and randomly assigned to one of two pre-school enrichment treatment groups. The researcher wishes to determine if there is a statistically significant difference between the means of the two groups.
 * _____ A researcher wants to measure the strength of the association between age and ability to manage stress. Age is measured in number of years and stress level is measured on a 10-point scale.
 * _____ Students were randomly assigned to two different methods of teaching reading comprehension. It is hypothesized that a difference exists between the two methods.
 * _____ A researcher wants to predict a person’s critical thinking skills based on the number of hours of sleep the person gets each night.

9. A researcher wanted to determine whether army technicians’ ability to track targets on a cathode-ray tube (CRT) differed under two conditions of visual display: white foreground on black background and yellow foreground on green background. In a pilot study, a random sample of 8 technicians was selected; all eight were observed under both conditions. The outcome variable was the number of targets correctly tracked. The data and results are summarized in the SPSS printouts below. Use the printouts to answer the questions that follow.


**Paired Samples Statistics**
|| || || Mean || N || Std. Deviation || Std. Error Mean ||
|| Pair 1 || White_on_Black || 1.0000 || 8 || 1.06904 || .37796 ||
|| || Yellow_on_Green || 2.3750 || 8 || 1.40789 || .49776 ||


**Paired Samples Correlations**
|| || || N || Correlation || Sig. ||
|| Pair 1 || White_on_Black & Yellow_on_Green || 8 || .569 || .141 ||


**Paired Samples Test**
|| || || Paired Differences: Mean || Paired Differences: Std. Deviation || Paired Differences: Std. Error Mean || 95% Confidence Interval of the Difference: Lower || 95% Confidence Interval of the Difference: Upper || t || df || Sig. (2-tailed) ||
|| Pair 1 || White_on_Black - Yellow_on_Green || -1.37500 || 1.18773 || .41993 || -2.36797 || -.38203 || -3.274 || 7 || .014 ||

a. State the null and alternative hypothesis.

b. List each assumption associated with the test.

c. Identify the test statistic and p-value.

d. Make a decision to reject or not.

e. Write a conclusion.

THIS IS THE SAMPLE PROBLEM FROM DEPENDENT SAMPLES T-TEST.
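As a check on reading the printout, the t statistic in the Paired Samples Test table can be reproduced from the mean and standard deviation of the difference scores. This is a sketch only. The significance level of 0.05 is an assumption, chosen because it is the most common level, and the variable names are hypothetical.

```python
import math

# Values copied from the Paired Samples Test printout.
n_pairs = 8
mean_difference = -1.37500
sd_difference = 1.18773
p_value = 0.014  # Sig. (2-tailed)

alpha = 0.05  # assumed significance level

# Standard error of the mean difference, then t = mean difference / standard error.
standard_error = sd_difference / math.sqrt(n_pairs)
t_statistic = mean_difference / standard_error
df = n_pairs - 1

# Decision rule: reject H0 when the p-value is at or below alpha.
reject_null = p_value <= alpha

print(round(standard_error, 5), round(t_statistic, 3), df, reject_null)
```

The computed values agree with the printed Std. Error Mean, t, and df columns, which confirms which numbers on the printout are the test statistic and the p-value.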

10. The results provided are from the article Facilitating Preservice Teachers’ Development of Technological, Pedagogical, and Content Knowledge (TPACK) by Chai, Koh and Tsai. The article is in //Educational Technology and Society//, volume 13. In the article the researchers study the impact of pedagogical knowledge (PK), content knowledge (CK) and technological knowledge (TK) on teachers’ development in technological, pedagogical, and content knowledge (TPACK). The results for the regression analysis using PK, CK, and TK as the independent variables and TPACK post survey score as the dependent variable are presented in the table below.


|| Predictors || B || Std. Error || Significance || R2 ||
|| Constant || 0.43 || 0.16 || <0.01 || ||
|| PK || 0.60 || 0.04 || <0.01 || ||
|| CK || 0.15 || 0.04 || <0.01 || ||
|| TK || 0.20 || 0.03 || <0.01 || 0.74 ||

Use the information above to

a. Write the regression equation

Ŷ = 0.43 + 0.60X1 + 0.15X2 + 0.20X3, where X1 = PK, X2 = CK, and X3 = TK.

b. Interpret the slopes

The slope for PK is 0.60. This means that each one-point increase in PK score is associated with an average increase of 0.60 points in the TPACK score, holding CK and TK constant.

The slope for CK is 0.15. This means that each one-point increase in CK score is associated with an average increase of 0.15 points in the TPACK score, holding PK and TK constant.

The slope for TK is 0.20. This means that each one-point increase in TK score is associated with an average increase of 0.20 points in the TPACK score, holding PK and CK constant.

c. Test the null hypothesis that β is equal to 0

For each predictor, the p-value (<0.01) is less than the significance level, therefore the null hypothesis is rejected.

d. Explain the proportion of variance accounted for by the model

74% of the variation in the TPACK post survey score can be attributed to the model.
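The regression equation from part a can be turned into a small prediction function, which also makes the slope interpretation easy to verify. The coefficients come from the table. The function name and the input scores (5 on each scale) are made up for illustration.

```python
def predict_tpack(pk: float, ck: float, tk: float) -> float:
    """Predicted TPACK post survey score from the table's coefficients."""
    return 0.43 + 0.60 * pk + 0.15 * ck + 0.20 * tk

# Hypothetical teacher with a score of 5 on each predictor.
baseline = predict_tpack(pk=5, ck=5, tk=5)

# Raise PK by one point while holding CK and TK constant.
one_more_pk = predict_tpack(pk=6, ck=5, tk=5)

print(round(baseline, 2))
print(round(one_more_pk - baseline, 2))  # equals the PK slope
```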

11. A researcher wanted to compare a counseling method for decreasing depression. A group of thirty patients were randomly assigned to the intervention (15 receive the intervention and 15 are a control group and receive the intervention after completion of the study). After three weeks of treatment, a 10-point scale was given to both groups. Higher scores are an indication of depression. The following results were obtained.


**Independent Samples Test**
|| || || Levene's Test for Equality of Variances: F || Levene's Test for Equality of Variances: Sig. || t || df || Sig. (2-tailed) || Mean Difference || Std. Error Difference || 95% Confidence Interval of the Difference: Lower || 95% Confidence Interval of the Difference: Upper ||
|| Score || Equal variances assumed || 2.309 || .140 || -5.424 || 28 || .000 || -2.73333 || .50395 || -3.76563 || -1.70103 ||
|| || Equal variances not assumed || || || -5.424 || 25.82 || .000 || -2.73333 || .50395 || -3.76956 || -1.69711 ||

a. State the null and alternative hypothesis.

b. List each assumption associated with the test.

c. Identify the test statistic and p-value.

d. Make a decision to reject or not.

e. Write a conclusion.
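One way to read this printout is to use Levene's test to choose the row, then check the t statistic against the mean difference and its standard error. This is a sketch under stated assumptions: the 0.05 level is assumed for both Levene's test and the t-test, and the variable names are hypothetical.

```python
# Values copied from the Independent Samples Test printout.
levene_sig = 0.140
mean_difference = -2.73333
se_difference = 0.50395
rows = {
    "equal variances assumed": {"t": -5.424, "df": 28, "p": 0.000},
    "equal variances not assumed": {"t": -5.424, "df": 25.82, "p": 0.000},
}

alpha = 0.05  # assumed significance level

# Levene's test is not significant (Sig. > alpha), so homogeneity of
# variance is not rejected and the first row is the one to read.
if levene_sig > alpha:
    row_name = "equal variances assumed"
else:
    row_name = "equal variances not assumed"
chosen = rows[row_name]

# t is the mean difference divided by its standard error.
t_check = mean_difference / se_difference

# A printed p of .000 means p < .0005, not exactly zero.
reject_null = chosen["p"] <= alpha

print(row_name, chosen["df"], round(t_check, 3), reject_null)
```

The df in the chosen row, 28, matches n1 + n2 - 2 for two groups of 15.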