(1) Investigators were interested in determining whether sex and age affect how children develop new skills. To do so, they measured skill development (D scores) and age (in years) for six boys and six girls. These data were compiled and analyzed using the data frame and R code below. All questions can be answered by running the code below; feel free to add additional analyses if you think they are helpful. In your answers, you will need to post relevant results from your code and explain the results in 1-4 sentences. It is not sufficient to just provide a number.

a. Run a simple linear regression model, with skill development scores as the outcome variable and sex as the independent variable. Calculate the model results for boys and girls, and interpret the results and the meaning of the model coefficient for sex.

b. Run the analysis as a two-sample t-test, testing the null hypothesis that the mean D scores for boys and girls are equal. How do the results for the regression (Part a) and the t-test compare? Why does this comparison make sense? What are your conclusions with regard to the mean D scores for boys and girls?

c. Now additionally consider age in your analysis. Create a scatterplot of D scores versus age. From this scatterplot, do you believe it is possible that the difference in D scores found in Part (a) can be attributed to age differences rather than a difference in sex? Why?

d. Test your answer to Part (c) using a multiple regression model that includes age and sex as independent variables. Interpret your results with regard to (1) fit of the model, (2) amount of variation explained by each independent variable, (3) the effect of age on D scores, (4) the effect of sex on D scores, and (5) the model diagnostic plots. Do these results alter your conclusion from Part (c)?

e. Add an interaction term of age and sex to your Part (d) regression model. Using the results, write the model equation specific to males and females. Do you believe that the interaction term adds to the predictive ability of the model? Why or why not? Support your answer with information about (1) model fit, (2) coefficients of each predictor and the interaction term, and (3) the model diagnostic plots.

f. Of the models that you fit in Parts (a), (d), and (e), which model performs the best? Why?

R code:
# Packages used below
library(ggplot2)  # boxplot and scatterplot
library(car)      # leveneTest()

# Create data frame, where D = developmental scores and Age = age in years
D <- c(8.61, 9.40, 9.86, 9.91, 10.53, 10.61, 10.59, 13.28, 12.76, 13.44, 14.27, 14.13)
Age <- c(3.33, 3.25, 3.92, 3.50, 4.33, 4.92, 6.08, 7.42, 8.33, 8.00, 9.25, 10.75)
sex <- c("F", "F", "F", "F", "M", "F", "M", "M", "M", "F", "M", "M")
# Build the data frame directly so dev and age stay numeric
# (cbind() would first coerce every column to character)
data3 <- data.frame(dev = D, age = Age, sex = factor(sex))
print(data3)

# Boxplot of development scores by sex
box1 <- ggplot(data3, aes(x = sex, y = dev, fill = sex)) +
  geom_boxplot() +
  labs(y = "Development Score", x = "Sex") +
  ggtitle("Development Score by Sex") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
print(box1)

# Levene's test for equal variances
leveneTest(y = data3$dev, group = data3$sex)

# Two-sample t-tests comparing boys and girls, assuming equal and unequal (Welch) variances
t.test(dev ~ sex, data = data3, var.equal = TRUE)
t.test(dev ~ sex, data = data3, var.equal = FALSE)

# Simple linear regression
model1 <- lm(dev ~ sex, data = data3)
summary(model1)
anova(model1)
# Model diagnostics
par(mfrow = c(2, 2))  # split the plotting panel into a 2 x 2 grid
plot(model1)          # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage

# Multiple linear regression
model2 <- lm(dev ~ sex + age, data = data3)
summary(model2)
anova(model2)

# Scatterplot of development score versus age, colored by sex
plot1 <- ggplot(data3, aes(x = age, y = dev, color = sex)) +
  geom_point() +
  ggtitle("Development Score and Age") +
  ylab("Development Score") + xlab("Age (years)")
print(plot1)
# Model diagnostics
par(mfrow = c(2, 2))
plot(model2)

# Multiple linear regression with interaction term (sex * age expands to sex + age + sex:age)
model3 <- lm(dev ~ sex * age, data = data3)
summary(model3)
anova(model3)
# Model diagnostics
par(mfrow = c(2, 2))
plot(model3)

# Comparison of nested models
anova(model1, model2)
anova(model2, model3)
anova(model1, model3)

(2) A researcher was asked to assess whether the means of 3
groups were equal using an ANOVA test. Each group included 21
observations. Unfortunately, water spilled on the ANOVA output
before the researcher had a chance to write the report,
making the ANOVA results unreadable. Luckily, the researcher
remembered that the p-value for the test was about 0.01 and that
one of the mean squares was 100 and the other 500, but the
researcher could not remember which mean square was which.
Fortunately, you know how to help her reconstruct the analysis of
variance table based on her memory. Complete the table below:
Source   df   Sum of Squares   Mean Square   F   p-value
Group                                            0.01
Error
Total

(3) You fit a simple logistic regression model of the log odds
of having had a heart attack (variable name "attack") as predicted
by cholesterol levels (variable name "chol"). Having had a heart
attack was coded as 1 and never having a heart attack was coded as
0. The results from this model are shown below. How does an
increase in cholesterol levels impact the odds of having had a
heart attack? (1-3 sentences are sufficient.)

Call:
glm(formula = attack ~ chol, family = "binomial", data = dataset)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9726  -0.7525  -0.7121  -0.6550   1.8476

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.960288   0.223854  -8.757  < 2e-16 ***
chol         0.011040   0.002966   3.722 0.000198 ***

    Null deviance: 3042.2  on 2757  degrees of freedom
Residual deviance: 3028.5  on 2756  degrees of freedom
AIC: 3032.5

(4) Fill in the missing values from the output of a
multiple linear regression with two continuous independent variables. Show all calculations. For the p-value determination, it is sufficient to indicate whether it is less than 0.05. Interpret the results with regard to hypothesis testing for the X1 and X2 coefficients, the adjusted R-squared, and the F statistic.

Coefficients:
Estimate Std. Error t-value Pr(>|t|)
Intercept 163.9709 17.0414
X1 1.0676 9.765
X2 3.3062 3.526

Residual standard error: ______________ on _________ degrees of freedom
Multiple R-squared: 0.6367, Adjusted R-squared: ____________
F-statistic: ______________ on __________ and __________ DF, p-value: < 2.2e-16

Analysis of Variance Table
Source df Sum Sq Mean Sq F value Pr(>F)
X1 485,688
X2 51,522
Residuals 306,585 4,143
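The blanks in Question (4) follow from a few standard identities: residual df = residual Sum Sq / residual Mean Sq, residual standard error = sqrt(residual Mean Sq), overall F = (regression Sum Sq / k) / residual Mean Sq, and adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1). The R sketch below is one way to check the arithmetic. It assumes that the X1 and X2 rows of the ANOVA table are sequential sums of squares with 1 df each (as anova() reports for an lm fit) and that 4,143 is the residual mean square; all variable names are hypothetical. It leaves out the X1 and X2 coefficient rows, whose two given values could sit in different columns, apart from one consistency check on X2.

```r
# Hypothetical variable names; values copied from the Question (4) output.
# Assumption: ANOVA rows are sequential (Type I) sums of squares, 1 df per predictor.
ss_x1  <- 485688  # Sum Sq for X1
ss_x2  <- 51522   # Sum Sq for X2
ss_res <- 306585  # residual Sum Sq
ms_res <- 4143    # residual Mean Sq
r2     <- 0.6367  # multiple R-squared
k      <- 2       # number of predictors (X1 and X2)

# Residual df = Sum Sq / Mean Sq (rounded, because the printed values are rounded)
df_res <- round(ss_res / ms_res)
n <- df_res + k + 1  # sample size implied by the residual df

# Residual standard error = square root of the residual mean square
rse <- sqrt(ms_res)

# Overall F = regression mean square / residual mean square, on k and df_res DF
f_stat <- ((ss_x1 + ss_x2) / k) / ms_res

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# Intercept t-value = estimate / standard error; two-sided p-value from the t distribution
t_int <- 163.9709 / 17.0414
p_int <- 2 * pt(-abs(t_int), df = df_res)

# Consistency check: for the last term entered, sequential F = t^2,
# so this should be close to the 3.526 printed in the X2 row
t_x2 <- sqrt(ss_x2 / ms_res)

cat(sprintf("Residual standard error: %.2f on %d degrees of freedom\n", rse, as.integer(df_res)))
cat(sprintf("Adjusted R-squared: %.4f\n", adj_r2))
cat(sprintf("F-statistic: %.2f on %d and %d DF\n", f_stat, as.integer(k), as.integer(df_res)))
cat(sprintf("Intercept t = %.3f, p = %.2g\n", t_int, p_int))
cat(sprintf("sqrt(sequential F) for X2 = %.3f\n", t_x2))
```

Using 4,143 as printed rather than recomputing 306,585 / df_res changes these results by less than 0.001.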