2. A sociologist is interested in the historical relationships between characteristics of various occupations. In part

Posted: Mon May 02, 2022 6:40 am
by answerhappygod
2. A sociologist is interested in the historical relationships between characteristics of various occupations. In particular, they wish to model the Pineo-Porter prestige score (a measure of the prestige of an occupation, with higher values indicating higher prestige) as the response (y) variable. The researcher has access to data on each of a range of occupations, including:

• y - Pineo-Porter prestige score
• education - Average duration of education of those in the occupation, in years
• income - Average income of those in the occupation, in Canadian dollars
• women - Percentage of those in the occupation who were women
• type - Type of occupation. A factor with levels: bc (Blue Collar); prof (Professional, Managerial, and Technical); wc (White Collar).

The sociologist ran a forward selection algorithm to decide on the variables to be included in their final model. The output of the algorithm is printed on the next two pages.

Forward Selection Method

Step 0: AIC = 837.5078
y ~ 1

Variable     DF    AIC        Sum Sq       RSS          R-Sq     Adj. R-Sq
education    1     703.342    21282.471    7064.405     0.751    0.74
type         1     724.290    19775.594    8571.281     0.698    0.69
income       1     772.624    14021.617    14325.259    0.495    0.48
women        1     838.312    343.887      28002.988    0.012    0.00

Step 1: AIC = 703.3419
y ~ education

Variable     DF    AIC        Sum Sq       RSS          R-Sq     Adj. R-Sq
income       1     676.669    1791.966     5272.439     0.814    0.810
type         1     686.997    1324.358     5740.047     0.798    0.791
women        1     694.133    763.477      6300.927     0.778    0.773

Step 2: AIC = 676.6695
y ~ education + income

Variable     DF    AIC        Sum Sq       RSS          R-Sq     Adj. R-Sq
type         1     669.015    591.163      4681.276     0.835    0.828
women        1     678.477    10.371       5262.068     0.814    0.808

Step 3: AIC = 669.0151
y ~ education + income + type

Variable     DF    AIC        Sum Sq       RSS          R-Sq     Adj. R-Sq
women        1     670.967    2.288        4678.988     0.835    0.826
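The AIC values in the stepwise output above can be cross-checked from the printed RSS values. The sketch below is a check under stated assumptions, not the routine the sociologist's software actually ran: it assumes the usual Gaussian log-likelihood form of AIC with one extra parameter for the error variance, and it assumes n = 98 observations (inferred from the 97 total degrees of freedom in the final model output). The function name `gaussian_aic` and the `models` list are illustrative.

```python
import math


def gaussian_aic(rss, n, n_coef):
    """AIC of a least-squares fit under a Gaussian likelihood.

    n_coef counts regression coefficients including the intercept;
    one further parameter is added for the error variance.
    """
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return -2 * log_lik + 2 * (n_coef + 1)


N = 98  # assumption: 97 total degrees of freedom + 1

# (model, RSS copied from the output, coefficients incl. intercept)
# type is a three-level factor, so it contributes two coefficients.
models = [
    ("y ~ 1", 28346.876, 1),
    ("y ~ education", 7064.405, 2),
    ("y ~ education + income", 5272.439, 3),
    ("y ~ education + income + type", 4681.276, 5),
    ("y ~ education + income + type + women", 4678.988, 6),
]

for label, rss, n_coef in models:
    print(f"{label:40s} AIC = {gaussian_aic(rss, N, n_coef):.3f}")
```

Under these assumptions the function agrees with the step AIC values printed above to about two decimal places. The last row shows why the search stops: adding women raises the AIC from 669.015 to 670.967.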

Final Model Output

Model Summary
R                 0.914      RMSE         7.095
R-Squared         0.835      Coef. Var    14.991
Adj. R-Squared    0.828      MSE          50.336
Pred R-Squared    0.816      MAE          5.527

RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error

              Sum of Squares    DF    Mean Square    F          Sig.
Regression    23665.599         4     5916.400       117.537    0.0000
Residual      4681.276          93    50.336
Total         28346.876         97

Analysis of Variance Table

Response: y
            Df    Sum Sq     Mean Sq    F value    Pr(>F)
type        2     19775.6    9887.8     196.435    < 2.2e-16 ***
education   1     2831.2     2831.2     56.246     3.708e-11 ***
income      1     1058.8     1058.8     21.034     1.405e-05 ***
Residuals   93    4681.3     50.3
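The goodness-of-fit summaries in the final model output can be recomputed from the sums of squares and degrees of freedom alone. The following is a minimal sketch using only the figures printed above; the function name `fit_summaries` and the dictionary keys are illustrative, not part of any library.

```python
import math


def fit_summaries(ss_reg, ss_res, df_reg, df_res):
    """Recompute F, R-squared, adjusted R-squared and RMSE from an ANOVA table."""
    ss_total = ss_reg + ss_res
    df_total = df_reg + df_res
    ms_reg = ss_reg / df_reg  # regression mean square
    ms_res = ss_res / df_res  # residual mean square (MSE)
    return {
        "F": ms_reg / ms_res,
        "R2": ss_reg / ss_total,
        "adj_R2": 1 - ms_res / (ss_total / df_total),
        "RMSE": math.sqrt(ms_res),
    }


# Values copied from the regression/residual rows of the output.
summary = fit_summaries(ss_reg=23665.599, ss_res=4681.276, df_reg=4, df_res=93)
for name, value in summary.items():
    print(f"{name:7s} {value:.4f}")

# Sequential sums of squares from the Analysis of Variance Table
# (type, then education, then income).
sequential_ss = [19775.6, 2831.2, 1058.8]
print("Sum of sequential SS:", round(sum(sequential_ss), 1))
```

The sequential sums of squares add up to the regression sum of squares (to the rounding shown), which is what lets the table split the explained variation among the terms in the order they were entered.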

(a) Define the measure used in this algorithm to decide whether or not to include a variable. Explain how this measure is used. [3 MARKS]

(b) List the order the search adds variables and explain why it decides to stop. [2 MARKS]

(c) Why is the "Analysis of Variance Table" an appropriate summary of the contribution of each variable in the final model? [2 MARKS]

(d) The coefficients summary table for the final model is given here.

Call:
lm(formula = y ~ type + education + income, data = Prestige)

Residuals:
     Min       1Q   Median       3Q      Max
-14.9529  -4.4486   0.1678   5.0566  18.6320

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.6229292  5.2275255  -0.119    0.905
typeprof     6.0389707  3.8668551   1.562    0.122
typewc      -2.7372307  2.5139324  -1.089    0.279
education    3.6731661  0.6405016   5.735 1.21e-07 ***
income       0.0010132  0.0002209   4.586 1.40e-05 ***

Residual standard error: 7.095 on 93 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.8349,  Adjusted R-squared:  0.8278
F-statistic: 117.5 on 4 and 93 DF,  p-value: < 2.2e-16

Comment on the tests and goodness of fit summaries included in the table. [3 MARKS]

(e) Interpret the coefficients of typeprof and education. [3 MARKS]

(f) The sociologist wants more information about the education effect so requires a range of plausible values rather than a point estimate. Calculate the 99% confidence interval for the relevant parameter. [3 MARKS]
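For part (f), a hand calculation of the form estimate ± t × standard error can be checked numerically. The sketch below assumes the interval is the standard t-based one on the 93 residual degrees of freedom. Rather than reading the critical value from tables, it computes the t quantile with the standard library only, by Simpson integration of the density and bisection; this is a substitute chosen for self-containment, not the method used to produce the output above. All function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Upper-half quantile (p > 0.5) found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(estimate, std_error, df, level):
    """Two-sided t-based confidence interval for a regression coefficient."""
    t_crit = t_quantile(1 - (1 - level) / 2, df)
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# Estimate and standard error of the education coefficient from the table.
lower, upper = confidence_interval(3.6731661, 0.6405016, df=93, level=0.99)
print(f"t critical value: {t_quantile(0.995, 93):.4f}")
print(f"99% CI for education: ({lower:.3f}, {upper:.3f})")
```

Whether the resulting interval contains zero is what links it to the p-value for the same coefficient, which is the comparison part (g) asks about.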

(g) Why will inference based on confidence intervals correspond to that based on the p-values? [1 MARK]

(h) The sociologist later used a Box-Cox transformation approach with each of the predictor variables, which led them to refit this model after transforming the income variable using a logarithmic transformation. Outline the process the researcher used to decide on this transformation (you do not need to perform any calculations). [3 MARKS]