headache Relief Treatment 5.2 T1 4.7 T1 8.1 T1 6.2 T1 3 T1 9.1 T2 7.1 T2 8.2 T2 6 T2 9.10 T2 3.2 T3 5.8 T3 2.2 T3 3.1 T3
Headache Relief   Treatment
5.2               T1
4.7               T1
8.1               T1
6.2               T1
3                 T1
9.1               T2
7.1               T2
8.2               T2
6                 T2
9.10              T2
3.2               T3
5.8               T3
2.2               T3
3.1               T3
7.2               T3
2.4               T4
3.4               T4
4.1               T4
1                 T4
4                 T4
7.1               T5
6.6               T5
9.3               T5
4.2               T5
7.6               T5
1. A city tax assessor was interested in predicting residential home sales as a function of various characteristics of the home and surrounding property. Data on 522 transactions were obtained for home sales from the previous year. We will investigate the relationship between the sales price in dollars and the number of bedrooms in the house.

We imported the data and displayed the structure of the dataframe.

    real.estate <- read.csv("RealEstate.csv")
    str(real.estate)
    ## 'data.frame':    522 obs. of  3 variables:
    ##  $ Identification: int  1 2 3 4 5 6 7 8 9 10 ...
    ##  $ Sales.price   : int  360000 340000 250000 205500 275500 248000 229900 150000 195000 160000 ...
    ##  $ Bedrooms      : int  4 4 4 4 4 4 3 2 3 3 ...
(a) We fit a model that describes the sales price according to the number of bedrooms, and displayed the corresponding ANOVA table. We also displayed the estimated coefficients of the model. There is something wrong with this table. What did we forget to do? Discuss. Hint: Look at the degrees of freedom for the Bedrooms factor.

    model <- lm(Sales.price ~ Bedrooms, data = real.estate)
    anova(model)
    ## Analysis of Variance Table
    ##
    ## Response: Sales.price
    ##            Df     Sum Sq    Mean Sq F value    Pr(>F)
    ## Bedrooms    1 1.6931e+12 1.6931e+12  107.14 < 2.2e-16 ***
    ## Residuals 520 8.2178e+12 1.5803e+10
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    coefficients(model)
    ## (Intercept)    Bedrooms
    ##    82808.80    56200.08

(b) We coerced the Bedrooms variable to a factor to produce the plots. Is the study balanced?

    real.estate$Bedrooms <- factor(real.estate$Bedrooms)
    table(real.estate$Bedrooms)
    ##
    ##   0   1   2   3   4   5   6   7
    ##   1   9  64 202 179  52  12   3

(c) Since the frequencies are small at the extremes, i.e. homes with very few or very many bedrooms, we combined homes with 0, 1, or 2 bedrooms into one category, and homes with 5, 6, or 7 bedrooms into one category.

    library(car)
    ## Loading required package: carData
    real.estate$Bedrooms <- with(real.estate, recode(Bedrooms, "c('0','1','2')='0-2'; c('5','6','7')='5-7'"))
    table(real.estate$Bedrooms)
    ##
    ## 0-2   3   4 5-7
    ##  74 202 179  67

Here are comparative boxplots for the sales price of the home according to the number of bedrooms. Based on these plots, is it reasonable to assume homogeneity of variance? If not, do the plots suggest that we might be able to find a suitable variance-stabilizing transformation?
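As a side note on the hint in part (a), the following is a minimal sketch of how the degrees of freedom differ when a predictor is stored as a number versus as a factor. The data frame `toy` and its values are simulated for illustration only (an assumption, not the RealEstate.csv data), and the column `Bedrooms.f` is a hypothetical name.

```r
# Simulated stand-in data (hypothetical values, NOT the original file).
set.seed(1)
beds  <- rep(0:7, times = 25)                      # 8 distinct bedroom counts
price <- 80000 + 55000 * beds + rnorm(length(beds), sd = 50000)
toy   <- data.frame(Sales.price = price, Bedrooms = beds)

# Numeric predictor: a straight-line regression, so Bedrooms uses 1 df.
fit_numeric <- lm(Sales.price ~ Bedrooms, data = toy)
df_numeric  <- anova(fit_numeric)["Bedrooms", "Df"]

# Factor predictor: one-factor ANOVA, so Bedrooms uses (levels - 1) df.
toy$Bedrooms.f <- factor(toy$Bedrooms)
fit_factor <- lm(Sales.price ~ Bedrooms.f, data = toy)
df_factor  <- anova(fit_factor)["Bedrooms.f", "Df"]

print(c(numeric = df_numeric, factor = df_factor))
```

Comparing the two `Df` entries is a quick way to check whether R treated a grouping variable as a factor before reading the rest of the ANOVA table.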
    # comparative boxplots
    library(ggplot2)
    ggplot(real.estate, aes(x = Bedrooms, y = Sales.price)) +
      theme_bw() +
      geom_boxplot(color = "dark grey") +
      geom_jitter(height = 0, width = 0.2) +
      labs(y = "Sales price (in dollars)", x = "Number of bedrooms")

[Figure: comparative boxplots of sales price (in dollars, axis from 0 to 750000) by number of bedrooms]

(d) We fitted a log-log model to describe the cell standard deviation as a function of the cell mean. Based on the 95% confidence interval for the slope of this log-log model, what variance-stabilizing transformations are suggested?

    m <- with(real.estate, tapply(Sales.price, Bedrooms, FUN = mean))
    s <- with(real.estate, tapply(Sales.price, Bedrooms, FUN = sd))
    model <- lm(log(s) ~ log(m))
    # 95% CI for intercept and for the slope
    confint(model)
    ##                  2.5 %    97.5 %
    ## (Intercept) -5.4004299 16.584404
    ## log(m)      -0.3896355  1.365977

(e) We transformed the sales price to a logarithmic scale. Using the transformed response, we conducted the bootstrap modified Levene test, and constructed a normal qq-plot for the standardized residuals from the fitted ANOVA expressing the sales price (on a logarithmic scale) according to the number of bedrooms, and also comparative boxplots. What conclusions can we draw from these plots and the test in terms of diagnostics for the one-factor ANOVA model with fixed effects?

    real.estate$log.price <- log(real.estate$Sales.price)
    ggplot(real.estate, aes(x = Bedrooms, y = log.price)) +
      theme_bw() +
      geom_boxplot(color = "dark grey") +
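For parts (d) and (e), here is a rough sketch of the two diagnostics on simulated data. Everything in it is an assumption made for illustration: the group means in `mu`, the lognormal prices, and the object names are hypothetical and do not come from RealEstate.csv. The second half computes the ordinary modified Levene (Brown-Forsythe) statistic by hand, not the bootstrap version mentioned in the question.

```r
# Simulated data (hypothetical) where the spread grows with the mean.
set.seed(2)
group <- factor(rep(c("0-2", "3", "4", "5-7"), each = 50))
mu    <- c("0-2" = 150000, "3" = 220000, "4" = 300000, "5-7" = 420000)
price <- rlnorm(200, meanlog = log(mu[as.character(group)]), sdlog = 0.3)

# Cell means and cell standard deviations.
m <- tapply(price, group, mean)
s <- tapply(price, group, sd)

# Log-log fit: log(s) = a + b * log(m). If the sd is proportional to mean^b,
# the usual variance-stabilizing power is 1 - b (b near 1 points to the log).
loglog <- lm(log(s) ~ log(m))
print(confint(loglog))
b <- unname(coef(loglog)["log(m)"])
cat("estimated slope:", round(b, 2), " suggested power:", round(1 - b, 2), "\n")

# Modified Levene (Brown-Forsythe) test by hand on the log scale:
# one-way ANOVA on absolute deviations from each group's median.
log.price <- log(price)
abs.dev   <- abs(log.price - ave(log.price, group, FUN = median))
print(anova(lm(abs.dev ~ group)))
```

With only four cells, the confidence interval for the slope is based on two residual degrees of freedom, so it is typically wide. Any power `1 - b` for a slope `b` inside the interval is a candidate transformation.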