STAT 4110/7110 Statistical Software and Data Analysis Homework 3 Note: You should turn in (1) R program (2) R output and

Post by **answerhappygod** » Thu Apr 28, 2022 7:22 am


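To load the college.csv data without the file (this assumes the ISLR package, which ships the data set as College, is installed):

library(ISLR)
data(College)
college <- College
attach(college)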
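Part (a) of the assignment asks for read.csv() rather than the packaged data set. The following is a minimal sketch of that route together with the row-name step from part (b). The three-row file written to a temporary path is a hypothetical stand-in for college.csv (the school names and numbers are made up), so the snippet runs without the real file.

```r
# Hypothetical three-row stand-in for college.csv (all values are made up).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"","Private","Apps","Accept"',
  '"College A","Yes",1000,800',
  '"College B","No",2000,1500',
  '"College C","Yes",1500,900'
), csv_path)

# (a) Read the file; the first column holds the school names.
college <- read.csv(csv_path, stringsAsFactors = TRUE)

# (b) Copy the names into the row names, then drop the name column.
rownames(college) <- college[, 1]
college <- college[, -1]

str(college)
```

With the real file, replacing csv_path by the path to college.csv should be the only change needed, provided the working directory (or the path) points at the file.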


STAT 4110/7110 Statistical Software and Data Analysis
Homework 3

Note: You should turn in (1) the R program, (2) the R output, and (3) the comments for full credit. Put everything in one PDF file and upload it on Canvas. When you use the provided dataset, do not change anything in the dataset file. Use the file as is.

1. A dataset college.csv includes statistics for a large number of US colleges from the 1995 issue of US News and World Report. The data contain a number of variables for 777 universities and colleges in the US. The variables are:

• Private: A factor with levels No and Yes indicating private or public university
• Apps: Number of applications received
• Accept: Number of applications accepted
• Enroll: Number of new students enrolled
• Top10perc: Pct. new students from top 10% of H.S. class
• Top25perc: Pct. new students from top 25% of H.S. class
• F.Undergrad: Number of fulltime undergraduates
• P.Undergrad: Number of parttime undergraduates
• Outstate: Out-of-state tuition
• Room.Board: Room and board costs
• Books: Estimated book costs
• Personal: Estimated personal spending
• PhD: Pct. of faculty with Ph.D.'s
• Terminal: Pct. of faculty with terminal degree
• S.F.Ratio: Student/faculty ratio
• perc.alumni: Pct. alumni who donate
• Expend: Instructional expenditure per student
• Grad.Rate: Graduation rate

(a) Use the read.csv() function in R to read the data into R. Call the loaded data college. Make sure that you have the directory set to the correct location for the data.

(b) Look at the data using the fix() function. You should notice that the first column is just the name of each university. We don't really want to treat this as data. However, it may be handy to have these names for later. Try the following commands:

> rownames(college) <- college[, 1]
> fix(college)

You should see that there is now a row.names column with the name of each university recorded. This means that R has given each row a name corresponding to the appropriate university. R will not try to perform calculations on the row names. However, we still need to eliminate the first column in the data where the names are stored. Try

> college <- college[, -1]
> fix(college)

Now you should see that the first data column is Private. Note that another column labeled row.names now appears before the Private column. However, this is not a data column but rather the name that R is giving to each row.

(c)
i. Use the summary() function to produce a numerical summary of the variables in the data set.
ii. Use the pairs() function to produce a scatterplot matrix of the first ten columns or variables of the data. Recall that you can reference the first ten columns of a matrix A using A[,1:10].
iii. Use the plot() function to produce side-by-side boxplots of Outstate versus Private.
iv. Create a new qualitative variable, called Elite, by binning the Top10perc variable. We are going to divide universities into two groups based on whether or not the proportion of students coming from the top 10% of their high school classes exceeds 50%.

> Elite <- rep("No", nrow(college))
> Elite[college$Top10perc > 50] <- "Yes"
> Elite <- as.factor(Elite)
> college <- data.frame(college, Elite)

Use the summary() function to see how many elite universities there are. Now use the plot() function to produce side-by-side boxplots of Outstate versus Elite.
v. Use the hist() function to produce some histograms with differing numbers of bins for a few of the quantitative variables. You may find the command par(mfrow = c(2, 2)) useful: it will divide the print window into four regions so that four plots can be made simultaneously. Modifying the arguments to this function will divide the screen in other ways.
vi. Continue exploring the data and provide a brief summary of what you discover.

(d) This question involves the use of multiple linear regression on the data.
i. Produce a scatterplot matrix which includes all of the numerical variables in the data set.
ii. Compute the matrix of correlations between the numerical variables using the function cor().
iii. Use the lm() function to perform a multiple linear regression with the number of applications accepted (Accept) as the response variable and all other numerical variables as the predictors. Use the summary() function to print the results. Comment on the output (e.g., which predictors appear to have a statistically significant relationship to the response).
iv. Use the plot() function to produce diagnostic plots of the least squares regression fit. Comment on any problems you see with the fit.
v. Use the * and : symbols to fit linear regression models with interaction effects. Do any interactions appear to be statistically significant?
vi. Try a few different transformations of the variables, such as log(Y), √Y, Y^2. Comment on your findings.
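The non-graphical steps of parts (c) and (d) can be sketched as follows. This is only an illustration under stated assumptions: the data frame sim is simulated stand-in data that borrows a few column names from the assignment, not the real college data, and the particular interaction and transformation shown are arbitrary choices rather than the intended answer.

```r
set.seed(1)  # make the simulated data reproducible
n <- 200

# Simulated stand-in for a few columns of the college data (not real values).
sim <- data.frame(
  Apps      = round(runif(n, 500, 10000)),
  Top10perc = round(runif(n, 5, 95)),
  Outstate  = round(runif(n, 4000, 20000))
)
sim$Accept <- round(sim$Apps * runif(n, 0.5, 0.9))    # always positive
sim$Enroll <- round(sim$Accept * runif(n, 0.2, 0.6))

# (c) iv. Bin Top10perc into a two-level factor and count each level.
Elite <- rep("No", nrow(sim))
Elite[sim$Top10perc > 50] <- "Yes"
sim$Elite <- as.factor(Elite)
summary(sim$Elite)

# (d) ii. Correlation matrix of the numerical columns only.
num <- sim[, sapply(sim, is.numeric)]
cor_mat <- cor(num)
round(cor_mat, 2)

# (d) iii. Accept regressed on every other numerical variable.
fit <- lm(Accept ~ ., data = num)
summary(fit)

# (d) v. Apps * Top10perc expands to both main effects plus Apps:Top10perc.
fit_int <- lm(Accept ~ Apps * Top10perc, data = num)
summary(fit_int)

# (d) vi. One possible transformation: log of the response and a predictor.
fit_log <- lm(log(Accept) ~ log(Apps), data = num)
summary(fit_log)
```

For the graphical parts, plot(sim$Elite, sim$Outstate) draws side-by-side boxplots because the first argument is a factor, and calling par(mfrow = c(2, 2)) followed by plot(fit) shows the four standard diagnostic plots on one page.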