Lab 7 - External Data: Using SQL
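# Install and load sqldf, then confirm the dataset is available
install.packages('sqldf')
library(sqldf)
head(airquality)
# Steps 4 and 5 combined: rows where Ozone is above the mean
sqldf('select * from airquality where Ozone > (select avg(Ozone) from airquality)')
# Step 4: average ozone via SQL
avg <- sqldf('select avg(Ozone) as average from airquality')
avg  # 1 42.12931
# Step 6: store the subset in newAQ
newAQ <- sqldf('select * from airquality where Ozone > (select avg(Ozone) from airquality)')
head(newAQ)
# Step 7: the same average in base R, ignoring missing values
average_ozone <- mean(airquality$Ozone, na.rm = TRUE)
average_ozone
# Step 7 with dplyr: keep rows above the mean
install.packages('dplyr')
library(dplyr)
newAQ2 <- filter(airquality, Ozone > mean(Ozone, na.rm = TRUE))
print(newAQ2)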
I feel like I am still missing some of the work, but I'm not sure what. Thank you for your help.
Lab 7 - External Data: Using SQL

This activity description does not provide the same level of code prompts as previous labs - it is assumed that you remember or can look up the necessary code. The overall goal of this activity is to use SQL to produce a subset of the built-in "airquality" R dataset that contains only those records where the concentration of ozone is higher than the mean level of ozone. These are the conceptual steps you will need to follow¹:

1. Install and activate ("library()") the sqldf package in RStudio. With any new package it is possible to run into installation issues depending on your platform and the versions of software you are running, so monitor your diagnostic messages carefully (2 points: 1 point per task).
2. Review online documentation for sqldf so that you are familiar with the basic concepts and usage of the package and its commands (1 point).
3. (a) Make sure the built-in "airquality" dataset is available for use in subsequent commands (hint: print head(airquality)). It would be wise to reveal the first few records of airquality with head() to make sure that airquality is available. This will also show you the names of the columns of the airquality dataframe, which you will need to use in SQL commands. (b) Assign airquality to an object "air". (c) What is the data type of air? You must use a simple command to reveal the data type (3 points).
4. (a) Using sqldf(), run an SQL select command that calculates the average level of ozone across all records. Assign the resulting value into a variable (average_ozone) and (b) print it out in the console (2 points).
5. Again using sqldf(), run another SQL command that selects all of the records from airquality where the value of ozone is higher than the average. Note that it is possible to combine steps 4 and 5 into a single SQL command - those who are familiar with SQL syntax and usage should attempt to do so (1 point).
6. (a) Refine step 5 to write the result table into a new R data object called "newAQ". (b) Then run a command to reveal what type of object newAQ is, (c) another command to show what its dimensions are (i.e., how many rows and columns), and (d) a head() command to show the first few rows (4 points: one point per task).
7. The steps above were done the SQL way. Now repeat steps 4, 5 and 6 the R way, using R commands including str, mean, head, dim, which, and tapply, which is a more "R"-like way to do the analysis (7 points, (a) through (g) below: one point per step).

# Repeat step 4: calculate the average level of ozone across all records.
# (a) Exclude missing values when calculating the "Ozone" mean and assign the result to "average_ozone". Hint: use na.rm
# (b) print the result (average_ozone)
# Repeat step 5
# (c) select rows with values bigger than the average ozone value
###### wrong approach: data$Ozone > meanOzone
#[1] FALSE FALSE FALSE FALSE NA FALSE FALSE FALSE FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[23] FALSE FALSE
###################
# (d) Repeat step 6
# only keep the rows in which the Ozone values are higher than the average, and write the result table into a new R data object called "newAQ2"
# (e) reveal what type of object newAQ2 is
# (f) reveal the number of rows, then reveal the number of columns
# (g) show the first few rows of "newAQ2"

¹ Submit the output of your runs. Don't forget that the code file you submit for credit must contain full line-by-line comments as well as at least one block comment at the top describing what is going on. Don't forget to cite your sources if you borrow code fragments.
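For the "R way" part of the lab, steps (a) through (g) could be sketched roughly as below using only base R. This is one possible reading of the handout, not an official solution: the names air, average_ozone and newAQ2 come from the lab text, rows_above is a made-up helper name, and the per-month tapply() call at the end is only a guess at where tapply might fit, since the handout lists it without saying how to use it.

```r
# Sketch of lab step 7 in base R (assumes the built-in airquality dataset).
air <- airquality                      # step 3(b): copy the dataset to "air"

# (a) mean ozone, excluding missing values with na.rm
average_ozone <- mean(air$Ozone, na.rm = TRUE)

# (b) print the result
print(average_ozone)

# (c) which() returns the row indices where the comparison is TRUE;
#     rows where Ozone is NA are dropped instead of showing up as NA
rows_above <- which(air$Ozone > average_ozone)   # rows_above: hypothetical name

# (d) keep only those rows in a new object
newAQ2 <- air[rows_above, ]

# (e) reveal what type of object newAQ2 is
class(newAQ2)
str(newAQ2)

# (f) number of rows, then number of columns
nrow(newAQ2)
ncol(newAQ2)

# (g) first few rows
head(newAQ2)

# Assumption: one place tapply could be used is a mean ozone value per month
tapply(air$Ozone, air$Month, mean, na.rm = TRUE)
```

Using which() rather than the bare logical comparison avoids the NA rows shown in the "wrong approach" output, because indexing a data frame with NA produces all-NA rows.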