
5 ## Question 1 6 data <- read.csv ("week1_cincy_crimes.csv") 7 head(data) 8 head(data, n = 10) 9 10 ## Question 2 11 st

Posted: Wed Apr 27, 2022 10:36 am
by answerhappygod
1. Import the data set into RStudio. Show the code to import the data and then display the first 10 rows of data in the console.
2. Examine the structure of the data set.
3. Do the variable names need to be changed or edited? If so, how would you change them?
4. Do any variable types need to be changed? Explain why or why not, and change any variable types as you see fit.
5. How many missing values are present per column? Would you remove an entire observation if it contained a missing value? Why or why not? Give a good rationale for your answer.
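
Questions 2 through 5 can be sketched in base R roughly as follows. The CSV itself is not reproduced in this post, so the block builds a six-row stand-in called `crimes` (a hypothetical name, with made-up values) that copies only the seven-column layout reported by `str(data)`. The new column names and the choice of factor types are assumptions for illustration, not the assignment's required answer.

```r
# Toy stand-in for week1_cincy_crimes.csv: hypothetical values, same 7 columns.
crimes <- data.frame(
  instanceid         = c("a1", "a2", "a3", "a4", "a5", "a6"),
  closed             = c("J--CLOSED", "Z--EARLY CLOSED", "J--CLOSED", NA,
                         "J--CLOSED", "Z--EARLY CLOSED"),
  opening            = rep(NA_character_, 6),
  dayofweek          = c("SATURDAY", "THURSDAY", "TUESDAY", "WEDNESDAY",
                         "TUESDAY", "SUNDAY"),
  victim_gender      = c("FEMALE", "FEMALE", "FEMALE", "MALE", NA, "FEMALE"),
  totalnumbervictims = c(1L, 1L, 1L, 1L, 1L, 2L),
  totalsuspects      = c(1L, NA, 1L, NA, NA, NA),
  stringsAsFactors   = FALSE
)

# Question 2: structure (column names, types, first few values).
str(crimes)

# Question 3: one new name per column, in the original column order.
# These snake_case names are an assumption, chosen for consistency.
names(crimes) <- c("instance_id", "closed", "opening", "day_of_week",
                   "victim_gender", "total_number_victims", "total_suspects")

# Question 4: categorical text columns become factors; fixing the levels
# keeps the days in calendar order in tables and plots.
day_levels <- c("SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY",
                "THURSDAY", "FRIDAY", "SATURDAY")
crimes$day_of_week   <- factor(crimes$day_of_week, levels = day_levels)
crimes$victim_gender <- factor(crimes$victim_gender)

# Question 5: missing values per column, as counts and as shares of all rows.
na_per_column <- colSums(is.na(crimes))
na_share      <- round(colMeans(is.na(crimes)), 2)
print(na_per_column)
print(na_share)
```

The per-column shares matter for the rationale in Question 5: if one column is missing in most rows, dropping every observation that contains any `NA` would discard most of the data, so it is usually worth deciding column by column.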
Data cleaning
6. Look at the unique values for every column. Do values in a column need to be combined, relabeled, or removed? (e.g., are there multiple ways that a column labels missing values or genders? Should any values be removed or recoded?) Show your process for modifying values and your rationale for doing so. You will definitely spend a couple of hours on this step.
7. Are there any outliers or aberrant values in the numeric columns? How do you know? Do you remove or recode them? Show your process for modifying values and your rationale for doing so. (You should leverage information from other analytics/statistics/quantitative courses you’ve taken either in the Business Analytics program or elsewhere throughout your education.)
8. Take care of any missing values. Do you keep them in the data set, remove observations, impute missing values, or use some other procedure? Show your processes and any rationale for doing so.
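
One possible shape for the cleaning steps in Questions 6 through 8 is sketched below. The messy labels (`"female "`, `"F - FEMALE"`, `"UNKNOWN"`) and the value 40 are invented for this toy data frame and are not taken from the real file; the 1.5 × IQR rule and the decision to flag rather than delete are likewise assumptions, one reasonable option among several.

```r
# Hypothetical toy data with deliberately inconsistent labels and one extreme value.
crimes <- data.frame(
  victim_gender        = c("FEMALE", "female ", "F - FEMALE", "MALE",
                           "UNKNOWN", NA, "MALE", "FEMALE"),
  total_number_victims = c(1L, 1L, 2L, 1L, 1L, 40L, 1L, 2L),
  total_suspects       = c(1L, NA, 1L, NA, 2L, NA, 1L, NA),
  stringsAsFactors     = FALSE
)

# Question 6: list the distinct values in every column, NA included.
print(lapply(crimes, function(col) sort(unique(col), na.last = TRUE)))

# Normalise case and whitespace, merge duplicate labels, and treat
# "UNKNOWN" as missing. %in% is used so NA entries never match.
gender <- toupper(trimws(crimes$victim_gender))
gender[gender %in% "F - FEMALE"] <- "FEMALE"
gender[gender %in% "UNKNOWN"]    <- NA
crimes$victim_gender <- gender

# Question 7: flag values outside the 1.5 * IQR fences instead of deleting them.
v     <- crimes$total_number_victims
q     <- quantile(v, c(0.25, 0.75), na.rm = TRUE)
iqr   <- q[[2]] - q[[1]]
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
is_outlier <- !is.na(v) & (v < lower | v > upper)
crimes$victims_outlier <- is_outlier

# Question 8: keep the NAs but record where they are, and check how many
# rows would survive if every incomplete observation were dropped.
crimes$suspects_missing <- is.na(crimes$total_suspects)
n_complete <- sum(complete.cases(crimes))
print(table(crimes$victim_gender, useNA = "ifany"))
print(n_complete)
```

Flagging keeps the original values available, so the choice between removing, recoding, or imputing can be revisited later without re-importing the data.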
EDA (Exploratory Data Analysis)
9. Show appropriate visualizations or summaries for all character variables. Do any insights appear as a result of these?
10. Show appropriate visualizations or summaries for all numeric variables. Do any insights appear as a result of these?
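
For Questions 9 and 10, a minimal base R sketch might look like the following. It uses only the ten rows visible in the `head(data, n = 10)` output of this post (four of the seven columns), under assumed snake_case column names; the plot titles and the temporary PDF file name are hypothetical, and the real analysis would run on the full data set.

```r
# The ten rows shown by head(data, n = 10), with assumed snake_case column names.
crimes <- data.frame(
  day_of_week          = c("SATURDAY", "THURSDAY", "TUESDAY", "WEDNESDAY", "TUESDAY",
                           "SUNDAY", "TUESDAY", "WEDNESDAY", "FRIDAY", "WEDNESDAY"),
  victim_gender        = c("FEMALE", "FEMALE", "FEMALE", "FEMALE", "FEMALE",
                           "FEMALE", "MALE", NA, NA, "FEMALE"),
  total_number_victims = c(1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L),
  total_suspects       = c(1L, NA, 1L, NA, NA, NA, 1L, NA, 1L, NA),
  stringsAsFactors     = FALSE
)

# Question 9: frequency tables for the character variables (NA kept visible).
day_counts    <- sort(table(crimes$day_of_week), decreasing = TRUE)
gender_counts <- table(crimes$victim_gender, useNA = "ifany")
print(day_counts)
print(gender_counts)

# Question 10: five-number summaries (plus mean and NA count) for the numeric variables.
victims_summary  <- summary(crimes$total_number_victims)
suspects_summary <- summary(crimes$total_suspects)
print(victims_summary)
print(suspects_summary)

# Plots are written to a temporary PDF so the script also runs non-interactively.
plot_file <- file.path(tempdir(), "cincy_eda.pdf")
pdf(plot_file)
barplot(day_counts, las = 2, main = "Incidents by day of week")
hist(crimes$total_number_victims,
     breaks = seq(0.5, max(crimes$total_number_victims) + 0.5, by = 1),
     main = "Victims per incident", xlab = "Number of victims")
dev.off()
```

In RStudio the `pdf()` and `dev.off()` lines can be dropped so the plots appear in the Plots pane instead.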
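## Question 1
data <- read.csv("week1_cincy_crimes.csv")
head(data)
head(data, n = 10)

## Question 2
str(data)

## Question 3
# The data frame has 7 columns, so the vector needs 7 names in column order.
# A 5-name vector labels the wrong columns and leaves the last two names NA.
names(data) <- c("instance_id", "closed", "opening", "day_of_week",
                 "victim_gender", "total_number_victims", "total_suspects")
names(data)

## Question 4

## Question 5
sum(is.na(data))
# colSums (capital S) counts the missing values in each column.
colSums(is.na(data))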
> data <- read.csv("week1_cincy_crimes.csv")
> head(data)
                            instanceid                         closed opening dayofweek victim_gender
1 92A296AB-D1B7-40CE-BD96-209CFF141FDA                      J--CLOSED    <NA>  SATURDAY        FEMALE
2 44ACB102-5B1D-40F8-9E2B-1F823A26705D                Z--EARLY CLOSED    <NA>  THURSDAY        FEMALE
3 2CED4B80-3AB7-46DF-BBD9-3531A0C6727A D--VICTIM REFUSED TO COOPERATE    <NA>   TUESDAY        FEMALE
4 EEB41765-CBC3-476C-BDBE-4273B4E0CC7E                      J--CLOSED    <NA> WEDNESDAY        FEMALE
5 F4622DF5-8274-4290-AB0E-73CB9A720905                      J--CLOSED    <NA>   TUESDAY        FEMALE
6 EF456ED0-031E-4171-8C96-29CF91BC9A9B                Z--EARLY CLOSED    <NA>    SUNDAY        FEMALE
  totalnumbervictims totalsuspects
1                  1             1
2                  1            NA
3                  1             1
4                  1            NA
5                  1            NA
6                  2            NA
> head(data, n = 10)
                             instanceid                         closed opening dayofweek victim_gender
1  92A296AB-D1B7-40CE-BD96-209CFF141FDA                      J--CLOSED    <NA>  SATURDAY        FEMALE
2  44ACB102-5B1D-40F8-9E2B-1F823A26705D                Z--EARLY CLOSED    <NA>  THURSDAY        FEMALE
3  2CED4B80-3AB7-46DF-BBD9-3531A0C6727A D--VICTIM REFUSED TO COOPERATE    <NA>   TUESDAY        FEMALE
4  EEB41765-CBC3-476C-BDBE-4273B4E0CC7E                      J--CLOSED    <NA> WEDNESDAY        FEMALE
5  F4622DF5-8274-4290-AB0E-73CB9A720905                      J--CLOSED    <NA>   TUESDAY        FEMALE
6  EF456ED0-031E-4171-8C96-29CF91BC9A9B                Z--EARLY CLOSED    <NA>    SUNDAY        FEMALE
7  0859E500-4543-469D-910E-D14F603AB5BC              H--WARRANT ISSUED    <NA>   TUESDAY          MALE
8  9B091265-0352-4198-A19E-B960BDE15091                Z--EARLY CLOSED    <NA> WEDNESDAY          <NA>
9  D2DAF74C-1991-4E51-B81C-6BE79F7DA0F7                Z--EARLY CLOSED    <NA>    FRIDAY          <NA>
10 43EEB437-DF03-47D0-AB01-951B9EBBFA04                      J--CLOSED    <NA> WEDNESDAY        FEMALE
   totalnumbervictims totalsuspects
1                   1             1
2                   1            NA
3                   1             1
4                   1            NA
5                   1            NA
6                   2            NA
7                   1             1
8                   2            NA
9                   1             1
10                  1            NA
> ## Question 2
> str(data)
'data.frame':	21153 obs. of  7 variables:
 $ instanceid        : chr  "92A296AB-D1B7-40CE-BD96-209CFF141FDA" "44ACB102-5B1D-40F8-9E2B-1F823A26705D" "2CED4B80-3AB7-46DF-BBD9-3531A0C6727A" "EEB41765-CBC3-476C-BDBE-4273B4E0CC7E" ...
 $ closed            : chr  "J--CLOSED" "Z--EARLY CLOSED" "D--VICTIM REFUSED TO COOPERATE" "J--CLOSED" ...
 $ opening           : chr  NA NA NA NA ...
 $ dayofweek         : chr  "SATURDAY" "THURSDAY" "TUESDAY" "WEDNESDAY" ...
 $ victim_gender     : chr  "FEMALE" "FEMALE" "FEMALE" "FEMALE" ...
 $ totalnumbervictims: int  1 1 1 1 1 2 1 2 1 1 ...
 $ totalsuspects     : int  1 NA 1 NA NA NA 1 NA 1 NA ...
> ## Question 3
> names(data) <- c("opening", "day_of_week", "victim_gender", "total_number_victims", "total_suspects")
> names(data)
[1] "opening"              "day_of_week"          "victim_gender"        "total_number_victims"
[5] "total_suspects"       NA                     NA
> ## Question 5
> sum(is.na(data))
[1] 33636
> colSums(is.na(data))
             opening          day_of_week        victim_gender total_number_victims       total_suspects
                   0                  487                19562                  363                 3165
                <NA>                 <NA>
                  20                10039
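
In the console run above, five names are assigned to a data frame with seven columns. R fills names by position, so every label lands two columns too early and the last two columns are left with `NA` names. That is why the `colSums` output appears to show 0 missing values for `opening`. The sketch below reproduces the effect on a two-row toy data frame (hypothetical values), renames by matching the old names instead, and then reads the seven counts from the output against the column order given by `str(data)`. That mapping is an inference from the transcript, not something the post states.

```r
# Two-row toy data frame with the original column names (values are made up).
toy <- data.frame(
  instanceid         = c("a1", "a2"),
  closed             = c("J--CLOSED", NA),
  opening            = c(NA_character_, NA_character_),
  dayofweek          = c("SATURDAY", "THURSDAY"),
  victim_gender      = c("FEMALE", NA),
  totalnumbervictims = c(1L, 2L),
  totalsuspects      = c(1L, NA),
  stringsAsFactors   = FALSE
)
original_names <- names(toy)

# What the transcript did: 5 names for 7 columns, assigned by position.
mislabeled <- toy
names(mislabeled) <- c("opening", "day_of_week", "victim_gender",
                       "total_number_victims", "total_suspects")
print(data.frame(original = original_names, assigned = names(mislabeled)))

# Safer: rename by matching old names, so position and count cannot drift.
rename_map <- c(instanceid         = "instance_id",
                dayofweek          = "day_of_week",
                totalnumbervictims = "total_number_victims",
                totalsuspects      = "total_suspects")
idx <- match(names(rename_map), names(toy))
names(toy)[idx] <- unname(rename_map)
print(names(toy))

# The seven counts from the transcript, read against the original column order.
na_counts <- c(0, 487, 19562, 363, 3165, 20, 10039)
names(na_counts) <- original_names
total_rows <- 21153                      # from str(data)
na_share <- round(na_counts / total_rows, 2)
print(na_share)
```

Read this way, the counts add up to the 33636 reported by `sum(is.na(data))`, and `opening` would be missing in most rows, which bears directly on the Question 5 decision about dropping observations.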