Note:only need answer for 4, 5, 6, 7,11,12 In this Module 2 Discussion, we shall discuss how to use R to obtain informat
Posted: Thu Jul 14, 2022 2:18 pm
Note: only need answers for steps 4, 5, 6, 7, 11, and 12.
In this Module 2 Discussion, we shall discuss how to use R to obtain information by exploring, cleaning, and preprocessing the data. The following is a kind of checklist of frequent steps in data preparation. More precisely, they are also typical steps in “cleansing” data. Such steps include (at least):
| No. | Steps | R functions |
|-----|-------|-------------|
| 1 | Loading and looking at the dataset in R | |
| 2 | Identify missing values | |
| 3 | Identify outliers | |
| 4 | Check for overall plausibility and errors (e.g., typos) | |
| 5 | Identify highly correlated variables | |
| 6 | Identify variables with (nearly) no variance | |
| 7 | Identify variables with strange names or values | |
| 8 | Check variable classes (e.g., characters vs. factors) | |
| 9 | Remove/transform some variables (maybe your model does not like categorical variables) | |
| 10 | Rename some variables or values (especially interesting if large number) | |
| 11 | Check some overall pattern (statistical/numerical summaries/graphical illustrations) | |
| 12 | Center/scale variables | |
In view of the above steps, please scan through the three examples (Examples 1, 2, and 3) in Data Mining and Business Analytics with R, Chapter 2, and Data Mining for Business Analytics: Concepts, Techniques, and Applications in R, section 2.4 (found in this week's Reading & Resources), to find the R functions we can use to handle each of these steps, and then fill in the blanks in the above table. For example, you may put read.csv() and View() in the first row, as they are ways to carry out that specific step. You may also refer to open resources to find relevant R functions, and each blank can have multiple R functions as answers.
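For the steps singled out in the note (4, 5, 6, 7, 11, and 12), here is a minimal base-R sketch of functions that could go in the blanks. It is only one possible set of answers, not the textbooks' own. The data frame `df`, its column names, its planted errors, and the 0.9 correlation cutoff are all made up for illustration.

```r
# Toy data with deliberately planted problems (hypothetical, not from the readings)
df <- data.frame(
  price         = c(10.5, 12.0, 11.2, 1200, 9.8, 10.9),  # 1200 looks like a typo
  `Price (EUR)` = c(9.9, 11.3, 10.6, 1130, 9.3, 10.2),   # near-copy of price, odd name
  units         = c(3, 5, 4, 6, 2, 5),
  batch         = rep(1, 6),                              # constant column
  region        = c("north", "south", "nroth", "south", "north", "south"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Step 4: overall plausibility and errors
str(df)                      # structure and classes at a glance
summary(df)                  # min/max reveal implausible values such as 1200
range(df$price)
table(df$region)             # a category seen once ("nroth") hints at a typo
n_dup <- sum(duplicated(df)) # number of fully duplicated rows

# Step 5: highly correlated variables
num <- df[sapply(df, is.numeric)]
num <- num[, sapply(num, sd) > 0, drop = FALSE]  # cor() yields NA for constant columns
cm  <- cor(num)
cm[upper.tri(cm, diag = TRUE)] <- NA             # keep each pair only once
high <- which(abs(cm) > 0.9, arr.ind = TRUE)     # 0.9 is an assumed cutoff
high_pairs <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]],
  r    = cm[high],
  stringsAsFactors = FALSE
)
high_pairs

# Step 6: variables with (nearly) no variance
variances <- sapply(df[sapply(df, is.numeric)], var)
zero_var  <- names(variances)[variances == 0]
n_unique  <- sapply(df, function(x) length(unique(x)))  # very few distinct values is a warning sign

# Step 7: strange names or values
names(df)
odd_names   <- names(df)[names(df) != make.names(names(df))]  # not syntactically valid in R
rare_values <- names(which(table(df$region) == 1))            # values that occur only once

# Step 11: overall patterns (numerical summaries and plots)
summary(df$price)
aggregate(price ~ region, data = df, FUN = median)
if (!interactive()) pdf(NULL)        # do not write a plot file when run as a script
hist(df$units, main = "units")
boxplot(price ~ region, data = df)
if (!interactive()) invisible(dev.off())

# Step 12: center and scale
scaled <- scale(num)                 # subtract column means, divide by column sds
colMeans(scaled)                     # approximately 0
apply(scaled, 2, sd)                 # 1 for every column
```

If add-on packages are allowed, the caret package's findCorrelation() and nearZeroVar() are packaged alternatives for steps 5 and 6. The sketch sticks to base R so it runs without installing anything.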