In this Module 2 Discussion, we shall discuss how to use R to obtain information by exploring, cleaning, and preprocessi

Posted: Thu Jul 14, 2022 2:17 pm
by answerhappygod
In this Module 2 Discussion, we shall discuss how to use R to obtain information by exploring, cleaning, and preprocessing the data. The following is a checklist of frequent steps in data preparation; more precisely, these are typical steps in “cleansing” data. Such steps include (at least):
No. | Steps                                                                               | R functions
1   | Load and look at the dataset in R                                                   |
2   | Identify missing values                                                             |
3   | Identify outliers                                                                   |
4   | Check for overall plausibility and errors (e.g., typos)                             |
5   | Identify highly correlated variables                                                |
6   | Identify variables with (nearly) no variance                                        |
7   | Identify variables with strange names or values                                     |
8   | Check variable classes (e.g., characters vs. factors)                               |
9   | Remove/transform some variables (maybe your model does not like categorical ones)   |
10  | Rename some variables or values (especially interesting if there is a large number) |
11  | Check some overall pattern (statistical/numerical summaries, graphical illustrations) |
12  | Center/scale variables                                                              |
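
As a rough illustration of steps 2 to 12, here is a minimal base-R sketch. It is not taken from the assigned readings: the built-in airquality dataset, the correlation cutoff of 0.6, the variance threshold, and the new column name SolarRad are all assumptions chosen only to make the example runnable.

```r
# Assumption: the built-in airquality data stands in for "the dataset".
df <- airquality

# Step 2: count missing values per column
na_counts <- colSums(is.na(df))
print(na_counts)

# Step 3: flag outliers with the boxplot (1.5 * IQR) rule
wind_outliers <- boxplot.stats(df$Wind)$out

# Step 4: plausibility check via ranges and distinct values
print(summary(df))
print(range(df$Month))

# Step 5: pairs of variables whose absolute correlation exceeds a cutoff
cutoff <- 0.6  # illustrative value, not a recommendation
cors <- cor(df, use = "pairwise.complete.obs")
high_pairs <- which(abs(cors) > cutoff & upper.tri(cors), arr.ind = TRUE)

# Step 6: variables with (nearly) no variance
vars <- sapply(df, var, na.rm = TRUE)
low_var <- names(vars)[vars < 1e-8]  # illustrative threshold

# Step 7: inspect names and values for anything odd
print(names(df))
print(unique(df$Month))

# Step 8: check variable classes
print(sapply(df, class))

# Step 9: transform a variable (numeric month code to a labelled factor)
df$Month <- factor(df$Month, levels = 5:9, labels = month.abb[5:9])

# Step 10: rename a variable (SolarRad is a hypothetical new name)
names(df)[names(df) == "Solar.R"] <- "SolarRad"

# Step 11: overall patterns; hist() and boxplot() would draw these instead
print(table(df$Month))
temp_hist <- hist(df$Temp, plot = FALSE)

# Step 12: center and scale the numeric columns (NAs are ignored by scale())
num_cols <- sapply(df, is.numeric)
scaled <- scale(df[num_cols])
```

Packages can shorten some of these steps; for example, caret offers nearZeroVar() and findCorrelation() for steps 6 and 5, though whether the course expects those is not stated in the post.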
In view of the above steps, please scan through the three examples (Examples 1, 2, and 3) in Data Mining and Business Analytics with R, Chapter 2, and Data Mining for Business Analytics: Concepts, Techniques, and Applications in R, section 2.4 (found in this week's Reading & Resources), then fill in the blanks in the above table with the R functions we can use to handle each step. For example, you may put read.csv() and View() in the first row, as they are ways to carry out that specific step. You may also refer to open resources to find relevant R functions, and each blank can have multiple R functions as answers.
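
For the first row, a minimal sketch of loading and looking at a dataset might look like the following. The toy data frame and the temporary file are hypothetical, written here only so that the example is self-contained; in practice the path would point to the course dataset.

```r
# Hypothetical toy data written to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:3, price = c(9.5, NA, 12), type = c("a", "b", "a")),
  csv_path,
  row.names = FALSE
)

# Step 1: load the file and take a first look
toy <- read.csv(csv_path, stringsAsFactors = FALSE)
print(dim(toy))      # number of rows and columns
str(toy)             # column classes and first values
print(head(toy))     # first rows
print(summary(toy))  # per-column summaries, including NA counts

# View() opens a spreadsheet-style viewer, so only call it interactively
if (interactive()) View(toy)
```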