1. You must find a dataset that meets the following criteria:
   a. The dataset must have at least 100 rows of data. Do not cut it down to 100 rows if it’s larger; it does not make the project any harder to have more data and only makes it better in terms of analysis.
   b. The dataset must have a time variable such as an exact date, day of week, month, year, etc.
      i. There are possibly only 100 rows of data, so make sure the dates are not all the same month or the same year. It needs to vary in order to plot this as a time series.
   c. The dataset needs numeric and categorical variables included.
      i. It needs to include at least 3 numeric variables and 2 categorical variables plus the time series variable. Try to not include categorical variables with too many individual categories (like state or country).
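The criteria above can be checked before starting the analysis. The following is a minimal sketch in Python with pandas, not part of the assignment itself. The function name `check_dataset`, the `max_categories` cutoff of 10, and the synthetic `demo` table with its column names are all assumptions made for illustration. It also assumes the time variable is a column that can be parsed as a date.

```python
import numpy as np
import pandas as pd

def check_dataset(df, time_col, numeric_cols, categorical_cols, max_categories=10):
    """Return pass/fail flags for the dataset criteria (hypothetical helper)."""
    # Assumption: the time column holds values that parse as dates.
    times = pd.to_datetime(df[time_col], errors="coerce")
    return {
        # Criterion a: at least 100 rows.
        "at_least_100_rows": len(df) >= 100,
        # Criterion b: dates span more than one calendar month.
        "time_varies": times.dt.to_period("M").nunique() > 1,
        # Criterion c: at least 3 numeric and 2 categorical variables.
        "three_numeric": len(numeric_cols) >= 3
        and all(pd.api.types.is_numeric_dtype(df[c]) for c in numeric_cols),
        "two_categorical": len(categorical_cols) >= 2,
        # Assumed cutoff for "too many individual categories".
        "few_categories": all(
            df[c].nunique() <= max_categories for c in categorical_cols
        ),
    }

# Synthetic data for illustration only; replace with the real dataset.
rng = np.random.default_rng(0)
n = 120
demo = pd.DataFrame({
    "date": pd.date_range("2022-01-01", periods=n, freq="D"),
    "price": rng.normal(50, 10, n),
    "units": rng.integers(1, 20, n),
    "cost": rng.normal(30, 5, n),
    "region": rng.choice(["North", "South", "East"], n),
    "channel": rng.choice(["Online", "Store"], n),
})

result = check_dataset(
    demo, "date", ["price", "units", "cost"], ["region", "channel"]
)
print(result)
```

If the time variable is a day of week or a month name rather than a full date, the `time_varies` check would need to count distinct values in that column instead.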
2. Steps:
   a. Summarize the categorical variables by:
      i. Calculating the counts of each category of the variable along with the percentages
      ii. Plotting the percentages for each variable in a histogram or pie chart
      iii. Creating a cross tabulation of the categorical variables and analyzing what this shows you about your data set
      iv. Analyzing what these show you about your data set
   b. Summarize the numeric variables by:
      i. Calculating the:
         1. Mean
         2. Median
         3. Mode
         4. IQR
         5. Range
         6. Standard deviation
         7. Skewness
      ii. Creating a histogram and boxplot for each numeric variable
      iii. Analyzing what the summary measures and graphs tell you about your data set
      iv. Creating scatterplots comparing different numeric variables, adding trendlines to them, and analyzing what the relationship shows
      v. Plotting the numeric variables that make sense to plot as a time series and analyzing what it shows
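The calculations in the steps above can be sketched in Python with pandas and NumPy. This is one possible approach under stated assumptions, not a required method. The helper names `summarize_categorical`, `summarize_numeric`, and `trendline`, and the tiny hand-made `demo` table, are hypothetical and only there to show the mechanics.

```python
import numpy as np
import pandas as pd

def summarize_categorical(df, col):
    """Counts and percentages for each category of one variable."""
    counts = df[col].value_counts()
    percents = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "percent": percents})

def summarize_numeric(s):
    """The seven summary measures for one numeric variable."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    return pd.Series({
        "mean": s.mean(),
        "median": s.median(),
        "mode": s.mode().iloc[0],   # smallest value if there are ties
        "iqr": q3 - q1,
        "range": s.max() - s.min(),
        "std": s.std(),             # sample standard deviation (n - 1)
        "skewness": s.skew(),       # bias-corrected sample skewness
    })

def trendline(x, y):
    """Slope and intercept of a least-squares straight line."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# Hand-made illustration data; not a real dataset.
demo = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "North"],
    "channel": ["Online", "Store", "Store", "Online", "Online", "Store"],
    "units": [2, 4, 4, 5, 7, 8],
    "revenue": [5, 9, 9, 11, 15, 17],
})

cat_summary = summarize_categorical(demo, "region")
xtab = pd.crosstab(demo["region"], demo["channel"])  # cross tabulation
num_summary = summarize_numeric(demo["units"])
slope, intercept = trendline(demo["units"], demo["revenue"])

print(cat_summary, xtab, num_summary, sep="\n\n")
print(f"trendline: revenue = {slope:.2f} * units + {intercept:.2f}")
```

For the graphs, pandas plotting methods such as `Series.plot.hist`, `DataFrame.boxplot`, and `DataFrame.plot.scatter` (which rely on matplotlib) are one option, and the slope and intercept from `trendline` can be drawn over a scatterplot as a straight line.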