

Posted: Mon May 09, 2022 6:58 am
by answerhappygod
Problem 1: Simulating Data

We're going to let you in on a secret. The turtle data from the autograded assignment was simulated... fake data! Gasp! Simulating data, and applying statistical models to simulated data, are important tools in data science. Why do we use simulated data? Real data can be messy and noisy, and we almost never know the underlying process that generated it. Working with real data is always our ultimate goal, so we will use as many real datasets in this course as possible. However, applying models to simulated data can be very instructive: it helps us understand how models work in ideal settings, how robust they are to changes in modeling assumptions, and how they behave in a whole host of other contexts. In this problem, you are going to learn how to simulate your own data.
1. (a) A Simple Line

Starting out, generate 10 to 20 data points for values along the x-axis. Then generate the corresponding y-values using the equation $Y_i = \beta_0 + \beta_1 X_i$. Make it a straight line, nothing fancy. Plot your data (using ggplot!) with your x data along the x-axis and your y data along the y-axis. In the Markdown cell below the R cell, describe what you see in the plot.

Tip: You can generate your x-data deterministically, e.g., using either the a:b syntax or the seq() function, or randomly using something like runif() or rnorm(). In practice, it won't matter all that much which one you choose.
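One possible sketch of part (a), assuming the ggplot2 package is installed. The intercept and slope values (beta0 = 2, beta1 = 0.5) and the names beta0, beta1, line_data, and p_line are hypothetical illustrative choices, not values given in the assignment.

```r
library(ggplot2)

# Hypothetical parameter values chosen for illustration; any intercept and slope will do
beta0 <- 2    # intercept
beta1 <- 0.5  # slope

# Deterministic x-values: 15 evenly spaced points from 1 to 15
x <- seq(1, 15, by = 1)
# Alternatives mentioned in the tip: x <- 1:15, or x <- runif(15, min = 0, max = 15)

# Exact straight line with no error term: Y_i = beta0 + beta1 * X_i
y <- beta0 + beta1 * x

line_data <- data.frame(x = x, y = y)

p_line <- ggplot(line_data, aes(x = x, y = y)) +
  geom_point() +
  labs(title = "Simulated data: a perfect straight line",
       x = "x", y = "y")
print(p_line)
```

Because there is no error term, every point falls exactly on the line, and consecutive y-values differ by exactly the slope.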
1. (b) The Error Component

That is a perfect set of data points, but that is a problem in itself. In almost any real-life situation, when we measure data, there will be some error in those measurements. Recall that our simple linear model is of the form:

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$

Add an error term to your y-data following the formula above. Make at least three plots (using ggplot!) with different values of $\sigma^2$. How does the value of $\sigma^2$ affect the final data points? Type your answer in the Markdown cell below the R cell.

Tip: To randomly sample from a normal distribution, check out the rnorm() function.
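One possible sketch of part (b), again assuming ggplot2 is installed. Note that rnorm() is parameterized by the standard deviation (its sd argument), not the variance, so each $\sigma^2$ value is passed as sd = sqrt(sigma2). The intercept, slope, the three $\sigma^2$ values, the seed, and the object names (sim_data, p_noise) are hypothetical illustrative choices.

```r
library(ggplot2)

set.seed(42)  # make the random draws reproducible

# Hypothetical parameter values chosen for illustration
beta0 <- 2
beta1 <- 0.5
x <- seq(1, 15, by = 1)

# Three error variances to compare (sigma^2, not sigma)
sigma2_values <- c(0.1, 1, 10)

# Simulate one dataset per variance and stack them into a single data frame
sim_data <- do.call(rbind, lapply(sigma2_values, function(sigma2) {
  # Error term: epsilon_i ~ N(0, sigma^2); rnorm() expects a standard deviation
  epsilon <- rnorm(length(x), mean = 0, sd = sqrt(sigma2))
  data.frame(x = x,
             y = beta0 + beta1 * x + epsilon,
             sigma2 = sigma2)
}))

p_noise <- ggplot(sim_data, aes(x = x, y = y)) +
  geom_point() +
  # Dashed reference line: the true, error-free relationship
  geom_abline(intercept = beta0, slope = beta1, linetype = "dashed") +
  # One panel per variance, labeled e.g. "sigma2: 0.1"
  facet_wrap(~ sigma2, labeller = label_both) +
  labs(title = "Simulated data with normally distributed error",
       x = "x", y = "y")
print(p_noise)
```

Putting the three variances in one faceted figure is just one design choice; three separate ggplot() calls would satisfy the "at least three plots" instruction equally well. The dashed line makes it easy to see how far the points scatter from the true line as $\sigma^2$ grows.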