See Question 1 and its answer below. Based on that, answer Question 2 in R.
Consider the following experimental design definitions:

simulations: Number of samples you repeatedly take. For all of Part 4, Q2, we set this number equal to 10000, i.e., you have 10000 samples. If you have trouble understanding this, perhaps it is time to rewatch the lecture recordings/materials.

n: Number of observations per sample. This will be given in the question, as we will experiment with different values of n.

PMF(Y): The probability mass function that the random variable Y follows (please check Lecture 2 and Tutorial 2). As with n, we can experiment with different settings for PMF(Y).

Random variables (RVs) $Y_1, Y_2, \ldots, Y_n \sim \mathrm{PMF}(Y)$: All the random variables in the sample (observation RVs) follow the distribution set out by the PMF. Again, the number of observations n and the distribution PMF(Y) have not been set here but will be given in the questions.

Question 1: Theoretical Set-up for the CLT (No Coding or Simulation here!) (2 Marks)

Before simulating the CLT, we must first establish what we would want to see from the simulation, i.e., what the theory tells us. Thus, we are going to set up the experiment here, as well as our expectation for (1) the Summation Distribution $\sum_{i=1}^{n} Y_i$ and (2) the Mean Distribution $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$.

We will consider one of the possible set-ups for the distribution PMF(Y), as shown below. Additionally, we will consider three different values for n, namely $n_{small} = 5$, $n_{medium} = 30$, $n_{big} = 100$. Simply put, we would like to obtain the distributions (1) and (2) for each pair of n and PMF(Y) that we set here. Again, please revisit the lecture materials if you have any doubts, since we have done a live presentation of this in our unit. Please put down your results up to five decimal places, as we would like to compare them with the simulation results later.

y            1      2      3      4      5
Pr(Y = y)    0.35   0.05   0.15   0.05   0.40
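As a rough illustration of the design definitions above, the set-up can be written down in base R as follows. This is only a sketch: the variable names (`y_values`, `y_probs`, `n_simulations`, `sample_sizes`) and the seed are assumptions chosen for illustration, not names given in the assignment.

```r
# Support and probabilities of Y, copied from the PMF table in the question.
y_values <- c(1, 2, 3, 4, 5)
y_probs  <- c(0.35, 0.05, 0.15, 0.05, 0.40)

# A valid PMF must sum to 1 (all.equal tolerates floating-point rounding).
stopifnot(isTRUE(all.equal(sum(y_probs), 1)))

# Design settings (illustrative names, values taken from the question).
n_simulations <- 10000
sample_sizes  <- c(small = 5, medium = 30, big = 100)

# One sample of n i.i.d. observations Y_1, ..., Y_n drawn from PMF(Y).
set.seed(1)  # arbitrary seed, used only for reproducibility
one_sample <- sample(y_values, size = sample_sizes[["small"]],
                     replace = TRUE, prob = y_probs)

sample_sum  <- sum(one_sample)   # one realisation of the summation statistic
sample_mean <- mean(one_sample)  # one realisation of the mean statistic
```

Repeating the last three lines `n_simulations` times is what "repeated sampling" refers to in the definitions.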
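Answer to Question 1:

For i.i.d. $Y_1, \ldots, Y_n$ with mean $\mu$ and variance $\sigma^2$, the CLT states that

$$\sqrt{n}\,(\bar{Y}_n - \mu) \xrightarrow{d} N(0, \sigma^2).$$

According to the question,

$$E(Y) = 1(0.35) + 2(0.05) + 3(0.15) + 4(0.05) + 5(0.4) = 0.35 + 0.10 + 0.45 + 0.20 + 2.00 = 3.10$$

$$E(Y^2) = 1(0.35) + 4(0.05) + 9(0.15) + 16(0.05) + 25(0.4) = 0.35 + 0.20 + 1.35 + 0.80 + 10.00 = 12.70$$

$$V(Y) = E(Y^2) - (E(Y))^2 = 12.70 - 9.61 = 3.09$$

Now let us consider $S_n = \sum_{i=1}^{n} Y_i$ and $\bar{Y}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Since $E(Y_i) = 3.1$ and $V(Y_i) = 3.09$ for every i, and the $Y_i$ are independent,

$$E(S_n) = nE(Y) = 3.1n, \qquad V(S_n) = nV(Y) = 3.09n$$

$$E(\bar{Y}_n) = \frac{3.1n}{n} = 3.1, \qquad V(\bar{Y}_n) = \frac{3.09n}{n^2} = \frac{3.09}{n}$$

Now, according to the CLT, as n becomes large,

$$\sqrt{n}\,(\bar{Y}_n - 3.1) \xrightarrow{d} N(0, 3.09), \qquad \text{equivalently} \qquad \frac{S_n - 3.1n}{\sqrt{n}} \xrightarrow{d} N(0, 3.09),$$

so that approximately $S_n \sim N(3.1n, 3.09n)$ and $\bar{Y}_n \sim N(3.1, 3.09/n)$, where the second parameter is the variance.

Thus theory tells us that:

n = 5:    $\sum_{i=1}^{n} Y_i \sim N(15.50000, 15.45000)$,    $\bar{Y} \sim N(3.10000, 0.61800)$
n = 30:   $\sum_{i=1}^{n} Y_i \sim N(93.00000, 92.70000)$,    $\bar{Y} \sim N(3.10000, 0.10300)$
n = 100:  $\sum_{i=1}^{n} Y_i \sim N(310.00000, 309.00000)$,  $\bar{Y} \sim N(3.10000, 0.03090)$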
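The moments and the normal parameters for each n can also be checked numerically. The following is a minimal base R sketch, assuming the PMF table from the question; all variable names are illustrative. It only evaluates the formulas $E(S_n) = nE(Y)$, $V(S_n) = nV(Y)$, $E(\bar{Y}) = E(Y)$ and $V(\bar{Y}) = V(Y)/n$ and involves no simulation.

```r
# PMF of Y, copied from the question.
y_values <- c(1, 2, 3, 4, 5)
y_probs  <- c(0.35, 0.05, 0.15, 0.05, 0.40)

# First and second moments, then the variance via E(Y^2) - E(Y)^2.
expected_y         <- sum(y_values * y_probs)
expected_y_squared <- sum(y_values^2 * y_probs)
variance_y         <- expected_y_squared - expected_y^2

sample_sizes <- c(5, 30, 100)

# Theoretical normal parameters (mean, variance) for the sum and the mean.
theory <- data.frame(
  n         = sample_sizes,
  sum_mean  = sample_sizes * expected_y,
  sum_var   = sample_sizes * variance_y,
  mean_mean = rep(expected_y, length(sample_sizes)),
  mean_var  = variance_y / sample_sizes
)

# Report each row to five decimal places, as the question requests.
for (i in seq_len(nrow(theory))) {
  cat(sprintf("n = %3d: sum ~ N(%.5f, %.5f), mean ~ N(%.5f, %.5f)\n",
              as.integer(theory$n[i]),
              theory$sum_mean[i], theory$sum_var[i],
              theory$mean_mean[i], theory$mean_var[i]))
}
```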
Question 2: Simulating the CLT result (NO LIBRARIES ALLOWED) (8 Marks)

After finishing Question 1, you should have collected the theoretical results. In this question, you will use these theoretical results to compare with the simulation results and verify the CLT. As you should know by now, the CLT is based on the idea of repeated sampling. Thus, please simulate your results accordingly under the given PMF(Y) and the three sample sizes n for the two distributions (1) and (2). The number of pairings is the same as in Question 1, since we would like to compare simulations with theoretical values.

For each pair of n and PMF(Y) under each distribution (1) and (2), you are required to display a histogram representing the results of repeated sampling, and a curve displaying the theoretical results from Question 1. Explain your findings and results (no more than 150 words).

Instructions for plots (MUST FOLLOW): The marking for this question also includes the cleanliness of your plots (proper labels for axes; the name of the plot must include the type of sampling distribution and the sample size that you are using, e.g. Mean Distribution: n = 30). The theoretical values and simulated values need to be presented accordingly for ease of comparison; you must put these values in the legends.

Instructions for code (MUST FOLLOW): The code needs to be elegant (do not hard code), with enough comments describing what you want to do. Furthermore, the naming of the variables needs to make sense. If you need to use a chunk of code more than once, please write a function for it; we will deduct marks if you copy and paste your code here and there. As specified from the beginning, please report your results to 5 decimal places so we can compare and assess the theoretical results of the CLT and its simulation.
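One possible way to approach Question 2 in base R (no libraries) is sketched below. It is an illustration under assumptions and not a model answer: the function names (`simulate_sums`, `plot_sampling_distribution`, `run_clt_experiment`), the seed, the colours, and the plot layout are all choices made here for illustration. The theoretical mean and variance are computed from the PMF and not typed in, so nothing is hard coded.

```r
# PMF of Y and design settings, taken from the question.
y_values      <- c(1, 2, 3, 4, 5)
y_probs       <- c(0.35, 0.05, 0.15, 0.05, 0.40)
n_simulations <- 10000
sample_sizes  <- c(5, 30, 100)

# Theoretical moments of a single observation Y.
expected_y <- sum(y_values * y_probs)
variance_y <- sum(y_values^2 * y_probs) - expected_y^2

# Draw n_simulations samples of size n and return the sum of each sample.
simulate_sums <- function(n, n_simulations, values, probs) {
  draws <- matrix(sample(values, size = n * n_simulations,
                         replace = TRUE, prob = probs),
                  nrow = n_simulations, ncol = n)
  rowSums(draws)  # one sum per simulated sample (row)
}

# Histogram of the simulated statistic with the theoretical normal curve;
# theoretical and simulated values go in the legend to 5 decimal places.
plot_sampling_distribution <- function(statistic, title, theory_mean, theory_var) {
  hist(statistic, freq = FALSE, main = title,
       xlab = "Value of the statistic", ylab = "Density",
       col = "grey85", border = "white")
  curve(dnorm(x, mean = theory_mean, sd = sqrt(theory_var)),
        add = TRUE, lwd = 2, col = "red")
  legend("topright", bty = "n", cex = 0.8,
         legend = c(sprintf("Theory: mean = %.5f, var = %.5f",
                            theory_mean, theory_var),
                    sprintf("Simulated: mean = %.5f, var = %.5f",
                            mean(statistic), var(statistic))),
         col = c("red", "grey60"), lwd = c(2, 8))
}

# Run the experiment for one sample size: simulate, plot both distributions,
# and return the simulated moments for comparison with theory.
run_clt_experiment <- function(n) {
  sums  <- simulate_sums(n, n_simulations, y_values, y_probs)
  means <- sums / n
  plot_sampling_distribution(sums,
                             sprintf("Summation Distribution: n = %d", as.integer(n)),
                             n * expected_y, n * variance_y)
  plot_sampling_distribution(means,
                             sprintf("Mean Distribution: n = %d", as.integer(n)),
                             expected_y, variance_y / n)
  c(n = n, sum_mean = mean(sums), sum_var = var(sums),
    mean_mean = mean(means), mean_var = var(means))
}

set.seed(2024)                            # arbitrary seed for reproducibility
par(mfrow = c(length(sample_sizes), 2))   # one row of plots per sample size
simulated <- t(sapply(sample_sizes, run_clt_experiment))
print(round(simulated, 5))
```

Because Y is discrete, the histograms for n = 5 will typically look visibly lumpy next to the smooth normal curve, and the agreement should improve as n grows. The written explanation (at most 150 words) should be based on the values actually printed when the code is run.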