Please help me with the following python pandas practice! Will rate good answers! **Question 1.** You'd like to do a hyp

Post by **answerhappygod** » Wed Mar 30, 2022 9:21 am

Please help me with the following python pandas practice! Will
rate good answers!



**Question 1.** You'd like to do a hypothesis
test to determine whether your model is accurate in describing your
data. However, there are six categories of M&M colors: red,
orange, yellow, green, brown, and blue, so you're a little confused
about which test statistic to use here. Which of the following is
**not** a reasonable choice of test statistic? Save your choice in
the variable `unreasonable_test_statistic`. You may only choose
one.
1. The total variation distance between the theoretical
distribution (expected proportion of colors) and the empirical
distribution (actual proportion of colors).
2. The sum of the absolute differences between the theoretical
distribution (expected proportion of colors) and the empirical
distribution (actual proportion of colors).
3. The absolute difference between the sum of the theoretical
distribution (expected proportion of colors) and the sum of the
empirical distribution (actual proportion of colors).

```python
unreasonable_test_statistic = ...
```
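Here is a quick check of why option 3 fails. Both distributions are proportions, so each one sums to 1. The absolute difference of their sums is therefore 0 (up to floating-point rounding), no matter what data you observe. The sketch below computes all three candidate statistics on the two arrays printed in the notebook output included with this post. The arrays are hard-coded so the snippet runs on its own, and the names `empirical`, `theoretical`, and `option_1` to `option_3` are illustrative only.

```python
import numpy as np

# Arrays copied from the notebook output (order: red, orange, yellow, green, brown, blue)
empirical = np.array([0.16748163, 0.16496175, 0.17063147, 0.15902205, 0.1700615, 0.16784161])
theoretical = np.array([1/6] * 6)

# Option 1: total variation distance (half the sum of absolute differences)
option_1 = np.sum(np.abs(empirical - theoretical)) / 2

# Option 2: sum of absolute differences (always exactly twice the TVD)
option_2 = np.sum(np.abs(empirical - theoretical))

# Option 3: absolute difference between the sums; both sums are ~1, so this is ~0
option_3 = abs(np.sum(theoretical) - np.sum(empirical))

print(option_1, option_2, option_3)

# Option 3 cannot tell any two distributions apart, so it is the unreasonable choice
unreasonable_test_statistic = 3
```

Options 1 and 2 carry the same information, because one is a constant multiple of the other. Either one works as a test statistic.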
**Question 2.** We'll use the TVD, i.e. total
variation distance, as our test statistic. Below, complete the
implementation of the function `total_variation_distance`, which
takes in two distributions (stored as arrays) as arguments and
returns the total variation distance between the two arrays.
Then, use the function `total_variation_distance` to determine
the TVD between the empirical color distribution you observed and
the theoretical color distribution. Assign this TVD to
`observed_tvd`.
```python
def total_variation_distance(first_distrib, second_distrib):
    '''Computes the total variation distance between two distributions.'''
    ...

observed_tvd = ...
observed_tvd
```
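One common way to fill in this cell is sketched below. It assumes both arguments are NumPy arrays of proportions in the same color order. The TVD is half the sum of the absolute differences, TVD = ½ Σ |p_i − q_i|. The two distributions are re-created from the printed outputs so the snippet runs standalone. In the notebook you would pass `empirical_color_distribution` and `theoretical_color_distribution` directly.

```python
import numpy as np

def total_variation_distance(first_distrib, second_distrib):
    '''Computes the total variation distance between two distributions.'''
    # Half the sum of the element-wise absolute differences
    return np.abs(first_distrib - second_distrib).sum() / 2

# Values copied from the notebook output (red, orange, yellow, green, brown, blue)
empirical_color_distribution = np.array(
    [0.16748163, 0.16496175, 0.17063147, 0.15902205, 0.1700615, 0.16784161]
)
theoretical_color_distribution = np.array([1/6, 1/6, 1/6, 1/6, 1/6, 1/6])

observed_tvd = total_variation_distance(empirical_color_distribution,
                                        theoretical_color_distribution)
observed_tvd
```

With the rounded values shown in the output, this comes to about 0.00935.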
**Question 3.** Now we'll calculate 5000
simulated TVDs to see what a typical TVD between an empirical
distribution and the theoretical distribution would look like if
our model were accurate. Since our real-life data includes 33,335
M&Ms, in each trial of the simulation, we'll draw 33,335
M&M's at random from our theoretical distribution, then
calculate the TVD between the color distribution from this sample
and the theoretical color distribution. Store these 5000 simulated
TVDs in an array called `simulated_tvds`.
```python
simulated_tvds = ...

# Visualize the distribution of TVDs with a histogram
pd.DataFrame().assign(TVD=simulated_tvds).plot(kind='hist', density=True, ec='w');
```
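Below is a sketch of the simulation. It assumes NumPy's `Generator.multinomial` is acceptable for drawing the 33,335 M&M's in one call. Many intro courses use `np.random.multinomial` instead, which behaves the same way. Each row of the multinomial draw is one simulated sample of color counts. Dividing by the sample size turns counts into proportions, and the TVD is then computed row by row. The seed is an arbitrary choice made only for reproducibility.

```python
import numpy as np

theoretical_color_distribution = np.array([1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
sample_size = 33335      # number of M&M's in the real data
repetitions = 5000       # number of simulated samples

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

# Each row holds the color counts for one simulated sample of 33,335 M&M's
counts = rng.multinomial(sample_size, theoretical_color_distribution, size=repetitions)
simulated_proportions = counts / sample_size

# TVD of every simulated sample against the theoretical distribution (one value per row)
simulated_tvds = np.abs(simulated_proportions - theoretical_color_distribution).sum(axis=1) / 2
```

You could also use a `for` loop that calls `np.random.multinomial(33335, theoretical_color_distribution) / 33335` on each pass and appends the resulting TVD to an array. That produces the same kind of result, only more slowly.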
**Question 4.** Now, we check the p-value of
our claim by computing the proportion of times in our simulation
that we saw a TVD greater than or equal to our observed TVD. Assign
your result to `color_p_value`.
Additionally, conclude whether we should reject the null
hypothesis at the standard 0.05 significance level. Set the
variable `color_null` below to `True` if you think your model is
plausible or `False` if you think the null hypothesis should be
rejected.
```python
color_p_value = ...
color_null = ...
color_p_value, color_null
```
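Here is a sketch of the last step. In the notebook you would reuse `simulated_tvds` and `observed_tvd` from Questions 2 and 3. To keep this snippet self-contained, it recomputes compact versions of both using an arbitrary seed. The p-value is the proportion of simulated TVDs at least as large as the observed one, and calling `np.mean` on a boolean array gives exactly that proportion. The final line follows the question's convention: `color_null` is `True` when the model looks plausible, meaning the p-value is at or above 0.05.

```python
import numpy as np

theoretical = np.array([1/6] * 6)
empirical = np.array([0.16748163, 0.16496175, 0.17063147, 0.15902205, 0.1700615, 0.16784161])
observed_tvd = np.abs(empirical - theoretical).sum() / 2

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility
samples = rng.multinomial(33335, theoretical, size=5000) / 33335
simulated_tvds = np.abs(samples - theoretical).sum(axis=1) / 2

# Proportion of simulated TVDs greater than or equal to the observed TVD
color_p_value = np.mean(simulated_tvds >= observed_tvd)

# True -> fail to reject the null (model plausible); False -> reject at the 0.05 level
color_null = bool(color_p_value >= 0.05)
color_p_value, color_null
```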
It turns out that the different colors of the popular candy M&M's are made separately by different machines and then combined into bags for sale. Each bag contains 6 colors (red, orange, yellow, green, brown, and blue). You're curious whether all six colors are equally represented in each bag, so you get to work eating your way through an absurd number of bags of M&M's. Your data is below. Each row represents one bag, and each column represents a color. You've counted how many of each color were in each bag.

```python
import numpy as np
import pandas as pd

m = pd.read_csv("data/m&m.csv")
m
```

Output (468 rows × 6 columns; rows 2, 3, and 467 of the preview were not legible in the source screenshot and are omitted):

| | red | orange | yellow | green | brown | blue |
|---|---|---|---|---|---|---|
| 0 | 10 | 15 | 11 | 7 | 18 | 10 |
| 1 | 5 | 12 | 17 | 15 | 10 | 9 |
| 4 | 11 | 14 | 20 | 8 | 7 | 11 |
| ... | ... | ... | ... | ... | ... | ... |
| 463 | 11 | 11 | 12 | 13 | 11 | 12 |
| 464 | 17 | 10 | 8 | 11 | 12 | 13 |
| 465 | 9 | 14 | 12 | 10 | 15 | 12 |
| 466 | 12 | 14 | 11 | 10 | 10 | 16 |
Imagine dumping all 468 bags together and then separating all those M&M's by color to get a distribution of M&M's by color. The array below represents the actual proportions of M&M's of each color (in the order red, orange, yellow, green, brown, blue). This is an empirical distribution because it's based on the data that you actually observed.

```python
# You don't have to know how this code works.
# It's summing up each column in the DataFrame above to find the total number of each color,
# and dividing the total number of each color by the total number of all colors combined.
empirical_color_distribution = m.to_numpy().sum(axis=0) / m.to_numpy().sum()
empirical_color_distribution
```

```
array([0.16748163, 0.16496175, 0.17063147, 0.15902205, 0.1700615 , 0.16784161])
```

Your original belief is that all colors are distributed uniformly overall. The array below represents the proportions of M&M's of each color according to your model. This is a theoretical probability distribution because it's based on your theoretical model.

```python
theoretical_color_distribution = np.array([1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
theoretical_color_distribution
```

```
array([0.16666667, 0.16666667, 0.16666667, 0.16666667, 0.16666667, 0.16666667])
```
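The snippet below shows what the `sum(axis=0)` step in the empirical-distribution cell does. It runs the same computation on just the first two bags in the data preview (rows 0 and 1), hard-coded so it works without the CSV file. `axis=0` collapses the rows and gives one total per color column, while the plain `.sum()` adds up every cell. The name `two_bags` is illustrative only.

```python
import numpy as np
import pandas as pd

# Rows 0 and 1 of the M&M data preview
two_bags = pd.DataFrame(
    {"red": [10, 5], "orange": [15, 12], "yellow": [11, 17],
     "green": [7, 15], "brown": [18, 10], "blue": [10, 9]}
)

color_totals = two_bags.to_numpy().sum(axis=0)   # total per color -> [15, 27, 28, 22, 28, 19]
grand_total = two_bags.to_numpy().sum()          # all M&M's in both bags -> 139
two_bag_distribution = color_totals / grand_total

print(color_totals, grand_total)
print(two_bag_distribution, two_bag_distribution.sum())
```

Dividing the per-color totals by the grand total produces proportions that sum to 1. This property is what makes option 3 in Question 1 useless as a test statistic.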


