Please help me with the following python pandas practice! Will rate good answers! Q1. Suppose you live in New Jersey and

Post by **answerhappygod** » Wed Mar 30, 2022 9:18 am

Please help me with the following python pandas practice! Will
rate good answers!


Q1.
Suppose you live in New Jersey and you only survey players from
the three closest teams:
- New York Knicks (`'NYK'`)
- Brooklyn Nets (`'BRK'`)
- Philadelphia 76ers (`'PHI'`)
Assign `convenience_sample` to a subset of `full_data` that
contains only the rows for players on one of these three teams.
code: convenience_sample = ...
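One way to build the subset is a boolean mask built with `Series.isin`. This is a hedged sketch, not an official solution. It assumes `full_data` has a `'Team'` column, as in the table shown in the notebook. So that it runs without the CSV files, it uses a tiny stand-in DataFrame made from five rows of that table.

```python
import pandas as pd

# Hypothetical stand-in for full_data: five rows copied from the notebook's
# table (only the columns this example needs).
full_data = pd.DataFrame(
    {
        'Salary': [23500000, 23410988, 23180790, 22458401, 21436271],
        'Team': ['LAL', 'TOT', 'BRK', 'NYK', 'HOU'],
        'Points': [782, 680, 1154, 966, 646],
    },
    index=['Kobe Bryant', 'Amare Stoudemire', 'Joe Johnson',
           'Carmelo Anthony', 'Dwight Howard'],
)

nearby_teams = ['NYK', 'BRK', 'PHI']

# Boolean Series: True for players whose 'Team' is one of the three nearby teams.
on_nearby_team = full_data.get('Team').isin(nearby_teams)

# Keep only the rows where the mask is True.
convenience_sample = full_data[on_nearby_team]
print(convenience_sample)
```

You can build an equivalent mask by combining three comparisons with `|`. Each comparison needs its own parentheses: `(full_data.get('Team') == 'NYK') | (full_data.get('Team') == 'BRK') | (full_data.get('Team') == 'PHI')`.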
Q2.
Assign `convenience_stats` to an array of the mean `'Points'`
and mean `'Salary'` of your convenience sample. Since they're
computed on a sample, these are called *sample means*.
*Hint*: It's fine to draw two histograms as well as assign the
variable `convenience_stats`.
code: convenience_stats = ...
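Given the hint, one likely approach is to use the notebook's `compute_statistics` function: `convenience_stats = compute_statistics(convenience_sample)`. That call also draws the two histograms, and passing `draw=False` skips them. The sketch below instead computes the same two sample means directly with pandas, so it runs on its own. The two-row `convenience_sample` is a hypothetical stand-in holding the BRK and NYK players from the notebook's table.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for convenience_sample (two rows from the notebook's table).
convenience_sample = pd.DataFrame(
    {'Team': ['BRK', 'NYK'], 'Points': [1154, 966], 'Salary': [23180790, 22458401]},
    index=['Joe Johnson', 'Carmelo Anthony'],
)

# Same order as compute_statistics returns: mean 'Points' first, then mean 'Salary'.
convenience_stats = np.array([
    convenience_sample.get('Points').mean(),
    convenience_sample.get('Salary').mean(),
])
print(convenience_stats)
```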
Homework 4: Simulation, Sampling, and Hypothesis Testing

```python
# please don't change this cell, but do make sure to run it
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
```

1. Sampling with NBA Data

In this question, we'll use our familiar player and salary data from the 2015-16 NBA season to get some practice with sampling.

Run the cells below to load the player and salary data, which come from different DataFrames, and to merge them into a single DataFrame, indexed by player.

```python
player_data = pd.read_csv("data/player_data.csv").set_index('Name')
salary_data = pd.read_csv("data/salary_data.csv").set_index('PlayerName')
full_data = salary_data.merge(player_data, left_index=True, right_index=True)
full_data
```

| | Salary | Age | Team | Games | Rebounds | Assists | Steals | Blocks | Turnovers | Points |
|---|---|---|---|---|---|---|---|---|---|---|
| Kobe Bryant | 23500000 | 36 | LAL | 35 | 199 | 197 | 47 | 7 | 128 | 782 |
| Amare Stoudemire | 23410988 | 32 | TOT | 59 | 329 | 45 | 29 | 38 | 78 | 680 |
| Joe Johnson | 23180790 | 33 | BRK | 80 | 384 | 292 | 59 | 14 | 137 | 1154 |
| Carmelo Anthony | 22458401 | 30 | NYK | 40 | 264 | 122 | 40 | 17 | 89 | 966 |
| Dwight Howard | 21436271 | 29 | HOU | 41 | 431 | 50 | 28 | 53 | 115 | 646 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sim Bhullar | 29843 | 22 | SAC | 3 | 1 | 1 | 0 | 1 | 0 | 2 |
| David Stockton | 29843 | 23 | SAC | 3 | 2 | 9 | 2 | 0 | 0 | 4 |
| David Wear | 29843 | 24 | SAC | 2 | 2 | 1 | 0 | 0 | | |
| Andre Dawkins | 29843 | 23 | MIA | 4 | 4 | 2 | 2 | 1 | 0 | 1 |
| Vander Blue | 14409 | 22 | LAL | 2 | 9 | 8 | 3 | 0 | 6 | 22 |

492 rows × 10 columns
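The notebook builds `full_data` with `salary_data.merge(player_data, left_index=True, right_index=True)`. That call matches rows by their index labels. Because `merge` defaults to `how='inner'`, it also keeps only players who appear in both DataFrames. Below is a toy sketch of this behavior. It uses two rows from the table plus one hypothetical player, `'Player X'` (made-up values), who has stats but no salary row.

```python
import pandas as pd

# Toy versions of the two source tables (real rows taken from the table).
salary_data = pd.DataFrame(
    {'PlayerName': ['Kobe Bryant', 'Joe Johnson'], 'Salary': [23500000, 23180790]}
).set_index('PlayerName')

# 'Player X' is hypothetical: present here but missing from salary_data.
player_data = pd.DataFrame(
    {'Name': ['Joe Johnson', 'Kobe Bryant', 'Player X'],
     'Team': ['BRK', 'LAL', 'SAC'],
     'Points': [1154, 782, 100]}
).set_index('Name')

# Join on the index labels; the default how='inner' drops unmatched players.
full_data = salary_data.merge(player_data, left_index=True, right_index=True)
print(full_data)
```

Only Kobe Bryant and Joe Johnson survive the merge. Passing `how='outer'` would instead keep unmatched players from either side and fill their missing cells with `NaN`.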
We'll start by creating a function called `compute_statistics` that takes as input a DataFrame with two columns, `'Points'` and `'Salary'`, and then:
- draws a histogram of `'Points'`,
- draws a histogram of `'Salary'`, and
- returns a two-element array containing the mean `'Points'` and mean `'Salary'`.

Run the cell below to define the `compute_statistics` function, and a helper function called `histograms`. Don't worry about how this code works, and please don't change anything.

```python
# Don't change this cell, just run it.
def histograms(df):
    points = df.get('Points').values
    salaries = df.get('Salary').values
    a = plt.figure(1)
    plt.hist(points, density=True, alpha=0.5, color='blue', ec='w', bins=np.arange(0, 2500, 50))
    plt.title('Distribution of Points')
    b = plt.figure(2)
    plt.hist(salaries, density=True, alpha=0.5, color='blue', ec='w', bins=np.arange(0, 3.5 * 10**7, 2.5 * 10**6))
    plt.title('Distribution of Salaries')

def compute_statistics(points_and_salary_data, draw=True):
    if draw:
        histograms(points_and_salary_data)
    points = np.average(points_and_salary_data.get('Points').values)
    salary = np.average(points_and_salary_data.get('Salary').values)
    avg_points_salary_array = np.array([points, salary])
    return avg_points_salary_array
```

We can use this `compute_statistics` function to show the distribution of `'Points'` and `'Salary'` and compute their means for any collection of players. Run the next cell to show these distributions and compute the means for all NBA players. Notice that the array containing the mean `'Points'` and mean `'Salary'` values is displayed before the histograms, and the numbers are given in scientific notation.

```python
full_stats = compute_statistics(full_data)
full_stats
```
Output: `array([5.00071138e+02, 4.26977577e+06])`

[Figure: histogram "Distribution of Points", x-axis from 0 to 2,500]

[Figure: histogram "Distribution of Salaries", x-axis from 0 to 3.0 × 10^7]
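In scientific notation, `5.00071138e+02` means 5.00071138 × 10², or about 500 points. Likewise, `4.26977577e+06` means about $4.27 million. The small sketch below prints the two values more readably using Python format specifiers. The numbers are copied from the rounded display above rather than recomputed from the data.

```python
import numpy as np

# Values copied from the displayed full_stats output (already rounded by NumPy's display).
full_stats = np.array([5.00071138e+02, 4.26977577e+06])

mean_points, mean_salary = full_stats

# ',.2f' adds thousands separators and keeps two decimal places.
points_text = f"Mean points: {mean_points:,.2f}"
salary_text = f"Mean salary: ${mean_salary:,.2f}"
print(points_text)
print(salary_text)
```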
Now, imagine that instead of having access to the full population of NBA players, we had only gotten data on a smaller subset of the players, or a sample. For 492 players, it's not so unreasonable to expect to see all the data, but usually we aren't so lucky. Instead, we often make statistical inferences about a large underlying population using a smaller sample.

A statistical inference is a statement about some characteristic of the underlying population, such as "the average salary of NBA players in 2014 was $3 million". You may have heard the word "inference" used in other contexts. It's important to keep in mind that statistical inferences can be wrong.

A common strategy for inference using samples is to estimate parameters of the population by computing the same statistics on a sample. This strategy sometimes works well and sometimes doesn't. The degree to which it gives us useful answers depends on several factors. One very important factor in the utility of samples is how they were gathered. Let's look at some different sampling strategies.

Convenience sampling

One sampling methodology, which is generally a bad idea, is to choose players who are somehow convenient to sample. For example, you might choose players from a team that's near your house, since it's easier to survey them. This is called, somewhat pejoratively, convenience sampling.
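Here is a toy illustration of how a statistic computed on a convenience sample can land far from the population parameter it is meant to estimate. It rests on two loudly stated assumptions. First, the ten rows visible in the notebook's table are treated as a miniature "population", even though they are only the five highest and five lowest salaries. Second, the convenient team is assumed to be Sacramento (`'SAC'`).

```python
import pandas as pd

# Toy "population" (assumption): the ten salary rows visible in the notebook's table.
population = pd.DataFrame(
    {
        'Team': ['LAL', 'TOT', 'BRK', 'NYK', 'HOU', 'SAC', 'SAC', 'SAC', 'MIA', 'LAL'],
        'Salary': [23500000, 23410988, 23180790, 22458401, 21436271,
                   29843, 29843, 29843, 29843, 14409],
    },
    index=['Kobe Bryant', 'Amare Stoudemire', 'Joe Johnson', 'Carmelo Anthony',
           'Dwight Howard', 'Sim Bhullar', 'David Stockton', 'David Wear',
           'Andre Dawkins', 'Vander Blue'],
)

# Parameter: the mean salary of the whole toy population.
population_mean = population.get('Salary').mean()

# Statistic: the same mean, computed only on players from the "nearby" team.
convenience = population[population.get('Team') == 'SAC']
sample_mean = convenience.get('Salary').mean()

print(f"Population mean salary: {population_mean:,.0f}")
print(f"Convenience-sample mean salary: {sample_mean:,.0f}")
```

In this toy, the convenience sample happens to contain only minimum-salary players, so its mean is several hundred times smaller than the population mean. More generally, nothing guarantees that players on a convenient team resemble the league as a whole.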