Project One: Data Visualization, Descriptive Statistics, Confidence Intervals This notebook contains the step-by-step di


Post by answerhappygod »

Project One: Data Visualization, Descriptive Statistics,
Confidence Intervals
This notebook contains the step-by-step directions for Project
One. It is very important to run through the steps in order. Some
steps depend on the outputs of earlier steps. Once you have
completed the steps in this notebook, be sure to write your summary
report.
You are a data analyst for a basketball team and have access to
a large set of historical data that you can use to analyze
performance patterns. The coach of the team and your management
have requested that you use descriptive statistics and data
visualization techniques to study distributions of key performance
metrics that are included in the data set. These data-driven
analytics will help make key decisions to improve the performance
of the team. You will use the Python programming language to
perform the statistical analyses and then prepare a report of your
findings to present to the team’s management. Since the managers
are not data analysts, you will need to interpret your findings and
describe their practical implications.
There are four important variables in the data set that you will
study in Project One.
| Variable | What does it represent? |
|---|---|
| pts | Points scored by the team in a game |
| elo_n | A measure of the relative skill level of the team in the league |
| year_id | Year when the team played the games |
| fran_id | Name of the NBA team |

The ELO rating, represented by the variable
elo_n, is used as a measure of the relative skill
of a team. This measure is inferred based on the final score of a
game, the game location, and the outcome of the game relative to
the probability of that outcome. The higher the number, the higher
the relative skill of a team.
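
For intuition about "the outcome of the game relative to the probability of that outcome", the sketch below uses the textbook Elo formulas. It is only an illustration and not necessarily how `elo_n` was computed: the 400-point scale is the classic Elo convention, the K-factor of 20 is an assumed value, and the function names are hypothetical. The score margin and game location mentioned above are left out of this sketch.

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Classic Elo win probability for a team against one opponent."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))


def updated_rating(rating: float, opp_rating: float, won: bool, k: float = 20.0) -> float:
    """Move the rating toward the observed result; k=20 is an assumed step size."""
    actual = 1.0 if won else 0.0
    return rating + k * (actual - expected_score(rating, opp_rating))


# Two evenly matched teams: each is expected to win half the time.
even_odds = expected_score(1500, 1500)

# An upset win by the lower-rated team earns more points than an expected win.
gain_underdog = updated_rating(1400, 1600, won=True) - 1400
gain_favorite = updated_rating(1600, 1400, won=True) - 1600
print(even_odds, gain_underdog, gain_favorite)
```

The point to take away is qualitative: a surprising result moves the rating more than an expected one, which is why a higher `elo_n` reflects sustained results against strong opponents.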
In addition to studying data on your own team, your management
has assigned you a second team so that you can compare its
performance with your own team's.
| Team | What does it represent? |
|---|---|
| Your Team | This is the team that has hired you as an analyst. This is the team that you will pick below. See Step 2. |
| Assigned Team | This is the team that the management has assigned to you to compare against your team. See Step 1. |

Reminder: It may be beneficial to review the summary report
template for Project One prior to starting this Python script. That
will give you an idea of the questions you will need to answer with
the outputs of this script.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## Step 1: Data Preparation & the Assigned Team
This step loads the data set from a CSV file. It also selects
the assigned team for this analysis. Do not make any changes to the
code block below.
Click the block of code below and hit the Run
button above.
In [1]:
import numpy as np
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
from IPython.display import display, HTML

# Load the full data set, then keep regular-season NBA games only.
nba_orig_df = pd.read_csv('nbaallelo.csv')
nba_orig_df = nba_orig_df[(nba_orig_df['lg_id']=='NBA') & (nba_orig_df['is_playoffs']==0)]
columns_to_keep = ['game_id','year_id','fran_id','pts','opp_pts','elo_n','opp_elo_n',
                   'game_location', 'game_result']
nba_orig_df = nba_orig_df[columns_to_keep]

# The dataframe for the assigned team is called assigned_team_df.
# The assigned team is the Chicago Bulls from 1996-1998.
assigned_years_league_df = nba_orig_df[(nba_orig_df['year_id'].between(1996, 1998))]
assigned_team_df = assigned_years_league_df[(assigned_years_league_df['fran_id']=='Bulls')]
assigned_team_df = assigned_team_df.reset_index(drop=True)

display(HTML(assigned_team_df.head().to_html()))
print("printed only the first five observations...")
print("Number of rows in the data set =", len(assigned_team_df))
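
The filtering pattern used in Step 1 can be tried on a small table before running it on the full file. The sketch below is an illustration only: `toy_df` and every value in it are made up, and only the column names are taken from the code above.

```python
import pandas as pd

# Toy stand-in for nbaallelo.csv; every value below is made up for illustration.
toy_df = pd.DataFrame({
    "year_id": [1995, 1996, 1997, 1998, 1998, 1999],
    "fran_id": ["Bulls", "Bulls", "Knicks", "Bulls", "Bulls", "Bulls"],
    "lg_id": ["NBA"] * 6,
    "is_playoffs": [0, 0, 0, 0, 1, 0],
    "pts": [98, 105, 91, 110, 87, 95],
})

# Each condition is a boolean Series; & combines them row by row.
# The parentheses are required because & binds tighter than ==.
regular_season = toy_df[(toy_df["lg_id"] == "NBA") & (toy_df["is_playoffs"] == 0)]

# Series.between includes both endpoints by default, so 1996 and 1998 are kept.
in_years = regular_season[regular_season["year_id"].between(1996, 1998)]

# reset_index(drop=True) renumbers the surviving rows from 0.
bulls = in_years[in_years["fran_id"] == "Bulls"].reset_index(drop=True)
print(bulls)
```

Only the 1996 game and the 1998 regular-season game survive all three filters; the 1998 playoff row is dropped by the `is_playoffs` condition.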
## Step 2: Pick Your Team
In this step, you will pick your team. The range of years that you
will study for your team is **2013-2015**.
Make the following edits to the code block below:
1. Replace **??TEAM??** with your choice of team from
one of the following team names.
*Bucks, Bulls, Cavaliers, Celtics,
Clippers, Grizzlies, Hawks, Heat, Jazz, Kings, Knicks, Lakers,
Magic, Mavericks, Nets, Nuggets, Pacers, Pelicans, Pistons,
Raptors, Rockets, Sixers, Spurs, Suns, Thunder, Timberwolves,
Trailblazers, Warriors, Wizards*

Remember to enter the team name within single quotes. For example,
if you picked the Suns, then ??TEAM?? should be replaced with
'Suns'.
After you are done with your edits, click the block of code
below and hit the **Run** button above.
In [2]:
# Range of years: 2013-2015 (Note: The line below selects ALL teams within
# the three-year period 2013-2015. This is not your team's dataframe.)
your_years_leagues_df = nba_orig_df[(nba_orig_df['year_id'].between(2013, 2015))]

# The dataframe for your team is called your_team_df.
# ---- TODO: make your edits here ----
# Replace ??TEAM?? with your team name in single quotes, for example 'Suns'.
your_team_df = your_years_leagues_df[(your_years_leagues_df['fran_id']==??TEAM??)]
your_team_df = your_team_df.reset_index(drop=True)

display(HTML(your_team_df.head().to_html()))
print("printed only the first five observations...")
print("Number of rows in the data set =", len(your_team_df))
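
A misspelled or differently capitalized team name does not raise an error in Step 2; the comparison simply matches no rows and the dataframe comes back empty. The sketch below shows one way to catch that early. The helper `select_team` and the toy rows are hypothetical and are not part of the notebook.

```python
import pandas as pd


def select_team(league_df: pd.DataFrame, team: str) -> pd.DataFrame:
    """Return the rows for one franchise, or raise if the name matches nothing."""
    team_df = league_df[league_df["fran_id"] == team].reset_index(drop=True)
    if team_df.empty:
        valid = sorted(league_df["fran_id"].unique())
        raise ValueError(f"No games found for {team!r}; expected one of {valid}")
    return team_df


# Made-up rows standing in for your_years_leagues_df.
toy_league_df = pd.DataFrame({
    "year_id": [2013, 2014, 2015, 2015],
    "fran_id": ["Suns", "Suns", "Heat", "Suns"],
    "pts": [101, 97, 104, 110],
})

suns_df = select_team(toy_league_df, "Suns")
print("Number of rows in the data set =", len(suns_df))
```

Calling `select_team(toy_league_df, "suns")` with a lowercase name raises a `ValueError` that lists the valid names, instead of silently returning zero rows.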
## Step 3: Data Visualization: Points Scored by Your Team
The coach has requested that you provide a visual that shows the
distribution of points scored by your team in the years 2013-2015.
The code below provides two possible options. Pick **ONE** of these
two plots to include in your summary report. Choose the plot that
you think provides the best visual for the distribution of points
scored by your team. In your summary report, you must explain why
you think your visual is the best choice.
Click the block of code below and hit the **Run** button above.

NOTE: If the plots are not created, click the code section and hit
the **Run** button again.
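In [5]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# load in data
# NOTE: this replaces the your_team_df built in Step 2 with the contents of nba_game.csv.
your_team_df = pd.read_csv("nba_game.csv")

# Scatterplot of points by year with a fitted regression line (ci=None hides the confidence band).
plt.title("Scatterplot of points scored by your team in 2013 to 2015", fontsize=18)
# Pass x and y as keywords; recent seaborn releases do not accept them positionally.
sns.regplot(x=your_team_df['year_id'], y=your_team_df['pts'], ci=None)

plt.show()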
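
Step 3 mentions two plot options, but only a scatterplot appears in the code above. A histogram is a common alternative for showing a distribution, so a minimal sketch is given here; it is an assumption about what the second option looks like, not the notebook's own code. The scores are synthetic, and the bin count of 20 and the axis labels are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic scores for illustration only; in the notebook, use your_team_df['pts'].
rng = np.random.default_rng(seed=0)
pts = rng.normal(loc=100, scale=12, size=240).round()

fig, ax = plt.subplots()
# hist returns the count in each bin, the bin edges, and the drawn bars.
counts, bin_edges, _ = ax.hist(pts, bins=20)
ax.set_title("Histogram of points scored by your team in 2013 to 2015", fontsize=18)
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")
```

If the figure does not render on its own, add `plt.show()` as the last line. A histogram shows the center, spread, and shape of the scores directly, whereas a scatterplot against `year_id` shows how scores vary across the three seasons.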