Please help me with the following Python pandas practice! Will
rate good answers!
Introduction and preparation:
**Question 1.**
You want to determine if free or paid apps have a higher average
rating. Calculate the difference between the mean rating for all
`Paid` apps and the mean rating for all `Free` apps (do `Paid`
minus `Free`) and store the result in variable
`true_difference`.
code: true_difference = ...
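One possible approach, sketched here on a small made-up DataFrame rather than the real `playstore_apps` (the name `toy_apps` and its ratings are illustrative assumptions): filter the rows by `Type`, take the mean of `Rating` in each group, and subtract `Free` from `Paid`.

```python
import pandas as pd

# Hypothetical stand-in for playstore_apps; the ratings are made up.
toy_apps = pd.DataFrame({
    "Rating": [4.0, 5.0, 3.0, 4.0, 5.0],
    "Type": ["Paid", "Paid", "Free", "Free", "Free"],
})

# Mean rating within each group, selected with a boolean mask.
paid_mean = toy_apps[toy_apps["Type"] == "Paid"]["Rating"].mean()
free_mean = toy_apps[toy_apps["Type"] == "Free"]["Rating"].mean()

# Paid minus Free, as the question asks.
toy_difference = paid_mean - free_mean
print(toy_difference)
```

On the real data, the same two filters applied to `playstore_apps` would give `true_difference`.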
**Question 2.**
Create a function that takes as input a DataFrame of apps with
columns `'Rating'` and `'Type'`, and returns the difference between
the mean rating for `Paid` apps and the mean rating for `Free` apps
(again, calculate `Paid` minus `Free`).
When called on input `playstore_apps`, the output should be the
same as `true_difference`. However, this function should work on
*any* DataFrame of apps, provided there are at least some `Paid`
apps and some `Free` apps.
code:
def mean_diff(app_df):
    ...
mean_diff(playstore_apps)
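A minimal sketch of such a function, assuming plain pandas (if the course uses a restricted wrapper library, `groupby` may behave differently there). It groups by `Type`, averages `Rating`, and subtracts. The toy DataFrame is made up for illustration.

```python
import pandas as pd

def mean_diff(app_df):
    """Return mean Rating of Paid apps minus mean Rating of Free apps."""
    # One mean per Type value; the result is a Series indexed by Type.
    means = app_df.groupby("Type")["Rating"].mean()
    return means.loc["Paid"] - means.loc["Free"]

# Hypothetical toy data, not the real playstore_apps.
toy_apps = pd.DataFrame({
    "Rating": [4.5, 3.5, 4.0, 2.0, 5.0, 3.0],
    "Type": ["Paid", "Paid", "Free", "Free", "Free", "Free"],
})

toy_result = mean_diff(toy_apps)
print(toy_result)
```

Because the function only reads its argument, it works on any DataFrame with `'Rating'` and `'Type'` columns, which is what lets it be reused on random samples later.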
**Question 3.**
Let's suppose that as an app developer, you don't know the value
of `true_difference` because it is calculated from all the apps in
the data set, while you can only load 1000 apps at a time. You want
to look at 1000 random apps, sampled without replacement, to get a
representative sample of the full data set. Write a function called
`pick_1000` that simulates this. Specifically, the function should
take *no* arguments and should return a DataFrame of 1000 randomly
selected apps.
code:
def pick_1000():
    """Randomly select 1000 different apps
    from Google Play Store."""
    ...
pick_1000()
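As a rough sketch, sampling without replacement can be done with the pandas `DataFrame.sample` method. The `playstore_apps` built below is a synthetic stand-in (5000 random rows), an assumption made only so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for playstore_apps: 5000 synthetic rows, not real data.
rng = np.random.default_rng(0)
playstore_apps = pd.DataFrame({
    "Rating": rng.uniform(1, 5, size=5000).round(1),
    "Type": rng.choice(["Free", "Paid"], size=5000, p=[0.9, 0.1]),
})

def pick_1000():
    """Randomly select 1000 different apps (rows), without replacement."""
    return playstore_apps.sample(n=1000, replace=False)

sample = pick_1000()
print(sample.shape)
```

With `replace=False` no row can appear twice in the sample, and because no seed is passed to `sample`, each call returns a different set of rows.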
Now, even without access to the full
`playstore_apps` data set, you can get an idea of the difference
between mean ratings of `Paid` and `Free` apps, based on the 1000
in your random sample. The `mean_diff` function you wrote should be
able to calculate the difference in mean ratings for our random
sample.
code: mean_diff(pick_1000())
But what if you'd picked a different random 1000 apps for your
sample? Surely, you'd get a different answer, but how different?
Run the cell above a few times. You should get different results
each time. If not, check for a mistake in your `mean_diff` function
or your `pick_1000` function.
To answer this question of how the mean difference changes as
our sample changes, let's repeat our experiment.
**Question 4.**
500 times, randomly select 1000 apps and calculate the
difference of mean ratings between `Paid` and `Free` apps (do
`Paid` minus `Free`). Record the 500 differences of mean ratings in
an array called `experiment_differences`.
*Hint*: Feel free to use previously defined functions. First try
simulating 10 trials. Once you are sure you have that figured out,
change it to 500 trials. It may take about a minute to run with 500
trials.
code: experiment_differences = ...
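The repeated experiment could be sketched as a loop that appends one difference per trial to a NumPy array. Everything below runs on synthetic data; the stand-in `playstore_apps` and the helper bodies are assumptions for illustration, not the expected solutions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for playstore_apps: 5000 synthetic rows, not real data.
rng = np.random.default_rng(1)
playstore_apps = pd.DataFrame({
    "Rating": rng.uniform(1, 5, size=5000).round(1),
    "Type": rng.choice(["Free", "Paid"], size=5000, p=[0.9, 0.1]),
})

def mean_diff(app_df):
    """Mean Rating of Paid apps minus mean Rating of Free apps."""
    paid = app_df[app_df["Type"] == "Paid"]["Rating"].mean()
    free = app_df[app_df["Type"] == "Free"]["Rating"].mean()
    return paid - free

def pick_1000():
    """Sample 1000 distinct rows without replacement."""
    return playstore_apps.sample(n=1000, replace=False)

n_trials = 500  # start with 10 while debugging, as the hint suggests
experiment_differences = np.array([])
for _ in range(n_trials):
    # One trial: draw a fresh sample and record its Paid-minus-Free difference.
    one_difference = mean_diff(pick_1000())
    experiment_differences = np.append(experiment_differences, one_difference)

print(len(experiment_differences))
```

Keeping the trial count in a single variable makes it easy to switch between 10 and 500 trials.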
**Question 5.**
When you ran your experiment 500 times, you got 500 different
estimates for the difference of mean ratings between `Paid` and
`Free` apps, stored in `experiment_differences`. These estimates
are statistics because they come from samples. Create a density
histogram showing the distribution of these statistics.
code: # Create your histogram here.
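A density histogram scales the bars so that their total area is 1. A minimal sketch with matplotlib follows; the array is filled with made-up normal draws standing in for the real `experiment_differences`, and the bin count of 20 is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for experiment_differences: made-up values, not real results.
rng = np.random.default_rng(2)
experiment_differences = rng.normal(loc=0.1, scale=0.05, size=500)

fig, ax = plt.subplots()
# density=True rescales bar heights so the bar areas sum to 1.
counts, bin_edges, _ = ax.hist(
    experiment_differences, bins=20, density=True, edgecolor="white"
)
ax.set_xlabel("Difference in mean rating (Paid minus Free)")
ax.set_ylabel("Density")
ax.set_title("Distribution of 500 sample differences")

# Sanity check: total area under the histogram.
total_area = np.sum(counts * np.diff(bin_edges))
print(round(total_area, 6))
```

In a notebook the figure renders inline; in a standalone script, a call to `plt.show()` at the end would display it.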
**Question 6.**
Compute the average value of the 500 statistics in
`experiment_differences` and store your average in
`approximate_difference`. This average is an estimate of the
difference in mean ratings for the full data set, which is the
population parameter you're trying to approximate here.
code: approximate_difference = ...
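Averaging the recorded statistics is a single call on the array. The four values below are made up to keep the arithmetic easy to follow; the real array would hold 500 entries.

```python
import numpy as np

# Hypothetical stand-in for experiment_differences (made-up values).
experiment_differences = np.array([0.10, 0.05, 0.15, 0.10])

# The mean of the sample statistics serves as the estimate of the parameter.
approximate_difference = experiment_differences.mean()
print(approximate_difference)
```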
**Question 7.**
Now you have an estimate for the difference in mean ratings
between `Paid` and `Free` apps, but you'd like to know how good
an estimate it is. How far is `approximate_difference`, calculated
from your sample statistics, from `true_difference`, the parameter
calculated from the full `playstore_apps` population? Compute the
absolute difference between the two values and store it in the
variable `error`.
code: error = ...
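The absolute difference can be taken with Python's built-in `abs`, so the result is non-negative whichever value is larger. Both numbers below are placeholders, not values computed from the data set.

```python
# Hypothetical placeholder values; the real ones come from Questions 1 and 6.
true_difference = 0.12
approximate_difference = 0.10

# abs() makes the order of subtraction irrelevant.
error = abs(approximate_difference - true_difference)
print(error)
```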
Google Play Store Apps
In this problem, we will work with the data set of Google Play Store Apps. This time, we'll pretend you're an app developer, looking to draw some insight from this data set to help you make a better app.
code:
# Run this cell to load the data
playstore_apps = pd.read_csv('data/googleplaystore.csv')
playstore_apps
[Output: a DataFrame of 9366 rows x 10 columns: App, Category, Rating, Reviews, Size, Installs, Type, Price, Content Rating, Genres]
As a reminder, each row in the DataFrame is an app. We've cleaned the data a bit since the last time we saw it by removing rows with missing values. Let's set the index of the DataFrame to the app's name in order to be able to interpret what the rows represent more easily.
code:
playstore_apps = playstore_apps.set_index('App')
playstore_apps
[Output: the same DataFrame indexed by App, 9366 rows x 9 columns: Category, Rating, Reviews, Size, Installs, Type, Price, Content Rating, Genres]
Suppose that as an app developer, you want to address the question of whether there is any significant difference in rating between free apps and paid apps. The only columns of data we'll need to answer this question are `'Rating'` and `'Type'`, so let's keep just those by using `.get()` and passing in a list of columns.
code:
playstore_apps = playstore_apps.get(['Rating', 'Type'])
playstore_apps
[Output: 9366 rows x 2 columns; first and last five rows shown]
| App | Rating | Type |
| --- | --- | --- |
| Photo Editor & Candy Camera & Grid & ScrapBook | 4.1 | Free |
| Coloring book moana | 3.9 | Free |
| U Launcher Lite - FREE Live Cool Themes, Hide Apps | 4.7 | Free |
| Sketch - Draw & Paint | 4.5 | Free |
| Pixel Draw - Number Art Coloring Book | 4.3 | Free |
| ... | ... | ... |
| FR Calculator | 4.0 | Free |
| Sya9a Maroc - FR | 4.5 | Free |
| Fr. Mike Schmitz Audio Teachings | 5.0 | Free |
| The SCP Foundation DB fr nnn | 4.5 | Free |
| iHoroscope - 2018 Daily Horoscope & Astrology | 4.5 | Free |
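The two preparation steps can be tried on a tiny made-up DataFrame, assuming plain pandas. The app names and values are hypothetical, chosen only to show what `set_index` and `.get()` with a list of columns return.

```python
import pandas as pd

# Hypothetical toy data with the same kind of columns as the real file.
toy_apps = pd.DataFrame({
    "App": ["App A", "App B", "App C"],
    "Category": ["TOOLS", "FAMILY", "TOOLS"],
    "Rating": [4.1, 3.9, 4.7],
    "Type": ["Free", "Paid", "Free"],
})

# Use the app name as the row label, then keep only the two needed columns.
toy_apps = toy_apps.set_index("App")
toy_apps = toy_apps.get(["Rating", "Type"])
print(toy_apps)

# In plain pandas, .get returns None instead of raising when a label is
# missing, so a wrong-case name such as 'type' fails silently.
missing = toy_apps.get(["Rating", "type"])
print(missing is None)
```

Column labels are case-sensitive, so `'Type'` has to be spelled exactly as it appears in the DataFrame.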