
import numpy as np import pandas as pd # allow output to span multiple output lines in the console pd.set_option('displa

Posted: Fri May 20, 2022 10:31 am
by answerhappygod
import numpy as np
import pandas as pd
# allow output to span multiple output lines in the console
pd.set_option('display.max_columns', 500)
#
# read the data
#
df = pd.read_csv("https://raw.githubusercontent.com/fivet ... -grads.csv")
#
# series
#
###############################################################

# In your answers to all remaining problems you can assume
# that df is defined, and has the columns of the dataframe df
# just defined.
###############################################################
#@ I
# Compute a series containing the values for major 'MICROBIOLOGY'.
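One possible approach to problem I, as a minimal sketch. It runs on a made-up two-row stand-in for df (the numbers are illustrative, not taken from the data set) and assumes 'Major' is an ordinary column rather than the index.

```python
import pandas as pd

# Hypothetical stand-in for df; the real frame has more rows and columns.
df = pd.DataFrame({
    "Major": ["MICROBIOLOGY", "PETROLEUM ENGINEERING"],
    "Total": [15232, 2339],     # made-up head counts
    "Median": [38000, 110000],  # made-up earnings for MICROBIOLOGY
})

# The boolean mask keeps the matching row; iloc[0] turns that
# one-row frame into a Series whose index is the column names.
microbiology = df[df["Major"] == "MICROBIOLOGY"].iloc[0]
print(microbiology)
```

If df is already indexed by major, df.loc['MICROBIOLOGY'] should give the same kind of Series directly.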
#@ J
# Compute the total number of people represented in the
# data set (in other words, sum the values in the 'Total' column).
# Your result should be a single number.
#@ K
# Compute the overall fraction of women
# Your result should be a number between 0.55 and 0.60
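For problem K, the overall fraction is a weighted quantity, so averaging ShareWomen directly would likely give the wrong answer. The sketch below uses only the 'ShareWomen' and 'Total' columns named in the problems, on hypothetical toy rows.

```python
import pandas as pd

# Toy stand-in for df; both rows are hypothetical.
df = pd.DataFrame({
    "Major": ["MAJOR A", "MAJOR B"],
    "Total": [100, 300],
    "ShareWomen": [0.9, 0.1],
})

# Weight each major's share by its head count, then divide by everyone.
overall_fraction = (df["ShareWomen"] * df["Total"]).sum() / df["Total"].sum()

# The unweighted mean treats a 100-person major like a 300-person one.
naive_mean = df["ShareWomen"].mean()

print(overall_fraction, naive_mean)
```

If the frame also carries a raw count of women (an assumption, since no such column is named here), dividing the sum of that column by the sum of 'Total' would be an equivalent route.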
#@ L
# Compute the major with the highest value of ShareWomen
# Your result should be a single string value.
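A minimal sketch for problem L, assuming 'Major' is a regular column. The share values are hypothetical; only the technique (index by major, then take idxmax) is the point.

```python
import pandas as pd

# Toy stand-in for df with made-up shares.
df = pd.DataFrame({
    "Major": ["PETROLEUM ENGINEERING", "ELEMENTARY EDUCATION", "MICROBIOLOGY"],
    "ShareWomen": [0.12, 0.92, 0.55],
})

# idxmax returns the index label of the largest value,
# so indexing by Major makes that label the major's name.
top_major = df.set_index("Major")["ShareWomen"].idxmax()
print(top_major)
```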
#@ M
# Compute the median earnings for all the majors with
# share of women > 90%. Your result should be a series
# that begins like this:
# Major
# MEDICAL ASSISTING SERVICES 42000
# SPECIAL NEEDS EDUCATION 35000
# ELEMENTARY EDUCATION 32000
#@ N
# Compute a series containing the median earnings for all the
# majors with share of women < 15%. Your result should begin
# like this:
# Major
# PETROLEUM ENGINEERING 110000
# MINING AND MINERAL ENGINEERING 75000
# NAVAL ARCHITECTURE AND MARINE ENGINEERING 70000
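Problems M and N are both a boolean filter followed by a column selection. A sketch for N, assuming 'Major' is a regular column; the Median values are copied from the sample outputs quoted in the problems, while the shares are made up.

```python
import pandas as pd

# Toy stand-in for df. ShareWomen values are hypothetical.
df = pd.DataFrame({
    "Major": [
        "PETROLEUM ENGINEERING",
        "ELEMENTARY EDUCATION",
        "MINING AND MINERAL ENGINEERING",
    ],
    "Median": [110000, 32000, 75000],
    "ShareWomen": [0.12, 0.92, 0.10],
})

# Index by Major so the resulting Series is labelled like the sample output.
by_major = df.set_index("Major")

# .loc[row_mask, column] filters rows and picks one column in a single step.
low_share_medians = by_major.loc[by_major["ShareWomen"] < 0.15, "Median"]
print(low_share_medians)
```

Swapping the condition for `by_major["ShareWomen"] > 0.90` would give the problem M variant.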
#@ O
# Compute the ratio of the median of the highest earning major
# to the median of the lowest earning major. Your output should
# be a single number, and it should be close to 5.
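For problem O, the ratio can be read straight off the 'Median' column. A small sketch with toy values; the lowest figure is hypothetical and was chosen only so the arithmetic is easy to follow.

```python
import pandas as pd

# Toy stand-in for df; 22000 is a made-up lowest median.
df = pd.DataFrame({
    "Major": ["PETROLEUM ENGINEERING", "MINING AND MINERAL ENGINEERING", "MAJOR X"],
    "Median": [110000, 75000, 22000],
})

# Highest median divided by lowest median.
ratio = df["Median"].max() / df["Median"].min()
print(ratio)
```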
#@ P
# Compute the top 10 majors by median earnings. For each of these
# majors, your result should contain the median, .25 percentile, and
# .75 percentile earnings. Sort the result by median earnings, largest first.
# Your result should be a data frame that begins like this:
# Median P25th P75th
# Major
# PETROLEUM ENGINEERING 110000 95000 125000
# MINING AND MINERAL ENGINEERING 75000 55000 90000
# METALLURGICAL ENGINEERING 73000 50000 105000
# NAVAL ARCHITECTURE AND MARINE ENGINEERING 70000 43000 80000
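A sketch of one way to get the shape asked for in problem P, assuming 'Major' is a regular column. The four rows are the ones shown in the sample output, deliberately shuffled so the sort has something to do.

```python
import pandas as pd

# Rows taken from the sample output above, in scrambled order.
df = pd.DataFrame({
    "Major": [
        "METALLURGICAL ENGINEERING",
        "PETROLEUM ENGINEERING",
        "NAVAL ARCHITECTURE AND MARINE ENGINEERING",
        "MINING AND MINERAL ENGINEERING",
    ],
    "Median": [73000, 110000, 70000, 75000],
    "P25th": [50000, 95000, 43000, 55000],
    "P75th": [105000, 125000, 80000, 90000],
})

top10 = (
    df.set_index("Major")[["Median", "P25th", "P75th"]]  # keep the three earnings columns
    .sort_values("Median", ascending=False)              # largest median first
    .head(10)                                            # at most ten rows
)
print(top10)
```

For the lowest-earning variant in problem Q, sorting with `ascending=True` before `head(10)` should be enough.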
#@ Q
# Repeat the previous problem, but compute the 10 majors with the
# lowest median earnings, and sort with lowest median salary first.
#@ R
# For each major, compute the fraction of people who have a non-college
# job (in other words, a job not requiring a college degree).
# Your result should contain only the top 10 majors,
# sorted in decreasing order by fraction of people. Your result
# should be a series that begins like this:
# Major
# COSMETOLOGY SERVICES AND CULINARY ARTS 0.702569
# NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES 0.697070
# ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION 0.681314
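Problem R does not say which columns to divide. The sketch below assumes a count column named 'Non_college_jobs' and uses 'Total' as the denominator; both choices are guesses, and the toy rows are hypothetical. Checking the result against the sample output above is the way to confirm the right denominator.

```python
import pandas as pd

# Toy stand-in for df. 'Non_college_jobs' is an assumed column name.
df = pd.DataFrame({
    "Major": ["MAJOR A", "MAJOR B", "MAJOR C"],
    "Total": [1000, 2000, 500],
    "Non_college_jobs": [700, 500, 100],
})

by_major = df.set_index("Major")

# Element-wise division aligns on the Major index.
fraction = by_major["Non_college_jobs"] / by_major["Total"]

# Largest fractions first, keeping at most ten majors.
top10_fraction = fraction.sort_values(ascending=False).head(10)
print(top10_fraction)
```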
#@ S
# For each major category, compute the total number of people
# in that category. Your result should
# be sorted by number of people, in descending order.
# You will need the 'Total' column.
# Your result should be a series that begins like this:
# Major_category
# Business 1302376.0
# Humanities & Liberal Arts 713468.0
#@ T
# For each major category, compute the fraction of people
# associated with that category. Order your output by fraction
# of people, in decreasing order.
# Your result should be a series that begins like this:
# Major_category
# Business 0.192328
# Humanities & Liberal Arts 0.105361
# Education 0.082569
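Problems S and T can share one groupby. A minimal sketch on hypothetical rows: the category names appear in the sample output, but the head counts are made up.

```python
import pandas as pd

# Toy stand-in for df with made-up totals.
df = pd.DataFrame({
    "Major": ["MAJOR A", "MAJOR B", "MAJOR C", "MAJOR D"],
    "Major_category": ["Business", "Education", "Business", "Education"],
    "Total": [300, 100, 300, 300],
})

# Problem S: people per category, largest first.
totals = (
    df.groupby("Major_category")["Total"]
    .sum()
    .sort_values(ascending=False)
)

# Problem T: divide by the grand total to turn counts into fractions.
# Dividing by a positive constant keeps the descending order.
fractions = totals / totals.sum()

print(totals)
print(fractions)
```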