Econometrics using R
baroda.dta
install.packages("wooldridge")
library("wooldridge")
# Q1
data(k401k)
help(k401k)
mean(k401k$prate)
mean(k401k$mrate)
lm_prate <- lm(prate ~ mrate, data = k401k)
summary(lm_prate)
predict(lm_prate, newdata = data.frame(mrate = 3.5))
# Q2
data(hprice1)
lm_hp <- lm(price ~ sqrft + bdrms, data = hprice1)
summary(lm_hp)
# predicted price change for 140 more square feet plus one more bedroom
0.12844*140 + 15.19819
predict(lm_hp, newdata = data.frame(sqrft = 2438, bdrms = 4))
# Q3
data(wage2)
delta.til <- coef(lm(IQ ~ educ, data = wage2))[2]
beta1.til <- coef(lm(lwage ~ educ, data = wage2))[2]
beta1.hat <- coef(lm(lwage ~ educ + IQ, data = wage2))[2]
beta2.hat <- coef(lm(lwage ~ educ + IQ, data = wage2))[3]
# check that beta1.til equals beta1.hat + beta2.hat*delta.til
beta1.til
beta1.hat + beta2.hat*delta.til
# Q4
data(vote1)
lm_vote <- lm(voteA ~ log(expendA) + log(expendB) + prtystrA, data = vote1)
summary(lm_vote)
# theta = beta1 + beta2
# beta1 = theta - beta2
# after substituting, the coefficient on log(expendA) is theta
vote1$expend_diff <- log(vote1$expendB) - log(vote1$expendA)
lm_vote1 <- lm(voteA ~ log(expendA) + expend_diff + prtystrA, data = vote1)
summary(lm_vote1)
# Q5
data(jtrain2)
sum(jtrain2$train)
max(jtrain2$mostrn)
# linear probability model and probit for selection into training
lm_train <- lm(train ~ unem74 + unem75 + age + educ + black + hisp + married, data = jtrain2)
summary(lm_train)
lm_train1 <- glm(train ~ unem74 + unem75 + age + educ + black + hisp + married, family = binomial(link = "probit"), data = jtrain2)
summary(lm_train1)
library(lmtest)
# likelihood ratio test of the probit against the intercept-only model
lrtest(lm_train1)
# effect of training on unemployment in 1978: LPM and probit
lm_train2 <- lm(unem78 ~ train, data = jtrain2)
summary(lm_train2)
lm_train3 <- glm(unem78 ~ train, family = binomial(link = "probit"), data = jtrain2)
summary(lm_train3)
predict(lm_train2, newdata = data.frame(train = 0), type = "response")
predict(lm_train2, newdata = data.frame(train = 1), type = "response")
predict(lm_train3, newdata = data.frame(train = 0), type = "response")
predict(lm_train3, newdata = data.frame(train = 1), type = "response")
# same comparison with the full set of controls
lm_train4 <- lm(unem78 ~ train + unem74 + unem75 + age + educ + black + hisp + married, data = jtrain2)
lm_train5 <- glm(unem78 ~ train + unem74 + unem75 + age + educ + black + hisp + married, family = binomial(link = "probit"), data = jtrain2)
predict4 <- predict(lm_train4, type = "response")
predict5 <- predict(lm_train5, type = "response")
cor(predict4, predict5)
plot(predict4, predict5)
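# average partial effect of train from the probit: flip train for everyone,
# then sign the difference so that it is P(train = 1) - P(train = 0)
jtrain2c <- jtrain2
jtrain2c$train <- 1 - jtrain2c$train
predict6 <- predict(lm_train5, newdata = jtrain2c, type = "response")
PE <- predict5 - predict6
PE[jtrain2$train == 0] <- -PE[jtrain2$train == 0]
mean(PE)
summary(lm_train4)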
Q1. The following chart and accompanying text are reproduced from the “Why Australia: Benchmark Report 2021” prepared by the Australian Trade Commission, Australian Government. “A big spender on research and development: Australia’s annual gross domestic expenditure on research and development (GERD) reached A$34 billion in 2018–19. This places Australia alongside the UK, Singapore and France as one of the highest spenders on research and development (R&D). Australia’s trend in R&D is upwards. GERD rose by around 7% per year from 2000–01 to 2018–19 and it now represents 1.8% of Australian GDP. This creates a pool of skilled researchers who are globally competitive.”
(i) Discuss the purpose of Chart 1 and critically assess how well it meets this purpose.
(ii) Would you recommend that the data be presented in a different way? If not, why not? If so, explain how you would present the data in a better way.
Q2. Use the data “htv” from the “wooldridge” package in R to answer this question. The data set includes information on wages, education, parents’ education, and several other variables for 1,230 working men in 1991.
(i) Estimate the regression model $educ = \beta_0 + \beta_1 motheduc + \beta_2 fatheduc + \beta_3 abil + \beta_4 abil^2 + u$ by OLS and report the results in equation form. How much sample variation in educ is explained? Interpret the coefficient on motheduc.
(ii) Find the value of abil, call it abil*, where educ is minimized, holding other factors fixed. (1 mark)
(iii) Test the null hypothesis that educ is linearly related to abil against the alternative that the relationship is quadratic. (1 mark)
(iv) Test $H_0: \beta_1 = \beta_2$ against a two-sided alternative. (1 mark)
(v) Add the two college tuition variables tuit17 and tuit18 to the regression and determine whether they are jointly statistically significant. (1 mark)
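One possible way to set up parts (i) to (v) in R is sketched below. It is not an official solution. It assumes the wooldridge package is installed as in the code at the top of the post; the names m_educ, m_theta, m_tuit, abil_star and the helper variable pareduc are made up for illustration. Part (ii) uses the turning point of a quadratic, abil* = -beta3/(2*beta4), which is a minimum only when the estimated beta4 is positive. Part (iv) uses the same reparameterization trick as the vote1 code above, here with theta = beta1 - beta2.

```r
# Sketch for Q2 using the htv data from the wooldridge package.
library(wooldridge)
data(htv)

# (i) Education equation, quadratic in ability
m_educ <- lm(educ ~ motheduc + fatheduc + abil + I(abil^2), data = htv)
summary(m_educ)   # R-squared is the share of sample variation in educ explained

# (ii) Turning point of the quadratic: abil* = -beta3 / (2 * beta4)
b <- coef(m_educ)
abil_star <- unname(-b["abil"] / (2 * b["I(abil^2)"]))
abil_star

# (iii) Linear vs quadratic: t test on the abil^2 coefficient
coef(summary(m_educ))["I(abil^2)", ]

# (iv) H0: beta1 = beta2. With theta = beta1 - beta2 the model becomes
# educ = b0 + theta*motheduc + beta2*(motheduc + fatheduc) + ...,
# so the t statistic on motheduc tests theta = 0.
htv$pareduc <- htv$motheduc + htv$fatheduc   # hypothetical helper variable
m_theta <- lm(educ ~ motheduc + pareduc + abil + I(abil^2), data = htv)
coef(summary(m_theta))["motheduc", ]

# (v) Joint significance of tuit17 and tuit18: F test, restricted vs unrestricted
m_tuit <- update(m_educ, . ~ . + tuit17 + tuit18)
anova(m_educ, m_tuit)
```

The anova() call compares the two nested models with an F test; car::linearHypothesis() would be an alternative if the car package is available.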
Q3. This problem focuses on the impact of a computer-assisted learning program (cal) on educational outcomes. In this program, children in grade 4 are offered two hours of shared computer time per week, during which they play games that involve solving math problems whose level of difficulty responds to their ability to solve them. The data file “baroda.dta” contains the data. Use the “read_dta” function from the “haven” package to read the file in R. Observations are at the child level. “cal” indicates whether the child was selected for the cal program. Implementation of the program was intended to be randomised among children in grade 4. The main outcome of interest is whether the intervention resulted in improvement in math test scores. Performance in math was measured using pre_mathnorm before implementation and post_mathnorm after the intervention. The test scores have been normalised to be standardised variables, as indicated by the variable names.
Note of caution: the program is implemented only in grade 4 (grade is measured by the variable “std”).
(i) Discuss the potential sources of selection bias and the direction of the bias for such an education program. (1 mark)
(ii) Using the standardised variables for test scores in math, check whether the randomisation has performed well. (1 mark)
(iii) Estimate the ATE of the program in math. Can we interpret the effect as causal? (1 mark)
(iv) Estimate the effect of the program on whether children improved their math scores relative to what would have been expected given their initial scores. In order to do this, estimate a specification in which the dependent variable is improvement in math scores and in which you control for initial math scores. Why would you want to do this? What can you conclude with respect to the likely effect of the program on math outcomes? (1 mark)
(v) Using a logit regression, estimate the propensity score of program participation based on pre-math scores. Estimate the effect of the program on improving math scores, adjusting for the propensity score of participation. How does your estimate compare to the one obtained in (iv)?
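The regressions implied by parts (iii) to (v) can be written out as follows. This is one possible notation rather than the assignment's own: the symbols $\Delta math_i$, $\tau$, $\gamma$, $\delta_0$, $\delta_1$, $\lambda$ and $\hat p_i$ are assumptions introduced here. Entering the estimated propensity score as a control is only one way of adjusting for it; weighting or matching on $\hat p_i$ are alternatives. In each case the sample would be restricted to grade 4 (std equal to 4), since the program is implemented only there.

```latex
\begin{align*}
\text{(iii)}\quad & post\_mathnorm_i = \alpha + \tau\, cal_i + u_i \\
\text{(iv)}\quad & \Delta math_i = \alpha + \tau\, cal_i + \gamma\, pre\_mathnorm_i + u_i,
  \qquad \Delta math_i = post\_mathnorm_i - pre\_mathnorm_i \\
\text{(v)}\quad & p_i = \Pr(cal_i = 1 \mid pre\_mathnorm_i)
  = \frac{\exp(\delta_0 + \delta_1\, pre\_mathnorm_i)}{1 + \exp(\delta_0 + \delta_1\, pre\_mathnorm_i)}, \\
 & \Delta math_i = \alpha + \tau\, cal_i + \lambda\, \hat p_i + u_i
\end{align*}
```

In (iii), the OLS estimate of $\tau$ is the difference in mean post_mathnorm between selected and non-selected children, which has a causal interpretation only if the randomisation check in (ii) is convincing.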
[Chart 1: World of research and development. Size of circle reflects the relative amount of annual gross domestic expenditure on R&D (GERD) in US$ at current prices and purchasing power parity terms. Indicators, except for Brazil, India and Indonesia, are from the UNESCO Institute for Statistics (UIS), 2020, UIS Statistics; Austrade.]