Longitudinal/Panel Data Analysis
Raymond Duch
University of Oxford
Nuffield College
raymond.duch@nuffield.ox.ac.uk
raymondduch.com/trinity10/paneldata
April 27, 2010
Readings
1. Gelman, Andrew and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
2. Stata 11.0 Manual, Longitudinal/Panel Data.
3. Rabe-Hesketh, Sophia and Anders Skrondal. 2005. Multilevel and Longitudinal Modeling Using Stata. Stata Press.
What is longitudinal panel data?
1. Marriage of regression and time-series analysis
2. A broad cross-section of subjects observed over time
3. Individuals surveyed repeatedly over time (American National Election Study; U.S. Panel Study of Income Dynamics)
4. Statistics compiled over time for a particular geopolitical entity (divorce rates and welfare rates collected annually from U.S. states)
5. Statistics compiled on hospital patients over time
Modeling Panel Data
(Repeated) cross-sectional regression analysis generates the following model:
$$y_{it} = \alpha + \beta x_{it} + \epsilon_{it} \tag{1}$$
$$y_{it} = \alpha + x_{it}' B + \epsilon_{it} \tag{2}$$
1. Heterogeneity or uniqueness of subjects is captured in $\epsilon_{it}$
2. The cross-sectional units (individuals, firms, cities) are represented by $i$
3. Repeated time units are represented by $t$
Varying Intercept Model
$$y_{it} = \alpha_j + \beta x_{it} + \epsilon_{it} \tag{3}$$
[Figure: Y plotted against X, with one fitted line for each of Groups A, B, C and D]
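A small numeric sketch of why the intercepts in equation (3) matter, compared with the single intercept of equation (1). The data are hypothetical and noise-free, and the within-group demeaning used here is the fixed-effects estimator, which is one way to allow for unit-specific intercepts and not the random-intercept estimator used later in these slides.

```python
# Toy panel: three units, four periods each, no noise (all values hypothetical).
# Every unit shares the slope 2.0 but has its own intercept.
true_slope = 2.0
intercepts = {"A": 10.0, "B": 5.0, "C": 0.0}
x_values = {"A": [1, 2, 3, 4], "B": [3, 4, 5, 6], "C": [5, 6, 7, 8]}

panel = [(g, x, intercepts[g] + true_slope * x)
         for g in intercepts for x in x_values[g]]


def mean(values):
    return sum(values) / len(values)


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


# Pooled regression, as in equation (1): one intercept for all units.
pooled = ols_slope([x for _, x, _ in panel], [y for _, _, y in panel])

# Unit-specific intercepts, as in equation (3): demean x and y within each unit,
# which removes the intercepts before the slope is estimated.
x_dm, y_dm = [], []
for g in intercepts:
    xs = [x for gg, x, _ in panel if gg == g]
    ys = [y for gg, _, y in panel if gg == g]
    x_dm += [x - mean(xs) for x in xs]
    y_dm += [y - mean(ys) for y in ys]
within = ols_slope(x_dm, y_dm)

print("pooled slope:", pooled)
print("within slope:", within)
```

Because units with larger x happen to have smaller intercepts in this toy data, the pooled slope is pulled well below 2, while the within-unit slope recovers it.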
Varying Slope Model
$$y_{it} = \alpha + \beta_j x_{it} + \epsilon_{it} \tag{4}$$
[Figure: Y plotted against X, with one fitted line for each of Groups A, B, C and D]
Varying Intercepts and Slopes Model
$$y_{it} = \alpha_j + \beta_j x_{it} + \epsilon_{it} \tag{5}$$
[Figure: Y plotted against X, with one fitted line for each of Groups A, B, C and D]
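As a rough illustration of equation (5), the sketch below fits a separate least-squares line to each group, so that every group gets its own intercept and slope. The group data are hypothetical. This is the "no pooling" extreme; a multilevel model would instead treat the group intercepts and slopes as draws from a common distribution.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) from least squares on one group's data."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope


# Hypothetical groups, each generated from its own line without noise.
groups = {
    "A": ([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0]),   # y = 1 + 2x
    "B": ([0, 1, 2, 3], [4.0, 4.5, 5.0, 5.5]),   # y = 4 + 0.5x
    "C": ([0, 1, 2, 3], [6.0, 5.0, 4.0, 3.0]),   # y = 6 - x
}

fits = {g: fit_line(xs, ys) for g, (xs, ys) in groups.items()}
for g, (a, b) in fits.items():
    print(f"group {g}: intercept = {a:.2f}, slope = {b:.2f}")
```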
Data Preparation in Stata: Australian Smoking Study
1. Data are available at http://www.stat.columbia.edu/~gelman/arm/
2. Variables: newid (identifies each unique respondent), sex (1 = female), parsmk (1 = parents smoke), wave (identifies each of 6 waves), smkreg (whether the respondent is a regular smoker)
. list
+-------------------------------------------+
| newid sex_1_f_ parsmk wave smkreg |
|-------------------------------------------|
1. | 1 1 0 1 0 |
2. | 1 1 0 2 0 |
3. | 1 1 0 4 0 |
4. | 1 1 0 5 0 |
5. | 1 1 0 6 0 |
|-------------------------------------------|
6. | 2 0 0 1 0 |
7. | 2 0 0 2 0 |
8. | 2 0 0 3 0 |
9. | 2 0 0 4 0 |
10. | 2 0 0 5 0 |
|-------------------------------------------|
11. | 2 0 0 6 0 |
12. | 3 1 0 1 0 |
13. | 3 1 0 2 0 |
14. | 3 1 0 3 0 |
15. | 3 1 0 4 0 |
|-------------------------------------------|
16. | 3 1 0 5 0 |
17. | 3 1 0 6 0 |
18. | 4 1 0 1 0 |
19. | 4 1 0 2 0 |
20. | 4 1 0 3 0 |
|-------------------------------------------|
21. | 4 1 0 4 0 |
22. | 4 1 0 5 0 |
23. | 4 1 0 6 0 |
24. | 5 0 0 1 0 |
25. | 5 0 0 2 0 |
|-------------------------------------------|
26. | 5 0 0 3 0 |
27. | 5 0 0 4 0 |
28. | 5 0 0 5 0 |
29. | 5 0 0 6 0 |
30. | 6 0 0 1 0 |
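The listing shows that not every respondent appears in every wave, which is what makes the panel unbalanced. A minimal sketch of that check, using only the waves visible in the listing for respondents 1 to 5 (respondent 6 is cut off after one row and is left out):

```python
# Waves observed for the first five respondents, read off the listing.
observed = {
    1: [1, 2, 4, 5, 6],
    2: [1, 2, 3, 4, 5, 6],
    3: [1, 2, 3, 4, 5, 6],
    4: [1, 2, 3, 4, 5, 6],
    5: [1, 2, 3, 4, 5, 6],
}

all_waves = set(range(1, 7))  # the study has 6 waves

# For each respondent, the waves with no observation.
missing = {rid: sorted(all_waves - set(waves)) for rid, waves in observed.items()}

# A panel is balanced only if every unit is observed in every period.
balanced = all(not gaps for gaps in missing.values())

print(missing)
print("balanced:", balanced)
```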
Smoking by Sex over Panel Waves
[Figure: proportion of smokers in the population by wave (1 to 6), plotted separately for girls and boys]
Modeling the Smoking Longitudinal Data
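$$\Pr(y_{jt} = 1) = \operatorname{logit}^{-1}\big(\beta_0 + \beta_1\,\mathrm{psmoke}_{jt} + \beta_2\,\mathrm{female}_{jt} + \tag{6}$$
$$\beta_3 t + \beta_4\,\mathrm{female}_{jt} \cdot t + \alpha_j + \epsilon_{jt}\big), \quad t = 1, \dots, T_j, \quad j = 1, \dots, n. \tag{7}$$
$$\alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2) \tag{8}$$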
Estimation with Gllamm in Stata
. use "e:\Oxford08\Department08\Trinity_Panel\Data\gelman\smoke_pub.dta", clear
.
. tsset newid wave
panel variable: newid (unbalanced)
time variable: wave, 1 to 6, but with gaps
delta: 1 unit
. gllamm smkreg parsmk wave, i(newid) link(logit) family(binom)
gllamm model
log likelihood = -2074.7563
------------------------------------------------------------------------------
smkreg | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
parsmk | 1.270422 .1998237 6.36 0.000 .8787746 1.662069
wave | .4195264 .0365132 11.49 0.000 .3479619 .4910909
_cons | -7.24026 .2742149 -26.40 0.000 -7.777711 -6.702808
------------------------------------------------------------------------------
Variances and covariances of random effects
------------------------------------------------------------------------------
***level 2 (newid)
var(1): 13.679018 (.88531601)
------------------------------------------------------------------------------
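To read these coefficients on the probability scale, the sketch below plugs the reported estimates into the inverse logit. It assumes the random intercept is centred at zero, so that _cons carries its mean, and it treats var(1) as the random-intercept variance. The chosen covariate values (parents smoke, wave 6) are only an illustration.

```python
import math


def invlogit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Point estimates copied from the gllamm output.
b_cons, b_parsmk, b_wave = -7.24026, 1.270422, 0.4195264
var_alpha = 13.679018  # var(1) at level 2 (newid)


def prob_smoker(parsmk, wave, alpha_j=0.0):
    """Pr(smkreg = 1) for a respondent whose random intercept is alpha_j."""
    return invlogit(b_cons + b_parsmk * parsmk + b_wave * wave + alpha_j)


sd_alpha = math.sqrt(var_alpha)

# Respondent with smoking parents at wave 6: at the centre of the
# random-intercept distribution, and one standard deviation above it.
p_typical = prob_smoker(parsmk=1, wave=6)
p_plus_1sd = prob_smoker(parsmk=1, wave=6, alpha_j=sd_alpha)

print(f"alpha_j = 0:     {p_typical:.3f}")
print(f"alpha_j = +1 SD: {p_plus_1sd:.3f}")
```

The large gap between the two probabilities reflects how much of the variation in smoking the model attributes to stable differences between respondents.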
Estimation with Gllamm in Stata: Incorporating Time Trend
.
. gen male_time=wave*(1-sex_1_f)
. gen female_time=wave*sex_1_f
. gen sex_time=wave*sex_1_f
. gllamm smkreg parsmk wave sex_time, i(newid) link(logit) family(binom)
number of level 1 units = 8730
number of level 2 units = 1760
Condition Number = 17.565231
gllamm model
log likelihood = -2071.4531
------------------------------------------------------------------------------
smkreg | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
parsmk | 1.314832 .2278361 5.77 0.000 .8682812 1.761382
wave | .3598051 .0432529 8.32 0.000 .275031 .4445792
sex_time | .10706 .0424822 2.52 0.012 .0237965 .1903235
_cons | -7.263204 .2767673 -26.24 0.000 -7.805658 -6.72075
------------------------------------------------------------------------------
Variances and covariances of random effects
------------------------------------------------------------------------------
***level 2 (newid)
var(1): 13.797342 (.90193295)
------------------------------------------------------------------------------
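Since sex_time is wave multiplied by the female indicator, the wave coefficient is the time trend for boys and the sum of the wave and sex_time coefficients is the trend for girls. Note that the fitted model includes sex only through this interaction. A short sketch of the arithmetic, using the reported point estimates:

```python
import math

# Point estimates copied from the gllamm output with sex_time.
b_wave, b_sex_time = 0.3598051, 0.10706

# Per-wave change in the log odds of regular smoking.
slope_boys = b_wave                 # sex_time = 0 for boys
slope_girls = b_wave + b_sex_time   # sex_time = wave for girls

# The same trends expressed as odds ratios per additional wave.
or_boys = math.exp(slope_boys)
or_girls = math.exp(slope_girls)

print(f"boys:  log-odds slope {slope_boys:.4f}, odds ratio per wave {or_boys:.3f}")
print(f"girls: log-odds slope {slope_girls:.4f}, odds ratio per wave {or_girls:.3f}")
```

These are conditional (subject-specific) odds ratios, holding the respondent's random intercept fixed.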
Estimation with Gllamm in Stata: Incorporating Time Trend
+--------------------------------------------+
| newid constant reffm1 inter_eb |
|--------------------------------------------|
1. | 1 -7.263204 -1.1592099 -8.422414 |
2. | 1 -7.263204 -1.1592099 -8.422414 |
3. | 1 -7.263204 -1.1592099 -8.422414 |
4. | 1 -7.263204 -1.1592099 -8.422414 |
5. | 1 -7.263204 -1.1592099 -8.422414 |
+--------------------------------------------+
. list newid constant reffm1 inter_eb in 1090/1095
+--------------------------------------------+
| newid constant reffm1 inter_eb |
|--------------------------------------------|
1090. | 202 -7.263204 -.76498347 -8.028188 |
1091. | 203 -7.263204 7.1519595 -.1112444 |
1092. | 203 -7.263204 7.1519595 -.1112444 |
1093. | 203 -7.263204 7.1519595 -.1112444 |
1094. | 203 -7.263204 7.1519595 -.1112444 |
|--------------------------------------------|
1095. | 203 -7.263204 7.1519595 -.1112444 |
+--------------------------------------------+
. list newid constant reffm1 inter_eb in 1160/1165
+--------------------------------------------+
| newid constant reffm1 inter_eb |
|--------------------------------------------|
1160. | 215 -7.263204 6.0917393 -1.171465 |
1161. | 215 -7.263204 6.0917393 -1.171465 |
1162. | 215 -7.263204 6.0917393 -1.171465 |
1163. | 215 -7.263204 6.0917393 -1.171465 |
1164. | 216 -7.263204 -1.4855779 -8.748782 |
|--------------------------------------------|
1165. | 216 -7.263204 -1.4855779 -8.748782 |
+--------------------------------------------+
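In these listings inter_eb is the fixed constant plus the respondent's predicted random effect reffm1. The sketch below reproduces that sum for four of the listed respondents and converts it to a probability. The probability is for all covariates set to zero, which is an extrapolation because wave starts at 1, so it is best read as a way of comparing respondents and not as a prediction.

```python
import math

constant = -7.263204  # _cons from the model with sex_time

# Predicted random intercepts (reffm1) copied from the listings.
reffm1 = {1: -1.1592099, 203: 7.1519595, 215: 6.0917393, 216: -1.4855779}

# Respondent-specific intercept on the logit scale.
inter_eb = {rid: constant + u for rid, u in reffm1.items()}

# The same intercept on the probability scale (all covariates at zero).
baseline = {rid: 1.0 / (1.0 + math.exp(-a)) for rid, a in inter_eb.items()}

for rid in sorted(inter_eb):
    print(f"newid {rid}: intercept {inter_eb[rid]:.4f}, probability {baseline[rid]:.4f}")
```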
Data Preparation in Stata: Growth Curve Modeling
1. Data are available with the following command: net from http://www.stata-press.com/data/mlmus2/
2. Variables: id (child identifier), weight (weight in kg), age (age in years), gender (1 = male; 2 = female)
. list
+-------------------------------------------------+
| id occ age weight brthwt gender |
|-------------------------------------------------|
1. | 45 1 .136893 5.171 4140 boy |
2. | 45 2 .657084 10.86 4140 boy |
3. | 45 3 1.21834 13.15 4140 boy |
4. | 45 4 1.42916 13.2 4140 boy |
5. | 45 5 2.27242 15.88 4140 boy |
|-------------------------------------------------|
6. | 258 1 .19165 5.3 3155 girl |
7. | 258 2 .687201 9.74 3155 girl |
8. | 258 3 1.12799 9.98 3155 girl |
9. | 258 4 2.30527 11.34 3155 girl |
10. | 287 1 .134155 4.82 3850 boy |
|-------------------------------------------------|
11. | 287 2 .70089 9.09 3850 boy |
12. | 287 3 1.16906 11.1 3850 boy |
13. | 287 4 2.2423 16.8 3850 boy |
14. | 483 1 .747433 5.76 2875 girl |
15. | 483 2 1.01848 6.92 2875 girl |
|-------------------------------------------------|
16. | 483 3 2.24504 9.53 2875 girl |
17. | 725 1 .120465 4.4 3280 girl |
18. | 725 2 2.30527 12.25 3280 girl |
19. | 800 1 1.12252 10.89 3900 boy |
20. | 800 2 2.26146 12.7 3900 boy |
Observed growth trajectories for boys and girls
[Figure: weight in kg against age in years, one panel for boys and one for girls]
Modeling the Growth Trajectory Data
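$$y_{jt} = \beta_0 + \beta_1\,\mathrm{age}_{jt} + \beta_2\,\mathrm{age}_{jt}^2 + \alpha_j + \epsilon_{jt}, \tag{9}$$
$$t = 1, \dots, T_j, \quad j = 1, \dots, n. \tag{10}$$
$$\alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2) \tag{11}$$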
Estimation with xtmixed in Stata
. gen age2=age^2
. xtmixed weight age age2 || id:, mle
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -276.83266
Iteration 1: log likelihood = -276.83266
Computing standard errors:
Mixed-effects ML regression Number of obs = 198
Group variable: id Number of groups = 68
Obs per group: min = 1
avg = 2.9
max = 5
Wald chi2(2) = 2623.63
Log likelihood = -276.83266 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
weight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | 7.817918 .2896529 26.99 0.000 7.250209 8.385627
age2 | -1.705599 .1085984 -15.71 0.000 -1.918448 -1.49275
_cons | 3.432859 .1810702 18.96 0.000 3.077968 3.78775
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
sd(_cons) | .9182256 .0973788 .7458965 1.130369
-----------------------------+------------------------------------------------
sd(Residual) | .7347063 .0452564 .6511507 .8289837
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 78.07 Prob >= chibar2 = 0.0000
.
end of do-file
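A short sketch of what the xtmixed estimates imply, using the reported point estimates. It computes the fitted mean weight at two ages, the age at which the fitted quadratic stops rising, and the share of residual variance attributed to differences between children (the intraclass correlation, a summary that is not reported in the output itself).

```python
# Fixed-effect estimates copied from the xtmixed output.
b_cons, b_age, b_age2 = 3.432859, 7.817918, -1.705599

# Random-effects standard deviations copied from the output.
sd_cons, sd_resid = 0.9182256, 0.7347063


def mean_weight(age):
    """Fitted mean weight in kg at a given age in years (alpha_j = 0)."""
    return b_cons + b_age * age + b_age2 * age ** 2


# Age at which the fitted quadratic peaks: derivative b_age + 2*b_age2*age = 0.
peak_age = -b_age / (2 * b_age2)

# Intraclass correlation: between-child variance over total residual variance.
icc = sd_cons ** 2 / (sd_cons ** 2 + sd_resid ** 2)

print(f"mean weight at age 1: {mean_weight(1):.2f} kg")
print(f"mean weight at age 2: {mean_weight(2):.2f} kg")
print(f"fitted curve peaks at age {peak_age:.2f}")
print(f"intraclass correlation: {icc:.3f}")
```

The peak falls near the oldest ages in the data listing, so the quadratic should not be extrapolated beyond the observed age range.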
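Incorporating Gender Differences into the Growth Model
$$y_{jt} = \beta_0 + \beta_1\,\mathrm{age}_{jt} + \beta_2\,\mathrm{age}_{jt}^2 + \beta_3\,\mathrm{girl}_{jt} + \beta_4\,\mathrm{girl}_{jt} \cdot \mathrm{age}_{jt} \tag{12}$$
$$+\ \alpha_j + \epsilon_{jt}, \quad t = 1, \dots, T_j, \quad j = 1, \dots, n. \tag{13}$$
$$\alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2) \tag{14}$$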
Estimation with xtmixed in Stata
. xtmixed weight age age2 girl age_girl || id:, mle
Iteration 1: log likelihood = -270.7967
Mixed-effects ML regression Number of obs = 198
Group variable: id Number of groups = 68
Obs per group: min = 1
avg = 2.9
max = 5
Wald chi2(4) = 2705.20
Log likelihood = -270.7967 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
weight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | 7.932362 .2935717 27.02 0.000 7.356973 8.507752
age2 | -1.70546 .1069802 -15.94 0.000 -1.915138 -1.495783
girl | -.4889737 .2752022 -1.78 0.076 -1.02836 .0504127
age_girl | -.2289743 .1377625 -1.66 0.096 -.4989839 .0410353
_cons | 3.676974 .2212291 16.62 0.000 3.243373 4.110575
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
sd(_cons) | .8470338 .0921964 .6843065 1.048457
-----------------------------+------------------------------------------------
sd(Residual) | .7261711 .0446575 .6437132 .8191916
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 69.16 Prob >= chibar2 = 0.0000
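The two growth models are nested and both were fitted by maximum likelihood to the same 198 observations, so their log likelihoods can be compared with a likelihood-ratio test. This test is not reported in the slides; the sketch below is one way to compute it from the two printed log likelihoods, along with the fitted girl-boy gap in mean weight.

```python
import math

# Log likelihoods copied from the two xtmixed outputs.
ll_base = -276.83266    # weight on age, age2
ll_gender = -270.7967   # adds girl and age_girl

# Likelihood-ratio statistic; the larger model has 2 extra fixed effects.
lr_stat = 2 * (ll_gender - ll_base)

# For a chi-square with 2 degrees of freedom the upper tail is exp(-x / 2).
p_value = math.exp(-lr_stat / 2)

# Gender coefficients copied from the second output.
b_girl, b_age_girl = -0.4889737, -0.2289743


def girl_minus_boy(age):
    """Fitted difference in mean weight (kg), girls minus boys, at a given age."""
    return b_girl + b_age_girl * age


print(f"LR chi2(2) = {lr_stat:.2f}, p = {p_value:.4f}")
print(f"girl minus boy at age 2: {girl_minus_boy(2):.2f} kg")
```

Neither gender coefficient is individually significant at the 5% level in the output, but the joint test suggests that the two terms together improve the fit.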
Also: D-in-D (Difference-in-Differences) Model
With two periods and strict exogeneity,
$$y_{it} = \beta_0 + \beta_1 D_{i2} + \beta_2 T_t + \beta_3 T_t D_{i2} + \epsilon_{it} \tag{15}$$
1. $D_{i2}$ = dummy variable for a treatment that takes place between time 1 and time 2 for some of the individuals
2. $T_t$ = a time dummy variable, 0 in period 1, 1 in period 2
3. This is a classic regression model. If there are no other regressors, least squares gives
$$\hat{\beta}_3 = (\bar{y}_2 - \bar{y}_1)_{D=1} - (\bar{y}_2 - \bar{y}_1)_{D=0} \tag{16}$$
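A worked example of equation (16) on made-up data. With only the two dummies and their interaction, the regression is saturated, so each coefficient in equation (15) can be read directly off the four cell means. All outcome values below are hypothetical.

```python
# Hypothetical outcomes keyed by (treated, period).
y = {
    (0, 1): [10.0, 12.0, 11.0],   # control group, period 1
    (0, 2): [12.0, 14.0, 13.0],   # control group, period 2
    (1, 1): [9.0, 11.0, 10.0],    # treated group, period 1
    (1, 2): [15.0, 17.0, 16.0],   # treated group, period 2
}

# Cell means for each group-period combination.
m = {cell: sum(values) / len(values) for cell, values in y.items()}

# Equation (16): change for the treated minus change for the controls.
did = (m[(1, 2)] - m[(1, 1)]) - (m[(0, 2)] - m[(0, 1)])

# Coefficients of the saturated regression in equation (15).
b0 = m[(0, 1)]         # control mean in period 1
b1 = m[(1, 1)] - b0    # pre-treatment gap between groups
b2 = m[(0, 2)] - b0    # common time change
b3 = did               # treatment effect

print("cell means:", m)
print("b0, b1, b2, b3 =", b0, b1, b2, b3)
```

The coefficients reproduce the treated group's period 2 mean as b0 + b1 + b2 + b3, which is a quick check that the decomposition is consistent.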
Readings for Week 2
Gelman, Andrew and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Chapters 13 and 14.
Stata 11.0 Manual, Longitudinal/Panel Data: xtmixed, xtreg, xtregar.
Rabe-Hesketh, Sophia and Anders Skrondal. 2005. Multilevel and Longitudinal Modeling Using Stata. Stata Press, Chapters 3 and 4.
Halaby, Charles. 2004. "Panel Models in Sociological Research: Theory and Practice." Annual Review of Sociology 30: 507-44.
Wooldridge, J. M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press (especially chapters 13 and 14).