{smcl}
{* 3 August 2012}
{hline}
help for {it:{hi:tsls}}
{hline}

{title: Fast and Small 2SLS with FE, IV and Clustered SE}

{title:Description}

{pstd}
{cmdab:tsls} {depvar} [{indepvars}] [{cmd:(}{it:varlist2} {cmd:=} {it:varlist_iv}{cmd:)}]{cmd:,} {cmdab:fe(panelid)} [{cmdab:a:reg} {cmdab:c:luster(clusterid)} {cmdab:d:emean} {cmdab:r:eplace}]
{p_end}

{pstd}
{cmd:tsls} performs two-stage least squares with fixed effects, instrumental variables and clustered standard errors. It does not cover all the capabilities of {cmd:xtivreg2} or {cmd:ivregress}, but it is memory efficient and many times faster, and the speed-up does not change the coefficients or standard errors. It is intended for datasets with hundreds of millions of observations and hundreds of variables, and for users willing to take some care in preparing their data.
{p_end}

{title:Options}

{pstd}
{opt areg} Use {cmd:areg} instead of {cmd:regress} for the second-stage regression, absorbing {it:panelid} with means calculated on the fly. {opt fe(panelid)} must also be specified. This option cannot be combined with {opt demean} or {opt replace}, and makes them unnecessary. See notes below. Standard errors are corrected to match {cmd:xtivreg}.
{p_end}

{pstd}
{opt cluster(clusterid)} Cluster standard errors by {it:clusterid}, which may differ from {it:panelid}.
{p_end}

{pstd}
{opt demean} Demean the variables by {it:panelid} before running the regression. It cannot be combined with, and is not needed with, {opt areg}. If {opt replace} is also specified, the demeaning is done in place and the original data are overwritten, which reduces the memory load. If you run several regressions with overlapping variables, it is efficient to include all of the variables in an initial regression with {opt demean} and {opt replace}, and then to run the subsequent regressions with only {opt fe(panelid)}. The first regression drops rows with missing data, so the subsequent regressions use the same subsample.
Note that if a subsequent regression includes a variable that was not demeaned, there is no error message, but the results will be wrong.
{p_end}

{pstd}
{opt fe(panelid)} Required. Specifies the variable identifying panel units. If neither {opt demean} nor {opt areg} is specified, the data are assumed to have been demeaned already and {it:panelid} affects only the degrees of freedom.
{p_end}

{pstd}
{opt replace} Used with {opt demean}; replaces each variable listed in the regression with its deviation from its panel-unit mean.
{p_end}

{title:Examples}

{pstd}
Fixed effects and clustered standard errors when storage is constrained. This does not alter the data.
{p_end}
{phang2}{cmd:. tsls y1 y2, areg fe(panelid) cluster(clusterid)}{p_end}

{pstd}
Fixed effects and an instrumental variable. The original data are overwritten, so {cmd:preserve} them first.
{p_end}
{phang2}{cmd:. preserve}{p_end}
{phang2}{cmd:. tsls y1 (y2 = z1), demean fe(panelid) replace}{p_end}

{pstd}
Add clustered standard errors, using the previously demeaned data.
{p_end}
{phang2}{cmd:. tsls y1 (y2 = z1), fe(panelid) cluster(clusterid)}{p_end}

{pstd}
Drop the instrument, still using the demeaned data.
{p_end}
{phang2}{cmd:. tsls y1 y2, fe(panelid)}{p_end}

{pstd}
Restore the original data and check the IV result against {cmd:xtivreg2}.
{p_end}
{phang2}{cmd:. restore}{p_end}
{phang2}{cmd:. xtset panelid}{p_end}
{phang2}{cmd:. xtivreg2 y1 (y2 = z1), fe cluster(clusterid)}{p_end}

{title:Notes}

{pstd}
If any regression that expects demeaned data refers to variables that have not been demeaned, the results will be incorrect. This is why the commands in the examples are run in that order.
{p_end}

{pstd}
Variables listed in {it:varlist2} and {it:varlist_iv} must not overlap with any variables listed in {it:indepvars}.
{p_end}
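
{pstd}
The {opt demean} option rests on the standard within (fixed-effects) transformation: each variable is replaced by its deviation from its panel-unit mean. The lines below are a minimal sketch of that transformation in plain Stata. They assume a small made-up dataset with six observations in two panels, and the names {it:y1} and {it:panelid} follow the examples above. The sketch illustrates the technique only. It does not necessarily reflect how {cmd:tsls} implements {opt demean} internally, and {cmd:egen} will be far slower on very large datasets.
{p_end}
{phang2}{cmd:. * illustrative sketch only, on made-up data; not the tsls internals}{p_end}
{phang2}{cmd:. clear}{p_end}
{phang2}{cmd:. set obs 6}{p_end}
{phang2}{cmd:. generate panelid = ceil(_n/3)}{p_end}
{phang2}{cmd:. generate double y1 = _n^2}{p_end}
{phang2}{cmd:. * panel-unit mean of y1, then overwrite y1 with its deviation}{p_end}
{phang2}{cmd:. bysort panelid: egen double y1_mean = mean(y1)}{p_end}
{phang2}{cmd:. replace y1 = y1 - y1_mean}{p_end}
{phang2}{cmd:. drop y1_mean}{p_end}

{pstd}
Afterwards {it:y1} has a mean of zero within each panel. Regressions run on previously demeaned data with only {opt fe(panelid)} assume the data are already in this form.
{p_end}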