Reading a compressed .dta file through a Unix pipe

The Stata knowledge base includes a note on reading ASCII data from a pipe, which lets you process a file without storing the decompressed version on disk. The code shown there may or may not work, depending on your shell and the setting of its "noclobber" variable. It also stopped working in Stata 12 and began working again in Stata 16, or perhaps a little before that. I have found that the following example works for me, where pipetest.dta.gz is a gzipped .dta file:

. ! /bin/rm mypipe.dta
. ! mknod mypipe.dta p
. ! zcat pipetest.dta.gz >> mypipe.dta &
. use mypipe

The rm command removes any pipe left over from an earlier run, and the mknod command creates a new one. The decompressed file is written into the pipe. Because nothing is reading from the other end of the pipe, nothing happens yet: the write blocks. The decompression program is placed in the background so that execution can return to Stata. The Stata -use- statement reads from the pipe, and that wakes up the decompression program. Note the use of ">>" rather than ">" in the zcat command. That is the only actual difference between this solution and Stata's, but it helps for shells with "noclobber" set, which can refuse to redirect ">" to a file that already exists, as the pipe does here.
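The same mechanics can be tried outside Stata. The following is only a sketch under stated assumptions: sample.txt and mypipe are hypothetical names, mkfifo stands in for "mknod ... p", gzip -dc stands in for zcat (which behaves differently on some systems), and cat plays the part of the Stata -use- statement.

```shell
#!/bin/sh
# Sketch of the named-pipe mechanics outside Stata.
# All file names here are hypothetical, chosen for illustration.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Build a small compressed file to stand in for pipetest.dta.gz.
printf 'hello from the pipe\n' > sample.txt
gzip sample.txt                  # leaves sample.txt.gz

rm -f mypipe                     # -f: no error if there is no old pipe
mkfifo mypipe                    # equivalent to "mknod mypipe p"

# The writer blocks until something opens the pipe for reading,
# so it must run in the background. ">>" is used as in the note above.
gzip -dc sample.txt.gz >> mypipe &

# Opening the pipe for reading wakes the writer; cat is the reader here.
result=$(cat mypipe)
wait                             # let the background writer finish
echo "$result"

# Clean up the scratch directory.
cd /
rm -rf "$workdir"
```

No decompressed copy of sample.txt is ever written to disk; the data exists only in transit through the pipe.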

In tests, this is a bit slower than reading an already decompressed file, but much faster than decompressing to disk and then reading the result. Even in Stata 16 you can still create files in version 11 format with:

. saveold filename, version(11)

Note that the Stata commands -zipfile- and -unzipfile- do not use pipes. They create a temporary file, so they save no I/O or CPU time compared with using shell commands to decompress the file before reading it.


Last updated June 17 2021 by drf