/* -egen- is convenient but can be slow for very large files, or if repeated often. In this demonstration there are 10 million records in 2 million groups and we wish to place the maximum of variable x for each group in all the records for that group. The variable is random and sometimes missing. We can substitute 2 statements for one -egen- command and speed processing by a factor of 20. The method is quite general, it works for minimum, sum, etc with slight and obvious modification. We use by id: replace x = max(x[_n-1],x) to get the maximum within the group into the last member of the group. Then we use by id: gen groupmax = x[_N] to copy the last (and greatest) value to all the other records as the new variable "groupmax". The speed increase in the test run is a factor of twenty over the equivalent -egen- command. The final -assert- statement shows that the results are equivalent. */ log using egen,text replace set obs 10000000 set rmsg on gen id = 5*int(_n/5-.1) sort id gen x = runiform() replace x = . if runiform()>.95 by id: replace x = max(x[_n-1],x) by id: gen groupmax = x[_N] by id: egen egenmax = max(x) assert egenmax == groupmax