How NBER determines which papers or publications were downloaded together

Alex Aminoff, NBER web programmer
April 2017

A particular client IP address downloads one or more papers or publications (henceforth, WPs) in a session. A session ends (and a new one potentially begins) when a given client IP address does not download any more WPs for 4 hours.
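
The session rule above can be sketched in a few lines of Python. This is only an illustration, not NBER's actual code: the record layout `(ip, timestamp, wp_id)`, the function name `split_sessions`, treating a gap of exactly 4 hours as ending the session, and collapsing repeat downloads of the same WP are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(hours=4)  # idle time that ends a session (from the text)

def split_sessions(downloads):
    """Group (ip, timestamp, wp_id) records into sessions.

    Returns a list of sets of WP ids, one set per session.
    Repeat downloads of a WP within a session are collapsed (an assumption).
    """
    by_ip = defaultdict(list)
    for ip, ts, wp in downloads:
        by_ip[ip].append((ts, wp))

    sessions = []
    for events in by_ip.values():
        events.sort(key=lambda e: e[0])  # oldest download first
        current = set()
        last_ts = None
        for ts, wp in events:
            # A gap of 4 hours or more closes the current session.
            if last_ts is not None and ts - last_ts >= SESSION_GAP:
                sessions.append(current)
                current = set()
            current.add(wp)
            last_ts = ts
        sessions.append(current)
    return sessions
```

For example, an IP address that downloads two WPs an hour apart and a third WP five hours after that would yield two sessions: one with two WPs and one with a single WP.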

For every pair of WPs downloaded in the same session, the pair's affinity score is incremented by the reciprocal of the number of WPs downloaded in that session. This has the nice side effect that bots, which download hundreds or thousands of WPs in a session, contribute minimally to the score.
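
A minimal sketch of the reciprocal increment follows. The names `add_session` and `affinity`, and the choice to key scores by an alphabetically ordered pair of WP ids, are hypothetical; the recency weighting and new-paper discount described further down are left out here.

```python
from collections import defaultdict
from itertools import combinations

def add_session(affinity, session_wps):
    """Add one session's contribution to the pairwise affinity scores."""
    wps = sorted(set(session_wps))  # sorted so each pair has one canonical key
    n = len(wps)
    if n < 2:
        return  # a single download creates no pairs
    increment = 1.0 / n  # reciprocal of the number of WPs in the session
    for a, b in combinations(wps, 2):
        affinity[(a, b)] += increment

affinity = defaultdict(float)
add_session(affinity, ["w1", "w2"])                      # small session: 1/2 per pair
add_session(affinity, [f"w{i}" for i in range(1, 201)])  # bot-sized session: 1/200 per pair
```

In this example the two-paper session contributes 0.5 to the pair (w1, w2), while the 200-paper session contributes only 0.005 to each of its pairs.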

We would like to weight recent sessions more than older ones, so we weight each affinity increment by the time elapsed since Jan 1 2000. The download data we have analyzed only reaches back to 2010, so the recency weighting is relatively mild.
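
To see why this weighting is mild, here is a small sketch. It assumes the elapsed time is measured in days and multiplies the reciprocal increment directly; the text does not state the unit or the exact form, and the function names are made up for the example.

```python
from datetime import datetime

EPOCH = datetime(2000, 1, 1)  # reference date given in the text

def recency_weight(download_time):
    """Days elapsed since EPOCH (the unit is an assumption)."""
    return (download_time - EPOCH).total_seconds() / 86400.0

def weighted_increment(n_wps, download_time):
    """Reciprocal-of-session-size increment scaled by the recency weight."""
    return recency_weight(download_time) / n_wps

old = recency_weight(datetime(2010, 1, 1))
new = recency_weight(datetime(2017, 4, 1))
ratio = new / old  # how much more a 2017 session counts than a 2010 one
```

Under these assumptions a session from April 2017 counts roughly 1.7 times as much as an otherwise identical session from January 2010.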

WPs that come out at the same time are often downloaded together in the first few weeks after the date of issue, even if they are unrelated. This is because some users may click through to several unrelated papers from the New This Week email or from the New This Week page on the web site. To avoid this artificial affinity, we discount the affinity score of a pair of WPs if the age of the younger WP at the time of download is less than 45 days. The discount is a function of the age, dropping to 0 at 8 days.
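
The text gives the two endpoints (45 days and 8 days) but not the shape of the discount function. The sketch below shows one possible reading, purely as an assumption: a multiplier on the affinity increment that is 1 at 45 days or older, 0 at 8 days or younger, and linear in between. The function name and the linear shape are illustrative, not taken from NBER's implementation.

```python
FULL_WEIGHT_AGE_DAYS = 45.0  # at or above this age, no discount (from the text)
ZERO_WEIGHT_AGE_DAYS = 8.0   # at or below this age, multiplier is 0 (one reading of the text)

def new_paper_multiplier(younger_age_days):
    """Multiplier applied to a pair's increment, based on the younger WP's age.

    The linear ramp between the two thresholds is an assumption.
    """
    if younger_age_days >= FULL_WEIGHT_AGE_DAYS:
        return 1.0
    if younger_age_days <= ZERO_WEIGHT_AGE_DAYS:
        return 0.0
    span = FULL_WEIGHT_AGE_DAYS - ZERO_WEIGHT_AGE_DAYS
    return (younger_age_days - ZERO_WEIGHT_AGE_DAYS) / span
```

With this reading, a pair whose younger WP was issued a week before the download contributes nothing, and the contribution grows steadily until the WP is 45 days old.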

This algorithm appears to produce reasonable results, as judged by NBER staff and an informal sample of economists.

If you need more details about this algorithm, you are welcome to contact NBER's IT department.