Improving Website Hyperlink Structure Using Server Logs

Ashwin Paranjape*, Robert West*, Jure Leskovec, Leila Zia
9th ACM International Conference on Web Search and Data Mining (WSDM), 2016

Good websites should be easy to navigate via hyperlinks, yet maintaining a link structure of high quality is difficult. Identifying pairs of pages that should be linked may be hard for human editors, especially if the site is large and changes are frequent. Further, given a set of useful link candidates, the task of incorporating them into the site can be expensive, since it typically involves humans editing pages. In light of these challenges, it is desirable to develop data-driven methods for partly automating the link placement task. Here we develop an approach for automatically finding useful hyperlinks to add to a website. We show that passively collected server logs, beyond telling us which existing links are useful, also contain implicit signals indicating which nonexistent links would be useful if they were to be introduced. We leverage these signals to model the future usefulness of as yet nonexistent links. Based on our model, we define the problem of link placement under budget constraints and propose an efficient algorithm for solving it. We demonstrate the effectiveness of our approach by evaluating it on Wikipedia, a large website for which we have access to both server logs (used for finding useful new links) and the complete revision history (used as ground truth). As our method is based exclusively on standard server logs, it may also be applied to any other website, as we show using the example of the biomedical research site Simtk.
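To make the idea of mining logs for missing links concrete, the following is a minimal illustrative sketch in Python. It is not the model or algorithm of the paper. It assumes that the logs have already been grouped into per-session page sequences, it uses a deliberately simple proxy for usefulness (the number of sessions in which a user reached a target page indirectly from a source page with no direct link), and it spends the budget by keeping the highest-scoring pairs. The names `candidate_link_counts`, `pick_links`, `sessions`, and `existing_links` are hypothetical and chosen only for this example.

```python
from collections import Counter


def candidate_link_counts(sessions, existing_links):
    """Count sessions in which `target` was reached indirectly from `source`.

    sessions: iterable of page-name lists, one list per navigation session
              (assumed to be pre-extracted from the server logs).
    existing_links: set of (source, target) pairs already present on the site.
    """
    counts = Counter()
    for path in sessions:
        pairs_in_session = set()  # count each pair at most once per session
        for i, source in enumerate(path):
            # Start two steps ahead: the immediate next page is a direct hop.
            for target in path[i + 2:]:
                pair = (source, target)
                if source != target and pair not in existing_links:
                    pairs_in_session.add(pair)
        counts.update(pairs_in_session)
    return counts


def pick_links(counts, budget):
    """Keep the `budget` highest-count candidate pairs (ties broken by name)."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in ranked[:budget]]


# Toy data, invented purely for illustration.
sessions = [
    ["A", "B", "C"],
    ["A", "D", "C"],
    ["B", "C", "D"],
]
existing_links = {("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "D")}

counts = candidate_link_counts(sessions, existing_links)
suggested = pick_links(counts, budget=1)  # [("A", "C")]
```

In this toy data, two sessions travel from A to C through an intermediate page, so the pair (A, C) is ranked first. A simple top-k ranking of this kind scores every candidate independently; it is only a baseline for intuition and makes no attempt to model how a new link's usefulness would depend on other links added to the same page.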