Computer Systems Laboratory Colloquium

11:00AM, Wednesday, June 24, 1998
Terman 156

Datamining the Web to create a Navigation service: Alexa

Brewster Kahle
President, Alexa Internet and Internet Archive

Named for the Library of Alexandria, Alexa is a free Internet navigation service that learns from people. To do this, we use a full archive of the public web and aggregated usage paths of many people. For any web page, Alexa suggests other pages that one might want to see. Going beyond keyword search, we use the paths that other users have taken to find "the good stuff". If we develop a system that can leverage what millions have thought, then we will have built something new and useful.
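The announcement does not say how Alexa turns aggregated usage paths into suggestions. As a rough illustration only, the sketch below assumes one simple approach: count how often two pages are visited close together in the same trail ("co-visitation") and suggest a page's most frequent partners. The function names `build_covisits` and `suggest`, the `window` parameter, and the sample trails are hypothetical and are not Alexa's actual method.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_covisits(trails: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """Count how often two pages appear within `window` steps of each other.

    Hypothetical co-visitation model: each trail is one user's ordered
    list of visited URLs; nearby pairs are counted in both directions.
    """
    covisits: Dict[str, Counter] = defaultdict(Counter)
    for trail in trails:
        for i, page in enumerate(trail):
            # Look ahead at most `window` steps in the same trail.
            for j in range(i + 1, min(i + 1 + window, len(trail))):
                other = trail[j]
                if other != page:  # ignore reloads of the same page
                    covisits[page][other] += 1
                    covisits[other][page] += 1
    return covisits


def suggest(covisits: Dict[str, Counter], page: str, k: int = 3) -> List[str]:
    """Return up to k pages most often co-visited with `page`.

    Ties are broken alphabetically so the output is deterministic.
    """
    counts = covisits.get(page, Counter())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [url for url, _ in ranked[:k]]


if __name__ == "__main__":
    trails = [
        ["a.com", "b.com", "c.com"],
        ["a.com", "b.com"],
        ["a.com", "d.com"],
    ]
    cv = build_covisits(trails, window=2)
    print(suggest(cv, "a.com"))  # ['b.com', 'c.com', 'd.com']
```

Because the counts come only from what many users actually did, this kind of model "learns from people" without understanding page content; a real service would presumably also weight by recency, filter noise, and combine usage with link structure and content.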

Keyword searching is being stretched: using two words to find the right 10 documents out of 100 million is a very difficult task. Another approach is to catalog web documents manually to create a directory, but the largest directory points to far less than 1% of the current web. The problem is getting worse: the number of websites is doubling every 6 months (we know this because we are crawling and archiving the whole public web). Alexa's goal is to be knowledgeable on every subject so that it can suggest where the quality web resources are. Alexa is not an "artificial intelligence"; rather, it aggregates and organizes what it learns from people. We do this by looking for patterns in usage trails, hypertext link structures, and the content of the web.
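As a rough worked example of the growth rate quoted above, assuming it holds steadily, a doubling every 6 months means the number of websites N grows fourfold per year (t in months, N_0 the count today):

```latex
N(t) = N_0 \cdot 2^{t/6}, \qquad N(12) = 4N_0, \qquad N(24) = 16N_0
```

At that pace, a manually built directory would have to grow sixteenfold in two years just to keep the same fraction of coverage.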

Seven terabytes of web content, the usage trails of tens of thousands of users, and tape robots are all combined to build this service. We welcome ideas on how to do this better.

About the speaker:

Cofounded Alexa Internet in 1996. In 1989, invented Wide Area Information Servers (WAIS), an early Internet publishing system. Founded WAIS Inc., which worked with Dow Jones, the New York Times, the Government Printing Office, and Encyclopaedia Britannica to put them on the net. Sold WAIS Inc. to America Online in 1995.

Before that, attended MIT and helped start Thinking Machines, a massively parallel computer company. There he architected a Connection Machine and began using Connection Machines to mine large text collections.

Contact information:

Brewster Kahle
Alexa Internet
Presidio, Bldg. 116, Box 29141
Corner Sheridan and Ord St.
San Francisco, CA 94129