PITTSBURGH - Being among the first to pick up on Internet news and gossip and rapidly detecting contamination anywhere in a water supply system are similar problems, at least from a computer scientist's point of view. Both can be solved with a versatile algorithm developed by Carnegie Mellon University researchers.
Using a problem-solving method called the Cascades algorithm, Carlos Guestrin, assistant professor of computer science and machine learning, and his students compiled a list of the 100 best blogs to read to find the biggest news on the Web as early as possible. The list, available at http://www.blogcascades.org/, includes well-known blogs, such as Instapundit and Boing Boing, but also some more obscure ones, like Watcher of Weasels and Don Surber.
"The goal of our system when looking at blogs is to detect the big stories as early on and as close to the source as possible," Guestrin said. He and Andreas Krause and Jure Leskovec, doctoral students in computer science and machine learning, respectively, analyzed 45,000 blogs (those that actively link to other blogs) to compile the list, checking the time stamps to determine where news items were being posted first.
But reading even 100 blogs, many of them with numerous postings, may be more than many Web surfers can handle. Recasting the problem, the researchers used their algorithm to compile a list of blogs for a person who wants to read no more than 5,000 postings. This list is quite different, with summarizer blogs, such as The Modulator and Anglican, predominating.
Similarly, Guestrin and his students used the same algorithm to determine the optimal number and placement of sensors for detecting the introduction and spread of contaminants in a municipal water supply. Their report on the blog and water system case studies, "Cost-Effective Outbreak Detection in Networks," was presented at the Association for Computing Machinery's International Conference on Knowledge Discovery and Data Mining earlier this year.
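To see why blog selection and sensor placement can be treated as the same problem, consider a minimal sketch in which each "cascade" (a news story or a contamination event) is the set of nodes it reaches, and the task is to pick a few nodes that together catch as many cascades as possible. This is a generic greedy coverage heuristic shown for illustration only; it is not the researchers' algorithm, which the release does not describe in detail. The function name `greedy_placement`, the toy data and the simple count-based score are all assumptions.

```python
def greedy_placement(cascades, candidates, k):
    """Pick up to k nodes that together detect the most cascades.

    cascades:   dict mapping a cascade id to the set of nodes it reaches
                (blogs that carry a story, or pipe junctions a contaminant hits).
    candidates: nodes where a "sensor" could be placed (blogs to read).
    Illustrative assumption: a cascade counts as detected if any chosen
    node is in its set; timing and cost are ignored.
    """
    chosen = []
    detected = set()
    remaining = sorted(candidates)  # sorted for deterministic tie-breaking
    for _ in range(k):
        best, best_gain = None, 0
        for node in remaining:
            # Marginal gain: cascades this node would newly detect.
            gain = sum(
                1
                for cid, nodes in cascades.items()
                if cid not in detected and node in nodes
            )
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:  # no remaining node detects anything new
            break
        chosen.append(best)
        remaining.remove(best)
        detected.update(cid for cid, nodes in cascades.items() if best in nodes)
    return chosen, detected


# Toy data, entirely made up for illustration.
cascades = {
    "story1": {"a", "b"},
    "story2": {"b", "c"},
    "story3": {"c", "d"},
    "story4": {"d"},
}
chosen, detected = greedy_placement(cascades, {"a", "b", "c", "d"}, k=2)
print(chosen, sorted(detected))
```

Swapping the toy sets for blog link cascades or simulated contamination paths changes the data but not the selection loop, which is the sense in which the two problems are similar.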
Contact: Byron Spice
Carnegie Mellon University