By Rob Knies
March 6, 2007 6:00 AM PT
One of the great things about the Internet is the sheer magnitude of the resources it offers. It’s humbling, when you stop to think about it. With Web pages numbering in the billions, any individual’s particular interests, no matter how expansive, amount to merely a drop in the online bucket. The greatest limitation to surfing the Net is the imagination of the surfer.
That’s good, though. Practically anything imaginable is available for investigation. Even a relatively narrow topic is likely to offer tens of thousands of Internet trails to explore. Never before have so many had access to so much information.
That’s bad, though. In most cases, we don’t need to tread down a thousand info-trails. Usually, we need something specific—a name, a number, a technique, an explanation. Time is short, and we are busy. On occasion, too much data can be as shackling as too little.
But don’t fret. Help is on the way—as demonstrated at a pair of booths featured March 7-8 during TechFest 2007, Microsoft Research’s annual research-project showcase. Each, in its own way, sets its sights on helping cut through the clutter to enable users to identify precisely what they need and want from their online experiences.
One focuses on streamlining Internet navigation to provide easy access to pages of interest. The other intends to make Internet video as simple and inviting as watching TV. Two different problems, two different approaches, one common denominator: whittling the Web down to size.
“Comprehending the scope and the size of a Web site based on the home page alone is very difficult,” says Natasa Milic-Frayling, a senior researcher within the Integrated Systems group at Microsoft Research Cambridge.
Milic-Frayling and her colleagues Eduarda Mendes Rodrigues and Blaz Fortuna are exploring the challenges in effective navigation of Web sites that, in many cases, evolve organically to the point where they include thousands of pages, some of them created for specific purposes, in styles significantly different from the site’s home page.
How can users possibly come to understand the global organization of such sites and thus gain easy access to pages of interest?
Milic-Frayling and her team have devised a technology called InSite Live! that analyzes the organization of a site based on a technique called Link Structure Graph (LSG). It uses information about the groupings of navigational links on individual pages—such as menus and clusters of links that refer to pages with related content—to identify subsites and their prominent topics.
“Based on this information,” Mendes Rodrigues says, “InSite Live! provides visualization of the navigation and topic structure and enables the user to explore the site not only by navigating individual pages, but also by hopping from one subsite to another.”
To accomplish this, InSite Live! crawls a site and performs LSG analysis on the collected pages. The biggest challenge the technology must overcome is to collect all site pages to provide a full representation. Many Web pages these days are dynamically generated and can change often, making a complete site representation obsolete in a hurry.
InSite Live! can help in that regard because of the advantages it offers to Web-site administrators. The technology can help them organize their site and create a representation that would enable visitors to learn more about the site’s scope and organization. Meanwhile, the Web-site administrator gets a chance to see the frequency at which subsites are accessed and, therefore, can develop a strategy to maximize traffic as desired.
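The link-grouping idea behind this analysis can be illustrated with a toy example. The sketch below is not the Link Structure Graph technique itself, whose details the article does not give. It only assumes that each crawled page has already been reduced to its blocks of navigation links, and it treats pages that carry the same block as a candidate subsite. The input format, the function name `group_pages_by_menu`, and the sample URLs are all hypothetical.

```python
from collections import defaultdict


def group_pages_by_menu(site):
    """Group pages that carry an identical block of navigation links.

    `site` maps a page URL to a list of link blocks; each block is a list
    of target URLs (for example, one menu). This format is an assumption
    made for illustration only.
    """
    pages_by_block = defaultdict(set)
    for page, blocks in site.items():
        for block in blocks:
            # A frozenset ignores link order, so the same menu matches
            # even if two pages list its entries differently.
            pages_by_block[frozenset(block)].add(page)
    # A block shared by two or more pages hints at a subsite.
    return {
        block: pages
        for block, pages in pages_by_block.items()
        if len(pages) >= 2
    }


# Hypothetical crawl result for a five-page site.
site = {
    "/": [["/research", "/products"]],
    "/products": [["/research", "/products"]],
    "/research": [
        ["/research", "/products"],
        ["/research/speech", "/research/web"],
    ],
    "/research/speech": [["/research/speech", "/research/web"]],
    "/research/web": [["/research/speech", "/research/web"]],
}

subsites = group_pages_by_menu(site)
for block, pages in subsites.items():
    print(sorted(block), "->", sorted(pages))
```

In this sample, the top-level menu ties the home page to its two main sections, while the second menu ties the research pages together. A real analysis would also have to cope with dynamically generated pages and near-identical menus, which this sketch ignores.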
Once InSite Live! has concluded its site analysis, it generates a graphical site map dazzling in its complexity and its visual connections.
“For the first time, we can see how constellations of Web pages are connected through navigation menus,” Mendes Rodrigues says, “and how these menus are connected to take the user from the home page to peripheral parts of the site. With a single click, we can see which topics are covered by the site and its subsites.”
Such a view also provides a bit of reward for Milic-Frayling, Fortuna, and Mendes Rodrigues—who literally can see the fruits of their labors.
“It is absolutely fascinating to view the InSite map of an entire site, which may have thousands of pages,” Milic-Frayling says. “No single user can possibly browse and build in their mind a comprehensive model of the Web-site organization. With InSite Live!, we can.”
New Things to Watch on TV
Internet video has mushroomed in popularity in recent months, as exemplified by the explosive growth of Web sites such as YouTube and Soapbox on MSN Video. Millions of people are accessing short video clips every day, for occasional edification and an endless stream of laughs. Internet video has become a virtual medium in itself, a user-controlled alternative to mainstream television.
One problem, though. Such content is accessible almost exclusively via computers. And computers, for all their many conveniences, do not offer the same easy, relaxed viewing experience as does your garden-variety TV. Couch potatoes want in on the fun, too.
Kit Thambiratnam wants to change all that. Thambiratnam, a researcher within the Speech Group at Microsoft Research Asia, has been working on a project he calls Relaxed Internet Video Exploration and Discovery, and his mission—and that of collaborating colleagues Frank Seide and Roger Yu—is to bring Internet video to the masses.
“The purpose of this project,” Thambiratnam says, “is to make enjoying Internet video as simple and easy as it is to watch traditional TV content.
“Video content was made to be enjoyed on your TV, not on your PC,” he continues. “Unfortunately, it’s just much too difficult to enjoy all the great Internet video content on your TV.”
The reasons for this are many, but they boil down to ease of use. When people are sitting back on their couch, looking for something to watch, they don’t want to browse lists, search for content, or formulate queries. They want to click a remote and have entertaining video wash over them.
“We want to build upon familiar concepts such as TV channels and channel surfing,” Thambiratnam explains, “but then use machine learning and other ‘smart’ technologies to bridge the gap between the user and the vast amounts of video on the Internet.”
To do so, Thambiratnam is packaging three technologies into a seamless whole.
Of the three, the content recommendations are paramount in this project.
“We use speech-recognition technology to understand what’s being said in the videos, so when we recommend content, it’s more related to the actual content of the video,” Thambiratnam explains. “The idea is to provide a broad range of content related to what you’re watching. If you’re watching a documentary on houses, we may offer a news clip about rising house prices or a do-it-yourself home-improvement show.
“We’re also trying to build a system that allows you to vote for content that you like, so that, as you watch, you can teach the computer how to make recommendations for you.”
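A rough sketch can make the recommendation idea concrete. It assumes speech recognition has already turned each video into a plain-text transcript, and it ranks other videos by word overlap with the one being watched, then nudges the ranking with viewer votes. The cosine similarity over word counts and the `vote_weight` bonus are stand-ins chosen for illustration, not the project's actual method. The function names, video identifiers, and transcripts are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase word occurrences in a transcript."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(current, transcripts, votes=None, vote_weight=0.1):
    """Rank the other videos by transcript similarity plus a vote bonus.

    `votes` maps a video id to a net vote count; `vote_weight` is an
    assumed tuning knob, not a value taken from the project.
    """
    votes = votes or {}
    current_vec = term_counts(transcripts[current])
    scores = {}
    for video_id, text in transcripts.items():
        if video_id == current:
            continue
        similarity = cosine(current_vec, term_counts(text))
        scores[video_id] = similarity + vote_weight * votes.get(video_id, 0)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical transcripts, as if produced by a speech recognizer.
transcripts = {
    "houses_doc": "documentary about houses and the people who build houses",
    "house_prices": "news report rising prices for houses in the city",
    "diy": "home improvement show repair houses yourself",
    "cooking": "cooking pasta with tomato sauce",
}

print(recommend("houses_doc", transcripts))
print(recommend("houses_doc", transcripts, votes={"diy": 1}))
```

Without votes, the news clip about house prices ranks first and the cooking clip last. A single vote for the do-it-yourself show is enough to move it to the top, which mirrors the idea of teaching the system as you watch.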
Given the diversity of Internet video, one of the biggest challenges Thambiratnam faces is developing a machine-learning algorithm that works across genre, type, and quality of particular videos.
“The only way we can realize this project is to use a cross-section of technologies,” he says, “all backed by the fundamental machine-learning techniques we use in speech recognition.
“The goal is to create the TV of tomorrow. We want to realize a video jukebox that is as simple to use as your TV, but uses machine-learning and intelligence technologies to transparently give you access to the almost infinite amount of video on the Internet.”
And the most rewarding part of working on a project like this? Thambiratnam smiles.
“It’s something I would actually use.”