Integrated Systems

InSite Live!
Overview
InSite is a tool for visualizing the structure of a Web site. It helps visitors search and browse the site.

It identifies sub-sites within a site and displays the topics they cover in order to assist the users in finding pages of interest.

It enables Web site administrators to learn how users interact with their Web sites and how to improve the site organization.

Link Structure Graph (LSG)
The Link Structure Graph model provides a new representation of the Web hyperlink structure based on link blocks. It captures the organization of links at the page level and the overall link structure of the site.

The graph includes several types of link blocks:
  • Structural link blocks (s-nodes) - blocks that are repeated across pages, typically navigation menus
  • Content link blocks (c-nodes) - blocks that are often grouped by topic and are unlikely to be repeated across pages
  • Isolated links (i-nodes) - links that are not part of a link group, often found in the body of the text

LSG Algorithm
Step 1 – Page layout analysis

  • Parse the HTML Document Object Model (DOM) structure of each individual page.
  • At each DOM level, look for lists of consecutive links.
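
As an illustration of Step 1, the sketch below builds a small DOM tree with Python's standard html.parser module and reports runs of consecutive links among the children of each element. It is not the InSite implementation. The names Node, TreeBuilder, hrefs and find_link_blocks, the min_links parameter, and the rule that a child counts as a link item when its subtree holds exactly one link are all assumptions made for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the parser must not descend into them.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class Node:
    def __init__(self, tag, href=None, parent=None):
        self.tag, self.href, self.parent, self.children = tag, href, parent, []

class TreeBuilder(HTMLParser):
    """Builds a minimal DOM-like tree (hypothetical helper for this sketch)."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        node = Node(tag, href, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        # Assumption: visible text between links interrupts a run of links.
        if data.strip():
            self.cur.children.append(Node("#text", parent=self.cur))

def hrefs(node):
    """All link targets in the subtree rooted at node."""
    found = [node.href] if node.tag == "a" and node.href else []
    for child in node.children:
        found.extend(hrefs(child))
    return found

def find_link_blocks(html, min_links=2):
    """Return (dom_path, targets) for each run of consecutive link items."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    blocks = []

    def visit(node, path):
        run = []
        for child in node.children + [None]:  # None flushes the final run
            targets = hrefs(child) if child is not None else []
            if len(targets) == 1:
                run.append(targets[0])
            else:
                if len(run) >= min_links:
                    blocks.append((path, tuple(run)))
                run = []
        for child in node.children:
            visit(child, path + "/" + child.tag)

    visit(builder.root, "")
    return blocks
```

Under these assumptions, a three-item navigation list is reported as one candidate block, while a single link inside a paragraph is not. Menus that separate links with visible text (for example "Home | About") would need a looser rule than the one used here.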

Step 2 – Link block classification

  • Compare the similarity of candidate blocks across pages, based on each block's DOM path and its set of link targets.
  • Classify the link blocks into s-nodes and c-nodes based on how often they are reused across pages.
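
One possible reading of Step 2 is sketched below. Blocks with the same DOM path and sufficiently similar target sets are treated as the same block, and a block seen on several pages is labelled an s-node. The Jaccard similarity measure, the threshold of 0.8, the min_pages cut-off and the function names are assumptions for illustration. The page does not state which similarity measure or thresholds InSite uses.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def classify_blocks(pages, threshold=0.8, min_pages=2):
    """pages maps a page URL to a list of (dom_path, targets) candidate blocks.

    Returns a list of clusters, each a dict with the keys
    'path', 'targets', 'pages' and 'kind' ('s-node' or 'c-node').
    """
    clusters = []
    for url, blocks in pages.items():
        for path, targets in blocks:
            targets = frozenset(targets)
            for cluster in clusters:
                same_path = cluster["path"] == path
                if same_path and jaccard(cluster["targets"], targets) >= threshold:
                    cluster["pages"].add(url)  # block reused on another page
                    break
            else:
                clusters.append({"path": path, "targets": targets, "pages": {url}})
    for cluster in clusters:
        reused = len(cluster["pages"]) >= min_pages
        cluster["kind"] = "s-node" if reused else "c-node"
    return clusters
```

With this rule, a navigation menu that appears unchanged on three pages becomes a single s-node, and a topic list that appears on one page only becomes a c-node.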

Step 3 – LSG graph generation

Connect nodes A and B with an edge if any of the target pages of block A contain block B.
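
The edge rule above can be sketched as follows. The input layout (each block described by the set of pages it links to and the set of pages it appears on), the block names and the function name build_lsg are assumptions for this example. Edges are treated as directed from A to B, which is an interpretation of the rule rather than something the page states.

```python
def build_lsg(blocks):
    """blocks maps a block name to {'targets': set of URLs, 'pages': set of URLs}.

    'targets' are the pages the block links to; 'pages' are the pages on
    which the block appears. Returns an adjacency dict: name -> set of names.
    """
    edges = {name: set() for name in blocks}
    for a, block_a in blocks.items():
        for b, block_b in blocks.items():
            # Edge A -> B if some target page of A contains block B.
            if block_a["targets"] & block_b["pages"]:
                edges[a].add(b)
    return edges
```

Applied literally, the rule gives a self-loop to any menu that appears on one of its own target pages. A caller that finds these uninformative can filter them out.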
 

Identification of Sub-sites
Sub-sites consist of collections of Web pages within a larger site.

Pages from a sub-site often share a common template and the same navigation mechanism.

Sub-sites can be identified by decomposing the LSG into Strongly Connected Components (SCCs) of s-nodes.
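
A minimal sketch of this decomposition is shown below. It restricts the graph to s-nodes and then applies Tarjan's algorithm for strongly connected components. The choice of Tarjan's algorithm, the adjacency-dict input and the function names are assumptions for illustration. The page does not say which SCC algorithm InSite uses.

```python
def strongly_connected_components(edges):
    """Tarjan's algorithm on an adjacency dict: node -> iterable of nodes."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop the stack down to v.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(component)

    for v in edges:
        if v not in index:
            visit(v)
    return sccs

def find_subsites(edges, s_nodes):
    """Keep only s-nodes and the edges between them, then return the SCCs."""
    sub = {v: {w for w in edges.get(v, ()) if w in s_nodes}
           for v in edges if v in s_nodes}
    return strongly_connected_components(sub)
```

Each returned set of s-nodes is a candidate sub-site. The recursive form is fine for small graphs. A large site may need an iterative version to stay within Python's recursion limit.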

Prototype
Community Buzz UI

Project Team
  • Natasa Milic-Frayling
  • Eduarda Mendes Rodrigues
  • Blaz Fortuna

©2007 Microsoft Corporation. All rights reserved.