Overview
InSite is a tool that visualizes the structure of a Web site to help
visitors search and browse the site.
It identifies sub-sites within a site and displays the topics they
cover, helping users find pages of interest.
It also enables Web site administrators to learn how users interact
with their sites and how to improve the site organization.
|
Link Structure Graph (LSG)
The Link Structure Graph model provides a new representation of
the Web hyperlink structure based on link blocks. It captures the
organization of links at the page level and the overall link structure
of the site.
The graph includes three types of nodes:
- Structural link blocks (s-nodes) - blocks that are repeated across pages, typically navigation menus
- Content link blocks (c-nodes) - blocks that are often grouped by topic and are unlikely to be repeated across pages
- Isolated links (i-nodes) - links that are not part of a link group, often found in the body of the text
|
LSG Algorithm
Step 1 – Page layout analysis
- Parse the HTML Document Object Model (DOM) structure of each individual page.
- At each DOM level, look for lists of consecutive links.
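
Step 1 might be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not the InSite implementation: the names `Node`, `TreeBuilder`, `link_target` and `find_link_blocks` are hypothetical, as is the `min_links` threshold. The sketch also assumes that an element wrapping exactly one link (such as a list item) counts as a link, and that any non-blank text between links ends a run.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """Minimal DOM node: tag name, attributes, parent and children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []

    def path(self):
        """Tag path from the document root, e.g. 'html/body/ul'."""
        parts, node = [], self
        while node.parent is not None:
            parts.append(node.tag)
            node = node.parent
        return "/".join(reversed(parts))


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree from HTML markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        # Assumption: non-blank text between links interrupts a run of links.
        if data.strip():
            self.current.children.append(Node("#text", parent=self.current))


def link_target(node):
    """Return the href if node is a link or wraps exactly one link, else None."""
    while node.tag != "a":
        if len(node.children) != 1:
            return None
        node = node.children[0]
    return node.attrs.get("href")


def find_link_blocks(html, min_links=2):
    """Return (dom_path, targets) for each run of consecutive links."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    blocks = []

    def flush(node, run):
        if len(run) >= min_links:
            blocks.append((node.path(), tuple(run)))

    def visit(node):
        run = []
        for child in node.children:
            href = link_target(child)
            if href is not None:
                run.append(href)
            else:
                flush(node, run)
                run = []
                visit(child)
        flush(node, run)

    visit(builder.root)
    return blocks
```

Runs shorter than `min_links` are not reported here; in terms of the model above they would correspond to isolated links rather than link blocks.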
Step 2 – Link block classification
- Compare the similarity of candidate blocks across pages, based on the DOM path and the block's target set.
- Classify the link blocks into s-nodes and c-nodes based on their reuse across pages.
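
One possible reading of Step 2 is sketched below. The source does not say how similarity is measured, so this sketch assumes that two blocks match when they share a DOM path and the Jaccard similarity of their target sets reaches a threshold. The function names and the `min_similarity` and `min_pages` values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify_blocks(pages, min_similarity=0.8, min_pages=2):
    """Merge similar blocks across pages and label each merged node.

    pages maps a page URL to its list of (dom_path, targets) candidate blocks.
    Returns a list of dicts with keys dom_path, targets, pages and kind.
    """
    nodes = []
    for page_url, blocks in pages.items():
        for dom_path, targets in blocks:
            targets = frozenset(targets)
            for node in nodes:
                same_path = node["dom_path"] == dom_path
                if same_path and jaccard(node["targets"], targets) >= min_similarity:
                    # The same block was seen again on another page.
                    node["pages"].add(page_url)
                    break
            else:
                nodes.append({"dom_path": dom_path,
                              "targets": targets,
                              "pages": {page_url}})
    for node in nodes:
        # Assumption: a block reused on at least min_pages pages is structural.
        reused = len(node["pages"]) >= min_pages
        node["kind"] = "s-node" if reused else "c-node"
    return nodes
```

A merged node keeps the target set of the first block it was built from, which is a simplification. A fuller version might track the union of targets or use a proper clustering step.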
Step 3 – LSG graph generation
- Connect nodes A and B with an edge if any of the target pages of block A contains block B.
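
The edge rule in Step 3 can be illustrated with a short sketch. It assumes each node records its `targets` (the pages its links point to) and its `pages` (the pages on which the block appears). The function name `build_lsg` is hypothetical. Treating edges as directed from A to B and skipping self-loops are assumptions of this sketch, not statements from the source.

```python
def build_lsg(nodes):
    """Return the set of directed edges (a_id, b_id) of the link structure graph.

    nodes maps a node id to a dict with two collections of page URLs:
    'targets' (pages the block links to) and 'pages' (pages containing it).
    """
    edges = set()
    for a_id, a in nodes.items():
        a_targets = set(a["targets"])
        for b_id, b in nodes.items():
            if a_id == b_id:
                continue  # assumption: self-loops carry no information
            # Edge A -> B if some target page of A contains block B.
            if a_targets & set(b["pages"]):
                edges.add((a_id, b_id))
    return edges
```

This compares every pair of nodes, which is quadratic in the number of nodes. For a large site, an index from page URL to the blocks it contains would avoid the inner loop.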
|
Identification of Sub-sites
Sub-sites are collections of Web pages within a larger site.
Pages from a sub-site often share a common template and the same navigation mechanism.
Sub-sites can be identified by decomposing the LSG into Strongly Connected Components (SCCs) of s-nodes.
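
The decomposition could be sketched with Tarjan's algorithm, restricted to the s-nodes of the graph. The source does not say which SCC algorithm InSite uses, so Tarjan's algorithm is a choice made for this sketch, and the function name `find_subsites` is hypothetical.

```python
def find_subsites(s_nodes, edges):
    """Return the strongly connected components of the s-node subgraph.

    s_nodes is an iterable of node ids; edges is an iterable of directed
    (a, b) pairs. Edges touching nodes outside s_nodes are ignored.
    """
    graph = {n: [] for n in s_nodes}
    for a, b in edges:
        if a in graph and b in graph:
            graph[a].append(b)

    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(v):
        # Number the node in discovery order and push it on the stack.
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        # v is the root of a component: pop the stack down to v.
        if lowlink[v] == index[v]:
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components
```

Each returned set is a candidate sub-site: a group of navigation blocks that can all be reached from one another. The recursive form keeps the sketch short, but a very large graph could exceed Python's recursion limit and would need an iterative variant.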
|
Prototype
 |
Project Team
- Natasa Milic-Frayling
- Eduarda Mendes Rodrigues
- Blaz Fortuna