With the rapid growth of the web, making sense of web data poses grand challenges: big volume, high velocity, high variety, and unknown veracity. In the physical world, a sensor is a converter that measures a physical quantity and converts it into a signal that an observer or an instrument (today, mostly electronic) can read. This project creates a virtual layer of WebSensors on top of the web.
A WebSensor is a programmable, focused crawler that continuously discovers, extracts, and aggregates structured information about a topic. The WebSensor platform, built on Windows PowerShell and the .NET Framework, makes it easy for developers to create WebSensors that continuously extract information from the web and generate time-series stream data. End users can also easily create WebSensors for everyday tasks.
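To make the definition of a WebSensor concrete, here is a minimal Python sketch of one sensor's sampling loop: fetch a page, apply a wrapper that pulls out one value, and append a timestamped reading to a time series. This is not the platform's API (the real platform is built on PowerShell and .NET); the `WebSensor` class, the `fetch` and `extract` hooks, and the toy `extract_bold_number` wrapper are all hypothetical names chosen for illustration. Canned pages stand in for real HTTP requests so the example runs offline.

```python
import re
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional, Tuple


@dataclass
class WebSensor:
    """Hypothetical sensor: polls one page and records a time series of one value."""
    url: str
    fetch: Callable[[str], str]                # hypothetical hook: URL -> page text
    extract: Callable[[str], Optional[float]]  # hypothetical wrapper: page text -> value
    readings: List[Tuple[datetime, float]] = field(default_factory=list)

    def sample(self) -> Optional[float]:
        """Fetch the page once and append a timestamped reading if a value is found."""
        value = self.extract(self.fetch(self.url))
        if value is not None:
            self.readings.append((datetime.now(timezone.utc), value))
        return value

    def run(self, interval_seconds: float, max_samples: int) -> None:
        """Take max_samples readings, sleeping interval_seconds between them."""
        for i in range(max_samples):
            self.sample()
            if i < max_samples - 1:
                time.sleep(interval_seconds)


def extract_bold_number(text: str) -> Optional[float]:
    """Toy wrapper: the first number inside <b>...</b>, or None if absent."""
    match = re.search(r"<b>([\d.]+)</b>", text)
    return float(match.group(1)) if match else None


# Usage with canned pages instead of real HTTP requests.
pages = iter(["<b>42</b>", "<b>43</b>", "<p>layout changed</p>"])
sensor = WebSensor(url="http://example.com/", fetch=lambda url: next(pages),
                   extract=extract_bold_number)
sensor.run(interval_seconds=0, max_samples=3)
print([value for _, value in sensor.readings])  # [42.0, 43.0]
```

The third page yields no value, so the sensor skips it rather than recording a bogus reading; a real sensor would also need error handling around network failures.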
The WebSensor platform has many built-in capabilities for extracting and collecting time-sequenced data embedded in websites, including:
- Convenient wrapper generation on webpages with just a few clicks
- Automatic wrapper adaptation to page layout changes
- Easy configuration and execution
- Easy extension using a simple scripting language
- Easy management and retrieval of collected data
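The list does not say how wrappers are generated or adapted. As a purely hypothetical illustration, one simple technique is label-anchored extraction: when the user clicks a value, record the text label shown next to it (here, "Followers"); at extraction time, find that label in the page's visible text and take the first number after it. The sketch below is not the platform's documented method, and `TextCollector`, `extract_after_label`, and both sample layouts are invented for the example. Because the wrapper depends on the label rather than on the value's position in the markup, it survives some layout changes, though not ones that rename or remove the label.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional

# Digits with optional thousands separators and an optional decimal part.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


class TextCollector(HTMLParser):
    """Collect the non-empty text chunks of an HTML page in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.chunks.append(data.strip())


def extract_after_label(html: str, label: str) -> Optional[float]:
    """Return the first number that follows `label` in the page text, or None."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.chunks)
    start = text.find(label)
    if start < 0:
        return None
    match = NUMBER.search(text, start + len(label))
    return float(match.group().replace(",", "")) if match else None


# The same wrapper ("Followers") works on two hypothetical layouts.
old_layout = "<ul><li>Tweets 2,345</li><li>Followers <b>8,903,947</b></li></ul>"
new_layout = ("<table><tr><td>Following</td><td>150</td></tr>"
              "<tr><td>Followers</td><td>8,903,947</td></tr></table>")
print(extract_after_label(old_layout, "Followers"))  # 8903947.0
print(extract_after_label(new_layout, "Followers"))  # 8903947.0
```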
WebSensors can be connected into a sensor network for more complex analysis tasks that involve multiple time series.
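The text does not specify how sensors in a network exchange or combine data. As a hypothetical illustration, a downstream sensor could align two upstream time series with an as-of join (pair each reading with the most recent reading of the other series at or before the same time) and emit a derived series. The function names and the readings below are made up for the sketch.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple

# A time series: (timestamp in seconds, value) pairs sorted by timestamp.
Series = List[Tuple[float, float]]


def asof_join(left: Series, right: Series) -> List[Tuple[float, float, float]]:
    """Pair each left reading with the latest right reading at or before it."""
    right_times = [t for t, _ in right]
    joined = []
    for t, left_value in left:
        i = bisect_right(right_times, t)  # number of right readings with time <= t
        if i > 0:  # skip left readings that precede every right reading
            joined.append((t, left_value, right[i - 1][1]))
    return joined


def derive(left: Series, right: Series,
           combine: Callable[[float, float], float]) -> Series:
    """Downstream sensor: combine two upstream series into a new one."""
    return [(t, combine(a, b)) for t, a, b in asof_join(left, right)]


# Hypothetical readings from two upstream sensors.
sensor_a = [(0.0, 10.0), (60.0, 12.0), (120.0, 15.0)]
sensor_b = [(30.0, 11.0), (90.0, 11.5)]
print(derive(sensor_a, sensor_b, lambda a, b: a - b))  # [(60.0, 1.0), (120.0, 3.5)]
```

An as-of join suits sensors that sample on independent schedules, since it never requires two readings to share an exact timestamp.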
Tracking Bill Gates' follower count on twitter.com
Tracking Bill Gates' follower count is easy: just click the current number of followers (8,903,947 in the snapshot). The sensor then generates a time series and keeps it up to date.
The original Twitter page of Bill Gates
The time series output by the sensor that tracks Bill Gates' follower count
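Once a wrapper has captured the displayed count, the sensor still has to turn text such as "8,903,947" into a number before it can build a time series. A minimal sketch, assuming the page shows the full comma-separated count (pages that abbreviate counts, such as "8.9M", would need extra handling); `parse_count`, `deltas`, and all readings after the first are hypothetical.

```python
from typing import List, Tuple


def parse_count(text: str) -> int:
    """Convert a displayed count such as '8,903,947' to an integer."""
    return int(text.strip().replace(",", ""))


def deltas(series: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Change in the count between each pair of consecutive readings."""
    return [(t2, v2 - v1) for (_, v1), (t2, v2) in zip(series, series[1:])]


# First reading from the snapshot; later readings are made up for illustration.
series = [("t0", parse_count("8,903,947")), ("t1", 8904100), ("t2", 8904050)]
print(deltas(series))  # [('t1', 153), ('t2', -50)]
```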
Tracking a product's price
Price of the Microsoft Surface (32 GB) on http://www.amazon.com/gp/product/B009XNBFJK/
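A price sensor works the same way, except that the extracted text is a currency amount. The sketch below, assuming US-dollar prices written like "$1,234.56", parses the first such price into a `Decimal`, which avoids binary floating-point rounding in stored prices. The regular expression, the `parse_usd_price` name, and the sample strings are hypothetical and do not reflect the actual Amazon page.

```python
import re
from decimal import Decimal
from typing import Optional

# A dollar sign, digits with optional thousands separators, optional decimals.
PRICE = re.compile(r"\$\s*(\d[\d,]*)(\.\d+)?")


def parse_usd_price(text: str) -> Optional[Decimal]:
    """Return the first US-dollar price in text as a Decimal, or None."""
    match = PRICE.search(text)
    if match is None:
        return None
    whole = match.group(1).replace(",", "")
    return Decimal(whole + (match.group(2) or ""))


print(parse_usd_price("Price: $1,234.56"))       # 1234.56
print(parse_usd_price("Currently unavailable"))  # None
```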