In this project, we investigate near-duplicate document detection, focusing on evolving news stories. These stories often consist largely of syndicated content, with locally replaced headlines and captions and added locally relevant material. By detecting near-duplicates, we can offer users only those stories whose content differs materially from versions they have already viewed.
- Omar Alonso, Dennis Fetterly, and Mark Manasse, Duplicate News Story Detection Revisited, in The Ninth Asia Information Retrieval Societies Conference, Springer Verlag, 9 December 2013
- Omar Alonso, Dennis Fetterly, and Mark Manasse, Duplicate News Story Detection Revisited, no. MSR-TR-2013-60, May 2013