Applying Semantic Analyses to Content-Based Recommendation and Document Clustering

Microsoft Research Connections intern Eric Rozell presents the results of his research on feature generation techniques for unstructured data sources. He applies Probase (a web-scale knowledge base developed by Microsoft Research Asia and generated from the Bing index, search query logs, and other sources) to extract concepts from text. He compares the performance of features generated from Probase with that of features from two other forms of semantic analysis: Explicit Semantic Analysis using Wikipedia, and Latent Dirichlet Allocation. He evaluates the semantic analysis techniques on two tasks: recommendation, using Matchbox (a platform for probabilistic recommendations from Microsoft Research Cambridge), and clustering, using K-Means.

Speaker Details

Eric Rozell is a graduate student at Rensselaer Polytechnic Institute working with the Tetherless World Constellation. His research focuses on Semantic e-Science, and he has worked on problems ranging from knowledge representation in virtual observatories to Semantic Web services in application integration scenarios. Eric is also a student fellow for the Federation of Earth Science Informatics Partners and has previously worked as a Summer Student Fellow at Woods Hole Oceanographic Institution.

Date:
Speakers:
Eric Rozell
Affiliation:
Microsoft Research Intern