About Me
Dr. Jun Yan received his Ph.D. degree in digital signal processing and pattern recognition from the Department of Information Science, School of Mathematical Sciences, Peking University, P.R. China. During his Ph.D. studies, he was a research intern at MSRA from 2003 to 2005 and was named a Microsoft Fellow in 2004. Before joining Microsoft, he was a research associate at CBI, HMS, Harvard, Cambridge, MA, in 2005. He joined Microsoft Research Asia (MSRA) in 2006 and currently works in the machine learning group of MSRA as a researcher. His research interests include online advertising, large-scale information extraction and mining, data preprocessing, and information retrieval. So far, he has successfully incubated several technologies that have been used in Microsoft products. In academia, he has published more than 50 papers in refereed conferences and journals, including SIGKDD, SIGIR, WWW, ICDM, and TKDE. He has served as a PC member of international conferences such as SIGKDD and SIGIR, and as a reviewer for journals such as TKDE and TPAMI.
Research Interests
- Large-scale Web knowledge extraction and mining
- Behavioral targeted online advertising
- Large-scale data preprocessing
- Machine learning for information retrieval
- User modeling and understanding
Selected Projects
- Knowledge Table. Kable, also known as the Knowledge Table project, aims to automatically extract structured domain knowledge from the semi-structured and unstructured World Wide Web, and then to process and store that knowledge in table format, with each row standing for a domain entity and each column standing for an attribute. The cells in Kable are the attribute values of the corresponding entity-attribute pairs. Constructing this kind of structured knowledge base is important for different OSD applications such as Bing search, paid search, and display ads. The Kable research concept map has three layers: the Data Layer, the Model Layer, and the Application Layer.
- Intent-based behavioral targeting project. Description: this project aims to sell “intents” to advertisers in behavioral targeted advertising. We classify user search behaviors into different user intent categories, based on which we can accurately deliver ads to the audience. In this project, I mainly focus on algorithm design and on driving the cross-group research efforts within MSRA.
- Online ad relevance verification project. Description: this project aims to improve ad relevance in Bing paid search. We propose novel features and a classifier to improve ad relevance from a machine learning perspective. In this project, I mainly focus on algorithm design and feature proposal, and lead the research efforts within MSRA.
- Bing search task classification project. Description: this project aims to understand whether Bing search users intend to compare Web objects in the sports domain. When they do, Bing will return a side-by-side comparison without requiring users to browse the ten blue links. We propose a classification solution that gives satisfactory performance for online users. In this project, I work together with the product team to design and transfer the intent classifier.
- Self-service BT prototyping. Description: this project aims to let advertisers customize their user segments for ad delivery. We propose a MinHash-based user clustering solution and implement the prototype. In this project, I mainly focus on scenario design, algorithm design, and leading the team-wide research efforts within MSRA.
- Office Online assets recommendation project. Description: this project aims to recommend assets that “Office Online” users are likely to be interested in, based on similar users’ behaviors. We develop the algorithm to make this online recommendation and transfer the technology to the Office Online AP team. In this project, I mainly focus on algorithm design and on driving the research efforts within MSRA.
Patents
- Indexing Semantic User Profiles for Targeted Advertising
- Web Knowledge Extraction for Search Task Simplification
- Build of Website Knowledge Tables
- Forecasting Search Queries based on Time Dependencies (Appl. No. 11/770,462)
- Clustering Aggregator for RSS feeds (Appl. No. 20090327320)
- Prediction of Future Popularity of Query Terms (Appl. No. 20090222321)
- Categorizing Online User Behavior Data (MS#327757.01)
- Representing Queries and Determining Similarity based on An ARIMA Model (Appl. No. 20090006326)
- Identification of Events of Search Queries (Appl. No. 11/770,423)
- Forecasting Time-Dependent Search Queries (Appl. No. 11/770,385)
- Learning Latent Semantic Space for Ranking
- Identification of Similar Queries based on Overall and Partial Similarity of Time Series
- Determination of Time Dependency of Search Queries (Appl. No. 11/770,358)
- Forecasting Time Independent Search Queries (Appl. No. 11/770,445)
- Scalable Parallel User Clustering in Discrete Time Window (Appl. No. 20100169258)
- Learning User Intent from Rule-based Training Data (MS# 329229.01)
- Related Links Recommendation (MS# 329226.01)
Publications
- Jian Tang, Jun Yan, Lei Ji, Ming Zhang et al. Collaborative Users’ Brand Preference Mining across Multiple Domains from Implicit Feedback, AAAI 2011
- Siyu Gu, Jun Yan, Shuicheng Yan, Cross Domain Random Walk for Query Intent Pattern Mining from Search Engine Log, ICDM 2011
- Jianwei Cui, Hongyuan Liu, Jun Yan, Lei Ji et al. A Novel Multi-view Random Walk Algorithm for Search Task Discovery from Click-through Log, CIKM 2011
- Yingqin Gu, Jun Yan et al. Structured Data Extraction from Web Sites, CIKM 2011
- Jian Tang, Ning Liu, Jun Yan, Yelong Shen, Learning to Rank Audience for Behavioral Targeting in Display Ads, CIKM 2011
- Zeyu Zheng, Jun Yan, Shuicheng Yan, Ning Liu, Zheng Chen, A Novel Contrast Co-Learning Framework for Generating High Quality Training Data, ICDM 2010
- Yelong Shen, Jun Yan, Shuicheng Yan, Lei Ji, Ning Liu, Zheng Chen, Sparse Hidden-Dynamics Conditional Random Fields for User Intent Understanding, WWW 2011
- Jun Yan, Dou Shen, Teresa Muh, Ning Liu, Zheng Chen, Ying Li, Behavioral targeted online advertising, book chapter in IGI book, MMADS
- Jun Yan, Ning Liu, Shuicheng Yan, Qiang Yang, Weiguo Fan, Wei Wei, Zheng Chen, Trace Oriented Feature Analysis for Large Scale Text Data Dimension Reduction, to appear in IEEE Transactions on Knowledge and Data Engineering
- Jun Yan, Zeyu Zheng, Li Jiang, Yan Li, Shuicheng Yan, Zheng Chen, Learning user intent from rule based training data, SIGIR 2010 (poster)
- Jun Yan, Shuicheng Yan, Ning Liu, Zheng Chen: Straightforward Feature Selection for Scalable Latent Semantic Indexing. SDM 2009: 1159-1170
- Jun Yan, Ning Liu, Elaine Qing Chang, Lei Ji, Zheng Chen: Search result re-ranking based on gap between search queries and social tags. WWW 2009: 1197-1198
- Jun Yan, Ning Liu, Gang Wang, Wen Zhang, Yun Jiang, Zheng Chen: How much can behavioral targeting help online advertising? WWW 2009: 261-270
- Jun Yan: Text Representation. Encyclopedia of Database Systems 2009: 3069-3072
- Jun Yan, Jian Hu: Text Semantic Representation. Encyclopedia of Database Systems 2009: 3075-3078
- Jun Yan, Shuicheng Yan, Ning Liu, Zheng Chen: Learning the Latent Semantic Space for Ranking in Text Retrieval. ICDM 2008: 1115-1120
- Jun Yan, Ning Liu, Qiang Yang, Weiguo Fan, Zheng Chen: TOFA: Trace Oriented Feature Analysis in Text Categorization. ICDM 2008: 668-677
- Jun Yan, Xiaobo Zhou, Qiong Yang, Ning Liu, QianSheng Cheng, Stephen T. C. Wong: An Effective System for Optical Microscopy Cell Image Segmentation, Tracking and Cell Phase Identification. ICIP 2006: 1917-1920
- Jun Yan, Ning Liu, Qiang Yang, Benyu Zhang, QianSheng Cheng, Zheng Chen: Mining Adaptive Ratio Rules from Distributed Data Sources. Data Min. Knowl. Discov. 12(2-3): 249-273 (2006)
- Jun Yan, Benyu Zhang, Ning Liu, Shuicheng Yan, QianSheng Cheng, Weiguo Fan, Qiang Yang, Wensi Xi, Zheng Chen: Effective and Efficient Dimensionality Reduction for Large-Scale and Streaming Data Preprocessing. IEEE Trans. Knowl. Data Eng. 18(2): 320-333 (2006)
- Jun Yan, Benyu Zhang, Shuicheng Yan, Ning Liu, Qiang Yang, QianSheng Cheng, Hua Li, Zheng Chen, Wei-Ying Ma: A scalable supervised algorithm for dimensionality reduction on streaming data. Inf. Sci. 176(14): 2042-2065 (2006)
- Jun Yan, QianSheng Cheng, Qiang Yang, Benyu Zhang: An Incremental Subspace Learning Algorithm to Categorize Large Scale Text Data. APWeb 2005: 52-63
- Jun Yan, Ning Liu, Benyu Zhang, Qiang Yang, Shuicheng Yan, Zheng Chen: A Novel Scalable Algorithm for Supervised Subspace Learning. ICDM 2006: 721-730
- Jun Yan, Ning Liu, Benyu Zhang, Shuicheng Yan, Zheng Chen, QianSheng Cheng, Weiguo Fan, Wei-Ying Ma: OCFS: optimal orthogonal centroid feature selection for text categorization. SIGIR 2005: 122-129
- Jun Yan, Benyu Zhang, Shuicheng Yan, Qiang Yang, Hua Li, Zheng Chen, Wensi Xi, Weiguo Fan, Wei-Ying Ma, QianSheng Cheng: IMMC: incremental maximum margin criterion. KDD 2004: 725-730
- Jun Yan, Ning Liu, Shuicheng Yan, Qiang Yang, Zheng Chen, Synthesizing Novel Dimension Reduction Algorithms in Matrix Trace Oriented Optimization Framework, ICDM 2009 (regular)
- Depin Chen, Jun Yan, Yan Xiong, Gui-Rong Xue, Gang Wang, Zheng Chen, Knowledge Transfer for Cross Domain Learning to Rank, to appear in Journal of Information Retrieval, DOI: 10.1007/s10791-009-9111-2
- Hanhua Chen, Jun Yan, Hai Jin, Yunhao Liu, and Lionel M. TSS: Efficient Term-Set Search in Large Peer-to-Peer Textual Collections, to appear in IEEE Transactions on Computers
- Xiaobai Liu, Shuicheng Yan, Jun Yan, and Hai Jin, Unified Solution to Nonnegative Data Factorization Problems, ICDM 2009 (regular)
- Lei Ji, Jun Yan, Ning Liu, Wen Zhang, Weiguo Fan, Zheng Chen, ExSearch: A Novel Vertical Search Engine for Online Barter Business, CIKM 2009 (regular)
- Tianqi Chen, Jun Yan, Guirong Xue, Zheng Chen, Transfer Learning for Behavioral Targeting, WWW 2010 (poster)
- Ning Liu, Jun Yan, Dou Shen, Depin Chen, Ying Li, Zheng Chen, Learning to rank audience for behavioral targeted advertising, SIGIR 2010 (poster)
- Junwu Du, Zhimin Zhang, Jun Yan, Zheng Chen, Named Entity Recognition in query using session context, SIGIR 2010 (poster)
- Siyu Gu, Jun Yan, Ning Liu, Zheng Chen et al, What are driving users click ads? User habits, attitudes and commercial intention, KDD workshop, ADKDD 2010
- Zeyu Zheng, Jun Yan, Chi Zhang, Zheng Chen et al, Learning user intent for sponsored search, KDD workshop, ADKDD 2010
- Bingbing Ni, Shuicheng Yan, Guangyu Zhu, Zheng Song, Yongning Lu, Dong Guo, Jun Yan, A Vision-based Demographic Advertisement System, ICCV 2009 (Demo)
- J. Cui, Y. Gu, J. He, X. Jiang, X. Du, H. Liu, J. Yan, StoryTeller: Detecting Hot Topics and Topic Development from Click Through Data, KDD 2010 (demo)
- Ting Li, Ning Liu, Jun Yan, Gang Wang, Fengshan Bai, Zheng Chen: A Markov chain model for integrating behavioral targeting into contextual advertising. KDD Workshop on Data Mining and Audience Intelligence for Advertising 2009: 1-9
- Xiaohui Wu, Jun Yan, Ning Liu, Shuicheng Yan, Ying Chen, Zheng Chen: Probabilistic latent semantic user segmentation for behavioral targeted advertising. KDD Workshop on Data Mining and Audience Intelligence for Advertising 2009: 10-17
- Depin Chen, Ning Liu, Zhijun Yin, Yang Tong, Jun Yan, Zheng Chen: CLHQS: Hierarchical Query Suggestion by Mining Clickthrough Log. PAKDD 2009: 764-771
- Yunzhang Zhu, Gang Wang, Junli Yang, Dakan Wang, Jun Yan, Jian Hu, Zheng Chen: Optimizing search engine revenue in sponsored search. SIGIR 2009: 588-595
- Wen Zhang, Jun Yan, Shuicheng Yan, Ning Liu, Zheng Chen: Temporal query substitution for ad search. SIGIR 2009: 798-799
- Ning Liu, Jun Yan, Zheng Chen: A probabilistic model based approach for blended search. WWW 2009: 1075-1076
- Ning Liu, Jun Yan, Weiguo Fan, Qiang Yang, Zheng Chen: Identifying vertical search intention of query through social tagging propagation. WWW 2009: 1209-1210
- Xin Li, Jun Yan, Weiguo Fan, Ning Liu, Shuicheng Yan, Zheng Chen: An online blog reading system by topic clustering and personalized ranking. ACM Trans. Internet Techn. 9(3): (2009)
- Shuicheng Yan, Huan Wang, Yun Fu, Jun Yan, Xiaoou Tang, Thomas S. Huang: Synchronized Submanifold Embedding for Person-Independent Pose Estimation and Beyond. IEEE Transactions on Image Processing 18(1): 202-210 (2009)
- Depin Chen, Jun Yan, Gang Wang, Yan Xiong, Weiguo Fan, Zheng Chen: TransRank: A Novel Algorithm for Transfer of Rank Learning. ICDM Workshops 2008: 106-115
- Ning Liu, Jun Yan, Shuicheng Yan, Weiguo Fan, Zheng Chen: Web Query Prediction by Unifying Model. ICDM Workshops 2008: 436-441
- Weizhu Chen, Jun Yan, Benyu Zhang, Zheng Chen, Qiang Yang: Document Transformation for Multi-label Feature Selection in Text Categorization. ICDM 2007: 451-456
- Wen Pu, Ning Liu, Shuicheng Yan, Jun Yan, Kunqing Xie, Zheng Chen: Local Word Bag Model for Text Categorization. ICDM 2007: 625-630
- Xuefeng He, Jun Yan, Jinwen Ma, Ning Liu, Zheng Chen: Query topic detection for reformulation. WWW 2007: 1187-1188
- Xin Li, Jun Yan, Zhi-Hong Deng, Lei Ji, Weiguo Fan, Benyu Zhang, Zheng Chen: A novel clustering-based RSS aggregator. WWW 2007: 1309-1310
- Jilin Chen, Jun Yan, Benyu Zhang, Qiang Yang, Zheng Chen: Diverse Topic Phrase Extraction through Latent Semantic Analysis. ICDM 2006: 834-838
- Ning Liu, Shuzhen Nong, Jun Yan, Benyu Zhang, Zheng Chen, Ying Li: Similarity of Temporal Query Logs Based on ARIMA Model. ICDM Workshops 2006: 366-370
- Ning Liu, Jun Yan, Fengshan Bai, Benyu Zhang, Wensi Xi, Weiguo Fan, Zheng Chen, Lei Ji, Chenyong Hu, Wei-Ying Ma: A Similarity Reinforcement Algorithm for Heterogeneous Web Pages. APWeb 2005: 121-132
- Ning Liu, Fengshan Bai, Jun Yan, Benyu Zhang, Zheng Chen, Wei-Ying Ma: Supervised Semi-definite Embedding for Email Data Cleaning and Visualization. APWeb 2005: 972-982
- Dong Zhuang, Benyu Zhang, Qiang Yang, Jun Yan, Zheng Chen, Ying Chen: Efficient Text Classification by Weighted Proximal SVM. ICDM 2005: 538-545
- Ning Liu, Benyu Zhang, Jun Yan, Zheng Chen, Wenyin Liu, Fengshan Bai, Leefeng Chien: Text Representation: From Vector to Tensor. ICDM 2005: 725-728
- Benyu Zhang, Jun Yan, Ning Liu, QianSheng Cheng, Zheng Chen, Wei-Ying Ma: Supervised semi-definite embedding for image manifolds. ICME 2005: 592-595
- Wensi Xi, Edward A. Fox, Weiguo Fan, Benyu Zhang, Zheng Chen, Jun Yan, Dong Zhuang: SimFusion: measuring similarity using unified relationship matrix. SIGIR 2005: 130-137
- Ning Liu, Benyu Zhang, Jun Yan, Qiang Yang, Shuicheng Yan, Zheng Chen, Fengshan Bai, Wei-Ying Ma: Learning similarity measures in non-orthogonal space. CIKM 2004: 334-341
- Ning Liu, Benyu Zhang, Jun Yan, Wensi Xi, Shuicheng Yan, Zheng Chen, Fengshan Bai, Wei-Ying Ma: Online Supervised Learning for Digital Library. ICADL 2004: 683
- Chenyong Hu, Benyu Zhang, Shuicheng Yan, Qiang Yang, Jun Yan, Zheng Chen, Wei-Ying Ma: Mining Ratio Rules Via Principal Sparse Non-Negative Matrix Factorization. ICDM 2004: 407-410
Contact Me:
Jun Yan
Microsoft Research Asia
14452, Building 2, No. 5 Dan Ling Street, Haidian District, Beijing, P.R. China, 100080
xxx at microsoft dot com, xxx=junyan
Office: +(86)10-5917-5012
