Keynote 1: Learning to Represent Natural Language
Yoshua Bengio (Université de Montréal)
Abstract: An important shift in machine learning research in recent years has been the emphasis on learning representations rather than relying solely on hand-crafted ones. This has been the motivation behind the very successful recent research on deep learning, which has become a ubiquitous technology for object (and face) recognition in images and for speech recognition, while becoming a core element of natural language processing. We review the theoretical motivations for representation learning, in particular for distributed and deep representations. We then review the work on learning word embeddings and neural language models, and present challenges for ongoing research in learning to represent natural language: learning to represent phrases and sentences, the difficulty of learning long-term dependencies, and the underfitting and optimization issues involved in training large deep networks and recurrent or recursive networks.
Bio: Yoshua Bengio received a PhD in Computer Science from McGill University, Canada, in 1991. After two post-doctoral years, one at M.I.T. with Michael Jordan and one at AT&T Bell Laboratories with Yann LeCun and Vladimir Vapnik, he became a professor in the Department of Computer Science and Operations Research at Université de Montréal. He is the author of two books and more than 200 publications, the most cited being in the areas of deep learning, recurrent neural networks, probabilistic learning algorithms, natural language processing, and manifold learning. He is among the most cited Canadian computer scientists and is or has been an associate editor of the top journals in machine learning and neural networks. Since 2000 he has held a Canada Research Chair in Statistical Learning Algorithms, since 2006 an NSERC Industrial Chair, and since 2005 he has been a Fellow of the Canadian Institute for Advanced Research. He is on the board of the NIPS foundation and has served as program chair and general chair for NIPS. He has co-organized the Learning Workshop for 14 years and co-created the new International Conference on Learning Representations. His current interests are centered on a quest for AI through machine learning, and include fundamental questions on deep learning and representation learning, the geometry of generalization in high-dimensional spaces, manifold learning, biologically inspired learning algorithms, and challenging applications of statistical machine learning. As of April 2014, Google Scholar found more than 16,700 citations to his work, yielding an h-index of 56.
Keynote 2: Natural Language Understanding and Generation Powered by Knowledge Graph and Semantic Embedding
Wei-Ying Ma (Microsoft Research)
Bio: Dr. Wei-Ying Ma is an Assistant Managing Director at Microsoft Research Asia, where he oversees multiple research groups, including Web Search and Mining, Natural Language Computing, Data Management and Analytics, and Internet Economics and Computational Advertising. He and his team of researchers have developed many key technologies that have been transferred to Microsoft’s Online Services Division, including the Bing search engine and Microsoft Advertising. He has published more than 250 papers in international conferences and journals. He is a Fellow of the IEEE and a Distinguished Scientist of the ACM. He currently serves on the editorial boards of ACM Transactions on Information Systems (TOIS) and the ACM/Springer Multimedia Systems Journal. He is a member of the International World Wide Web (WWW) Conferences Steering Committee. In recent years, he served as program co-chair of WWW 2008, program co-chair of the Pacific Rim Conference on Multimedia (PCM) 2007, general co-chair of the Asia Information Retrieval Symposium (AIRS) 2008, and general co-chair of ACM SIGIR 2011.
Before joining Microsoft in 2001, Wei-Ying was with Hewlett-Packard Labs in Palo Alto, California, where he worked in the fields of multimedia content analysis and adaptation. From 1994 to 1997, he was engaged in the Alexandria Digital Library project at the University of California, Santa Barbara. He received a Bachelor of Science in electrical engineering from National Tsing Hua University in Taiwan in 1990, and earned a Master of Science degree and doctorate in electrical and computer engineering from the University of California, Santa Barbara, in 1994 and 1997, respectively.
In recent years, deep learning has been applied to various text mining and NLP tasks, where the common practice is to learn word embeddings. Because words, viewed as individual tokens, rarely yield meaningful relationships in their original symbolic space, word embeddings aim to map semantically correlated words to nearby positions in a latent vector space. Such representations of text are typically derived by applying existing neural network frameworks to text corpora, and have successfully demonstrated their effectiveness in solving various text-related tasks.
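The core property described above, that semantically related words end up at nearby positions in the latent space, can be illustrated with a minimal sketch. The tiny 4-dimensional vectors below are purely hypothetical toy values (real embeddings are learned from corpora and have hundreds of dimensions); the sketch only shows how cosine similarity in the embedding space reflects semantic relatedness.

```python
import math

# Hypothetical toy embeddings, for illustration only: semantically
# related words ("king", "queen") are given nearby vectors, while an
# unrelated word ("banana") points in a different direction.
embeddings = {
    "king":   [0.8, 0.6, 0.1, 0.0],
    "queen":  [0.7, 0.7, 0.1, 0.1],
    "banana": [0.0, 0.1, 0.9, 0.8],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1 for similar words."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_related = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_unrelated = cosine_similarity(embeddings["king"], embeddings["banana"])
print(sim_related > sim_unrelated)  # related words are closer in the space
```

In practice, the vectors are not hand-assigned as here but learned by a neural network from co-occurrence patterns in large text corpora, which is exactly what makes the representations useful for downstream tasks.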
However, human languages are governed both by syntactic regularities, as defined by morphology and grammar, and by semantic notions supported by commonsense knowledge, so learning models solely from large text corpora, without recognizing the inherent structure in language, may not be the most efficient strategy. Given the existence and availability of rich knowledge stored in different forms, such as databases of world facts like Freebase and Yago, linguistic resources like WordNet and FrameNet, or even implicit usage data such as click-through logs from search engines and social media, we believe deep learning frameworks can benefit substantially from leveraging these knowledge resources and thus further advance the state of the art in various text mining tasks.
In this workshop, our goal is to bring together researchers and practitioners in this area to review and share the latest research results, and to discuss future directions.
We look forward to your contribution and attendance! See you in Beijing!