Hoifung Poon

General Manager, Health Futures

About

General Manager at Microsoft Health Futures.

I lead Real-World Evidence (RWE) research for advancing AI in precision health.

Medicine today is imprecise. For the top 20 prescription drugs in the U.S., 80% of patients are non-responders. The dream of precision health is to develop a data-driven, learning system where new health information is instantly incorporated to optimize healthcare delivery and accelerate biomedical discovery. In reality, however, the health ecosystem is plagued by prevalent unstructured data and unscalable manual processing.

My research interest is in developing next-generation AI technology to accelerate progress in access, safety, and preventative care for precision health. At Microsoft, I lead biomedical AI research and incubation, with a particular focus on scaling real-world evidence generation by structuring all medical data:

  • Biomedical large language models (LLMs): We are among the first to develop and apply large language models in biomedicine (e.g., PubMedBERT, BioGPT). LLMs are extremely powerful in supercharging the structuring of biomedical data, but they are also prone to incorrect generation and other limitations. We are developing methods to teach LLMs to fact-check themselves, to provide fine-grained provenance, and to facilitate efficient verification with humans in the loop.
  • Biomedical multimodal learning: Besides text, there are other information-rich modalities such as radiology images, digital pathology slides, and genomics. We are developing multi-modal learning and fusion methods, with end-to-end applications such as predicting disease progression and drug response.
  • Causal learning for real-world evidence: Observational data is plagued by confounders. We’re developing advanced causal methods to correct implicit biases and scale biomedical discovery.

We have made promising progress in real-world applications through deep partnership with various stakeholders in healthcare and life sciences. Example talks about our research: AI2, Microsoft Research Seminar, AKBC.

I obtained a B.S. with Distinction in Computer Science from Sun Yat-Sen University and a Ph.D. in Computer Science and Engineering (my dissertation) from the University of Washington. I am affiliated faculty at the University of Washington Medical School and serve as co-PI for various academic projects such as DARPA Big Mechanisms. My past work spans diverse topics in machine learning and NLP and has been recognized with Best Paper Awards at top conferences such as NAACL, EMNLP, and UAI.

For more information, check out my publications and LinkedIn profile.

Tutorials: Markov Logic in Natural Language Processing (NAACL-2010), Natural Language Processing for Precision Medicine (ACL-2017), Machine Reading for Precision Medicine (AAAI-2018), Precision Health in the Age of Large Language Models (KDD-2023).

Selected Press Coverage: Bloomberg Technology, Microsoft News, Verge, ZDNet, eWeek, Puget Sound Business Journal, SWE Magazine cover story, Popular Mechanics, Der Spiegel, Medscape, Microsoft AI Blog, GeekWire, FierceHealthcare, GenomeWeb, Venture Beat, Microsoft Research Blog: Domain-Specific Language Model Pretraining, Microsoft Biomedical Search, Microsoft Tech Minutes: Biomed NLP, Clinical Leader, Microsoft Research Blog: Advancing Health at the Speed of AI, Nature News: Multimodal Generative AI in Medicine, Multimodal Generative AI in Precision Health.

Publications

Enhancing Medical Text Evaluation with GPT-4. [Paper]
Yiqing Xie, Sheng Zhang, Hao Cheng, Zelalem Gero, Cliff Wong, Tristan Naumann, Hoifung Poon.

Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine. [Paper]
Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, Eric Horvitz.

When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology. [Paper]
Wenhui Wang, Shuming Ma, Hanwen Xu, Naoto Usuyama, Jiayu Ding, Hoifung Poon, Furu Wei.

TRIALSCOPE: A Unifying Causal Framework for Scaling Real-World Evidence Generation with Biomedical Language Models. [Paper]
Javier González, Cliff Wong, Zelalem Gero, Jass Bagga, Risa Ueno, Isabel Chien, Eduard Oravkin, Emre Kiciman, Aditya Nori, Roshanthi Weerasinghe, Rom S. Leidner, Brian Piening, Tristan Naumann, Carlo Bifulco, Hoifung Poon.

BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys. [Paper] [Demo]
Yu Gu, Jianwei Yang, Naoto Usuyama, Chunyuan Li, Sheng Zhang, Matthew P. Lungren, Jianfeng Gao, Hoifung Poon.

UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition. [Paper] [Demo] [Models] [MSR Podcast]
Wenxuan Zhou, Sheng Zhang, Yu Gu, Muhao Chen, Hoifung Poon.
In Proceedings of the International Conference on Learning Representations (ICLR), 2024.

Exploring the Boundaries of GPT-4 in Radiology. [Paper]
Qianchu Liu, Stephanie Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Maria Teodora Wetscherek, Robert Tinn, Harshita Sharma, Fernando Pérez-García, Anton Schwaighofer, Pranav Rajpurkar, Sameer Tajdin Khanna, Hoifung Poon, Naoto Usuyama, Anja Thieme, Aditya V. Nori, Matthew P. Lungren, Ozan Oktay, Javier Alvarez-Valle.
In Proceedings of the Annual Conference of Empirical Methods in Natural Language Processing (EMNLP), 2023.

Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology. [Paper]
Cliff Wong, Sheng Zhang, Yu Gu, Christine Moung, Jacob Abel, Naoto Usuyama, Roshanthi Weerasinghe, Brian Piening, Tristan Naumann, Carlo Bifulco, Hoifung Poon.
In Proceedings of Machine Learning for Health Care (MLHC), 2023.

Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events. [Paper]
Yu Gu, Sheng Zhang, Naoto Usuyama, Yonas Woldesenbet, Cliff Wong, Praneeth Sanapathi, Mu Wei, Naveen Valluri, Erika Strandberg, Tristan Naumann, Hoifung Poon.

Self-Verification Improves Few-Shot Clinical Information Extraction. [Paper]
Zelalem Gero, Chandan Singh, Hao Cheng, Tristan Naumann, Michel Galley, Jianfeng Gao, Hoifung Poon.
In ICML Workshop on Interpretable Machine Learning in Healthcare (IMLH), 2023.

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day. [Paper] [Model]
Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao.
In Proceedings of NeurIPS 2023 Datasets and Benchmarks Track (Spotlight).

Context-faithful Prompting for Large Language Models. [Paper]
Wenxuan Zhou, Sheng Zhang, Hoifung Poon, Muhao Chen.
In Findings of the Annual Conference of Empirical Methods in Natural Language Processing (EMNLP), 2023.

Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing. [Paper] [Model]
Sheng Zhang, Yanbo Xu, Naoto Usuyama, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Matthew P. Lungren, Tristan Naumann, Hoifung Poon.

BLIAM: Literature-based Data Synthesis for Synergistic Drug Combination Prediction. [Paper]
Cai Yang, Addie Woicik, Hoifung Poon, Sheng Wang.

Compositional Zero-Shot Domain Transfer with Text-to-Text Models. [Paper]
Fangyu Liu, Qianchu Liu, Shruthi Bannur, Fernando Pérez-García, Naoto Usuyama, Sheng Zhang, Tristan Naumann, Aditya Nori, Hoifung Poon, Javier Alvarez-Valle, Ozan Oktay, Stephanie L. Hyland.
In Transactions of the Association for Computational Linguistics (TACL), 2023.

Continual Contrastive Finetuning Improves Low-Resource Relation Extraction. [Paper]
Wenxuan Zhou, Sheng Zhang, Tristan Naumann, Muhao Chen, Hoifung Poon.
In Proceedings of the Sixty First Annual Meeting of the Association for Computational Linguistics (ACL), 2023.

Optimizing Bi-Encoder for Named Entity Recognition via Contrastive Learning. [Paper]
Sheng Zhang, Hao Cheng, Jianfeng Gao, Hoifung Poon.
In Proceedings of the International Conference on Learning Representations (ICLR), 2023.

Towards Structuring Real-World Data at Scale: Deep Learning for Extracting Key Oncology Information from Clinical Text with Patient-Level Supervision. [Paper]
Sam Preston, Mu Wei, Rajesh Rao, Robert Tinn, Naoto Usuyama, Michael Lucas, Roshanthi Weerasinghe, Soohee Lee, Brian Piening, Paul Tittel, Naveen Valluri, Tristan Naumann, Carlo Bifulco, Hoifung Poon.
Patterns (Cell Press), April 2023.

Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing. [Paper]
Robert Tinn, Hao Cheng, Yu Gu, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao and Hoifung Poon.
Patterns (Cell Press), April 2023.

Multilingual translation for zero-shot biomedical classification using BioTranslator. [Paper]
Hanwen Xu, Addie Woicik, Hoifung Poon, Russ Altman, Sheng Wang.
Nature Communications, 2023.

Knowledge-Rich Self-Supervised Entity Linking. [Paper]
Sheng Zhang, Hao Cheng, Shikhar Vashishth, Cliff Wong, Jinfeng Xiao, Xiaodong Liu, Tristan Naumann, Jianfeng Gao and Hoifung Poon.
In Findings of the Annual Conference of Empirical Methods in Natural Language Processing (EMNLP), 2022.

Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing. [Paper]
Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Stephanie Hyland, Maria Wetscherek, Tristan Naumann, Aditya Nori, Javier Alvarez-Valle, Hoifung Poon, and Ozan Oktay.
In Proceedings of the 17th European Conference of Computer Vision (ECCV), 2022.

BioGPT: generative pre-trained transformer for biomedical text generation and mining. [Paper] [Model]
Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu.
In Briefings in Bioinformatics, September 2022.

Combining Probabilistic Logic and Deep Learning for Self-Supervised Learning. [Paper]
Hoifung Poon, Hai Wang, Hunter Lang.
Book chapter in Pascal Hitzler, Md Kamruzzaman Sarker (eds.), Neuro-Symbolic Artificial Intelligence: The State of the Art. Frontiers in Artificial Intelligence and Applications Vol. 342, IOS Press, Amsterdam, 2022.

Modular Self-Supervision for Document-Level Relation Extraction. [Paper]
Sheng Zhang, Cliff Wong, Naoto Usuyama, Sarthak Jain, Tristan Naumann, Hoifung Poon.
In Proceedings of the Annual Conference of Empirical Methods in Natural Language Processing (EMNLP), 2021.

Breaching the curation bottleneck with human-machine reading symbiosis. [Paper]
Cliff Wong, Rajesh Rao, Taofei Yin, Cara Statz, Susan Mockus, Sara Patterson, Hoifung Poon.

Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature. [Paper] [Microsoft Biomedical Search]
Yu Wang, Jinchao Li, Tristan Naumann, Chenyan Xiong, Hao Cheng, Robert Tinn, Cliff Wong, Naoto Usuyama, Richard Rogahn, Zhihong Shen, Yang Qin, Eric Horvitz, Paul N. Bennett, Jianfeng Gao, Hoifung Poon.
In Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2021.

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. [Paper, Model, BLURB, MSR Blog, Webinar]
Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon.
Special Issue on Computational Methods for Biomedical Natural Language Processing, ACM Transactions on Computing for Health, 2021.

Targeted Adversarial Training for Natural Language Understanding. [Paper]
Lis Pereira, Xiaodong Liu, Hao Cheng, Hoifung Poon, Jianfeng Gao, Ichiro Kobayashi.
In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021.

Self-Supervised Self-Supervision by Combining Deep Learning and Probabilistic Logic. [Paper]
Hunter Lang, Hoifung Poon.
In Proceedings of the Thirty Fifth Annual Meeting of the Association for the Advancement of Artificial Intelligence (AAAI), 2021.

Adversarial Training for Large Neural Language Models. [Paper]
Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, Jianfeng Gao.

The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding. [Paper]
Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao.
In Proceedings of the Fifty Eighth Annual Meeting of the Association for Computational Linguistics (ACL), Demo paper, 2020.

DoubleTransfer at MEDIQA 2019: Multi-Source Transfer Learning for Natural Language Understanding in the Medical Domain. [Paper]
Yichong Xu, Xiaodong Liu, Chunyuan Li, Hoifung Poon, and Jianfeng Gao.
In Proceedings of BioNLP, August 2019.

Augmenting subnetwork inference with information extracted from the scientific literature. [Paper]
Sid Kiblawi, Deborah Chasman, Amanda Henning, Eunju Park, Hoifung Poon, Michael Gould, Paul Ahlquist, Mark Craven.
In PLOS Computational Biology, June 2019.

Document-Level N-ary Relation Extraction with Multiscale Representation Learning. [Paper, Code]
Robin Jia, Cliff Wong, Hoifung Poon.
In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), June 2019.

Deep Probabilistic Logic: A Unifying Framework for Indirect Supervision. [Paper, Code]
Hai Wang and Hoifung Poon.
In Proceedings of the Annual Conference of Empirical Methods in Natural Language Processing (EMNLP), November 2018.

EZLearn: Exploiting Organic Supervision in Automated Data Annotation. [Paper]
Maxim Grechkin, Hoifung Poon, Bill Howe.
In the 27th International Joint Conference on Artificial Intelligence (IJCAI), July 2018.

Estimating Accuracy from Unlabeled Data: A Probabilistic Logic Approach. [Paper]
Emmanouil A. Platanios, Hoifung Poon, Tom M. Mitchell, Eric Horvitz.
In NIPS, December 2017.

Classification of common human diseases derived from shared genetic and environmental determinants. [Paper]
Kanix Wang, Hallie Gaitsch, Hoifung Poon, Nancy J Cox, and Andrey Rzhetsky.
In Nature Genetics, August 2017.

Molecularly targeted drug combinations demonstrate selective effectiveness for myeloid- and lymphoid-derived hematologic malignancies. [Paper]
Stephen Kurtz et al.
In Proceedings of the National Academy of Sciences of the United States of America (PNAS), July 2017.

Wide-Open: accelerating public data release by automating detection of overdue datasets. [Paper] (Nature News, The Scientist, UW Today)
Maxim Grechkin, Hoifung Poon, and Bill Howe.
In PLOS Biology, June 2017.

Cross-Sentence N-ary Relation Extraction with Graph LSTMs. [Paper, Code]
Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Scott Yih.
In Transactions of the Association for Computational Linguistics (TACL), 2017.

Distant Supervision for Relation Extraction beyond the Sentence Boundary. [Paper]
Chris Quirk and Hoifung Poon.
In Proceedings of the Fifteenth Conference of the European Association for Computational Linguistics (EACL), 2017.

Compositional Learning of Embeddings for Relation Paths in Knowledge Bases and Text. [Paper]
Kristina Toutanova, Xi Victoria Lin, Wen-Tau Yih, Hoifung Poon, and Chris Quirk.
In Proceedings of the Fifty Fourth Annual Meeting of the Association for Computational Linguistics (ACL), 2016.

Representing Text for Joint Embedding of Text and Knowledge Bases. [Paper]
Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon.
In Proceedings of the Annual Conference of Empirical Methods in Natural Language Processing (EMNLP), 2015.

Model Selection for Type-Supervised Learning with application to POS Tagging. [Paper]
Kristina Toutanova, Waleed Ammar, Pallavi Choudhury, and Hoifung Poon.
In Proceedings of the SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2015.

Grounded Semantic Parsing for Complex Knowledge Extraction. [Paper]
Ankur Parikh, Hoifung Poon, and Kristina Toutanova.
In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2015.

Distant Supervision for Cancer Pathway Extraction from Text. [Paper]
Hoifung Poon, Kristina Toutanova, and Chris Quirk.
In Proceedings of the Pacific Symposium on Biocomputing, 2015.

Literome: PubMed-Scale Genomic Knowledge Base in the Cloud. [Paper]
Hoifung Poon, Chris Quirk, Charlie DeZiel, and David Heckerman.
Bioinformatics 2014; doi: 10.1093/bioinformatics/btu383

Grounded Unsupervised Semantic Parsing. [Paper]
Hoifung Poon.
In Proceedings of the Fifty First Annual Meeting of the Association for Computational Linguistics (ACL), 2013.

Probabilistic Frame Induction. [Paper]
Jackie Cheung, Hoifung Poon and Lucy Vanderwende.
In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2013.

An Exhaustive Epistatic SNP Association Analysis on Expanded Wellcome Trust Data. [Paper]
Christoph Lippert, Jennifer Listgarten, Robert Davidson, Scott Baxter, Hoifung Poon, Carl M. Kadie, David Heckerman.
In Scientific Reports, 2013, doi:10.1038/srep01099.

Sum-Product Networks: A New Deep Architecture. [Paper] [Slides] [Download code and results]
Hoifung Poon and Pedro Domingos.
In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI), 2011.
Best Paper Award

Unsupervised Ontology Induction from Text. [Paper]
Hoifung Poon and Pedro Domingos.
In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2010.

Joint Inference for Knowledge Extraction from Biomedical Literature. [Paper]
Hoifung Poon and Lucy Vanderwende.
In Proceedings of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies Conference (NAACL-HLT), 2010.

Unsupervised Semantic Parsing. [Paper] [Slides] [Download data and code]
Hoifung Poon and Pedro Domingos.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2009.
Best Paper Award

Unsupervised Morphological Segmentation with Log-Linear Models. [Paper]
Hoifung Poon, Colin Cherry, and Kristina Toutanova.
In Proceedings of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies Conference (NAACL-HLT), 2009.
Best Paper Award

Language ID in the Context of Harvesting Language Data off the Web. [Paper]
Fei Xia, William Lewis, and Hoifung Poon.
In Proceedings of the Conference of European Association for Computational Linguistics (EACL), 2009.

Joint Unsupervised Coreference Resolution with Markov Logic. [Paper]
Hoifung Poon and Pedro Domingos.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2008.

A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC. [Paper]
Hoifung Poon, Pedro Domingos, and Marc Sumner.
In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (AAAI), 2008.

Markov Logic. [Book Chapter]
Pedro Domingos, Stanley Kok, Daniel Lowd, Hoifung Poon, Matthew Richardson, Parag Singla.
In L. De Raedt, P. Frasconi, K. Kersting and S. Muggleton (eds.), Probabilistic Inductive Logic Programming, 2008.

Joint Inference in Information Extraction. [Paper] [Online Appendix]
Hoifung Poon and Pedro Domingos.
In Proceedings of the Twenty-Second National Conference on Artificial Intelligence (AAAI), 2007.

Sound and Efficient Inference with Probabilistic and Deterministic Dependencies. [Paper]
Hoifung Poon and Pedro Domingos.
In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), 2006.

Unifying Logical and Statistical AI. [Paper]
Pedro Domingos, Stanley Kok, Hoifung Poon, Matthew Richardson, Parag Singla.
In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), 2006.
Invited paper.