The Media Computing Group aims to advance the evolution of multimedia and computing technology. Our research enables new computing experiences such as intelligent human-machine-cloud interaction and seamless resource sharing by effectively understanding and virtualizing the inputs and outputs of computing devices in the network. Envisioning the future of real-time communication and online collaboration, we also develop an end-to-end neural multimedia communication framework, exploring the potential of AI to deliver real-time, intelligent, and fully immersive meeting experiences. Current research interests include, but are not limited to, video analytics, computer vision, audio and speech, media compression, and real-time communication.

Research Topics

Computer Vision – We conduct our computer vision research mainly in three areas: scene understanding, visual recognition, and visual media manipulation. Specifically, we tackle fundamental problems and promote applications including 2D/3D scene parsing, 3D reconstruction, 2D/3D object detection, video classification, video object segmentation, multi-view correspondence learning, and video enhancement and retouching. We sustain excellence in academic research and also contribute our advanced techniques to Microsoft products, such as video background blur/replacement and Together Mode for virtual group meetings in Microsoft Teams.
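As a flavor of how segmentation drives background blur/replacement, the sketch below composites a frame with a Gaussian-blurred copy of itself using a person mask. It is a minimal illustration, not the Teams implementation; the mask is assumed to come from any person-segmentation model, and `blur_background` is a hypothetical helper name.

```python
# Minimal sketch of segmentation-based background blur (illustrative only,
# not the Teams implementation). `person_mask` is assumed to come from a
# person-segmentation model and is a float mask in [0, 1] (1 = person).
import cv2
import numpy as np

def blur_background(frame: np.ndarray, person_mask: np.ndarray,
                    ksize: int = 31) -> np.ndarray:
    """Blur everything outside the person mask and composite the result."""
    blurred = cv2.GaussianBlur(frame, (ksize, ksize), 0)
    alpha = person_mask[..., None].astype(np.float32)        # HxWx1 alpha
    out = alpha * frame.astype(np.float32) + (1.0 - alpha) * blurred
    return out.astype(np.uint8)

# Background replacement is the same compositing step with `blurred`
# swapped for an arbitrary background image of the same size.
```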

Audio and Speech – Our research aims to provide real-time, intelligent audio and speech technologies for real-world applications. We are rolling out a new "blur for voice" feature in Microsoft Teams that uses deep speech enhancement to eliminate distracting background noise. We also apply AI to fill in voice gaps so that speech sounds like a steady stream of conversation. Other related research topics include AI-based echo cancellation, speech super-resolution, speech recovery, and quality control in a real-time audio pipeline.
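The sketch below illustrates the common mask-based approach to deep speech enhancement: estimate a time-frequency mask and apply it to the noisy spectrogram. It is a minimal illustration under assumptions, not the Teams pipeline; `predict_mask` stands in for a trained denoising network.

```python
# Minimal sketch of mask-based speech enhancement (illustrative only).
# `predict_mask` represents a trained denoising model that outputs a
# time-frequency mask in [0, 1] with the same shape as the spectrogram.
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy: np.ndarray, sr: int, predict_mask) -> np.ndarray:
    """Suppress background noise by masking the noisy spectrogram."""
    _, _, spec = stft(noisy, fs=sr, nperseg=512)        # complex spectrogram
    mask = predict_mask(np.abs(spec))                    # hypothetical model call
    _, clean = istft(spec * mask, fs=sr, nperseg=512)    # back to a waveform
    return clean
```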

Media Compression – We develop advanced media compression technologies for image, video, and graphics. One highlight is the screen codec (a.k.a. Titanium) that we developed to improve the screen-sharing experience in various Microsoft products. Our group is also an active contributor to video coding standards such as MPEG-4, H.264/AVC, and H.265/HEVC. Our current focus is developing an AI-powered media compression framework.
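As a rough illustration of what a learned, AI-powered codec involves, the sketch below shows a toy image codec in PyTorch with an analysis transform, quantization, and a synthesis transform. The class name, layer sizes, and training trick are illustrative assumptions; a practical codec would add an entropy model, rate-distortion training, and an arithmetic coder.

```python
# Minimal sketch of a learned image codec (illustrative assumptions only).
import torch
import torch.nn as nn

class TinyLearnedCodec(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.encode = nn.Sequential(                     # analysis transform
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2))
        self.decode = nn.Sequential(                     # synthesis transform
            nn.ConvTranspose2d(channels, channels, 5, stride=2,
                               padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 5, stride=2,
                               padding=2, output_padding=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.encode(x)
        # Additive uniform noise approximates rounding during training;
        # at inference the latent would be rounded and entropy-coded.
        y_hat = (y + (torch.rand_like(y) - 0.5)) if self.training else torch.round(y)
        return self.decode(y_hat)
```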

AI-based RTC Optimization – Many rules and heuristics developed with a traditional systems approach can result in suboptimal performance. The latest advances in AI can be leveraged to replace these rules with models trained on real-world data. Reinforcement-learning-based RTC optimization represents a paradigm shift toward AI-based software engineering. Our research aims to advance AI that improves the quality and reduces the latency of audio, video, and screen sharing.
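To make the paradigm concrete, the sketch below frames sending-bitrate selection as a reinforcement-learning problem: network measurements form the state, candidate bitrates are the actions, and the reward trades off quality against latency and loss. All names, features, and reward weights are illustrative assumptions, not a production controller.

```python
# Minimal sketch of RL-style bitrate control for RTC (illustrative only;
# the state features, reward weights, and policy are assumptions).
import random
from dataclasses import dataclass

BITRATES_KBPS = [300, 600, 1200, 2500]        # discrete action space

@dataclass
class NetworkState:
    throughput_kbps: float
    rtt_ms: float
    loss_rate: float

def reward(state: NetworkState, bitrate: int) -> float:
    """Favor higher delivered quality; penalize latency and packet loss."""
    quality = min(bitrate, state.throughput_kbps)
    return quality / 1000.0 - 0.01 * state.rtt_ms - 50.0 * state.loss_rate

class EpsilonGreedyPolicy:
    """Toy value-based policy; a real system would use a trained model."""
    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.value = {b: 0.0 for b in BITRATES_KBPS}
        self.count = {b: 0 for b in BITRATES_KBPS}

    def act(self) -> int:
        if random.random() < self.epsilon:
            return random.choice(BITRATES_KBPS)          # explore
        return max(self.value, key=self.value.get)       # exploit

    def update(self, bitrate: int, r: float) -> None:
        self.count[bitrate] += 1
        self.value[bitrate] += (r - self.value[bitrate]) / self.count[bitrate]
```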