The Role of Signal Processing in the Multimedia Communications Revolution

Lawrence Rabiner, Center for Advanced Information Processing, Rutgers University, USA

 

Getting Internet Video Ready for Prime Time  

Bernd Girod, Stanford University, USA

 

Advances in Mobile Computing   

Ya-Qin Zhang, Microsoft, USA

 

The keynote will be presented by Jenq-Neng Hwang on Ya-Qin Zhang's behalf.

 

A personal history of Perceptual Coding -- Tripping over the cobblestones to MP3 and beyond

James D. (JJ) Johnston, Microsoft, USA

 

KEYNOTE 1

 

When:            Wednesday, October 4, 2006, 8:15 AM – 9:15 AM

Where:           Crystal Ballroom

 

Title:              The Role of Signal Processing in the Multimedia Communications Revolution

 

Speaker:        Lawrence Rabiner

Center for Advanced Information Processing, Rutgers University, USA

lrr@caip.rutgers.edu

 

Abstract:

 

We are now in the midst of a Multimedia Communications Revolution in which virtually every aspect of telecom is changing in ways that would have been considered unthinkable just a decade or so ago.  Perhaps the greatest challenge in realizing this communications revolution is to figure out how to provide a range of new services that seamlessly integrate text, sound, image, and video information, and to do it in a way that preserves the ease of use and interactivity of conventional telephony, irrespective of the bandwidth or means of access of the connection to the service.  To achieve this overarching goal, a number of technological problems must be considered, including:

 

·       compression and coding of multimedia signals, including algorithmic issues, standards issues, and transmission issues;

·       synthesis and recognition of multimedia signals, including speech, images, handwriting, and text;

·       organization, storage, and retrieval of multimedia signals;

·       access methods to the multimedia signal;

·       searching;

·       browsing.

 

In each of these areas a great deal of progress has been made in the past few years, driven in part by the relentless growth in processing and storage capacity of VLSI chips, and in part by the availability of broadband access to and from the home and to and from wireless connections. 

It is the purpose of this talk to review the status of the technology in each of the areas listed above and to illustrate some of the challenges and limitations of current capabilities.

 

Speaker’s bio:

 

Lawrence Rabiner was born in Brooklyn, New York, on September 28, 1943.  He received the S.B. and S.M. degrees simultaneously in June 1964, and the Ph.D. degree in Electrical Engineering in June 1967, all from MIT.

From 1962 through 1964, he participated in the cooperative program in Electrical Engineering at AT&T Bell Laboratories.  During this period Dr. Rabiner worked on digital circuitry, military communications problems, and problems in binaural hearing.  Dr. Rabiner joined AT&T Bell Labs in 1967 as a Member of the Technical Staff.  He was promoted to Supervisor in 1972, Department Head in 1985, Director in 1990, and Functional Vice President in 1995.  He joined AT&T Labs in 1996 as Director of the Speech and Image Processing Services Research Lab, and was promoted to Vice President of Research in 1998, where he managed a broad research program in communications, computing, and information sciences technologies.  Dr. Rabiner retired at the end of March 2002 and is now a Professor of Electrical and Computer Engineering at Rutgers University and the Associate Director of the Center for Advanced Information Processing (CAIP).  He also has a joint appointment as a Professor of Electrical and Computer Engineering at the University of California at Santa Barbara.

 

 

KEYNOTE 2

 

When:            Thursday, October 5, 2006, 8:15 AM – 9:15 AM

Where:           Crystal Ballroom

 

Title:              Getting Internet Video Ready for Prime Time

 

Speaker:        Bernd Girod

Stanford University, USA

bgirod@stanford.edu

 

Abstract:      

A decade after the introduction of video streaming, Internet video is finally getting ready for prime time. Despite the well-known challenges of congestion, packet loss, and delay jitter, Internet video will soon look better than conventional broadcast television. This is in no small measure due to advances in media processing and communication, which enable efficient and robust media delivery. In this talk, I review recent advances and current challenges in Internet video delivery and consider some of the key questions of real-time transport. Is best-effort good enough? How hard are media delivery deadlines? How can congestion be avoided? Should transport mechanisms be media-aware? Should we bother with packet scheduling? Can multipath routing help? I will argue that a cross-layer paradigm comprising network-adaptive media processing and media-aware transport is essential for superior system performance and show examples from our current research on IPTV delivery over wireless home networks and P2P live video multicast.

 

 

Speaker’s bio:         

                       

Bernd Girod is Professor of Electrical Engineering and (by courtesy) Computer Science in the Information Systems Laboratory of Stanford University, California. He was Chaired Professor of Telecommunications in the Electrical Engineering Department of the University of Erlangen-Nuremberg from 1993 to 1999. His research interests are in the areas of networked media systems and video signal compression. Prior visiting or regular faculty positions include MIT, Georgia Tech, and Stanford. He has been involved with several startup ventures as founder, director, investor, or advisor, among them Vivo Software, 8x8 (Nasdaq: EGHT), and RealNetworks (Nasdaq: RNWK). Since 2004, he has served as the Chairman of the new Deutsche Telekom Laboratories in Berlin. He received the Engineering Doctorate from the University of Hannover, Germany, and an M.S. degree from the Georgia Institute of Technology. Prof. Girod is a Fellow of the IEEE.

 

 

KEYNOTE 3

 

When:            Friday, October 6, 2006, 8:15 AM – 9:15 AM

Where:           Crystal Ballroom

 

Title:             Advances in Mobile Computing

 

Speaker:        Ya-Qin Zhang

Microsoft

yzhang@microsoft.com

 

The keynote will be presented by Jenq-Neng Hwang on Ya-Qin Zhang's behalf.

Abstract:      

We see a continued convergence of the mobile, computer, and consumer electronics industries, with rapid advances in smart devices, communications and networking, and new applications and services.  New intelligent devices are emerging with powerful 32-bit embedded processors and multi-tasking operating systems. The continued evolution from 2G/2.5G to 3G and advances in PAN/LAN/WAN lead to an all-IP infrastructure with high-speed access, multi-radio technology, always-on capability, and seamless connectivity. While voice continues to be a critical driving force for synchronous communications, new data-centric applications, such as messaging, media, push-to-talk, email, web browsing, location-based services, and corporate data access, create the most exciting opportunities for operators, OEMs/ODMs, developers, consumers, and businesses.

This talk presents Microsoft’s vision of seamless mobile computing that enables (a) deep connectivity of mobile devices with desktop PCs, backend servers, the web, and other devices; (b) automatic detection, seamless roaming, and soft handover in a multi-radio environment with the “best” QoS and a consistent user experience; (c) a natural user interface with voice dialing, voice command, TTS, ink, and vision; and (d) a powerful platform and ecosystem with compelling applications and services developed by ISVs, OEMs, and operators. The talk will discuss new advances in the embedded and mobile space, and in particular highlight a plethora of new devices built on the Windows CE, PocketPC, and Smartphone platforms. The talk will also touch on a few examples of our active research work on mobility and networking across Microsoft Research labs, including seamless roaming, mobile media, navigation, and mesh networks.

 

Speaker's bio:

Ya-Qin Zhang is Corporate Vice President of Microsoft Corporation and President of the Microsoft China Research and Development Group. He was previously the Corporate Vice President responsible for product development in Microsoft’s Mobile and Embedded Division, including the WinCE operating system, Smartphone, PocketPC, and other Windows Mobile platforms and devices. Before that, he was the Managing Director of Microsoft Research Asia, Microsoft’s basic research arm in the Asia-Pacific region. From 1994 to 1999, he was the Director of the Multimedia Technology Laboratory at Sarnoff Corporation in Princeton, NJ (RCA Laboratories). He was with GTE (now Verizon) Corp. in Waltham, MA from 1989 to 1994. He has published over 200 refereed papers in leading international conferences and journals. He has been granted over 50 US patents in digital video, Internet, multimedia, wireless, and satellite communications. Many of the technologies he and his team developed have become the basis for start-up ventures, commercial products, and international standards. He serves on the Board of Directors of five high-tech IT companies. He served as the Editor-in-Chief of the IEEE Transactions on Video Technology, and has served on the editorial boards of seven other professional journals and on over a dozen conference committees. He has been a key contributor to the ISO/MPEG and ITU standardization efforts in digital video and multimedia. Ya-Qin is a Fellow of the IEEE.

Ya-Qin received his B.S. and M.S. in Electrical Engineering from the University of Science and Technology of China (USTC) in 1983 and 1985. He received his Ph.D. in Electrical Engineering from George Washington University, Washington, D.C., in 1989. He received executive business training at Harvard University.

 

 

KEYNOTE 4

 

 

When:            Banquet                     

Where:           TBA

 

Title:              A personal history of Perceptual Coding -- Tripping over the cobblestones to MP3 and beyond

 

Speaker:        James D. (JJ) Johnston

Microsoft

jamesdj@windows.microsoft.com

 

Abstract:      

 

Somewhere in the middle of the 1970s, while I was working on speech coding at Bell Labs, it became completely obvious that there was more to the world than SNR or any kind of Mean Squared Error criterion.  When the Alliant™ minicomputers arrived at Bell Labs in the early 1980s, I wrote a test program for the first one that built a perceptual model and applied it to an FFT filterbank (overlap-add, non-critically sampled), using lots of features of the new language then available, mostly in order to test the performance of the computer. The memory size of the older computers had prevented this kind of work; it would not fit in the memory available.  The results were surprising: considering perception led to an enormous drop in the necessary bit rate, so much so that I had to spend some time ensuring that the results weren’t just a wild mistake.   The end result of this programming exercise was called “PXFM”. After working on PXFM, Bob Safranek and I, with some collaboration from Phil Chou (then a new Bell Labs MTS) and Ruth Rosenholtz, created a perceptual image coder called “PIC”, and determined rather quickly that even primitive coding methods, along with some guidance from perceptual models, worked remarkably well for images, too.

Just about then, a surprise encounter with Karlheinz Brandenburg at an ICASSP, where we discovered that we could have literally given each other’s posters without prior study, started a long collaboration with Karlheinz, and then with Jurgen Herre, of FHG-IIS in Erlangen. From that point we went to work in MPEG, back to the lab for a while, and then back to MPEG, with MP3 and the MPEG-2 AAC standards as a result.   From there, I moved on to study the interactions of the human being with a soundfield, then into soundfield capture techniques, and then into mechanisms for “fixing” the playback system in less than optimal circumstances, each time considering the response of the human auditory system as the primary issue.   My experience over all these years is that when you’re designing anything that will be presented to the human being, be it audio, images, video, or some combination of these, the study of human perception needs to be a primary consideration.

 

 

Speaker's bio:

 

JJ is currently employed at Microsoft Corporation as an audio architect. He is retired from AT&T Labs - Research in Florham Park, NJ, where he worked in the Speech Processing Software and Technology Research Department. Before that, he was employed by AT&T Bell Laboratories, in the Acoustics Research Department under Dr. J. L. Flanagan, and in the Signal Processing Research Department.

His original assignments involved using analog signal processing to do speech coding (APCM, ADPCM, SBC) for testing of algorithms, sampling rates, and quantizer resolutions. His first IEEE paper detailed the hardware construction of an ADPCM implementation using analog multipliers and integrators to provide both step-size and predictor "calculation", in a form that allowed sampling rate and quantizer resolution changes.

Since then, he has worked in analog signal processing, speech coding, voice privacy, quadrature mirror filter design, and perceptual coding of both audio and images. During his work on perceptual audio coding, he was the primary investigator of the early PXFM audio coder, which was reported at the ASSP Digital Audio Meeting in Mohonk, NY in 1986, and a co-inventor and standards proponent of the ASPEC algorithm, the quality leader in the MPEG-1 audio competition.

During this time, he also investigated the coding of still-frame images using a forward-driven perceptual model with Dr. R. J. Safranek, also of AT&T Bell Laboratories. This image coder, called PIC (for Perceptual Image Coder), used very simple techniques to provide state-of-the-art still-image compression. He was until recently the primary researcher and inventor of AT&T's contributions to the MPEG-2 AAC audio coding algorithm. He also represented AT&T in the ANSI-accredited group X3L3.1, and X3L3.1 in the ISO-MPEG-AUDIO (AAC) arena in support of the AAC algorithm.

He received his BSEE and MSEE from Carnegie-Mellon University, with side interests in mathematics, radio broadcasting, and coherent image signal processing. In 1997, he was elected a Fellow of the Audio Engineering Society for his work on perceptual coding of audio. In February 2001, he received a New Jersey Inventor of the Year award for his contributions to MP3 and audio coding in general. He was elected a Fellow of the IEEE in 2002. He is the 2006 recipient of the IEEE James L. Flanagan Speech and Audio Processing Technical Field Award, "For pioneering research in Perceptual Audio Coding and Contributions to its Standardization".