JunLin Li, Li-wei He, and Dinei Florencio
Traditional multiparty audio conferencing uses a star-shaped topology where all the clients connect to a central MCU (Multipoint Control Unit). The MCU mixes the signals from the speakers, encodes it, and sends back the encoded signal to each client. To prevent the speakers from hearing their own voices, the MCU has to produce and encode a different mixed signal for each speaker. As a result, the CPU load on the MCU increases proportionally to the number of speakers in the conference. In this paper, we introduce a new conferencing architecture, where the MCU produces a single encoded signal sum of all received signals and each client is responsible for removing its own signal if necessary. This architecture can substantially reduce CPU load on the MCU. The major challenge, however, is that the client’s original speech is non-linearly distorted by the MCU encoding process. Simply subtracting the original speech from the mixed signal would produce an echo-like distortion. We solve that problem using a novel algorithm which completely removes the echo with minimal artifacts. Mean Opinion Score (MOS) results imply that the proposed algorithm works well, making the proposed multiparty audio conferencing architecture promising.