Microsoft’s Musical Talent
By Hui Ma, China Internet Weekly
September 20, 2009 1:00 PM PT

Allan is at peace — a cup of tea by his side, a novel in his hands, the voices of Lisa Ono and then Norah Jones providing the soft accompaniment for a cozy afternoon spent lounging in a wicker chair. Allan’s invigorating time at the gym, however, requires a different soundtrack — punk rock, for example, that can keep pace with a run on the treadmill. Music Steering technology, developed by Microsoft Research Asia, is smart enough to pick music most suitable to one’s situation or preference — regardless of whether you appreciate R&B or blues, or are driving or jogging. Technology is getting to know people through human-machine interactions.

"I spend most of my working time dealing with sounds," said Dr. Lie Lu of the Speech Group at Microsoft Research Asia, after singing a love song in place of an opening remark. Holding his cell phone close to his mouth, he sang, "You ask how deeply I love you, and how much I love." In a few seconds, Teresa Teng's well known "The Moon Represents My Heart" track stored on the phone was automatically located and played.

Searching for songs just by humming a melody is only one of Lu and his team’s numerous voice- and music-related research subjects.

In 2002, Microsoft Research Asia began basic research on rhythm analysis, music classification, and music mood detection, and in 2005 set out to bring these features together into a convenient application. Throughout the process, Lu and his colleagues gathered user feedback to better understand people's music appreciation habits. Lu said he was constantly discussing interface details with designers at Microsoft Research Asia to make the application cooler and more convenient for users.

Music knows no national or linguistic boundaries, and can have real emotional resonance among people. Lu and his colleagues strive to really understand music and to enable music players to understand people's minds. The result of their effort is Allan’s blissful afternoon of soft tunes and his punk-powered workout session at the fitness center.

A Music Player That Knows Its User

Choosing appropriate music from the hundreds of songs stored on a music player (such as a Zune, an iPod, or a smartphone) is a common task. "You may not have the time to dig for the songs you crave, and you may also prefer different styles now and then," said Lu. "The ‘shuffle’ function in products on the market is merely able to play music in a random order to meet users’ basic needs." Lu and his colleagues were motivated to solve this problem with their Music Steering technology.

"Music Steering provides a 'smart shuffle' function that enables users to select and enjoy music more conveniently," said Lu, adding that users with "Music Steering" only need to select one piece of music and the system will automatically generate a "music station" that recommends pieces (from the player’s storage) similar to the selected song. Users can also choose pieces suitable for given circumstances using the Mood Filter — selecting, for example, soft music for reading. In this process, the machine makes educated guesses about the users’ preferences before composing a preliminary list. Users can either listen to the recommended songs, or continue to filter the list by removing unwanted items. The machine then performs further analysis of the user's musical preferences based on this feedback and automatically adjusts the content of the Music Station. In this way, recommendations get increasingly closer to the user’s "taste." With an interactive music list generated through continuous content analysis, recommendations and filtering, users can easily find the songs they desire.

A Simple Song

As Lu and his colleagues see it, even a song that seems very simple still contains various musical elements including different styles, instruments, tones, melodies and rhythms. So a “simple song” doesn’t really exist.

Lu’s career as a researcher of music has turned his ears into scalpels that are able to deconstruct songs into smaller elements: rhythm, musical instruments, and tones. "We now characterize music based on 10 selected elements. Through quantification, detection and classification, a rough framework was developed for a basic description of musical pieces," Lu explained. "The ‘style’ criterion has more than a dozen categories such as pop, country, rock, and blues, and the same goes for ‘instrument’; emotional feelings fall into three categories: positive, negative, and neutral; ‘rhythm’ is based on intensity and speed."
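The kind of description Lu outlines could be represented roughly as follows. This is only a sketch: it covers the four criteria named in the quote (style, instrument, emotion, rhythm) rather than all 10 elements, and the `SongDescriptor` type, the `filter_songs` helper, the 0-to-1 scales, and the sample songs are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Mood(Enum):
    # The three emotional categories mentioned in the article.
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

@dataclass(frozen=True)
class SongDescriptor:
    title: str
    style: str         # e.g. "pop", "country", "rock", "blues"
    instrument: str    # dominant instrument (assumed single-valued here)
    mood: Mood
    intensity: float   # rhythm intensity, assumed scale 0..1
    speed: float       # rhythm speed, assumed scale 0..1

def filter_songs(songs, mood=None, max_intensity=1.0):
    """Keep songs matching a mood (if given) and an intensity ceiling."""
    return [s for s in songs
            if (mood is None or s.mood is mood) and s.intensity <= max_intensity]

songs = [
    SongDescriptor("quiet_evening", "blues", "piano", Mood.NEUTRAL, 0.2, 0.3),
    SongDescriptor("gym_anthem", "rock", "guitar", Mood.POSITIVE, 0.9, 0.9),
]

# Something soft for reading: low rhythm intensity, any mood.
for song in filter_songs(songs, max_intensity=0.4):
    print(song.title)
```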

The most difficult part of music analysis is a situation in which a variety of musical instruments and multiple melodies are mixed. Songs of varying styles are played with different combinations of instruments, and have complex connections among verses. Worse, there is no standard definition of tones in the academic community, so the research team had to extract music characteristics in various ways.

Lu said he wants Music Steering to work as an algorithm that can decide which elements of music count more than others, so as to better identify user preferences. When users have only a rough idea of the songs they want to listen to, the Mood Filter can help them set their options. In addition, users can pre-set a number of scenarios ("before sleep," "exercise," or "reading," to name just a few) to help search for suitable songs. Some of these ideas are still premature or conceptual, but Lu and his colleagues are bringing them closer to reality.
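One simple way to picture "deciding which elements count more" is a set of per-element weights that shift with each piece of feedback. The sketch below is a hypothetical heuristic, not the team's algorithm: it assumes every element is scored from 0 to 1, and the function names and update rule are invented for illustration.

```python
def weighted_distance(a, b, weights):
    """Distance between two songs, with each element scaled by its weight."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

def update_weights(weights, seed, song, liked, rate=0.5):
    """Adjust element weights from one piece of feedback (assumed heuristic).

    If a rejected song differs from the seed on an element, that element
    probably matters to the listener, so its weight grows. If a liked song
    differs on an element, that element probably matters less.
    """
    adjusted = []
    for w, s, x in zip(weights, seed, song):
        gap = abs(s - x)  # 0..1 when both scores lie in 0..1
        factor = 1 - rate * gap if liked else 1 + rate * gap
        adjusted.append(w * factor)
    total = sum(adjusted)
    return [w / total for w in adjusted]  # renormalize so weights sum to 1

# Assumed element order: style, rhythm, mood.
weights = [1 / 3, 1 / 3, 1 / 3]
seed = [0.2, 0.3, 0.8]
rejected = [0.2, 0.9, 0.8]  # differs from the seed only in rhythm

weights = update_weights(weights, seed, rejected, liked=False)
print(weights)  # rhythm now carries the largest weight
```

Feeding the updated weights into `weighted_distance` makes songs with a mismatched rhythm look further from the seed on the next pass.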

Microsoft's Voice of Innovation

"I have been researching voice and audio processing since my first day here at Microsoft, and gradually I’ve been able to do precise detection. The accuracy rate of automated music annotation now stands at no higher than 60 percent, but users are already reasonably satisfied." Music Steering technology will likely mature alongside Lu’s own growth at Microsoft Research Asia.

"We are going to improve,” Lu said confidently. “Most of our current property settings merely target Western pop music instead of classical music, and the special attributes of songs in Chinese, Japanese and Korean are still not accurate enough. Moreover, some of the training data we have is not necessarily high in quality because none of us are professional musicians. So we would like to work with companies in this sector to use their professional training models to enable more accurate and complete automatic music analysis."

Music Steering technology can be used in handheld devices, music players and computers, but its automatic music analysis module requires high-intensity computation, so Lu and his team are trying to improve its speed, applicability, and integrity without losing accuracy.

Putting technical achievements into Microsoft products is something that Lu and his colleagues take pride in. The Microsoft Product Group is now very interested in Music Steering, so Lu is frequently exchanging ideas with counterparts in other teams for product integration tests. During this process, cells of innovation, one after another, are being implanted into Microsoft’s dynamic future.

Translated From