Ivan Tashev - Projects

Ivan Tashev
PRINCIPAL SOFTWARE ARCHITECT

Audio pipeline for Kinect

This was an enormous challenge and opportunity for me. Capturing sound hands-free from up to four meters away while the loudspeakers blast surround sound required very good design, starting from the microphones and ending with the last digital signal processing block. The overall parameters of the pipeline had to be far better than is usual for such devices. I had the opportunity to work with an exceptional engineering team, the Xbox team, on the acoustical, mechanical, and electrical design of the audio part of the device. While most of the core technologies were created before the work on Kinect started, putting them together into one end-to-end solution required a lot of engineering effort. During the design process I acted as the audio architect of the device and spent four months embedded in the Xbox team helping them to finalize it.

Kinect became the first industrial device with surround sound echo cancellation, capable of capturing human voices from up to four meters away with good enough quality for speech recognition and communication; the first open-microphone device for voice command and control (i.e., no push-to-talk button); and the fastest-selling consumer electronics device in history.

Microphone Array Project

Providing hands-free sound capture in modern computers for real-time communication and speech recognition (voice commands and dictation) is a technologically difficult task. Designing a real-time microphone array processing algorithm for optimal noise suppression, besides being an interesting research problem, brings additional challenges when the time comes to productize the software and hardware. After the first prototypes were built (a USB linear four-element array for office use and a circular eight-element array for the center of a conference room table), the difficult transition began from "microphone array algorithms" to "algorithms for manufacturable microphone arrays". The whole process includes working with product teams inside Microsoft and evangelizing this technology to laptop, tablet, and computer monitor manufacturers outside Microsoft by publishing white papers and giving talks. The project is in a very advanced phase and is set to ship as the integrated microphone array support in Windows Vista. An article about the project is posted here; more details can be found on the Microphone Array Project web page.

Control system for Universal Laser Ranging System ULIS-630

In the pre-GPS era, the way to measure precisely the geographical coordinates of a given point was to observe satellites and measure the distance to them using laser ranging systems. Universal Laser Ranging System ULIS-630 was a large international project with the participation of research and academic organizations from Eastern Europe (Academy of Sciences of the ex-Soviet Union in Moscow, Lithuanian Technical University in Riga, Bulgarian Academy of Sciences in Sofia, Moscow Electro-technical Institute, Technical University of Sofia, and others). The optical system has a so-called horizontal mount and consists of two paired telescopes: a transmitting telescope (Galileo type, focal length of 12 meters, green laser with 1 J of energy in a 1 ns pulse) and a receiving telescope (Cassegrain type, 630 mm main mirror diameter, 11.5 meter focal length, and a photomultiplier tube as the receiver). It is controlled by a distributed computer system that tracks the visible trajectory of the satellite, fires the laser, and collects and processes the results. My work here became the core of my Ph.D. thesis. The system itself, besides everything else, offered a decent view of planets and celestial objects (see picture of Jupiter here).
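
The ranging principle described above can be illustrated with a short calculation. This is a minimal sketch, not the ULIS-630 control software: it converts the measured round-trip time of a laser pulse into a one-way distance, assuming propagation in vacuum and ignoring atmospheric refraction and instrument delays. The function name and the 40 ms round-trip time are hypothetical and chosen only for illustration.

```python
# Speed of light in vacuum, m/s (exact by SI definition).
C = 299_792_458.0


def range_from_round_trip(round_trip_s: float) -> float:
    """Return the one-way distance in meters for a round-trip time in seconds.

    Assumption: vacuum propagation, with no atmospheric or
    instrument-delay corrections applied.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels to the satellite and back, hence the division by two.
    return C * round_trip_s / 2.0


# Hypothetical example: a pulse that returns after 40 ms.
distance_m = range_from_round_trip(0.040)

# Range interval spanned by a 1 ns pulse, the pulse length quoted above.
pulse_extent_m = range_from_round_trip(1e-9)
```

Under these assumptions, a 1 ns pulse spans roughly 15 cm of range.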

Application Center 2000 Pre-Flight Checks

Microsoft Application Center is management software for web and component clusters. Targeting mid-range web sites, it offers a set of innovative features. The "Application Center 2000 Pre-Flight Checks" document was designed to improve the out-of-the-box experience of IT personnel, but became the unofficial tutorial for the product. I enjoyed finding the easiest way to demonstrate the product features with a minimum of additional software. Parts of this document were used in the Application Center Resource Kit.

Distributed Meetings System

This was my first project at Microsoft Research. It was a big, complex system for meeting capture and recording. The capture devices were a 360-degree RingCam, an eight-element circular microphone array, an overview camera, a whiteboard camera, etc. The project combined technologies designed in the Collaboration, Communication and Multimedia Team and included the deployment of ten of these systems in various conference rooms at Microsoft to study user behavior and gather feedback. See our ACM paper for more details.

Dereverberation Project

The project started with the simple goal of reducing the word error rate of speech recognition at distances of up to 1.5 meters. This was the summer project of my intern Daniel Allred, a Ph.D. student at Georgia Tech. During his second internship at Microsoft Research the project was extended and re-scoped to cover some perceptual scenarios as well. See his final presentation here. More details can be found on the Dereverberation project page.

Speaker Array Project

This is a pure research project for now. With my colleagues Jasha Droppo and Mike Seltzer, we decided to see what could be reused from our experience in beamforming design for microphone arrays. The loudspeaker array consists of sixteen inexpensive speakers in a linear geometry. The project was demonstrated during Microsoft Research TechFest 2007 as "Personal Audio Space" and definitely had the "Wow!" effect on the visitors to our booth. We demonstrated focusing the sound in a given area, and a dual-beam mode in which you hear one music channel in one place and a second music channel in another. The attending journalists liked the demo and it was widely covered in the press: WIRED Blog Network, Seattle PI, MIT Technology Review, the MSR web site, and many others, in different languages and from different countries. Currently we are exploring various scenarios and potential applications for this technology. For more, see the project page here.
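
The idea of focusing sound with a linear loudspeaker array can be sketched with classic delay-and-sum focusing. This is an illustrative example only, not the design used in the project, which is not described here. It computes per-speaker delays so that the wavefronts from all sixteen speakers arrive at a chosen point at the same time. The 5 cm spacing, the focus point 2 m in front of the array, the speed of sound, and the function name `focus_delays` are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air


def focus_delays(num_speakers, spacing_m, focus_xy):
    """Return per-speaker delays (seconds) that align arrivals at focus_xy.

    The speakers are assumed to lie on the x axis, centered at the origin,
    with uniform spacing. The focus point is given as (x, y) in meters.
    """
    half = (num_speakers - 1) / 2.0
    positions = [(i - half) * spacing_m for i in range(num_speakers)]
    fx, fy = focus_xy
    distances = [math.hypot(fx - x, fy) for x in positions]
    farthest = max(distances)
    # The farthest speaker plays first (zero delay); nearer speakers wait
    # so that every wavefront reaches the focus point simultaneously.
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]


# Hypothetical setup: 16 speakers, 5 cm apart, focus 2 m straight ahead.
delays = focus_delays(16, 0.05, (0.0, 2.0))
```

A dual-beam mode like the one demonstrated could, under the same assumptions, be approximated by computing one set of delays per channel and summing the two delayed signals at each speaker.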