Michiel Bacchiani


Employment
Senior Staff Research Scientist, Google Inc., New York, NY (2010 - present)
Focus on novel algorithms for speech recognition. Manage a team of nine scientists and software engineers focused on transcription technology. Support automatic captioning for YouTube videos (press coverage: New York Times, AP, ZDNet, Wall Street Journal, Businessweek, PC World, Telegraph) and voicemail transcription for Google Voice (press coverage: Pogue's New York Times).
As of May 2013, manage the acoustic modeling team of the speech group at Google. Responsible for developing new algorithms and infrastructure for the acoustic models backing all Google speech properties, including its flagship VoiceSearch application. Manage a team of 16 research scientists and software engineers.

Staff Research Scientist, Google Inc., New York, NY (2005 - 2010)
Tech-lead manager as of 2008 for the large vocabulary continuous speech transcription effort. Principal architect of the acoustic modeling training infrastructure for automatic speech recognition.

Member of Technical Staff, IBM Research, Yorktown Heights, NY (2004-2005)
Led the development for a competitive automatic speech recognition evaluation within the TC-STAR speech-to-speech translation project in English and Spanish.

Senior Technical Staff Member, AT&T Labs -- Research, Florham Park, NJ (1999-2004)
Designed, trained and implemented an LVCSR system for voicemail transcription. Combined transcription, information extraction and information retrieval to create the SCANMAIL voicemail navigation prototype. Adapted the SCANMAIL system for a network center customer care application. Developed an LVCSR system for Broadcast News transcription and processed the 600-hour corpus for the Spoken Document Retrieval track of the DARPA TREC-8 evaluation.

Research Associate/Visiting Researcher, Advanced Telecommunication Research laboratories, Kyoto, Japan (1993, 1995, 1996, 1997, 1998)
Worked on simultaneous training of a feature extractor and neural network classifier using the minimum classification error criterion. Initial development of an automatic unit design algorithm. Design and implementation of algorithms for LVCSR such as multi-pass search using lattice re-scoring and vocal tract length normalization.

Professional Activities
Elected Member of the Speech Technical Committee of the IEEE Signal Processing Society (first term 2003-2006, second term 2013-2016)
Editorial Board Member of Speech Communication

Workshop Co-organizer
Organized the International Speech Communication Association (ISCA) tutorial and research workshop on ``Prosody in Speech Recognition and Understanding'', October 22-24, 2001, together with Julia Hirschberg and Diane Litman.

Journal Reviewer
  • IEEE Transactions on Speech and Audio Processing
  • Speech Communication
  • Pattern Analysis and Applications
  • Knowledge and Information Systems

Invited Talks
Research Lab at I/O, 2014
Life of a Mobile Interaction

Keynote at ISCSLP 2014
Large Scale Neural Network Optimization for Mobile Speech Recognition Applications

Research and Teaching
Summer Intern Mentor, Google Inc., 2014
Focus on use of articulatory feature bottleneck feature extraction for speech recognition.

Summer Intern Mentor, Google Inc., 2007
Focus on use of confidence scores in acoustic model adaptation; resulted in an ICASSP publication.

Summer Intern Mentor, AT&T Labs, 2003
Focus on name modeling within a voicemail transcription system; resulted in an ICASSP publication.

Research Assistant, Boston University, 1996-1999
Focus on LVCSR algorithms, in particular automatic, data-driven design of acoustic units.

Invited Participant, Summer Workshop on Speech Recognition at Johns Hopkins University, 1996
Focus on hidden speaking mode conditional modeling for automatic speech recognition.

Intern Researcher, Institute for Perceptual Research (IPO), Eindhoven, The Netherlands, 1994
Focus on automatic tracking of formants in simulated response patterns of neurons in the cochlear nucleus using hidden Markov models.


Education
Ph.D., Boston University, Boston, 1999
Advisor: Mari Ostendorf, Thesis: Speech recognition system design based on automatically derived units.

Ir. (MS Electrical Engineering), Technical University Eindhoven, Eindhoven, The Netherlands, 1994
Thesis: Automatic detection of formant tracks in simulated response patterns of neurons in the cochlear nucleus.


Publications

M. Bacchiani, A. Senior and G. Heigold, ``Asynchronous, Online, GMM-free Training of a Context Dependent Acoustic Model for Speech Recognition,'' In Proceedings of the European Conference on Speech Communication and Technology September 2014, Singapore. [pdf]

E. McDermott, G. Heigold, P. Moreno, A. Senior and M. Bacchiani, ``Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks: Towards Big Data,'' In Proceedings of the European Conference on Speech Communication and Technology September 2014, Singapore. [pdf]

C. Kim, K. Chin, M. Bacchiani and R. Stern, ``Robust speech recognition using temporal masking and thresholding algorithm,'' In Proceedings of the European Conference on Speech Communication and Technology September 2014, Singapore. [pdf]

M. Bacchiani and D. Rybach, ``Context Dependent State Tying for Speech Recognition using Deep Neural Network Acoustic Models,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing May 2014, Florence, Italy. [pdf]

G. Heigold, E. McDermott, V. Vanhoucke, A. Senior and M. Bacchiani, ``Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing May 2014, Florence, Italy. [pdf]

A. Senior, G. Heigold, M. Bacchiani and H. Liao, ``GMM-Free DNN Training,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing May 2014, Florence, Italy. [pdf]

O. Siohan and M. Bacchiani, ``iVector-based Acoustic Data Selection,'' In Proceedings of the European Conference on Speech Communication and Technology September 2013, Lyon, France. [pdf]

M. Bacchiani, ``Rapid Adaptation for Mobile Speech Applications,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing May 2013, Vancouver, Canada. [pdf]

C. Alberti and M. Bacchiani, ``Discriminative Features for Language Identification,'' In Proceedings of the European Conference on Speech Communication and Technology September 2011, Florence, Italy. [pdf]

Z. Liu and M. Bacchiani, ``TechWare: Mobile Media Search Resources [Best of the Web],'' In IEEE Signal Processing Magazine, Volume 28, Issue 4, pp. 142-145, July 2011. [pdf]

H. Liao, C. Alberti, M. Bacchiani and O. Siohan, ``Decision Tree State Clustering with Word and Syllable Features,'' In Proceedings of the European Conference on Speech Communication and Technology September 2010, Chiba, Japan. [pdf]

C. Alberti, M. Bacchiani, A. Bezman, C. Chelba, A. Dorfa, H. Liao, P. Moreno, T. Power, A. Sahuguet, M. Shugrina and O. Siohan, ``An Audio Indexing System for Election Video Material,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing May 2009, Taipei, Taiwan. [pdf]

A. Gravano, M. Jansche and M. Bacchiani, ``Restoring Punctuation and Capitalization in Transcribed Speech,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing May 2009, Taipei, Taiwan. [pdf]

M. Bacchiani, F. Beaufays, J. Schalkwyk, M. Schuster, B. Strope, ``Deploying GOOG-411: Early Lessons in Data, Measurement and Testing,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing May 2008, Las Vegas, USA. [pdf]

C. Gollan and M. Bacchiani, ``Confidence Scores for Acoustic Model Adaptation,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing May 2008, Las Vegas, USA. [pdf]

M. Bacchiani, M. Riley, B. Roark and R. Sproat, ``MAP adaptation of stochastic grammars,'' In Computer Speech and Language Vol. 20, Issue 1, January 2006, pp. 41-68. [pdf]

O. Siohan and M. Bacchiani, ``Fast Vocabulary-Independent Audio Search Using Path-Based Graph Indexing,'' to appear in Proceedings of the European Conference on Speech Communication and Technology September 2005, Lisbon, Portugal. [pdf]

M. Bacchiani, B. Roark and M. Saraclar, ``Language model adaptation with MAP estimation and the perceptron algorithm,'' In Proceedings of the HLT-NAACL May 2004, Boston, USA. [pdf]

M. Bacchiani and B. Roark, ``Meta-data Conditional Language Modeling,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing May 2004, Montreal, Canada. [pdf]

S. Maskey, M. Bacchiani, B. Roark and R. Sproat, ``Improved name recognition with meta-data dependent name networks,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing May 2004, Montreal, Canada. [pdf]

B. Roark and M. Bacchiani, ``Supervised and unsupervised PCFG adaptation to novel domains,'' In Proceedings of the HLT-NAACL pp. 205-212, July 2003, Edmonton, Canada. [pdf]

M. Bacchiani and B. Roark, ``Unsupervised Language Model Adaptation,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing 2003. [pdf]

M. Bacchiani, ``Combining Maximum Likelihood and Maximum A Posteriori Estimation for Detailed Acoustic Modeling of Context Dependency,'' In Proceedings of the International conference on Spoken Language Processing pp. 2593-2596, September 2002, Denver, USA. [pdf]

S. Whittaker, J. Hirschberg, B. Amento, L. Stark, M. Bacchiani, P. Isenhour, L. Stead, G. Zamchick and A. Rosenberg, ``SCANMail: a voicemail interface that makes speech browsable, readable and searchable,'' In Proceedings of the conference on Computer Human Interaction April 2002, Minneapolis, USA. [pdf]

Julia Hirschberg, Michiel Bacchiani, Phil Isenhour, Aaron Rosenberg, Larry Stead, Steve Whittaker, Gary Zamchick, ``Audio Browsing and Search in the Voicemail Domain,'' In Proceedings of NLPRS-2001 November 2001, Tokyo, Japan. [pdf]

J. Hirschberg, M. Bacchiani, D. Hindle, P. Isenhour, A. Rosenberg, L. Stark, L. Stead, S. Whittaker and G. Zamchick, ``SCANMail: Browsing and Searching Speech Data by Content,'' In Proceedings of the European Conference on Speech Communication and Technology September 2001, Aalborg, Denmark. [pdf]

A. Rosenberg, J. Hirschberg, M. Bacchiani, S. Parthasarathy, P. Isenhour and L. Stead, ``Caller Identification for the SCANMail Voicemail Browser,'' In Proceedings of the European Conference on Speech Communication and Technology September 2001, Aalborg, Denmark. [pdf]

M. Bacchiani, ``Automatic Transcription of Voicemail at AT&T,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing May 2001, Salt Lake City, USA. [pdf]

M. Bacchiani, J. Hirschberg, A. Rosenberg, S. Whittaker, D. Hindle, P. Isenhour, M. Jones and L. Stark, ``SCANMail: Audio Navigation in the Voicemail Domain,'' In Proceedings of the workshop on Human Language Technology March 2001, San Diego, USA. [pdf]

M. Bacchiani, ``Using Maximum Likelihood Linear Regression for Segment Clustering and Speaker Identification,'' In Proceedings of the International conference on Spoken Language Processing Vol. IV, pp. 536-539, October 2000, Beijing, China. [pdf]

A. Singhal, S. Abney, M. Bacchiani, M. Collins, D. Hindle and F. Pereira, ``AT&T at TREC-8,'' In Proceedings of the Eighth Text REtrieval Conference (TREC-8), pp. 317-330, November 1999, Gaithersburg, USA. [pdf]

M. Bacchiani and M. Ostendorf, ``Joint Lexicon, Acoustic Unit Inventory and Model Design,'' In Speech Communication no. 29, pp. 99-114, 1999. [pdf]

M. Bacchiani and M. Ostendorf, ``Using Automatically-Derived Acoustic Sub-word Units in Large Vocabulary Speech Recognition,'' In Proceedings of the International conference on Spoken Language Processing November 1998, Sydney, Australia. [pdf]

M. Bacchiani and M. Ostendorf, ``Joint Acoustic Unit Design and Lexicon Generation,'' In Proceedings ESCA Workshop on Modeling Pronunciation Variation for Automatic Speech Recognition pp. 7-12, May 1998, Kerkrade, The Netherlands. [pdf]

M. Ostendorf, B. Byrne, M. Bacchiani, M. Finke, A. Gunawardana, K. Ross, S. Roweis, E. Shriberg, D. Talkin, A. Waibel, B. Wheatley and T. Zeppenfeld, ``Modeling Systematic Variations in Pronunciation via a Language-Dependent Hidden Speaking Mode,'' In Proceedings of the International conference on Spoken Language Processing October 1996, Philadelphia, USA. [pdf]

T. Fukada, M. Bacchiani, K. Paliwal and Y. Sagisaka, ``Speech Recognition Based on Acoustically Derived Segment Units,'' In Proceedings of the International conference on Spoken Language Processing pp. 1077-1080, October 1996, Philadelphia, USA. [pdf]

M. Bacchiani, M. Ostendorf, Y. Sagisaka and K. Paliwal, ``Design of a Speech Recognition System based on Non-Uniform Segmental Units,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing pp. 443-446, May 1996, Atlanta, USA. [pdf]

M. Bacchiani, M. Ostendorf, Y. Sagisaka and K. Paliwal, ``Unsupervised Learning of Non-Uniform Segmental Units for Acoustic Modeling in Speech Recognition,'' In Proceedings of the IEEE workshop on Automatic Speech Recognition pp. 141-142, December 1995, Snowbird, USA. [pdf]

K.K. Paliwal, M. Bacchiani and Y. Sagisaka, ``Minimum Classification Error Training Algorithm for Feature Extractor and Pattern Classifier in Speech Recognition,'' Eurospeech '95, Vol. 1, pp. 541-544, September 1995, Madrid, Spain. [pdf]

K.K. Paliwal, M. Bacchiani and Y. Sagisaka, ``Simultaneous Design of Feature Extractor and Pattern Classifier using the Minimum Classification Error Training Algorithm,'' In Proceedings of the IEEE workshop on Neural Networks for Signal Processing pp. 67-76, September 1995, Boston, USA. [pdf]

Michiel Bacchiani and Kiyoaki Aikawa, ``Optimization of time-frequency masking filters using the minimum error classification criterion,'' In Proceedings of the International Conference on Acoustics, Speech and Signal Processing Vol. 1, pp. 485-488, April 1994, Adelaide, Australia. [pdf]