Research references
For more information about the research behind the IBM Watson® Text to Speech service, see the following documents. IBM® researchers wrote or contributed to all of these papers.
- Eide, Ellen M., and Raul Fernandez. Database Mining for Flexible Concatenative Text-to-Speech. Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 4 (2007): pp. 697-700.
- Eide, Ellen, Raul Fernandez, Ron Hoory, Wael Hamza, Zvi Kons, Michael Picheny, Ariel Sagi, Slava Shechtman, and Zhi Wei Shuang. The IBM Submission to the 2006 Blizzard Text-to-Speech Challenge. Blizzard Challenge Workshop 2006.
- Fernandez, Raul, David Haws, Guy Lorberbom, Slava Shechtman, and Alexander Sorin. Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis. Proceedings Interspeech (2022): publication pending.
- Fernandez, Raul, Asaf Rendel, Bhuvana Ramabhadran, and Ron Hoory. Using Deep Bidirectional Recurrent Neural Networks for Prosodic-Target Prediction in a Unit-Selection Text-to-Speech System. Proceedings Interspeech (2015), pp. 1606-1610.
- Fernandez, Raul, Asaf Rendel, Bhuvana Ramabhadran, and Ron Hoory. Prosody Contour Prediction with Long Short-Term Memory, Bi-directional, Deep Recurrent Neural Networks. Proceedings Interspeech (2014), pp. 2268-2272.
- Fernandez, Raul, Zvi Kons, Slava Shechtman, Zhi Wei Shuang, Ron Hoory, Bhuvana Ramabhadran, and Yong Qin. The IBM Submission to the 2008 Text-to-Speech Blizzard Challenge. Blizzard Challenge Workshop 2008.
- Fernandez, Raul, and Bhuvana Ramabhadran. Automatic Exploration of Corpus-Specific Properties for Expressive Text-to-Speech: A Case Study in Emphasis. Proceedings of the Sixth ISCA Workshop on Speech Synthesis (August 2007): pp. 34-39.
- Fernandez, Raul, Raimo Bakis, Ellen Eide, Wael Hamza, John Pitrelli, and Michael A. Picheny. The 2006 TC-STAR Evaluation of the IBM Expressive Text-to-Speech Synthesis System. Speech-to-Speech Translation Workshop, Barcelona, Spain (2006), pp. 175-180.
- Kons, Zvi, Slava Shechtman, Alex Sorin, Carmel Rabinovitz, and Ron Hoory. High quality, lightweight and adaptable TTS using LPCNet. Submitted to Interspeech (2019).
- Pitrelli, John F., Raimo Bakis, Ellen M. Eide, Raul Fernandez, Wael Hamza, and Michael A. Picheny. The IBM Expressive Text-to-Speech Synthesis System for American English. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14(4) (July 2006): pp. 1099-1108.
- Rendel, Asaf, Raul Fernandez, Ron Hoory, and Bhuvana Ramabhadran. Using Continuous Lexical Embeddings to Improve Symbolic-Prosody Prediction in a Text-to-Speech Front End. Proceedings ICASSP (2016), pp. 5655-5659.
- Shechtman, Slava. Maximum-Likelihood Dynamic Intonation Model for Concatenative Text to Speech System. Proceedings of the Sixth ISCA Workshop on Speech Synthesis (August 2007): pp. 234-239.
- Shuang, Zhi-Wei, Raimo Bakis, Slava Shechtman, Dan Chazan, and Yong Qin. Frequency warping based on mapping formant parameters. Proceedings of the Ninth International Conference on Spoken Language Processing (ICSLP), Interspeech (2006): pp. 2290-2293.