Mark A. Hasegawa-Johnson
(he/him/his)
Professor
Primary Affiliation
Biologically Informed Artificial Intelligence
Affiliations
Status Full-time Faculty
Home Department of Electrical and Computer Engineering
Phone 333-0925
Email jhasegaw@illinois.edu
Address 2011 Beckman Institute, 405 North Mathews Avenue
-
Biography
Mark Hasegawa-Johnson received his Ph.D. from MIT in 1996. He is a professor in the University of Illinois Department of Electrical and Computer Engineering and a full-time faculty member in the Artificial Intelligence group at the Beckman Institute. His field of interest is speech production and recognition by humans and computers, including landmark-based speech recognition, integration of prosody in speech recognition and understanding, audiovisual speech recognition, computational auditory scene analysis, and biomedical imaging of the muscular and neurological correlates of speech production and perception.
-
Honors
- Member, Articulograph International Steering Committee
- CLSP Workshop Leader, "Landmark-Based Speech Recognition" (2004)
- Invited paper, NAACL Workshop on Linguistic and Higher-Level Knowledge Sources in Speech Recognition and Understanding (2004)
- List of faculty rated as excellent by their students (2003)
- NSF CAREER Award (2002)
- NIH National Research Service Award (1998)
-
Research
Human speech perception brings together abilities that have evolved, biologically or culturally, over very long time periods in order to simultaneously extract semantic, phonemic, and paralinguistic information from a robust but complicated time-frequency code. Machine learning techniques excel at finding the optimal parameter settings for a pre-specified speech recognition model structure, but they are far less effective at choosing the right model structure in the first place. Dr. Hasegawa-Johnson's research seeks to apply higher-level knowledge from linguistics and psychology in order to specify the structure of machine learning models for automatic speech recognition. For example, machine learning models are capable of learning the class-conditional distributions of acoustic parameter vectors, but in speech recognition it is not always clear how the "class" should be defined. The landmark-based speech recognition theory of Ken Stevens, based on several decades of linguistics research, suggests that phoneme boundaries form more acoustically invariant classes than do phoneme segments. Based on Stevens' theory, one of Dr. Hasegawa-Johnson's current research programs seeks to develop large-vocabulary speech recognition algorithms using phoneme boundaries, rather than phoneme segments, as the fundamental phonological class.

Likewise, several centuries of linguistic research clearly demonstrate that prosody (the melody and rhythm of natural language) influences the acoustic implementation of speech, but the use of prosody in automatic speech recognition has been difficult because of the vast number of variables that have been proposed to bear salient information. By collaborating with University of Illinois linguists (Jennifer Cole and Chilin Shih), Dr. Hasegawa-Johnson has been able to select two binary prosodic distinctions considered by linguists to have the most dramatic acoustic and syntactic impact, and to show that explicit encoding of these prosodic distinctions into an automatic speech recognizer leads to reduced word error rate.
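The core landmark-based idea — treating phoneme boundaries (landmarks), rather than phoneme segments, as the classes whose class-conditional distributions are learned — can be sketched as a toy maximum-likelihood classifier. The feature, the landmark class names, and the Gaussian parameters below are purely hypothetical illustrations, not values from the research described above.

```python
# Toy sketch of landmark-based classification: the classes are phoneme
# *boundaries* (landmarks), not phoneme segments. All class names and
# Gaussian parameters here are hypothetical, for illustration only.
import math

# Class-conditional Gaussians (mean, std) of one hypothetical acoustic
# feature (e.g., rate of spectral change) for each landmark type.
landmark_models = {
    "stop-burst":     (9.0, 1.5),  # abrupt spectral change
    "vowel-to-nasal": (4.0, 1.0),  # moderate spectral change
    "no-landmark":    (1.0, 0.8),  # steady-state frame
}

def gaussian_loglik(x, mean, std):
    """Log-likelihood of x under N(mean, std**2)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) \
           - (x - mean) ** 2 / (2 * std ** 2)

def classify_landmark(feature):
    """Maximum-likelihood landmark class (uniform prior over classes)."""
    return max(landmark_models,
               key=lambda c: gaussian_loglik(feature, *landmark_models[c]))

print(classify_landmark(8.2))  # -> stop-burst
print(classify_landmark(1.3))  # -> no-landmark
```

In a full recognizer these frame-level landmark decisions would feed a pronunciation model; the sketch only shows why acoustically invariant boundary classes make the class-conditional modeling problem well posed.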
-
2014
- Kim, K.; Lin, K. H.; Walther, D. B.; Hasegawa-Johnson, M. A.; Huang, T. S., Automatic Detection of Auditory Salience with Optimized Linear Filters Derived from Human Annotation. Pattern Recognition Letters 2014, 38, 78-85, DOI: 10.1016/j.patrec.2013.11.010.
2013
- Bharadwaj, S.; Hasegawa-Johnson, M.; Ajmera, J.; Deshmukh, O.; Verma, A., Sparse Hidden Markov Models for Purer Clusters, In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, New York, 2013, 3098-3102.
- Huang, P. S.; Deng, L.; Hasegawa-Johnson, M.; He, X. D., Random Features for Kernel Deep Convex Network, In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, New York, 2013, 3143-3147.
- King, S.; Hasegawa-Johnson, M., Accurate Speech Segmentation by Mimicking Human Auditory Processing, In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, New York, 2013, 8096-8100.
- Lin, K. H.; Zhuang, X. D.; Goudeseune, C.; King, S.; Hasegawa-Johnson, M.; Huang, T. S., Saliency-Maximized Audio Visualization and Efficient Audio-Visual Browsing for Faster-Than-Real-Time Human Acoustic Event Detection. ACM Transactions on Applied Perception 2013, 10, (4), DOI: 10.1145/2536764.2536773.
- Mertens, R.; Huang, P.-S.; Gottlieb, L.; Friedland, G.; Divakaran, A.; Hasegawa-Johnson, M., On the Application of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks. International Journal of Multimedia Data Engineering and Management 2013, 3, (3), 1-19.
- Sharma, H. V.; Hasegawa-Johnson, M., Acoustic Model Adaptation Using in-Domain Background Models for Dysarthric Speech Recognition. Computer Speech and Language 2013, 27, (6), 1147-1162, DOI: 10.1016/j.csl.2012.10.002.
2012
- Mahrt, T.; Cole, J.; Fleck, M.; Hasegawa-Johnson, M. F0 and the Perception of Prominence, Proceedings of Interspeech 2012, Portland, Oregon, 2012.
- Mahrt, T.; Cole, J.; Fleck, M.; Hasegawa-Johnson, M. Modeling Speaker Variation in Cues to Prominence Using the Bayesian Information Criterion, Proceedings of Speech Prosody 2012, Shanghai, 2012.
- Mathur, S.; Poole, M. S.; Pena-Mora, F.; Hasegawa-Johnson, M.; Contractor, N., Detecting Interaction Links in a Collaborating Group Using Manually Annotated Data. Social Networks 2012, 34, (4), 515-526.
- Nam, H.; Mitra, V.; Tiede, M.; Hasegawa-Johnson, M.; Espy-Wilson, C.; Saltzman, E.; Goldstein, L., A Procedure for Estimating Gestural Scores from Speech Acoustics. Journal of the Acoustical Society of America 2012, 132, (6), 3980-3989.
- Ozbek, I. Y.; Hasegawa-Johnson, M.; Demirekler, M., On Improving Dynamic State Space Approaches to Articulatory Inversion with Map-Based Parameter Estimation. IEEE Transactions on Audio Speech and Language Processing 2012, 20, (1), 67-81.
- Rong, P. Y.; Loucks, T.; Kim, H.; Hasegawa-Johnson, M., Relationship between Kinematics, F2 Slope and Speech Intelligibility in Dysarthria Due to Cerebral Palsy. Clinical Linguistics & Phonetics 2012, 26, (9), 806-822.
- Tang, H.; Chu, S. M.; Hasegawa-Johnson, M.; Huang, T. S., Partially Supervised Speaker Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 2012, 34, (5), 959-971.
2011
- Kim, H.; Hasegawa-Johnson, M.; Perlman, A., Vowel Contrast and Speech Intelligibility in Dysarthria. Folia Phoniatrica Et Logopaedica 2011, 63, (4), 187-194.
- Lobdell, B. E.; Allen, J. B.; Hasegawa-Johnson, M. A., Intelligibility predictors and neural representation of speech. Speech Communication 2011, 53, (2), 185-194.
- Ozbek, I. Y.; Hasegawa-Johnson, M.; Demirekler, M., Estimation of Articulatory Trajectories Based on Gaussian Mixture Model (GMM) with Audio-Visual Information Fusion and Dynamic Kalman Smoothing. IEEE Transactions on Audio Speech and Language Processing 2011, 19, (5), 1180-1195.
- Zhuang, X. D.; Zhou, X.; Hasegawa-Johnson, M. A.; Huang, T. S., Efficient Object Localization with Variation-Normalized Gaussianized Vectors, In Intelligent Video Event Analysis and Understanding; Zhang, J., Shao, L., Zhang, L., Jones, G. A., Eds. 2011; Vol. 332, 93-109.
2010
- Kim, H.; Martin, K.; Hasegawa-Johnson, M.; Perlman, A., Frequency of Consonant Articulation Errors in Dysarthric Speech. Clinical Linguistics & Phonetics 2010, 24, (10), 759-770.
- Tang, H.; Hasegawa-Johnson, M.; Huang, T. S., Non-frontal View Facial Expression Recognition Based on Ergodic Hidden Markov Model Supervectors, IEEE International Conference on Multimedia & Expo, Singapore, 2010.
- Tang, H.; Hasegawa-Johnson, M.; Huang, T., A Novel Vector Representation of Stochastic Signals Based on Adapted Ergodic HMMs. IEEE Signal Processing Letters 2010, 17, (8), 715-718.
- Zhuang, X. D.; Zhou, X.; Hasegawa-Johnson, M. A.; Huang, T. S., Real-World Acoustic Event Detection. Pattern Recognition Letters 2010, 31, (12), 1543-1551.
- Zu, Y. H.; Hasegawa-Johnson, M.; Perlman, A.; Yang, Z., A Mathematical Model of Swallowing. Dysphagia 2010, 25, (4), 397-398.
2009
- Huang, T. S.; Hasegawa-Johnson, M. A.; Chu, S. M.; Zeng, Z.; Tang, H., Sensitive Talking Heads. IEEE Signal Processing Magazine 2009, 26, (4), 67-72.
- Yoon, P.; Huensch, A.; Juul, E.; Perkins, S.; Sproat, R.; Hasegawa-Johnson, M., Construction of a rated speech corpus of L2 learners' speech. CALICO Journal 2009, 26, (3), 662-673.
2008
- Chang, S. E.; Erickson, K. I.; Ambrose, N. G.; Hasegawa-Johnson, M. A.; Ludlow, C. L., Brain anatomy differences in childhood stuttering. Neuroimage 2008, 39, (3), 1333-1344.
- Kim, L. H.; Hasegawa-Johnson, M.; Lim, J. S.; Sung, K. M., Acoustic model for robustness analysis of optimal multipoint room equalization. Journal of the Acoustical Society of America 2008, 123, (4), 2043-2053.
- Tang, H.; Fu, Y.; Tu, J. L.; Hasegawa-Johnson, M.; Huang, T. S., Humanoid Audio-Visual Avatar With Emotive Text-to-Speech Synthesis. IEEE Transactions on Multimedia 2008, 10, (6), 969-981.
- Yoon, T.; Cole, J.; Hasegawa-Johnson, M. Detecting non-modal phonation in telephone speech, In Proceedings of Speech Prosody 2008, Campinas, Brazil, 2008.
2007
- Chen, K.; Hasegawa-Johnson, M.; Cole, J., A Factored Language Model for Prosody-Dependent Speech Recognition. In Speech Synthesis and Recognition, Kordic, V., Ed. Advanced Robotic Systems: 2007.
- Cole, J.; Kim, H.; Choi, H.; Hasegawa-Johnson, M., Prosodic effects on acoustic cues to stop voicing and place of articulation: Evidence from Radio News speech. Journal of Phonetics 2007, 35, (2), 180-209.
- Yoon, T.; Cole, J.; Hasegawa-Johnson, M. On the edge: Acoustic cues to layered prosodic domains, In Proceedings of the International Congress of Phonetic Sciences, Saarbrücken, Germany, 2007.
2006
- Zhang, T.; Hasegawa-Johnson, M.; Levinson, S. E., Cognitive state classification in a spoken tutorial dialogue system. Speech Communication 2006, 48, (6), 616-632.
- Zhang, T.; Hasegawa-Johnson, M.; Levinson, S. E., Extraction of pragmatic and semantic salience from spontaneous spoken English. Speech Communication 2006, 48, (3-4), 437-462.