Research article summary (published 30 Mar 2002):
Toward a model for lexical access based on acoustic landmarks and distinctive features.
Full Abstract
This article describes a model in which the acoustic speech signal is processed to yield a discrete representation of the speech stream in terms of a sequence of segments, each of which is described by a set (or bundle) of binary distinctive features. These distinctive features specify the phonemic contrasts that are used in the language, such that a change in the value of a feature can potentially generate a new word. This model is a part of a more general model that derives a word sequence from this feature representation, the words being represented in a lexicon by sequences of feature bundles. The processing of the signal proceeds in three steps:
(1) Detection of peaks, valleys, and discontinuities in particular frequency ranges of the signal leads to identification of acoustic landmarks. The type of landmark provides evidence for a subset of distinctive features called articulator-free features (e.g., [vowel], [consonant], [continuant]).
(2) Acoustic parameters are derived from the signal near the landmarks to provide evidence for the actions of particular articulators, and acoustic cues are extracted by sampling selected attributes of these parameters in these regions. The selection of cues that are extracted depends on the type of landmark and on the environment in which it occurs.
(3) The cues obtained in step (2) are combined, taking context into account, to provide estimates of "articulator-bound" features associated with each landmark (e.g., [lips], [high], [nasal]).
These articulator-bound features, combined with the articulator-free features in (1), constitute the sequence of feature bundles that forms the output of the model. Examples of cues that are used, and justification for this selection, are given, as well as examples of the process of inferring the underlying features for a segment when there is variability in the signal due to enhancement gestures (recruited by a speaker to make a contrast more salient) or due to overlap of gestures from neighboring segments.
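The three steps can be illustrated with a toy sketch. This is not the article's implementation: the function names, the 6 dB jump threshold, the mapping from landmark type to articulator-free features, the "nasal_energy" parameter, and the simple threshold rule for the articulator-bound feature are all assumptions made for illustration. The sketch only shows the data flow from a band-energy contour to a sequence of feature bundles.

```python
from dataclasses import dataclass, field


@dataclass
class Landmark:
    index: int  # frame index in the energy contour
    kind: str   # "peak", "valley", or "discontinuity"


@dataclass
class FeatureBundle:
    landmark: Landmark
    features: dict = field(default_factory=dict)  # feature name -> True/False


# Assumed, illustrative mapping from landmark type to articulator-free features.
ARTICULATOR_FREE = {
    "peak": {"vowel": True, "consonant": False},
    "valley": {"vowel": False, "consonant": False},
    "discontinuity": {"vowel": False, "consonant": True},
}


def detect_landmarks(energy, jump=6.0):
    """Step 1: find peaks, valleys and abrupt rises or falls in a contour (dB)."""
    landmarks = []
    for i in range(1, len(energy) - 1):
        prev, cur, nxt = energy[i - 1], energy[i], energy[i + 1]
        if abs(cur - prev) >= jump:          # abrupt change from previous frame
            landmarks.append(Landmark(i, "discontinuity"))
        elif cur > prev and cur > nxt:       # local maximum
            landmarks.append(Landmark(i, "peak"))
        elif cur < prev and cur < nxt:       # local minimum
            landmarks.append(Landmark(i, "valley"))
    return landmarks


def extract_cues(params, landmark, width=1):
    """Step 2: average each parameter track over frames near the landmark."""
    lo = max(0, landmark.index - width)
    hi = landmark.index + width + 1
    return {name: sum(track[lo:hi]) / len(track[lo:hi])
            for name, track in params.items()}


def estimate_bound_features(cues, thresholds):
    """Step 3 (toy rule): a feature is True if its cue exceeds a threshold."""
    return {feat: cues[cue] > limit for feat, (cue, limit) in thresholds.items()}


def process(energy, params, thresholds):
    """Run the three steps and return one feature bundle per landmark."""
    bundles = []
    for lm in detect_landmarks(energy):
        features = dict(ARTICULATOR_FREE[lm.kind])
        cues = extract_cues(params, lm)
        features.update(estimate_bound_features(cues, thresholds))
        bundles.append(FeatureBundle(lm, features))
    return bundles


# Made-up input: one band-energy contour and one hypothetical parameter track.
energy = [30, 31, 45, 47, 50, 48, 46, 44, 45, 46]
params = {"nasal_energy": [0.1, 0.1, 0.2, 0.2, 0.1, 0.1, 0.8, 0.9, 0.8, 0.1]}
thresholds = {"nasal": ("nasal_energy", 0.5)}

for bundle in process(energy, params, thresholds):
    print(bundle.landmark.index, bundle.landmark.kind, bundle.features)
```

The abstract states that cue selection depends on the landmark type and its environment, and that cues are combined with context taken into account. The fixed averaging window and single threshold above ignore both, so a fuller sketch would choose cues per landmark type and condition the decision on neighboring bundles.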
Author information
Author/s: Stevens, Kenneth N (KN)
Affiliation: Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge 02139-4307, USA. stevens(-atsign-)speech.mit.edu
Grants: DC02978 (Agency: United States NIDCD)
Journal and publication information
Publication Type: Journal Article; Research Support, U.S. Gov't, P.H.S.
Journal: The Journal of the Acoustical Society of America (J Acoust Soc Am), published in United States. (Language: eng)
Reference: 2002-Apr; vol 111 (issue 4) : pp 1872-91
Dates: Created 2002/05/10; Completed 2002/06/19; Revised 2007/11/14
PMID: 12002871, status: MEDLINE (last retrieval date: 11/6/2008)
Sourced from the National Library of Medicine. Abstract text and other information may be subject to copyright.