Key Takeaways
1. The “cocktail party problem” describes how people manage to focus on one voice amid background noise.
2. MIT researchers developed an artificial neural network to simulate this human hearing ability using multiplicative feature gains.
3. The model effectively highlighted target voices in noisy environments, achieving results similar to human listeners.
4. It mimicked common human errors, such as difficulty distinguishing similar-sounding voices.
5. The findings could lead to better cochlear implants that help users focus on a single voice in noisy settings.
For many years, neuroscientists have studied the “cocktail party problem,” which describes how people can focus on one voice even when surrounded by noise. It has long been known that the brain does this by boosting the activity of neurons that respond to specific sounds, but no computational model had shown that this mechanism works in real-life situations.
New Findings from MIT
Recently, a group of researchers at the Massachusetts Institute of Technology created an artificial neural network that simulates this human hearing skill. In a paper published in Nature Human Behaviour, they report that the brain employs a mechanism called multiplicative feature gains. Simply put, the brain works like a precise volume control. When listening for a specific voice, it amplifies the neural signals that correspond to that voice’s distinctive features, such as its tone, while turning down the signals from competing sounds.
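The volume-control idea can be illustrated with a toy sketch. This is not the MIT model: the four-element feature vectors, the gain rule in `gains_from_cue`, and the `strength` parameter are all assumptions made up for illustration. It only shows what "multiplicative" means here: each feature of the noisy mixture is scaled by a gain derived from the cued voice.

```python
def gains_from_cue(cue, strength=2.0):
    """Turn a cue voice's feature activations into per-feature gains.

    Hypothetical rule: features the cue activates more than average get a
    gain above 1 (boosted); features below average get a gain below 1
    (turned down). Gains are clipped at 0 so they never flip a sign.
    """
    mean = sum(cue) / len(cue)
    return [max(0.0, 1.0 + strength * (c - mean)) for c in cue]


def apply_gains(features, gains):
    """Scale each feature by its gain (the multiplicative step)."""
    return [f * g for f, g in zip(features, gains)]


def cosine_similarity(a, b):
    """Measure how closely two feature vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


# Made-up feature activations for two voices (e.g. four "tone" channels).
target = [0.9, 0.8, 0.1, 0.1]
distractor = [0.1, 0.2, 0.9, 0.7]

# The listener hears both voices at once.
mixture = [t + d for t, d in zip(target, distractor)]

# A short clip of the target voice serves as the cue that sets the gains.
gains = gains_from_cue(target)
attended = apply_gains(mixture, gains)

print("similarity to target before:", cosine_similarity(mixture, target))
print("similarity to target after: ", cosine_similarity(attended, target))
```

In this toy setup the attended representation ends up closer to the target voice and further from the distractor than the raw mixture was. If the two voices had nearly identical features, the gains would boost both, which is one intuitive way to see why similar-sounding voices are harder to separate.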
Testing the Model’s Effectiveness
To test the model, the MIT researchers gave it a brief audio clip of a particular voice, followed by a noisy mixture of many speakers. The model successfully picked out the target voice, performing about as well as human listeners across a range of conditions. It even reproduced common human errors, such as difficulty distinguishing between two similar-sounding voices.
“None of our models have had the capability that humans possess, to be alerted to a specific object or sound and then base their reaction on that object or sound. That has been a real restriction,” said Josh H. McDermott, senior author of the paper.
Implications for Future Technology
The model also let the researchers quickly examine how the position of speakers affects listening. It predicted that voices are much easier to tell apart when they are separated horizontally (side by side) than when they are separated vertically (one above the other), a prediction that was later confirmed in tests with human participants. The researchers hope the model could lead to improved cochlear implants that help people focus on a single voice in noisy settings.
Source: Nature Human Behaviour via MIT News

