Although a whole band playing together may make a song what it is, sometimes it's interesting to know what an individual instrument within a band sounds like on its own. Thanks to a new system developed at MIT, viewers of musical performance videos should soon be able to find out.
Known as PixelPlayer, the artificial intelligence-based system was trained on over 60 hours of videos. By analyzing the telltale movements and distinct groupings of pixels associated with images of specific instruments being played, it gradually learned to identify those instruments in footage. At the same time, it learned to recognize the unique sound waves produced by each of those instruments.
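For readers curious about how such training might look in practice, the following is a minimal sketch of one plausible self-supervised setup, written in PyTorch: the audio tracks of two solo videos are summed into an artificial mixture, and the network must recover each original track using only that video's frames as a cue. Every module, shape, and name here is an illustrative assumption rather than MIT's actual code.

```python
import torch
import torch.nn as nn

class VisualEncoder(nn.Module):
    """Maps video frames to a feature vector (stand-in for a deeper CNN)."""
    def __init__(self, dim=16):
        super().__init__()
        self.conv = nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3)

    def forward(self, frames):                  # (B, 3, H, W)
        return self.conv(frames)                # (B, dim, H', W')

class AudioNet(nn.Module):
    """Predicts `dim` candidate spectrogram mask components from the mix
    (stand-in for a fuller U-Net-style analysis network)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Conv2d(1, dim, kernel_size=3, padding=1)

    def forward(self, mix_spec):                # (B, 1, F, T)
        return self.net(mix_spec)               # (B, dim, F, T)

visual, audio = VisualEncoder(), AudioNet()

def separate(frames, mix_spec):
    v = visual(frames).mean(dim=(2, 3))         # (B, dim) pooled visual cue
    masks = audio(mix_spec)                     # (B, dim, F, T) components
    # Weight the audio mask components by the visual feature, then squash
    # to [0, 1] so the result acts as a soft spectrogram mask.
    mask = torch.sigmoid((v[:, :, None, None] * masks).sum(dim=1, keepdim=True))
    return mask * mix_spec                      # estimated source spectrogram

# Training step: spectrograms of two solo videos are summed into a
# synthetic mixture, and each must be recovered from its own frames.
spec_a, spec_b = torch.rand(2, 1, 1, 256, 64)
frames_a, frames_b = torch.rand(2, 1, 3, 224, 224)
mix = spec_a + spec_b
loss = nn.functional.l1_loss(separate(frames_a, mix), spec_a) + \
       nn.functional.l1_loss(separate(frames_b, mix), spec_b)
loss.backward()
```

Because the mixtures are built synthetically, no human labeling is needed: the original solo tracks themselves serve as the training targets.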
As a result, the current version of PixelPlayer can analyze a video it has never seen before (and that hasn't been digitally annotated in any way), automatically identifying the appearance and corresponding sound of over 20 commonly used instruments within it. Users simply click on any of those instruments onscreen, and the program isolates its sound from those of the other instruments. The volume of that instrument can then be increased or decreased as desired, or its sound can even be altered.
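The click-to-isolate step can be sketched in a similar spirit. Assuming the trained model supplies a soft spectrogram mask for the instrument at the clicked pixel, applying that mask to the mixture separates the instrument, and scaling it before re-mixing adjusts its volume. The `remix` function and placeholder mask below are hypothetical, not part of PixelPlayer itself.

```python
import numpy as np
import librosa

def remix(audio, pixel_mask, gain=1.5, n_fft=1024):
    """Scale one instrument's contribution to the mix by `gain`."""
    stft = librosa.stft(audio, n_fft=n_fft)      # complex spectrogram
    instrument = pixel_mask * stft               # the clicked instrument
    rest = (1.0 - pixel_mask) * stft             # everything else
    remixed = gain * instrument + rest           # raise or lower its volume
    return librosa.istft(remixed, length=len(audio))

# Usage on a synthetic one-second tone (placeholder for real audio); the
# constant mask stands in for the model's per-pixel prediction.
sr = 22050
audio = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)
mask = np.full(librosa.stft(audio, n_fft=1024).shape, 0.5)
quieter = remix(audio, mask, gain=0.5)   # turn that instrument down
muted = remix(audio, mask, gain=0.0)     # remove it entirely
```

Because the adjustment happens on the separated spectrogram, a gain of zero mutes the instrument entirely while leaving the rest of the mix untouched.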
Once PixelPlayer has been perfected and trained to identify more instruments, it is hoped that the system could be used for applications such as editing the musical mix of songs, or even hearing what a song would sound like if a different type of instrument were used (an acoustic guitar instead of an electric one, for example).
The technology could also conceivably be used by robots to differentiate between noise-making objects such as vehicles and animals.
"We expected a best-case scenario where we could recognize which instruments make which kinds of sounds," says PhD student Hang Zhao, lead author of a paper on the research. "We were surprised that we could actually spatially locate the instruments at the pixel level. Being able to do that opens up a lot of possibilities."
PixelPlayer is demonstrated in the following video.
Source: MIT CSAIL