New cloud gaming tech from MIT and Microsoft keeps video and audio in sync
Researchers have developed a new cloud gaming system that uses low-level white noise to accurately synchronize separate audio and video streams. The unique approach lets gamers see and hear things at the right time, even with poor microphone quality or in the presence of background noise.
Cloud gaming really took off when COVID-19 entered the world stage, and many of us were required to stay home. According to Statista, the number of global users in 2019 was 45.9 million; so far, in 2023, users total 295 million.
In a typical cloud gaming setup, a server receives gaming inputs and audio chat streams from gaming accessories such as controllers and headsets. In response, it simultaneously generates two separate media streams for the player. The first is a game-screen stream comprising game audio and video intended for a screen device such as a TV or tablet. The second is a game-accessory stream intended for controllers and gaming audio headsets, comprising game audio mixed with chat from fellow players, plus haptic feedback such as controller vibrations.
These two streams are usually conveyed over separate networks, so they can drift out of sync with each other. This lack of synchronization, known as inter-stream delay, can result in video lag, a sluggish haptic response, and a poor gaming experience. Researchers from MIT teamed up with Microsoft Research to develop Ekho, a system that uses a unique technique to address inter-stream delay. They’ll present a paper describing their system at the 2023 ACM Special Interest Group on Data Communication (SIGCOMM) conference at Columbia University, New York City, from the 10th to the 14th of September.
The researchers began by looking at the problem at the heart of inter-stream delay: clock synchronization.
“If the controller and the screen could look at their watches and at the same time see the same thing, then we could synchronize everything to the clock,” said Pouya Hamadanian, lead author of the paper. “But a lot of theoretical work on clock synchronization shows that there are certain bounds you can never overcome.”
A common method of addressing clock synchronization issues is ping-pong messaging, where a device sends a ping message to the server, which responds with a pong; the time the round trip takes is used to calculate network latency. However, this method can be unreliable because the two legs of the trip need not take the same time: the ping may take longer to reach the server than the pong takes to return, and the round-trip time alone cannot reveal how the delay is split between them. The researchers say that humans can perceive inter-stream delay once it reaches 10 ms.
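The asymmetry problem is easy to see with the standard four-timestamp calculation used by protocols such as NTP. The sketch below is an illustration, not code from Ekho: the function names are hypothetical, all times are in milliseconds, and the clock offset between device and server is assumed constant during the exchange. The offset formula is only correct when the ping and pong legs take equal time; any difference biases the estimate by half the asymmetry.

```python
def ping_pong_estimate(t1: float, t2: float, t3: float,
                       t4: float) -> tuple[float, float]:
    """Return (clock_offset, round_trip) in ms from one ping-pong exchange.

    t1: device clock when the ping is sent
    t2: server clock when the ping arrives
    t3: server clock when the pong is sent
    t4: device clock when the pong arrives
    The offset formula silently assumes both legs take equal time.
    """
    round_trip = (t4 - t1) - (t3 - t2)       # time actually spent on the network
    offset = ((t2 - t1) + (t3 - t4)) / 2     # server clock minus device clock
    return offset, round_trip


def simulate(true_offset: float, up_ms: float, down_ms: float,
             server_hold_ms: float = 1.0) -> tuple[float, float, float, float]:
    """Generate the four timestamps for known one-way delays (hypothetical helper).

    The server's clock reads the device's clock plus true_offset.
    """
    t1 = 0.0                                 # device sends ping
    t2 = t1 + up_ms + true_offset            # server receives ping (server clock)
    t3 = t2 + server_hold_ms                 # server sends pong (server clock)
    t4 = t3 - true_offset + down_ms          # device receives pong (device clock)
    return t1, t2, t3, t4


if __name__ == "__main__":
    # Symmetric legs: the true 5 ms offset is recovered.
    print(ping_pong_estimate(*simulate(5.0, 10.0, 10.0)))  # (5.0, 20.0)
    # Asymmetric legs (30 ms up, 10 ms down): the estimate is 10 ms off.
    print(ping_pong_estimate(*simulate(5.0, 30.0, 10.0)))  # (15.0, 40.0)
```

In the second case nothing in the measured round trip flags a problem, yet the estimated offset is wrong by 10 ms, the same amount the researchers cite as perceptible.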
“So, if something happens on the screen, we want it to happen within 10 milliseconds on the controller, as well,” Hamadanian said.
To improve synchronization, they designed Ekho to add ‘pseudo-noise’, low-volume white noise inaudible to humans, to the game audio before it’s streamed to the player’s screen. The Ekho-Estimator module embeds identical sequences of pseudo-noise in the game audio; then, when it receives game audio recorded at the controller, it listens for those sequences and uses their timing to measure how far apart the two streams are. The Ekho-Estimator passes that measurement to the Ekho-Compensator module, which either skips a few milliseconds of sound or adds a few milliseconds of silence to the game audio sent by the server, bringing the streams back into sync.
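The estimate-and-compensate loop can be sketched with a seeded ±1 pseudo-noise sequence and plain cross-correlation. This is a toy illustration under stated assumptions, not Ekho's implementation: the 48 kHz sample rate, the noise amplitude, the sequence length, the search window, and every function name are hypothetical, and the sketch assumes the compensator adjusts the controller-side (accessory) audio. A positive lag means the screen audio reached the recording later than expected.

```python
import random

SAMPLE_RATE = 48_000  # assumed audio sample rate in Hz (hypothetical)


def make_pn(length: int, seed: int = 7) -> list[float]:
    """Pseudo-noise: a reproducible sequence of +1/-1 values."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def estimate_lag(recording: list[float], pn: list[float],
                 expected_start: int, max_lag: int) -> int:
    """Estimator: slide pn over the recording and return the lag (in samples,
    relative to expected_start) where the correlation score is highest."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        start = expected_start + lag
        if start < 0 or start + len(pn) > len(recording):
            continue  # window would run off either end of the recording
        window = recording[start:start + len(pn)]
        score = sum(r * p for r, p in zip(window, pn))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def compensate(accessory: list[float], lag: int) -> list[float]:
    """Compensator: delay the accessory audio with silence if the screen is
    late (lag > 0), or skip samples if the screen is early (lag < 0)."""
    if lag > 0:
        return [0.0] * lag + accessory
    return accessory[-lag:]


def demo(true_lag: int = 240) -> int:
    """Embed pn in toy game audio, delay the 'recording', and estimate the lag."""
    rng = random.Random(1)
    pn = make_pn(4096)
    amplitude = 0.05  # quiet relative to the game audio; illustrative value
    game = [rng.uniform(-0.5, 0.5) for _ in range(8000)]  # stand-in for game audio
    start = 1000  # sample index where the server embeds the sequence
    screen = game[:]
    for i, chip in enumerate(pn):
        screen[start + i] += amplitude * chip
    # The controller hears the screen true_lag samples late, plus background noise.
    recording = [0.0] * true_lag + [s + rng.gauss(0.0, 0.1) for s in screen]
    return estimate_lag(recording, pn, start, max_lag=300)


if __name__ == "__main__":
    lag = demo()
    print(f"estimated lag: {lag} samples = {lag * 1000 / SAMPLE_RATE} ms")
    aligned = compensate([0.1, 0.2, 0.3], lag)
    print(f"accessory stream padded with {len(aligned) - 3} samples of silence")
```

Correlating against a long, known sequence is what lets a faint signal stand out: the matched samples add up coherently while the game audio and background noise tend to average out. Locating the peak to the nearest sample at 48 kHz already gives a resolution of about 0.02 ms.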
When the researchers tested Ekho on real cloud streaming sessions, they found that it could measure inter-stream delay with sub-millisecond accuracy. Even with poor microphone quality or background noise, Ekho kept inter-stream delay below 10 ms 86.6% of the time.
“The traditional way of doing this, which involves trying to measure the synchronization error using the underlying network, the errors are significantly larger,” said Krishna Chintalapudi, one of the paper’s co-authors. “When we started this project, we weren’t sure whether this could even be done. But the accuracy we can get down to with Ekho, at sub-millisecond levels, it is unheard of.”
Encouraged by their findings, the researchers plan to see how Ekho performs when synchronizing five controllers to the same screen device. For now, because Ekho was designed for use in cloud gaming, its range is limited; future work may focus on extending that range so the system can be used over longer distances.
“Using inaudible white noise as a sort of ‘timekeeper’ is a great example of how out-of-the-box thinking can produce unexpected results,” said Mohammad Alizadeh, a co-author of the study. “The technique could improve user experience, not just in cloud gaming but potentially in any multidevice streaming scenario.”