Audio-to-video synchronization

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 83.187.182.233 (talk) at 02:09, 14 July 2020. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Lip sync error is expressed as an amount of time; a negative number indicates that the audio lags the video. This terminology and the numeric expression of lip sync error are used in the professional broadcast industry, as evidenced by various professional papers.

The necessary RTCP packets might be lost (since RTP/RTCP does not guarantee delivery) or not sent until at least several seconds after the stream has begun. Many software clients do not send RTCP at all or send non-compliant data.[citation needed]

Effect of no explicit AV-sync timing

When a digital or analog audio-video stream lacks explicit AV-sync timing, the following factors can cause it to fall out of sync:

  • In motion pictures, these timing errors are most commonly caused by worn film skipping over the projector sprockets because its sprocket holes are torn.
  • Errors can also be caused by the projectionist misthreading the film in the projector, although this is rare with competent projectionists.
  • Audio-to-video synchronization is commonly corrected and maintained with an audio synchronizer. Television industry standards organizations have established acceptable amounts of audio and video timing error and suggested practices for maintaining acceptable timing.[1][2]
  • A/V sync errors are becoming a significant problem in the digital television industry because of the use of large amounts of video signal processing in television production, television broadcasting and pixelated television displays such as LCD, DLP and plasma displays.
  • In the television field, audio-video sync problems are commonly caused when a significant amount of processing is performed on the video part of the television program.
  • Typical sources of significant video delays in the television field include video synchronizers and video compression encoders and decoders. Particularly troublesome encoders and decoders are used in MPEG compression systems utilized for broadcasting digital television and storing television programs on consumer and professional recording and playback devices.
  • Another source of significant video delay is found in pixelated television displays (LCD, plasma, DLP), which use complex video signal processing to convert the resolution of the incoming video signal to the native resolution of the display, for example when converting standard-definition video for a high-definition display. "Lip-flap" may exceed 200 ms at times.
  • In broadcast television, it is not unusual for lip-sync error to vary by over 100 ms (several video frames) from time to time.
  • EBU Recommendation R37, “The relative timing of the sound and vision components of a television signal”, states that end-to-end audio/video sync should be within +40 ms and -60 ms (audio before / after video, respectively) and that each stage should be within +5 ms and -15 ms.[3]
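The EBU R37 figures can be applied to a signal chain as in the following minimal sketch. It assumes that the errors of successive stages simply add, which is a simplification, and the stage names and values are hypothetical rather than measurements. The sign convention follows the figures above: positive means audio ahead of video.

```python
# Sign convention (as in the EBU figures above): positive = audio ahead of
# video, negative = audio behind video. All values are in milliseconds.
EBU_R37_END_TO_END = (-60.0, 40.0)
EBU_R37_PER_STAGE = (-15.0, 5.0)


def within(limits, error_ms):
    """Return True if error_ms lies inside the inclusive (low, high) range."""
    low, high = limits
    return low <= error_ms <= high


def check_chain(stage_errors_ms):
    """Check each stage and the summed end-to-end error of a signal chain.

    stage_errors_ms maps a (hypothetical) stage name to the sync error it adds.
    Assumes stage errors add linearly.
    Returns (total_ms, end_to_end_ok, names_of_stages_out_of_tolerance).
    """
    total = sum(stage_errors_ms.values())
    bad = [name for name, err in stage_errors_ms.items()
           if not within(EBU_R37_PER_STAGE, err)]
    return total, within(EBU_R37_END_TO_END, total), bad


# Hypothetical chain in which every stage delays video slightly more than audio.
chain = {
    "video_synchronizer": 4.0,
    "encoder": 5.0,
    "decoder": 3.0,
    "display_scaler": 35.0,
}
total, ok, bad = check_chain(chain)
print(total, ok, bad)  # 47.0 False ['display_scaler']
```

In this made-up chain a single stage exceeds the per-stage tolerance and is enough to push the end-to-end figure outside the +40 ms limit.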

Viewer experience of incorrect AV sync

The result typically leaves a filmed or televised character moving his or her mouth when there is no spoken dialog to accompany it, hence the term "lip flap" or "lip-sync error". The resulting audio-video sync error can be annoying to the viewer and may even cause the viewer to not enjoy the program, decrease the effectiveness of the program or lead to a negative perception of the speaker on the part of the viewer.[4] The potential loss of effectiveness is of particular concern for product commercials and political candidates. Television industry standards organizations, such as the Advanced Television Systems Committee, have become involved in setting standards for audio-video sync errors.[1]

Because of these annoyances, AV-sync error is a concern to the television programming industry, including television stations, networks, advertisers and program production companies. Unfortunately, the advent of high-definition flat-panel display technologies (LCD, DLP and plasma), which can delay video more than audio, has moved the problem into the viewer's home and beyond control of the television programming industry alone. Consumer product companies now offer audio-delay adjustments to compensate for video-delay changes in TVs and A/V receivers, and several companies manufacture dedicated digital audio delays made exclusively for lip-sync error correction.
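The audio-delay adjustment mentioned above can be pictured as a simple delay line. The sketch below is illustrative only and does not describe any particular product; the class name `AudioDelay`, the default 48 kHz sample rate and the block-based `process` method are assumptions made for the example.

```python
from collections import deque


class AudioDelay:
    """Fixed delay line: each sample comes out delay_ms later than it went in."""

    def __init__(self, delay_ms, sample_rate_hz=48000):
        # Convert the requested delay to a whole number of samples.
        self.delay_samples = round(delay_ms * sample_rate_hz / 1000)
        # Pre-fill with silence so the first outputs are zeros.
        self._buffer = deque([0.0] * self.delay_samples)

    def process(self, samples):
        """Push a block of samples in and return the same number of delayed samples."""
        out = []
        for s in samples:
            self._buffer.append(s)
            out.append(self._buffer.popleft())
        return out


# Small example: 1 ms at a 4 kHz sample rate is a 4-sample delay.
delay = AudioDelay(delay_ms=1, sample_rate_hz=4000)
print(delay.process([1, 2, 3, 4, 5, 6]))  # [0.0, 0.0, 0.0, 0.0, 1, 2]

# A display that delays video by 100 ms would need 4800 samples at 48 kHz.
print(AudioDelay(delay_ms=100).delay_samples)  # 4800
```

A delay line can only hold audio back, so it compensates for video that arrives late; it cannot help when the audio is already behind the picture.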

Recommendations

For television applications, the Advanced Television Systems Committee recommends that audio should lead video by no more than 15 milliseconds and lag video by no more than 45 milliseconds.[1] However, the ITU performed strictly controlled tests with expert viewers and found that the threshold for detectability is -125 ms (audio lagging) to +45 ms (audio leading).[2] For film, acceptable lip sync is considered to be no more than 22 milliseconds in either direction.[3][5]
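As a rough illustration, the limits quoted above can be collected in a table and compared against a measured error. The function names and dictionary keys are hypothetical, and positive values again mean audio ahead of video.

```python
# (low, high) limits in milliseconds; positive = audio leads video.
LIMITS_MS = {
    "ATSC IS-191": (-45.0, 15.0),
    "ITU detectability": (-125.0, 45.0),
    "film": (-22.0, 22.0),
}


def ms_from_frames(frames, frame_rate_hz):
    """Convert an error given in video frames to milliseconds."""
    return frames * 1000 / frame_rate_hz


def exceeded_limits(error_ms):
    """Return the names of all limits that error_ms falls outside of."""
    return [name for name, (low, high) in LIMITS_MS.items()
            if not (low <= error_ms <= high)]


print(exceeded_limits(-30.0))  # ['film']
print(exceeded_limits(50.0))   # ['ATSC IS-191', 'ITU detectability', 'film']

# An audio lead of two frames at 25 frames per second is 80 ms.
print(ms_from_frames(2, 25))   # 80.0
```

The asymmetry of the television limits shows up directly: a 30 ms audio lag passes the ATSC recommendation, while a 30 ms audio lead would not.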

The Consumer Electronics Association has published a set of recommendations for how digital television receivers should implement A/V sync.[6]

SMPTE ST2064

SMPTE standard ST2064, published in 2015,[7] provides technology to reduce or eliminate lip-sync errors in digital television. The standard utilizes audio and video fingerprints taken from a television program. The fingerprints can be recovered and used to correct the accumulated lip-sync error. When fingerprints have been generated for a TV program, and the required technology is incorporated, the viewer's display device has the ability to continuously measure and correct lip-sync errors.[8][9]
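The general idea of measuring delay by comparing fingerprints can be sketched as follows. This is a toy illustration and not the fingerprint format or matching method defined in ST 2064: the fingerprints are arbitrary integers, one per video frame period for both audio and video, and the function names are hypothetical.

```python
def best_lag(reference, received, max_lag):
    """Return the lag (in frames) at which received best matches reference.

    A lag of k means received[i + k] corresponds to reference[i], i.e. the
    received signal is delayed by k frames relative to the reference.
    """
    best, best_score = 0, -1.0
    for lag in range(max_lag + 1):
        overlap = min(len(reference), len(received) - lag)
        if overlap <= 0:
            break
        matches = sum(1 for i in range(overlap)
                      if reference[i] == received[i + lag])
        score = matches / overlap
        if score > best_score:
            best, best_score = lag, score
    return best


def sync_error_ms(ref_video, rx_video, ref_audio, rx_audio,
                  frame_rate_hz, max_lag=5):
    """Estimate lip-sync error; positive means audio ahead of video."""
    video_lag = best_lag(ref_video, rx_video, max_lag)
    audio_lag = best_lag(ref_audio, rx_audio, max_lag)
    return (video_lag - audio_lag) * 1000 / frame_rate_hz


# Made-up fingerprints taken at the source.
ref_video = [11, 42, 7, 93, 58, 26, 64, 35, 80, 19]
ref_audio = [5, 17, 3, 29, 13, 2, 23, 31, 37, 41]

# Downstream, video arrives 3 frames late and audio 1 frame late.
rx_video = [0, 0, 0] + ref_video
rx_audio = [0] + ref_audio

print(sync_error_ms(ref_video, rx_video, ref_audio, rx_audio, 25))  # 80.0
```

Once the error is known, a receiver could correct it by delaying whichever signal is early, here the audio by 80 ms.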

Timestamps

Presentation time stamps (PTS) are embedded in MPEG transport streams to precisely signal when each audio and video segment is to be presented, to avoid AV-sync errors. However, these timestamps are often added after the video undergoes frame synchronization, format conversion and preprocessing, and thus the lip sync errors created by these operations will not be corrected by the addition and use of timestamps.[10][11][12][13]
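To make the arithmetic concrete, the sketch below compares two PTS values. It relies on the facts that MPEG PTS values count ticks of a 90 kHz clock and are 33 bits wide, so they wrap around; the function names are hypothetical. It only shows how far apart two units are scheduled for presentation, and says nothing about errors introduced before the timestamps were applied.

```python
PTS_CLOCK_HZ = 90_000      # PTS values count ticks of a 90 kHz clock
PTS_MODULUS = 1 << 33      # PTS fields are 33 bits wide and wrap around


def pts_difference(a, b):
    """Return a - b in ticks, unwrapped to the range [-2**32, 2**32)."""
    diff = (a - b) % PTS_MODULUS
    if diff >= PTS_MODULUS // 2:
        diff -= PTS_MODULUS
    return diff


def pts_offset_ms(audio_pts, video_pts):
    """How much later (in ms) the audio unit is presented than the video unit."""
    return pts_difference(audio_pts, video_pts) * 1000 / PTS_CLOCK_HZ


# 3600 ticks at 90 kHz is 40 ms, one frame period at 25 frames per second.
print(pts_offset_ms(audio_pts=903_600, video_pts=900_000))  # 40.0

# The difference is still correct across a wrap of the 33-bit counter.
print(pts_difference(100, PTS_MODULUS - 800))  # 900
```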

The Real-time Transport Protocol clocks media using origination timestamps on an arbitrary timeline. A real-time clock, such as one delivered by the Network Time Protocol and described in the Session Description Protocol[14] associated with the media, may be used to syntonize media. A server may then be used for final synchronization to remove any residual offset.[15]
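One way to picture the mapping from per-stream RTP timelines onto a shared clock is the sketch below. It assumes that each stream has a known reference pair (an RTP timestamp and the wall-clock time it corresponds to); how that pair is obtained is left open here, and the function name and example values are hypothetical. RTP timestamps are 32 bits wide, so the difference is unwrapped before conversion.

```python
RTP_MODULUS = 1 << 32  # RTP timestamps are 32-bit and wrap around


def rtp_to_wallclock(rtp_ts, ref_rtp_ts, ref_wallclock_s, clock_rate_hz):
    """Map an RTP timestamp to wall-clock seconds using a reference pair.

    ref_rtp_ts and ref_wallclock_s describe the same instant on the stream's
    RTP timeline and on the shared clock (assumed to be known).
    """
    delta = (rtp_ts - ref_rtp_ts) % RTP_MODULUS
    if delta >= RTP_MODULUS // 2:
        delta -= RTP_MODULUS
    return ref_wallclock_s + delta / clock_rate_hz


# Hypothetical streams: audio on a 48 kHz RTP clock, video on a 90 kHz clock,
# each starting from its own arbitrary RTP timestamp.
audio_time = rtp_to_wallclock(208_000, ref_rtp_ts=160_000,
                              ref_wallclock_s=1000.0, clock_rate_hz=48_000)
video_time = rtp_to_wallclock(4_093_600, ref_rtp_ts=4_000_000,
                              ref_wallclock_s=1000.0, clock_rate_hz=90_000)

# Difference between the capture times of the two packets, in milliseconds.
print(round((audio_time - video_time) * 1000, 3))  # -40.0
```

Although the two RTP timestamps are not directly comparable, on the shared clock the audio packet turns out to have been captured 40 ms before the video packet, so a receiver would schedule it 40 ms earlier.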

References

  1. ^ a b c IS-191: Relative Timing of Sound and Vision for Broadcast Operations, ATSC, 2003-06-26, archived from the original on 2012-03-21
  2. ^ a b ITU-R BT.1359-1: Relative Timing of Sound and Vision for Broadcasting, ITU, 1998
  3. ^ a b EBU Recommendation R37: "The relative timing of the sound and vision components of a television signal" (PDF).
  4. ^ Byron Reeves; David Voelker (October 1993). "Effects of Audio-Video Asynchrony on Viewer's Memory, Evaluation of Content and Detection Ability" (PDF). Archived from the original (PDF) on 2 October 2008. Retrieved 2008-10-19.
  5. ^ Sara Kudrle; et al. (July 2011). "Fingerprinting for Solving A/V Synchronization Issues within Broadcast Environments". Motion Imaging Journal. SMPTE. Appropriate A/V sync limits have been established and the range that is considered acceptable for film is +/- 22 ms. The range for video, according to the ATSC, is up to 15 ms lead time and about 45 ms lag time
  6. ^ Consumer Electronics Association. "CEA-CEB20 R-2013: A/V Synchronization Processing Recommended Practice". Archived from the original on 2015-05-30.
  7. ^ ST 2064:2015 - SMPTE Standard - Audio to Video Synchronization Measurement, SMPTE, 2015
  8. ^ SMPTE Standards Update: The Lip-Sync Challenge, SMPTE, 10 December 2013
  9. ^ SMPTE Standards Update: The Lip-Sync Challenge (PDF), SMPTE, 10 December 2013
  10. ^ "MPEG-2 Systems FAQ: 19. Where are the PTSs and DTSs inserted?". Archived from the original on 2008-07-26. Retrieved 2007-12-27.
  11. ^ Arpi (7 May 2003). "MPlayer-G2-dev: mpeg container's timing (PTS values)".
  12. ^ "birds-eye.net: DTS - Decode Time Stamp".
  13. ^ "SVCD2DVD: Author and burn DVDs: AVI to DVD, DivX to DVD, Xvid to DVD, MPEG to DVD, SVCD to DVD, VCD to DVD, PAL to NTSC conversion, HDTV2DVD, HDTV to DVD, BLURAY". www.svcd2dvd.com.
  14. ^ RFC 7273
  15. ^ RFC 7272