Time is expressed as integer multiples of arbitrary units of time called a time_base. There are different contexts that have different time bases: Stream has Stream.time_base, CodecContext has CodecContext.time_base, and Container has av.TIME_BASE.

>>> fh = av.open(path)
>>> video = fh.streams.video[0]

>>> video.time_base
Fraction(1, 25)

>>> video.codec_context.time_base
Fraction(1, 50)

Attributes that represent time on those objects will be in that object’s time_base:

>>> video.duration
>>> float(video.duration * video.time_base)
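The conversion above can be sketched with the standard library alone. The helper name `to_seconds` and the sample values are hypothetical, and the `None` check assumes that an unknown time is reported as `None`; none of this is part of PyAV's API.

```python
from fractions import Fraction


def to_seconds(timestamp, time_base):
    """Convert an integer timestamp in units of time_base to float seconds."""
    if timestamp is None:  # assumption: unknown times are reported as None
        return None
    # int * Fraction stays exact; precision is only lost in the final float()
    return float(timestamp * time_base)


# Hypothetical values: 250 ticks of a 1/25 time base
print(to_seconds(250, Fraction(1, 25)))  # 10.0
```

Keeping the value as a `Fraction` until the last step avoids accumulating floating-point error when several timestamps are combined.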

Packet has Packet.pts and Packet.dts (“presentation” and “decode” time stamps), and Frame has Frame.pts (“presentation” time stamp). Both have a time_base attribute, but it defaults to the time base of the object that handles them. For packets that is the stream. For frames it is the stream when decoding, and the codec context when encoding (which is strange, but it is what it is).

In many cases a stream has a time base of 1 / frame_rate, and then its frames have incrementing integers for times (0, 1, 2, etc.). Those frames take place at pts * time_base, that is, at 0 / frame_rate, 1 / frame_rate, 2 / frame_rate, etc.

>>> p, f = get_nth_packet_and_frame(fh, skip=1)

>>> p.time_base
Fraction(1, 25)
>>> p.dts

>>> f.time_base
Fraction(1, 25)
>>> f.pts

For convenience, Frame.time is a float in seconds:

>>> f.time

FFmpeg Internals


Time in FFmpeg is not 100% clear to us (see Authority of Documentation). At times the FFmpeg documentation and canonical-seeming posts in the forums appear contradictory. We’ve experimented with it, and what follows is the picture that we are operating under.

Both AVStream and AVCodecContext have a time_base member. However, they are used for different purposes, and (this author finds) it is too easy to abstract the concept too far.

When there is no time_base (such as on AVFormatContext), there is an implicit time_base of 1/AV_TIME_BASE.
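As a rough illustration of the implicit time base, the sketch below converts a container-level duration to seconds. `AV_TIME_BASE` is 1000000 in FFmpeg's headers (so the unit is microseconds); the duration value and the name `CONTAINER_TIME_BASE` are made up for the example.

```python
from fractions import Fraction

AV_TIME_BASE = 1_000_000  # value of FFmpeg's AV_TIME_BASE macro
CONTAINER_TIME_BASE = Fraction(1, AV_TIME_BASE)  # hypothetical name for 1/AV_TIME_BASE

duration = 12_500_000  # hypothetical container-level duration, in AV_TIME_BASE units
seconds = float(duration * CONTAINER_TIME_BASE)
print(seconds)  # 12.5
```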


For encoding, you (the PyAV developer / FFmpeg “user”) must set AVCodecContext.time_base, ideally to the inverse of the frame rate (or so the library docs say to do if your frame rate is fixed; we’re not sure what to do if it is not fixed), and you may set AVStream.time_base as a hint to the muxer. After you open all the codecs and call avformat_write_header, the stream time base may change, and you must respect it. We don’t know if the codec time base may change, so we will make the safer assumption that it may and respect it as well.

You then prepare AVFrame.pts in AVCodecContext.time_base. The encoded AVPacket.pts is simply copied from the frame by the library, and so is still in the codec’s time base. You must rescale it to AVStream.time_base before muxing (as all stream operations assume the packet time is in stream time base).
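The rescaling step can be sketched in pure Python. In FFmpeg itself this is the job of `av_rescale_q`; the function below is only an approximation of it, assuming round-to-nearest with halves rounded away from zero, and it ignores the overflow handling and the special "no timestamp" value that the real function has to deal with. The name `rescale_q` and the time bases used are illustrative.

```python
import math
from fractions import Fraction


def rescale_q(timestamp, src_time_base, dst_time_base):
    """Re-express an integer timestamp from src_time_base in dst_time_base.

    Rounds to the nearest integer, with halves rounded away from zero
    (an assumption meant to mirror av_rescale_q's default rounding).
    """
    exact = Fraction(timestamp) * src_time_base / dst_time_base
    sign = -1 if exact < 0 else 1
    return sign * math.floor(abs(exact) + Fraction(1, 2))


codec_time_base = Fraction(1, 25)      # hypothetical encoder time base
stream_time_base = Fraction(1, 90000)  # hypothetical time base chosen by the muxer

packet_pts = 3  # copied from the frame, so still in codec_time_base
print(rescale_q(packet_pts, codec_time_base, stream_time_base))  # 10800
```

The same call applies to dts and duration, since every timestamp on the packet has to be in the stream time base before muxing.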

For fixed-fps content your frames’ pts would be the frame or sample index (for video and audio, respectively). PyAV should attempt to do this.
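To make the frame-index and sample-index idea concrete, here is a small sketch. It assumes a video time base of 1 / frame_rate and an audio time base of 1 / sample_rate; the rates, the frame sizes, and the helper `audio_pts` are hypothetical and do not describe what PyAV currently does.

```python
from fractions import Fraction


def audio_pts(frame_sizes):
    """Yield the pts of each audio frame: the index of its first sample."""
    position = 0
    for samples in frame_sizes:
        yield position
        position += samples


video_time_base = Fraction(1, 25)     # assumed 25 fps
audio_time_base = Fraction(1, 48000)  # assumed 48 kHz

video = list(range(3))                       # video pts is just the frame index
audio = list(audio_pts([1024, 1024, 1024]))  # audio pts is the running sample count

print(video)  # [0, 1, 2]
print(audio)  # [0, 1024, 2048]
print(float(video[2] * video_time_base))  # 0.08
```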


When decoding, everything is in AVStream.time_base, so we don’t have to rebase it into the codec time base (it generally seems to be the case that AVCodecContext doesn’t really care about your timing; I wish there were a way to assert this without reading every codec).