PES Packets and Elementary Streams

A video ES, or video elementary stream, consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence. An ES carries only one type of data (video or audio) from a single video or audio encoder.

A packetized elementary stream, or PES, consists of a single ES which has been divided into packets, each starting with an added packet header. Like the ES it carries, a PES contains only one type of data from one source, e.g. from one video or audio encoder.

PES packets have variable length, which does not correspond to the fixed 188-byte length of transport packets, and a PES packet may be much longer than a transport packet. When transport packets are formed from a PES, the PES header is always placed at the beginning of a transport packet payload, immediately following the transport packet header. The remaining PES packet content fills the payloads of successive transport packets until the PES packet is used up. The final transport packet is filled out to the fixed length with stuffing bytes of value 0xFF (all ones).
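The mapping of one PES packet onto transport packets can be sketched roughly as follows. This is a minimal illustration, not a complete multiplexer: the function name `packetize_pes`, its parameters, and the PID value in the usage note are hypothetical. It assumes the standard 4-byte transport packet header and assumes that the 0xFF stuffing for the final packet is carried in an adaptation field placed ahead of the payload, which is how MPEG-2 transport streams pad packets that carry PES data.

```python
from typing import List

TS_PACKET_SIZE = 188   # fixed transport packet length in bytes
TS_HEADER_SIZE = 4     # basic transport packet header
SYNC_BYTE = 0x47


def packetize_pes(pes_packet: bytes, pid: int, first_cc: int = 0) -> List[bytes]:
    """Split one PES packet into 188-byte transport packets (illustrative only)."""
    packets = []
    offset = 0
    cc = first_cc & 0x0F  # 4-bit continuity counter
    max_payload = TS_PACKET_SIZE - TS_HEADER_SIZE
    while offset < len(pes_packet):
        chunk = pes_packet[offset:offset + max_payload]
        # payload_unit_start_indicator is set only where the PES header begins.
        pusi = 1 if offset == 0 else 0
        stuffing = max_payload - len(chunk)
        # adaptation_field_control: 01 = payload only, 11 = adaptation field + payload
        afc = 0b11 if stuffing else 0b01
        header = bytes([
            SYNC_BYTE,
            (pusi << 6) | ((pid >> 8) & 0x1F),
            pid & 0xFF,
            (afc << 4) | cc,
        ])
        if stuffing == 0:
            adaptation = b""
        elif stuffing == 1:
            # A single byte of padding: adaptation_field_length = 0.
            adaptation = b"\x00"
        else:
            # Length byte, one flags byte (all zero), then 0xFF stuffing bytes.
            adaptation = bytes([stuffing - 1, 0x00]) + b"\xFF" * (stuffing - 2)
        packets.append(header + adaptation + chunk)
        offset += len(chunk)
        cc = (cc + 1) & 0x0F
    return packets
```

For example, `packetize_pes(pes, pid=0x100)` on a 304-byte PES packet yields two transport packets: the first carries the PES header and 184 bytes in total, and the second carries the remaining 120 bytes behind 64 bytes of adaptation field.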

Each PES packet header includes an 8-bit stream ID identifying the source of the payload. Among other things, the PES packet header may also contain timing references: PTS (presentation time stamp, the time at which a decoded audio or video access unit is to be presented by the decoder), DTS (decoding time stamp, the time at which an access unit is to be decoded by the decoder), and ESCR (elementary stream clock reference).
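As a rough illustration of how these header fields can be read, the sketch below extracts the stream ID and any PTS/DTS from the start of a PES packet. The names `PesHeader`, `parse_pes_header`, and `_read_timestamp` are hypothetical. The byte layout used (a 0x000001 start code prefix, the stream ID, a 16-bit length, two flag bytes, a header data length, then 33-bit time stamps spread over 5 bytes with marker bits) is assumed from the MPEG-2 systems syntax for audio and video stream IDs; ESCR and the other optional fields are skipped rather than decoded.

```python
from typing import NamedTuple, Optional


class PesHeader(NamedTuple):
    stream_id: int
    packet_length: int
    pts: Optional[int]      # in 90 kHz clock ticks, if present
    dts: Optional[int]      # in 90 kHz clock ticks, if present
    payload_offset: int     # index of the first ES byte in the packet


def _read_timestamp(b: bytes) -> int:
    """Reassemble a 33-bit time stamp from its 5-byte coded form."""
    return (
        (((b[0] >> 1) & 0x07) << 30)
        | (b[1] << 22)
        | ((b[2] >> 1) << 15)
        | (b[3] << 7)
        | (b[4] >> 1)
    )


def parse_pes_header(data: bytes) -> PesHeader:
    if len(data) < 9 or data[0:3] != b"\x00\x00\x01":
        raise ValueError("data does not begin with a PES start code prefix")
    stream_id = data[3]
    packet_length = (data[4] << 8) | data[5]
    pts_dts_flags = data[7] >> 6       # top two bits of the second flag byte
    header_data_length = data[8]       # size of the optional fields that follow
    if len(data) < 9 + header_data_length:
        raise ValueError("truncated PES header")
    pts = dts = None
    if pts_dts_flags & 0b10:
        pts = _read_timestamp(data[9:14])
    if pts_dts_flags == 0b11:
        dts = _read_timestamp(data[14:19])
    return PesHeader(stream_id, packet_length, pts, dts, 9 + header_data_length)
```

Because the time stamps count ticks of a 90 kHz clock, dividing `pts` by 90000 gives the presentation time in seconds.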

ATSC further constrains PES packets for video:

 
