AVI – Audio Video Interleave
It is a multimedia container format introduced by Microsoft in November 1992 as part of its Video for Windows technology. AVI files can contain both audio and video data in a file container that allows synchronous audio-with-video playback. Like the DVD video format, AVI files support multiple streaming audio and video tracks, although these features are seldom used. Most AVI files also use the file format extensions developed by the Matrox OpenDML group in February 1996. These files are supported by Microsoft, and are unofficially called “AVI 2.0”. Files of this type have a .avi extension.
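Because AVI is a RIFF-based container, a quick way to confirm that a file really is an AVI (rather than trusting the .avi extension) is to check the RIFF signature at the start of the file. The following Python sketch is illustrative only; the file name is hypothetical and only the first twelve bytes are examined.

```python
import struct

def is_avi(path):
    """Return True if the file starts with a RIFF header whose form type is 'AVI '."""
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) < 12:
        return False
    tag, size, form = struct.unpack("<4sI4s", header)  # RIFF sizes are little-endian
    return tag == b"RIFF" and form == b"AVI "

print(is_avi("example.avi"))  # hypothetical file name
```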
There is slight overhead when AVI is used with popular MPEG-4 codecs (Xvid and DivX, for example), increasing file size more than necessary. The AVI container has no native support for modern MPEG-4 features like B-frames. Hacks are sometimes used to enable these features and subtitles; however, they are a source of playback incompatibilities.
AVI files do not contain pixel aspect ratio information. Microsoft confirms that “many players, including Windows Media Player, render all AVI files with square pixels. Therefore, the frame appears stretched or squeezed horizontally when the file is played back.”[1] There are other video container formats that do allow non-square pixels.
More modern container formats (such as QuickTime, Matroska, Ogg and MP4) offer more flexibility; however, owing to its age, the AVI format remains widely supported on a vast range of operating systems and devices.
MPEG – Moving Picture Experts Group
It was formed by the ISO to set standards for audio and video compression and transmission. Its first meeting was in May 1988 in Ottawa, Canada. As of late 2005, MPEG has grown to include approximately 350 members per meeting from various industries, universities, and research institutions. MPEG’s official designation is ISO/IEC JTC1/SC29 WG11.
The MPEG standards consist of different Parts. Each part covers a certain aspect of the whole specification. The standards also specify Profiles and Levels. Profiles are intended to define a set of tools that are available, and Levels define the range of appropriate values for the properties associated with them. MPEG has standardized the following compression formats and ancillary standards:
MPEG-1: The first compression standard for audio and video. It was basically designed to allow moving pictures and sound to be encoded into the bitrate of a Compact Disc. To meet the low bitrate requirement, MPEG-1 downsamples the images and uses picture rates of only 24-30 Hz, resulting in moderate quality. It includes the popular MPEG-1 Audio Layer 3 (MP3) audio compression format.
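To get a feel for why that downsampling is needed, a rough back-of-the-envelope calculation (a sketch only; exact figures vary with the source material) compares the raw bitrate of SIF video, roughly 352x240 at about 30 Hz with 4:2:0 chroma subsampling, against the ~1.15 Mbit/s video rate typically used to fit a Compact Disc's data rate.

```python
# Rough, illustrative numbers only: raw SIF video versus a CD-oriented MPEG-1 bitrate.
width, height, fps = 352, 240, 30           # SIF resolution at ~30 Hz
bits_per_pixel = 12                          # 4:2:0 chroma subsampling = 12 bits/pixel
raw_bitrate = width * height * bits_per_pixel * fps   # bits per second
mpeg1_video_bitrate = 1_150_000              # ~1.15 Mbit/s, a typical Video CD video rate

print(f"raw: {raw_bitrate / 1e6:.1f} Mbit/s")                            # ~30.4 Mbit/s
print(f"compression factor: {raw_bitrate / mpeg1_video_bitrate:.0f}x")   # ~26x
```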
MPEG-2: Transport, video and audio standards for broadcast-quality television. The MPEG-2 standard was considerably broader in scope and of wider appeal, supporting interlacing and high definition. MPEG-2 is considered important because it has been chosen as the compression scheme for over-the-air digital television (ATSC, DVB and ISDB), digital satellite TV services like Dish Network, digital cable television signals, SVCD, and DVD.
MPEG-3: Developments in standardizing scalable and multi-resolution compression which would have become MPEG-3 were ready by the time MPEG-2 was to be standardized; hence, these were incorporated into MPEG-2 and as a result there is no MPEG-3 standard. MPEG-3 is not to be confused with MP3, which is MPEG-1 Audio Layer 3.
MPEG-4: MPEG-4 uses further coding tools with additional complexity to achieve higher compression factors than MPEG-2. In addition to more efficient coding of video, MPEG-4 moves closer to computer graphics applications. In more complex profiles, the MPEG-4 decoder effectively becomes a rendering processor and the compressed bitstream describes three-dimensional shapes and surface texture. MPEG-4 also provides Intellectual Property Management and Protection (IPMP), which provides the facility to use proprietary technologies to manage and protect content, such as digital rights management. Several new higher-efficiency video coding standards (newer than MPEG-2 Video) are included as alternatives to MPEG-2 Video, notably MPEG-4 Part 2 (whose best-known profile is Advanced Simple Profile) and MPEG-4 Part 10 (Advanced Video Coding, or H.264). MPEG-4 Part 10 may be used on HD DVD and Blu-ray discs, along with VC-1 and MPEG-2.
FLV – Flash Video
It is a file format used to deliver video over the Internet using Adobe Flash Player (initially produced by Macromedia) versions 6–10. Until version 9 update 2 of the Flash Player, Flash Video referred to a proprietary file format with the extension .flv. The most recent public release of Flash Player supports H.264 video and HE-AAC audio. Flash Video content may also be embedded within SWF files. Notable users of the Flash Video format include YouTube, Google Video, Yahoo! Video, Reuters.com, and many other news providers.
The format has quickly established itself as the format of choice for embedded video on the web. For instance, the standards documentation for BBC Online deprecates the use of other formats previously in use on its sites such as RealVideo or WMV.
MP4
MPEG-4 Part 14, formally ISO/IEC 14496-14:2003, is a multimedia container format standard specified as a part of MPEG-4. It is most commonly used to store digital audio and digital video streams, especially those defined by MPEG, but can also be used to store other data such as subtitles and still images. Like most modern container formats, MPEG-4 Part 14 allows streaming over the Internet. The official filename extension for MPEG-4 Part 14 files is .mp4, thus the container format is often referred to simply as MP4.
Some devices advertised as “MP4 players” are simply MP3 players that also play AMV video and/or some other video format, and do not play the MPEG-4 Part 14 format. To the makers of these players, the “MP4” designation simply means that they play more than just MP3. This can become rather confusing for potential buyers.
MPEG-4 Part 14 is based upon ISO/IEC 14496-12:2005 (MPEG-4 Part 12, the ISO Base Media File Format), which is directly based upon Apple’s QuickTime container format. MPEG-4 Part 14 is essentially identical to the MOV format, but formally specifies support for Initial Object Descriptors (IOD) and other MPEG features.
The existence of two different file extensions for naming audio-only MP4 files has been a source of confusion among users and multimedia playback software. Since MPEG-4 Part 14 is a container format, MPEG-4 files may contain any number of audio, video, and even subtitle streams, making it impossible to determine the type of streams in an MPEG-4 file based on its filename extension alone. In response, Apple Inc. started using and popularizing the .m4a file extension. Software capable of audio/video playback should recognize files with either .m4a or .mp4 file extensions, as there are no file format differences between the two. Most software capable of creating MPEG-4 audio will allow the user to choose the filename extension of the created MPEG-4 files.
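Since the extension alone cannot be trusted, software usually inspects the container itself. The short sketch below is illustrative rather than a full parser (the file name is hypothetical): it reads the first top-level box of an MPEG-4 Part 14 file and, if it is an 'ftyp' box, reports the major brand, e.g. 'isom', 'mp42', or 'M4A '.

```python
import struct

def mp4_major_brand(path):
    """Read the first box; if it is an 'ftyp' box, return its major brand string."""
    with open(path, "rb") as f:
        size, box_type = struct.unpack(">I4s", f.read(8))  # box header: big-endian size + 4-char type
        if box_type != b"ftyp":
            return None
        major_brand = f.read(4)
    return major_brand.decode("ascii", errors="replace")

print(mp4_major_brand("example.mp4"))  # hypothetical file; e.g. 'isom' or 'mp42'
```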
While the only official file extension defined by the standard is .mp4, various file extensions are commonly used to indicate intended content:
MPEG-4 files with audio and video generally use the standard .mp4 extension.
Audio-only MPEG-4 files generally have a .m4a extension. This is especially true of non-protected content.
MPEG-4 files with audio streams encrypted by FairPlay Digital Rights Management as sold through the iTunes Store use the .m4p extension.
Audio book and podcast files, which also contain metadata including chapter markers, images, and hyperlinks, can use the extension .m4a, but more commonly use the .m4b extension. An .m4a audio file cannot “bookmark” (remember the last listening spot), whereas .m4b extension files can.
The Apple iPhone uses MPEG-4 audio for its ringtones but uses the .m4r extension rather than the .m4a extension.
Raw MPEG-4 Visual bitstreams are named .m4v.
Mobile phones use 3GP, a simplified version of MPEG-4 Part 12 (a.k.a. the MPEG-4/JPEG 2000 ISO Base Media File Format), with the .3gp and .3g2 extensions. These files also store non-MPEG-4 data (H.263, AMR, TX3G).
The common but non-standard use of the extensions .m4a and .m4v is due to the popularity of Apple’s iPod, iPhone, and iTunes Store, and Microsoft’s Xbox 360 and Zune. Sony’s PSP can also play M4A files without modification. M4A generally delivers better audio quality than the older MP3 format at the same bit rate.
Almost any kind of data can be embedded in MPEG-4 Part 14 files through private streams; the widely-supported codecs and additional data streams are:
Video: MPEG-4 Part 10 (or H.264, also known as MPEG-4 AVC), MPEG-4 Part 2, MPEG-2, and MPEG-1.
Audio: AAC (also known as MPEG-2 Part 7), Apple Lossless, MP3 (also known as MPEG-1 Audio Layer 3), MPEG-4 Part 3, MP2 (also known as MPEG-1 Audio Layer 2), MPEG-1 Audio Layer 1, CELP (speech), TwinVQ (very low bitrates), SAOL (MIDI).
Subtitles: MPEG-4 Timed Text (also known as 3GPP Timed Text).
3GP – Third Generation Partnership Project
3GP is a multimedia container format defined by the Third Generation Partnership Project (3GPP) for use on 3G mobile phones but can also be played on some 2G and 4G phones.
3GP is a simplified version of the MPEG-4 Part 14 (MP4) container format, designed to decrease storage and bandwidth requirements in order to accommodate mobile phones. It stores video streams as MPEG-4 Part 2 or H.263 or MPEG-4 Part 10 (AVC/H.264), and audio streams as AMR-NB, AMR-WB, AMR-WB+, AAC-LC or HE-AAC. A 3GP file is always big-endian, storing and transferring the most significant bytes first. It also contains descriptions of image sizes and bitrate. There are two different standards for this format:
- 3GPP (for GSM-based Phones, may have filename extension .3gp)
- 3GPP2 (for CDMA-based Phones, may have filename extension .3g2)
Both are based on MPEG-4 and H.263 video, and AAC or AMR audio.
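The big-endian box layout mentioned above means the same kind of brand check can distinguish the two variants: 3GPP files typically carry a major brand such as '3gp4', '3gp5', or '3gp6', while 3GPP2 files typically use '3g2a'. The sketch below reuses the hypothetical mp4_major_brand helper from the MP4 section; the brand values are common examples, not an exhaustive list.

```python
brand = mp4_major_brand("clip.3gp")  # hypothetical file; helper defined in the MP4 sketch above
if brand and brand.startswith("3gp"):
    print("3GPP file (GSM-oriented, usually .3gp)")
elif brand and brand.startswith("3g2"):
    print("3GPP2 file (CDMA-oriented, usually .3g2)")
else:
    print("not recognized as a 3GP/3G2 brand:", brand)
```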
Device support:
- Some cell phones use the .mp4 extension for 3GP video
- Most 3G capable mobile phones support the playback and recording of video in 3GP format (limits on memory, maximum file size for playback and recording, and resolution exist and vary by device).
- Some newer/higher-end phones without 3G capabilities may also play back and record in this format (again, with the same kinds of limitations).
- Audio imported from CD onto a PlayStation 3, when the console is set to encode to the MPEG-4 AAC codec, will copy onto USB devices in the 3GP format.
WMV – Windows Media Video
It is a compressed video file format for several proprietary codecs developed by Microsoft. The original codec, known as WMV, was designed for Internet streaming applications, as a competitor to RealVideo. Through standardization by the Society of Motion Picture and Television Engineers (SMPTE), WMV has gained adoption for physical-delivery formats such as HD DVD and Blu-ray Disc.
A WMV file is in most circumstances encapsulated in the Advanced Systems Format (ASF) container format. The file extension .WMV typically describes ASF files that use Windows Media Video codecs. The audio codec used in conjunction with Windows Media Video is typically some version of Windows Media Audio, or in rarer cases, the deprecated Sipro ACELP.net audio codec. Microsoft recommends that ASF files containing non-Windows Media codecs use the generic .ASF file extension.
Although WMV is generally packed into the ASF container format, it can also be put into the AVI or Matroska container format. The resulting files use the .AVI or .MKV file extensions, respectively. WMV can be stored in an AVI file when using the WMV 9 Video Compression Manager (VCM) codec implementation.
ASF – Advanced Systems Format
Advanced Systems Format (formerly Advanced Streaming Format, Active Streaming Format) is Microsoft’s proprietary digital audio/digital video container format, especially meant for streaming media. ASF is part of the Windows Media framework.
ASF is based on serialized objects which are essentially byte sequences identified by a GUID marker.
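A minimal illustration of that object structure (a sketch under stated assumptions, not a full ASF parser): each top-level object begins with a 16-byte GUID followed by a 64-bit little-endian object size, and a well-formed file starts with the widely documented ASF Header Object GUID. The file name below is hypothetical.

```python
import struct
import uuid

# Widely documented GUID of the top-level ASF Header Object.
ASF_HEADER_OBJECT = uuid.UUID("75B22630-668E-11CF-A6D9-00AA0062CE6C")

def read_asf_object_header(f):
    """Read one ASF object header: 16-byte GUID + 8-byte little-endian size."""
    raw = f.read(24)
    if len(raw) < 24:
        return None
    guid = uuid.UUID(bytes_le=raw[:16])      # ASF stores GUIDs in Microsoft's little-endian layout
    (size,) = struct.unpack("<Q", raw[16:24])
    return guid, size

with open("example.wmv", "rb") as f:         # hypothetical file name
    guid, size = read_asf_object_header(f)
    print(guid == ASF_HEADER_OBJECT, size)
```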
The format does not specify how (i.e. with which codec) the video or audio should be encoded; it just specifies the structure of the video/audio stream. This is similar to the function performed by the QuickTime, AVI, or Ogg container formats. One of the objectives of ASF was to support playback from digital media servers, HTTP servers, and local storage devices such as hard disk drives.
The most common media contained within an ASF file are Windows Media Audio (WMA) and Windows Media Video (WMV). Note that these file extension abbreviations are distinct from the codecs that share the same names. Files containing only WMA audio can be named using a .WMA extension, and files of audio and video content may have the extension .WMV. Both may use the .ASF extension if desired.
ASF files can also contain objects representing metadata, such as the artist, title, album and genre for an audio track, or the director of a video track, much like the ID3 tags of MP3 files. It supports scalable media types and stream prioritization; as such, it is a format optimized for streaming.
The ASF container provides the framework for digital rights management in Windows Media Audio and Windows Media Video. An analysis of an older scheme used in WMA reveals that it is using a combination of elliptic curve cryptography key exchange, DES block cipher, a custom block cipher, RC4 stream cipher and the SHA-1 hashing function.
ASF container-based media is usually streamed on the internet either through the MMS protocol or the RTSP protocol.
MOV – QuickTime Video
The QuickTime (.mov) file format functions as a multimedia container file that contains one or more tracks, each of which stores a particular type of data: audio, video, effects, or text (e.g. for subtitles). Each track either contains a digitally encoded media stream (using a specific codec) or a data reference to the media stream located in another file. Tracks are maintained in a hierarchical data structure consisting of objects called atoms. An atom can be a parent to other atoms or it can contain media or edit data, but it cannot do both.
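A compact way to see that atom hierarchy is to walk the top level of a .mov file, reading each atom's 32-bit big-endian size and four-character type. The sketch below is illustrative only; the file name is hypothetical, and error handling for malformed files is omitted.

```python
import struct

def list_top_level_atoms(path):
    """Yield (type, payload_size) for each top-level QuickTime atom in the file."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, atom_type = struct.unpack(">I4s", header)  # big-endian size + 4-char type
            if size == 1:                        # a 64-bit extended size follows the header
                (size,) = struct.unpack(">Q", f.read(8))
                payload = size - 16
            elif size == 0:                      # atom extends to the end of the file
                yield atom_type.decode("ascii", "replace"), None
                break
            else:
                payload = size - 8
            yield atom_type.decode("ascii", "replace"), payload
            f.seek(payload, 1)                   # skip over the atom's payload

for atom, size in list_top_level_atoms("example.mov"):   # hypothetical file name
    print(atom, size)                                     # typically ftyp, moov, mdat, ...
```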
The ability to contain abstract data references for the media data, and the separation of the media data from the media offsets and the track edit lists means that QuickTime is particularly suited for editing, as it is capable of importing and editing in place (without data copying). Other later-developed media container formats such as Microsoft’s Advanced Systems Format or the open source Ogg and Matroska containers lack this abstraction, and require all media data to be rewritten after editing.
Other file formats that QuickTime supports natively (to varying degrees) include AIFF, WAV, DV, MP3, and MPEG-1. With additional QuickTime Extensions, it can also support Ogg, ASF, FLV, MKV, DivX Media Format, and others.
On February 11, 1998, the ISO approved the QuickTime file format as the basis of the MPEG-4 Part 14 (.mp4) container standard. MPEG-4 Part 14 went on to become an industry standard, with support first appearing in QuickTime 6 in 2002. Accordingly, the MPEG-4 container is designed to capture, edit, archive, and distribute media, unlike the simple file-as-stream approach of MPEG-1 and MPEG-2.
QuickTime 6 added limited support for MPEG-4, specifically encoding and decoding using Simple Profile (SP). Advanced Simple Profile (ASP) features, like B-frames, were unsupported (in contrast with, for example, encoders such as XviD or 3ivx). QuickTime 7 added support for H.264 encoding and decoding.
Because both the MOV and MP4 containers can use the same MPEG-4 codecs, they are mostly interchangeable in a QuickTime-only environment. However, MP4, being an international standard, has more support. This is especially true on hardware devices, such as the Sony PSP and various DVD players; on the software side, most DirectShow / Video for Windows codec packs include an MP4 parser, but not one for MOV.
OGG
It is a free, open standard container format maintained by the Xiph.Org Foundation. The Ogg format is unrestricted by software patents and is designed to provide for efficient streaming and manipulation of high quality digital multimedia.
The name ‘Ogg’ refers to the file format which can multiplex a number of separate independent free and open source codecs for audio, video, text (such as subtitles), and metadata.
In the Ogg multimedia framework, Theora provides a lossy video layer, while the music-oriented Vorbis codec most commonly acts as the audio layer. The human speech compression codec Speex, lossless audio compression codec FLAC, and OggPCM may also act as audio layers.
The term ‘Ogg’ is commonly used to refer to the audio file format Ogg Vorbis, that is, Vorbis-encoded audio in the Ogg container. Previously, the .ogg file extension was used for any content distributed within Ogg, but as of 2007, the Xiph.Org Foundation requests that .ogg be used only for Vorbis due to backward compatibility concerns. The Xiph.Org Foundation decided to create a new set of file extensions and media types to describe different types of content, such as .oga for audio-only files, .ogv for video with or without sound (including Theora), and .ogx for applications.
The current version of the Xiph.Org Foundation’s reference implementation, released on 27 November 2005, is libogg 1.1.3. Another version, libogg2, is also available from the Xiph.Org Foundation’s SVN repositories.
Because the format is free, and its reference implementation is non-copylefted, Ogg’s various codecs have been incorporated into a number of different free and proprietary media players, both commercial and non-commercial, as well as portable media players and GPS receivers from different manufacturers.
Ogg is only a container format. The actual audio or video encoded by a codec will be stored inside an Ogg container. Ogg containers may contain streams encoded with multiple codecs, for example, a video file with sound contains data encoded by both an audio codec and a video codec.
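The way multiple codecs coexist in one Ogg file is that each logical stream is assigned its own serial number, and the physical file is a sequence of pages, each beginning with the capture pattern "OggS". The sketch below is illustrative only (the file name is hypothetical, and a well-formed file is assumed): it scans page headers and reports the distinct serial numbers, i.e. how many logical streams are multiplexed.

```python
import struct

def ogg_stream_serials(path):
    """Scan Ogg pages and return the set of logical-stream serial numbers."""
    serials = set()
    with open(path, "rb") as f:
        while True:
            header = f.read(27)                       # fixed part of an Ogg page header
            if len(header) < 27 or header[:4] != b"OggS":
                break
            serial = struct.unpack("<I", header[14:18])[0]   # stream serial number, little-endian
            serials.add(serial)
            n_segments = header[26]
            lacing = f.read(n_segments)               # segment table: one lacing value per segment
            f.seek(sum(lacing), 1)                    # skip the page body
    return serials

print(ogg_stream_serials("example.ogv"))              # hypothetical file; one serial per logical stream
```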
Being a container format, Ogg can embed audio and video in various formats (such as MPEG-4, Dirac, MP3 and others), but Ogg was intended for, and is usually used with, the following free codecs:
- Audio codecs
  - Lossy
    - Speex: handles voice data at low bitrates (~8-32 kbit/s/channel)
    - Vorbis: handles general audio data at mid- to high-level variable bitrates (~16-500 kbit/s/channel)
  - Lossless
    - FLAC: handles archival and high-fidelity audio data
- Text codecs
  - Writ: a text codec designed to embed subtitles or captions
  - CMML: a text/application codec for timed metadata, captioning, and formatting
- Video codecs
  - Theora: based upon On2’s VP3, it is targeted at competing with MPEG-4 video (for example, encoded with DivX or Xvid), RealVideo, or Windows Media Video.
  - Tarkin: an experimental codec utilizing discrete wavelet transforms in the three dimensions of width, height, and time. It has been put on hold since February 2000, with Theora becoming the main focus for video encoding.
  - Dirac: an experimental codec developed by the BBC as the basis of a new codec for the transmission of video over the Internet. The Schrödinger project aims to provide portable libraries, written in C, that implement the Dirac codec and allow Dirac to be embedded inside the Ogg container format.
  - OggUVS: a draft codec for storing uncompressed video.
- Subtitle structures
  - Annodex: a free and open source set of standards developed by CSIRO to annotate and index networked media.
RM/RMVB – RealMedia/RealMedia Variable Bitrate
RealMedia is a multimedia container format created by RealNetworks. Its extension is “.rm”. It is typically used in conjunction with RealVideo and RealAudio and is used for streaming content over the Internet.
Typically these streams are in CBR (constant bitrate).
RealMedia Variable Bitrate (RMVB) is a variable bitrate extension of the RealMedia multimedia container format developed by RealNetworks.
As opposed to the more common RealMedia container, which holds streaming media encoded at a constant bit rate, RMVB is typically used for multimedia content stored locally. Files using this format have the file extension “.rmvb”.
RealMedia uses compression comparable to that of MPEG-4 Part 10 (H.264) encoders such as x264.
RMVB files are extremely popular for distributing Asian content, especially anime and Chinese television episodes. For this reason, they have become noticeably present (though not entirely popular) on file sharing platforms such as BitTorrent, eDonkey and Gnutella.
DivX
DivX is not a video format; it is a brand name of products created by DivX, Inc. (formerly DivXNetworks, Inc.), including the DivX Codec, which has become popular due to its ability to compress lengthy video segments into small sizes while maintaining relatively high visual quality. The DivX codec uses lossy MPEG-4 compression, where quality is balanced against file size for utility. It is one of several codecs commonly associated with “ripping”, whereby audio and video multimedia are transferred to a hard disk and transcoded. Many newer “DivX Certified” DVD players are able to play DivX encoded movies, although the Qpel and global motion compensation features are often omitted to reduce processing requirements. They are also excluded from the base DivX encoding profiles for compatibility reasons.
Xvid
Xvid is not a video format; it is a video codec library following the MPEG-4 standard, specifically MPEG-4 Part 2 Advanced Simple Profile (ASP). It uses ASP features such as B-frames, global and quarter-pixel motion compensation, lumi masking, trellis quantization, and H.263, MPEG and custom quantization matrices.
Xvid is a primary competitor of the DivX Pro Codec (Xvid being DivX spelled backwards). In contrast with the DivX codec, which is proprietary software developed by DivX, Inc., Xvid is free software distributed under the terms of the GNU General Public License. This also means that unlike the DivX codec, which is only available for a limited number of platforms, Xvid can be used on all platforms and operating systems for which the source code can be compiled.
H.264/AVC – H.264/MPEG-4 Advanced Video Coding
Neither of these names refers to a video format; H.264 is a standard for video compression, and is equivalent to MPEG-4 Part 10, or MPEG-4 AVC (Advanced Video Coding). As of 2008, it is the latest block-oriented motion-compensation-based codec standard developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG), and it was the product of a partnership effort known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 Part 10 standard (formally, ISO/IEC 14496-10) are jointly maintained so that they have identical technical content. The final drafting work on the first version of the standard was completed in May 2003.
H.264/AVC/MPEG-4 Part 10 contains a number of new features that allow it to compress video much more effectively than older standards and to provide more flexibility for application to a wide variety of network environments.
H.264/AVC can often perform radically better than older video codecs such as MPEG-2 Video, typically obtaining the same quality at half of the bit rate or less, especially in high bit rate and high resolution situations; for example, video that needs about 6 Mbit/s in MPEG-2 can often be delivered at roughly 3 Mbit/s or less with H.264. The H.264/AVC standard defines a number of profiles; a profile for a codec is a set of features of that codec identified to meet a certain set of specifications for intended applications.
H.263
H.263 is a video codec standard originally designed as a low-bitrate compressed format for videoconferencing. It was developed by the ITU-T Video Coding Experts Group (VCEG) in a project ending in 1995/1996 as one member of the H.26x family of video coding standards in the domain of the ITU-T.
H.263 has since found many applications on the internet: much Flash Video content (as used on sites such as YouTube, Google Video, MySpace, etc.) is encoded in this format, though many sites now use VP6 encoding, which has been supported since Flash Player 8. The original version of the RealVideo codec was based on H.263 up until the release of RealVideo 8.
The codec was first designed to be utilized in H.324 based systems (PSTN and other circuit-switched network videoconferencing and videotelephony), but has since also found use in H.323 (RTP/IP-based videoconferencing), H.320 (ISDN-based videoconferencing), RTSP (streaming media) and SIP (Internet conferencing) solutions.
H.263 was developed as an evolutionary improvement based on experience from H.261, the previous ITU-T standard for video compression, and the MPEG-1 and MPEG-2 standards. Its first version was completed in 1995 and provided a suitable replacement for H.261 at all bitrates. It was further enhanced in projects known as H.263v2 (also known as H.263+ or H.263 1998) and H.263v3 (also known as H.263++ or H.263 2000).
The next enhanced codec developed by ITU-T VCEG (in partnership with MPEG) after H.263 is the H.264 standard, also known as AVC and MPEG-4 part 10. As H.264 provides a significant improvement in capability beyond H.263, the H.263 standard is now considered primarily a legacy design (although this is a recent development). Most new videoconferencing products now include H.264 as well as H.263 and H.261 capabilities.