User:Cloudeeo/sandbox

From Wikipedia, the free encyclopedia

MPEG-G (ISO/IEC 23092) is an ISO/IEC standard jointly developed by ISO/IEC JTC 1/SC 29/WG 11 (MPEG) and ISO TC 276 "Biotechnology" Working Group 5 to enable efficient and cost-effective handling of the genomic information generated by high-throughput sequencing machines. MPEG-G provides genomic data compression and transport, together with specifications for associating metadata with the genomic content and for exposing application programming interfaces (APIs), so that an ecosystem of interoperable applications and services can be built on it.

Main characteristics

MPEG-G applies compression and transport technologies already proven in digital media coding to genome sequencing data, targeting complex use cases that require access to large and possibly distributed datasets.

Use cases addressed by MPEG-G include[1]:

  • Selective access to compressed data
  • Data streaming
  • Compressed file concatenation
  • Genomic studies aggregation
  • Enforcement of privacy rules
  • Selective encryption of sequencing data and metadata
  • Annotation and linkage of genomic segments
  • Interoperability with main existing technologies and legacy formats
  • Incremental update of sequencing data and metadata
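Selective access, the first use case above, generally relies on compressing data in independent blocks and keeping an index from genomic positions to blocks, so that a range query decodes only the blocks it overlaps. The Python sketch below illustrates that general technique only; it does not reproduce the container or index structures actually defined in ISO/IEC 23092. The names `build_blocks`, `query` and `BLOCK_SPAN`, the use of zlib, and the requirement that input records be sorted by position are all assumptions made for illustration.

```python
import bisect
import zlib
from itertools import groupby
from typing import List, Tuple

# Hypothetical block span: number of reference positions covered by each block.
BLOCK_SPAN = 1000


def build_blocks(records: List[Tuple[int, str]]) -> Tuple[List[int], List[bytes]]:
    """Group position-sorted (position, payload) records into blocks of
    BLOCK_SPAN positions, compress each block independently, and return the
    block start positions (the index) alongside the compressed blocks."""
    starts: List[int] = []
    blocks: List[bytes] = []
    for block_start, group in groupby(records, key=lambda r: (r[0] // BLOCK_SPAN) * BLOCK_SPAN):
        text = "\n".join(f"{pos}\t{payload}" for pos, payload in group)
        starts.append(block_start)
        blocks.append(zlib.compress(text.encode("utf-8")))
    return starts, blocks


def query(starts: List[int], blocks: List[bytes], lo: int, hi: int) -> Tuple[List[Tuple[int, str]], int]:
    """Return the records with lo <= position < hi and the number of blocks decoded."""
    # Blocks before the last one starting at or before `lo` only cover positions below `lo`.
    first = max(bisect.bisect_right(starts, lo) - 1, 0)
    hits: List[Tuple[int, str]] = []
    decoded = 0
    for i in range(first, len(starts)):
        if starts[i] >= hi:
            break  # this block and all later ones start at or after the range end
        decoded += 1
        for line in zlib.decompress(blocks[i]).decode("utf-8").split("\n"):
            pos_text, payload = line.split("\t", 1)
            pos = int(pos_text)
            if lo <= pos < hi:
                hits.append((pos, payload))
    return hits, decoded


records = [(10, "ACGT"), (1500, "GGCA"), (2500, "TTAG"), (2999, "CCGA")]
starts, blocks = build_blocks(records)
hits, decoded = query(starts, blocks, 1400, 2600)
# hits == [(1500, "GGCA"), (2500, "TTAG")], decoded == 2
```

With the four example records, three blocks are built, and a query for the half-open range [1400, 2600) decodes only two of them; the block holding position 10 is never decompressed.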

The ISO/IEC 23092 standard series consists of six parts.

Part 1 - Transport and Storage of Genomic Information

This part of the standard specifies the data formats for the transport and storage of genomic information, together with a reference conversion process and informative annexes. Its main topics are genomic data streaming and the file format.

Part 2 - Coding of Genomic Information (Compression)

This part specifies the normative representation of genomic sequence read identifiers, genomic sequence reads (both unaligned and aligned), reference sequences and quality values. It is the part in which compression is specified, in terms of normative bitstream syntax and decoding behaviour.

Part 3 - APIs (Interfaces, Metadata and Protection)

This part of the standard specifies information metadata, interoperability with the SAM format, protection metadata and programming interfaces for accessing genomic information. Its main goals are to enable controlled access to MPEG-G data from external applications and to attach metadata to compressed genomic information.

Part 4 - Reference Software

This part of the standard is distributed as source code and serves as a support and guide for implementers of MPEG-G. It is normative in the sense that any conforming decoder implementation, given the same conformant compressed bitstreams and using the same normative output data structures, will output the same data as the Reference Software.

Part 5 - Conformance

This part of the standard specifies a normative procedure for assessing the conformance of bitstreams and decoders to the standard, based on an exhaustive dataset of compressed data and corresponding test procedures. Conformance testing is fundamental for validating correct implementations of the MPEG-G technology in different devices and applications, and for enabling interoperability among systems.

Part 6 - Genomic Annotations

This part of the standard series specifies a compressed representation of genomic annotations linked to the compressed representation of raw sequencing data and metadata.

MPEG-G Parts
| Part | Number | First public release date (first edition) | Latest public release date (edition) | Latest amendment | Title | Description |
|---|---|---|---|---|---|---|
| Part 1 | ISO/IEC 23092-1 | 2019 | 2019 | | Transport and Storage of Genomic Information | Specification of file format, streaming and indexing |
| Part 2 | ISO/IEC 23092-2 | 2019 | 2019 | | Coding of Genomic Information | Compression of unmapped (raw) and aligned genome sequencing data |
| Part 3 | ISO/IEC 23092-3 | 2020 | 2020 | | APIs | Specification of standard interfaces, syntax for metadata and description of content protection mechanisms |
| Part 4 | ISO/IEC 23092-4 | (2020) | | | Reference Software | Open-source implementation of a normative decoder and an informative encoder, together with bitstreams |
| Part 5 | ISO/IEC 23092-5 | (2020) | | | Conformance testing | Testing procedures and reference bitstreams for testing the conformance of a decoder implementation with MPEG-G |
| Part 6 | ISO/IEC 23092-6 | (2021) | | | Coding of genomic annotations | Representation of genomic variants and associated annotations |

Filename extensions

To be defined.

See also

References

  1. MPEG. "White paper on the objectives and benefits of the MPEG-G standard". MPEG.

Category:ISO/IEC standards Category:Open standards covered by patents