Binary_BCF_versus_VCF_format.png


Summary

Description
English: BCF, the binary representation of the VCF format, is much faster to process than VCF for two reasons. First, it avoids the expensive conversion from text to the internal binary representation. Second, the fields in BCF are rearranged to allow rapid access to a specific value for any sample. This is achieved by storing values in blocks of the same type rather than per sample, with offsets to blocks determined on the fly.
Date
Source HTSlib: C library for reading/writing high-throughput sequencing data, GigaScience, Volume 10, Issue 2, February 2021, giab007 (see the supplemental files, all published open access). https://doi.org/10.1093/gigascience/giab007 and https://academic.oup.com/gigascience/article/10/2/giab007/6139334?searchresult=1
Author James K Bonfield, John Marshall, Petr Danecek, Heng Li, Valeriu Ohan, Andrew Whitwham, Thomas Keane, Robert M Davies
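
Illustration: the two points in the description can be sketched with a toy example. This is not the actual BCF byte layout or the HTSlib API; the field names (DP, GQ), the sample values, and the helper functions pack_blocks, get_value and parse_text are all hypothetical and chosen only to contrast per-field binary blocks with per-sample text columns.

```python
import struct

# Hypothetical toy data: two per-sample integer fields for three samples.
samples = [
    {"DP": 12, "GQ": 40},
    {"DP": 7, "GQ": 25},
    {"DP": 30, "GQ": 99},
]
FIELDS = ["DP", "GQ"]
INT32_SIZE = 4  # bytes per little-endian 32-bit integer


def pack_blocks(samples, fields):
    """Build one binary block per field, holding that field for every sample."""
    blocks = {}
    for name in fields:
        values = [s[name] for s in samples]
        blocks[name] = struct.pack(f"<{len(values)}i", *values)
    return blocks


def get_value(blocks, name, sample_index):
    """Block-style access: jump to a computed byte offset, no text parsing."""
    offset = sample_index * INT32_SIZE
    return struct.unpack_from("<i", blocks[name], offset)[0]


def parse_text(format_keys, sample_column, name):
    """Text-style access: split the colon-separated column, then convert."""
    keys = format_keys.split(":")
    return int(sample_column.split(":")[keys.index(name)])


blocks = pack_blocks(samples, FIELDS)
print(get_value(blocks, "DP", 2))         # value of DP for the third sample
print(parse_text("DP:GQ", "7:25", "GQ"))  # same kind of lookup on text
```

In the block layout, reading one field for one sample is a single offset computation, whereas the text layout requires splitting and converting strings for every lookup.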

Licensing

Creative Commons
attribution share alike
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

Captions

BCF is a binary representation of the VCF format that is faster to process; BCFtools, from the SAMtools suite, is designed for reading and writing these files.

Items portrayed in this file

17 February 2021

image/png

92c730610d04a6f2011e79d1a9526d9fe1824e7c

291,408 bytes

542 pixels

1,340 pixels