GFF3

General feature format

File format for genomic features


In bioinformatics, the general feature format (gene-finding format, generic feature format, GFF) is a file format used for describing genes and other features of DNA, RNA and protein sequences.

GFF versions

Several versions of GFF exist, including GFF2, GTF and GFF3.

GFF2 and GTF have a number of deficiencies, most notably that they can represent only two-level feature hierarchies and therefore cannot handle the three-level hierarchy of gene → transcript → exon. GFF3 addresses this and other deficiencies: for example, it supports arbitrarily many hierarchical levels and gives specific meanings to certain tags in the attributes field.
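As an illustration of the multi-level hierarchy, the following minimal Python sketch links features through the `ID` and `Parent` tags of the attributes field, which are among the tags GFF3 gives a specific meaning. The records, the feature IDs and the helper name `parse_attributes` are hypothetical, and the sketch ignores the URL-escaping that real attribute values may use.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split an attributes column of the form 'tag=value;tag=value' into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if pair:
            tag, _, value = pair.partition("=")
            attrs[tag] = value
    return attrs


# Hypothetical (type, attributes) pairs for a gene, its transcript and two exons.
records = [
    ("gene", "ID=gene1"),
    ("mRNA", "ID=tx1;Parent=gene1"),
    ("exon", "ID=exon1;Parent=tx1"),
    ("exon", "ID=exon2;Parent=tx1"),
]

# Map each parent ID to the list of (type, ID) pairs of its direct children.
children = defaultdict(list)
for ftype, column9 in records:
    attrs = parse_attributes(column9)
    # A Parent value may name several parents, separated by commas.
    for parent in filter(None, attrs.get("Parent", "").split(",")):
        children[parent].append((ftype, attrs["ID"]))


def print_tree(feature_id, indent=0):
    """Print a feature ID and, recursively, everything that lists it as Parent."""
    print(" " * indent + feature_id)
    for _, child_id in children.get(feature_id, []):
        print_tree(child_id, indent + 2)


print_tree("gene1")
```

Because each feature only names its parent, the same approach works for any number of levels, not just the three shown here.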

GTF is identical to GFF version 2.[1]

GFF general structure

All GFF formats (GFF2, GFF3 and GTF) are tab-delimited, with nine fields per line. They share the same structure for the first seven fields and differ in the content and format of the ninth field. Some field names were changed in GFF3 to avoid confusion; for example, the "seqid" field was formerly called "sequence", which could be confused with a nucleotide or amino acid chain. In GFF3 the nine fields are, in order: seqid, source, type, start, end, score, strand, phase and attributes.
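The nine-field layout can be sketched in Python as follows. This is a minimal illustration rather than a full parser: the `Feature` type, the `parse_line` helper and the example line are hypothetical, the field names follow GFF3 usage, and a "." is treated as an undefined value for the score and phase fields.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: str


def parse_line(line: str) -> Feature:
    """Split one tab-delimited feature line into its nine fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attributes = fields
    return Feature(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),   # "." means no score
        strand,
        None if phase == "." else int(phase),     # "." means no phase
        attributes,
    )


# Hypothetical feature line, written with explicit tab characters.
example = "chr1\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene0001;Name=EDEN"
feature = parse_line(example)
print(feature.type, feature.start, feature.end)
```

The ninth field is kept as a raw string here because its syntax is the part that differs between GFF2, GTF and GFF3.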

The 8th field: phase of CDS features

Simply put, CDS means "CoDing Sequence". The exact meaning of the term is defined by Sequence Ontology (SO). According to the GFF3 specification:[2][3]

For features of type "CDS", the phase indicates where the feature begins with reference to the reading frame. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon.
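The arithmetic behind this definition can be sketched in Python. The function below is a hypothetical helper, not part of any GFF library; it assumes the CDS segment lengths are given in 5'-to-3' order and that the first segment starts on a codon boundary (phase 0).

```python
def cds_phases(segment_lengths):
    """Return the phase of each CDS segment, given lengths in 5'-to-3' order."""
    phases = []
    phase = 0  # assumption: the first segment starts on a codon boundary
    for length in segment_lengths:
        phases.append(phase)
        # After skipping `phase` bases and reading whole codons, any leftover
        # bases form a partial codon that the next segment must complete.
        phase = (3 - (length - phase) % 3) % 3
    return phases


print(cds_phases([10, 8, 7]))  # [0, 2, 0]
```

In this example the first segment of 10 bases ends with one base of an unfinished codon, so two bases must be removed from the start of the second segment to reach the next codon, giving it phase 2.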

Meta directives

GFF files can carry additional meta information on directive lines, which begin with ##. Directives can state the GFF version, a sequence region, or the species; the full list of directive types is given in the Sequence Ontology GFF3 specification.
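A minimal Python sketch of collecting directive lines is shown below. The helper name `read_directives` and the sample lines are hypothetical; the sketch assumes each directive has the form `##name value` and skips lines beginning with `###`, which GFF3 uses as a separate marker rather than as a name and value pair.

```python
def read_directives(lines):
    """Collect (name, value) pairs from lines that start with '##'."""
    directives = []
    for line in lines:
        if line.startswith("###"):
            continue  # not a name/value directive
        if line.startswith("##"):
            parts = line[2:].split(None, 1)  # split on the first run of whitespace
            if parts:
                value = parts[1].strip() if len(parts) > 1 else ""
                directives.append((parts[0], value))
    return directives


# Hypothetical file content: two directives, a plain comment and one feature.
sample = [
    "##gff-version 3",
    "##sequence-region chr1 1 248956422",
    "# a plain comment",
    "chr1\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene0001",
]
print(read_directives(sample))
```

Lines starting with a single # are ordinary comments and are ignored, as are feature lines.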

GFF software

Validation

The modENCODE project hosts an online GFF3 validation tool with generous limits of 286.10 MB and 15 million lines.

The GenomeTools software collection includes a gff3validator tool that can be used offline to validate, and optionally tidy, GFF3 files. An online validation service is also available.

References

  1. "GFF/GTF File Format". Ensembl. Archived from the original on 2022-06-15. Retrieved 2023-11-04.
  2. "GFF3 specification". GitHub. 2018-11-24. Archived from the original on 2023-07-04.
  3. "GFF3". GMOD. 2016-07-12. Archived from the original on 2023-08-25.

This article uses material from the Wikipedia article GFF3, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.