CATH database

CATH
Content
Description	Protein Structure Classification
Contact
Research center	University College London
Laboratory	Institute of Structural and Molecular Biology
Primary citation	Dawson et al. (2016)
Release date	1997
Access
Website	cathdb.info
Download URL	cathdb.info/download
Miscellaneous
Data release; frequency	CATH-B is released daily. Official releases are approximately annual.
Version	4.3

Last updated January 27, 2026

The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues including Janet Thornton and David Jones,^[2] and continues to be developed by the Orengo group at University College London. CATH shares many broad features with the SCOP resource, but the two classifications differ greatly in many of their details.^[3]^[4]^[5]^[6]

Hierarchical organization

Experimentally determined three-dimensional protein structures are obtained from the Protein Data Bank (PDB) and split into their constituent polypeptide chains, where applicable. Protein domains are identified ("chopped") within these chains using a mixture of automatic methods and manual curation.^[7]

The domains are then classified within the CATH structural hierarchy. At the Class (C) level, domains are assigned according to their secondary structure content: all alpha, all beta, a mixture of alpha and beta, or little secondary structure. At the Architecture (A) level, assignment uses the arrangement of the secondary structure in three-dimensional space. At the Topology/fold (T) level, it uses how the secondary structure elements are connected and arranged. Domains are assigned to the same Homologous superfamily (H) if there is good evidence that they are related by evolution, i.e. that they are homologous.^[2]

The main levels of the CATH hierarchy:
#	Level	Description	SCOP equivalent
1	Class	The overall secondary-structure content of the domain	Class
2	Architecture	A large-scale grouping of topologies that share particular structural features	(None)
3	Topology/fold	High structural similarity but no evidence of homology	Fold
4	Homologous superfamily	Indicative of a demonstrable evolutionary relationship	Superfamily
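
The four levels nest, so a position in the hierarchy can be written as a path of up to four numbers. The sketch below assumes the dotted notation commonly seen for CATH nodes (for example `3.40.50.300`), which the text above does not describe; the function name `parse_cath_id` and the numeric class mapping are likewise illustrative assumptions rather than part of any CATH tool.

```python
# Assumed numbering of the four classes described above (illustrative only).
CLASS_NAMES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "mixed alpha and beta",
    4: "little secondary structure",
}

# Level names in hierarchy order: C, A, T, H.
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def parse_cath_id(cath_id: str) -> dict[str, str]:
    """Split a dotted node ID into the ID of each enclosing level."""
    parts = cath_id.split(".")
    if not 1 <= len(parts) <= 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a dotted C.A.T.H identifier: {cath_id!r}")
    # Each level's ID is the prefix of the path up to and including that level.
    return {LEVELS[i]: ".".join(parts[: i + 1]) for i in range(len(parts))}


levels = parse_cath_id("3.40.50.300")
for level, node in levels.items():
    print(f"{level}: {node}")
print("Class name:", CLASS_NAMES[int(levels["Class"])])
```

Because every level's identifier is a prefix of the levels below it, two domains share an architecture, topology or superfamily exactly when the corresponding prefixes match.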

Each homologous superfamily (H) is broken down into structural clusters (SC), which are in turn broken down into functional families (FunFam). Inside each FunFam are a number of domains obtained by "chopping" PDB structures.
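
The containment described above (superfamily, then structural cluster, then FunFam, then domains) can be pictured as a small nested data structure. This is only a sketch: the class names, field names and identifiers are hypothetical placeholders and do not reflect CATH's actual data files or schema.

```python
from dataclasses import dataclass, field


@dataclass
class FunFam:
    """A functional family holding the IDs of its chopped domains."""
    name: str
    domain_ids: list[str] = field(default_factory=list)


@dataclass
class StructuralCluster:
    """A structural cluster (SC) grouping several FunFams."""
    name: str
    funfams: list[FunFam] = field(default_factory=list)


@dataclass
class Superfamily:
    """A homologous superfamily (H) grouping several structural clusters."""
    cath_id: str
    clusters: list[StructuralCluster] = field(default_factory=list)

    def domain_count(self) -> int:
        # Walk SC -> FunFam -> domains and count the leaves.
        return sum(len(ff.domain_ids) for sc in self.clusters for ff in sc.funfams)


# Placeholder identifiers, invented purely for illustration.
example = Superfamily(
    cath_id="H-example",
    clusters=[
        StructuralCluster("SC-1", [FunFam("FF-1", ["dom-a", "dom-b"])]),
        StructuralCluster("SC-2", [FunFam("FF-2", ["dom-c"])]),
    ],
)
print(example.domain_count())
```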

The CATH classification is expanded to domains with no experimentally determined structure by sister resources:

For each SC and each FunFam, Gene3D builds a hidden Markov model from the sequences of its domains. These models allow homologous domains to be identified in proteins for which only the sequence is known.^[8]
The Encyclopedia of Domains (TED) applies the automated CATH methodology to 188 million unique structures from the AlphaFold Protein Structure Database, identifying nearly 365 million domains, about 100 million more than Gene3D could identify. Using structural comparison, 194 million domains were matched to CATH at the superfamily (H) level, and a further 46 million at the topology (T) level. The remaining domains have structures entirely new to CATH.^[9]
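
The TED figures quoted above imply how many domains fall outside the existing classification. The arithmetic below is a rough sketch that uses the rounded numbers from the text (in millions) and takes "nearly 365 million" as 365, so the results are approximate; the variable names are illustrative.

```python
# Rounded figures from the text, in millions of domains.
total_domains = 365      # "nearly 365 million", treated as 365 here
matched_h = 194          # matched at the superfamily (H) level
matched_t = 46           # additionally matched at the topology (T) level

# Domains matched at neither level are the ones new to CATH.
novel = total_domains - matched_h - matched_t

# Approximate count Gene3D could identify, given the ~100 million difference.
gene3d_estimate = total_domains - 100

shares = {
    "superfamily": round(100 * matched_h / total_domains, 1),
    "topology": round(100 * matched_t / total_domains, 1),
    "novel": round(100 * novel / total_domains, 1),
}

print(f"novel domains: about {novel} million")
print(f"Gene3D estimate: about {gene3d_estimate} million")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

On these rounded inputs, roughly 125 million domains, about a third of the total, have no match in CATH at either level.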

Releases

The CATH team releases new data both as daily snapshots and as official releases, approximately once a year. The latest release of CATH-Gene3D (v4.3), from December 2020, consists of:^[10]

500,238 structural protein domain entries
151 million non-structural protein domain entries
5,481 homologous superfamily entries
212,872 functional family entries

Open-source software

CATH is an open-source software project: its developers maintain a number of open-source tools,^[11] which are publicly available on GitHub.^[12]

References

[1] Dawson NL, Lewis TE, Das S, Lees JG, Lee D, Ashford P, et al. (January 2017). "CATH: an expanded resource to predict protein function through structure and sequence". Nucleic Acids Research. 45 (D1): D289–D295. doi:10.1093/nar/gkw1098. PMC 5210570. PMID 27899584.
[2] Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM (August 1997). "CATH--a hierarchic classification of protein domain structures". Structure. 5 (8). London, England: 1093–108. doi:10.1016/s0969-2126(97)00260-8. PMID 9309224.
[3] "CATH: Protein Structure Classification Database at UCL". Cathdb.info. Retrieved 9 March 2017.
[4] "CATH". Cathdb.info. Retrieved 9 March 2017.
[5] "CATH Database (@CATHDatabase)". Twitter. Retrieved 9 March 2017.
[6] Pearl FM, Bennett CF, Bray JE, Harrison AP, Martin N, Shepherd A, et al. (January 2003). "The CATH database: an extended protein family resource for structural and functional genomics". Nucleic Acids Research. 31 (1): 452–455. doi:10.1093/nar/gkg062. PMC 165509. PMID 12520050.
[7] "CATH". cathdb.info. Retrieved 14 September 2024.
[8] Lees J, Yeats C, Perkins J, Sillitoe I, Rentzsch R, Dessailly BH, Orengo C (January 2012). "Gene3D: a domain-based resource for comparative genomics, functional annotation and protein network analysis". Nucleic Acids Research. 40 (Database issue): D465–D471. doi:10.1093/nar/gkr1181. PMID 22139938.
[9] Lau AM, Bordin N, Kandathil SM, Sillitoe I, Waman VP, Wells J, Orengo CA, Jones DT (November 2024). "Exploring structural diversity across the protein universe with The Encyclopedia of Domains". Science. 386 (6721). doi:10.1126/science.adq4946.
[10] "CATH". cathdb.info. Retrieved 14 September 2024.
[11] "Tools". cathdb.info. Retrieved 18 December 2016.
[12] UCLOrengoGroup/cath-tools, UCLOrengoGroup, 9 September 2024, retrieved 14 September 2024.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
