Claudia Andreini ab, Valeria Putignano a, Antonio Rosato ab and Lucia Banci *ab
a Magnetic Resonance Center, University of Florence, Via Luigi Sacconi 6, Sesto Fiorentino 50019, Italy. E-mail: banci@cerm.unifi.it; Fax: +39 055 4574253; Tel: +39 055 4574273
b Department of Chemistry, University of Florence, Sesto Fiorentino 50019, Italy
First published on 10th August 2018
Organisms from all kingdoms of life use iron-proteins in a multitude of functional processes. We applied a bioinformatics approach to investigate the human portfolio of iron-proteins. We separated iron-proteins based on the chemical nature of their metal-containing cofactors: individual iron ions, heme cofactors and iron–sulfur clusters. We found that about 2% of human genes encode an iron-protein. Of these, 35% are proteins binding individual iron ions, 48% are heme-binding proteins and 17% are iron–sulfur proteins. More than half of the human iron-proteins have a catalytic function. Indeed, we predict that 6.5% of all human enzymes are iron-dependent. This percentage is quite different for the various enzyme classes. Human oxidoreductases feature the largest fraction of iron-dependent family members (about 37%). The distribution of iron-proteins in the various cellular compartments is uneven. In particular, the mitochondrion and the endoplasmic reticulum are enriched in iron-proteins with respect to the average content of the cell. Finally, we observed that genes encoding iron-proteins are more frequently associated with pathologies than all other human genes on average. The present research provides an extensive overview of iron usage by the human proteome, and highlights several specific features of the physiological role of iron ions in human cells.
Significance to metallomics
Iron is one of the most ancient and abundant metal ions in living organisms: it participates in fundamental biological processes, such as photosynthesis and respiration. It is an essential metal ion for humans. Here, we applied a bioinformatics approach to predict the entire set of human proteins that use iron as a cofactor. We found that about 2% of human genes encode an iron-protein. In particular, 35% are proteins binding individual iron ions, 48% are heme-binding proteins and 17% are iron–sulfur proteins. Most of these proteins are enzymes: 37% of the human oxidoreductases need an iron ion to perform their catalytic mechanisms. The analysis of the subcellular location highlighted that some organelles are enriched in iron-proteins; in particular, about 7% of the proteins localized in the endoplasmic reticulum and in the mitochondrion bind iron. Finally, our data show that mutations in genes encoding iron-binding proteins are more likely to be associated with pathology than all human genes on average.
Heme and iron–sulfur clusters are cofactors featuring a high chemical complexity. Therefore, their biosynthesis as well as the biosynthesis of the final holo-proteins containing these cofactors involve a significant number of different protein components, some of which are iron-binding proteins. In the human cell, these biosynthetic processes have multiple pathways, related also to cellular compartmentalization. Nevertheless, some components may move across different compartments; furthermore, the various pathways can communicate with one another via the exchange of biosynthetic intermediates.
While iron is essential for life, it can catalyze the formation of potentially toxic reactive oxygen species (ROS). This process is unavoidable in the present oxygen-rich environment, and iron and ROS are increasingly recognized as important initiators and mediators of cell death in various organisms as well as in pathological conditions in humans.10 Therefore, biological systems must control iron metabolism by providing the adequate amount of iron for proper cellular function while limiting iron toxicity.11,12 Iron also has a role in pathogen virulence. The growth of microbial pathogens within the host usually requires iron as an essential nutrient.13,14 Heme-containing proteins, such as hemoglobin, and transferrin are the preferential iron sources for human pathogens.15,16 Therefore, another crucial reason for the cell to maintain strict control of iron homeostasis is to restrict pathogens' access to iron.
In this paper, we carried out a systematic prediction of iron-binding proteins encoded in the human genome, extending our previous analysis on iron–sulfur proteins.17 By integrating this prediction with information on heme and individual iron ions, we obtained a complete landscape of iron handling by proteins in humans, thus providing a framework for the understanding of physiological iron metabolism and of its dysfunction in diseases.
The coordination spheres of the three different iron-containing cofactors are quite diverse; we refer to the pattern of the protein residues coordinating the iron ion(s) of the cofactor as the iron-binding pattern (IBP). The IBP is a regular expression defined by the identity of the amino acids coordinating the metal and by their spacing along the protein sequence (e.g. CX4CX25C). Thus, the coordination sphere of each iron ion corresponds to a single IBP.
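Because an IBP is defined only by ligand identity and spacing, it maps directly onto a regular expression. The following is a minimal sketch of that mapping, not the code used in this work: the function names `ibp_to_regex` and `find_ibp` are hypothetical, the toy sequence is invented for illustration, and `X` followed by a number is assumed to mean that many residues of any type.

```python
import re

# One token is either a gap ("X" plus an optional count) or a ligand residue.
_TOKEN = re.compile(r"X(?P<gap>\d*)|(?P<res>[A-WYZ])")


def ibp_to_regex(ibp: str) -> str:
    """Translate a pattern such as 'CX4CX25C' into a regular expression."""
    parts = []
    pos = 0
    for match in _TOKEN.finditer(ibp):
        if match.start() != pos:
            raise ValueError(f"unexpected character in pattern at index {pos}")
        pos = match.end()
        if match.group("res"):
            # A coordinating residue is matched literally.
            parts.append(match.group("res"))
        else:
            # A bare 'X' is treated as a single arbitrary residue.
            count = int(match.group("gap") or 1)
            parts.append(f".{{{count}}}")
    if pos != len(ibp):
        raise ValueError(f"unexpected character in pattern at index {pos}")
    return "".join(parts)


def find_ibp(sequence: str, ibp: str) -> list[int]:
    """Return the 0-based start positions of every occurrence of the pattern."""
    regex = ibp_to_regex(ibp)
    # The lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?=({regex}))", sequence)]


if __name__ == "__main__":
    # Invented toy sequence containing one CX4CX25C occurrence.
    toy = "MAC" + "GGGG" + "C" + "A" * 25 + "C" + "KK"
    print(ibp_to_regex("CX4CX25C"))
    print(find_ibp(toy, "CX4CX25C"))
```

A pattern match of this kind only shows that the ligands occur with the right spacing; it says nothing about whether they are close in 3D space, which is why the structural context matters in the analysis.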
In IBPs of human iron-proteins binding individual iron ions, histidine is by far the most common residue. His is present in 94% of these IBPs, each of which contains on average two His (Fig. 1). Aspartate, glutamate and tyrosine are found in 53%, 30% and 10% of the identified patterns, respectively. On average, only one Asp and one Tyr are found in each IBP, whereas there can be one (such as in most iron-dependent enzymes) or two (such as in ferritins) Glu residues. All iron–sulfur binding proteins use on average three to four cysteines to coordinate the cluster. Cys is absolutely required in the IBPs of these proteins. In particular, in human iron–sulfur proteins the coordination sphere of the Fe4S4 clusters is always and only composed of cysteines, whereas the IBPs of Fe2S2 clusters sometimes (37% of Fe2S2 IBPs) include one or two His residues. In human heme-binding proteins, IBPs commonly contain one or two His, with the exception of catalytic heme sites (such as in cytochrome P450) where Cys is more common (83% of IBPs).
The function of the metal cofactor within the protein is correlated also to the number of coordinating residues provided by the protein (i.e. the number of residues in the IBP). Indeed, the coordination sphere of the metal ion is not always completed by atoms of the protein. 64% of the sites that bind individual iron ions contain three protein residues in the IBP, whereas the others contain four protein residues. Similarly, most of the iron ions in heme cofactors have only one ligand provided by the protein (about 58%), which allows the substrate to occupy the second heme axial position. The remaining 42% heme sites have two coordinating residues provided by the protein. In iron–sulfur proteins, the most common number of protein ligands is 4; however, all the iron–sulfur clusters that perform a catalytic function have only three Cys ligands in the IBP. It is thus evident that there is a trend for human iron-proteins to have a lower number of residues in their IBPs when the metal-binding site performs a catalytic function, in order to allow the iron ion to coordinate directly to the substrate as already observed for other metal containing proteins.18
Fig. 2 Distribution of iron-proteins in different cellular organelles of the human cell (heme-proteins: blue; iron–sulfur proteins: grey; individual iron ions: orange).
Our dataset (iron-proteins for which cellular localization is known) is composed of 45% heme-binding proteins, 34% proteins binding individual iron ions, and 21% proteins binding iron–sulfur clusters. From Fig. 2, we can readily identify compartments that differ appreciably in the distribution of the types of iron-proteins. The nucleus is highly depleted of heme-binding proteins, whereas it features a relatively high number of proteins binding individual iron ions. On the other hand, the mitochondrion is the compartment most enriched in iron–sulfur proteins with respect to both other types, whereas the endosome is mostly enriched in heme-binding proteins and does not contain any iron–sulfur protein. In addition, the endoplasmic reticulum is enriched in heme-binding proteins and depleted in iron–sulfur proteins. The distribution of the three types of iron-proteins in the cytoplasm closely resembles that of the overall dataset. It should be noted that here we are referring to the number of proteins and not to their relative quantity, which depends on their expression levels. We did not analyze such levels in this work.
The mitochondrion and the endoplasmic reticulum are the compartments with the largest percentage of iron-proteins. As mentioned, the mitochondrion is significantly enriched in iron–sulfur proteins (about 2.5 times the average fraction for the whole cell), whereas the endoplasmic reticulum is enriched in heme-binding proteins (1.6 times the cell average). The nucleus is the only compartment where proteins binding individual iron ions are the majority of iron-proteins (1.7 times the cell average).
We then checked whether there is a relationship between cellular localization and protein function in order to rationalize the patterns reported in Fig. 2. To do this we examined the lists of the iron-proteins localized to the various compartments and identified all the processes, as defined by the Gene Ontology (GO27,28), associated with the corresponding genes. Seven processes involve 81% of the genes coding for iron-proteins localized to the endoplasmic reticulum (Table 1). The process involving the most iron-proteins is lipid metabolism, which is a key cellular role played by cytochromes P450; only about one tenth of the genes involved in lipid metabolism code for proteins binding individual iron ions. Xenobiotic metabolic process and drug metabolism are common processes which involve exclusively heme-binding proteins and are essentially associated with cytochromes P450, which are involved in the modification of exogenous molecules, from drugs to pollutants. Proteins binding individual iron ions are involved in different pathways, such as peptidyl amino acid hydroxylation. These pathways do not involve any heme-binding protein. Overall, 92% of the iron-proteins localized to the endoplasmic reticulum are oxidoreductases, as directly observed from their Enzyme Commission (EC) numbers, and these are either members of the cytochrome P450 family (heme-containing enzymes) or iron-dependent hydroxylases (typically harboring two iron ions in their active site). The functional role of the iron-proteins in the endoplasmic reticulum is thus tightly linked to their catalytic activity, most commonly in biosynthetic or metabolic processes.
Process | All | Individual iron ions | Heme | Iron–sulfur |
---|---|---|---|---|
Endoplasmic reticulum | | | | |
Drug metabolism | 14 | 0 | 14 | 0 |
Peptidyl amino acid hydroxylation | 6 | 6 | 0 | 0 |
Lipid metabolic process | 43 | 5 | 38 | 0 |
Cell proliferation | 12 | 4 | 8 | 0 |
Response to stress | 9 | 0 | 9 | 0 |
Vitamin metabolism | 8 | 0 | 8 | 0 |
Xenobiotic metabolic process | 20 | 0 | 20 | 0 |
Nucleus | | | | |
Cell death/apoptotic process | 20 | 10 | 5 | 5 |
Gene expression | 46 | 33 | 9 | 4 |
Cell proliferation | 20 | 11 | 5 | 4 |
Peptidyl amino acid hydroxylation | 8 | 8 | 0 | 0 |
Response to stress | 25 | 9 | 6 | 10 |
Mitochondrion | | | | |
Cell death/apoptotic process | 13 | 4 | 5 | 4 |
Iron ion homeostasis | 11 | 4 | 4 | 3 |
Iron sulfur cluster biosynthesis | 6 | 0 | 0 | 6 |
Cellular respiration | 18 | 1 | 7 | 10 |
Response to drug | 9 | 1 | 5 | 3 |
Response to stress | 16 | 3 | 5 | 8 |
In the nucleus, 5 processes involve about 89% of the iron-proteins present in this cell compartment. Gene expression is the process associated with most of these proteins, because several genes encode iron-proteins involved in the regulation of transcription, e.g. through DNA binding or histone modification. Many iron-proteins in the nucleus are also involved in response to stress, for instance by repairing damaged DNA, in apoptosis17 and in cell proliferation. About half of the nuclear iron-enzymes are oxidoreductases; transferases and hydrolases are relatively common.
In the mitochondrion, 6 processes involve about 63% of all iron-proteins within this cellular compartment. The process involving the largest number of iron-proteins is cellular respiration, which leverages both heme-binding and iron–sulfur proteins (7 vs. 10 genes, respectively; Table 1). Other processes involving more than 10 genes are cell death, iron ion homeostasis and response to stress (which is mainly response to oxidative stress, and in which half of the genes encode iron–sulfur proteins). The biosynthesis of iron–sulfur clusters involves only genes encoding iron–sulfur proteins. At the functional level, the observed enrichment of the mitochondrion in iron–sulfur proteins (Fig. 2) is largely accounted for by the involvement of these proteins in the respiratory chain, in stress response and in the assembly of iron–sulfur clusters themselves. For the latter, the clusters are transiently bound by various proteins along the biosynthetic pathway, also depending upon the final target for cluster insertion.25,26,29 The electron transfer capabilities of iron–sulfur proteins are thus important, but they are not the only determinant of the higher abundance of iron–sulfur proteins in the mitochondrion with respect to all iron-proteins.
Fig. 4 (A) Superposition of RORα (pdb code: 1n83, in blue) and REV-ERB (pdb code: 3cqv, in red). Only the relative positions of the putative ligands of RORα and the iron ligands of REV-ERB are reported. The side chain of Cys 323 is rotated to bring it closer to the heme iron. In this configuration the distance between the potential sulfur donor and the iron ion is 3.4 Å. (B) Putative iron-binding site in the structural model of HSPB1-associated protein 1.
In Table 2 we broke down the cumulative data reported in the previous paragraph for the whole human cell by looking at specific compartments. In particular, we took into consideration the compartments with the highest number of iron-proteins. In the mitochondrion, 36% of all proteins are associated to pathologies, whereas as many as 60% of mitochondrial iron-proteins are disease-related, with the main contribution of heme-proteins and iron–sulfur proteins. Similarly, in the cytoplasm and in the nucleus, heme-proteins and iron–sulfur proteins are more commonly associated to pathologies than all other human genes (Table 2).
Compartment | Heme | Individual iron ions | Iron–sulfur clusters | Total iron-proteins | All human proteins |
---|---|---|---|---|---|
Cytoplasm | 13/27 (48%) | 10/34 (29%) | 8/19 (42%) | 31/80 (39%) | 1413/5569 (25%) |
Endoplasmic reticulum | 15/60 (25%) | 9/17 (53%) | 0/3 (0%) | 24/80 (30%) | 362/1163 (31%) |
Mitochondrion | 20/28 (72%) | 5/15 (33%) | 23/37 (62%) | 48/80 (60%) | 420/1174 (36%) |
Nucleus | 7/17 (41%) | 10/52 (19%) | 11/20 (55%) | 28/89 (31%) | 1180/5389 (22%) |
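The comparison in Table 2 comes down to two fractions per compartment and their ratio. The sketch below recomputes them from the "Total iron-proteins" and "All human proteins" columns as transcribed from the table; the names `TABLE2`, `fraction` and `enrichment` are illustrative only, and the ratio is a plain descriptive quantity, not a statistical test.

```python
# (disease-associated, total) counts copied from Table 2.
TABLE2 = {
    "Cytoplasm": {"iron": (31, 80), "all": (1413, 5569)},
    "Endoplasmic reticulum": {"iron": (24, 80), "all": (362, 1163)},
    "Mitochondrion": {"iron": (48, 80), "all": (420, 1174)},
    "Nucleus": {"iron": (28, 89), "all": (1180, 5389)},
}


def fraction(counts: tuple[int, int]) -> float:
    """Fraction of disease-associated entries in a (hits, total) pair."""
    hits, total = counts
    return hits / total


def enrichment(compartment: str) -> float:
    """Ratio between the disease-associated fraction of iron-proteins and
    that of all proteins localized to the same compartment."""
    row = TABLE2[compartment]
    return fraction(row["iron"]) / fraction(row["all"])


if __name__ == "__main__":
    for name, row in TABLE2.items():
        iron_pct = 100 * fraction(row["iron"])
        all_pct = 100 * fraction(row["all"])
        print(f"{name}: {iron_pct:.0f}% vs {all_pct:.0f}% "
              f"(ratio {enrichment(name):.2f})")
```

A ratio above 1 means that iron-proteins in that compartment are more often disease-associated than the compartment's proteins overall; with these counts this holds for the cytoplasm, the mitochondrion and the nucleus, but not for the endoplasmic reticulum.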
Of the 398 human iron-proteins, 48% are heme-binding proteins, 35% are proteins binding individual iron ions and 17% are iron–sulfur proteins. The intracellular distribution of these proteins is uneven, with some organelles containing a larger share of iron-proteins than others do. In particular, 7% of all the proteins localized in the endoplasmic reticulum and in the mitochondrion are iron-proteins. Thus these two organelles are significantly enriched (in comparative terms) in iron-proteins with respect to the average of the entire human cell (2%, as mentioned above). Within heme-binding proteins, 90% bind heme b and 61% are membrane-associated.
The three types of iron-proteins feature highly diverse preferences in the coordination sphere of the bound iron ions (i.e. IBPs). Cys is always present in the IBPs of iron–sulfur proteins, whereas it is practically absent from the coordination sphere of individual iron ions. Conversely, His, which is nearly always present in the IBPs of proteins binding individual iron ions, is observed rarely in the IBPs of iron–sulfur proteins. Asp is the second most common ligand in proteins binding individual iron ions. Heme-proteins have a similar preference for His and Cys in their IBPs. Cys is particularly common in the IBPs of heme-proteins that have catalytic function. This is presumably linked to the role of Cys in promoting the heterolytic breakage of the O–O bond of the iron-bound peroxide intermediate that forms along the catalytic cycle of cytochromes P450 or of nitric oxide synthase.35–37 This feature is independent of the overall protein fold, and is defined by the coordination chemistry properties of the sites.
6.5% of the human enzymes are iron-proteins. Unsurprisingly, this percentage is not the same for all enzyme classes. In particular, 37% of human oxidoreductases use a catalytic iron ion. 56% of all human iron-proteins have a catalytic function (Fig. 3). The largest contribution comes from proteins that bind individual iron ions: 86% of these proteins (119 out of 139) are iron-dependent enzymes. The large majority of these enzymes are oxidoreductases, in particular dioxygenases, where the iron ion is directly involved in the transfer of electrons from/to the substrate. In addition, about half of the heme sites in human proteins have a catalytic function. These enzymes are primarily members of the human cytochrome P450 family, whose isoforms are significantly differentiated in terms of expression but typically have broad and overlapping substrate specificities.
Iron-binding enzymes are commonly located in the nucleus and cytoplasm, followed by the mitochondrion and endoplasmic reticulum. The latter features the highest number of heme-binding proteins as it is the most common localization for cytochromes P450. Consistent with this, we observed that drug metabolism, lipid metabolism and xenobiotic metabolism are the most common processes associated with iron-proteins localized to the endoplasmic reticulum (Table 1). In the mitochondrion, 63% of all iron-proteins are involved in only 6 processes; the process involving the largest number of iron-proteins is respiration, which leverages both heme-binding and iron–sulfur proteins. The mitochondrion is the most likely localization for iron–sulfur proteins (Fig. 2), whose primary processes within this compartment are, besides respiration, the biosynthesis of iron–sulfur clusters and the response to oxidative stress. The biosynthesis of iron–sulfur clusters is among the most common functional roles of iron–sulfur proteins at the level of the whole cell,17,38 owing to the chemical complexity of this group of cofactors. Within the nucleus, iron-proteins are largely involved in various aspects of the regulation of protein expression, such as histone modification. In addition, DNA binding, DNA biosynthesis and DNA replication involve several iron-proteins, especially iron–sulfur proteins.
We identified three human members of the retinoid-related orphan receptor (ROR) family as potentially harbouring a heme-binding site similar to those observed in proteins of the REV-ERB family. In the absence of experimental evidence in the literature, our hypothesis is supported by the strict conservation of the two potential heme ligands. The experimental structures of RORα, RORβ and RORγ feature a His and a Cys residue in spatial positions corresponding to the His and Cys ligands of iron in REV-ERBβ. Another putative human iron-binding protein is the HSPB1-associated protein 1. A structural model of this protein shows that the reciprocal position in 3D space of the putative ligands is completely consistent with our prediction (Fig. 4).
As an important aspect of the present study, we analysed how many pathologies are associated to human genes encoding iron-proteins, based on the occurrence of disease-associated mutations reported in the Swiss-Prot database. The percentage of pathologies associated to genes encoding iron-proteins is almost 40%, which is higher than the percentage of pathologies associated to all human genes (about 20%). In practice, two genes out of 10 are associated with pathogenic mutations in the human genome, whereas this percentage is essentially doubled if we take into account specifically the genes encoding iron-proteins. Interestingly, this percentage peaks at 72% for all heme-binding proteins in the mitochondrion.
In summary, this work provided an extensive overview of iron usage by human proteins, spanning from iron coordination properties to biochemical/cellular function and compartmentalization, and addressing the interplay between these aspects. We observed that the distribution of the type of iron cofactors and of their catalytic properties is quite uneven, with some organelles such as the mitochondrion or the nucleus displaying higher occurrence than the others. The main localization of iron-dependent enzymes, which constitute 6.5% of all human enzymes, is the endoplasmic reticulum, where they catalyze the modification of both endo- and exogenous molecules and metabolites. Human iron-enzymes have a lower number of protein residues in their IBPs, in order to allow the iron ion to coordinate directly to the substrate.
Using the approach described in ref. 39 as implemented in the RDGB program,40 we predicted all iron-binding proteins encoded by the human genome. RDGB is a computational tool written in Python. The approach of RDGB exploits the protein domains of the Pfam database to identify putative homologues of the proteins of interest in any desired genome or list of genomes. Thus, the input to RDGB is a list of Pfam domains of interest (in our case, domains associated with iron-binding capability) and a list of genomes to be analyzed (in our case only the human genome).
The input list of Pfam domains is created by merging two lists: first, the list of all Pfam domains annotated as iron-binding, retrieved by mining the text of the annotations in the database; second, the list of domains obtained from the analysis of the sequences of iron-binding proteins with known 3D structure that are available from the Protein Data Bank (PDB). In the latter case, we also extract from the PDB the pattern of amino acids that are responsible for metal binding (i.e. the metal binding pattern, MBP) and its position within the domain sequence. The MBP is defined by the identity and spacing of the amino acids, e.g., CX4CX20H, where X is any amino acid. This pattern provides a way to filter the initial results in order to reduce the number of false positives39 (i.e., proteins that contain a Pfam domain annotated as iron-binding but in reality are unable to bind iron) by rejecting the proteins that lack the MBP or that have the MBP in the wrong position within the domain. The MBP filter cannot be applied in the absence of a relevant 3D structure available from the PDB. The MetalPDB database contains information on all the MBPs and the Pfam domains found in structurally characterized metalloproteins.9 Our search started from 352 Pfam domains: 261 with an associated iron-containing 3D structure (102 binding individual iron ions, 80 binding iron–sulfur clusters, and 79 binding heme) and 91 annotated as iron-binding domains.
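The logic of the MBP filter (reject a hit if the pattern is absent, or present at the wrong place in the domain) can be illustrated with a short sketch. This is not the RDGB implementation: the `DomainHit` class, the `passes_mbp_filter` function, the use of a plain sequence offset in place of a position in the domain alignment, and the `tolerance` parameter are all simplifying assumptions made here for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class DomainHit:
    """A candidate protein with the boundaries of its Pfam domain match."""
    sequence: str  # full protein sequence
    start: int     # 0-based start of the domain match
    end: int       # end of the domain match (exclusive)


def passes_mbp_filter(hit: DomainHit, mbp_regex: str,
                      expected_offset: int, tolerance: int = 10) -> bool:
    """Keep the hit only if the MBP occurs near its expected domain position.

    expected_offset is the position of the first ligand relative to the
    domain start; tolerance (hypothetical) allows for small insertions
    or deletions in divergent sequences.
    """
    domain = hit.sequence[hit.start:hit.end]
    # Lookahead so that overlapping occurrences are all examined.
    for match in re.finditer(f"(?=({mbp_regex}))", domain):
        if abs(match.start() - expected_offset) <= tolerance:
            return True
    return False


if __name__ == "__main__":
    # Invented sequence carrying CX4CX20H, written here as a regex.
    mbp = "C.{4}C.{20}H"
    seq = "M" * 10 + "C" + "G" * 4 + "C" + "A" * 20 + "H" + "K" * 10
    hit = DomainHit(sequence=seq, start=5, end=len(seq) - 5)
    print(passes_mbp_filter(hit, mbp, expected_offset=5))   # pattern in place
    print(passes_mbp_filter(hit, mbp, expected_offset=30))  # pattern misplaced
```

In this toy case the first call accepts the hit and the second rejects it, because the same pattern lies 25 residues away from where the second call expects it.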
This search was complemented by locally searching for MBPs within all human protein sequences. This is done by extracting from the HMM representing the Pfam domain that contains the binding site of interest only the regions around the MBP. This “trimmed domain” provides a convenient way to search for an MBP regardless of the agreement with the whole Pfam domain, thus affording a better sensitivity in the detection of MBPs in divergent sequences.41
In total we retrieved 363 human iron-proteins. As a qualitative indicator of reliability of our dataset, we checked whether one of the following conditions applied (in decreasing order of reliability):
(1) A 3D structure of the human protein in the iron-bound form is available (105 proteins).
(2) A 3D structure of a close homolog (sequence identity ≥50%) of the human protein in the iron-bound form is available (76 proteins).
(3) The predicted protein contains an iron-binding Pfam domain with a conserved MBP (147 proteins).
(4) The predicted protein contains a conserved MBP (based on local search) (22 proteins).
(5) The predicted protein contains an iron-binding Pfam domain, but the occurrence of the MBP cannot be verified due to the lack of a 3D structure for that domain family (13 proteins).
We complemented these predictions by adding the proteins annotated in the UniProt database, a public comprehensive resource of protein sequence and functional information, as “iron-binding”, “iron–sulfur-binding”, or “heme-binding”. This contributed 35 additional iron-proteins.
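The counts given for the five reliability levels and for the UniProt-only additions can be cross-checked against the totals quoted in the text (363 predicted proteins, 398 overall). The short tally below does this; the dictionary keys are shorthand labels chosen here, not terms from the original analysis.

```python
# Number of proteins per reliability level, in decreasing order of reliability.
RELIABILITY_LEVELS = {
    "iron-bound structure of the human protein": 105,
    "iron-bound structure of a close homolog": 76,
    "iron-binding Pfam domain with conserved MBP": 147,
    "conserved MBP from local search": 22,
    "iron-binding Pfam domain, MBP not verifiable": 13,
}

# Proteins added from UniProt annotations only.
UNIPROT_ONLY = 35


def predicted_total() -> int:
    """Proteins retrieved by the domain- and pattern-based prediction."""
    return sum(RELIABILITY_LEVELS.values())


def dataset_total() -> int:
    """Full dataset after adding the UniProt-annotated proteins."""
    return predicted_total() + UNIPROT_ONLY


if __name__ == "__main__":
    print(predicted_total())  # 363
    print(dataset_total())    # 398
    for label, count in RELIABILITY_LEVELS.items():
        print(f"{label}: {100 * count / predicted_total():.0f}%")
```

The two totals agree with the 363 retrieved proteins and with the 398 human iron-proteins analysed in the rest of the paper.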
For each predicted iron-protein, we retrieved the following annotations from UniProt:42 intracellular location, EC number, biological processes as reported in the Gene Ontology database,43 and involvement in diseases. Further annotations, such as the cofactor role and type, were added manually by inspecting the literature. We used the Swiss-Prot database (which contained 20259 entries as of February 2018)34 to compare the iron-protein dataset with all human proteins. For the latter dataset, annotations were retrieved from UniProt in the same way as for the iron-protein dataset.
The 3D structural model of the HSPB1-associated protein 1 was built using MODELLER v.9.244 and energy-refined using the AMBER45 web server provided by the WeNMR platform.46
IBP | Iron-binding pattern |
ROS | Reactive oxygen species |
ROR | Retinoid-related orphan receptor |
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c8mt00146d
This journal is © The Royal Society of Chemistry 2018