Citation: Gong Y, Behera G, Erber L, Luo A, Chen Y (2022) HypDB: A functionally annotated web-based database of the proline hydroxylation proteome. PLoS Biol 20(8): e3001757. https://doi.org/10.1371/journal.pbio.3001757
Academic Editor: Sui Huang, Institute for Systems Biology, UNITED STATES
Received: February 2, 2022; Accepted: July 13, 2022; Published: August 26, 2022
Copyright: © 2022 Gong et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was supported by the National Institute of Health (R35GM124896 to Y.C.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: ASA, accessible surface area; ccRCC, clear cell renal cell carcinoma; DDA, data-dependent acquisition; DIA, data-independent acquisition; HIF, hypoxia-inducible factor; Hyp, proline hydroxylation; KNN, k-nearest neighbor; PGD, 6-phosphogluconate dehydrogenase; P4HA, prolyl 4-hydroxylase; PHD, prolyl hydroxylase domain; PTM, posttranslational modification; RSA, relative solvent accessibility
1. Introduction
Proline hydroxylation (Hyp), first discovered in 1902, is an important protein posttranslational modification (PTM) pathway in cell physiology and metabolism [1–4]. As a simple addition of a hydroxyl group to the imino side chain of the proline residue, the modification is evolutionarily conserved from bacteria to humans. In mammalian cells, Hyp is mainly mediated through the enzymatic activities of 2 major families of prolyl hydroxylases, collagen prolyl 4-hydroxylases (P4HAs) [5–7] and hypoxia-inducible factor (HIF) prolyl hydroxylase domain (PHD) proteins [8–12], while no enzymes capable of removing protein-bound Hyp are known yet. Since the activity of prolyl hydroxylases depends on the cellular collaboration of multiple co-factors, including oxygen and iron, as well as several metabolites, such as alpha-ketoglutarate, succinate, and ascorbate, the Hyp pathway is an important metabolic-sensing mechanism in cells and tissues.
The most well-characterized Hyp targets are collagen proteins and the HIFα family of transcription factors. Hyp on collagens mediated by P4Hs is essential for maintaining the triple-helical structure of the collagen polymer and enabling proper protein folding after translation. Indeed, adding an electronegative oxygen at the proline 4R position promotes the trans-conformation and stabilizes the secondary structure of collagen [1]. Inhibition of collagen Hyp destabilizes the collagen and prevents its export from the ER, thereby inducing cell stress and death [13–15]. HIFα transcription factors are essential mediators of the hypoxia response in mammalian cells [16–18]. Hyp of HIFα proteins mediated by PHD proteins under normoxic conditions is recognized by pVHL in the Cullin 2 E3 ligase complex, which leads to rapid ubiquitination and degradation of HIFα proteins [19,20]. Hypoxic conditions inhibit HIFα Hyp and degradation, enabling the transcriptional activation of over 100 hypoxia-responsive genes [21–23].
In the past 2 decades, numerous studies driven by advances in mass spectrometry-based proteomics technology have reported the identification and characterization of many new Hyp targets and the important roles of the modification in physiological functions [24–29]. Hyp is well known to affect protein homeostasis, and the classic example is the PHD-HIF-pVHL regulatory axis. A similar mechanism also regulates the turnover of several key transcriptional, metabolic, and signaling proteins, including β2AR, NDRG3, ACC2, EPOR, G9a, and SFMBT1 [30–34]. In addition to pVHL-mediated protein degradation, Hyp also regulates substrate degradation by affecting its interaction with deubiquitinases. For example, the hydroxylation of Foxo3a promotes substrate degradation by inhibiting the interaction with the deubiquitinase Usp9x, and hydroxylation of p53 enhances its interaction with the deubiquitinases Usp7/Usp10 to prevent its rapid degradation [35,36]. P4H-mediated Hyp has also been reported to regulate the stability of several substrates, including AGO2 and Carabin [37,38]. In addition to protein degradation, Hyp can also affect protein–protein interactions to regulate signaling and transcriptional activities. For example, PKM2 hydroxylation promotes its binding with HIF1A for transcriptional activation, Hyp of AKT enhances the interaction with pVHL to inhibit the kinase activity of AKT, and PHD1-mediated hydroxylation of Rpb1 is necessary for its translocation and phosphorylation [39–42]. More recently, TBK1 hydroxylation was identified and found to induce pVHL and phosphatase binding, which decreases its phosphorylation and enzyme activity, while the loss of pVHL hyperactivates TBK1 and promotes tumor development in clear cell renal cell carcinoma (ccRCC) [27,43].
Despite these advances, there is no integrated and annotated knowledgebase dedicated to Hyp, which leaves the functional diversity and physiological significance of this evolutionarily conserved metabolic-sensing PTM pathway underappreciated. To fill this knowledge gap, we developed a publicly accessible Hyp database, HypDB (http://www.HypDB.site) (S1 Fig). HypDB offers 3 main features: first, a classification-based algorithm for confident identification of Hyp substrates; second, integrated resources based on exhaustive manual literature mining, large-scale LC-MS analysis, and curated public databases; and third, a large spectral library for LC-MS-based site-specific identification from various cell lines and tissues. Furthermore, stoichiometry-based quantification of Hyp sites allows quantitative comparison of site abundance across different proteins and tissues, and the extensively annotated Hyp proteome enables deep bioinformatic analysis, including network connectivity, structural domain enrichment, and tissue-specific distribution analysis. The online database system allows the community-driven submission of LC-MS datasets to be included in HypDB annotation and the direct export of precursor and fragmentation information as a spectral library, which enables the development of targeted quantitative proteomics and data-independent analysis workflows. We hope that HypDB will provide critical insights into the functional diversity and network of the Hyp proteome and aid further mechanistic studies on the physiological roles of this metabolic-sensing PTM pathway in cells and diseases.
2. Results
2.1. Database construction and analysis workflow
To construct a bioinformatic resource for metabolic-sensing Hyp targets, we developed HypDB, a MySQL-based relational database on a publicly accessible web server (Figs 1 and S2). It was constructed from 3 main resources to comprehensively annotate the human Hyp proteome (Fig 1). First, manual curation of literature through PubMed (search term: “proline hydroxylation”; time limit between 2000 and 2021) was performed by 2 independent curators, which yielded 1,287 research journal articles. Site identifications were extracted from each journal article, and the corresponding protein was mapped to a UniProt protein ID if possible. Manual curation of the research articles focused on the sites that were biochemically investigated with multiple lines of evidence, including mass spectrometry, mutagenesis, western blotting, as well as in vitro or in vivo enzymatic assays. Analyzed Hyp site identifications were then matched against the existing data in the database to reduce redundancy. Second, the database included extensive LC-MS-based direct evidence of Hyp site identifications based on the integrated analysis of over 100 LC-MS datasets of various human cell lines and tissues (see Experimental methods). The datasets were either downloaded from publicly accessible servers or produced in-house. Each dataset was analyzed through a standardized workflow using the MaxQuant search engine, and the Hyp site identifications were filtered and imported into HypDB with a streamlined bioinformatic analysis pipeline described in detail below.
Our collection of MS-based evidence of Hyp identifications from cell lines and tissues likely revealed a large portion of the Hyp sites that can potentially be identified by deep proteomic analysis, as evidenced by our observation that the rate of unique Hyp site addition from each dataset decreased significantly despite the increased number of datasets in the database (S2B Fig). Third, HypDB also integrated Hyp identifications annotated in the public UniProt database. For better clarification, the database records indicate whether a site was uniquely reported by the UniProt database or by both UniProt annotation and evidence from large-scale LC-MS analysis.
Fig 1. Workflow of constructing HypDB database and webserver.
HypDB was constructed through deep proteome profiling analysis of human tissues and cell lines, manual literature mining, and integration with the UniProt data source. A classification-based algorithm was applied to extract confident identifications, and site-specific bioinformatic analysis with stoichiometry-based quantification revealed the biochemical pathways involved with the human Hyp proteome. The MS-based Hyp library further enabled DIA-MS quantification of the Hyp proteome in cells and tissues. DIA, data-independent acquisition; Hyp, proline hydroxylation.
We applied stringent criteria for data importing and classification from LC-MS-based identifications. To import data into HypDB, each LC-MS-based identification of a Hyp site from database search analysis was first analyzed by a classification-based algorithm to determine the confidence of Hyp site identification and localization (Fig 2A). The classification was performed using the best-scored MS/MS spectrum of a Hyp site in each dataset analysis. The algorithm classified Hyp identifications that can be exclusively localized to a proline residue based on consecutive b- or y-ions as Class I sites. Hyp identifications that cannot be exclusively localized based on MS/MS spectrum analysis but can be distinguished from 5 common types of oxidation artifacts (methionine, tryptophan, tyrosine, histidine, phenylalanine) mainly induced during sample preparation were classified as Class II sites. Other Hyp identifications that were reported by the MaxQuant database search software (with a 1% false-discovery rate at the site level and a minimum Andromeda score of 40) were grouped as Class III sites. We further developed a site-localization score using the relative intensities of key fragment ions to index the level of confidence in site localization with MS/MS spectrum analysis for Class I and Class II sites (Experimental methods). Each dataset was analyzed by the classification algorithm separately, and the best classification evidence for each Hyp site was selected and reported on the HypDB website to indicate the confidence of site localization. The classification-based algorithm offers the specificity and reliability required for an accurately annotated database while maintaining all possible identifications as searchable records. The localization score distributions of Class I and Class II sites are shown in S2C and S2D Fig.
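The class assignment described above can be illustrated with a minimal Python sketch. It is not the HypDB implementation: the `HypCandidate` fields `proline_localized` and `artifacts_ruled_out` are hypothetical summaries of the MS/MS evidence (whether consecutive b- or y-ions pin the modification to the proline, and which competing oxidizable residues the fragment ions exclude), and the rule that a peptide with no competing residues qualifies for Class II is an assumption. Only the five artifact residues, the 1% site-level FDR, and the Andromeda score threshold of 40 come from the text.

```python
from dataclasses import dataclass

# Residues whose oxidation can mimic proline hydroxylation (listed in the text).
ARTIFACT_RESIDUES = frozenset("MWYHF")
# Lower rank means higher confidence.
CLASS_RANK = {"I": 0, "II": 1, "III": 2}


@dataclass(frozen=True)
class HypCandidate:
    peptide: str                 # unmodified peptide sequence
    andromeda_score: float       # MaxQuant/Andromeda score of the best spectrum
    site_fdr: float              # site-level false-discovery rate
    proline_localized: bool      # hypothetical: consecutive b-/y-ions localize the site
    artifacts_ruled_out: frozenset = frozenset()  # hypothetical: residues excluded by fragments


def classify(c):
    """Return "I", "II", "III", or None when the search-engine filter fails."""
    if c.site_fdr > 0.01 or c.andromeda_score < 40:
        return None
    if c.proline_localized:
        return "I"
    competing = ARTIFACT_RESIDUES & set(c.peptide)
    # Assumption: Class II requires every competing residue to be excluded.
    if competing <= c.artifacts_ruled_out:
        return "II"
    return "III"


def best_class(classes):
    """Pick the most confident class observed for one site across datasets."""
    found = [k for k in classes if k is not None]
    return min(found, key=CLASS_RANK.__getitem__) if found else None
```

Because each dataset is classified separately, `best_class` mirrors the step in which the best classification evidence per site is the one reported.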
Fig 2. Substrate diversity of the human Hyp proteome.
(A) Illustration of the classification-based algorithm to identify confident Hyp sites. (B) Venn diagram of Class I, II, and III Hyp sites identified from MS analysis and manually curated UniProt sites. (C) PTM regulatory enzymes identified as Hyp substrates. (D) Kinase tree classification showing the distributions of kinases as Hyp substrates in different kinase families, including AGC (named after PKA, PKG, PKC families), CAMK (led by calcium/calmodulin-dependent protein kinases), CK1 (cell kinase 1), CMGC (named after CDKs, MAPK, GSK, CLK families), STE (homologs of the yeast STE counterparts), TK (tyrosine kinases), and TKL (tyrosine kinase-like). (E) Hydroxyproline proteins that interact with EGLN1. (F) Hydroxyproline proteins that interact with P4HA2. Refer to Sheet A in S2 Table and Sheet A–G in S3 Table for the underlying data of Fig 2B–2F. Hyp, proline hydroxylation; PTM, posttranslational modification.
To evaluate the site-specific prevalence of Hyp, a stoichiometry-based quantification strategy was integrated into the analysis workflow using previously established principles [27,44]. Briefly, the Hyp stoichiometry was calculated by dividing the summed intensities of the peptides containing the Hyp site identification by the total intensities of the peptides containing the same proline site in the dataset. HypDB recorded all available site-specific Hyp stoichiometry analyses from different cell lines and tissues, which allowed site-specific quantitative analysis of modification abundance across cell and tissue types. The median of all stoichiometry measurements for any specific site was calculated and reported on the HypDB website.
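The stoichiometry calculation described above amounts to a ratio of summed intensities followed by a median across samples. A minimal sketch, assuming per-peptide intensities are already available as plain numbers (the function names are illustrative, not taken from HypDB):

```python
from statistics import median


def hyp_stoichiometry(hyp_intensities, all_intensities):
    """Fraction of signal at one proline site that carries the hydroxylation.

    hyp_intensities: intensities of peptides identified with Hyp at the site.
    all_intensities: intensities of all peptides covering the same proline,
                     modified and unmodified alike.
    """
    total = sum(all_intensities)
    if total <= 0:
        raise ValueError("no signal recorded for this proline site")
    return sum(hyp_intensities) / total


def site_median_stoichiometry(measurements):
    """Median over the per-dataset stoichiometries of a single site."""
    return median(measurements)
```

For example, modified peptides with intensities 2.0 and 1.0 among site-covering peptides with intensities 2.0, 1.0, and 9.0 give a stoichiometry of 3.0 / 12.0 = 0.25.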
To further explore the functional association of the Hyp proteome, multiple bioinformatic annotation strategies were integrated into the analysis workflow as part of the data importing process. These stand-alone workflows include evolutionary conservation analysis, solvent accessibility analysis, and protein–protein interface analysis. Evolutionary conservation analysis compared the conservation of Hyp sites with other proline sites on the same protein and performed a statistical test to determine if the Hyp site is more evolutionarily conserved than non-Hyp sites. Solvent accessibility analysis analyzed the sequence of the substrate protein with the DSSP package and estimated the likelihood of solvent accessibility for each Hyp site. Protein–protein interaction interface analysis extracted the domain interaction residues from the 3DID database based on PDB structure analysis and matched them against the Hyp sites in the database to identify the Hyp sites that are localized in the interface and more likely to interfere with protein–protein interaction.
All data above were integrated into multiple tables linked through foreign keys, as shown in the schema in S2A Fig. Complete information on all Hyp sites was organized in 2 main tables: a redundant site table (S1 Table), which stored all Hyp sites identified in different tissues and cell lines along with annotated MS/MS spectra, site-specific abundance, and sample source information, and a nonredundant site table (Sheet A in S2 Table), which merged the LC-MS-based evidence from different sources at the site-specific level and also integrated the sites collected from UniProt and manual curation of literature.
2.2. Validation of the Hyp site classification method
To validate our classification-based strategy for confident Hyp site identification, we performed a comparative analysis of Hyp site identifications from each class with manually curated UniProt Hyp identifications. Our analysis showed that the Class I sites alone covered over 60% of the sites annotated in UniProt, and a combination of Class I and II sites covered about 63% of the UniProt sites, while only a few UniProt-annotated sites overlapped with the Class III sites (Fig 2B), suggesting that our Hyp site localization and classification algorithm allowed the collection of highly confident Hyp identifications and significantly improved the reliability of LC-MS-based Hyp site analysis. To further explore the current state of the Hyp proteome, we performed extensive bioinformatic analysis for functional annotation of the Hyp proteome based on the more confident Hyp site identifications in HypDB, which excluded Class III-only Hyp sites whose LC-MS evidence cannot distinguish them from potential oxidation artifacts.
2.3. Mapping human proline hydroxylation proteome
HypDB currently contains 14,413 nonredundant Hyp sites out of 59,436 Hyp site records collected through large-scale deep proteomics analysis of various tissues and cell lines, manual curation of literature, and integration with the UniProt database. Among the 14,413 nonredundant Hyp sites, 3,382 were classified as Class I sites, 4,335 as Class II sites, and 6,432 as Class III sites (Fig 2B). In addition, the database contained 55 sites from literature mining and 209 sites that were integrated from the UniProt database. We applied enrichment analysis with Gene Ontology molecular function annotation and found that Hyp substrates are widely involved in various cellular activities, from nucleotide binding and cell adhesion to enzymatic activities such as oxidoreductase and ligases (S3 Fig and Sheet C in S4 Table). Excluding Class III Hyp sites, we identified a total of 113 kinases (260 sites), 32 phosphatases (59 sites), 23 E3 ligases (47 sites), and 9 deubiquitinases (19 sites) as Hyp substrates (Fig 2C and Sheet A–D in S3 Table). Statistical analysis showed a specific enrichment of kinases in the Hyp proteome (p = 0.037), suggesting a potentially broad crosstalk between Hyp and kinase signaling pathways (Fig 2D and Sheet E in S3 Table).
Comparing Hyp substrates with the interactome of prolyl hydroxylases in BioGRID [45], we identified 22 Hyp proteins with 68 sites that were known to interact with EGLN1/PHD2, 17 Hyp proteins with 34 sites that were known to interact with EGLN2/PHD1, 416 Hyp proteins with 861 sites that were known to interact with EGLN3/PHD3, 58 Hyp proteins with 156 sites that were known to interact with P4HA1, 31 Hyp proteins with 296 sites that were known to interact with P4HA2, and 26 Hyp proteins with 66 sites that were known to interact with P4HA3 (Fig 2E and 2F and Sheet F and G in S3 Table). The numbers of Class I and Class II Hyp sites and of Hyp proteins that interact with each prolyl hydroxylase are summarized in S4 Fig.
To determine if Hyp sites are more accessible to solvent, we collected 3D structures of proteins from PDBe and UniProt and calculated the relative solvent accessibility (RSA) of each proline residue on proteins with hydroxyproline sites with the DSSP package [46,47]. To examine if there is an RSA difference between Hyp sites and non-Hyp sites on proteins with Hyp sites, we performed a 2-tailed t test and found no significant difference in the distribution of solvent accessibility, suggesting that Hyp does not necessarily target solvent-accessible proline residues (S5A Fig and Sheet C in S2 Table). To determine if Hyp targets proline sites that are more evolutionarily conserved, we performed evolutionary conservation analysis through extensive sequence alignment of protein orthologs across species based on the EggNOG database [48] and statistically compared the conservation of Hyp sites with the conservation of all prolines on the same protein. Our data showed that about 49% of the sites were evolutionarily conserved with statistical significance (p < 0.05) (S5B Fig). To determine if Hyp may play a potential role in domain–domain interactions, we analyzed data on known domain-based interactions of 3D protein structures for sites in the HypDB nonredundant site database. We identified 168 unique Hyp sites that were located at the interface of the interaction. These data suggested potential involvement of Hyp in directly regulating protein–protein interaction. For example, Hyp at position 14 on superoxide dismutase (SOD1) forms a hydrogen bond with Gln16 of a neighboring chain in a dimeric structure and potentially promotes the stabilization of the dimer (S5C Fig).
2.4. Functional features of proline hydroxylation proteins
We performed GO enrichment tests and other functional annotations on proteins that contain Class I, II, literature, or UniProt sites (Fig 3A and Sheet A in S4 Table). Our analysis revealed that Hyp substrates are highly enriched in metabolic processes such as response to toxic substances (p < 10^−26) and organic cyclic compound catabolic process (p < 10^−14), mRNA splicing (p < 10^−26), and structural functions such as NABA collagens (p < 10^−35), supramolecular fiber organization (p < 10^−41), and cell morphogenesis involved in differentiation (p < 10^−18). To determine if the Hyp proteome is preferentially involved in protein–protein interactions, we extracted a human protein interaction database from STRING with a cutoff score of 0.7 and then extracted all the interactions containing 2 Hyp proteins based on the STRING database. Based on these data, we performed network connectivity analysis by comparing the number of interactions of Hyp proteins with the distribution of the number of interactions from randomly selected human proteins with 10,000 repeats. Our data showed that Hyp substrates are significantly involved in the protein–protein interaction network (p < 0.0001) (Fig 3B and Sheet D in S2 Table). We further performed protein complex enrichment analysis using the manually curated CORUM database, and our analysis showed that the Hyp proteome is significantly enriched with many known protein complexes (S6 Fig and Sheet D in S4 Table), such as the TNF-alpha/NF-kappa B signaling complex 6 (S7A Fig and Sheet B in S4 Table), TLE1 corepressor complex (S7B Fig and Sheet B in S4 Table), DGCR8 multiprotein complex (S7C Fig and Sheet B in S4 Table), Nop56p-associated pre-rRNA complex (S7D Fig and Sheet B in S4 Table), and PA700-20S-PA28 complex (S7E Fig and Sheet B in S4 Table), suggesting that Hyp targets proteins in multiple pathways that impact signaling and gene expression.
Using MCODE clustering analysis, we extracted significantly enriched clusters from the Hyp proteome interaction network, and these highly connected clusters of Hyp substrates suggested that Hyp targets important cellular activities including regulation of mRNA splicing, hypoxia response, and focal adhesion (Fig 3C–3E and Sheet B in S4 Table).
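The network connectivity comparison described above (interactions among Hyp proteins versus random protein sets of equal size, repeated 10,000 times) can be sketched as a resampling test. This is an illustrative reconstruction, not the authors' code: the edge list is assumed to be pre-filtered at the STRING score cutoff, and the add-one p-value estimate and fixed random seed are choices made here.

```python
import random


def count_internal_edges(proteins, edges):
    """Number of interactions whose two partners are both in `proteins`."""
    members = set(proteins)
    return sum(1 for a, b in edges if a in members and b in members)


def connectivity_p_value(hyp_proteins, background, edges, n_repeats=10_000, seed=0):
    """Empirical p-value that random sets are at least as connected as Hyp proteins.

    background: all human proteins eligible for random selection.
    edges: iterable of (protein_a, protein_b) pairs above the score cutoff.
    """
    rng = random.Random(seed)
    edges = list(edges)
    pool = list(background)
    size = len(set(hyp_proteins))
    observed = count_internal_edges(hyp_proteins, edges)
    at_least_as_connected = 0
    for _ in range(n_repeats):
        sample = rng.sample(pool, size)
        if count_internal_edges(sample, edges) >= observed:
            at_least_as_connected += 1
    # Add-one estimate keeps the p-value strictly above zero.
    return observed, (at_least_as_connected + 1) / (n_repeats + 1)
```

With 10,000 repeats the smallest p-value this estimate can return is about 0.0001, which matches the resolution of the threshold reported in the text.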
Fig 3. Gene enrichment and connectivity analysis of HypDB.
(A) Interaction network of the top 20 enriched functional annotation clusters of HypDB proteins. (B) Bootstrapping-based analysis of hydroxyproline protein interactions compared to a distribution of protein interactions from random samples with the same number of human proteins. (C) Hydroxyproline proteins enriched in the regulation of RNA splicing. (D) Hydroxyproline proteins enriched in the response to hypoxia. (E) Hydroxyproline proteins enriched in focal adhesion. Refer to Sheet A in S4 Table, Sheet D in S2 Table, and Sheet B in S4 Table for the underlying data of Fig 3.
2.5. Structural and motif features of proline hydroxylation sites
We analyzed the local sequence context around Hyp sites (excluding Class III sites) using the MoMo software tool [49]. As expected, Hyp sites with the PG motif and GPPG motif were highly enriched (p < 10^−10), which is characteristic of collagen protein families (Figs 4A and S8A and Sheet A and B in S5 Table). In addition to collagen, we identified 33 proteins with motifs similar to those of collagen, and these proteins may also be potential substrates of prolyl 4-hydroxylases. Other than the collagen-like motif, we also identified the CP motif (p < 10^−6) (Fig 4A and Sheet C in S5 Table), and proteins containing CP motifs are highly enriched in focal adhesion (FDR < 0.05). To remove the high background of sites with collagen-like Hyp motifs, we filtered out sites with local sequence contexts in the PG motif. Our re-analysis found that acidic amino acids were enriched at the +1 position to form the PD motif (Fig 4A and Sheet D in S5 Table). PD motif-containing proteins were highly enriched in metabolic pathways (FDR < 0.05). The number and percentage of Hyp sites in the HypDB proteome that appeared in the motifs above are shown in S8B Fig. As Hyp sites may have crosstalk with other protein modifications, our analysis revealed 2,386 phosphorylation sites and 535 ubiquitination sites that were identified in close proximity to the Hyp sites (S8C Fig).
Fig 4. Motif and protein feature analysis of HypDB.
(A) Motif enrichment analysis with the flanking sequences of Hyp sites identified PG, GPPG, and CP motifs (adj p < 10^−6), and repeated analysis with the flanking sequences of Hyp sites after filtering out PG motif sequences identified the PD motif (adj p < 10^−6). (B) Secondary structure enrichment of Hyp sites based on PDB protein structures (*p < 0.05). (C) Functional domain enrichment analysis of Hyp sites based on domain localizations on proteins in UniProt (***p < 0.001). (D) Functional domain enrichment analysis of Hyp sites based on domain localizations on proteins in UniProt (***p < 0.001, **p < 0.01). Refer to the S5 and S6 Tables for the underlying data of Fig 4.
To determine the structural features of Hyp sites, we extracted all Hyp proteins with known secondary structures. These proteins contain 2,279 Hyp sites and 27,159 non-Hyp sites on sequences that have experimentally determined PDB structures. We then classified structural features into helix, sheet, turn, and non-structure regions and performed statistical analysis to compare the secondary structure features of Hyp sites and non-Hyp sites. We found that Hyp indeed preferentially targets proline residues that are localized in helix (p < 0.05) and turn (p < 0.05) secondary structures (Fig 4B left panel). Accordingly, we observed a depletion of Hyp sites outside of a secondary structure feature (Fig 4B right panel).
As secondary structures may not fully represent functional structural features, we developed a similar statistical analysis strategy to determine the site-specific enrichment of Hyp sites on functional domains or structural regions. In contrast to the conventional domain enrichment analysis using Pfam or InterPro for protein-level analysis, our strategy enabled site-specific enrichment analysis of domains or regions based on UniProt annotation. Application of this strategy revealed several known and novel structural features that were highly enriched with Hyp, such as the triple-helical domain, which is characteristic of the collagen protein family (Fig 4C). In addition to the triple-helical domain, our analysis revealed more than 10 functional regions and domains that were highly enriched with Hyp, including the p-domain (p < 10^−6), NBD domain (p < 10^−6), thioredoxin domain (p < 10^−2), and ferritin-like domain (p < 10^−6) (Fig 4C and 4D). These data revealed a previously unexpected role of Hyp in targeting functional domains in diverse cellular pathways.
2.6. Site-specific stoichiometric quantification of Hyp proteome
Compared to relative quantification, stoichiometry analysis measures the prevalence and dynamics of the modification in a physiologically meaningful manner [27,50,51]. Our mass spectrometry-based deep proteome profiling allows site-specific quantification of Hyp stoichiometries across multiple tissues and cell lines. Our data showed that the site-specific abundance of Hyp varies widely from below 1% to nearly 100%, with an overall median stoichiometry of 7.89% (Fig 5A and Sheet A in S8 Table). Indeed, a bulk portion of the Hyp sites have either very low or very high stoichiometries. To examine the functional differences between sites with different stoichiometry, we divided proteins into 5 quantiles based on the average stoichiometry measurement for the same site across all cells and tissues (Fig 5B and Sheet B in S8 Table). The 4 cutoffs of 5%, 20%, 80%, and 95% were chosen so that each quantile contained a similar number of Hyp sites. We then performed GO enrichment and functional annotation on each of the 5 quantiles and performed hierarchical clustering with correlation coefficient. Our data showed that proteins in immune response and neutrophil activation pathways are enriched with low to medium stoichiometry, and proteins in cell adhesion and system development are enriched with medium to high stoichiometry (Fig 5B). We also observed a significant enrichment of proteins involved in chromatin assembly and RNA processing, but the stoichiometry of hydroxylation on these proteins appeared to be very low (Fig 5B). Combining site-specific functional feature annotation and stoichiometry analysis, we performed stoichiometry-based clustering of Hyp-targeted functional domains.
Our data showed that the ODD domain, which is known to regulate hydroxylation-mediated protein degradation of HIFα, was enriched with medium stoichiometry, and the triple-helical domain on collagen, whose hydroxylation is required for its maturation, was enriched with high stoichiometry (Fig 5C and Sheet C in S8 Table). Furthermore, our analysis revealed stoichiometry-based enrichment of kinase domains at medium stoichiometry, GATA1 interaction domains at high stoichiometry, nucleotide-binding domains at low to medium stoichiometry, and histone-binding domains at low stoichiometry (Fig 5C).
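The five-way binning used in this section can be sketched as a lookup against the 4 stated cutoffs. The sketch below assumes stoichiometries are expressed as fractions between 0 and 1 and that a value exactly on a cutoff falls into the higher bin; neither convention is specified in the text.

```python
from bisect import bisect_right

# Cutoffs from the text, expressed as fractions.
CUTOFFS = (0.05, 0.20, 0.80, 0.95)
LABELS = ("Q1", "Q2", "Q3", "Q4", "Q5")


def stoichiometry_quantile(value):
    """Map an average site stoichiometry (0 to 1) to its bin label."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("stoichiometry must be a fraction between 0 and 1")
    return LABELS[bisect_right(CUTOFFS, value)]


def bin_sites(site_to_measurements):
    """Average each site's measurements across samples, then assign a bin."""
    return {
        site: stoichiometry_quantile(sum(values) / len(values))
        for site, values in site_to_measurements.items()
    }
```

Under these assumptions the overall median of 7.89% reported above falls into Q2, and a site measured at 71% and 96% in two tissues averages 83.5% and falls into Q4.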
Fig 5. Stoichiometry-based functional enrichment analysis of the Hyp proteome.
(A) Stoichiometry distribution of the Hyp sites divided into 5 quantiles (Q1, Q2, Q3, Q4, and Q5) from low to high stoichiometry with 4 cutoffs of 5%, 20%, 80%, and 95%, respectively. (B, C) Hierarchical clustering of GO biological process enrichment of Hyp proteins (B) and functional domain enrichment of Hyp sites on proteins in UniProt (C) across the 5 quantiles. Refer to Sheet A–C in S8 Table for the underlying data of Fig 5.
2.7. Tissue-specific distribution of Hyp proteome
The collection of mass spectrometry-based identifications of the Hyp proteome enabled cross-tissue comparative analysis (Sheet A in S8 Table). Indeed, at the individual protein level, we observed a wide distribution of Hyp abundance for the same site and between different sites across different tissues (Figs 6A and S9). For example, Fibrillin-1 (FBN1) was identified with 22 Hyp sites, of which 17 were Class I or II sites. Hyp1090 on an EGF_CA repeat showed consistently high Hyp stoichiometry (71% to 96%) across 4 different tissues (testis, colon, heart, and rectum), while Hyp1453 on another EGF_CA repeat showed varied Hyp stoichiometry (3% to 50.5%) across the same 4 tissues (Fig 6A). In another example, 6-phosphogluconate dehydrogenase (PGD) was identified with 8 Hyp sites, half of which were Class I or II sites. Hyp169 on the NAD-binding domain showed relatively low stoichiometries in heart, liver, and ovary (7.6% to 11.6%) but much higher stoichiometries in gut and B cell (21.9% and 75.6%) (S9B Fig). We performed pathway enrichment analysis of Hyp and clustering of the enrichment across the tissues. Our data showed that the Hyp proteome varied dramatically in terms of pathway and abundance among tissues (Fig 6B and 6C and Sheet D in S8 Table). For example, in lung, the Hyp proteome is mainly involved in collagen synthesis and tissue development and has a relatively low portion of unique Hyp sites, but in liver, the Hyp proteome is heavily involved in various metabolic and translational processes with many liver-specific Hyp targets (Fig 6B and 6C).
Interestingly, clustering analysis showed that tissues sharing similar physiological functions tend to share similar Hyp profiles and are therefore clustered together. Testis and ovary, for example, have similar enrichment of Hyp proteins related to chromosome organization, DNA repair, and other DNA-related metabolic processes (Fig 6D and Sheet E in S8 Table). Hyp proteomes in urinary bladder and prostate are co-enriched in regulation of proteolysis and morphogenesis of various tissues. CD4 T cells and CD8 T cells are enriched with Hyp proteins related to chromatin remodeling and immune system development. Liver showed a distinct enrichment pattern compared with other tissues, and its Hyp proteome is strongly enriched in various metabolic and catabolic processes. Meanwhile, 4 of these tissues (ovary, testis, liver, and prostate) are co-enriched in neutrophil activation involved in immune response (Fig 6D).
Fig 6. Hyp proteome distributions in various tissues.
(A) An example showing different stoichiometries of Hyp sites across multiple types of tissue for FBN1, with protein domains labeled in colored boxes. (B) Correlation plot of Hyp proteins in 5 different tissues (heart, liver, lung, ovary, and urinary bladder); the length of each arc shows the relative number of proteins and the purple curved lines show overlapping proteins. (C) Heat map of the top 20 enriched functional annotations of the Hyp proteins in 5 tissues. (D) GO biological process enrichment heat map of the Hyp proteins across 7 tissues. Refer to Sheet A in S2 Table and Sheet D–E in S8 Table for the underlying data of Fig 6B–6D. CD4, CD4 T cells; CD8, CD8 T cells; FBN1, Fibrillin-1; Hyp, proline hydroxylation; P, prostate; UB, urinary bladder.
2.8. Data-independent acquisition (DIA) analysis of Hyp targets with HypDB-generated spectral library
DIA has been developed in the past 10 years as a powerful strategy for reliable and efficient quantification of proteins and PTM sites [52–60]. Our extensive collection of MS-based evidence for human Hyp sites provided an ideal resource to establish a DIA workflow for global, site-specific quantification of Hyp targets in cells and tissues. To this end, our web server has integrated functions for the direct export of annotated MS/MS identifications of Hyp sites for selected proteins, cell lines, or tissues, or at a proteome scale. The Export function provides 2 options: exporting the peptide precursor m/z only or exporting formatted MS/MS spectra. The former option generates a target m/z list that can be used as an inclusion list for targeted quantification of Hyp sites on selected proteins or sites. The latter option directly generates a spectral library for DIA analysis. Using the Export function, the current HypDB allowed the generation of a comprehensive Hyp spectral library in the NIST Mass Search format (msp) consisting of 6,000 precursor ions and 5,307 peptides, representing 7,717 Class I and II sites from 3,022 proteins. The web server also offers various options for selective exporting. To demonstrate the applicability of our resource in a DIA analysis workflow, we analyzed 2 recently published large-scale DIA datasets [55,56]. Both datasets applied DIA analysis to quantify protein dynamics in multiple replicates of paired normal and tumor samples.
The study by Kitata and colleagues analyzed global protein profiles of lung cancer with 5 pairs of tumor and normal tissues in triplicate analysis for a total of 30 DIA-based LC-MS runs [55]. As a routine procedure in DIA analysis, we first performed database searching of the data-dependent acquisition (DDA) data in the dataset. Then, using the spectral library generated from the DDA data in the same study, we performed DIA analysis of all tumor and normal tissues with replicates. The analysis quantified 1,339 Class I and II Hyp sites from the Kitata and colleagues study (1% FDR). Next, we applied the HypDB-generated spectral library and repeated the DIA analysis. Our result showed that the HypDB-generated spectral library more than doubled the total number of Hyp sites obtained with the DDA-based spectral library, with 3,015 Hyp sites identified, while covering more than 83% of the nonredundant Hyp sites identified using the 2 spectral libraries. This suggests that the HypDB-generated spectral library was sufficient to cover the majority of the Hyp identifications and significantly increased the sensitivity of Hyp proteome coverage (Fig 7A). DIA analysis with a combined library generated from both HypDB and DDA identified 3,651 Hyp sites and 1,249 Hyp proteins (1% FDR). To determine the reproducibility of the quantification, we calculated the distribution of the percent coefficient of variation (%CV) for DIA analysis of Hyp sites. Our data showed that %CV varied between 2% and 15% with a median value around 5% (Fig 7B), similar to the %CV distribution observed in the DIA analysis of proteins and phosphoproteins [55].
Given the high reproducibility of the quantification, we filtered the Hyp sites with a global 1% q-value cutoff (2,283 sites) and performed hierarchical clustering analysis of Hyp sites quantified with normalized intensity in tumor and normal lung tissues (Fig 7C). Our data clearly showed that site-specific Hyp quantification was sufficient to cluster and distinguish tumor versus normal tissue. To identify significantly up- or down-regulated Hyp sites in tumor tissues, we performed a 2-sample t test and analyzed the data in a volcano plot (Fig 7D). The analysis allowed us to identify 142 Hyp sites that were significantly up-regulated and 178 Hyp sites that were significantly down-regulated in tumor tissue (5% permutation-based FDR). The dynamically regulated Hyp sites showed strong trends that were distinct between tumor and normal tissue. Interestingly, we observed subtype-dependent Hyp dynamics on collagen proteins. Collagen subtypes IV and VI showed significantly down-regulated Hyp levels across multiple sites in tumor samples, while collagen subtype X showed significantly increased Hyp (Fig 7D). Since Hyp promotes the structural stability of collagens, such changes likely indicate a significant increase in stability for collagen X and a decrease in stability for collagens IV and VI in lung cancer tissue compared with normal tissue. Our finding agreed well with a very recent publication indicating a pro-metastatic role of up-regulated collagen X in lung cancer progression [61]. In addition, we also identified significant up-regulation of Hyp on the glycolysis enzymes pyruvate kinase (PKM) and enolase (ENO1) and on the autophagy protein Parkin (PARK7) in tumor tissue (Fig 7D).
P4HB, a component of the collagen prolyl 4-hydroxylase enzyme, also showed a significant increase in Hyp (Fig 7D), likely due to increased prolyl 4-hydroxylase activity in lung cancer [62].
Fig 7. Label-free quantification of the Hyp proteome in lung cancer with DIA analysis.
(A) Venn diagram of DIA-based Hyp site identifications using the HypDB-generated library and the library generated from the DDA data in the Kitata and colleagues study. (B) Distribution of %CV for Hyp sites quantified with the HypDB-generated library, the DDA-generated library, or the hybrid library that combined both sources. (C, D) Hierarchical clustering (C) and volcano plot (D) of significantly up- or down-regulated Hyp sites in normal (blue) and tumor (red) tissues in the DIA analysis. (E, F) Significantly enriched GO biological processes among up-regulated (E) and down-regulated (F) Hyp proteins in tumor with at least 1-fold change after normalizing with protein abundance changes. Refer to Sheet A–E in S9 Table for the underlying data of Fig 7B–F. DDA, data-dependent acquisition; DIA, data-independent acquisition; Hyp, proline hydroxylation.
In another study, Guo and colleagues applied DIA analysis to quantitatively profile the kidney cancer proteome, and the dataset consisted of an analysis of 18 normal tissues and 18 tumor tissues [56]. Following the same workflow, we first performed DDA analysis and then applied the DDA-generated Hyp library to quantify Hyp substrates in tissues. The DDA library-based analysis quantified only 387 Hyp sites from all replicate analyses. Application of the HypDB-generated spectral library increased the number of Hyp site quantifications by more than 5 times, identifying 2,510 sites (S10A Fig). Our result showed that the HypDB-generated library greatly increased the Hyp sequence coverage and analysis sensitivity. DIA analysis with a combined library generated from both HypDB and DDA analysis identified 2,556 Hyp sites and 981 Hyp proteins (1% FDR). To test the reproducibility among replicate tissues, we performed a correlation matrix analysis using the corrplot package in R. Our data showed that quantitative analysis of Hyp substrates allowed efficient clustering and segregation of tumor versus normal tissues (S10B Fig). After global q-value filtering and intensity normalization, we analyzed 1,160 Hyp sites across all samples with a pair-wise t test, and our analysis identified 12 up-regulated and 24 down-regulated Hyp sites in tumor (5% permutation-based FDR) (S10C Fig).
To understand whether the differential abundance of Hyp sites between the normal and tumor tissues was due to changes in the abundance of the corresponding proteins, we compared the log2-transformed average site ratios to the log2-transformed average protein ratios for both the Kitata and colleagues and the Guo and colleagues datasets (S11A and S11B Fig and S9 and S10 Tables). We found that more than 82% of the Hyp sites in the Kitata and colleagues dataset and at least 37% of the Hyp sites in the Guo and colleagues dataset could be quantified together with the corresponding protein abundance (S9 and S10 Tables). From the correlative analysis between site ratios and protein ratios, we observed a certain degree of linearity, suggesting that the changes in the abundance of some Hyp sites were indeed driven by changes in the abundance of the corresponding proteins (S11A and S11B Fig). We also observed that a significant portion of Hyp site dynamics did not correlate with protein abundance changes. To this end, we calculated the 95% confidence interval along the bisector correlation lines that represent equal ratios of Hyp site and protein abundance changes for all Hyp sites with corresponding protein quantification ratios (S9 and S10 Tables). Our analysis showed that 78% of the Hyp sites in the Kitata and colleagues dataset and 35% of the Hyp sites in the Guo and colleagues dataset showed significant deviation in site abundance changes from the corresponding protein abundance changes (S11A and S11B Fig). The correlation analysis therefore identified Hyp substrates that showed differential changes in abundance compared with the corresponding protein abundance changes. We further extracted only the significantly up- or down-regulated Hyp sites based on DIA analysis and compared their dynamics with the corresponding protein abundance changes (S11C–S11F Fig).
Notably, in the Kitata and colleagues dataset, the protein abundance of COL1A2 and COL14A1 was similar between tumor and normal tissues, while the abundance of the Hyp sites on each of these proteins was well above or below the 95% confidence interval (S11C and S11E Fig). The correlation analysis also confirmed the down-regulation of Hyp abundance on collagen subtypes IV and VI in the Kitata and colleagues lung cancer dataset after protein-level normalization, while showing that the up-regulation of Hyp abundance on collagen subtype X in tumor was due to the up-regulation of the protein abundance (S11C and S11E Fig). In the Guo and colleagues dataset, significantly changed Hyp sites showed good correlation with the corresponding protein dynamics, while the Hyp sites of CRK and TPI1 showed a much larger increase or decrease in abundance than their total proteins, suggesting differential activities of the Hyp pathways for each substrate (S11D and S11F Fig).
To reveal the functional significance of up-regulated or down-regulated Hyp substrates in both datasets, we performed functional annotation enrichment analysis with Hyp substrates whose site ratios showed at least a 1-fold increase or decrease after protein abundance normalization. Analysis of the Kitata and colleagues dataset showed that the biological processes related to homotypic cell–cell adhesion, coagulation, cell redox homeostasis, response to interleukin-12, and angiogenesis were significantly enriched among up-regulated Hyp substrates (Fig 7E), while processes related to regulation of gene expression, neutrophil-mediated immunity, carbohydrate catabolism, collagen metabolic process, and response to interleukin-7 were significantly enriched among down-regulated Hyp substrates (Fig 7F) (BH-corrected FDR < 0.05). The analysis of the Guo and colleagues dataset showed that Hyp proteins up-regulated in kidney cancer were strongly enriched in KEGG pathways including ECM-receptor interaction, focal adhesion, glyoxylate/dicarboxylate metabolism, and tryptophan metabolism (S10D Fig), while pathways including biosynthesis of amino acids, fructose/mannose metabolism, pathogenic E. coli infection, and PI3K-Akt signaling were significantly enriched among down-regulated Hyp proteins in tumor tissue (S10E Fig) (BH-corrected FDR < 0.05).
3. Conclusions
A grand challenge in the functional analysis of PTM pathways is the lack of annotation resources to profile modification substrates and annotate enzyme-target relationships. Hyp is a key oxygen- and metabolic-sensing PTM that governs cellular programs in response to the hypoxic microenvironment and micronutrient stress. Earlier studies of Hyp mainly focused on its role in the structural stability and maturation of structural proteins such as collagens. In the past several decades, extensive biochemical studies of HIF pathways as well as other new Hyp substrates have suggested that Hyp is widely involved in regulating protein–protein interaction, protein stability, signal transduction, metabolism, and gene expression. Growing evidence has also suggested that specific Hyp pathways play critical roles in cancer development, metastasis, heart disease, and diabetes. Systematic categorization and functional annotation of the Hyp proteome will provide comprehensive understanding and important physiological insights into Hyp-regulated cellular pathways as well as potential therapeutic strategies targeting metabolic-sensing Hyp pathways in diseases.
To address this need, we developed HypDB, an integrated online portal and publicly accessible server for functional analysis of Hyp substrates and their interaction networks. HypDB collected various data sources for comprehensive coverage of the Hyp proteome, including curation of published literature, deep proteomics analysis of tissues and cell lines, and integration with the annotated UniProt database. The site-localization and classification algorithm enabled efficient extraction of confident Hyp substrate identifications from LC-MS analysis. Our identification of highly confident Hyp substrates expanded the current annotation of human Hyp targets in UniProt by over 40-fold. Streamlined data processing and stoichiometry-based Hyp quantification allowed site-specific comparative analysis of Hyp abundance across 26 human organs and fluids as well as 6 human cell lines. We collected 14,413 Hyp sites from various origins, and 86% of the top 500 Hyp sites with the most repeat identifications in various MS datasets were on structural proteins, which matched well with one of the major molecular functions of Hyp.
Bioinformatic analysis of the first draft of the human Hyp proteome provided critical insights into the functional and structural diversity of the modification substrates. The analysis not only revealed various cellular pathways enriched with Hyp proteins, including mRNA processing, metabolism, cell cycle, and signaling, but also demonstrated for the first time that Hyp preferentially targets protein complexes and protein interaction networks, indicating important roles of Hyp in fine-tuning protein structural features and mediating protein–protein interactions. Indeed, site-level secondary structure enrichment analysis of the expanded Hyp proteome indicated a significant enrichment of Hyp sites on the alpha-helix, while site-level enrichment analysis of functional domains and regions revealed novel protein domain features that are preferentially targeted by Hyp, such as the P-domain, NBD domain, ferritin-like domain, and thioredoxin. These findings suggested potentially important roles for Hyp-mediated regulation of domain stability or activity that are worthy of further biochemical investigation.
MS-based analysis of the Hyp proteome allows stoichiometry-based quantification of Hyp abundance at the site-specific level. By classifying Hyp substrates based on stoichiometry dynamics, we revealed functional domains and activities enriched with very high stoichiometry, indicating that Hyp on these domains may be required for protein function, analogous to collagen. In comparison, the oxygen-sensing ODD domain was enriched with median stoichiometry, and nucleotide- or histone-binding domains were enriched with low stoichiometry. Such differences may suggest differential activities of prolyl hydroxylases targeting various functional domains. Comparative analysis of Hyp stoichiometry across tissues also indicated variations in modification abundance at the site-specific level. Such variation may be attributed to the differential metabolic and gene expression profiles in various tissues.
The collection of MS-based identifications of the Hyp proteome in HypDB established an annotated spectral library for Hyp-containing peptides that were identified and site-localized with high confidence. Such an extensive spectral library enabled reliable and sensitive deep proteomic analysis of human cells and tissues with DIA. Application of the HypDB-generated spectral library in DIA analysis demonstrated excellent data reproducibility, significantly improved the coverage of the Hyp proteome in cancer proteome analysis, and revealed novel enrichment of Hyp sites that were significantly up-regulated or down-regulated in cancer tissues.
Although the current version of HypDB (v1.0) is limited to the human proteome, future development of HypDB will include the Hyp proteome of other species. Comparative analysis of Hyp targets from various species will allow evolutionary conservation analysis of Hyp sites and identify functionally important Hyp targets in protein structure and activity. Further application of the HypDB-generated spectral library in tissue analysis will enable the discovery of novel Hyp targets in disease animal models or patient samples and potentially lead to the development of clinically relevant therapeutic strategies.
4. Experimental methods
4.1. MS raw data analysis
We collected MS data from the human proteome draft [63], deep proteome analysis of human cell lines [64], PHD interactome analysis [44,65], and Hyp proteome analysis [27], as well as IP-MS analysis of Flag-tagged HIF1A. All MS raw data collected above were searched with MaxQuant (version 1.5.3.12) against the UniProt human database with carbamidomethyl cysteine as a fixed modification and protein N-terminal acetylation, methionine oxidation, and Hyp as variable modifications. Most of the raw data used trypsin as the digestion enzyme, while several samples used other digestion enzymes, for example, LysC and GluC, based on the experimental procedure of the original projects. The maximum number of missed cleavages was set to 2, and the identification threshold was set at 1% false discovery rate for concatenated reversed decoy database search at the protein, peptide, and site levels.
4.2. Site localization classification and scoring
To filter out low-confidence sites, we developed a site localization classification algorithm. Based on the experience that sites are localized more accurately when more fragment ions that help localize the modification mass shift are present in the corresponding MS2 spectra, our algorithm divided sites into 3 classes according to their modification localization confidence: exclusively localized sites in Class I, sites that are nonexclusive but distinguishable from similar modifications in Class II, and the rest in Class III (Fig 2A).
For a site to be classified as a Class I site, a pair of b-ions or y-ions separating the proline from other amino acids must be found to localize it exclusively. In this way, a mass shift caused by hydroxylation can only occur on that specific proline. We gave credit to that ion pair in the scoring function for Class I sites as follows:
where CS stands for credit score, I stands for the intensity of a given fragment ion (for example, the intensity of the bm-ion), and l stands for peptide length. We gave credit to the pair of b-ions and y-ions that localizes hydroxylation exclusively. The ion with the lower intensity within the pair is selected, and we calculate the credit score based on the ratio of its intensity to the average ion intensity on the same peptide.
Hydroxylation that cannot be exclusively localized but is distinguishable from modification on other prone-to-oxidation amino acid residues is classified as Class II because we can infer that hydroxylation occurs on proline in this case. Because every ion that separates the proline from the nearest easily oxidized amino acid helps the localization, we gave credit to all such ions in the scoring function for Class II sites as follows:
where ll and lr are the distances between the hydroxylated proline and the nearest prone-to-oxidation amino acid residue on the left side and right side, respectively. Instead of only giving credit to the ion pair immediately flanking the site, for Class II sites, we gave credit to all ions that contributed to separating Hyp from other prone-to-oxidation amino acid residues. We require that a Hyp site has at least 1 fragment ion on both the left and right flanking sequences, excluding terminal fragment ions. After that, we calculate the ratio between the average intensity of the selected ions and that of all ions on each side, and the credit score is determined by the weaker side.
Sites that belong to neither Class I nor Class II are classified as Class III sites. Class III sites may be Hyp at other positions or other modifications that were falsely identified. Due to their low credibility, we do not score them and only use the more confident Hyp identifications, which include Class I, Class II, UniProt, and literature sites, for bioinformatic analyses.
4.3. Stoichiometry calculation
We calculate the stoichiometry of each hydroxyproline site from the total peptide intensity and the modified peptide intensity. For a specific site, we collect all modified and unmodified peptides that contain this site from the MS data. Then, we obtain the stoichiometry by dividing the total modified peptide intensity by the total peptide intensity. Site stoichiometries in different samples are calculated separately, so there may be multiple distinct stoichiometries for 1 site in the same tissue or cell line. In this case, we take the average stoichiometry for analysis in the following steps.
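The calculation described above can be illustrated with a minimal Python sketch. The function names and the input layout (a list of intensity/flag pairs per sample) are hypothetical and chosen only for illustration; they are not taken from the HypDB code base.

```python
from collections import defaultdict
from statistics import mean


def site_stoichiometry(peptides):
    """Stoichiometry of one site in one sample.

    `peptides` is a list of (intensity, is_hydroxylated_at_site) pairs for
    every peptide covering the site (hypothetical input layout).
    Returns None when no intensity was measured.
    """
    total = sum(intensity for intensity, _ in peptides)
    if total == 0:
        return None
    modified = sum(intensity for intensity, is_mod in peptides if is_mod)
    return modified / total


def mean_stoichiometry_by_tissue(samples):
    """Average the per-sample stoichiometries of one site within each tissue.

    `samples` is a list of (tissue, peptides) pairs, one per sample.
    """
    per_tissue = defaultdict(list)
    for tissue, peptides in samples:
        value = site_stoichiometry(peptides)
        if value is not None:
            per_tissue[tissue].append(value)
    return {tissue: mean(values) for tissue, values in per_tissue.items()}
```

For example, a site measured in two liver samples with stoichiometries of 25% and 75% would be carried forward with an average of 50%.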
4.4. Statistical enrichment of pathways, functional annotations, domains, and complexes
We use R packages including “GO.db,” “GOstats,” and “org.Hs.eg.db” to perform enrichment analysis for Pfam, KEGG, and Gene Ontology (biological process, molecular function, and cellular compartment). We collected proteins with Class I, Class II, UniProt, and literature sites from HypDB and performed a hypergeometric test for each term in the annotations above. Enrichment significance is log-transformed, and we used Benjamini–Hochberg correction to determine the enrichment significance with a cutoff of 0.05.
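As an illustration of the statistics only, the hypergeometric test and Benjamini–Hochberg adjustment can be sketched in plain Python. This is a stand-in under stated assumptions, not the GOstats-based R workflow used in the analysis; the function names and argument order are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, population, annotated, sample):
    """One-sided enrichment p-value P(X >= k).

    population: number of background proteins
    annotated:  background proteins carrying the term
    sample:     number of Hyp proteins tested
    k:          Hyp proteins carrying the term
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, sample)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Terms whose adjusted p-value falls below 0.05 would then be reported as enriched, matching the cutoff stated above.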
Meanwhile, we performed enrichment tests by sample and by stoichiometry quantile, respectively. For sample-specific enrichment tests, proteins with Hyp sites present in different tissues and cell lines are analyzed separately. In the other group, we divide proteins into 5 quantiles according to the average stoichiometry of the corresponding sites across all samples. The stoichiometry ranges for the 5 quantiles are [0%, 5%), [5%, 20%), [20%, 80%), [80%, 95%), and [95%, 100%]. We also perform the log transformation and cluster the samples or quantiles according to the enrichment difference in different terms.
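The quantile assignment can be sketched as follows, using the 4 cutoffs given above. The labels Q1 to Q5 follow the Fig 5 legend; the function name is hypothetical, and treating 100% as part of Q5 is an assumption.

```python
from bisect import bisect_right

# Cutoffs from the text, expressed as fractions (5%, 20%, 80%, 95%).
CUTOFFS = [0.05, 0.20, 0.80, 0.95]
LABELS = ["Q1", "Q2", "Q3", "Q4", "Q5"]


def stoichiometry_quantile(stoichiometry):
    """Map an average site stoichiometry in [0, 1] to its quantile label.

    Intervals are left-closed, so a value equal to a cutoff falls into the
    higher quantile; 1.0 is assigned to Q5 (assumption).
    """
    if not 0.0 <= stoichiometry <= 1.0:
        raise ValueError("stoichiometry must lie between 0 and 1")
    return LABELS[bisect_right(CUTOFFS, stoichiometry)]
```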
We also used Metascape for functional annotations and visualizations.
4.5. Motif enrichment analysis
The protein sequences of the proteins represented in HypDB were downloaded from the UniProt database. In-house Python scripts were written to extract peptides that contained Hyp sites that passed our stringent filtering criteria. These peptides were extended to a length of 27 amino acids and centered on the hydroxylated proline residue. The prealigned peptides were uploaded to the MoMo (version 5.4.1) web application [49]. All protein sequences obtained from the UniProt database were set as the background for the analysis. Within the MoMo web application, the motif-x algorithm was selected. The minimum number of occurrences for a motif was set to 20. The sequence logos were generated by the MoMo web application.
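The window extraction step can be sketched as below. This is not the in-house script; the function name is hypothetical, and padding with "X" when the site lies near a protein terminus is an assumption, since the handling of termini is not described.

```python
def centered_window(sequence, site, width=27, pad="X"):
    """Return a `width`-residue window centered on a 1-based proline site.

    Positions beyond the protein termini are filled with `pad`
    (an assumption made for this sketch).
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the site can be centered")
    if not 1 <= site <= len(sequence):
        raise ValueError("site is outside the sequence")
    if sequence[site - 1] != "P":
        raise ValueError("site does not point at a proline")
    flank = width // 2
    start = site - 1 - flank          # may be negative near the N-terminus
    end = site + flank                # exclusive; may exceed the C-terminus
    left_pad = pad * max(0, -start)
    right_pad = pad * max(0, end - len(sequence))
    core = sequence[max(0, start):min(len(sequence), end)]
    return left_pad + core + right_pad
```

With the default width of 27, the hydroxylated proline always sits at index 13 (0-based) of the returned string, which keeps the peptides prealigned for motif analysis.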
4.6. Secondary structure analysis
The positions of the secondary structures of the proteins represented in HypDB were downloaded from the UniProt database. In-house Python scripts were developed to determine the number of Hyp sites and non-Hyp sites located in secondary structure features for regions of proteins that have a known PDB structure.
4.7. Network connectivity analysis
All proteins with Class I, Class II, UniProt, and literature sites in HypDB were collected and converted into 7,321 ENSP IDs with UniProt. Then, we searched the STRING database for interactions having both nodes in the ENSP list, which gave 16,176 interactions in total. To test the connectivity significance, we randomly picked 7,321 ENSP IDs from UniProt proteins and counted the interactions in the STRING database whose nodes were both included in the randomly selected sample. The pick-and-count process was repeated 10,000 times, and the interaction counts from the random samples were compared with the corresponding number of interactions among the HypDB proteins.
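The pick-and-count procedure can be sketched as a simple permutation test. The function names are hypothetical, the edge list stands in for STRING interactions, and the empirical p-value with a +1 correction is an assumption; the text only states that random counts were compared with the observed count.

```python
import random


def count_internal_edges(nodes, edges):
    """Count interactions whose two endpoints are both in `nodes`."""
    node_set = set(nodes)
    return sum(1 for a, b in edges if a in node_set and b in node_set)


def connectivity_permutation_test(observed_nodes, background_nodes, edges,
                                  n_iter=10_000, seed=0):
    """Compare the observed interaction count with size-matched random draws.

    Returns (observed_count, empirical_p), where empirical_p is the fraction
    of random samples with at least as many internal interactions
    (with a +1 correction, an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed_unique = set(observed_nodes)
    observed = count_internal_edges(observed_unique, edges)
    background = list(background_nodes)
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(background, len(observed_unique))
        if count_internal_edges(sample, edges) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)
```

In the analysis described above, the observed set would hold the 7,321 ENSP IDs and `n_iter` would be 10,000.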
We also constructed a protein–protein interaction network with these Hyp proteins, from which we then selected several highly interconnected subnetworks that carry different biological functions with the help of Cytoscape software and the MCODE module.
4.8. Solvent accessibility analysis
With data from PDBe and UniProt, we matched hydroxyproline proteins with the corresponding PDB IDs and protein structures in PDB files. Then, we used the Biopython module Bio.PDB.DSSP to interpret the PDB files that contain structural information and calculate the solvent accessibility of each proline residue in the protein structure using the Sander and Rost accessible surface area (ASA) values. Then, all accessibilities were divided by the maximum accessibility of proline to obtain a relative accessibility value between 0 and 1.
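The normalization step alone can be sketched as follows. The maximum ASA of 136 Å² for proline is taken here as the Sander and Rost value and should be treated as an assumption of this sketch, as should the clamping of values above the maximum to 1; the function name is hypothetical, and the DSSP call itself is omitted because it requires an external DSSP installation.

```python
# Assumed Sander and Rost maximum accessible surface area for proline (Å^2).
MAX_ASA_PROLINE = 136.0


def relative_accessibility(asa, max_asa=MAX_ASA_PROLINE):
    """Convert an absolute ASA value into relative solvent accessibility.

    Values larger than the reference maximum are clamped to 1.0 so the
    result always lies between 0 and 1 (assumption of this sketch).
    """
    if asa < 0:
        raise ValueError("ASA cannot be negative")
    return min(asa / max_asa, 1.0)
```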
4.9. Protein–protein interface analysis
The interacting domain pairs and instances of domain–domain interactions of 3D protein structures were downloaded from 3DID (https://3did.irbbarcelona.org/index.php). In-house Python scripts were developed to count the number of Hyp sites interacting with another residue and the number of Hyp sites within 3 residues of an interacting residue.
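The two counts can be sketched as below for a single protein. This is not the in-house script; the function name and input layout are hypothetical, and counting directly interacting sites among the "within 3 residues" group is an assumption.

```python
def interface_proximity(hyp_sites, interface_residues, window=3):
    """Classify Hyp sites relative to interface residues of one protein.

    hyp_sites:          1-based positions of Hyp sites
    interface_residues: 1-based positions of residues involved in
                        domain-domain interactions (e.g., parsed from 3DID)
    Returns (direct, nearby): sites that are themselves interface residues,
    and sites within `window` residues of any interface residue
    (the latter includes the direct sites, an assumption of this sketch).
    """
    interface = set(interface_residues)
    direct = [site for site in hyp_sites if site in interface]
    nearby = [
        site for site in hyp_sites
        if any(abs(site - residue) <= window for residue in interface)
    ]
    return direct, nearby
```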
4.10. Evolutionary conservation analysis
Evolutionary conservation analysis of Hyp sites was performed using the EggNOG ortholog database (v5.0) and the EggNOG-mapper online portal [48]. Briefly, first, using EggNOG-mapper, Hyp proteins were mapped to the corresponding ortholog groups. Next, Hyp sites and non-Hyp proline sites on Hyp proteins were aligned to ortholog sequences using the MAFFT algorithm [66]. The number of matches of a Hyp site or non-Hyp proline site to a proline at the same position in ortholog sequences and the total number of sequences in the ortholog group were recorded. Lastly, a hypergeometric test was performed for each Hyp site based on the normalized number of matches to proline residues in ortholog sequences for Hyp sites and non-Hyp sites, as well as the total number of any amino acid residues in ortholog sequences at the same position as the Hyp sites or non-Hyp sites.
4.11. Development of website and MySQL database
The website serves as a front-end interactive interface of the database. It was developed using HTML, CSS, JavaScript, and PHP and runs on a Linux-Apache-MySQL-PHP (LAMP) server architecture. The front-end was designed using the Bootstrap framework. Associated protein data are fetched using APIs from several sources. Protein sequences, identifiers, and descriptions are fetched from entries in the UniProtKB/Swiss-Prot knowledgebase [67], protein secondary structure data are fetched from PDBe [68], and domains are fetched from Pfam [69]. The protein sequences are displayed on the website using neXtProt Sequence Viewer (https://github.com/calipho-sib/sequence-viewer). The spectral graphs on the website are visualized using d3.js (https://d3js.org/). The backend of the website uses PHP to interface with a MySQL database that contains the data as shown in S2A Fig.
4.12. Transfection and immunoprecipitation of HIF1A
Transfection and overexpression of Flag-tagged HIF1A were performed following a procedure previously described [70]. Flag-tagged HIF1A plasmid (Sino Biological) was transfected into 293T cells with polyethylenimine. Cells were treated with 10 μM proteasome inhibitor MG-132 (ApexBio) for 4 hours prior to harvesting. Approximately 24 hours after transfection, cells were washed with cold PBS buffer and lysed in lysis buffer (150 mM NaCl, 50 mM Tris-HCl, 0.5% NP-40, 10% glycerol (pH 7.5), protease inhibitor cocktail (Roche)) on ice for 15 to 20 minutes. Then, the cell lysates were clarified by centrifugation prior to incubation with anti-FLAG M2 affinity gel (Sigma) for 6 hours at 4°C. After incubation, the M2 gel was washed 3 times with wash buffer (cell lysis buffer with 300 mM NaCl) and then eluted with 3× Flag peptide (ApexBio). The eluate was mixed with 4× SDS loading buffer and boiled, then loaded onto a homemade SDS-PAGE gel and stained with Coomassie blue (Thermo Fisher).
4.13. In-gel digestion and LC-MS analysis of HIF1A
A large gel piece covering a wide MW range above 100 kDa was cut out and subjected to reduction/alkylation and in-gel digestion with trypsin (Promega) as previously described [51]. Tryptic peptides were desalted with a homemade C18 StageTip and resuspended in HPLC Buffer A (0.1% formic acid) before being loaded onto a capillary column (75 μm ID and 20 cm in length) packed in-house with Luna C18 resin (5 μm, 100 Å, Phenomenex). The peptides were separated with a linear gradient of 7% to 35% HPLC Buffer B (0.1% formic acid in 90% acetonitrile) at a flow rate of 200 nl/min on a Dionex Ultimate 3000 UPLC and electrosprayed into a high-resolution Orbitrap Lumos mass spectrometer (Thermo Fisher). Peptide precursor ions were acquired in the Orbitrap with a resolution of 120,000 at 200 m/z, and peptides were fragmented with Electron Transfer/Higher-Energy Collision Dissociation (EThcD) with calibrated charge-dependent ETD parameters and ETD Supplemental Activation, and acquired in Top12 data-dependent mode sorted by highest charge state and lowest m/z as priority settings. Raw data were analyzed with MaxQuant software following the same procedure and parameter settings as for the previously published datasets described above.
4.14. Usage of HypDB site
A dedicated website with an integrated MySQL database was established to host the HypDB service. The database schema consists of 4 tables representing redundant Hyp site identifications, nonredundant Hyp site identifications, interaction interface analysis, evolutionary conservation analysis, and solvent accessibility analysis. Each record in the site identification table is assigned a unique HypDB site ID. The website was designed with the Bootstrap framework (v4.1.3) and features several key functions including a Search bar, Protein information page, Site information page, Database summary, Upload/contribute page, and Download/export page.
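The relationship between redundant and nonredundant site identifications can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL so that it runs without a server, and every table and column name (site_nonredundant, site_redundant, hypdb_site_id, source_dataset, and so on) is a hypothetical choice made for illustration, not the published schema shown in S2A Fig. The sketch also assumes that the unique site ID is keyed on protein accession and residue position.

```python
import sqlite3

# Hypothetical two-table layout: one row per unique site, many rows per observation.
SCHEMA = """
CREATE TABLE site_nonredundant (
    hypdb_site_id     INTEGER PRIMARY KEY,
    uniprot_accession TEXT NOT NULL,
    position          INTEGER NOT NULL,
    UNIQUE (uniprot_accession, position)
);
CREATE TABLE site_redundant (
    id                 INTEGER PRIMARY KEY,
    hypdb_site_id      INTEGER NOT NULL REFERENCES site_nonredundant(hypdb_site_id),
    source_dataset     TEXT NOT NULL,
    localization_score REAL
);
"""


def new_database() -> sqlite3.Connection:
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def register_identification(conn, accession, position, dataset, score):
    """Store one identification and return the unique ID of its site."""
    # The UNIQUE constraint makes repeated sites a no-op here.
    conn.execute(
        "INSERT OR IGNORE INTO site_nonredundant (uniprot_accession, position) "
        "VALUES (?, ?)",
        (accession, position),
    )
    site_id = conn.execute(
        "SELECT hypdb_site_id FROM site_nonredundant "
        "WHERE uniprot_accession = ? AND position = ?",
        (accession, position),
    ).fetchone()[0]
    # Every observation is kept in the redundant table.
    conn.execute(
        "INSERT INTO site_redundant (hypdb_site_id, source_dataset, localization_score) "
        "VALUES (?, ?, ?)",
        (site_id, dataset, score),
    )
    return site_id
```

With this layout, repeated observations of the same residue from different datasets share one site ID, while each observation keeps its own row and score.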
The Search bar allows the user to enter a UniProt accession number or gene name of the protein of interest, and the server uses this input to retrieve and display a ranked list of the most similar entries in real time. Clicking on an entry brings the user to the protein information page, where protein identifiers, description, and protein sequence are displayed. All Hyp sites are marked on the protein sequence, and known acetylation and phosphorylation sites from the PhosphoSitePlus database [71] are highlighted in different colors. The list of Hyp sites is further displayed below the sequence in a table that includes site properties such as localization class, localization score, stoichiometry, solvent accessibility, and evolutionary conservation information. The Hyp site table is followed by properties of Hyp proteins including protein–protein interactions, secondary structure, functional domains, and domain–domain interactions. Hyp sites identified with MS/MS evidence in the HypDB have a “Details” button displayed for each site in the site table on the protein information page. Clicking on the Details button brings the user to the peptide information page, where the best identified MS/MS spectrum for the site is displayed with annotations of fragment ions.
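The ranked lookup described for the Search bar can be sketched as follows. This is only an illustration of similarity-ranked matching on accession or gene name: the text does not specify the ranking method used by the server, so the string-similarity measure (difflib.SequenceMatcher), the function names, and the small in-memory index are all assumptions. The accession and gene name pairs are included purely as examples.

```python
from difflib import SequenceMatcher

# Illustrative index of (UniProt accession, gene name) pairs.
ENTRIES = [
    ("Q16665", "HIF1A"),
    ("P52209", "PGD"),
    ("P13674", "P4HA1"),
    ("O15460", "P4HA2"),
]


def similarity(query: str, candidate: str) -> float:
    """Case-insensitive similarity ratio between 0 and 1."""
    return SequenceMatcher(None, query.upper(), candidate.upper()).ratio()


def search(query: str, entries=ENTRIES, limit: int = 3):
    """Return up to `limit` entries, best match first.

    Each entry is scored by its better match against the accession
    or the gene name; ties are broken by accession for stable output.
    """
    scored = []
    for accession, gene in entries:
        score = max(similarity(query, accession), similarity(query, gene))
        scored.append((score, accession, gene))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(acc, gene) for score, acc, gene in scored[:limit] if score > 0]
```

Calling `search("hif1a")` places the HIF1A entry first, and a partial query such as `search("P4HA")` ranks the two P4HA entries ahead of the rest.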
The Contribute/Upload page allows the community to contribute raw MS/MS identifications to the HypDB through an embedded Google Form. Information regarding the raw data type, location, sample type, and database searching parameters, as well as user information, will be entered into the database. Raw data will be downloaded and processed using the same streamlined workflow. The data will go through the classification and site-localization analysis process and be annotated with the bioinformatic workflows described above. The final data will be deposited into the HypDB to share with the research community.
The Export/Download page allows the community to download the entire dataset deposited in the HypDB, including both the redundant and nonredundant modification site tables. In addition, the Export function allows users to select a list of proteins and tissues of interest and to filter sites by localization credit class, MS fragmentation type, and the proteolytic enzyme used in the proteomics analysis. Users can then either export the precursor ion m/z values of the selected Hyp proteins to set up a targeted quantification method for data acquisition, or export the collected spectral libraries of Hyp sites from the selected Hyp proteins to perform database searching with DIA.
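The export filters can be pictured as a set of optional criteria applied to site records, followed by extraction of precursor m/z values for a targeted method. The sketch below is a hedged illustration only: the record fields, the class labels, the function names, and all values (including the placeholder protein IDs and m/z numbers) are hypothetical and do not reflect the actual HypDB export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HypSiteRecord:
    protein: str         # placeholder protein identifier
    position: int        # residue number of the hydroxylated proline
    tissue: str
    loc_class: str       # hypothetical localization class label
    fragmentation: str   # e.g. "HCD" or "EThcD"
    enzyme: str          # e.g. "Trypsin"
    precursor_mz: float  # placeholder value, not a measured m/z


# Made-up records used only to exercise the filter.
EXAMPLE = [
    HypSiteRecord("PROT_A", 564, "kidney", "Class I", "EThcD", "Trypsin", 700.0),
    HypSiteRecord("PROT_A", 402, "kidney", "Class II", "HCD", "Trypsin", 650.0),
    HypSiteRecord("PROT_B", 100, "liver", "Class I", "HCD", "Trypsin", 500.0),
]


def _matches(value, allowed):
    """A criterion left as None places no restriction."""
    return allowed is None or value in allowed


def filter_sites(records, proteins=None, tissues=None, loc_classes=None,
                 fragmentations=None, enzymes=None):
    """Keep records that satisfy every criterion that was supplied."""
    return [
        r for r in records
        if _matches(r.protein, proteins)
        and _matches(r.tissue, tissues)
        and _matches(r.loc_class, loc_classes)
        and _matches(r.fragmentation, fragmentations)
        and _matches(r.enzyme, enzymes)
    ]


def precursor_list(records):
    """Sorted, de-duplicated precursor m/z values for a targeted method."""
    return sorted({r.precursor_mz for r in records})
```

For example, `filter_sites(EXAMPLE, proteins={"PROT_A"}, loc_classes={"Class I"})` keeps a single record, and `precursor_list` applied to the result gives the m/z values to include in an inclusion list.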
4.15. Construction of DDA-based spectral libraries
To construct the study-specific DDA-based spectral libraries for the studies by Kitata and colleagues and by Guo and colleagues, a database search of the DDA data from each study was performed with MaxQuant (version 1.5.3.12). The search parameters were slightly modified from those reported by the authors of each study. The maximum number of missed cleavages was set to 2, and the threshold for identification was set at 1% FDR. Hyp was included as a variable modification in addition to the variable modifications reported by the authors of each study. The spectral data for Hyp sites were compiled into an msp-formatted spectral library.
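As an illustration of what compiling spectra into an msp-formatted library involves, the following sketch writes spectra as plain-text msp-style records: a name line, optional header lines, a peak count, and one tab-separated m/z and intensity pair per line. The exact header fields expected by a given search tool vary, so the field selection here (Name, PrecursorMZ, Comment, Num peaks), the function names, and the example peptide and peak values are assumptions rather than the format used in the study.

```python
def format_msp_entry(peptide, charge, precursor_mz, peaks, comment=""):
    """Format one spectrum as an msp-style text record.

    `peaks` is an iterable of (m/z, intensity) pairs; they are
    written in ascending m/z order.
    """
    peaks = sorted(peaks)
    lines = [
        f"Name: {peptide}/{charge}",
        f"PrecursorMZ: {precursor_mz:.4f}",
    ]
    if comment:
        lines.append(f"Comment: {comment}")
    lines.append(f"Num peaks: {len(peaks)}")
    for mz, intensity in peaks:
        lines.append(f"{mz:.4f}\t{intensity:.1f}")
    return "\n".join(lines) + "\n"


def build_library(entries):
    """Join formatted records, separated by a blank line."""
    return "\n".join(format_msp_entry(**entry) for entry in entries)
```

A library is then built from a list of dictionaries, for instance `build_library([{"peptide": "PEPTIDEK", "charge": 2, "precursor_mz": 464.7, "peaks": [(100.05, 10.0), (200.1, 50.0)]}])`, where the peptide and peaks are placeholders.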
4.16. DIA data analysis
DIA data were analyzed using DIA-NN (v1.8) [72]. The default workflow for analysis using a spectral library was followed (https://github.com/vdemichev/diann). The DIA data from the studies by Kitata and colleagues and by Guo and colleagues were analyzed separately with DIA-NN. The FDR (q-value) for protein groups and Hyp site identification was set at 1.0%. The DIA data from each study were analyzed with spectral libraries from three sources, covering both Hyp and non-Hyp peptide identifications: the HypDB Library, the Study-Specific DDA-based Library, and a Combined Library generated from both the HypDB and the Study-Specific DDA analysis. DIA-NN further applied global q-value filtering and intensity normalization to generate a Hyp site matrix output for Hyp sites that were confidently quantified across all samples. Python scripts developed in-house were used to process the DIA-NN output into a nonredundant Hyp site table. The matrix output from each study with nonredundant Hyp site quantification was used for clustering, annotation enrichment analysis, and visualization using the Perseus software platform [73]. Missing values were imputed using a normal distribution, and the data were hierarchically clustered. The processed site-nonredundant Hyp intensity data from DIA-NN were also analyzed and visualized using R, with missing values imputed using the k-nearest neighbor (KNN) method in the NAguideR tool [74].
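The step of making the quantification output site-nonredundant can be sketched in a few lines. The in-house scripts are not described in detail, so the following is an assumed approach rather than the authors' implementation: rows that map to the same protein and position are merged by keeping the highest intensity per sample, and a second helper keeps only sites quantified in every sample. The row layout, the function names, and the choice of maximum (instead of, for example, a sum) are all illustrative assumptions.

```python
from collections import defaultdict


def collapse_to_sites(rows, samples):
    """Merge rows that report the same site into one row per site.

    `rows` are dicts with 'protein', 'position', and one intensity per
    sample name (None marks a missing value). For each site and sample
    the highest observed intensity is kept (an assumed merging rule).
    """
    best = defaultdict(lambda: {s: None for s in samples})
    for row in rows:
        site = best[(row["protein"], row["position"])]  # creates the site row
        for s in samples:
            value = row.get(s)
            if value is not None and (site[s] is None or value > site[s]):
                site[s] = value
    return dict(best)


def complete_sites(matrix):
    """Keep only sites with a value in every sample."""
    return {
        site: values
        for site, values in matrix.items()
        if all(v is not None for v in values.values())
    }
```

Sites that remain incomplete after merging are the ones that would need imputation, or exclusion, before clustering.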