ProNA2020 predicts protein-DNA, protein-RNA, and protein-protein binding proteins and residues from sequence

dc.catalogador: aba
dc.contributor.author: Qiu, J.
dc.contributor.author: Bernhofer, M.
dc.contributor.author: Heinzinger, M.
dc.contributor.author: Kemper, S.
dc.contributor.author: Norambuena, T.
dc.contributor.author: Melo Ledermann, Francisco Javier
dc.contributor.author: Rost, B.
dc.date.accessioned: 2025-02-05T20:35:18Z
dc.date.available: 2025-02-05T20:35:18Z
dc.date.issued: 2020
dc.description.abstract: The intricate details of how proteins bind to proteins, DNA, and RNA are crucial for understanding almost all biological processes. Disease-causing sequence variants often affect binding residues. Here, we describe a new, comprehensive system of in silico methods that take only protein sequence as input to predict binding of proteins to DNA, RNA, and other proteins. First, we developed several new methods to predict whether or not proteins bind (per-protein prediction). Second, we developed independent methods that predict which residues bind (per-residue prediction). Without requiring three-dimensional information, the system can predict the actual binding residues. For per-protein predictions, the system combined homology-based inference with machine learning, and motif-based profile-kernel approaches with word-based (ProtVec) solutions. This achieved an overall non-exclusive three-state accuracy of 77% ± 1% (± one standard error), corresponding to a 1.8-fold improvement over random (best classification for protein–protein binding, with F1 = 91 ± 0.8%). Standard neural networks for per-residue binding predictions performed best for DNA-binding (Q2 = 81 ± 0.9%), followed by RNA-binding (Q2 = 80 ± 1%), and worst for protein–protein binding (Q2 = 69 ± 0.8%). The new method, dubbed ProNA2020, is available as code through GitHub (https://github.com/Rostlab/ProNA2020.git) and through PredictProtein (www.predictprotein.org).
dc.format.extent: 16 pages
dc.fuente.origen: SIPA
dc.identifier.doi: 10.1016/j.jmb.2020.02.026
dc.identifier.eissn: 1089-8638
dc.identifier.issn: 0022-2836
dc.identifier.scopusid: 2-s2.0-85082481064
dc.identifier.uri: https://doi.org/10.1016/j.jmb.2020.02.026
dc.identifier.uri: https://repositorio.uc.cl/handle/11534/102163
dc.identifier.wosid: WOS:000532698400033
dc.information.autoruc: Facultad de Ciencias Biológicas; Melo Ledermann, Francisco Javier; 0000-0002-0424-5991; 82342
dc.issue.numero: 7
dc.language.iso: en
dc.nota.acceso: partial content
dc.pagina.final: 2443
dc.pagina.inicio: 2428
dc.revista: JOURNAL OF MOLECULAR BIOLOGY
dc.rights: restricted access
dc.subject: Binding protein prediction
dc.subject: Binding residue prediction
dc.subject: Profile kernel SVM
dc.subject: ProtVec
dc.subject: Machine learning
dc.subject.ddc: 570
dc.subject.dewey: Biología (es_ES)
dc.subject.ods: 03 Good health and well-being
dc.subject.odspa: 03 Salud y bienestar
dc.title: ProNA2020 predicts protein-DNA, protein-RNA, and protein-protein binding proteins and residues from sequence
dc.type: article
dc.volumen: 432
sipa.codpersvinculados: 82342