ProtRank -- Go beyond protein value imputation!
How we deal with "missing values" may always be controversial, and I'm going to assume that no amount of improvement in mass spectrometry engineering is going to fix this. Sure, we can get better coverage, but sometimes that peptide just isn't going to be there -- maybe because it's got a single amino acid variant (SAAV), or maybe because it's got a post-translational modification in patient or condition A that is not present at all in B.
At some level, though, we've got a tough decision to make. Do you reeeeeeaaaallllly want to divide by zero? Or do you want to throw out that whole peptide measurement in your downstream analysis pipeline? It often makes sense to impute a value for that peptide or molecule that you can't see in your extracted chromatogram.
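To make that decision concrete, here is a minimal Python sketch of the two usual options: drop the measurement, or impute something for the missing value. The intensities are made up, and the `log2_fold_change` and `impute_min` helpers are hypothetical names for illustration (half of the lowest observed value is just one common-ish choice). None of this is ProtRank's code.

```python
import math

def log2_fold_change(a, b):
    """Return log2(b / a), or None when either intensity is missing (None or 0)."""
    if not a or not b:
        return None  # refuse to divide by zero; the measurement is dropped
    return math.log2(b / a)

def impute_min(values, fraction=0.5):
    """Replace missing entries (None or 0) with a fraction of the lowest observed value."""
    observed = [v for v in values if v]
    floor = min(observed) * fraction
    return [v if v else floor for v in values]

# Made-up peptide intensities; the third peptide was not seen in condition A.
cond_a = [1.0e6, 4.0e5, None]
cond_b = [2.0e6, 4.0e5, 8.0e6]

# Option 1: throw out any peptide with a missing value.
dropped = [log2_fold_change(a, b) for a, b in zip(cond_a, cond_b)]

# Option 2: impute first, then compute fold changes for everything.
imputed = [log2_fold_change(a, b) for a, b in zip(impute_min(cond_a), cond_b)]

print(dropped)
print(imputed)
```

With the first option the third peptide simply vanishes from the results. With the second it gets a fold change, but that number is set entirely by the value we chose to fill in.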
ProtRank may not be the ultimate solution (...cause...realistically there may not be one universal solution...), but it's a different take on this old problem. You can read about it in this new open article.
ProtRank is written in Python and is available on GitHub here.
This study is interesting in its examination of some extreme dataset models and the biases that typical imputation methods cause in them. One place that is really scary to impute is phosphoproteomics. A lot of phosphorylation sites change to such an extent that they exceed the linear dynamic range of the instruments (I don't fall into the school of thought that there are truly 100% on/off switches, I think it's different bi-stability cliffs -- I almost threw in some references here, but I really should go to work). Do you impute here?
Want to talk about a nightmare dataset? They look at phosphoproteomic shifts in IRRADIATED CELLS. DNA damage repair functions through phosphorylating everything it can to stop processes that make the radiation damage worse. The increases in phosphorylation are probably as big as you can get. Imputing some values shifts the data to the point that you lose a lot of the known phosphorylation changes. Whoops.
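Here is a toy Python sketch of how that can happen. Everything in it is an assumption for illustration: the intensities are invented, the two fill strategies (half the lowest observed value vs. the median of the observed values) are just simple stand-ins for real imputation methods, and the 2-fold cutoff is arbitrary. It is not the analysis from the paper.

```python
import math
from statistics import median

def impute(values, fill):
    """Replace missing (None) intensities with a single fill value."""
    return [fill if v is None else v for v in values]

def significant_sites(untreated, irradiated, min_log2_fc=1.0):
    """Indices of sites whose log2(irradiated / untreated) meets the cutoff."""
    return [
        i
        for i, (u, t) in enumerate(zip(untreated, irradiated))
        if math.log2(t / u) >= min_log2_fc
    ]

# Invented phosphosite intensities. Sites 2 and 3 are only seen after irradiation.
untreated = [5.0e5, 6.0e5, None, None, 4.0e5]
irradiated = [5.5e5, 6.0e5, 9.0e5, 7.0e5, 4.2e5]

observed = [v for v in untreated if v is not None]
low_fill = min(observed) / 2    # assume "missing" means "below detection"
median_fill = median(observed)  # assume "missing" means "a typical value"

hits_low = significant_sites(impute(untreated, low_fill), irradiated)
hits_median = significant_sites(impute(untreated, median_fill), irradiated)

print(hits_low)     # the two induced sites pass the cutoff
print(hits_median)  # nothing passes the cutoff
```

Same data, same cutoff, and the two sites that switched on after irradiation either show up or disappear depending on nothing but the fill value.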
How much better does ProtRank do? In part, we have to wait and see: it is being applied in a big biological study that is in preparation. This paper is the introduction and logic behind the code, and a nice way to say "download me!" So...