Matthys Potgieter has developed MetaNovo, a pipeline for processing mass-spectrometry-based proteomics data that is configured to run on UCT's high-performance cluster (hex). MetaNovo is a probabilistic database optimization pipeline that uses MS/MS de novo sequence tags to estimate representative protein sequence sets from very large search spaces prior to FDR-controlled target-decoy database search. It is typically used to create targeted databases from UniProt for metaproteome datasets, but could also be used for proteogenomic database optimization.
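
To make the idea of tag-based database reduction concrete, the following is a minimal sketch and not MetaNovo's actual algorithm. It assumes a simple rule: keep any protein that contains at least a given number of distinct de novo sequence tags as exact substrings. MetaNovo itself uses a probabilistic ranking that this toy example does not reproduce. The function name `filter_database`, the `min_tags` parameter, and the protein sequences and tags are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def filter_database(proteins, tags, min_tags=1):
    """Return IDs of proteins matched by at least `min_tags` distinct tags.

    proteins: dict mapping protein ID -> amino acid sequence
    tags:     iterable of short de novo sequence tags
    Matching here is exact substring matching (an illustrative assumption).
    """
    counts = Counter()
    for tag in set(tags):  # count each distinct tag once per protein
        for pid, seq in proteins.items():
            if tag in seq:
                counts[pid] += 1
    # Rank by number of matched tags (descending), then by ID for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pid for pid, n in ranked if n >= min_tags]


# Hypothetical toy database and tags.
proteins = {
    "P1": "MKTAYIAKQRQISFVKSHFSRQ",
    "P2": "MSDNELLKKAVEGG",
    "P3": "MAAAYIAKQRPPPSHFSR",
}
tags = ["YIAKQR", "SHFSR", "ELLKK"]

selected = filter_database(proteins, tags, min_tags=2)
print(selected)
```

In this sketch the reduced set would then serve as the search space for a conventional target-decoy database search. Raising `min_tags` trades sensitivity for a smaller database.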