Performance analysis of machine learning algorithms for automated diagnosis using a dataflow-based approach on the grid


Machine learning and imaging analytics are major algorithmic components of the software used by medical practitioners in the diagnosis and treatment of diseases. Whether these components are employed in computer-aided diagnosis (CADx) or content-based image retrieval (CBIR) tools, the accuracy and relevance of their results to the practitioner are paramount to the success of any such application. To improve on existing results, researchers often need to explore various approaches and methodologies, frequently using very large datasets and multiple sources of information. Each of these trials can, by itself, be very time-consuming. One well-established strategy for speeding them up is to use a distributed computing platform, which spreads the computational load across a number of machines. This raises a set of problems that are often orthogonal to a researcher’s interests, such as which algorithmic implementations scale or how to distribute data and tasks on the grid. In this article, we present a framework that enables researchers to quickly design sets of tests, schedule their execution, and have them automatically allocated to a grid environment. We describe the design and implementation of the solution and present, as an example, an experiment concerning the classification of mammography segmentations.


  • Frederico Valiente
  • Augusto Silva
  • Carlos Costa
  • José Miguel Franco Valiente
  • César Suárez Ortega

International Journal of Image Mining 10/2015; 1(2/3):261 - 278. DOI: 10.1504/IJIM.2015.073027
