Introduction

This project was done as part of my Big Data Technology class.

Breast cancer is the most commonly diagnosed cancer among women worldwide, accounting for a quarter of all cancer cases, with over 2.1 million people affected in 2015 alone. It begins when cells in the breast grow uncontrollably, typically forming tumors that can be seen on X-ray images (mammograms) or felt as lumps in the breast.

The main challenge in detecting breast cancer is classifying a tumor as either malignant (cancerous) or benign (non-cancerous).

Purpose

This program:

  • Trains a Decision Tree Classifier using the training set.
  • Trains three decision trees with maximum depths of 3, 5 and 7 using the multiprocessing module, makes predictions on the testing set with each trained tree, and combines the predictions by majority voting (ensemble).
  • Trains a Support Vector Machine (SVM) classifier with an RBF kernel on the training set.
  • Trains three SVM classifiers (linear, RBF and polynomial kernels), makes predictions on the testing set with each trained classifier, and combines the predictions by majority voting (ensemble).
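
The majority-voting step used by both ensembles can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name majority_vote, the "M"/"B" labels and the sample predictions are all hypothetical, and the sketch assumes each model has already produced one label per test sample.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by majority vote.

    `predictions_per_model` is a list with one entry per model; each entry is
    the list of labels that model predicted for the test samples, in order.
    """
    if not predictions_per_model:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*...) groups the votes cast by every model for the same sample.
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest vote count.
        # With three voters and two classes, a tie cannot occur.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three models, e.g. trees with
# max depth 3, 5 and 7 (M = malignant, B = benign).
depth3 = ["M", "B", "B", "M"]
depth5 = ["M", "M", "B", "B"]
depth7 = ["B", "M", "B", "M"]

ensemble = majority_vote([depth3, depth5, depth7])
print(ensemble)  # ['M', 'M', 'B', 'M']
```

Using an odd number of models (three here) on a two-class problem guarantees that every sample has a clear majority, which is presumably why both ensembles use three classifiers.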