Title

Parallel Support Vector Machines for Multi-Category Classification of Large Scale Data

Date of Award

2007

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computing

First Advisor

Chaoyang Zhang

Advisor Department

Computing

Abstract

Support Vector Machines (SVMs) are classification algorithms based on statistical learning theory that have received wide attention for classification problems because of their accuracy and generalization properties. Many SVM tools are available and perform well for binary classification, but multi-class classification increases the complexity of the problem and makes the computation more expensive, or even prohibitive, for larger datasets. In this work we propose a method to parallelize the classification of multi-class data, based on the Sequential Minimal Optimization (SMO) SVM algorithm, using parallel computing techniques. This parallel implementation allows us to use high performance computing resources to perform multi-class classification efficiently. The SMO algorithm breaks the classification problem down into the smallest possible Quadratic Programming (QP) problems, avoiding expensive numerical optimization. In this implementation the SMO algorithm is used to build a series of binary classifiers, which are then combined to perform one-versus-one multi-class classification. The implementation was tested on many publicly available datasets. The variable size of the data for each class results in load balancing problems, which are solved by preprocessing the data set and scheduling the subtasks based on their size to improve the throughput. The load balancing issue is also addressed by developing different mapping schemes to distribute the tasks to the parallel nodes. The parallel algorithm minimizes communication between the nodes by reducing the data transferred between the processes. The parallel algorithms developed can therefore be studied on both shared and distributed memory parallel systems. SVMs can be used for classification of gene expression data, which helps to analyze and understand many aspects of gene functions and irregularities. SVMs are also used in face recognition, text categorization, credit card processing, etc.
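
The one-versus-one decomposition and size-based scheduling described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names `one_vs_one_tasks` and `schedule_largest_first` are hypothetical, the cost of a binary subtask is assumed to be the combined number of samples in its two classes, and the greedy largest-first mapping stands in for whichever mapping schemes the author actually developed.

```python
import heapq
from itertools import combinations


def one_vs_one_tasks(class_sizes):
    """Return one (size, (a, b)) subtask per unordered pair of classes.

    Assumption: the cost of training the binary classifier for classes
    a and b is approximated by the number of samples in both classes.
    """
    return [
        (class_sizes[a] + class_sizes[b], (a, b))
        for a, b in combinations(sorted(class_sizes), 2)
    ]


def schedule_largest_first(tasks, n_workers):
    """Greedy mapping: give the largest remaining subtask to the
    currently least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    for size, pair in sorted(tasks, reverse=True):
        load, w = heapq.heappop(heap)
        assignment[w].append(pair)
        heapq.heappush(heap, (load + size, w))
    loads = {w: load for load, w in heap}
    return assignment, loads
```

For example, calling `one_vs_one_tasks({"A": 500, "B": 300, "C": 100, "D": 100})` yields six binary subtasks, and `schedule_largest_first(tasks, 2)` distributes them over two workers so that the total sample counts per worker stay close. Each worker needs only the data for the class pairs assigned to it, which is one way to keep the data transferred between processes small.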