Date of Award

Fall 12-2007

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computing

Committee Chair

Dr. Chaoyang Zhang

Committee Chair Department

Computing

Committee Member 2

Dr. Adel Ali

Committee Member 2 Department

Computing

Committee Member 3

Dr. Dia Ali

Committee Member 3 Department

Computing

Committee Member 4

Dr. Joseph Kolibal

Committee Member 4 Department

Mathematics

Committee Member 5

Dr. Ray Seyfarth

Committee Member 5 Department

Computing

Committee Member 6

Dr. Youping Deng

Abstract

The Support Vector Machine (SVM) is a classification algorithm based on statistical learning theory that has received wide attention for classification problems because of its accuracy and generalization properties. Many SVM tools are available and perform well for binary classification, but multi-class classification increases the complexity of the problem and makes the computation more expensive, or even prohibitive, for larger datasets. In this work, we propose a method to parallelize the classification of multi-class data, based on the Sequential Minimal Optimization (SMO) SVM algorithm. This parallel implementation allows high-performance computing resources to be used to perform multi-class classification efficiently. The SMO algorithm breaks the classification problem down into the smallest possible Quadratic Programming (QP) subproblems, avoiding expensive numerical optimization. In this implementation, the SMO algorithm is used to build a series of binary classifiers, which are then combined to perform one-versus-one multi-class classification. The implementation was tested on many publicly available datasets. Because the amount of data varies from class to class, the binary subtasks differ in size, which causes load balancing problems. These are addressed by preprocessing the dataset, scheduling the subtasks according to their size to improve throughput, and developing different mapping schemes to distribute the tasks to the parallel nodes. The parallel algorithm minimizes communication between nodes by reducing the data transferred between processes. The parallel algorithms developed can therefore be studied on both shared and distributed memory parallel systems. SVMs can be used to classify gene expression data, which helps to analyze and understand many aspects of gene function and irregularities. SVMs are also used in face recognition, text categorization, credit card processing, and other applications.
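
The one-versus-one decomposition and size-based scheduling described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the example class sizes, and the cost estimate (the number of samples in the two classes of a pair) are assumptions made for illustration, and the greedy largest-first heuristic stands in for the mapping schemes, which the abstract does not specify.

```python
import heapq
from itertools import combinations


def one_vs_one_tasks(class_sizes):
    """Return one (class_a, class_b, cost) task per unordered pair of classes.

    Assumption: the cost of training a binary classifier is approximated by
    the number of samples in the two classes involved.
    """
    return [(a, b, class_sizes[a] + class_sizes[b])
            for a, b in combinations(sorted(class_sizes), 2)]


def schedule_largest_first(tasks, n_workers):
    """Assign each task, largest first, to the currently least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # entries are (load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    for a, b, cost in sorted(tasks, key=lambda t: t[2], reverse=True):
        load, w = heapq.heappop(heap)          # least-loaded worker so far
        assignment[w].append((a, b))
        heapq.heappush(heap, (load + cost, w))
    loads = {w: load for load, w in heap}
    return assignment, loads


# Hypothetical class sizes, chosen only to show the imbalance between subtasks.
class_sizes = {"A": 500, "B": 300, "C": 120, "D": 80}
tasks = one_vs_one_tasks(class_sizes)
assignment, loads = schedule_largest_first(tasks, n_workers=2)
print(assignment)
print(loads)
```

With k classes, the decomposition yields k(k-1)/2 independent binary training tasks, so each worker needs only the data for the class pairs assigned to it, which is consistent with the abstract's goal of limiting the data transferred between processes.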
