Date of Award
Fall 12-2007
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computing
Committee Chair
Dr. Chaoyang Zhang
Committee Chair Department
Computing
Committee Member 2
Dr. Adel Ali
Committee Member 2 Department
Computing
Committee Member 3
Dr. Dia Ali
Committee Member 3 Department
Computing
Committee Member 4
Dr. Joseph Kolibal
Committee Member 4 Department
Mathematics
Committee Member 5
Dr. Ray Seyfarth
Committee Member 5 Department
Computing
Committee Member 6
Dr. Youping Deng
Abstract
The Support Vector Machine (SVM) is a classification algorithm based on statistical learning theory that has received wide attention for classification problems because of its accuracy and generalization properties. Many SVM tools are available and perform well for binary classification, but multi-class classification increases the complexity of the problem and makes the computation more expensive, or even prohibitive, for larger datasets. In this work, we propose a method to parallelize the classification of multi-class data based on the Sequential Minimal Optimization (SMO) SVM algorithm using parallel computing techniques. This parallel implementation allows us to use high-performance computing resources to perform multi-class classification efficiently. The SMO algorithm breaks the classification problem down into the smallest possible Quadratic Programming (QP) subproblems, avoiding expensive numerical optimization. In this implementation, the SMO algorithm is used to perform multi-class classification by building a series of binary classifiers and then using them to perform one-versus-one multi-class classification. The implementation was tested on many publicly available datasets. The variable size of the data for each class results in load balancing problems, which are solved by preprocessing the dataset and scheduling the subtasks based on their size to improve throughput. The load balancing issue is further addressed by developing different mapping schemes to distribute the tasks to the parallel nodes. The parallel algorithm developed minimizes communication between the nodes by reducing the data transferred between the processes. Therefore, the parallel algorithms developed can be studied on both shared and distributed memory parallel systems. SVMs can be used for classification of gene expression data, which helps to analyze and understand many aspects of gene functions and irregularities.
SVMs are also used in face recognition, text categorization, credit card processing, etc.
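The one-versus-one decomposition and size-based scheduling described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the mapping schemes, so everything here is an assumption made for illustration: the function names `pairwise_tasks` and `schedule_largest_first` are hypothetical, the cost of a binary subtask is approximated by the combined sample count of its two classes, and the scheduler is a generic largest-first greedy heuristic rather than the dissertation's own scheme.

```python
from itertools import combinations
import heapq


def pairwise_tasks(class_sizes):
    """Build one binary subtask per unordered pair of classes (one-versus-one).

    The cost estimate (sum of the two class sizes) is an assumption;
    real SMO training time grows faster than linearly with sample count.
    """
    return [((a, b), class_sizes[a] + class_sizes[b])
            for a, b in combinations(sorted(class_sizes), 2)]


def schedule_largest_first(tasks, n_workers):
    """Greedy heuristic: give the largest remaining task to the least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    for pair, cost in sorted(tasks, key=lambda t: -t[1]):
        load, w = heapq.heappop(heap)
        assignment[w].append(pair)
        heapq.heappush(heap, (load + cost, w))
    loads = {w: load for load, w in heap}
    return assignment, loads


# Hypothetical class sizes for a 4-class dataset: 4 * 3 / 2 = 6 binary subtasks.
sizes = {"a": 100, "b": 50, "c": 10, "d": 5}
tasks = pairwise_tasks(sizes)
assignment, loads = schedule_largest_first(tasks, n_workers=2)
```

With k classes the decomposition yields k(k-1)/2 independent binary classifiers, so each worker needs only the samples of the class pairs assigned to it, which is consistent with the abstract's goal of limiting data transferred between processes.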
Copyright
2007, Arun Kumar Rajendran
Recommended Citation
Rajendran, Arun Kumar, "PARALLEL SUPPORT VECTOR MACHINES FOR MULTI-CATEGORY CLASSIFICATION OF LARGE SCALE DATA" (2007). Dissertations. 1334.
http://aquila.usm.edu/dissertations/1334