Dissertations

Knowledge-Based Analysis of Genomic Expression Data by Using Different Machine Learning Algorithms for the Purpose of Diagnostic, Prognostic or Therapeutic Application

Venkata Jagan Mohan Thodima, University of Southern Mississippi

Date of Award

Summer 8-2008

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Biological Sciences

Committee Chair

Mohammed Elasri

Committee Chair Department

Biological Sciences

Committee Member 2

Mac Alford

Committee Member 2 Department

Biological Sciences

Committee Member 3

Joe Zhang

Committee Member 3 Department

Biological Sciences

Committee Member 4

Jonathan Sun

Abstract

With more and more biological information being generated, the most pressing task in bioinformatics has become the analysis and interpretation of diverse data types, including nucleotide and amino acid sequences, protein structures, and gene expression profiles. In this dissertation, we apply the data mining techniques of feature generation, feature selection, and feature integration, combined with learning algorithms, to the problems of disease phenotype classification, clinical outcome prediction, and patient survival prediction from gene expression profiles.

We analyzed the effect of batch noise in microarray data on classification performance. Batchmatch, a batch-adjustment algorithm based on a double-scaling method, proved advantageous over ComBat, a batch-correction algorithm based on an empirical Bayes framework. To identify genes associated with disease phenotype or patient survival from gene expression data, we compared the performance of five feature selection algorithms. Our observations from these studies indicated that the Gainratio algorithm performed better and more consistently than the other algorithms studied.
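
Gain ratio, as commonly defined, divides a feature's information gain with respect to the class labels by the feature's own split information (its entropy), which penalizes features with many distinct values. The sketch below is a minimal illustration, not the dissertation's implementation. It assumes expression values have already been discretized into categorical levels (e.g., "hi"/"lo"), and the names `entropy`, `gain_ratio`, and `rank_genes` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, normalized by split information.

    Assumes `feature` holds discretized (categorical) expression levels.
    """
    n = len(labels)
    # Group the class labels by the feature value they co-occur with.
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Conditional entropy H(labels | feature), weighted by group size.
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - cond
    split_info = entropy(feature)
    # A constant feature has zero split information and carries no information.
    return info_gain / split_info if split_info > 0 else 0.0


def rank_genes(genes, labels):
    """Return gene names sorted by descending gain ratio (hypothetical helper)."""
    return sorted(genes, key=lambda g: gain_ratio(genes[g], labels), reverse=True)


if __name__ == "__main__":
    labels = ["tumor", "tumor", "normal", "normal"]
    genes = {"G1": ["a", "b", "a", "b"], "G2": ["hi", "hi", "lo", "lo"]}
    print(rank_genes(genes, labels))
```

Here G2 separates the two classes perfectly (gain ratio 1.0), while G1 is independent of the labels (gain ratio 0.0), so G2 is ranked first; a top-k gene list would simply take the first k names.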

As a performance metric for choosing the best classifiers, the Matthews correlation coefficient (MCC) gave less biased results than accuracy for some endpoints, particularly those with greater class imbalance. Among classification algorithms, no single algorithm was superior to all others, though SVM achieved fairly good results on most endpoints and the Naive Bayes algorithm also performed well on some. Overall, of the 65 models we reported (the top 5 models for each of 13 endpoints), SVM and SMO (a variant of SVM) dominated, and the linear kernel outperformed the RBF kernel in our binary classifications.
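
Why MCC can be preferable to accuracy under class imbalance can be seen from confusion-matrix counts alone, using the standard definition MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below uses made-up counts for a hypothetical endpoint with 10 positive and 90 negative samples; the counts are illustrative only and are not results from this dissertation. Returning 0 when the denominator is zero is a common convention, assumed here.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): MCC is 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical classifier A: always predicts the majority (negative) class.
    a = dict(tp=0, tn=90, fp=0, fn=10)
    # Hypothetical classifier B: finds most positives at the cost of some false alarms.
    b = dict(tp=8, tn=80, fp=10, fn=2)
    for name, cm in (("A", a), ("B", b)):
        print(name, round(accuracy(**cm), 3), round(mcc(**cm), 3))
```

Classifier A reaches 0.90 accuracy while learning nothing (MCC 0.0), whereas classifier B has slightly lower accuracy (0.88) but a clearly positive MCC of about 0.54, so ranking by MCC favors the model that actually detects the minority class.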

Copyright

2008, Venkata Jagan Mohan Thodima

Recommended Citation

Thodima, Venkata Jagan Mohan, "Knowledge-Based Analysis of Genomic Expression Data by Using Different Machine Learning Algorithms for the Purpose of Diagnostic, Prognostic or Therapeutic Application" (2008). Dissertations. 1164.
https://aquila.usm.edu/dissertations/1164


Included in

Bioinformatics Commons, Biology Commons
