Date of Award


Degree Type

Honors College Thesis

Academic Program

Mathematics BS



First Advisor

Jacob Chapman, Ph.D.

Second Advisor

Bernd Schroeder, Ph.D.

Third Advisor

Sabine Heinhorst, Ph.D.

Advisor Department



Insurance companies lose billions of dollars to fraud. These losses force insurers to raise premiums and/or restrict policies, which penalizes a company’s loyal customers. Although the problem is prevalent, companies are not urgently working to improve their machine learning algorithms. Underskilled workers paired with inefficient computer algorithms make it difficult to detect fraud accurately and reliably.

The goal of this study is to understand the idea of k-Nearest Neighbors (k-NN) and to use this classification technique to accurately detect fraudulent auto insurance claims. Using k-NN requires choosing a value of k and a distance metric. The best choice of k and distance metric is unique to each dataset. This study aims to break down the processes involved in determining an accurate k value and distance metric for a sample auto insurance claims dataset. Odd k values from 1 through 19 and the Euclidean, Manhattan, Chebyshev, and Hassanat metrics are analyzed using Excel and R.
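As an illustration of the procedure described above, the following is a minimal Python sketch of k-NN classification with the four distance metrics and a search over odd k values from 1 through 19. It is not the study's Excel or R workflow. The function names, the toy data, and the use of leave-one-out accuracy as the selection criterion are all assumptions made for this example, and the Hassanat distance is written in its commonly cited per-dimension form, which may differ in detail from the form used in the study.

```python
from collections import Counter
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def hassanat(a, b):
    # Commonly cited form (an assumption here): each dimension contributes
    # a value in [0, 1), shifted by |min| when the smaller value is negative.
    total = 0.0
    for x, y in zip(a, b):
        lo, hi = min(x, y), max(x, y)
        shift = abs(lo) if lo < 0 else 0.0
        total += 1 - (1 + lo + shift) / (1 + hi + shift)
    return total


METRICS = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "chebyshev": chebyshev,
    "hassanat": hassanat,
}


def knn_predict(train_X, train_y, query, k, metric):
    # Rank training points by distance to the query, then take a majority
    # vote among the k nearest. Odd k avoids ties with two classes.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: metric(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k, metric):
    # Leave-one-out accuracy: classify each point using all other points.
    # (Assumed evaluation scheme; the source does not state the one it used.)
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k, metric) == y[i]
    return hits / len(X)


def grid_search(X, y, ks, metrics):
    # Score every (metric, k) pair; if k exceeds the number of available
    # neighbors, the vote simply uses all of them.
    return {(name, k): loo_accuracy(X, y, k, fn)
            for name, fn in metrics.items() for k in ks}


# Hypothetical toy data: two numeric features per claim, made up for this sketch.
TOY_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
TOY_Y = ["legit", "legit", "legit", "fraud", "fraud", "fraud"]

scores = grid_search(TOY_X, TOY_Y, range(1, 20, 2), METRICS)
best_metric, best_k = max(scores, key=scores.get)
```

On a real claims dataset, features would normally be rescaled before computing distances, since Euclidean, Manhattan, and Chebyshev distances are all sensitive to the units of each feature.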

Results support the idea that the best k value and distance metric depend on the dataset being analyzed.

Keywords: machine learning, insurance, fraud, detection, k-NN, distance