Generating Indices With Lexical Association Methods: Term Uniqueness
A software system has been developed that ranks citations retrieved from an online database by relevancy. The system grew out of an effort by NASA's Technology Utilization Program to create advanced software tools that largely automate the process of judging the relevancy of database citations retrieved in support of large technology transfer studies. The ranking is based on three elements: an enriched vocabulary generated with lexical association methods, a user assessment of that vocabulary, and a combination of the user assessment with the lexical metric. The enriched vocabulary is one of the key elements in relevancy ranking, and its terms must be both unique and descriptive. This paper examines term uniqueness. Six lexical association methods were employed to generate characteristic word indices. Limited subsets of the terms (the highest 20, 40, 60 and 7.5% of the unique words) were compared, and uniqueness factors were developed. Computational times were also measured. The methods based on occurrence and on signal produced virtually the same terms. The limited subsets produced by the exact and the centroid discrimination values were also nearly identical. Unique term sets were produced by the occurrence, variance and discrimination value (centroid) methods. An end-user evaluation showed that the generated terms were largely distinct and that their word precision was consistent with the search precision.
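The abstract compares the exact and centroid discrimination values without defining them. The sketch below assumes the textbook term discrimination model: the "density" of a document space is its average pairwise cosine similarity (exact) or the average similarity of each document to the centroid (centroid approximation), and a term's discrimination value is the density with the term removed minus the density with it present, so positive values mark good discriminators. All function names and the toy corpus are hypothetical, and the Jaccard overlap of top-k term sets is only a stand-in for the paper's uniqueness factors, whose definition is not given here.

```python
import math
from typing import Callable, Dict, List, Set

# A document is a sparse term vector: term -> weight (hypothetical raw counts).
Doc = Dict[str, float]


def cosine(a: Doc, b: Doc) -> float:
    """Cosine similarity between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def drop_term(docs: List[Doc], term: str) -> List[Doc]:
    """Copies of the documents with one term removed from every vector."""
    return [{t: w for t, w in d.items() if t != term} for d in docs]


def density_exact(docs: List[Doc]) -> float:
    """Average similarity over all document pairs: O(N^2) comparisons."""
    n = len(docs)
    if n < 2:
        raise ValueError("need at least two documents")
    total = sum(cosine(docs[i], docs[j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def centroid(docs: List[Doc]) -> Doc:
    """Mean term vector of the collection."""
    c: Doc = {}
    for d in docs:
        for t, w in d.items():
            c[t] = c.get(t, 0.0) + w / len(docs)
    return c


def density_centroid(docs: List[Doc]) -> float:
    """Average similarity of each document to the centroid: O(N) comparisons."""
    c = centroid(docs)
    return sum(cosine(d, c) for d in docs) / len(docs)


def discrimination_values(docs: List[Doc],
                          density: Callable[[List[Doc]], float]) -> Dict[str, float]:
    """DV(term) = density without the term minus density with it."""
    base = density(docs)
    vocab = sorted({t for d in docs for t in d})
    return {t: density(drop_term(docs, t)) - base for t in vocab}


def top_terms(scores: Dict[str, float], k: int) -> Set[str]:
    """The k highest-scoring terms (ties broken alphabetically)."""
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


def overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap: 1.0 means identical term sets, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy corpus for illustration only.
docs: List[Doc] = [
    {"laser": 3.0, "optics": 2.0, "sensor": 1.0},
    {"laser": 1.0, "coating": 4.0, "thermal": 2.0},
    {"sensor": 2.0, "thermal": 3.0, "software": 1.0},
    {"software": 4.0, "database": 3.0, "sensor": 1.0},
]
exact = discrimination_values(docs, density_exact)
approx = discrimination_values(docs, density_centroid)
k = 3
print(sorted(top_terms(exact, k)), sorted(top_terms(approx, k)))
print(overlap(top_terms(exact, k), top_terms(approx, k)))
```

The centroid variant replaces the N(N-1)/2 pairwise comparisons with N comparisons against a single mean vector, which is why it is the cheaper approximation; a high overlap between the two top-k sets is the kind of near-identity the abstract reports.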
Information Processing and Management
Huffman, G. D.,
Vital, D. A.,
Bivins, R. G.
(1990). Generating Indices With Lexical Association Methods: Term Uniqueness. Information Processing and Management, 26(4), 549-558.
Available at: http://aquila.usm.edu/fac_pubs/7439