Processing morphological variants in searches of Latin text

Bibliographic Details
Main Authors: Mark Greengrass, Alexander M. Robertson, Robyn Schinke, Peter Willett
Format: Article
Language: English
Published: University of Borås, 1996-01-01
Series: Information Research: An International Electronic Journal
Online Access: http://informationr.net/ir/2-1/paper10.html
Description
Summary: A characteristic of natural-language text databases is that a user must be able to specify all of the variant forms of each query word if high recall is to be achieved. The most common word variants are those arising from morphology, and thus most retrieval systems provide facilities for user-controlled right-hand (and occasionally left-hand) truncation to allow the retrieval of all words with the same root. A stemming algorithm, or stemmer, is a computational procedure that reduces all words with the same root to a single form by stripping the root of its derivational and inflectional affixes. In most cases, only suffixes are stripped, so that a stemmer provides an automatic equivalent of manual, right-hand truncation. Thus far, most work on stemmers has focused on present-day languages, but the increasing use of computers in the humanities has resulted in a need for comparable tools to facilitate searching in historical text databases. This paper summarises some of the initial results of a project here in Sheffield to develop such tools for databases of Latin text.
ISSN: 1368-1613
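To make the idea of suffix stripping as "automatic right-hand truncation" concrete, here is a minimal longest-match suffix stripper in Python. It is a sketch under stated assumptions, not the stemmer developed in the paper: the suffix list SUFFIXES, the minimum stem length MIN_STEM, and the helper names stem and truncation_match are all hypothetical choices made for illustration.

```python
# Illustrative only: a toy longest-match suffix stripper for Latin.
# SUFFIXES and MIN_STEM are hypothetical; this is NOT the Sheffield stemmer.
SUFFIXES = sorted(
    ["ibus", "orum", "arum", "ium",
     "ae", "am", "as", "is", "os", "um", "us",
     "a", "e", "i", "o"],
    key=len,
    reverse=True,  # try longer suffixes first (longest match)
)
MIN_STEM = 2  # assumed minimum number of characters left after stripping


def stem(word: str) -> str:
    """Strip the longest listed suffix, keeping at least MIN_STEM characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w  # no suffix removed


def truncation_match(root: str, word: str) -> bool:
    """Manual right-hand truncation: does the word begin with the given root?"""
    return word.lower().startswith(root.lower())


if __name__ == "__main__":
    forms = ["dominus", "domini", "domino", "dominum", "dominorum", "dominis"]
    print({f: stem(f) for f in forms})
    print(all(truncation_match("domin", f) for f in forms))
```

With this toy suffix list, the listed case forms of dominus all reduce to the same string "domin", which is the same set a searcher would retrieve by truncating the query manually at "domin". A realistic Latin stemmer would need a far more careful suffix inventory and rules for cases that a plain suffix list handles badly.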