Stagger: an Open-Source Part of Speech Tagger for Swedish

This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert andWeston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpu...

Full description

Saved in:
Bibliographic Details
Main Author: Robert Östling
Format: Article
Language:English
Published: Linköping University Electronic Press 2013-09-01
Series:Northern European Journal of Language Technology
Online Access:https://nejlt.ep.liu.se/article/view/1653
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert andWeston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part of speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a new corpus of Swedish blog posts, investigating its out-of-domain performance.
ISSN:2000-1533