Stagger: an Open-Source Part of Speech Tagger for Swedish

Bibliographic Details
Main Author: Robert Östling
Format: Article
Language: English
Published: Linköping University Electronic Press, 2013-09-01
Series: Northern European Journal of Language Technology
Online Access: https://nejlt.ep.liu.se/article/view/1653
collection DOAJ
description This work presents Stagger, a new open-source part-of-speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part-of-speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, whose annotation has been revised for consistency. Stagger is also evaluated on a new corpus of Swedish blog posts to investigate its out-of-domain performance.
id doaj-art-b240a604d3174e308acecf0c8bdf7327
institution Kabale University
issn 2000-1533
doi 10.3384/nejlt.2000-1533.1331
volume 3
affiliation Stockholm University, Department of Linguistics
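The description names the Averaged Perceptron as Stagger's learning algorithm. To illustrate that technique only, the following is a minimal sketch in Python of a greedy, left-to-right averaged perceptron tagger; it is not Stagger's implementation. Everything in it is an assumption for illustration: the hypothetical names `AveragedPerceptron`, `token_features`, `train` and `tag`, the small feature set, greedy decoding, and the toy sentences with SUC-style tags. The SALDO lexicon and Collobert and Weston embedding features that the description credits for the reported accuracy are not modeled.

```python
# Hypothetical sketch of a generic averaged perceptron tagger; not Stagger's code.
import random
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass averaged perceptron with lazily updated running totals."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        self.weights = defaultdict(float)  # (feature, class) -> current weight
        self._totals = defaultdict(float)  # weight value summed over past examples
        self._stamps = defaultdict(int)    # example index at which a weight last changed
        self.step = 0                      # number of training examples seen

    def predict(self, features, weights=None):
        w = self.weights if weights is None else weights
        scores = {c: sum(w.get((f, c), 0.0) for f in features) for c in self.classes}
        # max() keeps the first maximum, so ties go to the alphabetically first class.
        return max(self.classes, key=lambda c: scores[c])

    def _bump(self, key, delta):
        # Credit the old value for every example it was in force, then change it.
        self._totals[key] += (self.step - self._stamps[key]) * self.weights[key]
        self._stamps[key] = self.step
        self.weights[key] += delta

    def update(self, features, gold, guess):
        self.step += 1
        if gold != guess:
            for f in features:
                self._bump((f, gold), 1.0)
                self._bump((f, guess), -1.0)

    def average(self):
        """Return the mean of the weight vectors held after each example."""
        if self.step == 0:
            return dict(self.weights)
        averaged = {}
        for key, w in self.weights.items():
            total = self._totals[key] + (self.step - self._stamps[key] + 1) * w
            averaged[key] = total / self.step
        return averaged


def token_features(words, i, prev_tag):
    """Hypothetical feature set: bias, word form, 3-letter suffix, previous tag."""
    word = words[i].lower()
    return ["bias", "w=" + word, "suf3=" + word[-3:], "prev=" + prev_tag]


def train(sentences, epochs=5, seed=0):
    tags = {t for _, sent_tags in sentences for t in sent_tags}
    model = AveragedPerceptron(tags)
    data = list(sentences)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for words, gold_tags in data:
            prev = "<s>"
            for i, gold in enumerate(gold_tags):
                feats = token_features(words, i, prev)
                guess = model.predict(feats)
                model.update(feats, gold, guess)
                prev = guess  # condition on predictions, as at tagging time
    return model, model.average()


def tag(model, weights, words):
    prev, out = "<s>", []
    for i in range(len(words)):
        prev = model.predict(token_features(words, i, prev), weights)
        out.append(prev)
    return out


if __name__ == "__main__":
    # Toy training data with SUC-style tags (illustrative only).
    toy = [
        (["jag", "ser", "en", "katt"], ["PN", "VB", "DT", "NN"]),
        (["du", "ser", "en", "hund"], ["PN", "VB", "DT", "NN"]),
    ]
    model, averaged = train(toy, epochs=10)
    print(tag(model, averaged, ["du", "ser", "en", "katt"]))
```

The averaged weights are the mean of the weight vectors held after each training example, which usually generalizes better than the final weights alone. Keeping a running total and a timestamp per weight means averaging only touches weights that actually changed, instead of summing the whole weight vector after every example.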