Stagger: an Open-Source Part of Speech Tagger for Swedish
This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpu...
Main Author: | Robert Östling |
---|---|
Format: | Article |
Language: | English |
Published: | Linköping University Electronic Press, 2013-09-01 |
Series: | Northern European Journal of Language Technology |
Online Access: | https://nejlt.ep.liu.se/article/view/1653 |
_version_ | 1832590628495556608 |
---|---|
author | Robert Östling |
collection | DOAJ |
description | This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part of speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a new corpus of Swedish blog posts, investigating its out-of-domain performance. |
format | Article |
id | doaj-art-b240a604d3174e308acecf0c8bdf7327 |
institution | Kabale University |
issn | 2000-1533 |
language | English |
publishDate | 2013-09-01 |
publisher | Linköping University Electronic Press |
record_format | Article |
series | Northern European Journal of Language Technology |
spelling | doaj-art-b240a604d3174e308acecf0c8bdf7327; 2025-01-23T10:36:34Z; eng; Linköping University Electronic Press; Northern European Journal of Language Technology; 2000-1533; 2013-09-01; 3; 10.3384/nejlt.2000-1533.1331; Stagger: an Open-Source Part of Speech Tagger for Swedish; Robert Östling (Stockholm University, Department of Linguistics); This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part of speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a new corpus of Swedish blog posts, investigating its out-of-domain performance.; https://nejlt.ep.liu.se/article/view/1653 |
title | Stagger: an Open-Source Part of Speech Tagger for Swedish |
url | https://nejlt.ep.liu.se/article/view/1653 |