Documenting Geographically and Contextually Diverse Language Data Sources
Contemporary large-scale data collection efforts have prioritized the amount of data collected to improve large language models (LLMs). This quantitative approach has resulted in concerns about the rights of data subjects represented in data collections. This concern is exacerbated by a lack of docume...
Main Authors: | Angelina McMillan-Major, Francesco De Toni, Zaid Alyafeai, Stella Biderman, Kimbo Chen, Gérard Dupont, Hady Elsahar, Chris Emezue, Alham Fikri Aji, Suzana Ilić, Nurulaqilla Khamis, Colin Leong, Maraim Masoud, Aitor Soroa, Pedro Ortiz Suarez, Daniel van Strien, Zeerak Talat, Yacine Jernite |
---|---|
Format: | Article |
Language: | English |
Published: | Linköping University Electronic Press, 2025-01-01 |
Series: | Northern European Journal of Language Technology |
Online Access: | https://nejlt.ep.liu.se/article/view/5217 |
Similar Items
-
Corps et vision du monde chez les Berbères de Kabylie
by: Tassadit Yacine
Published: (2019-07-01) -
Aranzadi (Pamplona) (13 de diciembre de 2019)
by: Aitor Gallardón
Published: (2020-07-01) -
Financing specifications production of small and medium enterprises
by: Ilić Đurđijana, et al.
Published: (2018-01-01) -
A hypothesis of 'How Advertising Works' and implications for marketers and advertisers
by: Brian McMillan
Published: (2022-11-01)