On Using Self-Report Studies to Analyze Language Models

Bibliographic Details
Main Author: Matúš Pikuliak
Format: Article
Language: English
Published: Linköping University Electronic Press 2024-09-01
Series: Northern European Journal of Language Technology
Online Access: https://nejlt.ep.liu.se/article/view/5000
Collection: DOAJ
Description: We are at a curious point in time where our ability to build language models (LMs) has outpaced our ability to analyze them. We do not really know how to reliably determine their capabilities, biases, dangers, knowledge, and so on. The benchmarks we have are often overly specific, do not generalize well, and are susceptible to data leakage. Recently, I have noticed a trend of using self-report studies, such as various polls and questionnaires originally designed for humans, to analyze the properties of LMs. I think that this approach can easily lead to false results, which can be quite dangerous considering the current discussions on AI safety, governance, and regulation. To illustrate my point, I will delve deeper into several papers that employ self-report methodologies and I will try to highlight some of their weaknesses.
Institution: Kabale University
ISSN: 2000-1533
Author Affiliation: Kempelen Institute of Intelligent Technologies
Volume: 10
Issue: 1
DOI: 10.3384/nejlt.2000-1533.2024.5000