Entropy and type-token ratio in gigaword corpora
There are different ways of measuring diversity in complex systems. In language, lexical diversity is characterized in terms of the type-token ratio and the word entropy. Here we investigate both diversity metrics in six massive linguistic data sets in English, Spanish, and Turkish, c...
| Main Authors: | Pablo Rosillo-Rodes, Maxi San Miguel, David Sánchez |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | American Physical Society, 2025-07-01 |
| Series: | Physical Review Research |
| Online Access: | http://doi.org/10.1103/rxxz-lk3n |
Similar Items
- Corpora for computational linguistics
  by: Constantin Orasan, et al.
  Published: (2007-10-01)
- SPOKEN CORPORA: RATIONALE AND APPLICATION
  by: John Newman
  Published: (2008-12-01)
- Corpora in language teaching and learning
  by: Kate Beeching
  Published: (2014-01-01)
- Terminology in the age of multilingual corpora
  by: Alan K. Melby
  Published: (2012-07-01)
- Corpora and Students' Autonomy in Scientific and Technical Translation training
  by: Clara Inés López-Rodríguez, et al.
  Published: (2008-01-01)