One Platform, Four Languages: Comparing English, Spanish, Hindi, and Russian YouTube
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SAGE Publishing, 2025-08-01 |
| Series: | Social Media + Society |
| Online Access: | https://doi.org/10.1177/20563051251363216 |
| Summary: | This study presents a comparative analysis of language-specific random samples of YouTube videos, focusing on English, Spanish, Hindi, and Russian. We produce a large random sample, retrieve metadata, calibrate and deploy language-detection software, and extract four high-confidence language samples. Through an analysis of upload dates, popularity, duration, and category metadata, we highlight patterns and anomalies among our samples. For example, English YouTube has the smallest proportion of videos categorized as “News & Politics,” and Spanish videos have a longer median duration. The most salient contrast, however, is between Hindi YouTube and the other three languages. Hindi videos are much shorter and much newer, with sharp growth since 2020 and more than half of the sample uploaded in 2023 alone. The Hindi sample also exhibits a different pattern of liking, with the lowest percentage of videos with just zero or one like even while it has the highest percentage of videos with just zero or one view. These findings may help to quantify the migration of India’s short-form video culture, based around TikTok, to YouTube when TikTok was banned in the country in 2020. This study underscores the necessity of multilingual and culturally specific approaches to platform research by drawing attention to the heterogeneity of YouTube. We propose this method as a starting point to understand linguistic communities on YouTube, surfacing trends and exceptions while providing cues for more content-focused study. |
| ISSN: | 2056-3051 |