A practical method for searching scholarly papers in the General Index without a high-performance computer
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Code4Lib, 2023-12-01 |
| Series: | Code4Lib Journal |
| Online Access: | https://journal.code4lib.org/articles/17663 |
| Summary: | The General Index (GI) is a free database that offers unprecedented access to keywords and ngrams derived from the full text of over 107 million scholarly articles. Its simplest use is looking up articles that contain a term of interest, but the data set is large enough for text mining and corpus linguistics. Although the GI is positioned as a public utility, it has no user interface; one must download, query, and extract results from raw data tables. Not only is computing skill a barrier to use, but the files are too large for most desktop computers to handle. This article shows a practical way to use the GI for researchers with moderate skills and resources. It walks through building a bibliography of articles and a visualization of the yearly prevalence of a topic in the General Index, using simple R programming commands and a modestly equipped desktop computer (code is available at https://osf.io/s39n7). It briefly discusses what else can be done (and how) with more powerful computational resources. |
| ISSN: | 1940-5758 |
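One general way to query tables too large for desktop memory is to stream them row by row and keep only the matches, which is enough to build a bibliography and a per-year count. The sketch below illustrates that idea in Python; it is not the article's R code (which is at https://osf.io/s39n7). The two-table layout is a hypothetical assumption, not the GI's documented schema: an ngram table whose rows start with a document key and an ngram, and a metadata table whose rows hold a document key, a title, and a year.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def matching_docs(ngram_rows: Iterable[str], term: str) -> Set[str]:
    """Stream a tab-separated ngram table and return the document keys whose
    ngram equals `term` (case-insensitive). Only the matching keys are held
    in memory, never the whole table.

    Hypothetical row layout: dkey <TAB> ngram [<TAB> ...]
    """
    wanted = term.lower()
    hits: Set[str] = set()
    for row in csv.reader(ngram_rows, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2 and row[1].lower() == wanted:
            hits.add(row[0])
    return hits


def scan_metadata(
    doc_rows: Iterable[str], hits: Set[str]
) -> Tuple[List[Tuple[str, str]], Dict[int, int]]:
    """Stream a tab-separated metadata table once, collecting a bibliography
    (title, year) for matching documents and a count of matches per year.

    Hypothetical row layout: dkey <TAB> title <TAB> year
    """
    entries: List[Tuple[str, str]] = []
    counts: Counter = Counter()
    for row in csv.reader(doc_rows, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 3 and row[0] in hits:
            title, year = row[1], row[2]
            entries.append((title, year))
            if year.isdigit():  # skip rows with a missing or malformed year
                counts[int(year)] += 1
    return entries, dict(sorted(counts.items()))


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the (hypothetical) large tables.
    ngrams = io.StringIO("d1\tcorpus linguistics\nd2\ttext mining\nd3\tCorpus Linguistics\n")
    docs = io.StringIO("d1\tPaper A\t2019\nd2\tPaper B\t2020\nd3\tPaper C\t2019\n")
    hits = matching_docs(ngrams, "corpus linguistics")
    entries, per_year = scan_metadata(docs, hits)
    print(entries)   # [('Paper A', '2019'), ('Paper C', '2019')]
    print(per_year)  # {2019: 2}
```

Because both functions accept any iterable of lines, a real compressed file could be passed directly, for example an open handle from `gzip.open(path, "rt", encoding="utf-8")`, so memory use grows with the number of matches rather than the size of the table. The per-year dictionary can then be plotted with any charting tool.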