A practical method for searching scholarly papers in the General Index without a high-performance computer

Bibliographic Details
Main Author: Emily Cukier
Format: Article
Language: English
Published: Code4Lib 2023-12-01
Series: Code4Lib Journal
Online Access: https://journal.code4lib.org/articles/17663
Description
Summary: The General Index (GI) is a free database that offers unprecedented access to keywords and ngrams derived from the full text of over 107 million scholarly articles. Its simplest use is looking up articles that contain a term of interest, but the data set is large enough for text mining and corpus linguistics. Although the GI is positioned as a public utility, it has no user interface; one must download, query, and extract results from raw data tables. Not only is computing skill a barrier to use, but the files are too large for most desktop computers to handle. This article shows a practical way for researchers with moderate skills and resources to use the GI. It walks through building a bibliography of articles and visualizing the yearly prevalence of a topic in the General Index, using simple R programming commands and a modestly equipped desktop computer (code is available at https://osf.io/s39n7). It also briefly discusses what else can be done, and how, with more powerful computational resources.
ISSN: 1940-5758