A practical method for searching scholarly papers in the General Index without a high-performance computer
The General Index is a free database that offers unprecedented access to keywords and ngrams derived from the full text of over 107 million scholarly articles. Its simplest use is looking up articles that contain a term of interest, but the data set is large enough for text mining and corpus linguis...
| Main Author: | Emily Cukier |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Code4Lib, 2023-12-01 |
| Series: | Code4Lib Journal |
| Online Access: | https://journal.code4lib.org/articles/17663 |
| _version_ | 1849738196265467904 |
|---|---|
| author | Emily Cukier |
| collection | DOAJ |
| description | The General Index is a free database that offers unprecedented access to keywords and ngrams derived from the full text of over 107 million scholarly articles. Its simplest use is looking up articles that contain a term of interest, but the data set is large enough for text mining and corpus linguistics. Despite being positioned as a public utility, there is no user interface; one must download, query, and extract results from raw data tables. Not only is computing skill a barrier to use, but the file sizes are too large for most desktop computers to handle. This article will show a practical way to use the GI for researchers with moderate skills and resources. It will walk through building a bibliography of articles and visualizing the yearly prevalence of a topic in the General Index, using simple R programming commands and a modestly equipped desktop computer (code is available at https://osf.io/s39n7). It will briefly discuss what else can be done (and how) with more powerful computational resources. |
| format | Article |
| id | doaj-art-ccd438f1a1df4c16bab1a3ad4db4e2aa |
| institution | DOAJ |
| issn | 1940-5758 |
| language | English |
| publishDate | 2023-12-01 |
| publisher | Code4Lib |
| record_format | Article |
| series | Code4Lib Journal |
| title | A practical method for searching scholarly papers in the General Index without a high-performance computer |
| url | https://journal.code4lib.org/articles/17663 |