Scholar User Guide
See also: About Scholarly Search
This service provides full-text search over research publications archived in the Internet Archive's various collections. It includes content from the natural sciences, humanities, biomedicine, art, history, industrial research, government reports, and more.
Reader access to the content is provided when possible. Sometimes this access is to a "pre-print" or other version of the work, and this is indicated in the search results. In other cases, depending on search filters, results are included for which there is only a bibliographic catalog entry. It may still be possible to obtain access through a public library or from the publisher directly.
In addition to the basic filtering and sorting options, this search
interface also allows the use of Lucene query syntax in the search box. You can
restrict term queries on multiple metadata fields using colon statements like
journal:Science, set filters like
apply range queries like
While this syntax allows for relatively complex and powerful queries, at some point advanced users may run into limits on the size or complexity of queries. For the time being we recommend systems like lens.org for a more powerful interface.
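As a rough illustration of the colon and range syntax described above, the following Python sketch assembles a query string that could be pasted into the search box. The helper names (`field_term`, `field_range`, `build_query`) are hypothetical, and the `year` field name is an assumption made for the example; check the search document schema for the fields that actually exist.

```python
def field_term(field: str, value: str) -> str:
    """Return a term restricted to one field, quoting values that contain spaces."""
    if any(ch.isspace() for ch in value):
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"


def field_range(field: str, low: int, high: int) -> str:
    """Return an inclusive Lucene range clause, e.g. year:[1990 TO 1999]."""
    return f"{field}:[{low} TO {high}]"


def build_query(*clauses: str) -> str:
    """Join non-empty clauses with single spaces."""
    return " ".join(c for c in clauses if c)


query = build_query(
    "coral bleaching",                # free-text terms
    field_term("journal", "Science"),  # field-restricted term
    field_range("year", 1990, 1999),   # "year" is an assumed field name
)
print(query)  # coral bleaching journal:Science year:[1990 TO 1999]
```

Building the string from small helpers keeps quoting in one place, which matters once field values contain spaces or quotation marks.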
Search for digitized pages about a topic from specific years:
Search for papers in Chinese matching a term:
Conference papers with an author name query:
As an experimental feature, if the search query "looks like" a formal citation, as found in the bibliography of a research paper, the service will attempt to parse the citation and do a match against our catalog of known works. When this happens, any filters are ignored.
You can restrict to records where the field exists with an asterisk like
doi:*, and negate any term like
In-depth documentation of the query syntax is available from the Elasticsearch project.
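The field-existence and negation forms can be sketched the same way. This is a minimal Python example, not part of the service: the helper names `field_exists` and `negate` are hypothetical, and the leading minus sign is the standard Lucene prohibit operator (the `NOT` keyword is an equivalent spelling).

```python
def field_exists(field: str) -> str:
    """Match only records where the field has some value, e.g. doi:*"""
    return f"{field}:*"


def negate(clause: str) -> str:
    """Exclude records matching the clause, using the Lucene '-' prefix."""
    return f"-{clause}"


# Works with a DOI that are not labeled as books.
query = " ".join(["graph neural networks", field_exists("doi"), negate("type:book")])
print(query)  # graph neural networks doi:* -type:book
```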
The complete current search document schema is available (as JSON) in the source code.
|type:||eg, "article-journal", "dataset", "book"|
|stage:||eg, "published", "submitted", "accepted", "draft"|
|lang:||value is a 2-character lower-case ISO language code|
|country:||value is a 2-character lower-case ISO country code|
|access_type:||"wayback", "ia_file", "ia_sim"|
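The filter fields listed above can be checked before they are placed in a query. The Python sketch below is only an illustration: `filter_clause` is a hypothetical helper, and treating the three listed `access_type` values as the complete set is an assumption, since the schema may allow others.

```python
import re

# Two lower-case letters, as described for the lang and country fields.
TWO_LETTER_CODE = re.compile(r"[a-z]{2}")

# Assumption: the values shown in the table are the only valid ones.
ACCESS_TYPES = {"wayback", "ia_file", "ia_sim"}


def filter_clause(field: str, value: str) -> str:
    """Return 'field:value' after basic validation of known filter fields."""
    if field in ("lang", "country") and not TWO_LETTER_CODE.fullmatch(value):
        raise ValueError(f"{field} expects a 2-character lower-case code, got {value!r}")
    if field == "access_type" and value not in ACCESS_TYPES:
        raise ValueError(f"unknown access_type {value!r}")
    return f"{field}:{value}"


print(filter_clause("lang", "zh"))              # lang:zh
print(filter_clause("access_type", "wayback"))  # access_type:wayback
```

Validation of this kind only catches malformed values such as `lang:ZH` or `lang:zho`; it does not confirm that a code is actually assigned in the ISO lists.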
Search results may have tag labels which provide additional context about the work. For example, a tag may indicate an index that the journal is included in, or the open platform technology used to publish the work.
|Multiple Versions||There are multiple released "versions" or "editions" of this work, and bibliographic metadata for the "primary" is being shown. Click the title to see other versions|
|lang:en||The primary language of this work is different from the search interface language. The ISO two-letter language code is indicated|
|DOAJ||Published in a Directory of Open Access Journals publication, which implies that this is an Open Access work|
|Szczepanski||Publication indexed in Szczepanski's List of Open Access Journals, which implies that this is an Open Access work|
|Open Access||The work is believed to be "Open Access" for any other reason|
|SciELO||Published on a SciELO national platform|
|OJS||Published using Open Journal Systems software|
|Wordpress||Published using WordPress software|
|JSTOR||Preserved and/or hosted on the JSTOR digital preservation platform|
Underneath search results, and alternate version listings, are any known "persistent identifiers" that uniquely identify the specific version of the work. These are usually hyperlinks.
|doi:||Digital Object Identifier (DOI), provides a redirect to the publisher's landing page|
|arxiv:||arXiv pre-print service|
|dblp:||dblp computer science bibliography (Digital Bibliography & Library Project)|
|doaj:||Article-level identifier for works in DOAJ, particularly those with no DOI|
|fatcat:||fatcat.wiki "release" identifier. Scholar is built on top of the fatcat catalog|
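For readers who copy identifiers out of search results, the following Python sketch shows one way a prefixed identifier could be turned into a link. It is not how the service itself builds its hyperlinks: `identifier_url` and `RESOLVERS` are hypothetical names, the `https://fatcat.wiki/release/` pattern is an assumption, and `dblp:` and `doaj:` are deliberately left unmapped.

```python
from typing import Optional

# URL templates keyed by identifier prefix. The fatcat pattern is an assumption.
RESOLVERS = {
    "doi": "https://doi.org/{}",
    "arxiv": "https://arxiv.org/abs/{}",
    "fatcat": "https://fatcat.wiki/release/{}",
}


def identifier_url(identifier: str) -> Optional[str]:
    """Map 'prefix:value' to a URL, or return None if the prefix is not handled."""
    prefix, sep, value = identifier.partition(":")
    template = RESOLVERS.get(prefix.lower())
    if not sep or not value or template is None:
        return None
    return template.format(value)


print(identifier_url("doi:10.1000/182"))   # https://doi.org/10.1000/182
print(identifier_url("arxiv:1706.03762"))  # https://arxiv.org/abs/1706.03762
print(identifier_url("dblp:example"))      # None
```

The value is inserted without URL-encoding, so DOIs containing characters such as `#` or `<` would need to be percent-encoded first.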
Work In Progress
Some known bugs and issues:
- Poor metadata quality for conference proceedings. Many are labeled "unpublished" and are not associated with the conference.
- Duplicate versions of the same work, e.g., different versions of the same paper or dataset. We are working on basic entity deduplication in the fatcat catalog.
- Mismatching of file content or version with work metadata. For example, sometimes pre-prints or author manuscripts are incorrectly associated with version-of-record metadata, or vice versa.