About Scholarly Search
See also: the User Guide, which lists some bugs and known issues.
How It Works
Content in this search index comes from preservation copies at the Internet Archive in one of three forms:
- public web content in the Wayback Machine web archives (web.archive.org), either identified from historic collecting, crawled specifically to ensure long-term access to scholarly materials, or crawled at the direction of our Archive-It partners
- digitized print material from paper and microform collections purchased and scanned by Internet Archive or our partners
- general materials in the archive.org collections, including content from partner organizations, uploads from the general public, and mirrors of other projects
A 2019 FORCE11 conference presentation gives an overview of the project's technical infrastructure and overall goals.
Metadata comes from fatcat.wiki, an open, user-editable catalog of scholarly work. It should be possible to track and attribute the provenance of content and metadata in all cases; please contact us if you have questions or concerns.
Text and Data Mining
We intend to provide researcher access to the full corpus for text and data mining purposes. Derived datasets may also be posted publicly for analysis, for example a citation graph or N-gram frequencies by year. If you are interested or would like to see specific datasets made available, please contact us.
Currently, snapshots of the full fatcat metadata corpus and upstream metadata sources are uploaded periodically to the Bulk Bibliographic Metadata collection on archive.org. Read more in the Fatcat Guide.
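Snapshot files of this kind are large, so streaming them record by record is usually preferable to loading them whole. The sketch below is illustrative only and assumes a snapshot of release entities stored as (optionally gzipped) JSON lines, one record per line, with a `release_year` field as in the fatcat release schema; check the layout and field names of the specific item you download. It tallies records per year, the simplest kind of "by year" derived dataset. The function names and the example file name are hypothetical.

```python
import gzip
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one JSON object per non-empty line of a JSON-lines file.

    Files ending in ".gz" are decompressed on the fly; others are read as plain text.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def count_by_year(records: Iterable[dict], field: str = "release_year") -> Counter:
    """Count records per publication year.

    Assumes the year lives in `field` (default "release_year", an assumption
    based on the fatcat release schema). Records without it are counted under None.
    """
    counts: Counter = Counter()
    for rec in records:
        counts[rec.get(field)] += 1
    return counts


# Example (the file name is hypothetical):
# counts = count_by_year(iter_records("release_export_expanded.json.gz"))
# for year, n in sorted(counts.items(), key=lambda kv: (kv[0] is None, kv[0] or 0)):
#     print(year, n)
```

Because `iter_records` is a generator, memory use stays roughly constant regardless of file size; the same loop can be extended to collect other per-year statistics.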
The organizational contact information for the Internet Archive is listed at https://archive.org/about/contact.php. Queries about this search service and the fatcat catalog can be directed to email@example.com. There is a public chat channel at gitter.im/internetarchive/fatcat.