Academic Torrents

Country of origin: United States
Owner: Institute for Reproducible Research
Founder(s): Joseph Paul Cohen, Henry Z Lo
Industry: Non-profit
URL: academictorrents.com
Launched: 2013
Current status: Active

Academic Torrents [1] [2] [3] [4] [5] [6] is a platform that enables the sharing of research data using the BitTorrent protocol. Launched in November 2013, it is operated by the Institute for Reproducible Research, a U.S.-based 501(c)(3) non-profit organization. [7] [8] Similar to LOCKSS, Academic Torrents focuses on providing open access to research materials and supporting reproducibility in scientific studies. It does so by "offering researchers the opportunity to distribute the hosting of their papers and datasets to authors and readers, providing easy access to scholarly works and simultaneously backing them up on computers around the world." [9] [10]

Mission and Purpose

Academic Torrents aims to enhance the accessibility and preservation of research data by leveraging BitTorrent’s decentralized file-sharing technology. The platform supports researchers by reducing hosting costs, improving download speeds, and ensuring data redundancy across global networks. Its mission aligns with promoting open science and reproducible research, allowing academics to share datasets, papers, and other scholarly resources freely. [11]

Notable datasets

Reddit Comments and Submissions Dataset

Academic Torrents hosts a large collection of Reddit comment and submission datasets spanning June 2005 to June 2025, compiled through the Pushshift project and totaling over 3.4 TB. [12] [13] The dataset comprises 476 Zstandard-compressed NDJSON files, including monthly submission archives such as RS_2025-06.zst (18.68 GB) and earlier files like RS_2021-06.zst (9.46 GB). [13] It supports research in social media analysis, natural language processing, and computational social science by offering a historical archive of Reddit activity. [12] Python scripts for parsing the data are available on GitHub, facilitating programmatic access. [14] Distribution via BitTorrent is intended to give researchers studying online communities efficient access to the dataset and to support its long-term preservation. [13]
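As a rough illustration of what programmatic access to Zstandard-compressed NDJSON can look like, the following is a minimal Python sketch. It is not taken from the GitHub scripts cited above. The helper names `iter_ndjson` and `open_zst_text` are hypothetical, the `subreddit` field is assumed to be present in each record, and reading real `.zst` files assumes the third-party `zstandard` package is installed and that a large decompression window is needed for these archives.

```python
import io
import json
from collections import Counter


def iter_ndjson(lines):
    """Yield one dict per NDJSON line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a corrupt line instead of aborting
        if isinstance(record, dict):
            yield record


def open_zst_text(path):
    """Open a .zst file as a text stream (hypothetical helper).

    Assumption: the third-party `zstandard` package is installed. The large
    max_window_size is an assumed setting for archives compressed with a
    long window.
    """
    import zstandard  # third-party, imported lazily

    raw = open(path, "rb")
    reader = zstandard.ZstdDecompressor(max_window_size=2**31).stream_reader(raw)
    return io.TextIOWrapper(reader, encoding="utf-8", errors="replace")


# In-memory sample standing in for a decompressed monthly file.
sample = [
    '{"subreddit": "AskDocs", "title": "example one"}',
    "",
    "not json",
    '{"subreddit": "AskDocs", "title": "example two"}',
    '{"subreddit": "wallstreetbets", "title": "example three"}',
]

# Count records per subreddit (assumed field name).
counts = Counter(rec.get("subreddit") for rec in iter_ndjson(sample))
print(counts.most_common())
```

With a downloaded archive, the same loop could be run as `iter_ndjson(open_zst_text("RS_2021-06.zst"))`, streaming records one at a time so the multi-gigabyte file never has to fit in memory.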

Several studies have specifically cited the dataset's availability through Academic Torrents for accessing Reddit data. For instance, Andrei (2025) analyzed hate speech trends on Reddit during Donald Trump's 2024–2025 presidential campaign, using approximately 500 GB of data from the platform to apply NLP techniques like BERT for classification and HDBSCAN for clustering targets. [15] Goyal et al. (2025) employed comments from the r/wallstreetbets subreddit, sourced via Academic Torrents, to develop sentiment-based predictive trading strategies, achieving higher returns than buy-and-hold approaches through metrics like Sentiment Volume Change. [16] Baumgartner et al. (2023) utilized the dataset to compare health-related vocabulary usage between laypeople and medical professionals on the r/AskDocs subreddit, reproducing their corpus from the bulk data available on Academic Torrents. [17] Boraske and Burns (2025) drew from the AITA subreddit data hosted on Academic Torrents to align large language models with human moral judgments, using nearly 50,000 submissions and comments to improve LLM accuracy in ethical evaluations. [18] Popoola et al. (2024) explored over 143,000 Reddit posts on computing internships, using topic modeling and sentiment analysis to identify prevalent themes like academics and career, sourced from Academic Torrents. [19]

Developing Human Connectome Project

The Developing Human Connectome Project, which is related to the Human Connectome Project, uses the platform to distribute its data. "Researchers from three leading British institutions are using BitTorrent to share over 150 GB of unique high-resolution brain scans of unborn babies with colleagues worldwide... The researchers opted to go for the Academic Torrents tracker, which specializes in sharing research data." [20]

Crossref metadata

The site hosts public metadata releases from Crossref, which contain more than 120 million metadata records for scholarly works, each with a DOI. The releases allow the community to work with the entire database programmatically instead of retrieving it through the Crossref API. As Crossref explained: "The sheer number of records means that, though anyone can use these records anytime, downloading them all via our APIs can be quite time-consuming. We hope this saves the research community valuable time during this crisis." [21] [22]

References

  1. Miccoli, Fräntz (2014). "Academic Torrents: Bringing P2P Technology to the Academic World". MyScienceWork. Archived from the original on 26 July 2020. Retrieved 6 May 2020.
  2. Ernesto (31 Jan 2014). "Academics Launch Torrent Site to Share Papers and Datasets". TorrentFreak. Archived from the original on 28 May 2020. Retrieved 6 May 2020.
  3. Cohen, Joseph Paul (Oct 2016). "What is Academic Torrents and Where is Data Sharing Going?". KDnuggets. Archived from the original on 8 June 2020. Retrieved 6 May 2020.
  4. Bakshi, Kirti (18 Aug 2018). "Academic Torrents: A Distributed System For Sharing Enormous Datasets". TechLeer. Archived from the original on 22 September 2020. Retrieved 6 May 2020.
  5. Cohen, Joseph (July 2014). "Academic Torrents: A Community-Maintained Distributed Repository". Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment. pp. 1–2. doi:10.1145/2616498.2616528. ISBN 9781450328937. S2CID 5813384.
  6. Lo, Henry (14 Mar 2016). "Academic Torrents: Scalable Data Distribution". Neural Information Processing Systems Challenges in Machine Learning (CiML) Workshop. arXiv:1603.04395.
  7. "Institute for Reproducible Research Webpage". Archived from the original on 17 January 2023. Retrieved 4 May 2020.
  8. "Tax Exempt Organization Search". United States IRS. Archived from the original on 31 May 2020. Retrieved 4 May 2020.
  9. Chant, Ian (13 Feb 2014). "Academic Torrents Offers New Means of Storing, Distributing Scholarly Content". Library Journal. Archived from the original on 7 May 2021. Retrieved 4 May 2020.
  10. Turk, Victoria (3 Feb 2014). "A Torrent Site Wants to Be the New Academic Library". Vice News. Archived from the original on 25 September 2020. Retrieved 4 May 2020.
  11. "About - Academic Torrents". Retrieved 2025-08-10.
  12. Baumgartner, Jason; Zannettou, Savvas; Keegan, Brian; Squire, Megan; Blackburn, Jeremy (2020). "The Pushshift Reddit Dataset". Proceedings of the International AAAI Conference on Web and Social Media. 14 (1): 830–839. doi:10.1609/icwsm.v14i1.7347. Retrieved 2025-08-10.
  13. "Reddit comments/submissions 2005-06 to 2025-06". Academic Torrents. Retrieved 2025-08-10.
  14. Watchful1. "PushshiftDumps: Scripts for parsing Reddit data dumps". GitHub. Retrieved 2025-08-10.
  15. Andrei, Rebecca (2025). Election-Driven Trends in Reddit Hate Speech: A Clustering Approach to Target Analysis Before and During Trump Campaign Periods in USA (PDF) (Bachelor's thesis). University of Twente. Retrieved 2025-08-10.
  16. Goyal, Gatik; Phadke, Sharvil; Sharma, Arnav; Qin, Huifang (2025). "Leveraging Social Media Sentiment for Predictive Algorithmic Trading Strategies". arXiv:2508.02089 [cs.SI].
  17. Baumgartner, Lucien; Reuter, Kevin; Varga, Somogy (2023). "A Common Language? Analyzing the Use of Health-Related Vocabulary Between Laypeople and Medical Professionals". Proceedings of the Annual Meeting of the Cognitive Science Society. 45. Retrieved 2025-08-10.
  18. Boraske, Matthew; Burns, Richard (2025). Context is Key: Aligning Large Language Models with Human Moral Judgments through Retrieval-Augmented Generation. Proceedings of the Thirty-Eighth International Florida Artificial Intelligence Research Society Conference. Florida Artificial Intelligence Research Society. Retrieved 2025-08-10.
  19. Popoola, Saheed; Vollem, Ashwitha; Nti, Kofi (2024). "What do Computing Interns Discuss Online? An Empirical Analysis of Reddit Posts". arXiv:2412.13296 [cs.SE].
  20. Ernesto (3 June 2017). "Torrents Help Researchers Worldwide to Study Babies' Brains". TorrentFreak. Archived from the original on 5 January 2018. Retrieved 6 May 2020.
  21. jkcrossref. "Free public data file of 112+ million Crossref records". Crossref. doi:10.64000/wsnyw-yap64. Archived from the original on 2021-05-22. Retrieved 2021-05-22.
  22. jkcrossref. "New public data file: 120+ million metadata records". Crossref. doi:10.64000/96h9h-b8437. Archived from the original on 2021-05-22. Retrieved 2021-05-22.