# Academic Papers

The following papers are useful for understanding IPFS, from the IPFS specification itself to background on the decentralized web, protocols, hashing, and related topics.

# IPFS - Content Addressed, Versioned, P2P File System

Original IPFS white paper

Benet, Juan: The InterPlanetary File System (IPFS) is a peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files. In some ways, IPFS is similar to the Web, but IPFS could be seen as a single BitTorrent swarm, exchanging objects within one Git repository. In other words, IPFS provides a high throughput content-addressed block storage model, with content-addressed hyperlinks. This forms a generalized Merkle DAG, a data structure upon which one can build versioned file systems, blockchains, and even a Permanent Web. IPFS combines a distributed hashtable, an incentivized block exchange, and a self-certifying namespace. IPFS has no single point of failure, and nodes do not need to trust each other.
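The content-addressed block model and Merkle DAG described in this abstract can be illustrated with a small sketch. This is not the IPFS implementation: real IPFS identifies blocks with CIDs and its own encodings, whereas the sketch below assumes plain SHA-256 hex digests, JSON serialisation, and an in-memory dictionary as the block store. The names `put_block` and `get_block` are hypothetical.

```python
import hashlib
import json


def put_block(store, data, links=()):
    """Store a node and return its address (hex SHA-256 of its serialised bytes)."""
    # Serialise deterministically so identical content always gets the same address.
    node = json.dumps({"data": data, "links": list(links)}, sort_keys=True).encode()
    address = hashlib.sha256(node).hexdigest()
    store[address] = node
    return address


def get_block(store, address):
    """Fetch a node and check that its bytes still hash to the requested address."""
    node = store[address]
    if hashlib.sha256(node).hexdigest() != address:
        raise ValueError("block does not match its address")
    return json.loads(node)


store = {}  # assumption: a dict stands in for the network-wide block store

# Two leaves and a root that links to them by hash form a tiny Merkle DAG.
leaf_a = put_block(store, "hello")
leaf_b = put_block(store, "world")
root_v1 = put_block(store, "dir", links=[leaf_a, leaf_b])

# Changing one leaf yields a new root address; the unchanged leaf is shared.
leaf_b2 = put_block(store, "world!")
root_v2 = put_block(store, "dir", links=[leaf_a, leaf_b2])
```

Because a parent's address depends on its children's addresses, `root_v1` and `root_v2` act as two immutable versions that share `leaf_a`, and a reader does not need to trust whoever served a block, since the hash check in `get_block` detects tampering.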

# Design and Evaluation of IPFS: A Storage Layer for the Decentralized Web

Association for Computing Machinery, 2022

Trautwein, Dennis and Raman, Aravindh and Tyson, Gareth and Castro, Ignacio and Scott, Will and Schubotz, Moritz and Gipp, Bela and Psaras, Yiannis: Recent years have witnessed growing consolidation of web operations. For example, the majority of web traffic now originates from a few organizations, and even micro-websites often choose to host on large pre-existing cloud infrastructures. In response to this, the “Decentralized Web” attempts to distribute ownership and operation of web services more evenly. This paper describes the design and implementation of the largest and most widely used Decentralized Web platform — the InterPlanetary File System (IPFS) — an open-source, content-addressable peer-to-peer network that provides distributed data storage and delivery. IPFS has millions of daily content retrievals and already underpins dozens of third-party applications. This paper evaluates the performance of IPFS by introducing a set of measurement methodologies that allow us to uncover the characteristics of peers in the IPFS network. We reveal presence in more than 2700 Autonomous Systems and 152 countries, the majority of which operate outside large central cloud providers like Amazon or Azure. We further evaluate IPFS performance, showing that both publication and retrieval delays are acceptable for a wide range of use cases. Finally, we share our datasets, experiences and lessons learned.

# A practicable approach towards secure key-based routing

Institute of Electrical and Electronics Engineers, 2007

Baumgart, Ingmar and Mies, Sebastian: Security is a common problem in completely decentralized peer-to-peer systems. Although several suggestions exist on how to create a secure key-based routing protocol, a practicable approach is still unattended. In this paper, we introduce a secure key-based routing protocol based on Kademlia that has a high resilience against common attacks by using parallel lookups over multiple disjoint paths, limiting free nodeId generation with crypto puzzles, and introducing a reliable sibling broadcast. The latter is needed to store data in a safe, replicated way. We evaluate the security of our proposed extensions to the Kademlia protocol analytically and simulate the effects of multiple disjoint paths on lookup success under the influence of adversarial nodes.

# Democratizing Content Publication with Coral

USENIX Association, 2004

Freedman, Michael J. and Freudenthal, Eric and Mazières, David: CoralCDN is a peer-to-peer content distribution network that allows a user to run a web site that offers high performance and meets huge demand, all for the price of a cheap broadband Internet connection. Volunteer sites that run CoralCDN automatically replicate content as a side effect of users accessing it. Publishing through CoralCDN is as simple as making a small change to the hostname in an object's URL; a peer-to-peer DNS layer transparently redirects browsers to nearby participating cache nodes, which in turn cooperate to minimize load on the origin web server. One of the system's key goals is to avoid creating hot spots that might dissuade volunteers and hurt performance. It achieves this through Coral, a latency-optimized hierarchical indexing infrastructure based on a novel abstraction called a distributed sloppy hash table or DSHT.

# Escaping the Evils of Centralized Control with self-certifying pathnames

Association for Computing Machinery, 1998

Mazières, David and Kaashoek, M. Frans: People have long trusted central authorities to coordinate secure collaboration on local-area networks. Unfortunately, the Internet doesn't provide the kind of administrative structures individual organizations do. As such, users risk painful consequences if global, distributed systems rely on central authorities for security. Fortunately, security need not come at the price of centralized control. To prove it, we present SFS, a secure, global, decentralized file system permitting easy cross-administrative realm collaboration. With a simple idea, self-certifying pathnames, SFS lets users escape the evils of centralized control.
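The idea of a self-certifying pathname can be sketched as follows: the pathname embeds a hash of the server's public key, so a client can check the key a server presents against the name it was asked to open, without consulting a central authority. The sketch is only loosely modelled on SFS. The `/sfs/location:hostid` layout, the use of SHA-256 with hex encoding, and the function names `make_pathname` and `key_matches_pathname` are assumptions made for illustration, and the keys are placeholder byte strings rather than real public keys.

```python
import hashlib


def host_id(location: str, public_key: bytes) -> str:
    """Derive an identifier from the server location and its public key."""
    return hashlib.sha256(location.encode() + b"\0" + public_key).hexdigest()


def make_pathname(location: str, public_key: bytes) -> str:
    """Build a pathname that commits to the server's public key."""
    return f"/sfs/{location}:{host_id(location, public_key)}"


def key_matches_pathname(pathname: str, presented_key: bytes) -> bool:
    """Check that the key a server presents is the one the pathname commits to."""
    prefix = "/sfs/"
    if not pathname.startswith(prefix):
        return False
    location, _, expected_id = pathname[len(prefix):].rpartition(":")
    return host_id(location, presented_key) == expected_id


# Placeholder key material; a real system would use an actual public key.
path = make_pathname("files.example.org", b"public-key-A")
```

A server presenting `b"public-key-A"` passes the check for `path`, while an impostor presenting any other key fails it, so the name itself carries everything needed to authenticate the server.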

# Kademlia: A Peer-to-peer Information System Based on the XOR Metric

Springer-Verlag, 2002

Maymounkov, Petar and Mazières, David: We describe a peer-to-peer distributed hash table with provable consistency and performance in a fault-prone environment. Our system routes queries, and locates nodes, using a novel XOR-based metric topology that simplifies the algorithm and facilitates our proof. The topology has the property that every message exchanged conveys or reinforces useful contact information. The system exploits this information to send parallel, asynchronous query messages that tolerate node failures without imposing timeout delays on users.
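The XOR metric mentioned in the abstract can be illustrated in a few lines: the distance between two identifiers is their bitwise XOR interpreted as an integer. The sketch below uses small integers as node IDs for readability (an assumption; real deployments use much longer identifiers), and the helper names `bucket_index` and `closest` are hypothetical.

```python
def xor_distance(a: int, b: int) -> int:
    """Distance between two IDs: their bitwise XOR read as an integer."""
    return a ^ b


def bucket_index(self_id: int, other_id: int) -> int:
    """Position of the highest bit in which the two IDs differ.

    Returns -1 when the IDs are identical.
    """
    return xor_distance(self_id, other_id).bit_length() - 1


def closest(ids, target: int, k: int):
    """Return the k IDs nearest to the target under the XOR metric."""
    return sorted(ids, key=lambda node_id: xor_distance(node_id, target))[:k]


known_nodes = [1, 4, 6, 12]
nearest_two = closest(known_nodes, target=5, k=2)
```

The metric is symmetric (`xor_distance(a, b) == xor_distance(b, a)`) and zero only for identical IDs, which is part of what lets nodes learn useful routing contacts from the queries they receive.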

# IPFS - the perspective storage infrastructure for scientific data

Presentation slides for the IPFS Introductory Webinar, produced as a side activity of the ExPaNDs project at Elettra Sincrotrone Trieste on 24/09/2020. Based on the PaNdata Continuum ontology.

Vukolov, Andrey: The presentation describes in an academic manner the advantages of IPFS as a data identification and storage system with built-in basic provenance.

# Openly reproducible Persistent Identifiers (PIDs) as a factor of FAIRness in data sharing practices

Presentation slides for the European Open Science Cloud Symposium 2021.

Vukolov, Andrey: This presentation describes how decentralized persistent identifiers (PIDs) associated with data differ and how they influence the FAIR data sharing model. As a living example of an existing decentralized, openly reproducible PID, the IPFS CID is described as part of the decentralized provenance system.