# Content Identifiers (CIDs)

As described in "IPFS and the problems it solves", IPFS is a modular suite of protocols purpose-built for the organization and transfer of content-addressed data. In this guide, you'll learn more about the fundamentals of content addressing in IPFS and how IPFS uses Content Identifiers (CIDs) to handle content-addressed data.

# What is a CID?

A content identifier, or CID, is a label used to point to material in IPFS. It doesn't indicate where the content is stored, but it forms a kind of address based on the content itself. CIDs are short, regardless of the size of their underlying content.

CIDs are based on the content’s cryptographic hash. That means:

  • Any difference in the content will produce a different CID.
  • The same content added to two different IPFS nodes using the same settings will produce the same CID.
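Both properties come from the underlying hash function. As a minimal sketch, assuming Node.js and its built-in `node:crypto` module (the helper name `sha256Hex` is ours, not part of IPFS), you can see them with plain SHA-256 digests:

```javascript
const { createHash } = require('node:crypto');

// Hex-encoded SHA-256 digest of a string or byte buffer.
function sha256Hex(data) {
  return createHash('sha256').update(data).digest('hex');
}

const first = sha256Hex('hello world');
const same = sha256Hex('hello world');     // identical content
const changed = sha256Hex('hello world!'); // one extra character

console.log(first === same);    // true: same content, same digest
console.log(first === changed); // false: any difference changes the digest
```

This only demonstrates the hash; a full CID wraps such a digest with extra information, as described in the following sections.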

IPFS uses the SHA-256 hashing algorithm by default, but there is support for many other algorithms. The Multihash project represents the work for this, with the aim of future-proofing applications' use of hashes and allowing multiple hash functions to coexist. (If you're curious about how hash types in IPFS are decided upon, you may wish to keep an eye on this forum discussion.)

# How CIDs are created

CIDs contain the hash and the codec of the data. A CID can be represented in string or binary format. In general, the CID is generated for each block by:

  1. Computing a cryptographic hash of the block's data.
  2. Combining that hash with codec information about the block using multiformats:
    • Multihash for information on the algorithm used to hash the data.
    • Multicodec for information on how to interpret the hashed data after it has been fetched.
    • Multibase for information on how the hashed data is encoded. Multibase is only used in the string representation of the CID.
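The steps above can be sketched by hand for the simplest case: a single block of raw bytes, hashed with sha2-256 and rendered as a base32 CIDv1. This is an illustration that assumes Node.js, not how any IPFS implementation is actually written. The helper names `base32Encode` and `rawCidV1` are hypothetical; the numeric codes (`0x55` for raw, `0x12` for sha2-256) come from the multicodec table.

```javascript
const { createHash } = require('node:crypto');

// RFC 4648 base32 alphabet in lowercase, as used by multibase "base32".
const BASE32_ALPHABET = 'abcdefghijklmnopqrstuvwxyz234567';

// Encode bytes as unpadded lowercase base32.
function base32Encode(bytes) {
  let bits = 0;  // number of bits currently buffered
  let value = 0; // the buffered bits
  let out = '';
  for (const byte of bytes) {
    value = (value << 8) | byte;
    bits += 8;
    while (bits >= 5) {
      out += BASE32_ALPHABET[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
    value &= (1 << bits) - 1; // keep only the bits not yet emitted
  }
  if (bits > 0) {
    out += BASE32_ALPHABET[(value << (5 - bits)) & 31]; // zero-pad the tail
  }
  return out;
}

// Build a CIDv1 string for one block of raw bytes.
function rawCidV1(data) {
  const digest = createHash('sha256').update(data).digest(); // step 1: hash
  const cidBytes = Buffer.concat([
    Buffer.from([0x01]),       // CID version 1
    Buffer.from([0x55]),       // multicodec: raw binary
    Buffer.from([0x12, 0x20]), // multihash header: sha2-256, 32-byte digest
    digest,
  ]);
  return 'b' + base32Encode(cidBytes); // multibase prefix "b" means base32
}

console.log(rawCidV1('hello world'));
```

All of the codes used here are below 128, so each fits in a single varint byte. Files that IPFS splits into several blocks get a CID for the root of a DAG rather than for the raw file bytes, which this sketch does not cover.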

CIDs will not match the hash of the data. While a data block's CID is constructed using the cryptographic hash of the data block, a CID contains additional information (described above) that the hash does not. For further information, see "CIDs are not file hashes" below.

For a breakdown of an actual CID, see this example with the IPFS CID inspector.

# CIDs are not file hashes

Hash functions are widely used to check file integrity. Because IPFS splits content into blocks and verifies them through directed acyclic graphs (DAGs), the SHA hash of a file won't match the hash contained in its CID. Here's an example of what happens if you try to compare the two.

A download provider may publish the output of a hash function for a file, often called a checksum. The checksum enables users to verify that a file has not been altered since it was published. This check is done by running the same hash function that generated the checksum against the downloaded file. If the checksum computed from the downloaded file exactly matches the checksum on the website, then the user knows that the file was not altered and can be trusted.

For example, when you download an image file for Ubuntu Linux, you might see the following SHA-256 checksum on the Ubuntu website listed for verification purposes:

0xB45165ED3CD437B9FFAD02A2AAD22A4DDC69162470E2622982889CE5826F6E3D ubuntu-20.04.1-desktop-amd64.iso

After downloading the Ubuntu image, you can verify the integrity of the file by hashing the file to make sure the checksums match:

echo "b45165ed3cd437b9ffad02a2aad22a4ddc69162470e2622982889ce5826f6e3d *ubuntu-20.04.1-desktop-amd64.iso" | shasum -a 256 --check

ubuntu-20.04.1-desktop-amd64.iso: OK
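The same check can be done programmatically. The following is a minimal sketch, assuming Node.js; `sha256File` and `matchesChecksum` are hypothetical helper names. It reads the file in chunks so that multi-gigabyte images don't have to fit in memory.

```javascript
const { createHash } = require('node:crypto');
const fs = require('node:fs');

// Compute the hex SHA-256 digest of a file, reading 1 MiB at a time.
function sha256File(path) {
  const hash = createHash('sha256');
  const buffer = Buffer.alloc(1024 * 1024);
  const fd = fs.openSync(path, 'r');
  try {
    let bytesRead;
    // A null position means "continue from the current file position".
    while ((bytesRead = fs.readSync(fd, buffer, 0, buffer.length, null)) > 0) {
      hash.update(buffer.subarray(0, bytesRead));
    }
  } finally {
    fs.closeSync(fd);
  }
  return hash.digest('hex');
}

// Compare against a published checksum; accepts upper case and a "0x" prefix.
function matchesChecksum(path, expectedHex) {
  const expected = expectedHex.toLowerCase().replace(/^0x/, '');
  return sha256File(path) === expected;
}

// Usage, with the checksum published for the image:
// matchesChecksum('ubuntu-20.04.1-desktop-amd64.iso', '0xB45165ED...')
```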

If we add the ubuntu-20.04.1-desktop-amd64.iso file to IPFS, we receive an identifier as output:

ipfs add ubuntu-20.04.1-desktop-amd64.iso

added QmPK1s3pNYLi9ERiq3BDxKa4XosgWwFRQUydHUtz4YgpqB ubuntu-20.04.1-desktop-amd64.iso
 2.59 GiB / 2.59 GiB [==========================================================================================] 100.00%

The string QmPK1s3pNYLi9ERiq3BDxKa4XosgWwFRQUydHUtz4YgpqB returned by the ipfs add command is the content identifier (CID) of the file ubuntu-20.04.1-desktop-amd64.iso. We can use the CID Inspector to see what the CID includes. The actual hash is listed under DIGEST (HEX):

NAME: sha2-256
BITS: 256
DIGEST (HEX): 0E7071C59DF3B9454D1D18A15270AA36D54F89606A576DC621757AFD44AD1D2E

TIP

The names of hash functions are not used consistently. SHA-256, sha2-256, and SHA-2 with a 256-bit digest all refer to the same hash function.

We can now check if the hash contained in the CID equals the checksum for the file:

echo "0E7071C59DF3B9454D1D18A15270AA36D54F89606A576DC621757AFD44AD1D2E *ubuntu-20.04.1-desktop-amd64.iso" | shasum -a 256 --check

ubuntu-20.04.1-desktop-amd64.iso: FAILED
shasum: WARNING: 1 computed checksum did NOT match

As we can see, the hash included in the CID does NOT match the hash of the input file ubuntu-20.04.1-desktop-amd64.iso.
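To build some intuition for the mismatch, here is a deliberately simplified sketch, assuming Node.js. It hashes fixed-size chunks and then hashes the list of chunk digests, which is a toy stand-in for a Merkle DAG root. It is not the real UnixFS or dag-pb layout that IPFS uses, and `toyRootHash` is a hypothetical name.

```javascript
const { createHash } = require('node:crypto');

const sha256 = (data) => createHash('sha256').update(data).digest();

// Hash of the whole input, as a checksum tool would compute it.
function fileHash(data) {
  return sha256(data).toString('hex');
}

// Toy root hash: hash each chunk, then hash the concatenated chunk digests.
function toyRootHash(data, chunkSize) {
  const digests = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    digests.push(sha256(data.subarray(offset, offset + chunkSize)));
  }
  return sha256(Buffer.concat(digests)).toString('hex');
}

const sample = Buffer.alloc(1024, 7); // 1 KiB of sample bytes

console.log(fileHash(sample) === toyRootHash(sample, 256));         // false
console.log(toyRootHash(sample, 256) === toyRootHash(sample, 512)); // false
```

The second comparison shows why the same content only produces the same CID when it is added with the same settings: a different chunk size changes the structure being hashed.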

# CID versions

CIDs can take a few different forms with different encoding bases or CID versions. Many of the existing IPFS tools still generate v0 CIDs, although the files (Mutable File System) and object operations now use CIDv1 by default.

# Version 0 (v0)

When IPFS was first designed, we used base58-encoded multihashes as the content identifiers. This is simpler but much less flexible than newer CIDs. CIDv0 is still used by default for many IPFS operations, so you should generally support v0.

If a CID is 46 characters starting with "Qm", it's a CIDv0 (for more details, check the decoding algorithm in the CID specification).
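As a minimal sketch, this rule of thumb can be written as a small check. The function name `looksLikeCidV0` is hypothetical, and the extra test against the base58btc alphabet is our addition. It only inspects the string's shape; it does not verify that the string decodes to a valid multihash.

```javascript
// base58btc omits the easily confused characters 0, O, I, and l.
const CID_V0_PATTERN = /^Qm[1-9A-HJ-NP-Za-km-z]{44}$/;

// True if the string has the shape of a CIDv0: 46 base58 characters starting with "Qm".
function looksLikeCidV0(cid) {
  return CID_V0_PATTERN.test(cid);
}

console.log(looksLikeCidV0('QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR'));              // true
console.log(looksLikeCidV0('bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi')); // false
```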

# Version 1 (v1)

CID v1 contains some leading identifiers that clarify exactly which representation is used, along with the content-hash itself. These include:

  • A multibase prefix, specifying the encoding used for the remainder of the CID
  • A CID version identifier, which indicates which version of CID this is
  • A multicodec identifier, indicating the format of the target content; it helps people and software know how to interpret that content after it is fetched

These leading identifiers also provide forward-compatibility, supporting different formats to be used in future versions of CID.

You can use the first few bytes of the CID to interpret the remainder of the content address and know how to decode the content after being fetched from IPFS. For more details, check out the CID specification. It includes a decoding algorithm and links to existing software implementations for decoding CIDs.
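To make this concrete, here is a simplified sketch of reading those leading identifiers, assuming Node.js. It only handles base32 ("b") CIDv1 strings and is not a replacement for the decoding algorithm in the specification; the helper names `base32Decode`, `readVarint`, and `inspectCidV1` are hypothetical.

```javascript
const BASE32_ALPHABET = 'abcdefghijklmnopqrstuvwxyz234567';

// Decode unpadded lowercase base32 into bytes (trailing pad bits are dropped).
function base32Decode(text) {
  const out = [];
  let bits = 0;
  let value = 0;
  for (const ch of text) {
    const index = BASE32_ALPHABET.indexOf(ch);
    if (index === -1) throw new Error(`invalid base32 character: ${ch}`);
    value = (value << 5) | index;
    bits += 5;
    if (bits >= 8) {
      out.push((value >>> (bits - 8)) & 0xff);
      bits -= 8;
      value &= (1 << bits) - 1; // keep only the bits not yet consumed
    }
  }
  return Uint8Array.from(out);
}

// Read one unsigned varint at `offset`; returns [value, nextOffset].
function readVarint(bytes, offset) {
  let result = 0;
  let shift = 0;
  while (offset < bytes.length) {
    const byte = bytes[offset++];
    result += (byte & 0x7f) * 2 ** shift;
    if ((byte & 0x80) === 0) return [result, offset];
    shift += 7;
  }
  throw new Error('truncated varint');
}

// Split a base32 CIDv1 string into its leading identifiers and digest.
function inspectCidV1(cid) {
  if (cid[0] !== 'b') throw new Error('this sketch only handles base32 CIDs');
  const bytes = base32Decode(cid.slice(1));
  let offset = 0;
  let version, codec, hashCode, digestLength;
  [version, offset] = readVarint(bytes, offset);
  [codec, offset] = readVarint(bytes, offset);
  [hashCode, offset] = readVarint(bytes, offset);
  [digestLength, offset] = readVarint(bytes, offset);
  const digest = Buffer.from(bytes.subarray(offset)).toString('hex');
  return { version, codec, hashCode, digestLength, digest };
}

console.log(inspectCidV1('bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi'));
```

For this CID the fields are version 1, codec `0x70` (dag-pb), and hash code `0x12` (sha2-256) with a 32-byte digest.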

If you can't decide between CIDv0 and CIDv1, consider choosing CIDv1 for your new project and opt in by passing a version flag (ipfs add --cid-version 1). This is more future-proof and safe for use in browser contexts.

The IPFS project will switch to CIDv1 as the new default in the near future.

# CID Inspector

It's easy to explore a CID for yourself. Want to pull apart a specific CID's multibase, multicodec, or multihash info? You can use the CID Inspector or the CID Info panel in IPLD Explorer (both links launch using a sample CID) for an interactive breakdown of differently-formatted CIDs.

Check out ProtoSchool's Anatomy of a CID tutorial to see how a single file can be represented in multiple CID versions.

# CID conversion

Converting a CID from v0 to v1 enables it to be represented in multibase encodings. The default for CIDv1 is the case-insensitive base32, but use of the shorter base36 is encouraged for IPNS names to ensure the same text representation on subdomains.

# v0 to v1

The built-in ipfs cid format command can be used from the command line:

$ ipfs cid format -v 1 -b base32 QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR
bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi

JavaScript users can also leverage the toV1() method provided by the multiformats library:

import { CID } from 'multiformats/cid'

// Parse a CIDv0 string, then convert it to CIDv1
const v0 = CID.parse('QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n')
v0.toString()
//> 'QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n'
v0.toV1().toString()
//> 'bafybeihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku'
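What the conversion does to the bytes can also be sketched without a library, assuming Node.js. A CIDv0 string is a base58btc-encoded sha2-256 multihash; the CIDv1 form prepends the version (`0x01`) and the dag-pb codec (`0x70`) and then applies a multibase encoding. The helper names below are hypothetical, and for real applications the `ipfs cid format` command or the multiformats library remains the better choice.

```javascript
const BASE58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';
const BASE32_ALPHABET = 'abcdefghijklmnopqrstuvwxyz234567';

// Decode a base58btc string into bytes using BigInt arithmetic.
function base58Decode(text) {
  let number = 0n;
  for (const ch of text) {
    const index = BASE58_ALPHABET.indexOf(ch);
    if (index === -1) throw new Error(`invalid base58 character: ${ch}`);
    number = number * 58n + BigInt(index);
  }
  const bytes = [];
  while (number > 0n) {
    bytes.unshift(Number(number & 0xffn));
    number >>= 8n;
  }
  // Each leading "1" stands for one leading zero byte.
  for (const ch of text) {
    if (ch !== '1') break;
    bytes.unshift(0);
  }
  return Uint8Array.from(bytes);
}

// Encode bytes as unpadded lowercase base32.
function base32Encode(bytes) {
  let bits = 0;
  let value = 0;
  let out = '';
  for (const byte of bytes) {
    value = (value << 8) | byte;
    bits += 8;
    while (bits >= 5) {
      out += BASE32_ALPHABET[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
    value &= (1 << bits) - 1; // keep only the bits not yet emitted
  }
  if (bits > 0) out += BASE32_ALPHABET[(value << (5 - bits)) & 31];
  return out;
}

// Convert a CIDv0 string to its base32 CIDv1 form.
function cidV0ToV1(cidV0) {
  const multihash = base58Decode(cidV0);
  if (multihash.length !== 34 || multihash[0] !== 0x12 || multihash[1] !== 0x20) {
    throw new Error('not a CIDv0 (expected a 34-byte sha2-256 multihash)');
  }
  // CIDv0 implies dag-pb; CIDv1 states the version and codec explicitly.
  return 'b' + base32Encode([0x01, 0x70, ...multihash]);
}

console.log(cidV0ToV1('QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR'));
```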

# v1 to v0

Given a CID v1, JS users can convert back to v0 using the toV0() method provided by the multiformats library:

import { CID } from 'multiformats/cid'

// Parse a CIDv1 string, then convert it back to CIDv0
const v1 = CID.parse('bafybeihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku')
v1.toString()
//> 'bafybeihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku'
v1.toV0().toString()
//> 'QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n'

See CID conversion in action: the interactive code sandbox has an example JS application that converts between CID versions and encodings.

# Converting between CID base encodings

A CID can be encoded using any of the encodings specified in the multibase table. The use of different encodings can impact speed and storage efficiency.

To convert a CIDv1 cidV1 from one encoding to another, use the toString() method. By default, toString() will return the base32 string representation of the CID, but you can use other string representations:

const cidV1StringBase32 = cidV1.toString();

The following example returns the base256 emoji encoding of the CID:

import { base256emoji } from 'multiformats/bases/base256emoji'
const cidV1StringBase256 = cidV1.toString(base256emoji);

Using .bytes, the following example returns the raw bytes of the CID:

const cidV1Bytes = cidV1.bytes

# CID to hex

Sometimes, a hexadecimal representation of raw bytes is preferred for debugging purposes. To get the hex for raw .bytes of a CIDv1 cidV1, use base16 encoding:

import { base16 } from 'multiformats/bases/base16'
const cidV1StringBase16 = cidV1.toString(base16);
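Note that a multibase base16 string is not just the hex of the bytes: it starts with the multibase prefix `f`. As a minimal sketch, assuming Node.js (the helper name `cidBytesToBase16` is hypothetical), the same string can be produced from the raw bytes:

```javascript
// Multibase base16 (lowercase) is the prefix "f" followed by the hex of the bytes.
function cidBytesToBase16(bytes) {
  return 'f' + Buffer.from(bytes).toString('hex');
}

// Leading bytes of a CIDv1 with the dag-pb codec and a sha2-256 multihash.
const leadingBytes = Uint8Array.from([0x01, 0x70, 0x12, 0x20]);
console.log(cidBytesToBase16(leadingBytes)); // 'f01701220'
```

Dropping the first character gives the plain hexadecimal form of the CID bytes.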

TIP

Subdomain gateways convert paths with custom bases like base16 to base32 or base36, in an effort to fit a CID in a DNS label:
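The arithmetic behind this can be sketched as follows. A DNS label is limited to 63 characters, and a CIDv1 with a sha2-256 digest is 36 bytes long (version, codec, two multihash header bytes, and the 32-byte digest). The helper name `encodedLength` is hypothetical, and the calculation only applies to power-of-two bases such as base16 and base32.

```javascript
const DNS_LABEL_MAX = 63; // maximum length of a single DNS label
const CID_V1_SHA256_BYTES = 36;

// String length of a multibase encoding: one prefix character plus the payload.
function encodedLength(byteCount, bitsPerCharacter) {
  return 1 + Math.ceil((byteCount * 8) / bitsPerCharacter);
}

const base16Length = encodedLength(CID_V1_SHA256_BYTES, 4); // 73 characters
const base32Length = encodedLength(CID_V1_SHA256_BYTES, 5); // 59 characters

console.log(base16Length <= DNS_LABEL_MAX); // false: too long for a DNS label
console.log(base32Length <= DNS_LABEL_MAX); // true: fits in a DNS label
```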

# CodeSandbox: Converting between CID versions and encodings

For a hands-on, interactive application that converts between CID versions and encodings, use the CodeSandbox below.

# Further resources

Check out these links for more information on CIDs and how they work: