# Content addressing and CIDs

For a deep dive into how Content Identifiers (CIDs) are constructed, take a look at ProtoSchool's tutorial on the Anatomy of a CID.

A content identifier, or CID, is a label used to point to material in IPFS. It doesn't indicate where the content is stored, but it forms a kind of address based on the content itself. CIDs are short, regardless of the size of their underlying content.

CIDs are based on the content’s cryptographic hash. That means:

  • Any difference in the content, however small, will produce a different CID.
  • The same content added to two different IPFS nodes using the same settings will produce the same CID.
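The two properties above can be illustrated with a plain hash function. This is only a minimal sketch: it hashes raw bytes with Python's `hashlib`, whereas a real IPFS node may also chunk the data and wrap it in additional structure before hashing. The helper name `content_digest` is hypothetical.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


# Identical content always yields an identical digest.
first = content_digest(b"hello ipfs")
second = content_digest(b"hello ipfs")

# A one-character change yields a completely different digest.
changed = content_digest(b"hello ipfs!")

# The digest has a fixed size no matter how large the input is.
large = content_digest(b"x" * 1_000_000)

print(first == second)         # same content, same digest
print(first == changed)        # different content, different digest
print(len(first), len(large))  # both digests have the same length
```

The fixed digest size is the reason a CID stays short even when the content it identifies is very large.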

IPFS uses the SHA-256 hashing algorithm by default, but it supports many other algorithms. The Multihash project defines the self-describing hash format that makes this possible, with the aim of future-proofing applications' use of hashes and allowing multiple hash functions to coexist. (If you're curious about how hash types in IPFS are decided upon, you may wish to keep an eye on this forum discussion.)
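As a rough illustration of how a multihash lets hash functions coexist, the sketch below prefixes a SHA-256 digest with the function code (`0x12` for sha2-256 in the multicodec table) and the digest length. The helper names are hypothetical. In the Multihash format both prefix fields are unsigned varints; this sketch assumes each fits in a single byte, which holds for SHA-256.

```python
import hashlib

SHA2_256_CODE = 0x12  # multicodec code for sha2-256


def sha256_multihash(data: bytes) -> bytes:
    """Build <hash function code><digest length><digest> for SHA-256."""
    digest = hashlib.sha256(data).digest()
    # Assumption: code and length are both below 0x80, so one byte each.
    return bytes([SHA2_256_CODE, len(digest)]) + digest


def split_multihash(multihash: bytes) -> tuple[int, int, bytes]:
    """Split a single-byte-prefixed multihash into (code, length, digest)."""
    code, length = multihash[0], multihash[1]
    digest = multihash[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the declared length")
    return code, length, digest


mh = sha256_multihash(b"hello ipfs")
code, length, digest = split_multihash(mh)
print(hex(code), length, digest.hex())
```

Because the function code travels with the digest, a reader can tell which algorithm produced it without any outside context.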

# Identifier formats

CIDs can take a few different forms with different encoding bases or CID versions. Many of the existing IPFS tools still generate v0 CIDs, although the files (Mutable File System) and object operations now use CIDv1 by default.

# Version 0 (v0)

When IPFS was first designed, we used base58-encoded multihashes as the content identifiers. This format is simpler but much less flexible than newer CIDs. CIDv0 is still used by default for many IPFS operations, so you should generally support v0.

If a CID is 46 characters starting with "Qm", it's a CIDv0 (for more details, check the decoding algorithm in the CID specification).
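The "46 characters starting with Qm" rule follows from base58-encoding a SHA-256 multihash, which the sketch below shows. The function names are hypothetical, and the example hashes raw bytes directly, so the resulting string only has the shape of a CIDv0. It is not the CID that an IPFS node would report for the same data, since a node wraps content in its own data structures before hashing.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = B58_ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def looks_like_cid_v0(text: str) -> bool:
    """Apply the length-and-prefix check described above."""
    return len(text) == 46 and text.startswith("Qm")


# A SHA-256 multihash: code 0x12, length 0x20, then the 32-byte digest.
multihash = bytes([0x12, 0x20]) + hashlib.sha256(b"hello ipfs").digest()
v0_shaped = base58btc_encode(multihash)

print(v0_shaped)
print(looks_like_cid_v0(v0_shaped))
```

The check is a quick heuristic for choosing a decoder. It does not validate that the string is well-formed base58 or that the content exists.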

# Version 1 (v1)

CIDv1 contains some leading identifiers that clarify exactly which representation is used, along with the content hash itself. These include:

  • A multibase prefix, specifying the encoding used for the remainder of the CID
  • A CID version identifier, which indicates which version of CID this is
  • A multicodec identifier, indicating the format of the target content, which helps people and software know how to interpret that content after it is fetched

These leading identifiers also provide forward compatibility, allowing future versions of CID to use different formats.

You can use the first few bytes of the CID to interpret the remainder of the content address and to know how to decode the content after it is fetched from IPFS. For more details, check out the CID specification. It includes a decoding algorithm and links to existing software implementations for decoding CIDs.
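To make the layout concrete, here is a minimal sketch that builds a CIDv1-shaped string and then reads the leading identifiers back out. It is not a replacement for the decoding algorithm in the CID specification: it handles only the base32 multibase prefix (`b`), and the function names are hypothetical. The codec `0x55` is the multicodec code for raw binary, and the example hashes raw bytes directly rather than going through an IPFS node.

```python
import base64
import hashlib


def read_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Read an unsigned varint starting at pos; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_cid_v1_base32(codec: int, multihash: bytes) -> str:
    """Build <multibase 'b'><version 1><codec><multihash>."""
    # Assumption: the codec is below 0x80, so it fits in one varint byte.
    raw = bytes([0x01, codec]) + multihash
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


def decode_cid_v1_base32(cid: str) -> dict:
    """Split a base32 CIDv1 string into its leading identifiers and digest."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles the base32 multibase prefix 'b'")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that multibase omits
    raw = base64.b32decode(body)
    version, pos = read_varint(raw)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    hash_length, pos = read_varint(raw, pos)
    return {
        "version": version,
        "codec": codec,
        "hash_code": hash_code,
        "hash_length": hash_length,
        "digest": raw[pos:pos + hash_length],
    }


multihash = bytes([0x12, 0x20]) + hashlib.sha256(b"hello ipfs").digest()
cid_text = encode_cid_v1_base32(0x55, multihash)
parts = decode_cid_v1_base32(cid_text)

print(cid_text)
print(parts["version"], hex(parts["codec"]), hex(parts["hash_code"]), parts["hash_length"])
```

Reading the prefix fields in order (multibase, version, codec, multihash header) is what allows software to pick the right decoder before it has seen any of the content.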

# CID Inspector

It's easy to explore a CID for yourself. Want to pull apart a specific CID's multibase, multicodec, or multihash info? You can use the CID Inspector or the CID Info panel in IPLD Explorer (both links launch using a sample CID) for an interactive breakdown of differently formatted CIDs.

Check out ProtoSchool's Anatomy of a CID tutorial to see how a single file can be represented in multiple CID versions.

# Further resources

Check out these links for more information on CIDs and how they work: