In early 2022, a web page titled YourNFTs.org surfaced on the Internet, with its anonymous author asserting that they had “downloaded all the NFTs on the Ethereum blockchain.” This claim quickly caught the attention of Jason Bailey, whose startup, ClubNFT, had debunked a similar claim made by Geoffrey Huntley in late 2021. Huntley had created a parody of The Pirate Bay titled The NFT Bay, which made available a torrent file that allegedly linked to “all NFT's [sic] from Ethereum & Solana.” ClubNFT team members found the claim to be fictitious and the linked files to be completely empty.
Like Huntley, the author of YourNFTs.org provided a torrent file pointing to their purported bounty: over 5GB of NFT metadata, contract addresses, and token IDs. Unlike Huntley, however, the author also provided a detailed analysis of the data as well as a photomosaic of images apparently incorporating Ethereum-based NFTs. According to the analysis, 9% of NFTs had their data stored entirely on chain, 55% had data stored via HTTP on a centralized web server, and 36% had data stored via the InterPlanetary File System (IPFS), a decentralized file storage protocol.
“NFT” stands for “non-fungible token,” where the token is a record stored on a blockchain. In the case of Ethereum-based NFTs, these records are typically associated with smart contracts conforming to the EIP-721 or EIP-1155 standards and are secured by the Ethereum network itself. Assuming a well-written smart contract and a secure blockchain, it is nearly impossible for a third party to modify an NFT’s record. This makes NFTs an excellent way to record digital ownership. However, blockchains are poorly suited to storing large amounts of data, such as the images, videos, or audio files that are commonly associated with NFTs. Instead, these files are often stored off chain using IPFS or HTTP storage solutions.
In their analysis, the author of YourNFTs.org broadly classified NFTs according to their storage solutions: on chain, HTTP, or IPFS. “On chain” refers to an NFT whose content is stored entirely on the blockchain. Storing content on the blockchain can be prohibitively expensive and is usually implemented via clever smart contract designs. Nevertheless, there are a number of NFT artists and projects that make use of on-chain data storage, including 0xDEAFBEEF, Blitmap, and CyberBrokers. HTTP refers to an NFT whose content is stored on a centralized web server. Content served over HTTP is at the greatest risk of disappearing because it is subject to a single point of failure and cannot practically be restored once offline. Even if the content does stay online, there is no guarantee that the files will not be modified from their original state. Some early works, such as John Karel’s genesis NFT, Skull Still Life (2018), have already had content disappear due to their reliance on HTTP storage solutions.
IPFS refers to an NFT whose content is stored using the InterPlanetary File System protocol, a decentralized, peer-to-peer solution for storing and sharing data. IPFS is one of the most popular NFT storage solutions and widely considered a best practice in the space. However, IPFS does not guarantee that an NFT’s content will always be available. The content must be stored somewhere and made available to the IPFS network. Fortunately, even if an NFT’s content becomes unavailable, with an appropriate backup, IPFS files can be restored to the network and anyone can pay to ensure they remain online.
While classifying NFTs according to their storage solutions may seem like a straightforward task, many complications emerge in practice. For example, Art Blocks is a generative art platform whose NFTs are fully reproducible using on-chain data. However, each NFT also stores data via HTTP. In some cases, an NFT may point to data stored on IPFS via an HTTP gateway. As a result, the pointer itself may break even if the content to which it is pointing remains accessible via IPFS. To further complicate things, NFTs are not bound by the three aforementioned storage solutions. A growing number of NFTs now rely on Arweave for data storage, which can also be used to pin data to IPFS. Even among fully on-chain projects, further distinctions can be made depending on their implementation. Suffice it to say, classification is complicated.
The NFT storage findings presented on YourNFTs.org have considerable implications for the longevity of NFTs. This topic is especially pertinent to ClubNFT, as their NFT backup solution aims to assist collectors in ensuring that their IPFS-based NFT data are sufficiently backed up and can be re-uploaded to the IPFS network if necessary. We used the torrent file provided on YourNFTs.org to download over 5GB of alleged NFT metadata in order to test the findings for ourselves. The downloaded torrent included two files: a README file and a data set. The README file clarified that the data set spanned through December 2021 and provided additional information regarding the data set’s structure. With this information, we read the first line of the data set; it contained metadata for a Jake Nerwinski NFT issued by Sorare: Jake Nerwinski 2020-21 • Rare 33/100 (2021).
Elated that the data set was not empty, we proceeded to load the rest of the file, which amounted to 12,359,778 unique NFTs in total. After validating the contents, we concluded that the data were legitimate. However, we also found that several notable NFT collections were missing, including CryptoKitties, Ethereum Name Service, and MarbleCards, amongst others. While the author’s claim of downloading all NFTs on the Ethereum blockchain may have been exaggerated, we were still incredibly impressed with the data set and believe it to be sufficient for the presented analysis.
Armed with a working data set, our next goal was to test the storage findings of YourNFTs.org. The content of an NFT is typically defined by a list of attributes, or metadata, that is included in the NFT’s smart contract record. Under the EIP-721 and EIP-1155 standards, such lists must conform to a predefined JSON schema that includes an NFT’s title, type, name, description, and image. Nevertheless, any number of additional attributes can be included.
A typical NFT can be thought of as a tree-like structure sprouting from the blockchain, where ownership is secured at the root and content is stored in the branches.
Identifying all branches necessary to reproduce an NFT can be complicated. We chose to limit our analysis to the storage solutions associated with five common NFT attributes: “image,” “image_url,” “url,” “animation_url,” and “thumbnail.” The storage solution for the list of attributes itself was omitted from our analysis as this information was not available in the YourNFTs.org data set.
For each NFT in the data set, we extracted the contents of the token’s attributes, “image,” “image_url,” “url,” “animation_url,” and “thumbnail,” where applicable. Typically, these attributes contain the data necessary to reproduce an NFT’s content or link to off-chain storage solutions to do so. Therefore, each attribute can be reasonably classified by its storage solution using elementary pattern matching. For example, it can be assumed that a link to content hosted on IPFS will contain the term “IPFS” and a link to content hosted on a web server will contain the term “HTTP” or “HTTPS.” Conversely, data hosted on chain will not contain “IPFS,” “HTTP,” or “HTTPS.” While this methodology is not impervious to error, the results are reasonably accurate in application. Arweave and IPNS were included in our analysis for completeness. Pseudocode summarizing our attribute classification methodology can be viewed here:
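The pattern matching described above can be sketched in Python as follows. This is an illustrative reconstruction rather than the authors' pseudocode: the function names, the category labels, and the order in which the terms are checked are all assumptions, and the check for "arweave" as a substring is a guess at how Arweave links were detected.

```python
from typing import Dict

# The five metadata attributes examined in the analysis.
ATTRIBUTES = ("image", "image_url", "url", "animation_url", "thumbnail")


def classify_attribute(value: str) -> str:
    """Guess a storage solution from an attribute value by substring matching."""
    text = value.lower()
    # Check order is an assumption: the more specific terms come first so a
    # gateway link such as https://ipfs.io/ipfs/... is labeled IPFS, not HTTP.
    if "ipns" in text:
        return "ipns"
    if "ipfs" in text:
        return "ipfs"
    if "arweave" in text:
        return "arweave"
    if "http" in text:  # also matches "https"
        return "http"
    # No off-chain marker found: treat the value as on-chain data.
    return "on_chain"


def classify_attributes(metadata: Dict[str, object]) -> Dict[str, str]:
    """Classify whichever of the five attributes are present as non-empty strings."""
    result = {}
    for name in ATTRIBUTES:
        value = metadata.get(name)
        if isinstance(value, str) and value:
            result[name] = classify_attribute(value)
    return result
```

Plain substring matching is what makes the approach "not impervious to error": in this sketch, for example, an inline SVG that declares the `http://www.w3.org/2000/svg` namespace would be labeled HTTP even though its data live on chain.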
Once an NFT’s attributes — for instance, an image — are classified according to their storage methodologies, a classification must be made for the NFT itself. In order to make this classification, we chose to assign a numeric rating to each of the NFT’s attributes based upon the perceived difficulty of keeping that attribute accessible. On-chain attributes were assigned the lowest rating of zero, and HTTP attributes were assigned the highest rating of five. IPFS and Arweave attributes were assigned ratings in between, ranging from one to four. Each NFT was then classified according to its single highest-rated attribute. Under this approach, a single HTTP attribute would result in an HTTP classification, regardless of the storage solutions used for all other attributes. Applying this methodology to each of the 12,359,778 unique NFTs in the YourNFTs.org data set, we found that 9.06% of the NFTs had their data stored entirely on chain; 40.70% had data stored via HTTP; 49.55% had data stored via IPFS; 0.69% had data stored via Arweave; and one lone NFT had data stored via IPNS, a mutable alternative to IPFS.
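The highest-rating rule can be illustrated with a short Python sketch. Only the endpoints come from the text (on chain is zero, HTTP is five); the specific values given here to IPFS, Arweave, and IPNS are placeholders chosen for illustration, as are the names `RATINGS` and `classify_nft`.

```python
from typing import Dict, Optional

# Difficulty-of-preservation ratings. 0 and 5 are from the text; the values
# for ipfs, arweave, and ipns are hypothetical stand-ins within the 1-4 range.
RATINGS = {"on_chain": 0, "ipfs": 1, "arweave": 2, "ipns": 3, "http": 5}


def classify_nft(attribute_classes: Dict[str, str]) -> Optional[str]:
    """Return the storage class of the single highest-rated attribute.

    `attribute_classes` maps attribute names (e.g. "image") to storage
    classes. Returns None when no attribute was classified; how the original
    analysis treated that case is not stated.
    """
    if not attribute_classes:
        return None
    return max(attribute_classes.values(), key=lambda cls: RATINGS[cls])
```

Under this rule, `classify_nft({"image": "ipfs", "animation_url": "http"})` yields `"http"`: one HTTP attribute is enough to put the whole NFT in the HTTP bucket.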
Put simply, ~10% of NFTs are on chain, ~40% are on private servers and are doomed, and the remaining ~50% are on IPFS.
While our findings are generally consistent with those of YourNFTs.org, there is a meaningful divergence in the HTTP and IPFS results. Specifically, YourNFTs.org classified 55% of NFTs as HTTP and 36% as IPFS whereas we classified 40% of NFTs as HTTP and 50% as IPFS. Does this mean that the author of YourNFTs.org is wrong? Not in our opinion. Classifying NFTs based on their storage solutions is a complex and nuanced task. Throughout our analysis, we made subjective decisions that led us to results that we felt comfortable with. Nevertheless, there is more than one way to feed a cat. For example, if we chose to classify attributes stored on IPFS via an HTTP gateway as HTTP rather than IPFS, then 66.44% of NFTs would be classified as HTTP and only 23.81% would be classified as IPFS. Pseudocode summarizing this alternative classification methodology can be viewed here:
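The difference between the two methodologies comes down to one decision, which the following Python sketch isolates behind a flag. It is an illustration, not the authors' pseudocode: the function name, the `gateway_as_http` parameter, and the test for a leading `http://` or `https://` as the mark of a gateway link are assumptions.

```python
def classify_with_gateway_option(value: str, gateway_as_http: bool = False) -> str:
    """Classify an attribute, optionally treating IPFS gateway links as HTTP."""
    text = value.lower()
    if "ipns" in text:
        return "ipns"
    if "ipfs" in text:
        # Assumed gateway test: an IPFS link that is fetched over HTTP(S).
        is_gateway = text.startswith(("http://", "https://"))
        if gateway_as_http and is_gateway:
            return "http"
        return "ipfs"
    if "arweave" in text:
        return "arweave"
    if "http" in text:
        return "http"
    return "on_chain"
```

With `gateway_as_http=True`, a link such as `https://ipfs.io/ipfs/...` moves from the IPFS bucket to the HTTP bucket while a native `ipfs://` link stays put. The reported figures are consistent with a single group of NFTs changing sides: 66.44 − 40.70 = 25.74 percentage points gained by HTTP, and 49.55 − 23.81 = 25.74 percentage points lost by IPFS.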
While we consider both methodologies to be valid, we feel more comfortable with the former, where an attribute stored on IPFS via an HTTP gateway is classified as IPFS rather than HTTP. This is because, even if an HTTP gateway becomes inaccessible, the URLs encoded in a token’s record should contain enough information to infer the original IPFS data to which they pointed. Nevertheless, such gateways are doomed in the long run, and we recommend NFT creators avoid embedding HTTP gateways in their tokens.
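To show why a gateway URL usually carries enough information, here is a minimal Python sketch that recovers a native `ipfs://` URI from the two common gateway URL layouts: path style (`https://host/ipfs/<cid>/...`) and subdomain style (`https://<cid>.ipfs.host/...`). The function name is hypothetical, the identifiers in the usage note are placeholders rather than real content identifiers, and gateways with other layouts are not handled.

```python
from typing import Optional
from urllib.parse import urlparse


def gateway_url_to_ipfs_uri(url: str) -> Optional[str]:
    """Rewrite an HTTP(S) IPFS gateway URL as ipfs://<cid>/<path>, if possible."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return None

    # Path style: /ipfs/<cid>/optional/path
    segments = parsed.path.split("/")
    if len(segments) >= 3 and segments[1] == "ipfs" and segments[2]:
        return "ipfs://" + "/".join(segments[2:])

    # Subdomain style: <cid>.ipfs.<gateway host>
    # urlparse lowercases the hostname, which suits the lowercase
    # identifiers used in subdomain gateways.
    labels = (parsed.hostname or "").split(".")
    if len(labels) >= 3 and labels[1] == "ipfs" and labels[0]:
        return "ipfs://" + labels[0] + parsed.path

    return None
```

For example, `gateway_url_to_ipfs_uri("https://ipfs.io/ipfs/QmExampleCid/1.png")` returns `"ipfs://QmExampleCid/1.png"`, which any other gateway or IPFS node could resolve if the content is still available on the network.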
Of course, we are all doomed in the long run, but with forward-looking NFT storage practices we can ensure our art is not. While storing NFT data on chain may be ideal, it is not an economically viable solution for most works. HTTP-based storage solutions cannot guarantee NFT data will remain online or unchanged and are subject to a single point of failure. IPFS offers a decentralized, cost-effective solution to NFT data storage that can be maintained with minimal oversight. Even if NFT content stored on IPFS becomes inaccessible, it can be re-uploaded with an appropriate backup. In short, ~10% of NFTs are on chain, ~40% are on private servers and are doomed, and the remaining ~50% are on IPFS and can be made safe with a free backup from ClubNFT.
Nick Hladek is an NFT collector, artist, and open-source contributor who currently serves as Head of Data Science at ClubNFT.