Building a User-driven Search Platform for the Permaweb

When data grows at tremendous speed, there is naturally far more content, but also far more time spent finding what you're looking for. I discovered I wasn't the only person looking for a way to search and discover great content on the Arweave blockchain, so I set out to build a solution.

Overview

Arweave is a popular, fast-growing decentralized blockchain that offers data storage and processing capabilities. As more users and app developers flock to Arweave, the need for a reliable self-service search platform becomes more apparent. Decentralized applications (dApps) tend to grow in silos, disconnected from platforms and users outside their immediate ecosystem. With an Arweave-focused search platform, dApp developers can easily add search and user recommendation capabilities, delivering better user experiences and getting more value out of content that already exists. This also reduces friction for new users and increases engagement from existing ones.

Arweave Growth

In 2023, over 30 terabytes of data were stored on the network and nearly 1.8 billion user transactions were recorded. The growth of Arweave and its user base continued into Q1 2024, with data storage up 32% and transactions up more than 900% year over year (Viewblock). Driving this growth is the persistent increase in application JSON data on Arweave, which accounted for 85% of all Content-Types stored in 2023. Image Content-Types (image/png, image/jpeg) have averaged 20-30 million transactions per year over the past several years, but growth currently appears stagnant: 23 million images were stored in 2023 versus 27 million in 2021.

Existing Solutions

Data on Arweave is accessed via ‘Gateways’ using GraphQL query syntax. Because Arweave is a decentralized blockchain, there are many gateways (e.g. Arweave.net, ar-io.net) to choose from in order to minimize latency and ensure availability. Querying Arweave Gateways with GraphQL is reliable and efficient when a user or dApp knows the exact Tag key-value pairs that were originally assigned to the data stored on Arweave.

For example, a transaction could have the Tags:

“App-Name”: “Astro-App”,

“Title”: “FarMarket is live!”,

“Description”: “FarMarket on far.quest is live, buy and sell Farcaster accounts!”

If a user wants to find the most recent transactions for the app “Astro-App”, or the transaction with the case-sensitive title “FarMarket is live!”, they can easily query for it via GraphQL. One Gateway (GoldSky) has implemented a modified approach that allows users to query Tags with fuzzy matching, enabling case-insensitivity and limited wildcard support. This is useful for basic searching and retrieving known data, but even this approach does not meet the need for real-time deep search and content recommendations.
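As a rough sketch of the exact-match lookup described above, the Tag filter can be expressed as a GraphQL request body. The `transactions` query with its `tags`, `first`, and `sort` arguments is assumed to follow the public Arweave gateway GraphQL schema; the helper names `buildTagQuery` and `fetchTransactions`, the default result limit, and the `/graphql` endpoint path are illustrative assumptions rather than part of any official client.

```javascript
// Hypothetical helper: builds a GraphQL request body that matches
// transactions by exact (case-sensitive) Tag name/value pairs.
function buildTagQuery(tags, first = 10) {
  const query = `
    query ($tags: [TagFilter!], $first: Int) {
      transactions(tags: $tags, first: $first, sort: HEIGHT_DESC) {
        edges {
          node {
            id
            tags { name value }
          }
        }
      }
    }`;
  const variables = {
    // Each Tag becomes { name, values: [...] }; multiple filters are ANDed.
    tags: Object.entries(tags).map(([name, value]) => ({ name, values: [value] })),
    first,
  };
  return { query, variables };
}

// Hypothetical helper: posts the request to a gateway (assumes a runtime
// with a global fetch, e.g. Node 18+ or a browser).
async function fetchTransactions(body, gateway = "https://arweave.net") {
  const res = await fetch(`${gateway}/graphql`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  const { data } = await res.json();
  return data.transactions.edges.map((edge) => edge.node);
}

// Most recent transactions for the example app from the Tags above.
const body = buildTagQuery({ "App-Name": "Astro-App" });
```

Passing `body` to `fetchTransactions` would return the matching transaction nodes, newest first. Note that a query for “astro-app” in lowercase would return nothing, which is the limitation the rest of this section is concerned with.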

Evaluating User Demand

Work on a new distributed search technology for on-chain retrieval (DSTOR) began in 2023, and the initial proof of concept (POC) phase is now complete. The POC was made freely available on dstor.io and allowed users to run full-text searches for content. User searches were analyzed with AI to understand context and find the most relevant results, whether a long blog post or an image stored without any accompanying text. DSTOR dynamically indexes transactions on Arweave to identify contextual meaning in text and images, and stores pointers to the source data so that results can be presented via any Arweave Gateway.
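To illustrate the pointer idea, here is a minimal sketch of how an index entry that holds only a transaction ID could be turned into a link on whichever Gateway the caller prefers. The entry shape (`txId`, `title`), the `resolveResult` helper, and the gateway list are hypothetical and are not DSTOR's actual data model; the only assumption about Arweave itself is that a gateway serves a transaction's data at `<gateway>/<transaction ID>`.

```javascript
// Gateways named earlier in this article; any other gateway could be added.
const GATEWAYS = ["https://arweave.net", "https://ar-io.net"];

// Hypothetical helper: the index stores a pointer (the transaction ID)
// rather than the content, so a result can be resolved against any gateway.
function resolveResult(entry, gateway = GATEWAYS[0]) {
  return { ...entry, url: `${gateway}/${encodeURIComponent(entry.txId)}` };
}

// Example with a placeholder transaction ID (not a real transaction).
const hit = { txId: "EXAMPLE_TX_ID", title: "FarMarket is live!" };
const links = GATEWAYS.map((gateway) => resolveResult(hit, gateway).url);
```

Because only the pointer is stored, switching gateways for latency or availability reasons does not require re-indexing anything.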

The initial launch of the POC showed strong organic growth and supportive feedback from users. Site analytics recorded:

  • Month One: ~1,200 users
  • Month Two: ~3,000 users
  • Month Three: ~9,000+ users

With a small team, we began implementing the feedback and suggestions we received, and used the usage patterns observed in the analytics to prioritize our roadmap.

Plans to Launch

DSTOR has been built from the ground up based on user feedback, and is now ready to open its doors so that users and dApp developers can use our self-service Engine Builder. With this intuitive UI wizard, a user can select their source data, specify which AI enrichment features to apply to their searches, and then quickly query their data. DSTOR will also release an embeddable search box, available as a copy-and-paste code snippet, that developers can add to their apps in just minutes:

import { SearchBox } from '@dstor/web'

export function MyApp({ children }) {
  return (
    <div>
      {/* Replace the placeholder with your own DSTOR API key */}
      <SearchBox apiKey="YOUR_API_KEY" />
      {children}
    </div>
  )
}