Authors (sorted alphabetically):
Suma Bhat: Assistant Professor in Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign; Research Scholar in Computer Science at Princeton University
Canhui Chen: Ph.D. student in the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, advised by Prof. Zhixuan Fang
Zerui Cheng: Ph.D. student in Electrical and Computer Engineering, Princeton University, advised by Prof. Pramod Viswanath; was an undergraduate student in the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University at the time of publication
Zhixuan Fang: Assistant Professor in the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University
Ashwin Hebbar: Ph.D. student in Electrical and Computer Engineering, Princeton University, advised by Prof. Pramod Viswanath
Sreeram Kannan: Founder and CEO of EigenLayer
Ranvir Rana: Co-founder and CEO of Witness Chain
Peiyao Sheng: Ph.D. student in Computer Science at University of Illinois at Urbana-Champaign, advised by Prof. Pramod Viswanath
Himanshu Tyagi: Co-founder of Sentient; Co-founder of Witness Chain
Pramod Viswanath: Co-founder of Sentient; Co-founder of Witness Chain; Forrest G. Hamrick Professor in Engineering in the Department of Electrical and Computer Engineering, Princeton University
Xuechao Wang: Assistant Professor in the Fintech Thrust at HKUST(GZ)
In this blog post, we briefly introduce the main ideas behind our decentralized AI platform, SAKSHI.
The full paper is available at: https://arxiv.org/pdf/2307.16562
Era of AI.
Artificial Intelligence (AI) has been making steady progress on a variety of tasks (household chores by vacuuming robots, superhuman play in games such as chess and Go, scientific discovery via protein-folding prediction, medical advances through drug discovery), but has broken through the barrier of general intelligence in recent months with the emergence of a new family of generative deep learning models; GPT-4 is the prototypical application capturing the world’s attention, at a tremendous energy price. GPT-4 has super-human mastery over natural language and can comprehend complex ideas, exhibiting proficiency in a myriad of domains such as medicine, law, accounting, computer programming, music, and more. Moreover, GPT-4 can effectively leverage external tools such as search engines, calculators, and APIs to complete tasks with minimal instructions and no demonstrations, showcasing its remarkable ability to adapt and learn from external resources. Such progress portends AI’s forthcoming dominance in mediating (and, in several situations, replacing) human interactions, and promises to make AI the dominant energy-consuming activity for years to come.
Large Generative AI Models.
A class of AI models that is largely representative of this progress is generative AI, which creates content that resembles human-generated content. These models have attracted considerable interest and popularity due to their impressive capabilities in generating high-quality, realistic images, text, video, and music. For instance, large language models (LLMs) like ChatGPT, Bard, and LLaMA attain impressive performance on a wide array of tasks and are being integrated into products such as search engines, coding assistants, and productivity tools like Google Docs. Further, text-to-image models like Stable Diffusion and Midjourney, visual language models like Flamingo, text-to-music models like MusicLM, and text-to-video models like Make-A-Video have shown the immense potential of large multimodal generative AI models. As large generative AI models continue to evolve, we will witness the emergence of numerous fine-tuned and instruction-tuned models catering to specific use cases (e.g., healthcare, finance, law).
While models are growing rapidly, Amazon and Nvidia report that AI inference accounts for up to 90% of the computational resources in AI systems, since inference is demanded far more frequently than model training. In this work, we focus mainly on AI inference tasks, but the flexibility of our layered architecture accommodates a market for model training as well.
Current model: Centralized inference.
The dominant mode of serving these large models is through public inference APIs, offered by the dominant platform companies of today’s economy. For example, the OpenAI API allows users to query models like ChatGPT and DALL-E over a web interface. Although this is a relatively user-friendly option, it is susceptible to the deleterious side effect of centralization: monopolization. Apart from the rent-seeking aspect of a centralized service offering, privacy implications loom large: the human interactions mediated by generative AI models are vastly more personal and intrusive than web browsing and search queries. Addressing the grand challenge of AI computation via the design of decentralized and programmable platforms is the goal of this paper.
Proposed model: Decentralized inference.
In this paper, we propose to decentralize AI inference across servers hosted on consumer devices at the grid edge. Decentralized inference can reduce communication and energy costs by leveraging local computation capabilities. This is made possible by utilizing energy-efficient devices located at the edge, which could potentially be powered by renewable energy sources. Crucially, the energy overhead of running large data centers is largely reduced, simultaneously opening an opportunity to democratize AI whilst limiting its ecological footprint. Such a decentralized platform would also enable the deployment of a library of large customized models in a scalable manner: users can host in-demand customized models on this decentralized cloud and earn appropriate rewards.
Our decentralized AI platform, SAKSHI, is populated by a host of different agents: AI service providers, AI clients, and storage and compute hosting nodes. A carefully designed incentive fabric stitches these agents together into an efficient, trustworthy, and economically fruitful AI platform. Our design of SAKSHI is best visualized in terms of a layered architecture (analogous to network stacks). The layers are enumerated below and visualized in the following figure.
- Service layer. This is the path where the query and response (AI inference) are managed. The goals are high throughput and low latency: the user journey should resemble a standard web2-like service, with the underlying resources (storage, computation) and economic transactions managed in a decentralized and trustless manner.
- Control layer. This is the path where networking and compute/storage load-balancing actions are managed. The decentralized AI models are hosted at multiple locations connected via a (potentially peer-to-peer) network, and our decentralized design borrows from classical web2 content delivery network designs (e.g., Akamai) while also managing the economic transactions in a decentralized and trustless manner.
- Transaction layer. This is the path where billing and metering are conducted. The key is to keep this outside the data path and visible to a broader audience (e.g., via commitments on blockchains); a toy sketch of such metering receipts appears as the first code example after this list. Importantly, this layer is trust-free, crucially enabled via Witness Chain’s transaction-layer service (originally designed for decentralized 5G wireless networks, and now naturally repurposed for decentralized AI services).
- Proof layer. Any disputes over metering and billing are handled here. These proofs also provide resistance to unauthorized usage (e.g., outright copying) of AI models. This layer sits outside not only the data path but also the transaction path, and it allows the formulation of novel research questions at the intersection of large AI models, cryptography, and security. We highlight three such key questions: (i) Proof of inference, where the computation of a specific (deep learning) AI model can be verified (see the second code example after this list); (ii) Proof of ownership, fine-tuning, and watermarking, where downstream modifications to an AI model can be verified; (iii) Proof of service delivery, where the delivery of an AI service can be verified at customizable granularities. These dispute resolutions naturally feed into a reputation system (positive incentives for salutary behavior) or cryptoeconomic security via slashing (negative incentives; see the economic layer below). This new research, outlined in detail in the paper, is joint work between multiple universities (Princeton University, University of Illinois at Urbana-Champaign, Tsinghua University, HKUST) and two blockchain startups, Witness Chain and EigenLayer.
- Economic layer. Up to this point, all transactions can be handled purely via fiat, without the need for a token. This layer explores the benefits of having a token to incentivize participants, in both the transient and long-term stages, and the corresponding economic benefits therein, including EigenLayer integration and related ideas.
- Marketplace. Compositional AI services are naturally enabled in a single atomic transaction. The common data shared on the blockchain leads to the creation of a decentralized marketplace for AI services, where supply and demand allow efficient price discovery. This layer is optional in the first version.
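To make the transaction layer concrete, below is a minimal Python sketch of off-data-path metering: each served query yields a client-signed receipt, and only a compact hash commitment of the receipt batch is posted on chain for billing. Everything here (the receipt fields, the HMAC stand-in for real digital signatures, the function names) is a hypothetical simplification for illustration, not Witness Chain’s actual protocol.

```python
# Illustrative sketch only: off-data-path metering receipts with an on-chain
# commitment. Receipt format, keys, and HMAC signing are hypothetical.
import hashlib
import hmac
import json
import time

CLIENT_KEY = b"client-secret"  # stand-in for the client's signing key

def sign_receipt(receipt: dict) -> str:
    """Client acknowledges a served query; a real system would use ECDSA/EdDSA."""
    payload = json.dumps(receipt, sort_keys=True).encode()
    return hmac.new(CLIENT_KEY, payload, hashlib.sha256).hexdigest()

# Service layer: each served query produces a small receipt, kept off-chain.
receipts = []
for query_id in range(3):
    r = {"query_id": query_id, "model": "llm-v1", "tokens": 512, "ts": time.time()}
    receipts.append({"receipt": r, "sig": sign_receipt(r)})

# Transaction layer: only a compact commitment to the batch goes on-chain,
# keeping billing out of the latency-critical data path yet publicly auditable.
batch = json.dumps(receipts, sort_keys=True).encode()
commitment = hashlib.sha256(batch).hexdigest()
print("on-chain billing commitment:", commitment)

# On a dispute, the full batch is revealed and checked against the commitment,
# and each receipt's signature is verified (handled by the proof layer).
assert hashlib.sha256(batch).hexdigest() == commitment
```

Committing only a hash keeps billing data compact and off the data path, while still letting anyone audit the full receipt batch if a dispute arises.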
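Similarly, here is a toy sketch of the proof-of-inference question from the proof layer: the hosting node commits to every intermediate activation of an inference run, and a verifier challenges one random layer and recomputes just that layer against the opened commitments. The commitment scheme, the toy model, and the challenge protocol are all illustrative assumptions; the actual constructions are among the open research questions discussed in the paper.

```python
# Illustrative sketch only: a toy challenge-response "proof of inference".
# All names and the hash-based commitment are hypothetical simplifications.
import hashlib
import random
import numpy as np

def commit(arr: np.ndarray) -> str:
    """Plain hash of a tensor, standing in for a binding cryptographic commitment."""
    return hashlib.sha256(arr.tobytes()).hexdigest()

class ToyModel:
    """A tiny ReLU MLP standing in for a large AI model."""
    def __init__(self, seed: int = 0, width: int = 8, depth: int = 4):
        rng = np.random.default_rng(seed)
        self.weights = [rng.standard_normal((width, width)) for _ in range(depth)]

    def layer(self, i: int, x: np.ndarray) -> np.ndarray:
        return np.maximum(self.weights[i] @ x, 0.0)

# Prover (hosting node): run inference and commit to every activation.
model = ToyModel()
activations = [np.ones(8)]
for i in range(len(model.weights)):
    activations.append(model.layer(i, activations[-1]))
transcript = [commit(a) for a in activations]  # published with the response

# Verifier: challenge one random layer and recompute only that layer.
# (Assumes the verifier can evaluate the committed model's layers.)
i = random.randrange(len(model.weights))
claimed_in, claimed_out = activations[i], activations[i + 1]  # opened on challenge
assert commit(claimed_in) == transcript[i]
assert commit(claimed_out) == transcript[i + 1]
assert np.allclose(model.layer(i, claimed_in), claimed_out), "proof of inference failed"
print(f"spot-check of layer {i} passed")
```

A single spot-check catches a cheating prover at one layer only with modest probability; practical designs would repeat challenges or use succinct proofs, and verifying against a model the verifier cannot fully inspect is itself part of the research agenda.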
For more details, please continue reading the full paper at: https://arxiv.org/pdf/2307.16562