AI Counterparty Trust
A threat we're not talking about enough is this: not just that AI might become uncontrollable, but that humans will get very good at controlling AI.
If you are still debating whether you can trust AI with important decisions, you are fighting yesterday's battles. Skilled prompt engineers have already demonstrated that they can manipulate an AI for their own gain.
But that doesn't change anything. We still blindly trust Claude, ChatGPT, and MetaAI for information. Soon, we will blindly trust them to take actions on our behalf.
The question isn't "Can we trust AI with actions?", but rather "What if AI is the only thing we trust?".
We've seen this play out
AI models, like humans, can be too trusting of what they read during training. This reminds me of how some people, like my aunt, believe everything they see online. While it's one thing for someone to privately believe misinformation, it becomes problematic when they:
- Spread that misinformation to others
- Make poor decisions based on false information (like falling for scams)
AI faces a similar challenge. Even though AI models are trained on massive datasets that contain far more accurate information than false information - making them useful for everyday tasks - they can still be misled into believing and stating things that aren't true. This vulnerability is similar to how humans can be deceived by false information online.
Personal AI
I would call this the current state of affairs. We use AI for our own personal needs. We can do role-playing like "imagine I'm a five year old, now explain ... to me". It doesn't matter whether you are really five years old or not. It's garbage in, garbage out.
Even if you believe that all the data it was trained on has been fact-checked, you can still prompt it with false information. The only person who stands to gain or lose from that false information is you. Nothing else changes in the world. So obviously, it's acceptable.
Counterparty AI
But there is a special set of use cases that I think aren't even getting built, because anyone can fool an AI with mere prompt engineering. If you deploy an AI agent as a business without verifying the prompts, it can be weaponized against you. Consider the following:
- Your AI hiring manager? Can be gamed by candidates with fake resumes.
- Your AI customer service? Can be exploited for refunds.
- Your AI content moderator? Blind to coordinated manipulation.
AI has the power to bring efficiencies to society. But these use cases aren't even being discussed, because verifiability of prompts is a messy problem to solve.
- How do we get to an efficient AI courtroom, where disputes can be resolved in days - not decades? If counterparties can submit evidence, and the AI accepts only verified evidence and no other prompts, it can reach a decision orders of magnitude faster.
- How do we get to an AI bureaucracy, where mundane paperwork and inefficiencies give way to quick decisions and actions? If the information provided to the office is verified, that lowers the friction for decision making. And these decisions are not particularly hard - they can be made by an AI. Renewing your driver's license? That shouldn't be too hard for an AI to decide.
- And if you're ambitious enough, how can we get a fair government that's run by an incorruptible AI? That's a dystopian or utopian future depending on your mental models. But it's a question on the table.
All of these massive gains in efficiency stem from an AI replacing human corruption, lethargy, and the physical limits of human parallelization.
It's a future I would like to live in. It would suck if we had a colony on Mars but still had to stand in excruciating lines to renew our driver's licenses.
This isn't the distant future - we already launched a Counterparty AI
Questbook is a grant management tool. Millions of dollars are given out to developers to build software public goods. Thousands of applications come in every month, but only a few dozen get approved. I have been a grant manager myself, managing grants to improve the developer tooling of one of the largest decentralized lending platforms - Compound.
The process of evaluation isn't particularly hard. The core insight is that it's easy to reject applications but hard to accept them. I'm sure many, if not most, of these applications are written using an LLM. And that's completely fine - we don't want developers wasting time writing a beautiful proposal instead of writing code. But as the grant manager, it becomes hard to call out bullshit. The first level of screening is usually to identify whether the team is bullshitting.
Grant Farming the AI
There are some common trends among Grant Farmers.
- They'd repackage something they've already built, and say they'll build this for Compound or some other ecosystem.
- They'd make claims about the team with credentials they do not have.
- They'd take a completely irrelevant product and try to make a convincing case that it is a public good for the ecosystem.
The initial review and filtering is something we can delegate to an AI. We can tune it to err on the side of a few false positives, so that borderline proposals reach a human instead of good teams getting auto-rejected. That way, the turnaround time for many proposals drops from days to seconds, and grant managers can scale further.
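To make this concrete, here is a minimal sketch of what such a first-pass filter could look like. The `llm` client, the rubric in the prompt, and the threshold are illustrative assumptions, not Questbook's actual implementation.

```python
# A minimal first-pass grant screener (a sketch, not Questbook's code).
import json

SCREENING_PROMPT = (
    "Score this grant proposal from 0.0 to 1.0 and list red flags "
    "(repackaged prior work, unverifiable credentials, irrelevance to "
    "the ecosystem). Respond as JSON: {\"score\": ..., \"reasons\": [...]}.\n\n"
)

# Err on the side of false positives: a weak proposal that slips through
# costs one human review; an auto-rejected good team costs a grant.
HUMAN_REVIEW_THRESHOLD = 0.3

def screen_proposal(llm, proposal_text: str) -> tuple[bool, list[str]]:
    """Return (forward_to_human, red_flags) for one application."""
    verdict = json.loads(llm.complete(SCREENING_PROMPT + proposal_text))
    return verdict["score"] >= HUMAN_REVIEW_THRESHOLD, verdict["reasons"]
```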
Am I BalajiS?
The moment word goes out that the first filtering is done by an AI, people are obviously going to try and game the system, because there are, literally, hundreds of thousands of dollars to be exploited. We tried this ourselves: we tried to convince the AI that we are a team led by Balaji Srinivasan, creating a network state for developers who are building tools for a blockchain. It's a noble cause and an important public good that actually gets funded elsewhere.
The AI gave the team a high score for competence. Of course, if Balaji were part of the team, it would indeed have the competence. But just as obviously, Balaji and his team are not the ones applying here. The AI doesn't know that.
So, we turned on verifiability. The AI can determine which data points aren't backed by verifiable proofs and ask the applicant to provide them. In this case, the applicant needed to prove that the profile linked in the application is really Balaji's, that the link actually belongs to him, and that he's indeed associated with the project.
The moment we turn this feature on, you can see below how the AI identifies that no proof of the team's competence was provided, and gives a low score.
With this simple tweak of verifiability, you can scale the review process on Questbook -- even if people know that the reviewer is an AI.
Verifiability
Ok, but how do you get to this future where everything is verified? There are three ways.
API integrations with trusted service providers
If you are prompting something that needs information from Twitter, hit Twitter's APIs directly. If you need verified information from your bank, hit the bank's APIs, and so on.
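As a sketch of the pattern - the endpoint, auth scheme, and response shape below are made up, but the point is that the agent fetches the fact from the provider instead of accepting it from the user's prompt:

```python
# Hypothetical trusted-source lookup: instead of trusting a balance
# claimed in the prompt, fetch it from the bank directly.
import requests

def fetch_verified_balance(account_id: str, api_token: str) -> float:
    resp = requests.get(
        f"https://api.examplebank.com/v1/accounts/{account_id}/balance",
        headers={"Authorization": f"Bearer {api_token}"},
        timeout=10,
    )
    resp.raise_for_status()
    # Trusted because it came from the source, not from the prompt.
    return float(resp.json()["balance"])
```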
This obviously has the downside that you need to individually add integrations for every website in the world, because the set of information you'll have to verify can't be known upfront.
This is actually the bear case for AI - stupidity by silos. AI doesn't get smart enough, not because the technology doesn't exist, but because corporate entities won't talk to one another.
Signatures at source
If you are providing information, you are also required to provide a cryptographic signature from the source of that information.
This is a great future to live in, but signed data is few and far between right now.
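To see what this could look like, here is a sketch using Ed25519 via Python's `cryptography` package. The claim format is made up; the signing and verification calls are the real primitive.

```python
# Signatures at source, sketched with Ed25519. The claim format is
# illustrative; anyone holding the source's public key can verify
# the statement offline, without contacting the source again.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# The source (say, an employer's HR system) signs the statement once.
source_key = Ed25519PrivateKey.generate()
claim = b"alice is employed at CreatorOS as of 2024-01-01"
signature = source_key.sign(claim)

# Anyone - including an AI agent - can verify it later.
public_key = source_key.public_key()
try:
    public_key.verify(signature, claim)
    print("claim verified")
except InvalidSignature:
    print("reject: tampered or unsigned claim")
```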
zkTLS
You can think of zkTLS as a subset of Signatures at source. zkTLS is a stopgap solution for adding signatures to sources - particularly websites - that haven't implemented it yet.
If you provide a public link, you can probably just give it to the AI, and the AI can crawl it in real time. Even better, the AI can have access to Archive.org, so that public links like Wikipedia pages are trustworthy.
zkTLS allows us to go one step beyond.
- News articles these days are behind paywalls that a crawler may not have access to. There's no way an AI will subscribe to every news outlet on the planet.
- Many social media websites are walled gardens. An AI cannot open a LinkedIn post without a LinkedIn account. There's no way an AI will sign up for every social app that launches.
- A lot of information is also siloed in internal portals. Stanford, YC, Microsoft - every organization has its own portals whose data is not accessible from outside.
- Lastly, many pieces of information are in fact private. The fact that I ordered sneakers on Amazon is accessible only to me. The fact that I work for CreatorOS resides only on Rippling.
The key thing to note about signatures and zkTLS is that the verification process for the proofs provided by the user is exactly the same, no matter which website the information comes from: just verify the signatures or zkProofs. So you need to give the AI only one tool - signature/zkProof verification. With that tool, the AI can determine when it needs to verify information and invoke the tool on the proofs the user provides, instead of believing them blindly.
You are not trapped by the great hole of integration.
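Here is a sketch of that single tool, under an assumed proof-envelope format. The envelope fields and the zkTLS verifier are hypothetical; real proof formats vary by protocol.

```python
# One verification tool for the agent, whatever website the information
# came from. The envelope format and zkTLS verifier are hypothetical.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey
from cryptography.exceptions import InvalidSignature

def verify_zktls_proof(proof: bytes) -> bool:
    # Placeholder: delegate to whichever zkTLS verifier the deployment
    # uses; the real interface depends on the chosen protocol.
    raise NotImplementedError

def verify_evidence(envelope: dict) -> bool:
    """Return True only if the claimed data is backed by a valid proof."""
    if envelope["kind"] == "signature":
        key = Ed25519PublicKey.from_public_bytes(envelope["source_pubkey"])
        try:
            key.verify(envelope["signature"], envelope["claim"])
            return True
        except InvalidSignature:
            return False
    if envelope["kind"] == "zktls":
        return verify_zktls_proof(envelope["proof"])
    return False  # unknown proof kinds are never trusted
```

Everything that arrives as plain prompt text stays untrusted; only claims that pass this tool feed the agent's decision.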
The winners are decided
There will be two kinds of AI Agents in the future:
- Those that have built in verifiability, preferably cryptographic verifiability
- Those that go out of business
I don't mean this as hyperbole. In any system where manipulation is possible, manipulation becomes inevitable. Game theory 101.
AI Agents doing anything remotely useful, apart from pure-play entertainment, should trust nothing at all by default. Assume prompt engineers are waiting to jailbreak every single AI agent. Where you can, verify automatically. Cryptography reduces the friction of verification.
Your move
I reject the thesis that Crypto(graphy) is the counterbalancing force for AI. I believe crypto is an accelerating force for AI. I want to see a future where I trust AI with all the mundane decision making of society. A society with no dingy offices everyone hates.
The question is, what will you trust AI with. And when.
Huge thanks to Benji and Cryptpal for their help with this post.