The V2 PNG Specification: The Heart of Portable AI Personas
If you have ever used local AI roleplay frontends like SillyTavern or TavernAI, you have encountered the ".png" character card. On the surface, it looks like a standard image. However, beneath the pixels lies a sophisticated data structure known as the V2 Specification. This guide explores the technical architecture of these digital containers and why they are essential for the AI creator community.
What is a Character Card?
At its core, a character card packages a large language model prompt into a portable file. Instead of manually copying and pasting descriptions into a chat window, you share a character's "mind" just by sharing their portrait. The V2 specification, established by the community in mid-2023, extended the original V1 format to keep up with the growing complexity of LLM frontends. V1 cards were a flat set of six text fields: name, description, personality, scenario, first message, and example messages. V2 keeps those fields, nests them under a data object with an explicit spec marker, and adds further fields such as a system prompt, alternate greetings, tags, and an embedded character book. The difference is like the gap between a short character sheet and a full dossier with versioning and notes for the game master. This structure matters because LLMs tend to follow clearly sectioned prompts better than wall-of-text descriptions. When you see a character that responds with consistent tone and personality, someone spent time writing those structured fields. The V2 format gives that work a standard home.
The Anatomy of a PNG Metadata Chunk
A PNG file stores its data in blocks called "chunks." Alongside the critical chunks that hold the image itself, the format allows ancillary chunks that carry non-visual data. The V2 spec utilizes two primary chunk types:
- tEXt Chunks: These store uncompressed strings. The key "chara" is used to hold the character data. To ensure that special characters (like emojis or foreign languages) don't break the file, the data is typically Base64 encoded.
- zTXt Chunks: For very large characters with thousands of lines of example dialogue, creators use zTXt chunks. These utilize zlib/deflate compression to shrink the metadata, ensuring the PNG file size remains manageable while containing a massive amount of text.
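The chunk layout described above can be read with nothing but the standard library. The following is a minimal sketch, not the code of any particular frontend: it walks the chunks of a PNG held in memory, looks for a tEXt or zTXt chunk whose keyword is "chara", and returns the stored text. The function names (iter_chunks, read_chara_text, decode_card) are hypothetical, and the sketch assumes the payload is Base64-encoded JSON as described above. It skips CRC verification for brevity.

```python
import base64
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png_bytes):
    """Yield (chunk_type, chunk_data) pairs from raw PNG bytes."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(png_bytes):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        yield ctype, png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length


def read_chara_text(png_bytes):
    """Return the text stored under the 'chara' keyword, or None if absent."""
    for ctype, data in iter_chunks(png_bytes):
        keyword, _, rest = data.partition(b"\x00")
        if keyword != b"chara":
            continue
        if ctype == b"tEXt":
            return rest.decode("latin-1")
        if ctype == b"zTXt":
            # rest[0] is the compression-method byte (0 means deflate).
            return zlib.decompress(rest[1:]).decode("latin-1")
    return None


def decode_card(text):
    """Assumes the stored text is Base64-encoded UTF-8 JSON."""
    return json.loads(base64.b64decode(text).decode("utf-8"))
```

Typical use would be decode_card(read_chara_text(open("card.png", "rb").read())), where card.png is a placeholder filename.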
The V2 JSON Structure
Unlike the flat structure of V1 cards, a V2 card uses a nested JSON object. This separation allows frontends to differentiate between the character's core identity and its metadata. A standard V2 JSON includes:
{
  "spec": "chara_card_v2",
  "spec_version": "2.0",
  "data": {
    "name": "Character Name",
    "description": "Visual and narrative description",
    "personality": "Core behavioral traits",
    "scenario": "The current setting",
    "first_mes": "The opening line of the chat",
    "mes_example": "Example dialogues for training"
  }
}
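A quick structural check can catch cards that do not match this shape before a frontend rejects them. The sketch below is a hypothetical helper, not part of the specification: it only verifies the fields shown in the example above (the full V2 spec defines more), and it reports problems as plain strings rather than raising.

```python
# Fields taken from the example above; the full spec defines additional ones.
EXPECTED_DATA_FIELDS = (
    "name",
    "description",
    "personality",
    "scenario",
    "first_mes",
    "mes_example",
)


def check_v2_card(card):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    if card.get("spec") != "chara_card_v2":
        problems.append("spec should be 'chara_card_v2'")
    if card.get("spec_version") != "2.0":
        problems.append("spec_version should be '2.0'")
    data = card.get("data")
    if not isinstance(data, dict):
        problems.append("data must be a JSON object")
        return problems
    for field in EXPECTED_DATA_FIELDS:
        if not isinstance(data.get(field), str):
            problems.append(f"data.{field} must be a string")
    return problems
```

Returning a list instead of raising on the first error lets an editor show every problem at once.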
Why Compatibility Matters
Because the AI space is fragmented, different tools have different levels of strictness. Some apps might only read tEXt chunks, while others require zTXt. Some expect raw JSON, others expect Base64. Our Character Card Converter is designed to handle all these variations automatically. By normalizing these different "dialects" of the V2 spec, we ensure that your character works whether you are importing into a mobile app or a high-end desktop workstation.
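To illustrate what normalizing these variations can look like, here is a minimal sketch that accepts either raw JSON or Base64-encoded JSON and returns the same Python dictionary. It is an assumption about how such a step could work, not the converter's actual code, and parse_card_text is a hypothetical name. The check relies on the fact that a JSON object starts with "{", a character that never appears in Base64 text.

```python
import base64
import binascii
import json


def parse_card_text(text):
    """Parse card metadata that may be raw JSON or Base64-encoded JSON."""
    text = text.strip()
    if text.startswith("{"):
        # Raw JSON dialect: no decoding step needed.
        return json.loads(text)
    try:
        raw = base64.b64decode(text, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is neither JSON nor valid Base64") from exc
    return json.loads(raw.decode("utf-8"))
```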
Common V2 Issues and How to Fix Them
Even with the V2 spec being well-documented, creators still run into problems. Here are the most common issues I see in the community:
- Base64 encoding errors: If a character card won't load, the Base64 string might have line breaks in the wrong places. This usually happens when someone copies JSON from a web page and pastes it into a card editor. Our converter handles this automatically.
- Missing spec version: Some older cards are marked "chara_card_v1" instead of "chara_card_v2", or carry no spec field at all. These cards are flat and lack the nested data object and the fields that V2 added. Convert them to V2 so that frontends expecting the nested layout can read them.
- Character limit exceeded: The PNG format itself allows a single chunk to hold up to 2^31 - 1 bytes, but individual apps and hosting sites may enforce much smaller limits on metadata or total file size. If you have a character with thousands of lines of dialogue, you might hit this wall. The solution is to use zTXt compression or split your character into multiple cards.
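Two of these fixes are small enough to sketch in a few lines. The first helper removes stray whitespace from a Base64 string and restores any padding lost in copy and paste. The second wraps an older flat card in the V2 layout, leaving any field the old card lacks as an empty string. Both function names are hypothetical, and the upgrade assumes the older card is a flat JSON object that uses the same field names as V2.

```python
import base64
import json

V2_TEXT_FIELDS = (
    "name",
    "description",
    "personality",
    "scenario",
    "first_mes",
    "mes_example",
)


def clean_base64(text):
    """Drop whitespace and line breaks, then restore '=' padding."""
    compact = "".join(text.split())
    return compact + "=" * (-len(compact) % 4)


def upgrade_to_v2(card):
    """Wrap a flat card in the V2 layout; V2 cards are returned unchanged."""
    if card.get("spec") == "chara_card_v2" and isinstance(card.get("data"), dict):
        return card
    return {
        "spec": "chara_card_v2",
        "spec_version": "2.0",
        # Assumption: missing fields become empty strings.
        "data": {field: card.get(field, "") for field in V2_TEXT_FIELDS},
    }
```

For example, json.loads(base64.b64decode(clean_base64(broken_text))) recovers the card from a string that was wrapped across several lines.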
How Our Converter Handles V2 Cards
When you upload a PNG character card to our site, we extract the metadata chunk, decode it from Base64, parse the JSON, and then output it in whatever format you need. We handle the encoding, compression, and structure automatically. You don't need to understand how PNG chunks work — just upload your card and pick your destination format. The tool does the heavy lifting.
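When the destination format is itself a PNG card, the last step runs in reverse: the JSON is serialized, Base64 encoded, and written into a new tEXt chunk placed just before the closing IEND chunk. The sketch below illustrates that idea under stated assumptions. It is not the converter's actual implementation, and make_chunk and embed_card are hypothetical names. Any existing "chara" chunk is dropped so that the file ends up with exactly one.

```python
import base64
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype, data):
    """Build a PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def embed_card(png_bytes, card):
    """Return new PNG bytes carrying `card` in a tEXt chunk keyed 'chara'."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    payload = base64.b64encode(json.dumps(card).encode("utf-8"))
    new_chunk = make_chunk(b"tEXt", b"chara\x00" + payload)
    out = [PNG_SIGNATURE]
    pos = 8
    while pos + 12 <= len(png_bytes):
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        end = pos + 12 + length
        body = png_bytes[pos + 8:pos + 8 + length]
        if ctype == b"IEND":
            # Insert the card right before the end marker, then stop.
            out.append(new_chunk)
            out.append(png_bytes[pos:end])
            break
        old_card = ctype in (b"tEXt", b"zTXt") and body.startswith(b"chara\x00")
        if not old_card:
            out.append(png_bytes[pos:end])
        pos = end
    else:
        raise ValueError("no IEND chunk found")
    return b"".join(out)
```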
The Future of Portability
As we move toward more advanced formats like V3 and specialized schemas for voice-enabled companions like Voxta, the principles of the V2 spec remain the foundation. The goal is simple: data sovereignty. Your characters belong to you, not the platform you created them on. By understanding the technology behind the card, you can ensure your digital creations remain accessible for years to come.
Coming Soon: CharacterCardGenerator
We are building a new tool to help you create character cards from scratch. Instead of manually editing JSON or using clunky editors, CharacterCardGenerator.com will let you describe your character in plain English and generate a properly formatted card in seconds. Think of it as a character card AI assistant that handles all the technical details for you. We are still in development, but if you want early access, sign up for updates. It will be free to start with a credit system for power features.
Coming Soon: Python Right-Click Runner
Tired of manually converting cards? We are developing a Windows tool that lets you right-click any PNG and convert it instantly using our engine.
