VRM (Virtual Reality Markup)

GLOSSARY

What is VRM (Virtual Reality Markup)?

A technical specification for 3D humanoid avatar models designed for use in virtual reality, metaverse applications, and other 3D environments.

VRM (Virtual Reality Markup) is an open, royalty-free file format for 3D humanoid avatars, built as an extension of the glTF 2.0 standard. It defines a comprehensive schema for avatar data, including mesh geometry, materials, blend shapes for facial expressions, and spring bones for dynamic hair and clothing physics. The format's primary goal is to ensure avatar portability and interoperability across different VR/AR platforms, social applications, and game engines, preventing vendor lock-in. It is governed by the VRM Consortium, a Japanese organization promoting its development and adoption.

The specification addresses key challenges in avatar representation through several dedicated extensions. The VRM Humanoid extension defines a standardized bone structure, allowing avatars to be retargeted to different animation rigs. The VRM Blendshape extension controls facial expressions and lip sync via predefined morph targets like "Blink" or "Joy." Furthermore, the VRM Spring Bone system simulates secondary motion for accessories, while VRM FirstPerson settings manage how the avatar is rendered from the user's own viewpoint (e.g., making the head mesh invisible in first-person view).

VRM files are typically created with authoring tools such as UniVRM for Unity, which provides exporters, importers, and validation utilities. A typical workflow involves modeling and rigging a character in a 3D tool like Blender, importing it into Unity, then using UniVRM to add VRM-specific metadata, configure the humanoid mapping, and define expression presets. The resulting avatar can be loaded by VRM-compliant applications such as standalone VRM viewers, VTuber software, and virtual meeting spaces. VRChat uses its own avatar SDK rather than loading .vrm files directly, so a VRM avatar has to be converted before it can be uploaded there.

The adoption of VRM is significant for the open metaverse ecosystem, as it provides a common avatar lingua franca. Unlike proprietary avatar systems, VRM's open specification allows for a decentralized creation economy where users can own, customize, and transfer their digital identity. Its reliance on glTF ensures wide support in modern graphics pipelines and web-based environments using frameworks like Three.js. This technical foundation makes VRM a critical standard for enabling user-generated content and social interaction in interoperable 3D worlds.

Etymology & Origin

The name VRM is easily confused with VRML (Virtual Reality Modeling Language), an unrelated 1990s standard for web-based 3D scenes. The two share a similar acronym but not a lineage.

VRM, the avatar format, was introduced by the Japanese company Dwango in 2018 and has been maintained by the VRM Consortium since 2019. It is a binary, glTF-based container rather than a text markup language, and the acronym is more often expanded as "Virtual Reality Model." VRML, by contrast, originated in the mid-1990s as a text-based, human-readable scene description language, similar in spirit to HTML for web pages, that defined geometry, lighting, and basic interactivity for 3D worlds viewable in web browsers.

The VRML 1.0 specification was finalized in 1995, following proposals from pioneers like Mark Pesce and Tony Parisi. It was derived from the Silicon Graphics (SGI) Open Inventor file format, with its syntax adapted for network transmission. VRML is a declarative language in which authors define a scene graph (a hierarchical tree of nodes representing shapes, transforms, and materials) rather than writing imperative rendering code. This allowed 3D content to be created and shared more easily across different platforms.

While revolutionary for its time, VRML was ultimately superseded by more powerful and efficient technologies, notably X3D (its official successor) and glTF, the contemporary "JPEG of 3D." VRM's own lineage runs through glTF 2.0 rather than VRML: it inherits glTF's meshes, materials, and node hierarchy and adds avatar-specific data on top. The older standard remains relevant mainly as background, since it established scene description concepts that continue to influence immersive media and metaverse development today.

Key Features

VRM (Virtual Reality Markup) is a file format, not a network protocol or a blockchain system. Its features center on describing a single, portable humanoid avatar; world state, scripting, and commerce are left to the applications that load it.

01

Composable Asset Standard

A VRM file is a self-contained asset: mesh data, textures, materials, and avatar metadata are packaged in a single binary glTF container. Because every compliant file exposes the same humanoid skeleton and expression presets, avatars from different creators can be loaded side by side in the same application. The format itself does not define non-fungible tokens (NFTs) or on-chain storage, and it carries no behavioral scripts. A token can reference a VRM file stored elsewhere, but that linkage is outside the specification.

02

Persistent World State

VRM does not maintain world state. The file describes one avatar and contains no ledger, smart contract, or record of object positions, ownership, or interactions. Persistence of a shared environment, including where avatars are and what they have done, is the responsibility of the host application or platform.

03

Spatial Scripting & Logic

VRM contains no scripting language. Avatar behavior is described declaratively, as data that the host application interprets. This covers:

Expression presets that map names like blink or joy to morph targets
Spring bone parameters and colliders for secondary motion
Look-at and first-person view settings

Interactive elements such as doors, switches, vehicles, or game rules belong to the host application, not to the avatar file.

04

Decentralized Economy Layer

VRM has no native blockchain integration. What it does provide is license metadata embedded in the file, so usage terms travel with the avatar. Fields include:

Author and contact information.
Whether commercial use is permitted.
Who may perform as the avatar, and whether violent or sexual depictions are allowed.

Marketplaces and token-based ownership schemes can be layered on top, but payments, royalties, and gated access are implemented by those services rather than by the format.

05

Cross-Platform Interoperability

A core goal of VRM is to let an avatar move between applications. By adhering to the open standard, an avatar exported once can be loaded by any VRM-compliant viewer, game, or social platform without per-platform re-rigging, breaking down walled gardens. This relies on the standardized humanoid bone map and metadata schema, plus a loader in each host application that implements runtime features such as expressions and spring bones.

06

User Identity & Avatars

VRM gives users a portable avatar file that they hold themselves, rather than one locked to a platform account. The avatar is not inherently an NFT, and the format stores no credentials, reputation, or history. A VRM avatar can:

Carry author, license, and usage-permission metadata.
Include clothing and accessories as part of the model.
Be reused across applications, so a user keeps a consistent appearance that is controlled by the user, not the platform.

TECHNICAL PRIMER

How VRM Works

Virtual Reality Markup (VRM) is a file format and ecosystem for representing 3D humanoid avatars in virtual spaces. This section details its core technical architecture and operational workflow.

The VRM specification is built upon the glTF 2.0 standard, a widely adopted 3D transmission format, and extends it with specialized extensions for humanoid avatars. At its core, a VRM file contains a complete 3D model with a defined skeletal structure, materials, textures, and the critical VRM metadata. This metadata is a JSON-based schema that defines avatar-specific properties, including human bone mappings, blend shape presets for facial expressions, first-person view configurations, and licensing information. This layered structure ensures compatibility with standard 3D pipelines while adding the semantic data needed for avatar interoperability.
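To make the layered structure concrete, here is a minimal Python sketch of how a loader might pull the JSON metadata out of a .vrm file. It relies on the fact that .vrm files are distributed as binary glTF (GLB) containers: a 12-byte header followed by a JSON chunk. The function names `read_glb_json` and `build_glb` are hypothetical helpers for illustration, and `build_glb` exists only so the example can run without a real avatar file.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF", read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # ASCII "JSON"


def read_glb_json(data: bytes) -> dict:
    """Return the JSON chunk of a binary glTF (GLB) container as a dict."""
    magic, version, _total_length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a GLB container")
    if version != 2:
        raise ValueError(f"unsupported GLB version: {version}")
    # The first chunk starts right after the 12-byte header.
    chunk_length, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != CHUNK_JSON:
        raise ValueError("first chunk is not JSON")
    return json.loads(data[20:20 + chunk_length].decode("utf-8"))


def build_glb(gltf: dict) -> bytes:
    """Build a JSON-only GLB (no binary buffer), just to exercise the reader."""
    payload = json.dumps(gltf).encode("utf-8")
    payload += b" " * (-len(payload) % 4)  # chunks are padded to 4 bytes
    header = struct.pack("<III", GLB_MAGIC, 2, 12 + 8 + len(payload))
    return header + struct.pack("<II", len(payload), CHUNK_JSON) + payload


if __name__ == "__main__":
    sample = {
        "asset": {"version": "2.0"},
        "extensionsUsed": ["VRM"],
        "extensions": {"VRM": {"meta": {"title": "Sample avatar"}}},
    }
    parsed = read_glb_json(build_glb(sample))
    print(parsed["extensions"]["VRM"]["meta"]["title"])
```

A real loader would go on to read the binary chunk that follows the JSON chunk, which holds vertex data and textures; this sketch stops at the metadata.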

The workflow for using a VRM avatar begins with import and validation. A compatible application, such as a game engine plugin or social VR platform, loads the .vrm file, parses its glTF data, and validates it against the VRM schema. The system then maps the model's skeleton to the standardized humanoid bone hierarchy, so that animations authored for one VRM avatar can be retargeted to another. Blend shapes (for facial expressions like blink or smile) and spring bone physics (for dynamic hair and clothing) are then initialized from the metadata, bringing the static model to life.
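The validation step can be sketched as a check that every required humanoid bone is mapped to a glTF node. The sketch below assumes the VRM 0.x layout, where `humanoid.humanBones` is a list of entries with `bone` and `node` keys, and it assumes the 0.x list of seventeen required bones; both should be checked against the specification version you target. `missing_required_bones` is a hypothetical helper name.

```python
# Required humanoid bones, assuming the VRM 0.x specification.
REQUIRED_BONES = frozenset({
    "hips", "spine", "chest", "neck", "head",
    "leftUpperArm", "leftLowerArm", "leftHand",
    "rightUpperArm", "rightLowerArm", "rightHand",
    "leftUpperLeg", "leftLowerLeg", "leftFoot",
    "rightUpperLeg", "rightLowerLeg", "rightFoot",
})


def missing_required_bones(vrm_ext: dict) -> list[str]:
    """Return the required bone names that have no valid node mapping."""
    mapped = {
        entry.get("bone")
        for entry in vrm_ext.get("humanoid", {}).get("humanBones", [])
        # A mapping only counts if it points at a real node index.
        if isinstance(entry.get("node"), int) and entry["node"] >= 0
    }
    return sorted(REQUIRED_BONES - mapped)


if __name__ == "__main__":
    bones = [{"bone": name, "node": i} for i, name in enumerate(sorted(REQUIRED_BONES))]
    incomplete = {"humanoid": {"humanBones": [b for b in bones if b["bone"] != "head"]}}
    print(missing_required_bones(incomplete))
```

An application would typically reject the avatar, or fall back to a default pose, when the returned list is not empty.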

At runtime, the VRM avatar is driven by input data. This can include user tracking from VR controllers and headsets to control body and head movement, audio input to drive lip-sync visemes via blend shapes, or predefined animation states. The spring bone system calculates secondary motion in real time, adding realistic jiggle and sway to specified parts of the model. For rendering, many avatars use MToon, a cel-shading material defined alongside VRM that gives a consistent, anime-inspired look across platforms and lighting conditions. Standard glTF materials remain available for avatars that do not want a toon style.
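The per-frame spring bone update can be illustrated with a small simulation of one joint. This is a simplified sketch of the general technique (Verlet-style integration followed by a length constraint), not the reference implementation: it ignores colliders and parent rotation, and the names `SpringJoint` and `step` as well as the parameter values are assumptions made for the example.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def length(a: Vec3) -> float:
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)


@dataclass
class SpringJoint:
    head: Vec3          # position of the parent joint (held fixed here)
    rest_dir: Vec3      # unit direction of the bone in its rest pose
    bone_length: float
    current_tail: Vec3
    prev_tail: Vec3


def step(joint: SpringJoint, stiffness: float, drag: float,
         gravity: Vec3, dt: float) -> None:
    """Advance the joint's tail position by one frame."""
    # Carry over last frame's motion, damped by drag.
    inertia = scale(sub(joint.current_tail, joint.prev_tail), 1.0 - drag)
    # Pull toward the rest direction, plus an external force such as gravity.
    restoring = scale(joint.rest_dir, stiffness * dt)
    external = scale(gravity, dt)
    candidate = add(add(add(joint.current_tail, inertia), restoring), external)
    # Length constraint: keep the tail exactly one bone length from the head.
    offset = sub(candidate, joint.head)
    dist = length(offset)
    if dist > 1e-9:
        candidate = add(joint.head, scale(offset, joint.bone_length / dist))
    else:
        candidate = add(joint.head, scale(joint.rest_dir, joint.bone_length))
    joint.prev_tail = joint.current_tail
    joint.current_tail = candidate


if __name__ == "__main__":
    hair = SpringJoint(head=(0.0, 0.0, 0.0), rest_dir=(0.0, -1.0, 0.0),
                       bone_length=1.0, current_tail=(1.0, 0.0, 0.0),
                       prev_tail=(1.0, 0.0, 0.0))
    for _ in range(300):
        step(hair, stiffness=1.0, drag=0.4, gravity=(0.0, -1.0, 0.0), dt=1 / 60)
    print(hair.current_tail)  # the tail swings down and settles near the rest pose
```

The length constraint is what keeps hair or cloth from stretching: forces only change the direction of the bone, never its length.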

A critical aspect of VRM's operation is its focus on creator and user permissions, expressed through its metadata. The format includes fields for authorship, contact information, and allowed usage (e.g., personal use, commercial use, prohibited behaviors). Applications can read this data and apply the license terms automatically, although the format itself has no enforcement mechanism. Furthermore, the specification defines first-person view settings, allowing creators to designate which parts of the avatar's model should be rendered or hidden when viewed from the user's own perspective, preventing visual obstruction in VR.
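As an illustration of how an application might act on the permission metadata, the sketch below checks a requested use against two license fields. It assumes the VRM 0.x field names and values as I understand them (`allowedUserName` with `OnlyAuthor`, `ExplicitlyLicensedPerson`, or `Everyone`, and `commercialUssageName` with `Allow` or `Disallow`, including the spelling used by the 0.x schema); verify these against the specification before relying on them. `usage_problems` is a hypothetical helper.

```python
def usage_problems(meta: dict, *, is_author: bool, commercial: bool) -> list[str]:
    """Return human-readable reasons why the requested use is not permitted."""
    problems: list[str] = []

    # Missing fields are treated as the most restrictive setting.
    allowed_user = meta.get("allowedUserName", "OnlyAuthor")
    if allowed_user != "Everyone" and not is_author:
        problems.append(f"avatar use is restricted ({allowed_user})")

    if commercial and meta.get("commercialUssageName", "Disallow") != "Allow":
        problems.append("commercial use is not allowed")

    return problems


if __name__ == "__main__":
    meta = {
        "title": "Sample avatar",
        "author": "Example Author",
        "allowedUserName": "Everyone",
        "commercialUssageName": "Disallow",
    }
    print(usage_problems(meta, is_author=False, commercial=True))
```

Defaulting to the restrictive value when a field is absent is a design choice of this sketch: it errs on the side of the creator when metadata is incomplete.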

Technical Details: The glTF Extension

An exploration of the glTF extension that defines the VRM format, a standard for 3D humanoid avatars in virtual reality and metaverse applications.

The VRM extension for glTF is a formal specification that adds avatar-specific metadata and constraints to the core glTF (GL Transmission Format) 3D model standard, enabling the creation of portable, humanoid 3D characters for use in virtual reality, games, and social platforms. Defined by the VRM Consortium, this extension standardizes properties like blend shapes for facial expressions, spring bone physics for hair and clothing simulation, and first-person view camera configurations, ensuring avatars behave consistently across different compatible applications and engines.

At its core, the extension adds an avatar-specific object under the extensions property of the glTF JSON (named VRM in the 0.x specification and VRMC_vrm in 1.0), which contains the avatar data. This includes the humanoid skeleton definition, mapping standard bone names (like hips, leftUpperArm) to glTF node indices, and material properties for advanced shading such as MToon, a cel-shaded style popular in anime-style avatars. The meta object within this structure stores crucial information like the avatar's name, author, licensing terms, and a reference thumbnail, making it self-describing and easy to catalog.
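A loader usually has to find the avatar object before it can read anything else. The sketch below looks it up in the glTF `extensions` dictionary and returns the avatar's display name. It assumes the extension key is `VRM` for 0.x files and `VRMC_vrm` for 1.0 files, and that the name is stored as `meta.title` in 0.x and `meta.name` in 1.0; `describe_vrm` is a hypothetical helper.

```python
def describe_vrm(gltf: dict) -> tuple[str, str]:
    """Return (spec generation, avatar name) for a parsed glTF JSON document."""
    extensions = gltf.get("extensions", {})
    if "VRMC_vrm" in extensions:
        meta = extensions["VRMC_vrm"].get("meta", {})
        return ("1.0", meta.get("name", ""))
    if "VRM" in extensions:
        meta = extensions["VRM"].get("meta", {})
        return ("0.x", meta.get("title", ""))
    raise ValueError("no VRM extension found; this is a plain glTF file")


if __name__ == "__main__":
    old_style = {"extensions": {"VRM": {"meta": {"title": "Avatar A"}}}}
    new_style = {"extensions": {"VRMC_vrm": {"meta": {"name": "Avatar B"}}}}
    print(describe_vrm(old_style))
    print(describe_vrm(new_style))
```

Checking for the newer key first means a file that happens to carry both is read with the newer schema.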

A key technical feature is the blend shape group system, which defines preset facial expressions (e.g., Joy, Angry) and viseme shapes for lip-syncing, going beyond the basic morph targets in standard glTF. The secondary animation system, often called spring bone, stores colliders and physics parameters that describe jiggle dynamics for hair, tails, and accessories. The host application's loader runs the simulation at runtime from these parameters, so creators get secondary motion without building a full physics rig for each target platform.
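The relationship between a named expression and raw glTF morph targets can be sketched as a lookup that expands one preset into per-mesh weights. The layout assumed here is VRM 0.x: `blendShapeMaster.blendShapeGroups` is a list of groups with a `presetName` and a list of `binds`, each holding a `mesh` index, a morph target `index`, and a `weight` from 0 to 100. `resolve_expression` is a hypothetical helper.

```python
def resolve_expression(vrm_ext: dict, preset: str, value: float) -> dict[tuple[int, int], float]:
    """Expand a preset expression into {(mesh, morph_target_index): weight in 0..1}."""
    value = max(0.0, min(1.0, value))  # clamp the requested intensity
    groups = vrm_ext.get("blendShapeMaster", {}).get("blendShapeGroups", [])
    for group in groups:
        if group.get("presetName") == preset:
            return {
                # Bind weights are stored as percentages in this layout.
                (bind["mesh"], bind["index"]): value * bind["weight"] / 100.0
                for bind in group.get("binds", [])
            }
    raise KeyError(f"expression preset not defined: {preset}")


if __name__ == "__main__":
    vrm_ext = {
        "blendShapeMaster": {
            "blendShapeGroups": [
                {
                    "presetName": "joy",
                    "binds": [
                        {"mesh": 0, "index": 3, "weight": 100},
                        {"mesh": 0, "index": 5, "weight": 50},
                    ],
                }
            ]
        }
    }
    print(resolve_expression(vrm_ext, "joy", 0.5))
```

The application then writes the resulting values into the morph target weights of the matching meshes each frame, which is how a single "joy" control can drive several shapes at once.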

For practical implementation, the VRM extension is supported by major tools and SDKs. Authoring software like VRM editor and UniVRM for Unity allows creators to export models built in 3D applications like Blender to the .vrm file format, which is a binary glTF 2.0 (GLB) file that carries the VRM extension with its buffers and textures embedded. Runtime loaders, such as those provided by the VRM Consortium, parse this data to reconstruct the avatar with all its humanoid, expression, and physics capabilities intact.

The standardization provided by the glTF VRM extension solves critical interoperability issues in the avatar ecosystem. It allows an avatar created for one social VR platform to be used in another, facilitates the development of avatar marketplaces, and provides a clear technical foundation for user identity in the metaverse. By building upon the widely adopted glTF standard, it leverages existing tooling and performance optimizations for 3D asset delivery while adding the specialized features required for expressive, interactive humanoid characters.

Ecosystem Usage & Adoption

Virtual Reality Markup (VRM) is a standard for 3D humanoid avatars, enabling their creation, distribution, and interoperability across virtual reality (VR), augmented reality (AR), and metaverse platforms.

01

Core Technical Standard

VRM is an open file format based on glTF 2.0, specifically designed for humanoid 3D models. It defines a schema for avatar data, including:

Mesh, materials, and skeletal structure for visual representation.
Blend shapes for facial expressions and lip sync.
Spring bone physics for secondary motion (e.g., hair, clothing).
First-person view configurations and look-at settings for eye tracking.

This standardization allows avatars to be portable across compliant applications.

02

Primary Use Case: Avatar Interoperability

The primary adoption driver is enabling users to own and use a single digital identity across different virtual spaces. A VRM avatar exported once can be loaded by any application that supports the format (e.g., Cluster for social VR or VSeeFace for streaming), breaking down platform silos. Platforms with their own avatar pipelines, such as VRChat, require a conversion step rather than loading .vrm files directly. Portability of this kind fosters user-centric identity and reduces the friction of creating new avatars for each application, a key principle for an open metaverse.

03

Integration with Blockchain & NFTs

VRM files are sometimes distributed as Non-Fungible Tokens (NFTs) on blockchains like Ethereum and Polygon, with the token referencing the avatar file. The VRM specification itself has no blockchain component. This combination enables:

Provable ownership and authenticity of unique digital avatars.
A creator economy where artists can sell avatar assets in marketplaces.
Interoperable digital assets that function as both collectibles and usable identities.

Projects like CryptoAvatars and various NFT marketplaces have adopted VRM as the technical standard for tradable 3D characters.

04

Developer Adoption & Tooling

Widespread support in game engines and tools drives ecosystem growth. Key integrations include:

Unity and UniVRM: The official SDK for importing, creating, and exporting VRM avatars in Unity projects.
Blender via add-ons for 3D modeling and rigging.
WebXR frameworks for displaying VRM avatars in browsers.

This robust toolchain lowers the barrier for developers to support VRM avatars in their VR/AR applications and games.

05

Commercial & Enterprise Applications

Beyond social VR, VRM is used in commercial contexts:

Virtual meetings and conferences where participants use consistent avatars.
Customer service and virtual showrooms with branded avatar representatives.
VTuber (Virtual YouTuber) industry, where many popular creators use VRM-based models for live streaming via software like VSeeFace.

These applications leverage VRM's expressiveness and cross-platform compatibility for professional use.

06

Related Concepts & Ecosystem

VRM exists within a broader ecosystem of 3D and identity standards:

glTF: The foundational 3D transmission format VRM extends.
VMC Protocol: A separate protocol for sending real-time motion data to drive VRM avatars.
Decentralized Identifiers (DIDs): A W3C standard for self-sovereign identity that can be associated with a VRM avatar for verifiable credentials.
Metaverse Standards Forum: An industry group where VRM is discussed alongside other interoperability standards.

SPECIFICATION COMPARISON

VRM vs. Other 3D Avatar Formats

A technical comparison of VRM with other common formats for 3D humanoid avatars, focusing on interoperability, licensing, and runtime features.

Feature	VRM	glTF 2.0	FBX
Primary Purpose	Humanoid avatar interchange for real-time apps	3D asset runtime delivery (PBR)	3D authoring & interchange
File Extension	.vrm	.gltf / .glb	.fbx
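Open Standard	Yes	Yes	No (proprietary, Autodesk)
Built-in Humanoid Definition	Yes	No	No
Expression & Viseme Support	Yes (named presets)	No (raw morph targets only)	No (raw blend shapes only)
Look-At & First-Person Controls	Yes	No	No
Spring Bone (Secondary Animation)	Yes	No	No
Embedded User License Metadata	Yes	No (free-text copyright field only)	No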
Primary Use Case	VTubing, VR/AR, Metaverse	WebGL, mobile apps, games	3D modeling pipeline, game engines

Common Misconceptions

Clarifying frequent misunderstandings about VRM, an open file format for 3D humanoid avatars that is sometimes mistaken for a virtual world or a blockchain protocol.

VRM is not a virtual world or metaverse itself; it is a specification for 3D humanoid avatars. VRM defines a file format and a set of rules for creating interoperable, portable 3D models, primarily for use within various virtual environments, games, and applications. Think of it as the JPEG standard for avatars: it doesn't create the social platform or game world, but it provides a common format for avatar assets that can be used across them, enabling a user's avatar to move between different virtual spaces.

Frequently Asked Questions (FAQ)

Essential questions and answers about VRM, the open standard for 3D humanoid avatars in virtual reality and metaverse applications.

VRM (Virtual Reality Markup) is an open, royalty-free file format specification for 3D humanoid avatar models, designed for use in virtual reality, metaverse platforms, and other 3D applications. It works by extending the glTF 2.0 standard, adding specific metadata and constraints for humanoid avatars, such as bone structure definitions, facial expression blendshapes, and material properties for toon shading. A VRM file packages the 3D model data, textures, and this avatar-specific metadata into a single, portable .vrm file, enabling interoperability between different creation tools, game engines like Unity and Unreal Engine, and virtual platforms.
