Next Steps Roadmap

This page provides more details on several key directions on SimWorld’s roadmap. We are happy to share our Unreal project files and assets to all contributors for internal development and research purposes. If you’re interested in collaborating on any item below, please reach out to the corresponding contact listed for that project.

Comprehensive Agent Framework 

We plan to build a general, modular agent framework for autonomous agents in SimWorld, including:

Standardized agent modules (perception, memory, reasoning, and learning) that can be flexibly composed (e.g., dynamic cheat sheets, CoT, reflection)
Gym-compatible interfaces for RL training across a wide range of embodied tasks
Systematic ablations across environments to understand what actually matters for success in long-horizon embodied tasks

If you’re interested in contributing designs or use cases for this framework, please reach out to jic182@ucsd.edu.

Code Generation for Scenes 

We are exploring AI-powered coding agents that programmatically generate rich scenarios and cities:

A scene DSL / API that compiles to SimWorld maps, assets, traffic rules, and scripted events
LLM tooling chains that turn prompts or task specs into executable scene code with validation and preview
Safety and quality checks (asset budgets, collision-free placements, playability tests)
Curated seeds and benchmarks to evaluate diversity, controllability, and realism of generated content

If you have use cases or evaluation ideas for scene code generation, please contact x8ye@ucsd.edu.

Interactive Layout Editor 

We plan to build a web-based layout editor for real-time city visualization and editing:

Live map canvas with layers for roads, zoning, traffic lights, and spawn points
Asset palette and snapping for roads, buildings, props, and scripted triggers with constraint-aware placement
Co-editing and versioning so teams can iterate together and diff/export layouts into UE or SimWorld gym wrappers
Simulation-aware validation (navmesh coverage, connectivity, spawn density, performance budget estimates)

If you are interested in frontend or visualization contributions, please reach out to x8ye@ucsd.edu.

Arbitrary Natural Language → UE Actions 

SimWorld already exposes a rich set of low-level Unreal Engine actions (e.g., move, rotate, interact, pick up). The next step is to support mapping free-form natural language instructions to executable UE actions/tools, for example:

“Walk to the coffee shop on the left, then sit down at the table by the window.”
“Spawn ten pedestrians crossing the main street and record a 20-second video.”

This involves:

Designing an extensible action schema / tool specification for UE actions
Training / prompting llm local planners that ground language into these tools
Providing debugging and visualization tools for action traces

If you are working on language-to-action or tool-use agents and would like to build on SimWorld, please contact lingjun@ucsd.edu.

RL Training Pipeline for SimWorld 

We plan to provide a unified RL training pipeline for diverse embodied tasks (e.g., DeliveryBench) in SimWorld, including:

Gym-like environment wrappers
Standard observation and reward interfaces for embodied tasks
Reference training scripts (e.g., PPO, SAC, multi-agent RL)

This will make it straightforward to run large-scale RL experiments across diverse embodied tasks, and to derive insights that can guide the design of new RL algorithms.

If you are interested in RL research and exploration in embodied simulation settings, please reach out at lingjun@ucsd.edu.

City-Scale Multi-Agent Simulation 

One of SimWorld’s long-term goals is to support city-scale multi-agent simulation with 1K+ concurrent agents in the same city, covering pedestrians, vehicles, service robots, and other interactive entities.

Key directions include:

Scalable simulation backends and load balancing across machines
Rich social and physical interaction patterns between agents
Tools for logging, visualization, and analysis of large-scale behaviors

This direction is especially relevant for research on emergent behavior, social dynamics, and large-scale coordination. If you are interested in pushing city-scale simulations or have industrial use cases, please contact jir015@ucsd.edu.

Video-to-Scene Generation 

We aim to support video-to-scene pipelines that convert real videos into simulation-ready UE scenes:

Camera pose and intrinsics estimation plus multi-view geometry / SLAM for structure recovery
Object detection, tracking, and 3D reconstruction to infer dynamic actors and static layout
Asset mapping to replace reconstructed meshes with SimWorld-ready assets and materials
Temporal consistency and evaluation tools to check fidelity, scale, and replayability of generated scenes

Reference material:

Vid2Sim: https://metadriverse.github.io/vid2sim/
Video2Game: https://video2game.github.io/

If you work on video-to-3D or can share datasets, please contact x8ye@ucsd.edu.

MuJoCo Integration 

We are prototyping MuJoCo as an optional physics backend to complement UE for high-frequency control:

Interchange layer to mirror SimWorld agents, sensors, and actions in MuJoCo while keeping scene semantics
Time-sync and co-sim bridges so planners can mix UE visuals with MuJoCo dynamics when needed
Benchmark tasks (manipulation, legged locomotion, aerial) to compare fidelity and performance across backends

Reference material:

MuJoCo-Unity plugin example: https://github.com/OpenHUTB/mujoco_plugin
MuJoCo in Unity docs: https://mujoco.readthedocs.io/en/latest/unity.html
URoboViz (3D robot visualization): https://github.com/HoangGiang93/URoboViz
Unreal Robotics Lab: https://arxiv.org/html/2504.14135v2

If you want to help shape the MuJoCo bridge or contribute tasks, please reach out to x8ye@ucsd.edu.

Expanded Agents, Actions, and Interactable Objects 

We plan to expand the embodied ecosystem across agents, action spaces, and interactables:

New agent types (e.g., drones, service robots, manipulation platforms) with standardized capability profiles
Richer action schemas (continuous and symbolic) with compatibility across UE tools, RL wrappers, and planners
Broader interactable set (doors, elevators, appliances, IoT props) with consistent affordances and state machines
Evaluation suites to measure coverage, compositionality, and cross-agent interoperability

If you are interested in defining new agent/action specs or supplying assets, please contact x8ye@ucsd.edu.