Building MarkovExplorer: Visualizing and Optimizing MDPs

2025-09-04
reinforcement-learning, web-development, visualization, nextjs, typescript, mdp


 

Introduction

I watched a recent Veritasium video that inspired me to try visualizing Markov chains, and the experiment turned into a personal project to make Markov Decision Processes (MDPs) more approachable. While textbooks explain the theory, it can be hard to see how individual transitions and rewards add up. I wanted a tool that lets me build an MDP visually, run simulations, and watch reinforcement-learning algorithms improve a policy in real time. That idea grew into MarkovExplorer, a web application that combines a drag-and-drop interface with a robust simulation and optimization engine.

 

What It Does

The application focuses on three core tasks:

  1. MDP Configuration & Visualization – An interactive builder lets you create states, actions, and transitions. A dynamic graph shows how everything connects, and the model is validated as you go.
  2. Monte Carlo Simulation – With the press of a button you can simulate thousands of episodes. The tool reports distributions of rewards, terminal states, action usage and common paths.
  3. Agent Optimization – A built-in optimizer runs value iteration, policy iteration, Q-learning and Monte Carlo policy search. It compares baseline and optimized policies and even rates the confidence of the results.

On top of that, charts and summaries give deeper insight: histograms for reward distributions, pie charts for terminal states and transition-frequency analyses, all generated in real time. An AI-powered interpreter highlights interesting trends or potential issues in the results.

 

Building the Application

MarkovExplorer lives in a Next.js 14 + TypeScript environment. The mdp-viz directory holds the entire frontend:

src/
├── app/                    # Next.js app router
├── components/             # Graphs, configurator & result widgets
├── lib/                    # Simulation & optimization logic
└── types/                  # Shared TypeScript definitions
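
The shared definitions in types/ describe states, actions, and transitions. A rough sketch of what those shapes might look like follows; the field names here are illustrative, not necessarily the ones the project uses:

// Illustrative shapes for the shared MDP definitions in types/;
// the real field names in MarkovExplorer may differ.
export interface Transition {
  to: string;          // id of the target state
  probability: number; // probabilities for one action should sum to 1
  reward: number;      // immediate reward for taking this transition
}

export interface ActionDef {
  id: string;
  transitions: Transition[];
}

export interface StateDef {
  id: string;
  terminal: boolean;
  actions: ActionDef[];
}

export interface MDPModel {
  states: StateDef[];
  startState: string;     // id of the initial state
  discountFactor: number; // gamma, typically in [0, 1)
}

Nesting transitions under actions keeps the graph view and the simulator reading from the same single model object.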

Core Libraries

  • Next.js & React for the interface.
  • Tailwind CSS for styling.
  • Chart.js and recharts for visualizations.
  • Zod for runtime validation (see the schema sketch after this list).
  • vis-network to render the MDP graph.
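
Zod pulls a lot of weight here: user-built models can be checked at runtime before they ever reach the simulator. A minimal sketch of what such a schema could look like, reusing the illustrative transition shape from above:

import { z } from "zod";

// Hypothetical schema mirroring the Transition/ActionDef shapes sketched earlier.
const TransitionSchema = z.object({
  to: z.string().min(1),
  probability: z.number().min(0).max(1),
  reward: z.number(),
});

const ActionSchema = z
  .object({
    id: z.string().min(1),
    transitions: z.array(TransitionSchema).min(1),
  })
  .refine(
    (a) => Math.abs(a.transitions.reduce((sum, t) => sum + t.probability, 0) - 1) < 1e-6,
    { message: "Transition probabilities for an action must sum to 1" }
  );

// ActionSchema.parse(userInput) throws with a readable error if the model is malformed.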

 

Simulation Engine

The simulation lives in lib/sim.ts. It executes episodic Monte Carlo runs with configurable parameters such as episode count, maximum steps and starting state. During each run it tracks:

  • cumulative reward,
  • path length,
  • state transitions,
  • action frequencies.

The results feed the charts and statistics in the UI.
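
To make the loop concrete, here is a simplified sketch of a single episodic rollout over the illustrative model shape from earlier; it captures the idea behind lib/sim.ts rather than its actual code:

// Simplified episodic rollout using the illustrative types from above.
function runEpisode(
  model: MDPModel,
  policy: Record<string, string>,
  maxSteps: number
): { totalReward: number; path: string[] } {
  let stateId = model.startState;
  let totalReward = 0;
  const path: string[] = [stateId];

  for (let step = 0; step < maxSteps; step++) {
    const state = model.states.find((s) => s.id === stateId);
    if (!state || state.terminal || state.actions.length === 0) break;

    // Follow the policy if it names a valid action, else fall back to the first one.
    const action =
      state.actions.find((a) => a.id === policy[stateId]) ?? state.actions[0];

    // Sample the next transition according to its probability.
    let r = Math.random();
    let chosen = action.transitions[action.transitions.length - 1];
    for (const t of action.transitions) {
      if (r < t.probability) {
        chosen = t;
        break;
      }
      r -= t.probability;
    }

    totalReward += chosen.reward;
    stateId = chosen.to;
    path.push(stateId);
  }

  return { totalReward, path };
}

Running this a few thousand times and aggregating total reward, path length, and visit counts yields the kind of distributions the charts display.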

 

Optimization Engine

The optimizer (lib/optimizer.ts) tries multiple strategies automatically:

  • Value Iteration and Policy Iteration for tabular planning.
  • Q-Learning for model-free improvement.
  • Monte Carlo Policy Search for direct policy optimization.

Each method is validated by running fresh simulations, and the best policy is presented with a confidence score. The UI component AgentOptimizer.tsx shows the step-by-step progress and final evaluation.
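
Value iteration is the easiest of these to sketch. Using the illustrative model shape from earlier, the Bellman backup and greedy policy extraction look roughly like this; again, a sketch, not the project's optimizer:

// Sketch of value iteration over the illustrative MDPModel shape above.
// Bellman backup: V(s) = max_a sum_t p(t) * (r(t) + gamma * V(next(t)))
function valueIteration(model: MDPModel, tolerance = 1e-6): Record<string, string> {
  const V: Record<string, number> = {};
  for (const s of model.states) V[s.id] = 0;

  const qValue = (a: ActionDef): number =>
    a.transitions.reduce(
      (sum, t) => sum + t.probability * (t.reward + model.discountFactor * V[t.to]),
      0
    );

  // Sweep until the largest value change falls below the tolerance.
  let delta = Infinity;
  while (delta > tolerance) {
    delta = 0;
    for (const s of model.states) {
      if (s.terminal || s.actions.length === 0) continue;
      const best = Math.max(...s.actions.map(qValue));
      delta = Math.max(delta, Math.abs(best - V[s.id]));
      V[s.id] = best;
    }
  }

  // Extract the greedy policy from the converged values.
  const policy: Record<string, string> = {};
  for (const s of model.states) {
    if (s.terminal || s.actions.length === 0) continue;
    policy[s.id] = s.actions
      .reduce((best, a) => (qValue(a) > qValue(best) ? a : best), s.actions[0]).id;
  }
  return policy;
}

The resulting policy can then be fed back into the Monte Carlo simulator, which mirrors how each method here is validated with fresh simulations.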

 

User Experience

Everything revolves around feedback. The interface highlights configuration errors immediately, updates charts live and keeps controls responsive on mobile screens. Saving and loading presets makes experimentation quick, and the 3D graph view gives a tangible sense of navigating a state space.

 

Lessons Learned

  • Interactive visuals make theory concrete – Seeing an MDP evolve on-screen clarifies abstract concepts like discount factors and policy convergence.
  • Validation is key – Running Monte Carlo tests after theoretical optimization prevents overfitting to the model.
  • TypeScript + Zod – Static and runtime type checks eliminated whole classes of bugs early.

 

What's Next

The roadmap includes gap minimization to better align theoretical and empirical results, export/import of models and richer policy comparison tools. Eventually I'd like to support multi-objective optimization and advanced visualizations such as animated 3D trajectories.

 


MarkovExplorer shows that reinforcement learning concepts don't have to stay in research papers. With a browser and some curiosity, anyone can explore how intelligent agents learn to act.