Utsav Sharma

I've always been frustrated with manual website testing. Clicking through pages, filling out forms, checking responsiveness—it's tedious and error-prone. But existing automated testing tools felt either too rigid (scripted tests that break on any UI change) or too complex (requiring extensive setup and maintenance). I wanted something smarter: a tool that could understand a website like a human tester would, adapt to changes, and provide meaningful feedback.

That's how UI-tester was born—an AI-powered terminal UI that tests websites using real browser automation and LLM analysis. It drives a real browser, generates intelligent test plans based on page content, and produces comprehensive quality reports with actionable insights.

The Vision: Intelligent Testing, Beautiful Interface

The core idea was simple: combine three powerful technologies:

Real Browser Automation (Playwright) - Actually interact with websites like a user would
LLM Intelligence (OpenRouter) - Understand page content and generate adaptive test plans
Beautiful Terminal UI (Ink) - Make the whole process enjoyable to watch and use

Instead of writing brittle test scripts, you just point it at a URL and watch it discover pages, plan tests, execute them, and generate reports—all in a beautiful terminal interface.

Architecture: The Three-Stage Pipeline

The system follows a clean three-stage pipeline:

Planner (LLM) → Executor (Browser) → Judge (LLM)

1. Planner: Understanding Before Testing

The planner (qa/planner.ts) is where the magic starts. It analyzes the DOM structure of a page and uses an LLM to generate an intelligent test plan. Instead of blindly clicking around, it understands:

What the page is trying to accomplish
What key interactions should be tested
What potential issues to look for

The LLM receives the page HTML (with sensitive data redacted), the site's goals, and generates a structured test plan with specific steps. This makes the tests adaptive—if you redesign your homepage, the planner will understand the new structure and create appropriate tests.

// Simplified planner flow
async function generateTestPlan(pageContent: string, goals: string) {
  const prompt = `
    Analyze this page and create a test plan focusing on: ${goals}
    Page content: ${redactSensitiveData(pageContent)}
  `;
  
  const plan = await llm.generate(prompt);
  return parseTestPlan(plan);
}

2. Executor: Real Browser Interaction

The executor (qa/executor.ts) takes the test plan and runs it step-by-step using Playwright. This isn't just checking if elements exist—it's actually:

Clicking buttons and links
Filling out forms (with safe test data)
Navigating between pages
Capturing screenshots at key moments
Recording evidence of what happened

Each step is executed in a real Chromium browser, so you're testing what users actually experience. The executor also handles edge cases gracefully—timeouts, missing elements, navigation issues—and captures evidence for later analysis.

3. Judge: Comprehensive Evaluation

After execution, the judge (qa/judge.ts) analyzes all the evidence—screenshots, DOM snapshots, execution logs—and generates a scored report. The LLM evaluates:

Accessibility issues - Missing alt text, poor contrast, keyboard navigation problems
Usability problems - Confusing flows, broken interactions, unclear CTAs
Performance concerns - Slow loading, layout shifts, rendering issues
Content quality - Broken links, missing content, unclear messaging

Each issue is categorized by severity (critical, high, medium, low) and includes reproduction steps, suggested fixes, and screenshot evidence.

Discovery: Finding All the Pages

One of the trickiest parts was page discovery. The tool needs to find all pages on a site to test comprehensively. I implemented a multi-strategy approach:

Sitemap.xml - If available, parse it for all URLs
Robots.txt - Extract sitemap references
Link Crawling - Follow internal links from the homepage

The discovery phase (utils/sitemap.ts) respects robots.txt rules and can be configured with depth limits to avoid crawling entire massive sites.

Parallel Testing: Speed Meets Quality

Testing pages sequentially would be too slow. I built a parallel testing system (qa/parallelTester.ts) that:

Maintains a pool of browser instances
Tests multiple pages concurrently
Manages resource limits (max parallel browsers)
Aggregates results from all pages

This means testing 10 pages takes roughly the same time as testing 1 page (within browser resource limits).

Terminal UI: Making It Beautiful

The terminal interface (ink/App.tsx) was crucial for making the tool enjoyable to use. Built with Ink (React for CLIs), it provides:

Real-time progress - Watch phases progress live
Colorful logs - Different colors for different log levels
Interactive controls - Scroll through logs, retry on errors
Results summary - Quick overview of score and issues

The UI shows six distinct phases:

Init - Browser startup and initial screenshot
Discovery - Finding pages to test
Planning - LLM generating test plans
Traversal - Testing discovered pages
Execution - Running planned tests
Evaluation - Generating final report

Each phase updates in real-time, so you always know what's happening.

Storage: Local-First Results

All results are saved locally in .ui-qa-runs/<run-id>/:

run.json - Metadata about the run
report.json - Full structured report with scores
evidence.json - Detailed execution evidence
report.md - Human-readable markdown report
llm-fix.txt - Instructions for AI to fix issues
screenshots/ - All captured screenshots

This local-first approach means you own your data and can review results even after the run completes.

Safety First: Ethical Testing

I built several safety features to ensure the tool never causes harm:

Dummy data only - Forms are filled with test@example.com, "Test User", etc.
No payment submission - Detects payment forms and skips submission
Sensitive data redaction - Removes emails, phone numbers, etc. before LLM processing
Timeouts everywhere - All browser operations have timeouts
Controlled navigation - Only follows internal links, respects robots.txt

Technical Challenges

Building this wasn't without challenges:

LLM Token Limits: Page HTML can be massive. I had to implement smart truncation—keeping the structure and key content while removing noise.

Browser Resource Management: Running multiple browsers in parallel requires careful resource management. I implemented a browser pool (utils/browserPool.ts) that reuses instances and manages lifecycle.

Streaming Updates: The terminal UI needs to update in real-time as tests run. I built a streaming architecture (qa/run-streaming.ts) that emits events as phases progress.

Error Recovery: Tests can fail for many reasons—network issues, timeouts, missing elements. The executor needs to gracefully handle failures and continue testing other pages.

What's Next

The tool is already useful, but there's more I want to add:

CI/CD Integration - Run as part of deployment pipelines
Regression Detection - Compare reports across runs to catch regressions
Custom Test Goals - More granular control over what to test
Multi-browser Testing - Test across Chrome, Firefox, Safari
Performance Metrics - Lighthouse integration for performance scores

Try It Out

If you want to test your website (or any website), you can install and run:

npx @utsav/ui-qa https://example.com

Or clone the repo and run locally:

git clone https://github.com/usharma123/UI-tester-
cd UI-tester-
bun install
bun start https://example.com

The tool will discover pages, generate test plans, execute them, and produce a comprehensive report—all while showing beautiful progress in your terminal.

UI-tester demonstrates that testing doesn't have to be boring or brittle. By combining browser automation with LLM intelligence, we can create tools that understand websites like humans do, adapt to changes, and provide meaningful feedback. It's been a fun project to build, and I'm excited to see how it evolves.

Source code: github.com/usharma123/UI-tester-

Building UI-tester: An AI-Powered Terminal UI for Website QA