
Testing Guide

OSTwin uses three testing frameworks spanning its PowerShell, Python, and TypeScript codebases. This guide covers the testing architecture, conventions, and CI/CD integration.

Testing Stack

| Framework | Language | Scope | Location |
|---|---|---|---|
| Pester 5+ | PowerShell | Engine, roles, lifecycle | .agents/tests/ |
| pytest | Python | MCP, memory, CLI, API | .agents/tests/, dashboard/tests/ |
| Cypress | TypeScript | Dashboard E2E | cypress/e2e/ |

Pester Tests

PowerShell tests validate the orchestration engine, role runners, and lifecycle state machines.

Running Tests

pwsh -Command "Invoke-Pester .agents/tests/ -Output Detailed"

Test Structure

Describe "Start-ManagerLoop" {
    Context "When processing a valid plan" {
        It "Creates war-room directories" {
            # Arrange
            $plan = Get-Content "tests/fixtures/simple-plan.md" -Raw
            # Act
            $result = Start-ManagerLoop -Plan $plan -DryRun
            # Assert
            $result.Rooms.Count | Should -BeGreaterThan 0
        }
    }
    Context "When a dependency fails" {
        It "Blocks downstream rooms" {
            # Test implementation
        }
    }
}

Naming Convention

  • Files: *.Tests.ps1
  • Tags: Unit, Integration, Lifecycle, Channel, DAG

Key Test Areas

| Area | Tests |
|---|---|
| Lifecycle transitions | All 14 states, guard evaluation, action execution |
| DAG building | Kahn’s algorithm, cycle detection, wave generation |
| Channel I/O | Message serialization, file locking, concurrent writes |
| Role spawning | Runner invocation, prompt assembly, timeout handling |
| Retry mechanics | Counter increment, max retries, auto-transition |
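
To make the DAG row above concrete, here is a minimal Python sketch of Kahn's algorithm extended to emit waves and detect cycles. It is an illustration only, not OSTwin's engine code (which is PowerShell); the function name `build_waves` and the input shape (room id mapped to the ids it depends on) are assumptions made for this example.

```python
from collections import defaultdict


def build_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group rooms into waves with Kahn's algorithm.

    `deps` maps each room id to the ids it depends on (assumed shape).
    Raises ValueError when the dependency graph contains a cycle.
    """
    # Collect every node, including ones that only appear as a dependency.
    nodes = set(deps)
    for parents in deps.values():
        nodes.update(parents)

    indegree = {node: 0 for node in nodes}
    children = defaultdict(list)
    for node, parents in deps.items():
        for parent in parents:
            indegree[node] += 1
            children[parent].append(node)

    # The first wave is every node without unmet dependencies.
    wave = sorted(node for node in nodes if indegree[node] == 0)
    waves = []
    processed = 0
    while wave:
        waves.append(wave)
        processed += len(wave)
        next_wave = []
        for node in wave:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_wave.append(child)
        wave = sorted(next_wave)

    # Nodes that never reach indegree 0 are part of a cycle.
    if processed != len(nodes):
        raise ValueError("cycle detected in plan dependencies")
    return waves
```

A test for wave generation can assert on the returned list of lists, and a cycle-detection test can assert that a plan such as the `cyclic-plan.md` fixture raises an error.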

pytest Tests

Python tests cover the MCP server, memory system, CLI tools, and dashboard API.

Running Tests

pytest .agents/tests/ -v
pytest dashboard/tests/ -v

Test Structure

import pytest
from agents.memory import MemoryLedger


class TestMemoryLedger:
    def test_publish_creates_entry(self, tmp_path):
        ledger = MemoryLedger(tmp_path / "ledger.jsonl")
        entry_id = ledger.publish(
            kind="artifact",
            summary="Test artifact",
            tags=["test"],
            room_id="room-001",
            ref="EPIC-001"
        )
        assert entry_id is not None

    def test_supersede_excludes_old_entry(self, tmp_path):
        ledger = MemoryLedger(tmp_path / "ledger.jsonl")
        # "..." stands for the remaining publish() arguments, elided here
        old_id = ledger.publish(kind="decision", summary="Old", ...)
        new_id = ledger.publish(kind="decision", summary="New", supersedes=old_id, ...)
        results = ledger.query()
        assert old_id not in [r["id"] for r in results]
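
The supersede behaviour that the second test asserts can be illustrated with a small stand-in. The sketch below is hypothetical: `SketchLedger` is not the real `MemoryLedger`, its storage format and optional arguments are assumptions, and it only mirrors the `publish` keyword names and the `query()` call that appear in the tests above.

```python
import json
import tempfile
import uuid
from pathlib import Path


class SketchLedger:
    """Hypothetical append-only JSONL ledger, for illustration only."""

    def __init__(self, path):
        self.path = Path(path)

    def publish(self, kind, summary, tags=(), room_id=None, ref=None, supersedes=None):
        # Append one JSON object per line and return its generated id.
        entry = {
            "id": uuid.uuid4().hex,
            "kind": kind,
            "summary": summary,
            "tags": list(tags),
            "room_id": room_id,
            "ref": ref,
            "supersedes": supersedes,
        }
        with self.path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(entry) + "\n")
        return entry["id"]

    def query(self):
        # Return every entry that no later entry has superseded.
        if not self.path.exists():
            return []
        lines = self.path.read_text(encoding="utf-8").splitlines()
        entries = [json.loads(line) for line in lines if line.strip()]
        superseded = {e["supersedes"] for e in entries if e["supersedes"]}
        return [e for e in entries if e["id"] not in superseded]
```

Under these assumptions, publishing a second entry with `supersedes=old_id` leaves the old entry on disk but hides it from `query()`, which is the property the test checks.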

Naming Convention

  • Files: test_*.py
  • Classes: Test*
  • Functions: test_*

Key Test Areas

| Area | Tests |
|---|---|
| Memory ledger | Publish, query, search, supersede, context generation |
| Channel module | Read, write, filter, locking |
| MCP server | Tool registration, request/response, transport |
| Dashboard API | REST endpoints, WebSocket, authentication |
| CLI commands | ostwin run, status, chat, skills |

Cypress Tests

End-to-end tests for the Next.js dashboard.

Running Tests

cd cypress
npx cypress run

Test Areas

| Area | Tests |
|---|---|
| Dashboard rendering | War-room cards, status indicators, progress bars |
| Plan management | Upload plan, view DAG, inspect rooms |
| Real-time updates | WebSocket state changes, channel messages |
| Navigation | Route transitions, deep linking |

Test Fixtures

Shared test data lives in .agents/tests/:

.agents/tests/
├── sample/
│   └── room-001/          # Complete room fixture
│       ├── config.json
│       ├── lifecycle.json
│       ├── channel.jsonl
│       └── brief.md
├── fixtures/
│   ├── simple-plan.md
│   └── cyclic-plan.md
└── scripts/
    └── mock.sh
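
When a test needs a writable copy of a room rather than the shared fixture, one option is to build the same layout in a temporary directory. The helper below is a sketch under assumptions: `make_room_fixture` is a hypothetical name, and the file contents (a `room_id` key, a `status` of `developing`) are placeholders, not the real schema of these files.

```python
import json
import tempfile
from pathlib import Path


def make_room_fixture(base, room_id="room-001"):
    """Create a throwaway room directory mirroring the sample layout.

    File contents are placeholders (assumed), not the real schemas.
    """
    room = Path(base) / room_id
    room.mkdir(parents=True)
    (room / "config.json").write_text(
        json.dumps({"room_id": room_id}), encoding="utf-8"
    )
    (room / "lifecycle.json").write_text(
        json.dumps({"status": "developing"}), encoding="utf-8"
    )
    # An empty channel: one JSON message per line would be appended here.
    (room / "channel.jsonl").write_text("", encoding="utf-8")
    (room / "brief.md").write_text("# Brief\n", encoding="utf-8")
    return room
```

In a pytest suite this could be wrapped in a fixture that passes `tmp_path` as `base`, so each test gets an isolated room that is cleaned up automatically.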

Mocking Strategies

Pester provides built-in mocking:

Mock Start-Process { return @{ ExitCode = 0 } }
Mock Get-Content { return '{"status": "developing"}' }

Use InModuleScope for mocking private functions.
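
On the Python side, a comparable approach is available through the standard library's `unittest.mock`. The sketch below assumes a hypothetical helper, `run_role`, that shells out to a runner; it is not part of OSTwin's API and only shows how the process call can be replaced in a test, much like `Mock Start-Process` above.

```python
import subprocess
from unittest import mock


def run_role(command):
    """Hypothetical helper: run a role runner and return its exit code."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode


def test_run_role_reports_success():
    # Stand-in result so that no real process is started.
    fake = subprocess.CompletedProcess(
        args=["runner"], returncode=0, stdout="", stderr=""
    )
    with mock.patch("subprocess.run", return_value=fake) as patched:
        assert run_role(["runner", "--room", "room-001"]) == 0
    # The helper should have invoked the runner exactly once.
    patched.assert_called_once_with(
        ["runner", "--room", "room-001"], capture_output=True, text=True
    )
```

Patching `subprocess.run` works here because `run_role` looks the function up on the `subprocess` module at call time; code that does `from subprocess import run` would need the patch applied to the importing module instead.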

CI/CD Integration

Tests run on every PR via GitHub Actions:

  1. Lint — PowerShell ScriptAnalyzer, Python ruff, TypeScript ESLint
  2. Unit tests — Pester (Unit tag), pytest (unit markers)
  3. Integration tests — Pester (Integration tag), pytest (integration markers)
  4. E2E tests — Cypress in headless mode
  5. Coverage report — pytest-cov with minimum threshold