Testing Guide

OSTwin uses three testing frameworks spanning its PowerShell, Python, and TypeScript codebases. This guide covers the testing architecture, conventions, and CI/CD integration.

Testing Stack

Framework	Language	Scope	Location
Pester 5+	PowerShell	Engine, roles, lifecycle	`.agents/tests/`
pytest	Python	MCP, memory, CLI, API	`.agents/tests/`, `dashboard/tests/`
Cypress	TypeScript	Dashboard E2E	`cypress/e2e/`

Pester Tests

PowerShell tests validate the orchestration engine, role runners, and lifecycle state machines.

Running Tests

# Run the full Pester suite
pwsh -Command "Invoke-Pester .agents/tests/ -Output Detailed"

# Run only the tests that carry a given tag
pwsh -Command "Invoke-Pester .agents/tests/ -Tag 'Unit' -Output Detailed"
pwsh -Command "Invoke-Pester .agents/tests/ -Tag 'Integration' -Output Detailed"

# Run a single test file
pwsh -Command "Invoke-Pester .agents/tests/lifecycle.Tests.ps1 -Output Detailed"

Test Structure

Describe "Start-ManagerLoop" {
    Context "When processing a valid plan" {
        It "Creates war-room directories" {
            # Arrange
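            # Resolve the fixture relative to this test file, not the caller's working directory
            $plan = Get-Content (Join-Path $PSScriptRoot "fixtures/simple-plan.md") -Raw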

            # Act
            $result = Start-ManagerLoop -Plan $plan -DryRun

            # Assert
            $result.Rooms.Count | Should -BeGreaterThan 0
        }
    }

    Context "When a dependency fails" {
        It "Blocks downstream rooms" {
            # Test implementation
        }
    }
}

Naming Convention

Files: *.Tests.ps1
Tags: Unit, Integration, Lifecycle, Channel, DAG

Key Test Areas

Area	Tests
Lifecycle transitions	All 14 states, guard evaluation, action execution
DAG building	Kahn’s algorithm, cycle detection, wave generation
Channel I/O	Message serialization, file locking, concurrent writes
Role spawning	Runner invocation, prompt assembly, timeout handling
Retry mechanics	Counter increment, max retries, auto-transition
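
The DAG row above names Kahn's algorithm, cycle detection, and wave generation. As a rough illustration of the behavior such tests would pin down, here is a minimal Python sketch. The function name `build_waves`, the input shape (a mapping from each room to the rooms it depends on), and the room IDs are assumptions made for illustration; the engine's real implementation may differ.

```python
from collections import defaultdict


def build_waves(dependencies):
    """Group nodes into execution waves using Kahn's algorithm.

    `dependencies` maps each node to the nodes it depends on.
    Raises ValueError if the graph contains a cycle.
    """
    # Collect every node, including ones that only appear as a dependency.
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    indegree = {node: 0 for node in nodes}
    dependents = defaultdict(list)
    for node, deps in dependencies.items():
        for dep in set(deps):  # ignore duplicate edges
            indegree[node] += 1
            dependents[dep].append(node)

    waves = []
    placed = 0
    ready = sorted(n for n in nodes if indegree[n] == 0)
    while ready:
        waves.append(ready)
        placed += len(ready)
        next_ready = []
        for node in ready:
            for child in dependents[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    # Any node never placed still has an unmet dependency, so it sits on a cycle.
    if placed != len(nodes):
        raise ValueError("cycle detected")
    return waves
```

Each wave holds nodes whose dependencies were all satisfied by earlier waves, so a test can assert both the grouping for a valid plan and the `ValueError` for a cyclic one.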

pytest Tests

Python tests cover the MCP server, memory system, CLI tools, and dashboard API.

Running Tests

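# Run both Python suites verbosely
pytest .agents/tests/ -v
pytest dashboard/tests/ -v

# Collect coverage for .agents and write an HTML report
pytest .agents/tests/ -v --cov=.agents --cov-report=html

# Stop at the first failure and print short tracebacks
pytest .agents/tests/ -x --tb=short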

Test Structure

import pytest
from agents.memory import MemoryLedger

class TestMemoryLedger:
    def test_publish_creates_entry(self, tmp_path):
        ledger = MemoryLedger(tmp_path / "ledger.jsonl")
        entry_id = ledger.publish(
            kind="artifact",
            summary="Test artifact",
            tags=["test"],
            room_id="room-001",
            ref="EPIC-001"
        )
        assert entry_id is not None

    def test_supersede_excludes_old_entry(self, tmp_path):
        ledger = MemoryLedger(tmp_path / "ledger.jsonl")
        # "..." stands for the remaining publish() arguments, which are elided here
        old_id = ledger.publish(kind="decision", summary="Old", ...)
        new_id = ledger.publish(kind="decision", summary="New", supersedes=old_id, ...)
        results = ledger.query()
        assert old_id not in [r["id"] for r in results]
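
To make the supersede assertion above concrete, the following is a toy stand-in for the ledger, sketched under assumptions. `SketchLedger` is a hypothetical class, not OSTwin's `MemoryLedger`; it only mirrors the `publish`/`query` call shape used in the tests and assumes an append-only JSONL file.

```python
import json
import uuid
from pathlib import Path


class SketchLedger:
    """Toy append-only JSONL ledger (illustrative only, not MemoryLedger)."""

    def __init__(self, path):
        self.path = Path(path)

    def publish(self, kind, summary, supersedes=None, **extra):
        # Each entry gets a fresh ID; extra keyword arguments are stored as-is.
        entry = {
            "id": uuid.uuid4().hex,
            "kind": kind,
            "summary": summary,
            "supersedes": supersedes,
            **extra,
        }
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")
        return entry["id"]

    def query(self):
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            entries = [json.loads(line) for line in fh if line.strip()]
        # Nothing is deleted: superseded entries are filtered out at read time.
        superseded = {e["supersedes"] for e in entries if e["supersedes"]}
        return [e for e in entries if e["id"] not in superseded]
```

In this sketch, superseding is a read-time filter rather than a rewrite of the file, which is one plausible reason a test would check `query()` results instead of the file contents.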

Naming Convention

Files: test_*.py
Classes: Test*
Functions: test_*

Key Test Areas

Area	Tests
Memory ledger	Publish, query, search, supersede, context generation
Channel module	Read, write, filter, locking
MCP server	Tool registration, request/response, transport
Dashboard API	REST endpoints, WebSocket, authentication
CLI commands	`ostwin run`, `status`, `chat`, `skills`

Cypress Tests

Cypress runs end-to-end tests against the Next.js dashboard.

# Headless run (the mode used in CI)
cd cypress
npx cypress run

# Interactive runner for local debugging
cd cypress
npx cypress open

Test Areas

Area	Tests
Dashboard rendering	War-room cards, status indicators, progress bars
Plan management	Upload plan, view DAG, inspect rooms
Real-time updates	WebSocket state changes, channel messages
Navigation	Route transitions, deep linking

Test Fixtures

Shared test data lives in .agents/tests/:

.agents/tests/
├── sample/
│   └── room-001/        # Complete room fixture
│       ├── config.json
│       ├── lifecycle.json
│       ├── channel.jsonl
│       └── brief.md
├── fixtures/
│   ├── simple-plan.md
│   └── cyclic-plan.md
└── scripts/
    └── mock.sh
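
One way to use the `sample/room-001` fixture without mutating the shared copy is to clone it into a scratch directory per test. The helper below is a sketch under assumptions: `copy_room_fixture` is a hypothetical name, and it relies only on the directory layout shown above.

```python
import shutil
from pathlib import Path


def copy_room_fixture(fixtures_root, destination, room_id="room-001"):
    """Copy a sample room directory into a scratch location and return its path."""
    source = Path(fixtures_root) / "sample" / room_id
    if not source.is_dir():
        raise FileNotFoundError(f"no such room fixture: {source}")
    target = Path(destination) / room_id
    # copytree creates `target` and fails if it already exists,
    # which guards against two tests sharing one scratch room.
    shutil.copytree(source, target)
    return target
```

Inside a pytest test, this could be called as `copy_room_fixture(Path(".agents/tests"), tmp_path)`, so that writes to `channel.jsonl` or `lifecycle.json` stay local to that test.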

Mocking Strategies

Pester provides built-in mocking:

Mock Start-Process { return @{ ExitCode = 0 } }
Mock Get-Content { return '{"status": "developing"}' }

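To mock a command that is called from inside a module, pass -ModuleName to Mock. Wrap the test body in InModuleScope when it needs to call or mock the module's private (non-exported) functions.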

In Python tests, use pytest fixtures and unittest.mock:

from unittest.mock import patch, MagicMock

import pytest

@patch("agents.channel.fcntl.flock")
def test_channel_write_locks(mock_flock, tmp_path):
    # Test file locking behavior
    pass

@pytest.fixture
def mock_room(tmp_path):
    room = tmp_path / "room-001"
    room.mkdir()
    (room / "status").write_text("developing")
    return room
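
The locking test above is left as a stub. A fuller version might look like the sketch below, which is written under assumptions: `append_message` is a hypothetical writer defined here for illustration (it is not the real channel module), and `fcntl` is available only on Unix-like systems.

```python
import fcntl
import json
from unittest.mock import patch


def append_message(path, message):
    """Hypothetical channel writer: append one JSON line under an exclusive lock."""
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # wait until this process owns the file
        try:
            fh.write(json.dumps(message) + "\n")
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)  # release even if the write fails


def test_append_message_locks_then_unlocks(tmp_path):
    channel = tmp_path / "channel.jsonl"
    # Patch the attribute on the fcntl module so no real lock is taken.
    with patch("fcntl.flock") as mock_flock:
        append_message(channel, {"type": "status", "body": "developing"})

    # The second positional argument of each flock call is the lock operation.
    operations = [call.args[1] for call in mock_flock.call_args_list]
    assert operations == [fcntl.LOCK_EX, fcntl.LOCK_UN]
    assert json.loads(channel.read_text()) == {"type": "status", "body": "developing"}
```

Asserting on the order of lock operations, rather than only on the file contents, is what distinguishes a locking test from a plain write test.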

In Cypress tests, use cy.intercept to stub API responses with fixture files:

cy.intercept('GET', '/api/rooms', { fixture: 'rooms.json' })
cy.intercept('GET', '/api/dag', { fixture: 'dag.json' })

CI/CD Integration

Tests run on every PR via GitHub Actions:

Lint — PowerShell ScriptAnalyzer, Python ruff, TypeScript ESLint
Unit tests — Pester (Unit tag), pytest (unit markers)
Integration tests — Pester (Integration tag), pytest (integration markers)
E2E tests — Cypress in headless mode
Coverage report — pytest-cov with minimum threshold
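
The unit and integration stages above rely on pytest markers. One way to declare them is in a `conftest.py`, sketched below. The marker names `unit` and `integration` are inferred from the stage descriptions, and the descriptions attached to them are assumptions rather than project text.

```python
# conftest.py (sketch)

# Assumed marker names and descriptions; adjust to match the project.
MARKERS = {
    "unit": "fast tests with no external processes or network",
    "integration": "tests that exercise several components together",
}


def pytest_configure(config):
    """Register custom markers so `pytest --strict-markers` accepts them."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With markers registered, a test decorated with `@pytest.mark.unit` is selected by `pytest -m unit`, and `pytest -m integration` selects the integration set. If the coverage stage uses pytest-cov, its `--cov-fail-under` option is one way to enforce a minimum threshold.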