Skip to main content

Ticketing Agent

The Ticketing Agent is an AI SRE agent that consumes anomalies from the pipeline, correlates them with trace data, and creates deduplicated incident tickets across multiple platforms.

Supported Platforms

PagerDuty

Severity-based routing with auto-acknowledgment and escalation policies.

Jira

Project/issue type mapping with custom fields and assignment rules.

ServiceNow

CMDB integration with assignment groups and priority mapping.

OpsGenie

Team-based routing with on-call schedules and escalation.

Slack

Webhook notifications with thread updates for incident progression.

Zammad

Self-hosted ticketing for air-gapped or on-prem deployments.

How It Works

Kafka "enriched-logs" (anomaly events)


┌─────────────────────────┐
│    Anomaly Consumer     │  Consume anomaly-flagged documents
│    (Kafka consumer)     │
└────────────┬────────────┘


┌─────────────────────────┐
│   Deduplication Engine  │  Group by service + error pattern
│   (time window + hash)  │  Prevent duplicate tickets
└────────────┬────────────┘


┌─────────────────────────┐
│   Trace Correlation     │  Attach request timeline
│   (blast radius)        │  Compute affected services
└────────────┬────────────┘


┌─────────────────────────┐
│   LLM Summarization     │  Generate human-readable
│   (optional)            │  incident summary
└────────────┬────────────┘


┌─────────────────────────┐
│   Severity Router       │  Route to platforms based on
│   (per-severity rules)  │  configured routing rules
└────────────┬────────────┘

    ┌────────┼────────┬──────────┐
    ▼        ▼        ▼          ▼
PagerDuty  Jira  ServiceNow   Slack ...

Configuration

Platform Configuration

Each platform requires specific credentials. These are stored in the cluster’s secret store (AWS Secrets Manager, GCP Secret Manager, Vault, or Azure Key Vault) and synced via External Secrets Operator.
FieldRequiredDescription
routingKeyYesPagerDuty Events API v2 routing key
{
  "pagerduty": {
    "enabled": true,
    "routingKey": "your-routing-key"
  }
}

Severity Routing

Configure which platforms receive incidents based on severity:
{
  "routing": {
    "critical": ["pagerduty", "slack", "jira"],
    "high": ["jira", "slack"],
    "medium": ["jira"],
    "low": ["slack"]
  }
}

Anomaly Settings

ParameterDefaultDescription
minimumScore0.85Minimum anomaly score to create an incident
lookbackWindow15mTime window for anomaly grouping
{
  "anomaly": {
    "minimumScore": 0.85,
    "lookbackWindow": "15m"
  }
}

LLM Configuration

The agent optionally uses an LLM to generate human-readable incident summaries:
{
  "llm": {
    "provider": "openai",
    "model": "gpt-4",
    "apiKey": "sk-..."
  }
}

Incident Lifecycle

Incidents progress through these states:
OPEN → ACKNOWLEDGED → RESOLVED
  │                       ▲
  └── ESCALATED ──────────┘
StateDescription
openNew incident, not yet acknowledged
acknowledgedTeam has seen the incident
resolvedIssue has been fixed
escalatedEscalated to a higher-priority platform or team

Incident Actions

# List incidents
GET /api/incidents

# Acknowledge an incident
POST /api/incidents/:id/acknowledge

# Resolve an incident
POST /api/incidents/:id/resolve

# Escalate an incident
POST /api/incidents/:id/escalate

Helm Values

logclaw-ticketing-agent:
  config:
    pagerduty:
      enabled: true
    jira:
      enabled: true
      baseUrl: "https://yourorg.atlassian.net"
      projectKey: "SRE"
    servicenow:
      enabled: true
      instance: "yourorg"
    anomaly:
      minimumScore: 0.85
      lookbackWindow: "15m"

Testing Connectivity

Test connectivity to each platform before going live:
# Test PagerDuty connection
POST /api/ticketing/api/v1/test-connection
{ "platform": "pagerduty" }

# Test LLM connection
POST /api/ticketing/api/v1/test-llm