
Bridge

The Bridge is a Python-based service that provides the core log processing pipeline. It consumes raw OTLP JSON from Kafka, normalizes it into flat documents, detects anomalies, correlates traces, and indexes everything into OpenSearch.

Architecture

The Bridge runs 4 concurrent threads, each handling a distinct stage of the pipeline:
```
        Kafka "raw-logs"
               │
               ▼
┌──────────────────────┐
│  Thread 1: OTLP ETL  │  Flatten OTLP JSON → normalized documents
│  (consumer group:    │  Write to "enriched-logs" topic
│  logclaw-bridge-etl) │
└──────────┬───────────┘
           │
   Kafka "enriched-logs"
       │           │
       ▼           ▼
┌──────────┐  ┌──────────────────────┐
│ Thread 2 │  │ Thread 3: Indexer    │
│ Anomaly  │  │ Bulk write to        │
│ Detector │  │ OpenSearch           │
│ (Signal) │  │ logclaw-logs-*       │
└──────────┘  └──────────────────────┘

┌──────────────────────┐
│ Thread 4: Lifecycle  │
│ Request correlation  │
│ (5-layer trace)      │
└──────────────────────┘
```
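The four stages above could be wired up along these lines. This is a minimal sketch, not the Bridge's actual startup code; the `run_bridge` helper and the stage callables are hypothetical:

```python
# Hypothetical sketch: start one daemon thread per pipeline stage.
# The stage names mirror the diagram; the targets are stand-ins for
# the real consumer loops.
import threading
from typing import Callable

def run_bridge(stages: dict[str, Callable[[], None]]) -> list[threading.Thread]:
    """Start one daemon thread per pipeline stage and return the handles."""
    threads = []
    for name, target in stages.items():
        t = threading.Thread(target=target, name=name, daemon=True)
        t.start()
        threads.append(t)
    return threads
```

Daemon threads keep a crashed stage from blocking process shutdown; a real service would also watch the handles and restart or fail health checks when one dies.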

OTLP ETL (Thread 1)

The ETL thread consumes OTLP JSON messages from the raw-logs Kafka topic and flattens them into canonical log documents. OTLP unwrapping path:
resourceLogs → scopeLogs → logRecords → flatten each record
Field mapping:
| OTLP Field | Output Field | Description |
| --- | --- | --- |
| `resource.attributes["service.name"]` | `service` | Service name |
| `logRecord.body.stringValue` | `message` | Log message |
| `logRecord.severityText` | `level` | Log level (INFO, WARN, ERROR) |
| `logRecord.timeUnixNano` | `timestamp` | ISO-8601 timestamp |
| `logRecord.traceId` | `trace_id` | Distributed trace ID |
| `logRecord.spanId` | `span_id` | Span ID |
| `resource.attributes["host.name"]` | `host` | Hostname |
| `resource.attributes["tenant_id"]` | `tenant_id` | Tenant identifier |
| `logRecord.attributes[*]` | (flattened) | Custom attributes as top-level fields |
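The unwrapping path and field mapping can be sketched in Python. This is an illustrative helper, not the Bridge's actual code; the `flatten_otlp` name is an assumption, and the sketch only handles string-valued attributes:

```python
# Illustrative sketch of the OTLP flattening step (hypothetical helper).
# Walks resourceLogs → scopeLogs → logRecords and applies the field
# mapping from the table above.
from datetime import datetime, timezone

def flatten_otlp(payload: dict) -> list[dict]:
    docs = []
    for resource_logs in payload.get("resourceLogs", []):
        # OTLP resource attributes arrive as a list of {key, value} pairs.
        res_attrs = {
            kv["key"]: kv["value"].get("stringValue")
            for kv in resource_logs.get("resource", {}).get("attributes", [])
        }
        for scope_logs in resource_logs.get("scopeLogs", []):
            for record in scope_logs.get("logRecords", []):
                ts_ns = int(record.get("timeUnixNano", 0))
                doc = {
                    "service": res_attrs.get("service.name"),
                    "message": record.get("body", {}).get("stringValue"),
                    "level": record.get("severityText"),
                    # timeUnixNano → ISO-8601 UTC timestamp
                    "timestamp": datetime.fromtimestamp(
                        ts_ns / 1e9, tz=timezone.utc
                    ).isoformat(),
                    "trace_id": record.get("traceId"),
                    "span_id": record.get("spanId"),
                    "host": res_attrs.get("host.name"),
                    "tenant_id": res_attrs.get("tenant_id"),
                }
                # Record-level attributes become top-level fields.
                for kv in record.get("attributes", []):
                    doc[kv["key"]] = kv["value"].get("stringValue")
                docs.append(doc)
    return docs
```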

Anomaly Detection (Thread 2)

Uses a signal-based composite scoring system to classify whether an error log is incident-worthy. Not every error triggers an incident — the system distinguishes actionable failures (OOM, database deadlocks, cascading failures) from noise (validation errors, 404s). Two detection paths run in parallel:
  • Immediate path — OOM, crash, and resource exhaustion patterns fire in under 1 second without waiting for a time window
  • Windowed path — statistical z-score combined with blast radius, velocity, and recurrence signals across the sliding window
See Incident Classification for the full signal system documentation. Key environment variables:
| Variable | Default | Description |
| --- | --- | --- |
| `ANOMALY_ZSCORE_THRESHOLD` | `2.0` | Z-score threshold for the statistical spike signal |
| `ANOMALY_WINDOW_SECONDS` | `300` | Sliding window duration in seconds |
| `ANOMALY_COMPOSITE_SCORE_THRESHOLD` | `0.4` | Minimum composite score to emit an event |
| `ANOMALY_IMMEDIATE_DEDUP_SECONDS` | `60` | Dedup window for immediate-path emissions |
When an anomaly is detected, an event is published to the anomaly-events Kafka topic carrying anomaly_score, severity, signals, detection_mode, error_category, and error_message fields.
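The windowed path's core logic can be illustrated with a small sketch. The function names, the 1.0/0.0 signal encoding, and the weighting scheme are assumptions for illustration, not the Bridge's actual scoring code:

```python
# Illustrative sketch of the windowed z-score signal and composite scoring.
# Names, weights, and the binary signal encoding are assumptions.
from statistics import mean, pstdev

def zscore_signal(window_counts: list[int], current: int,
                  threshold: float = 2.0) -> float:
    """Return 1.0 when the current error count is a statistical spike."""
    mu = mean(window_counts)
    sigma = pstdev(window_counts)
    if sigma == 0:
        # Constant error rate: z-score is undefined, so treat any strict
        # increase over the flat baseline as a spike (the std-zero case
        # the Bridge tracks with a dedicated counter).
        return 1.0 if current > mu else 0.0
    z = (current - mu) / sigma
    return 1.0 if z >= threshold else 0.0

def composite_score(signals: dict[str, float],
                    weights: dict[str, float]) -> float:
    """Weighted average of signal strengths, each in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[k] * signals.get(k, 0.0) for k in weights) / total
```

An event would then be emitted only when the composite score clears ANOMALY_COMPOSITE_SCORE_THRESHOLD, which is how a lone spike with no blast radius or recurrence can stay below the bar.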

Request Lifecycle Engine (Thread 4)

The lifecycle engine performs 5-layer trace correlation to group related log entries into request timelines:
  1. Trace ID grouping — group logs sharing the same trace_id
  2. Temporal proximity — cluster logs within a time window
  3. Service dependency mapping — map caller→callee relationships
  4. Error propagation tracking — trace error cascades across services
  5. Blast radius computation — determine affected services and endpoints
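Layers 1 and 2 can be sketched in a few lines. This is a hypothetical helper, not the engine's actual code; it assumes documents carry an epoch-seconds `ts` field, and the real engine layers dependency mapping, error propagation, and blast radius on top:

```python
# Illustrative sketch of layers 1–2: group by trace_id, then cluster
# trace-less logs by temporal proximity. Hypothetical helper.
from collections import defaultdict

def group_into_timelines(docs: list[dict], window_s: float = 5.0) -> list[list[dict]]:
    by_trace: dict[str, list[dict]] = defaultdict(list)
    orphans: list[dict] = []
    for doc in docs:
        if doc.get("trace_id"):
            by_trace[doc["trace_id"]].append(doc)
        else:
            orphans.append(doc)
    # Layer 1: one timeline per trace_id, ordered by time.
    timelines = [sorted(v, key=lambda d: d["ts"]) for v in by_trace.values()]
    # Layer 2: cluster trace-less logs that fall within window_s of the
    # previous log in the cluster.
    orphans.sort(key=lambda d: d["ts"])
    cluster: list[dict] = []
    for doc in orphans:
        if cluster and doc["ts"] - cluster[-1]["ts"] > window_s:
            timelines.append(cluster)
            cluster = []
        cluster.append(doc)
    if cluster:
        timelines.append(cluster)
    return timelines
```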

Prometheus Metrics

The Bridge exposes Prometheus-format metrics at GET /metrics:
| Metric | Type | Description |
| --- | --- | --- |
| `logclaw_bridge_etl_consumed_total` | Counter | Kafka messages (batches) consumed from raw-logs |
| `logclaw_bridge_etl_records_received_total` | Counter | Individual OTLP log records flattened |
| `logclaw_bridge_etl_produced_total` | Counter | Enriched documents written to enriched-logs |
| `logclaw_bridge_anomaly_detected_total` | Counter | Total anomaly events emitted |
| `logclaw_bridge_anomaly_immediate_detected_total` | Counter | Anomalies detected via immediate path (OOM/crash/resource) |
| `logclaw_bridge_anomaly_windowed_detected_total` | Counter | Anomalies detected via windowed statistical path |
| `logclaw_bridge_anomaly_signals_extracted_total` | Counter | Error records with at least one signal extracted |
| `logclaw_bridge_anomaly_below_threshold_total` | Counter | Signals filtered (below composite score threshold) |
| `logclaw_bridge_anomaly_std_zero_detected_total` | Counter | Constant error rate cases (previously silent failures) |
| `logclaw_bridge_indexer_indexed_total` | Counter | Documents indexed into OpenSearch |
| `logclaw_bridge_indexer_errors_total` | Counter | OpenSearch indexing errors |

Configuration

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `KAFKA_BROKERS` | Yes | — | Kafka bootstrap servers |
| `KAFKA_TOPIC_RAW` | No | `raw-logs` | Topic to consume raw OTLP JSON |
| `KAFKA_TOPIC_ENRICHED` | No | `enriched-logs` | Topic to produce enriched documents |
| `OPENSEARCH_ENDPOINT` | Yes | — | OpenSearch cluster URL |
| `OPENSEARCH_USERNAME` | No | — | OpenSearch Basic Auth username |
| `OPENSEARCH_PASSWORD` | No | — | OpenSearch Basic Auth password |
| `ANOMALY_ZSCORE_THRESHOLD` | No | `2.0` | Z-score threshold for statistical spike signal |
| `ANOMALY_WINDOW_SECONDS` | No | `300` | Sliding window duration in seconds |
| `ANOMALY_COMPOSITE_SCORE_THRESHOLD` | No | `0.4` | Minimum composite score to emit an anomaly event |
| `ANOMALY_IMMEDIATE_DEDUP_SECONDS` | No | `60` | Dedup window for immediate-path emissions |
| `ANOMALY_BLAST_RADIUS_WINDOW_SECONDS` | No | `60` | Cross-service error tracking window |
| `PORT` | No | `8080` | HTTP server port |
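A loader for these variables might look like the following. This is an illustrative sketch only; the Bridge's actual configuration code is not shown here, and `load_config` is a hypothetical name:

```python
# Hypothetical config loader mirroring the table above: the two required
# variables fail fast, everything else falls back to its documented default.
import os

def load_config() -> dict:
    missing = [v for v in ("KAFKA_BROKERS", "OPENSEARCH_ENDPOINT")
               if v not in os.environ]
    if missing:
        raise RuntimeError(f"missing required env vars: {missing}")
    return {
        "kafka_brokers": os.environ["KAFKA_BROKERS"],
        "topic_raw": os.environ.get("KAFKA_TOPIC_RAW", "raw-logs"),
        "topic_enriched": os.environ.get("KAFKA_TOPIC_ENRICHED", "enriched-logs"),
        "opensearch_endpoint": os.environ["OPENSEARCH_ENDPOINT"],
        "zscore_threshold": float(os.environ.get("ANOMALY_ZSCORE_THRESHOLD", "2.0")),
        "window_seconds": int(os.environ.get("ANOMALY_WINDOW_SECONDS", "300")),
        "composite_threshold": float(
            os.environ.get("ANOMALY_COMPOSITE_SCORE_THRESHOLD", "0.4")),
        "port": int(os.environ.get("PORT", "8080")),
    }
```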

Runtime Configuration

The Bridge supports dynamic runtime configuration via the /config endpoint:
```shell
# Get current config
curl http://localhost:8080/config

# Update anomaly thresholds
curl -X PATCH http://localhost:8080/config \
  -H "Content-Type: application/json" \
  -d '{"zscoreThreshold": 3.0, "windowSeconds": 600, "compositeScoreThreshold": 0.45}'
```

Helm Values

```yaml
logclaw-bridge:
  env:
    KAFKA_BROKERS: "logclaw-kafka-bootstrap:9092"
    OPENSEARCH_ENDPOINT: "https://logclaw-opensearch:9200"
    ANOMALY_ZSCORE_THRESHOLD: "2.0"
    ANOMALY_WINDOW_SECONDS: "300"
    ANOMALY_COMPOSITE_SCORE_THRESHOLD: "0.4"
```

Health Check

```shell
curl http://localhost:8080/health
```

Returns:

```json
{
  "status": "ok",
  "threads": {
    "etl": "running",
    "anomaly": "running",
    "indexer": "running",
    "lifecycle": "running"
  },
  "kafka": "connected",
  "opensearch": "connected"
}
```