LogClaw uses a namespace-per-tenant, dedicated-instance model. Every tenant receives its own isolated Kubernetes namespace (logclaw-<tenantId>) containing a full, dedicated copy of every component; there is no shared data plane between tenants. Cluster-scoped operators (Strimzi, Flink Operator, ESO, cert-manager, OpenSearch Operator) are installed once per cluster and watch all tenant namespaces via label selectors. Tenant workloads are provisioned and reconciled through an ArgoCD ApplicationSet, which generates one ArgoCD Application per tenant values file committed to gitops/tenants/.
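A sketch of what that ApplicationSet could look like, using the Git files generator (the repo URL, chart path, and `tenantId` parameter are assumptions; the real manifest may differ):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: logclaw-tenants
spec:
  generators:
    - git:
        repoURL: https://example.com/logclaw/gitops.git  # assumed repo URL
        revision: main
        files:
          - path: gitops/tenants/*.yaml  # one values file per tenant
  template:
    metadata:
      name: "logclaw-{{tenantId}}"       # assumes each values file sets tenantId
    spec:
      project: default
      source:
        repoURL: https://example.com/logclaw/gitops.git
        targetRevision: main
        path: charts/logclaw             # assumed shared Helm chart
        helm:
          valueFiles:
            - "../../{{path}}/{{path.filename}}"  # the tenant's committed values file
      destination:
        server: https://kubernetes.default.svc
        namespace: "logclaw-{{tenantId}}"
      syncPolicy:
        automated: {}
        syncOptions:
          - CreateNamespace=true
```

Each file added under gitops/tenants/ then materializes as one Application, which in turn creates and populates the tenant's dedicated namespace.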
The OpenTelemetry Collector is the sole entry point for all log data. It accepts OTLP over both gRPC (:4317) and HTTP (:4318), the CNCF industry standard supported by Datadog, Splunk, Grafana, AWS, GCP, and Azure.

Pipeline: otlp receiver → memory_limiter → resource processor (inject tenant_id) → batch → kafka exporter (otlp_json, lz4 compression)
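That pipeline might be expressed as a Collector config roughly like the following (a sketch: the tenant id value, broker address, topic name, and tuning numbers are illustrative assumptions):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 25
  resource:
    attributes:
      - key: tenant_id
        value: acme          # hypothetical tenant id; set per namespace in practice
        action: insert
  batch:
    send_batch_size: 8192
    timeout: 5s

exporters:
  kafka:
    brokers: ["kafka:9092"]  # assumed in-namespace Kafka bootstrap address
    topic: raw-logs          # assumed topic name
    encoding: otlp_json
    producer:
      compression: lz4

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [kafka]
```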
- Anomaly Detection: sliding window over error rates per service, with configurable threshold and window size
- OpenSearch Indexer: bulk-indexes enriched documents; reads from enriched-logs and writes to logclaw-logs-YYYY.MM.dd indices
- Request Lifecycle: 5-layer trace correlation engine that groups logs by traceId → builds request timelines → computes blast radius → generates incident context
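The correlation steps can be sketched in a few lines of Python (field names such as `traceId`, `timestamp`, `service`, and `level` are assumptions about the enriched record shape, not the Bridge's actual schema):

```python
from collections import defaultdict

def correlate(logs):
    """Group log records by traceId and derive per-request context.

    `logs` is a list of dicts with traceId, timestamp (epoch seconds),
    service, and level keys: a simplified stand-in for the enriched
    records the correlation engine consumes.
    """
    by_trace = defaultdict(list)
    for record in logs:
        by_trace[record["traceId"]].append(record)

    incidents = {}
    for trace_id, records in by_trace.items():
        timeline = sorted(records, key=lambda r: r["timestamp"])  # request timeline
        services = {r["service"] for r in records}                # blast radius
        errors = [r for r in records if r["level"] == "ERROR"]
        incidents[trace_id] = {
            "timeline": timeline,
            "blast_radius": sorted(services),
            "error_count": len(errors),
        }
    return incidents
```

The incident context produced per trace (timeline, affected services, error count) is what downstream consumers can surface when an alert fires.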
Bridge vs Flink: The Bridge provides trace correlation, anomaly detection, and OpenSearch indexing in a single lightweight Python service. For high-throughput production, Flink handles stream processing. For dev/demo and early-stage deployments, the Bridge is simpler — no Flink Operator needed. Enable both for maximum capability.
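The sliding-window error-rate check the Bridge performs can be sketched as follows (class and parameter names are illustrative, not the Bridge's actual API):

```python
from collections import deque

class SlidingWindowDetector:
    """Flag a service when its error rate over the last `window` samples
    exceeds `threshold`. Both knobs are configurable, mirroring the
    per-service sliding-window check described above."""

    def __init__(self, window: int = 60, threshold: float = 0.05):
        self.window = window
        self.threshold = threshold
        self.samples: dict[str, deque] = {}

    def observe(self, service: str, is_error: bool) -> bool:
        # Fixed-size deque keeps only the most recent `window` samples.
        buf = self.samples.setdefault(service, deque(maxlen=self.window))
        buf.append(1 if is_error else 0)
        error_rate = sum(buf) / len(buf)
        return error_rate > self.threshold
```

With a window of 4 and a threshold of 0.5, a service only trips the detector once more than half of its recent requests are errors.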
OpenSearch provides full-text search, log analytics, and visualization. Deployed with dedicated master and data nodes for production tiers. Index pattern: logclaw-logs-YYYY.MM.dd with automatic ISM (Index State Management, OpenSearch's counterpart to Elasticsearch ILM) policies.
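A minimal sketch of how the indexer could route documents into those daily indices and build an OpenSearch _bulk payload (the timestamp field name is an assumption about the document shape):

```python
import json
from datetime import datetime, timezone

def daily_index(ts: float, prefix: str = "logclaw-logs") -> str:
    """Daily index name in the logclaw-logs-YYYY.MM.dd pattern (UTC)."""
    return f"{prefix}-{datetime.fromtimestamp(ts, tz=timezone.utc):%Y.%m.%d}"

def bulk_body(docs: list[dict]) -> str:
    """Render documents as _bulk NDJSON, routing each document to the
    daily index derived from its epoch-seconds `timestamp` field."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": daily_index(doc["timestamp"])}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

The resulting body is what a POST to the cluster's _bulk endpoint expects: an action line followed by a document line, newline-terminated.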
Feast Feature Store + KServe InferenceService for serving anomaly detection models. Airflow orchestrates retraining DAGs that pull features from Feast, train models, and deploy updated InferenceServices.
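A toy end-to-end sketch of that retraining cycle in plain Python (the feature pull, the "model", and the KServe patch body are all hypothetical stand-ins; the real DAG would call the Feast SDK and the Kubernetes API as Airflow tasks):

```python
import statistics

def pull_training_features(feature_view: str) -> list[dict]:
    # Stand-in for a Feast historical-features query; returns toy rows.
    return [{"error_rate": 0.01}, {"error_rate": 0.02}, {"error_rate": 0.9}]

def train_threshold_model(rows: list[dict]) -> float:
    # Toy "model": flag anything above mean + 3 * stddev of error_rate.
    rates = [r["error_rate"] for r in rows]
    return statistics.mean(rates) + 3 * statistics.stdev(rates)

def deploy(model_version: str) -> dict:
    # Stand-in for patching the InferenceService with a new storageUri;
    # returns the patch body for illustration instead of applying it.
    return {
        "metadata": {"name": "anomaly-detector"},
        "spec": {"predictor": {"model": {"storageUri": f"s3://models/{model_version}"}}},
    }

# The DAG would chain these as tasks: pull -> train -> deploy.
rows = pull_training_features("service_error_rates")
threshold = train_threshold_model(rows)
patch = deploy("v2")
```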