Tenant Onboarding
This guide walks through provisioning a new LogClaw tenant from zero to fully operational. Expected time: 30 minutes after prerequisites are met.
Prerequisites
Cluster Prerequisites (one-time)
Kubernetes Cluster
Kubernetes >= 1.27
Control plane: 4 vCPU, 8 GB RAM
Workers (minimum 3): 8 vCPU, 32 GB RAM each
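One way to check the Kubernetes >= 1.27 requirement is a version comparison. The sketch below uses a hypothetical helper, version_at_least, which relies on sort -V (GNU coreutils; recent BSD sort also supports it). The commented usage assumes jq is installed.

```shell
# Hypothetical helper: succeeds if version $2 is >= version $1 (e.g. 1.27).
version_at_least() {
  required="$1"
  actual="$2"
  # sort -V compares numerically per segment (1.9 < 1.27), unlike a plain
  # string sort. If the required version sorts first (or the two are equal),
  # the actual version meets the minimum.
  [ "$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n 1)" = "$required" ]
}

# Usage against a live cluster (assumes jq). Managed clusters may report a
# minor version such as "27+", so non-digits are stripped:
#   server="$(kubectl version -o json |
#     jq -r '.serverVersion.major + "." + (.serverVersion.minor | gsub("[^0-9]"; ""))')"
#   version_at_least 1.27 "$server" || echo "Kubernetes $server is older than 1.27"
```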
Operators (install once per cluster)
Strimzi Kafka Operator (operators/strimzi/)
Flink Kubernetes Operator (operators/flink-operator/)
External Secrets Operator (operators/eso/)
cert-manager (operators/cert-manager/)
OpenSearch Operator (operators/opensearch-operator/)
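To confirm that the operators are present before onboarding, you can check for the CustomResourceDefinitions they install. The helper below, check_operator_crds, is a hypothetical sketch. It assumes the upstream CRD names for each operator. cert-manager and External Secrets install several CRDs, and only one representative of each is checked here.

```shell
# Hypothetical preflight helper: reports which operator CRDs are missing
# and fails if any are absent.
check_operator_crds() {
  missing=0
  for crd in \
    kafkas.kafka.strimzi.io \
    flinkdeployments.flink.apache.org \
    externalsecrets.external-secrets.io \
    certificates.cert-manager.io \
    opensearchclusters.opensearch.opster.io
  do
    if kubectl get crd "$crd" >/dev/null 2>&1; then
      echo "ok      $crd"
    else
      echo "MISSING $crd"
      missing=$((missing + 1))
    fi
  done
  # Exit status 0 only when every CRD was found.
  [ "$missing" -eq 0 ]
}
```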
GitOps
ArgoCD installed with logclaw AppProject applied
ArgoCD ApplicationSet logclaw-tenants applied
Unique tenantId chosen (lowercase letters, digits, and hyphens; at most 40 characters). It becomes part of the namespace name logclaw-<tenantId>.
Kubernetes StorageClass available (e.g. gp3, standard, pd-ssd)
Object storage bucket pre-created and accessible from the cluster
Secrets backend configured (AWS Secrets Manager, GCP Secret Manager, Vault, or Azure Key Vault)
Image pull secret logclaw-registry-pull created (omit if using a public registry)
Ticketing provider credentials stored in the secrets backend (if the ticketing agent is enabled)
Write access to the LogClaw Git repository
ArgoCD UI or CLI access to monitor sync status
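The tenantId rule above can be checked before anything is committed. validate_tenant_id is a hypothetical helper that enforces the stated character set and length. It also rejects a leading or trailing hyphen. That last rule is an extra assumption: a trailing hyphen would make the namespace logclaw-<tenantId> an invalid DNS-1123 label.

```shell
# Hypothetical helper: succeeds only for a tenantId that follows the
# documented rules (lowercase letters, digits, hyphens; 1 to 40 characters).
validate_tenant_id() {
  id="$1"
  if [ "${#id}" -lt 1 ] || [ "${#id}" -gt 40 ]; then
    return 1
  fi
  # First pattern: any character outside a-z, 0-9, and "-" (spelled out
  # explicitly so the check does not depend on locale range handling).
  # Second pattern (assumption): a leading or trailing hyphen.
  case "$id" in
    *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
    -*|*-) return 1 ;;
  esac
  return 0
}

# Usage:
#   validate_tenant_id acme-corp && echo "namespace: logclaw-acme-corp"
```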
Step 1: Create the Tenant Values File
Copy the template and fill in required fields:
cp gitops/tenants/_template.yaml gitops/tenants/<tenantId>.yaml
Set the required global values:
tenantId: "acme-corp"
global:
  tenantName: "Acme Corporation"
  storageClass: "gp3"
  tier: "ha"  # standard | ha | ultra-ha
  objectStorage:
    provider: "s3"  # s3 | gcs | azure
    bucket: "logclaw-acme-corp"
    region: "us-east-1"
  secretStore:
    provider: "aws"  # aws | gcp | vault | azure
    region: "us-east-1"
Enable Components
Toggle components based on your requirements:
# Core pipeline (recommended: all enabled)
platform: { enabled: true }        # RBAC, NetworkPolicy, SecretStore
otelCollector: { enabled: true }   # OTLP ingestion (gRPC :4317, HTTP :4318)
kafka: { enabled: true }           # Event bus (required by most components)
opensearch: { enabled: true }      # Search & analytics
# Processing (choose one or both)
flink: { enabled: true }           # High-throughput stream processing
bridge: { enabled: true }          # OTLP ETL, anomaly detection, trace correlation
# AI & Operations
mlEngine: { enabled: true }        # Feast + KServe model inference
airflow: { enabled: true }         # ML pipeline orchestration
ticketingAgent: { enabled: true }  # AI SRE incident management
agent: { enabled: true }           # Infrastructure health collector
# UI
dashboard: { enabled: true }       # Next.js pipeline UI
For development environments, enable bridge and dashboard while disabling flink, mlEngine, and airflow to reduce resource requirements.
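As a sketch of the development profile just described, the hypothetical function below prints the toggles to paste over the defaults. It keeps the core pipeline, bridge, and dashboard, and disables flink, mlEngine, and airflow. The guidance above does not mention ticketingAgent and agent, so they are omitted and keep whatever the values file already sets.

```shell
# Hypothetical helper: prints component toggles for a development tenant.
dev_component_toggles() {
  cat <<'EOF'
platform: { enabled: true }
otelCollector: { enabled: true }
kafka: { enabled: true }
opensearch: { enabled: true }
flink: { enabled: false }
bridge: { enabled: true }
mlEngine: { enabled: false }
airflow: { enabled: false }
dashboard: { enabled: true }
EOF
}
```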
See the Values Reference for the full list of configurable fields.
Step 2: Commit and Push
git add gitops/tenants/<tenantId>.yaml
git commit -m "feat(tenants): onboard <tenantId>"
git push origin main
The ArgoCD Git generator polls every 3 minutes. To trigger immediately:
argocd app get logclaw-tenants --refresh
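If logclaw-tenants exists only as an ApplicationSet and not also as an Application, argocd app get may not find it. One alternative is to annotate the ApplicationSet itself, because the Argo CD ApplicationSet controller re-evaluates its generators when the argocd.argoproj.io/application-set-refresh annotation is set. This sketch assumes the ApplicationSet lives in the argocd namespace; refresh_tenants_appset is a hypothetical name.

```shell
# Hypothetical helper: asks the ApplicationSet controller to re-run the Git
# generator for logclaw-tenants. The namespace defaults to "argocd" (assumption).
refresh_tenants_appset() {
  ns="${1:-argocd}"
  kubectl annotate applicationset logclaw-tenants -n "$ns" \
    argocd.argoproj.io/application-set-refresh=true --overwrite
}
```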
Step 3: Monitor Deployment
Watch the tenant application sync:
# List all tenant applications
argocd app list -l logclaw.io/managed-by=applicationset
# Wait until the tenant application is synced and healthy (30-minute timeout)
argocd app wait logclaw-tenant-<tenantId> --sync --health --timeout 1800
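If you prefer polling with kubectl instead of the Argo CD CLI, the Application's sync and health status can be read directly from its status fields. wait_for_tenant_app is a hypothetical helper. It assumes Applications live in the argocd namespace, and its default timeout matches the 30-minute onboarding estimate.

```shell
# Hypothetical helper: polls the tenant's Argo CD Application until it is
# Synced and Healthy, or until the timeout (in seconds) expires.
wait_for_tenant_app() {
  app="logclaw-tenant-$1"
  timeout="${2:-1800}"
  interval="${3:-15}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    state="$(kubectl get application "$app" -n argocd \
      -o jsonpath='{.status.sync.status}/{.status.health.status}' 2>/dev/null)"
    echo "$app: ${state:-unknown}"
    [ "$state" = "Synced/Healthy" ] && return 0
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Usage:
#   wait_for_tenant_app acme-corp
```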
Expected sync wave order:
Namespace + Platform (t=0–5m)
Namespace creation, RBAC, NetworkPolicy, ClusterSecretStore
Kafka (t=5–10m)
Strimzi reconciles the KRaft cluster. Wait until the Kafka resource reports READY=True.
OTel Collector + OpenSearch (t=10–15m)
Both deploy in parallel. The OTel Collector connects to the Kafka bootstrap address, and the OpenSearch cluster reaches green health.
Flink + Bridge (t=15–20m)
Flink jobs enter the RUNNING state. The bridge connects to Kafka and OpenSearch.
ML Engine + Airflow (t=20–25m)
The KServe InferenceService becomes ready, and the Airflow scheduler and webserver report healthy.
Ticketing Agent + Dashboard (t=25–30m)
The ticketing agent validates the ticketing provider credentials. The dashboard becomes reachable through its ClusterIP service.
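Rather than watching the waves by eye, you can block on individual milestones with kubectl wait. The sketch below waits for the Kafka Ready condition, using the logclaw-<tenantId>-kafka resource name from the Troubleshooting section. It also waits for every FlinkDeployment's job state to reach RUNNING. Both function names are hypothetical, and the jsonpath form of --for requires a recent kubectl release.

```shell
# Hypothetical helper: waits for the tenant's Strimzi Kafka resource to
# report the Ready condition (default timeout 10 minutes).
wait_for_kafka() {
  tenant="$1"
  kubectl wait "kafka/logclaw-${tenant}-kafka" \
    -n "logclaw-${tenant}" \
    --for=condition=Ready \
    --timeout="${2:-10m}"
}

# Hypothetical helper: waits until every FlinkDeployment in the tenant
# namespace reports .status.jobStatus.state == RUNNING.
wait_for_flink() {
  tenant="$1"
  kubectl wait flinkdeployment --all \
    -n "logclaw-${tenant}" \
    --for=jsonpath='{.status.jobStatus.state}'=RUNNING \
    --timeout="${2:-10m}"
}
```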
Step 4: Verify Components
Run the built-in Helm test suite:
helm test logclaw-<tenantId> \
  --namespace logclaw-<tenantId> \
  --timeout 5m \
  --logs
Individual Component Checks
Each kubectl port-forward command below runs in the foreground. Run the command that follows it in a second terminal.
Kafka
kubectl get kafka -n logclaw-<tenantId>
# Expected: READY=True, REPLICAS=3 (ha tier)
OTel Collector
kubectl get pods -n logclaw-<tenantId> \
  -l app.kubernetes.io/name=logclaw-otel-collector
# Expected: all pods Running
# Test the OTLP HTTP endpoint
kubectl port-forward svc/logclaw-otel-collector 4318:4318 \
  -n logclaw-<tenantId>
curl -s -o /dev/null -w "%{http_code}" \
  -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{"resourceLogs":[]}'
# Expected: 200
OpenSearch
kubectl get opensearchcluster -n logclaw-<tenantId>
# Expected: state=Ready
kubectl port-forward svc/logclaw-opensearch 9200:9200 \
  -n logclaw-<tenantId>
curl -sk https://localhost:9200/_cluster/health | jq .status
# Expected: "green"
Bridge
kubectl get pods -n logclaw-<tenantId> \
  -l app.kubernetes.io/name=logclaw-bridge
# Expected: Running
kubectl port-forward svc/logclaw-bridge 8080:8080 \
  -n logclaw-<tenantId>
curl http://localhost:8080/health
# Expected: {"status": "ok", ...}
Ticketing Agent
kubectl logs -n logclaw-<tenantId> \
  -l app.kubernetes.io/name=logclaw-ticketing-agent \
  --tail=20
# Expected: "Connected to Kafka", "Ticketing provider validated"
Dashboard
kubectl port-forward svc/logclaw-dashboard 3000:3000 \
  -n logclaw-<tenantId>
# Open http://localhost:3000
Flink
kubectl get flinkdeployment -n logclaw-<tenantId>
# Expected: LIFECYCLE_STATE=STABLE, JOB_STATE=RUNNING
Agent
kubectl get pods -n logclaw-<tenantId> \
  -l app.kubernetes.io/name=logclaw-agent
# Expected: Running
kubectl port-forward svc/logclaw-agent 8080:8080 \
  -n logclaw-<tenantId>
curl http://localhost:8080/health
# Expected: {"status": "ok"}
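The "Expected: Running" pod checks above can be scripted across components. check_pods_running is a hypothetical helper keyed on the app.kubernetes.io/name labels used above. It only checks the pod phase, and Running means a pod's containers have started, not that they pass readiness probes.

```shell
# Hypothetical helper: succeeds only if at least one pod carries the label
# app.kubernetes.io/name=<name> and every such pod is in phase Running.
check_pods_running() {
  ns="$1"
  name="$2"
  phases="$(kubectl get pods -n "$ns" -l "app.kubernetes.io/name=$name" \
    -o jsonpath='{.items[*].status.phase}')"
  if [ -z "$phases" ]; then
    echo "$name: no pods found"
    return 1
  fi
  for phase in $phases; do
    if [ "$phase" != "Running" ]; then
      echo "$name: found pod in phase $phase"
      return 1
    fi
  done
  echo "$name: all pods Running"
}

# Usage:
#   for c in logclaw-otel-collector logclaw-bridge logclaw-agent; do
#     check_pods_running logclaw-acme-corp "$c"
#   done
```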
Step 5: Send Your First Logs
After verification, send a test log via OTLP HTTP. This uses the OTel Collector port-forward on localhost:4318 from Step 4, so keep it running:
curl -X POST http://localhost:4318/v1/logs \
-H "Content-Type: application/json" \
-d '{
"resourceLogs": [{
"resource": {
"attributes": [
{"key": "service.name", "value": {"stringValue": "onboarding-test"}}
]
},
"scopeLogs": [{
"logRecords": [{
"timeUnixNano": "'$(date +%s)000000000'",
"severityText": "INFO",
"body": {"stringValue": "Tenant onboarding complete!"}
}]
}]
}]
}'
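To send other test records without hand-editing JSON, the payload above can be wrapped in a small function. otlp_log_payload is a hypothetical helper. It builds the same OTLP/JSON structure and assumes the service name and message contain no characters that need JSON escaping (quotes or backslashes).

```shell
# Hypothetical helper: prints an OTLP/JSON log payload for one record.
otlp_log_payload() {
  service="$1"
  message="$2"
  # OTLP/JSON carries timeUnixNano as a string of nanoseconds since the
  # epoch: whole seconds from `date +%s` followed by nine zeros.
  ts="$(date +%s)000000000"
  printf '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"%s"}}]},"scopeLogs":[{"logRecords":[{"timeUnixNano":"%s","severityText":"INFO","body":{"stringValue":"%s"}}]}]}]}\n' \
    "$service" "$ts" "$message"
}

# Usage (curl reads the body from stdin with --data-binary @-):
#   otlp_log_payload onboarding-test 'Tenant onboarding complete!' |
#     curl -X POST http://localhost:4318/v1/logs \
#       -H "Content-Type: application/json" --data-binary @-
```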
Verify that it appears in OpenSearch. This uses the OpenSearch port-forward on localhost:9200 from Step 4:
curl -sk "https://localhost:9200/logclaw-logs-*/_search" \
-H "Content-Type: application/json" \
-d '{"query":{"match":{"service":"onboarding-test"}}}'
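Logs pass through Kafka before reaching OpenSearch, so a search run immediately after sending may return no hits. The hypothetical wait_for_logs helper below retries the same match query against OpenSearch's _count API until at least one document is found. It assumes the port-forward on localhost:9200 is running and that the service field name used in the query above is correct.

```shell
# Hypothetical helper: polls _count until at least one log document for
# the given service is indexed, or the attempts run out.
wait_for_logs() {
  service="$1"
  attempts="${2:-20}"
  interval="${3:-3}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Extract the number from the "count" field without requiring jq.
    count="$(curl -sk 'https://localhost:9200/logclaw-logs-*/_count' \
      -H 'Content-Type: application/json' \
      -d "{\"query\":{\"match\":{\"service\":\"$service\"}}}" |
      sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p')"
    if [ "${count:-0}" -gt 0 ]; then
      echo "found $count document(s) for $service"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "no documents for $service after $attempts attempts"
  return 1
}
```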
Troubleshooting
ArgoCD Application stuck in Progressing
argocd app get logclaw-tenant-<tenantId>
kubectl describe application logclaw-tenant-<tenantId> -n argocd
Check the “Conditions” section for error details.
Kafka not reaching Ready state
kubectl describe kafka logclaw-<tenantId>-kafka -n logclaw-<tenantId>
kubectl logs -n strimzi-system -l name=strimzi-cluster-operator --tail=50
Common causes: StorageClass not found, PVC provisioning failure, resource limits too low.
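Two of these causes, a missing StorageClass and failed PVC provisioning, show up as PersistentVolumeClaims that never reach Bound. The hypothetical helper below lists such claims along with the StorageClass each one requests. You can then confirm that class exists with kubectl get storageclass <name>.

```shell
# Hypothetical helper: prints PVCs in the namespace that are not Bound and
# fails if there are any.
check_pvcs_bound() {
  kubectl get pvc -n "$1" --no-headers \
    -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,CLASS:.spec.storageClassName |
    awk '$2 != "Bound" { print; n++ } END { exit (n > 0) }'
}

# Usage:
#   check_pvcs_bound logclaw-acme-corp
```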
OpenSearch cluster health red
kubectl describe opensearchcluster -n logclaw-<tenantId>
kubectl logs -n opensearch-operator-system \
  -l control-plane=controller-manager --tail=50
Common causes: Insufficient memory (2 GiB minimum per data node), disk pressure.
Flink jobs not running
kubectl describe flinkdeployment -n logclaw-<tenantId>
kubectl logs -n logclaw-<tenantId> -l app=logclaw-flink-anomaly --tail=50
Common causes: Kafka bootstrap unreachable, object storage credentials invalid.
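To see at a glance which FlinkDeployment is unhealthy, the hypothetical helper below prints each deployment's lifecycle and job state. These are the status fields behind the LIFECYCLE_STATE and JOB_STATE columns checked in Step 4. It fails unless every deployment is STABLE and RUNNING, and also fails if none exist.

```shell
# Hypothetical helper: reports name, lifecycle state, and job state for each
# FlinkDeployment in the namespace.
check_flink_jobs() {
  kubectl get flinkdeployment -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.lifecycleState}{" "}{.status.jobStatus.state}{"\n"}{end}' |
    awk '
      { print }
      $2 != "STABLE" || $3 != "RUNNING" { bad++ }
      END { exit (NR == 0 || bad > 0) }'
}
```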
External Secrets not syncing
kubectl get externalsecret -n logclaw-<tenantId>
kubectl describe externalsecret -n logclaw-<tenantId>
kubectl get clustersecretstore
Common causes: IAM role not attached, wrong region, missing permissions.
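When a namespace has several ExternalSecrets, it helps to list only the ones whose Ready condition is not True and then describe those. check_external_secrets is a hypothetical helper built on a kubectl jsonpath filter over status.conditions.

```shell
# Hypothetical helper: prints ExternalSecrets whose Ready condition is not
# "True" and fails if any are found.
check_external_secrets() {
  kubectl get externalsecret -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
    awk '
      $2 != "True" { print "not ready: " $1; bad++ }
      END { exit (bad > 0) }'
}
```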
Forcing a refresh and resync
argocd app get logclaw-tenant-<tenantId> --hard-refresh
argocd app sync logclaw-tenant-<tenantId> --force