Tenant Onboarding

This guide walks through provisioning a new LogClaw tenant from zero to fully operational. Expected time: 30 minutes after prerequisites are met.

Prerequisites

Kubernetes Cluster
  • Kubernetes >= 1.27
  • Control plane: 4 vCPU, 8 GB RAM
  • Workers (minimum 3): 8 vCPU, 32 GB RAM each
Operators (install once per cluster)
  • Strimzi Kafka Operator (operators/strimzi/)
  • Flink Kubernetes Operator (operators/flink-operator/)
  • External Secrets Operator (operators/eso/)
  • cert-manager (operators/cert-manager/)
  • OpenSearch Operator (operators/opensearch-operator/)
GitOps
  • ArgoCD installed with logclaw AppProject applied
  • ArgoCD ApplicationSet logclaw-tenants applied
  • Unique tenantId chosen (lowercase letters, numbers, hyphens; max 40 chars)
  • Kubernetes StorageClass available (e.g. gp3, standard, pd-ssd)
  • Object storage bucket pre-created and accessible from cluster
  • Secrets backend configured (AWS Secrets Manager, GCP Secret Manager, Vault, or Azure Key Vault)
  • Image pull secret logclaw-registry-pull created (or omit if using public registry)
  • Ticketing provider credentials stored in secret backend (if ticketing agent enabled)
  • Write access to the LogClaw Git repository
  • ArgoCD UI or CLI access to monitor sync status
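The checklist above can be verified with a short preflight script. This is a sketch: the operator namespace names and the `argocd` namespace are assumptions and may differ in your installation.

```shell
#!/usr/bin/env sh
# Preflight sketch for the prerequisites above.
# Operator namespace names below are assumptions; adjust to your cluster.

# Kubernetes server version (should report >= 1.27)
kubectl version

# At least one StorageClass must exist
kubectl get storageclass

# Operators installed once per cluster
for ns in strimzi-system flink-operator external-secrets cert-manager opensearch-operator-system; do
  kubectl get pods -n "$ns" --no-headers 2>/dev/null \
    || echo "WARN: namespace $ns not found"
done

# ArgoCD AppProject and ApplicationSet applied
kubectl get appproject logclaw -n argocd
kubectl get applicationset logclaw-tenants -n argocd
```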

Step 1: Create the Tenant Values File

Copy the template and fill in required fields:
cp gitops/tenants/_template.yaml gitops/tenants/<tenantId>.yaml
Set the required global values:
tenantId: "acme-corp"

global:
  tenantName: "Acme Corporation"
  storageClass: "gp3"
  tier: "ha"                       # standard | ha | ultra-ha

  objectStorage:
    provider: "s3"                 # s3 | gcs | azure
    bucket: "logclaw-acme-corp"
    region: "us-east-1"

  secretStore:
    provider: "aws"                # aws | gcp | vault | azure
    region: "us-east-1"

Enable Components

Toggle components based on your requirements:
# Core pipeline (recommended: all enabled)
platform:       { enabled: true }   # RBAC, NetworkPolicy, SecretStore
otelCollector:  { enabled: true }   # OTLP ingestion (gRPC :4317, HTTP :4318)
kafka:          { enabled: true }   # Event bus (required by most components)
opensearch:     { enabled: true }   # Search & analytics

# Processing (choose one or both)
flink:          { enabled: true }   # High-throughput stream processing
bridge:         { enabled: true }   # OTLP ETL + anomaly + trace correlation

# AI & Operations
mlEngine:       { enabled: true }   # Feast + KServe model inference
airflow:        { enabled: true }   # ML pipeline orchestration
ticketingAgent: { enabled: true }   # AI SRE incident management
agent:          { enabled: true }   # Infrastructure health collector

# UI
dashboard:      { enabled: true }   # Next.js pipeline UI
For development environments, enable bridge and dashboard while disabling flink, mlEngine, and airflow to reduce resource requirements.
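A development overlay following that recommendation might look like the following sketch (field names mirror the toggles shown above; the two agent toggles are left enabled here, but can also be disabled in dev):

```yaml
# Development-environment sketch: core pipeline plus bridge and dashboard,
# with the heavier components disabled to reduce resource requirements.
platform:       { enabled: true }
otelCollector:  { enabled: true }
kafka:          { enabled: true }
opensearch:     { enabled: true }
bridge:         { enabled: true }
dashboard:      { enabled: true }

flink:          { enabled: false }
mlEngine:       { enabled: false }
airflow:        { enabled: false }
```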
See the Values Reference for the full list of configurable fields.

Step 2: Commit and Push

git add gitops/tenants/<tenantId>.yaml
git commit -m "feat(tenants): onboard <tenantId>"
git push origin main
The ArgoCD ApplicationSet Git generator polls every 3 minutes by default. Once the generated application appears, force an immediate refresh with:
argocd app get logclaw-tenant-<tenantId> --refresh

Step 3: Monitor Deployment

Watch the tenant application sync:
# List all tenant applications
argocd app list -l logclaw.io/managed-by=applicationset

# Wait for the application to sync and report healthy
argocd app wait logclaw-tenant-<tenantId> --health
Expected sync wave order:

1. Namespace + Platform (t=0–5m): Namespace creation, RBAC, NetworkPolicy, ClusterSecretStore.
2. Kafka (t=5–10m): Strimzi reconciles the KRaft cluster. Wait for READY=True.
3. OTel Collector + OpenSearch (t=10–15m): Deployed in parallel. The OTel Collector connects to the Kafka bootstrap; the OpenSearch cluster reaches green.
4. Flink + Bridge (t=15–20m): Flink jobs enter the RUNNING state. Bridge connects to Kafka + OpenSearch.
5. ML Engine + Airflow (t=20–25m): KServe InferenceService ready. Airflow scheduler + webserver healthy.
6. Ticketing Agent + Dashboard (t=25–30m): The agent validates ticketing credentials. Dashboard available on ClusterIP.
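Progress through the waves can also be followed at the pod level. A sketch, using the `logclaw-<tenantId>` namespace convention from the verification commands later in this guide:

```shell
# Watch pods come up wave by wave in the tenant namespace.
# Replace <tenantId> with your tenant's ID.
kubectl get pods -n logclaw-<tenantId> --watch

# Or poll a one-line readiness summary every 30 seconds:
while true; do
  kubectl get pods -n logclaw-<tenantId> --no-headers \
    | awk '{split($2, a, "/"); ready += (a[1] == a[2]); total++}
           END {printf "ready %d/%d pods\n", ready, total}'
  sleep 30
done
```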

Step 4: Verify Components

Run the built-in Helm test suite:
helm test logclaw-<tenantId> \
  --namespace logclaw-<tenantId> \
  --timeout 5m \
  --logs

Individual Component Checks

kubectl get kafka -n logclaw-<tenantId>
# Expected: READY=True, REPLICAS=3 (ha tier)
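Similar spot checks apply to the other components. The CRD kinds come from the operators listed in the prerequisites; expected statuses are assumptions based on each operator's conventions:

```shell
# OpenSearch cluster (OpenSearch Operator CRD)
kubectl get opensearchcluster -n logclaw-<tenantId>

# Flink jobs (Flink Kubernetes Operator CRD); expect JOB STATUS=RUNNING
kubectl get flinkdeployment -n logclaw-<tenantId>

# KServe inference services; expect READY=True
kubectl get inferenceservice -n logclaw-<tenantId>

# Everything else: all pods Running or Completed
kubectl get pods -n logclaw-<tenantId>
```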

Step 5: Send Your First Logs

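The curl example below targets localhost:4318, which assumes the OTel Collector's OTLP HTTP port has been forwarded to your workstation. The Service name here is an assumption; check `kubectl get svc -n logclaw-<tenantId>` for the actual name:

```shell
# Forward the collector's OTLP HTTP port to localhost:4318.
# Service name is an assumption; adjust to your deployment.
kubectl port-forward -n logclaw-<tenantId> svc/otel-collector 4318:4318
```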
After verification, send a test log via OTLP HTTP:
curl -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{
    "resourceLogs": [{
      "resource": {
        "attributes": [
          {"key": "service.name", "value": {"stringValue": "onboarding-test"}}
        ]
      },
      "scopeLogs": [{
        "logRecords": [{
          "timeUnixNano": "'$(date +%s)000000000'",
          "severityText": "INFO",
          "body": {"stringValue": "Tenant onboarding complete!"}
        }]
      }]
    }]
  }'
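Querying OpenSearch on localhost likewise assumes a port-forward; the Service name below is an assumption:

```shell
# Forward the OpenSearch HTTP port to localhost:9200.
# Service name is an assumption; adjust to your deployment.
kubectl port-forward -n logclaw-<tenantId> svc/logclaw-<tenantId>-opensearch 9200:9200
```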
Verify it appears in OpenSearch:
curl -sk https://localhost:9200/logclaw-logs-*/_search \
  -H "Content-Type: application/json" \
  -d '{"query":{"match":{"service":"onboarding-test"}}}'

Troubleshooting

Application stuck in Progressing or Degraded:
argocd app get logclaw-tenant-<tenantId>
kubectl describe application logclaw-tenant-<tenantId> -n argocd
Check the "Conditions" section for error details.

Kafka cluster not READY:
kubectl describe kafka logclaw-<tenantId>-kafka -n logclaw-<tenantId>
kubectl logs -n strimzi-system -l name=strimzi-cluster-operator --tail=50
Common causes: StorageClass not found, PVC provisioning failure, resource limits too low.

OpenSearch cluster not green:
kubectl describe opensearchcluster -n logclaw-<tenantId>
kubectl logs -n opensearch-operator-system \
  -l control-plane=controller-manager --tail=50
Common causes: insufficient memory (2 Gi minimum per data node), disk pressure.

ExternalSecrets not syncing:
kubectl get externalsecret -n logclaw-<tenantId>
kubectl describe externalsecret -n logclaw-<tenantId>
kubectl get clustersecretstore
Common causes: IAM role not attached, wrong region, missing permissions.

Force a full resync:
argocd app get logclaw-tenant-<tenantId> --hard-refresh
argocd app sync logclaw-tenant-<tenantId> --force