IncidentRule

The IncidentRule CRD allows you to define rules for automatically creating, updating, and managing incidents based on events and conditions in your infrastructure.

Definition

apiVersion: mission-control.flanksource.com/v1
kind: IncidentRule
metadata:
  name: example-incident-rule
spec:
  # Source of events to process
  source:
    type: canary
    selector:
      matchLabels:
        app: frontend

  # Conditions that trigger the rule
  condition:
    status: unhealthy
    duration: 10m

  # Incident creation settings
  incident:
    title: "Frontend Availability Issue"
    severity: high
    owner: platform-team
    labels:
      service: frontend
      type: availability

Schema

The IncidentRule resource supports the following fields:

| Field | Description |
| --- | --- |
| `spec.source` | Source configuration for events |
| `spec.source.type` | Type of event source (canary, component, alert, etc.) |
| `spec.source.selector` | Kubernetes label selector for matching sources |
| `spec.condition` | Conditions that trigger the rule |
| `spec.condition.status` | Required status of the source (e.g., unhealthy) |
| `spec.condition.duration` | How long the condition must hold before the rule triggers |
| `spec.condition.count` | Number of occurrences required to trigger |
| `spec.condition.message` | Message pattern to match |
| `spec.condition.labels` | Labels that must be present on the source |
| `spec.condition.expression` | CEL expression for complex conditions |
| `spec.incident` | Incident configuration |
| `spec.incident.title` | Title template for the incident |
| `spec.incident.description` | Description template for the incident |
| `spec.incident.severity` | Severity level (critical, high, medium, low) |
| `spec.incident.type` | Type classification for the incident |
| `spec.incident.owner` | Default owner for the incident |
| `spec.incident.labels` | Labels to apply to the incident |
| `spec.incident.components` | Components to associate with the incident |
| `spec.incident.playbooks` | Playbooks to trigger when the incident is created |
| `spec.incident.responders` | Initial responders to assign |
| `spec.jira` | Jira integration settings |
| `spec.pagerduty` | PagerDuty integration settings |
| `spec.teams` | Microsoft Teams integration settings |
| `spec.slack` | Slack integration settings |
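
The `count`, `message`, and `labels` conditions are listed in the schema but not used in the examples on this page. The following is a minimal sketch of how they might be combined. The value formats (an integer for `count`, a simple pattern string for `message`, a key/value map for `labels`) are assumptions that the schema does not confirm, and the rule name and label values are hypothetical.

```yaml
apiVersion: mission-control.flanksource.com/v1
kind: IncidentRule
metadata:
  name: checkout-timeouts      # hypothetical name
spec:
  source:
    type: canary
    selector:
      matchLabels:
        app: checkout          # hypothetical label
  condition:
    status: unhealthy
    count: 3                   # assumed format: integer number of occurrences
    message: "timeout"         # assumed format: pattern matched against the source message
    labels:
      tier: production         # assumed format: key/value map of required labels
  incident:
    title: "Checkout Timeouts"
    severity: medium
    type: availability
```

Check the CRD's published schema before relying on these shapes, in particular how `message` patterns are interpreted.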

Examples

Basic Canary Failure Rule

apiVersion: mission-control.flanksource.com/v1
kind: IncidentRule
metadata:
  name: api-availability
spec:
  source:
    type: canary
    selector:
      matchLabels:
        check: api-health
  condition:
    status: unhealthy
    duration: 5m
  incident:
    title: "API Availability Issue"
    severity: high
    owner: api-team
    labels:
      service: api
      type: availability

Component Health Rule

apiVersion: mission-control.flanksource.com/v1
kind: IncidentRule
metadata:
  name: database-health
spec:
  source:
    type: component
    selector:
      matchLabels:
        type: database
        tier: production
  condition:
    status: unhealthy
    duration: 2m
  incident:
    title: "Database Health Issue - {{.component.name}}"
    description: "The database component {{.component.name}} is reporting unhealthy status.\n\nLast error: {{.component.status.message}}"
    severity: critical
    components:
      - "{{.component.id}}"
    playbooks:
      - database-recovery

Alertmanager Integration

apiVersion: mission-control.flanksource.com/v1
kind: IncidentRule
metadata:
  name: prometheus-alerts
spec:
  source:
    type: alertmanager
    selector:
      matchLabels:
        severity: critical
  condition:
    status: firing
    duration: 1m
  incident:
    title: "{{.alert.labels.alertname}}"
    description: "{{.alert.annotations.description}}"
    severity: "{{.alert.labels.severity}}"
    labels:
      source: prometheus
  pagerduty:
    integration: primary-pd-service
    severity: critical
  slack:
    channel: "#incidents"
    message: "Critical alert triggered: {{.alert.labels.alertname}}"

Complex Condition with Expression

apiVersion: mission-control.flanksource.com/v1
kind: IncidentRule
metadata:
  name: advanced-rule
spec:
  source:
    type: component
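  condition:
    # The condition must hold for 10 minutes before the rule triggers.
    # In CEL, duration("10m") yields a duration value rather than a boolean,
    # so the time window is set here instead of inside the expression.
    duration: 10m
    expression: |
      source.status == "unhealthy" &&
      (source.labels.tier == "production" || source.labels.criticality == "high")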
  incident:
    title: "Service Disruption - {{.component.name}}"
    severity: high
    type: availability
    components:
      - "{{.component.id}}"
      - "{{range .component.dependencies}}{{.id}}{{end}}"
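
The expression above ORs two label checks, which a single `matchLabels` selector cannot express. As a rough alternative sketch, assuming `matchLabels` entries are ANDed as in standard Kubernetes label selectors, one branch of the OR could be written with the declarative fields alone. The rule name is hypothetical, and covering the `criticality == "high"` branch would need a second rule of the same shape.

```yaml
apiVersion: mission-control.flanksource.com/v1
kind: IncidentRule
metadata:
  name: production-disruption   # hypothetical name
spec:
  source:
    type: component
    selector:
      matchLabels:
        tier: production         # covers only the tier == "production" branch
  condition:
    status: unhealthy
    duration: 10m
  incident:
    title: "Service Disruption - {{.component.name}}"
    severity: high
    type: availability
    components:
      - "{{.component.id}}"
```

The expression form keeps both branches in one rule; the declarative form is easier to read but duplicates the incident settings across rules.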
