ScrapeConfig

The ScrapeConfig CRD allows you to define configurations for scraping data from various sources to populate components, relationships, and other resources in Mission Control.

Definition

apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: example-scrape-config
spec:
  # Source to scrape data from
  source:
    type: kubernetes
    connection: production-cluster

  # How to transform the scraped data
  transform:
    components:
      - name: "{{.metadata.name}}"
        type: "kubernetes.{{.kind}}"
        labels:
          namespace: "{{.metadata.namespace}}"

Schema

The ScrapeConfig resource supports the following fields:

| Field | Description |
| --- | --- |
| spec.schedule | Schedule for the scrape job (cron format) |
| spec.source | Source configuration for data scraping |
| spec.source.type | Type of data source (kubernetes, aws, azure, etc.) |
| spec.source.connection | Connection to use for the source |
| spec.source.resource | Resource type to scrape |
| spec.source.query | Query to filter resources |
| spec.source.selector | Selector to filter resources |
| spec.transform | Transformation configuration |
| spec.transform.components | Component transformation rules |
| spec.transform.relationships | Relationship transformation rules |
| spec.transform.properties | Property transformation rules |
| spec.transform.labels | Label transformation rules |
| spec.transform.template | Custom transformation template |
| spec.transform.script | Custom transformation script |
| spec.plugins | Plugins to use for transformation |
| spec.timeout | Timeout for the scrape job |
| spec.backoff | Backoff configuration for retries |

Examples

Kubernetes Resources Scrape

apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: k8s-deployments
spec:
  schedule: "*/10 * * * *" # Every 10 minutes
  source:
    type: kubernetes
    connection: production-cluster
    resource: deployments
  transform:
    components:
      - name: "{{.metadata.name}}"
        type: kubernetes.deployment
        icon: kubernetes
        description: "Kubernetes Deployment in {{.metadata.namespace}}"
        labels:
          namespace: "{{.metadata.namespace}}"
          app: "{{index .metadata.labels \"app\" | default \"\"}}"
        properties:
          replicas: "{{.spec.replicas}}"
          strategy: "{{.spec.strategy.type}}"
          selector: "{{.spec.selector | toJson}}"
          image: "{{(index .spec.template.spec.containers 0).image}}"
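
To see what the field expressions in this example resolve to, the following is a minimal sketch that evaluates a few of them with Go's standard `text/template` package. It assumes the `{{ }}` expressions follow Go template syntax, which this page does not state explicitly. The `render` helper and the sample Deployment data are hypothetical, and expressions that use `default` or `toJson` are left out because those functions are not part of the Go standard library.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render evaluates one Go template expression against data.
// It is a hypothetical helper for illustration, not part of Mission Control.
func render(expr string, data any) (string, error) {
	tmpl, err := template.New("field").Parse(expr)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// sampleDeployment returns a made-up, heavily trimmed Deployment object.
func sampleDeployment() map[string]any {
	return map[string]any{
		"metadata": map[string]any{"name": "web", "namespace": "shop"},
		"spec": map[string]any{
			"replicas": 3,
			"strategy": map[string]any{"type": "RollingUpdate"},
			"template": map[string]any{
				"spec": map[string]any{
					"containers": []any{
						map[string]any{"name": "web", "image": "nginx:1.25"},
					},
				},
			},
		},
	}
}

func main() {
	d := sampleDeployment()
	exprs := []string{
		`{{.metadata.name}}`,
		`Kubernetes Deployment in {{.metadata.namespace}}`,
		`{{.spec.replicas}}`,
		`{{(index .spec.template.spec.containers 0).image}}`,
	}
	for _, expr := range exprs {
		out, err := render(expr, d)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s => %s\n", expr, out)
	}
}
```

The `image` expression reads only the first container, so a Deployment with several containers would report just one image.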

AWS EC2 Instances Scrape

apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: aws-ec2-instances
spec:
  schedule: "*/30 * * * *" # Every 30 minutes
  source:
    type: aws
    connection: aws-production
    resource: ec2
  transform:
    components:
      - name: "EC2 {{.InstanceId}}"
        type: aws.ec2
        icon: ec2
        # Fall back to the instance ID when no Name tag is set
        description: "{{index .Tags \"Name\" | default .InstanceId}}"
        labels:
          region: "{{.Region}}"
          type: "{{.InstanceType}}"
          environment: "{{index .Tags \"Environment\" | default \"\"}}"
        properties:
          state: "{{.State.Name}}"
          privateIp: "{{.PrivateIpAddress}}"
          publicIp: "{{.PublicIpAddress | default \"\"}}"
          launchTime: "{{.LaunchTime | formatTime}}"
          # Comma-separated group names, without a trailing separator
          securityGroups: "{{range $i, $g := .SecurityGroups}}{{if $i}}, {{end}}{{$g.GroupName}}{{end}}"
          ami: "{{.ImageId}}"
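
When a template joins list items with `range`, the position of the separator matters: emitting it after every item leaves a trailing `, `, while emitting it before every item except the first does not. The sketch below compares the two forms using Go's standard `text/template`, assuming the expressions follow Go template syntax. The `render` helper and the sample instance record are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Separator after every item: leaves a trailing ", ".
const trailing = `{{range .SecurityGroups}}{{.GroupName}}, {{end}}`

// Separator before every item except the first ($i is 0, hence false, on the first pass).
const guarded = `{{range $i, $g := .SecurityGroups}}{{if $i}}, {{end}}{{$g.GroupName}}{{end}}`

// render evaluates one Go template expression against data (hypothetical helper).
func render(expr string, data any) (string, error) {
	tmpl, err := template.New("field").Parse(expr)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// sampleInstance returns a made-up EC2 record with two security groups.
func sampleInstance() map[string]any {
	return map[string]any{
		"InstanceId": "i-0abc",
		"SecurityGroups": []map[string]any{
			{"GroupName": "web"},
			{"GroupName": "ssh"},
		},
	}
}

func main() {
	for _, expr := range []string{trailing, guarded} {
		out, err := render(expr, sampleInstance())
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// %q makes any trailing separator visible.
		fmt.Printf("%q\n", out)
	}
}
```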

Database Schema Scrape

apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: postgres-schema
spec:
  schedule: "0 */6 * * *" # Every 6 hours
  source:
    type: sql
    connection: production-db
    query: |
      SELECT
        t.table_name,
        t.table_schema,
        obj_description((t.table_schema || '.' || t.table_name)::regclass) as description,
        (SELECT COUNT(*) FROM information_schema.columns c WHERE c.table_name = t.table_name AND c.table_schema = t.table_schema) as column_count
      FROM information_schema.tables t
      WHERE t.table_schema NOT IN ('pg_catalog', 'information_schema')
      ORDER BY t.table_schema, t.table_name
  transform:
    components:
      - name: "{{.table_name}}"
        type: database.table
        icon: table
        description: "{{.description | default (printf \"Table %s.%s\" .table_schema .table_name)}}"
        labels:
          schema: "{{.table_schema}}"
          database: "production"
        properties:
          columnCount: "{{.column_count}}"
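
The `description` expression above falls back to a generated label when a table has no comment. The sketch below reproduces that behaviour with Go's standard `text/template`, assuming Go template syntax. `printf` is a built-in template function, but `default` is not, so the sketch registers a stand-in with the argument order the pipeline implies (fallback first, piped value last). That stand-in, the `render` helper, and the sample rows are assumptions for illustration; the real function is provided by whatever engine Mission Control uses.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

const descriptionExpr = `{{.description | default (printf "Table %s.%s" .table_schema .table_name)}}`

// funcs registers a stand-in for the non-standard "default" function:
// it returns fallback when value is nil or an empty string.
var funcs = template.FuncMap{
	"default": func(fallback, value any) any {
		if value == nil || value == "" {
			return fallback
		}
		return value
	},
}

// render evaluates one template expression against a result row (hypothetical helper).
func render(expr string, row map[string]any) (string, error) {
	tmpl, err := template.New("field").Funcs(funcs).Parse(expr)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, row); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Made-up rows shaped like the query's result columns.
	rows := []map[string]any{
		{"table_schema": "public", "table_name": "orders", "description": "Customer orders"},
		{"table_schema": "public", "table_name": "audit_log", "description": ""},
	}
	for _, row := range rows {
		out, err := render(descriptionExpr, row)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

With these rows, the first description is used as-is and the second falls back to `Table public.audit_log`.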

API Service Scrape with Relationships

apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: api-services
spec:
  schedule: "*/15 * * * *" # Every 15 minutes
  source:
    type: http
    connection: service-registry
    url: /api/services
  transform:
    components:
      - name: "{{.name}}"
        type: service.api
        icon: api
        description: "{{.description}}"
        labels:
          version: "{{.version}}"
          team: "{{.team}}"
          environment: "{{.environment}}"
        properties:
          endpoint: "{{.endpoint}}"
          status: "{{.status}}"
          lastDeployed: "{{.lastDeployTime | formatTime}}"
    relationships:
      - source:
          selector:
            id: "{{.name}}"
        target:
          selector:
            id: "{{.database}}"
        relationship: dependsOn
        properties:
          connectionString: "{{.connectionDetails.type}}://{{.connectionDetails.host}}:{{.connectionDetails.port}}/{{.connectionDetails.database}}"
      - source:
          selector:
            id: "{{.name}}"
        target:
          selector:
            id: "{{range .dependencies}}{{.}},{{end}}"
        relationship: dependsOn
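
The two relationship templates in this example can be checked the same way. The sketch below evaluates the `connectionString` and dependency `id` expressions with Go's standard `text/template`, assuming Go template syntax. The `render` helper and the sample service record are hypothetical. Note that the `range` expression renders all dependencies into a single comma-separated string with a trailing comma; whether the target selector accepts such a list is not stated on this page.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

const connectionStringExpr = `{{.connectionDetails.type}}://{{.connectionDetails.host}}:{{.connectionDetails.port}}/{{.connectionDetails.database}}`

const dependencyIDExpr = `{{range .dependencies}}{{.}},{{end}}`

// render evaluates one Go template expression against data (hypothetical helper).
func render(expr string, data any) (string, error) {
	tmpl, err := template.New("field").Parse(expr)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// sampleService returns a made-up record shaped like one entry of /api/services.
func sampleService() map[string]any {
	return map[string]any{
		"name":     "orders-api",
		"database": "orders-db",
		"connectionDetails": map[string]any{
			"type":     "postgres",
			"host":     "db.internal",
			"port":     5432,
			"database": "orders",
		},
		"dependencies": []string{"auth", "billing"},
	}
}

func main() {
	svc := sampleService()
	for _, expr := range []string{connectionStringExpr, dependencyIDExpr} {
		out, err := render(expr, svc)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// %q makes the trailing comma in the dependency list visible.
		fmt.Printf("%q\n", out)
	}
}
```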

See Also