Monitoring core services in AWS with Prometheus and exporters for CloudWatch

AWS managed services such as RDS and ElastiCache need monitoring too, and setting it up raises a few questions:





  1. How do you collect metrics from services that don't expose an endpoint to scrape?





  2. How do you get those metrics into Prometheus?





  3. How do you keep track of costs/usage?





Our answer: Prometheus exporters for CloudWatch and for AWS costs, cloudwatch_exporter and prometheus_aws_cost_exporter, deployed to Kubernetes with a Helm chart. (K8s is not a hard requirement.)






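CloudWatch is the monitoring service built into AWS, and it already collects metrics for managed services. What we need is a way to get those metrics into Prometheus.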





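AWS limits CPU and IO for its resources. Once such a limit is hit, performance degrades noticeably. That is why it is important to watch CPU/IO usage and the remaining balances, not just the usual metrics.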





We use two exporters: prometheus_aws_cost_exporter for costs and cloudwatch_exporter for service metrics. Most of what follows is about cloudwatch_exporter.





Let's get started!





1. IAM permissions

First, the exporters need permissions in IAM (AWS Identity and Access Management). cloudwatch_exporter needs the following:





  • cloudwatch:ListMetrics
  • cloudwatch:GetMetricStatistics
  • tag:GetResources







prometheus_aws_cost_exporter needs a different set of permissions. Here they are as JSON:





{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricData",
        "ec2:DescribeVolumes",
        "ec2:DescribeTags",
        "logs:PutLogEvents",
        "logs:DescribeLogStreams",
        "logs:DescribeLogGroups",
        "logs:CreateLogStream",
        "logs:CreateLogGroup",
        "ce:GetCostAndUsage"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter"
      ],
      "Resource": [
        "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*",
        "arn:aws:ce:*:*:/GetCostAndUsage"
      ]
    }
  ]
}



Both exporters authenticate with an access key ID and a secret access key, which they read from environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY).





2. Creating an IAM user and policy

In AWS IAM, create a user named cloudwatch_users.





(Screenshot: creating a user in IAM)

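Set Access Type to Programmatic access: the user then gets an access key ID and a secret access key for working with the API rather than the web console. On the next step, choose Attach existing policies directly and create a new policy. The IAM policy has to allow ListMetrics and GetMetricStatistics.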





(Screenshot: creating a custom policy)

The policy can be built in the visual editor or written directly as JSON:





{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}
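To double-check that a policy actually grants what cloudwatch_exporter needs, a few lines of Python are enough. This is a minimal sketch under simplifying assumptions: it only reads Allow statements, ignores Deny, NotAction and Condition, and matches `*` and `?` wildcards case-insensitively, roughly the way IAM matches action names. The names `REQUIRED_ACTIONS` and `missing_actions` are ours, not part of any AWS tool.

```python
import fnmatch
import json

# Permissions listed above for cloudwatch_exporter.
REQUIRED_ACTIONS = {
    "cloudwatch:ListMetrics",
    "cloudwatch:GetMetricStatistics",
    "tag:GetResources",
}


def allowed_action_patterns(policy: dict) -> list:
    """Collect lower-cased Action patterns from Allow statements (Deny/Condition ignored)."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a single string
            actions = [actions]
        patterns.extend(a.lower() for a in actions)
    return patterns


def missing_actions(policy: dict, required=REQUIRED_ACTIONS) -> set:
    """Return the required actions that no Allow pattern matches."""
    patterns = allowed_action_patterns(policy)
    return {
        action
        for action in required
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)
    }


# The cloudwatch_users policy from above.
POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": ["cloudwatch:GetMetricStatistics", "cloudwatch:ListMetrics"],
      "Resource": "*"
    }
  ]
}
""")

print(sorted(missing_actions(POLICY)))
```

For this policy the script reports tag:GetResources. As far as we know, cloudwatch_exporter only calls that API when metrics are selected by resource tags (aws_tag_select), so the policy is enough for a dimension-based config; add the permission if you filter by tags.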
      
      



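Click Review policy, give the policy a name, and click Create policy. Then go back to the IAM user you are creating and attach this policy to it. Save the resulting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY: they go into values.yaml of the Helm chart.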





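If you use Terraform, the IAM user and its policy can be created with a Terraform recipe instead. The API keys can then be pulled out of terraform.tfstate with jq: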





jq '.resources[] | select(.type == "aws_iam_access_key") | .instances[].attributes | {(.id): .secret}' terraform.tfstate
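The same extraction can be written as a short Python function, which is handy if the keys feed into other tooling. A sketch assuming the Terraform state format version 4 (`resources[].instances[].attributes`) and keys created with the `aws_iam_access_key` resource; `iam_access_keys` and `load_access_keys` are hypothetical helper names.

```python
import json


def iam_access_keys(state: dict) -> dict:
    """Map access key ID -> secret for every aws_iam_access_key in a v4 Terraform state."""
    keys = {}
    for resource in state.get("resources", []):
        if resource.get("type") != "aws_iam_access_key":
            continue  # skip users, policies and every other resource type
        for instance in resource.get("instances", []):
            attrs = instance.get("attributes", {})
            # When pgp_key is used, the plaintext secret is not stored, so such keys are skipped.
            if attrs.get("secret"):
                keys[attrs["id"]] = attrs["secret"]
    return keys


def load_access_keys(path: str = "terraform.tfstate") -> dict:
    """Read a state file from disk and extract the access keys."""
    with open(path) as f:
        return iam_access_keys(json.load(f))
```

Usage: `for key_id, secret in load_access_keys().items(): print(key_id, secret)`.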







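Important: CloudWatch is not free, and its API requests are billed. Keep the number of metrics and the polling frequency reasonable, since every API request costs money.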
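To get a feel for the numbers, here is a back-of-the-envelope estimate. It is a sketch under loud assumptions: one GetMetricStatistics request per collected series per scrape (extra ListMetrics calls for dimension discovery are ignored), and a price per 1,000 requests that you supply yourself from the current CloudWatch pricing page; nothing here reflects actual AWS prices.

```python
SECONDS_PER_DAY = 24 * 3600


def monthly_requests(series: int, scrape_interval_s: int, days: int = 30) -> int:
    """Requests per month, assuming one API request per series per scrape."""
    scrapes = days * SECONDS_PER_DAY // scrape_interval_s
    return series * scrapes


def monthly_cost(requests: int, price_per_1000: float) -> float:
    """Cost of the given request count; price_per_1000 is an input, not a real AWS price."""
    return requests / 1000 * price_per_1000


# Hypothetical setup: 50 series scraped every 60 s vs. every 300 s.
for interval in (60, 300):
    n = monthly_requests(50, interval)
    print(f"{interval:>3} s scrape: {n:,} requests/month")
```

The count grows linearly with both the number of series and the scrape frequency, so a longer scrape interval is the simplest way to cut it.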





3. The Helm chart

cloudwatch-exporter and cost-exporter run in Kubernetes and are deployed with a Helm chart.





The values.yaml file:





---
aws_access_key_id: <AWS_ACCESS_KEY_ID>
aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>
region: eu-central-1
replicas:  1
resources:
  requests:
    cpu: 1m
    memory: 512Mi
env:
  metric_today_daily_costs: "yes"
  metric_yesterday_daily_costs: "yes"
  query_period: "1800"
  metric_today_daily_usage: "yes"
  metric_today_daily_usage_norm: "yes"

      
      



Here:





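  • aws_access_key_id and aws_secret_access_key are the keys of the IAM user created earlier;
  • region is the AWS region your resources live in;
  • query_period is how often AWS is queried, in seconds;
  • metric_today_daily_costs, metric_yesterday_daily_costs, metric_today_daily_usage, and metric_today_daily_usage_norm enable the corresponding cost (costs) and usage (usage) metrics (set to no to disable one);
  • env holds the environment variables for cost-exporter (cloudwatch-exporter does not use them).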
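The cost-exporter Deployment below does a simple mapping: each key under env in values.yaml becomes an upper-cased environment variable of the same name, and the two AWS keys are passed as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The sketch reproduces that mapping in Python to show what the container ends up seeing; `container_env` is a hypothetical helper, not part of the chart.

```python
def container_env(values: dict) -> dict:
    """Environment variables the cost-exporter container gets from values.yaml."""
    env = {
        "AWS_ACCESS_KEY_ID": values["aws_access_key_id"],
        "AWS_SECRET_ACCESS_KEY": values["aws_secret_access_key"],
    }
    for key, value in values.get("env", {}).items():
        # The template quotes every value, so the container always sees strings.
        env[key.upper()] = str(value)
    return env


values = {
    "aws_access_key_id": "<AWS_ACCESS_KEY_ID>",
    "aws_secret_access_key": "<AWS_SECRET_ACCESS_KEY>",
    "region": "eu-central-1",
    "env": {
        "metric_today_daily_costs": "yes",
        "metric_yesterday_daily_costs": "yes",
        "query_period": "1800",
        "metric_today_daily_usage": "yes",
        "metric_today_daily_usage_norm": "yes",
    },
}

for name, value in sorted(container_env(values).items()):
    print(f"{name}={value}")
```

The real template lists each variable explicitly, so a new key in values.yaml only reaches the container once a matching env entry is added to the Deployment.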





Next comes the Deployment for cloudwatch-exporter (abridged):





apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudwatch-exporter
spec:
  selector:
    matchLabels:
      app: cloudwatch-exporter
  template:
    metadata:
      labels:
        app: cloudwatch-exporter
    spec:
      containers:
      - name: cloudwatch-exporter
        image: prom/cloudwatch-exporter:cloudwatch_exporter-0.9.0
        env:
        - name: AWS_ACCESS_KEY_ID
          value: "{{ .Values.aws_access_key_id }}"
        - name: AWS_SECRET_ACCESS_KEY
          value: "{{ .Values.aws_secret_access_key }}"
        volumeMounts:
        - name: config
          subPath: config.yml
          mountPath: /config/config.yml
      volumes:
      - name: config
        configMap:
          name: config
      
      



A similar Deployment (also abridged) for cost-exporter:





apiVersion: apps/v1
kind: Deployment
metadata:
  name: cost-exporter
spec:
  selector:
    matchLabels:
      app: cost-exporter
  template:
    metadata:
      labels:
        app: cost-exporter
    spec:
      containers:
      - name: cost-exporter
        image: nachomillangarcia/prometheus_aws_cost_exporter:latest
        args:
        - --host
        - 0.0.0.0
        env:
        - name: AWS_ACCESS_KEY_ID
          value: "{{ .Values.aws_access_key_id }}"
        - name: AWS_SECRET_ACCESS_KEY
          value: "{{ .Values.aws_secret_access_key }}"
        - name: METRIC_TODAY_DAILY_COSTS
          value: "{{ .Values.env.metric_today_daily_costs }}"
        - name: METRIC_YESTERDAY_DAILY_COSTS
          value: "{{ .Values.env.metric_yesterday_daily_costs }}"
        - name: QUERY_PERIOD
          value: "{{ .Values.env.query_period }}"
        - name: METRIC_TODAY_DAILY_USAGE
          value: "{{ .Values.env.metric_today_daily_usage }}"
        - name: METRIC_TODAY_DAILY_USAGE_NORM
          value: "{{ .Values.env.metric_today_daily_usage_norm }}"
      
      



4. Configuring metrics

The last step is to decide which metrics Prometheus will collect.





cloudwatch_exporter has to be told explicitly which metrics to collect. To see what is available, for example for EC2, ask the aws CLI:





aws cloudwatch list-metrics --namespace AWS/EC2
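`aws cloudwatch list-metrics` prints JSON with a `Metrics` array in which each item has `Namespace`, `MetricName`, and a `Dimensions` list of `Name`/`Value` pairs. A rough sketch that turns that output into exporter config entries, one per unique namespace, metric, and set of dimension names. The default statistic and period are our assumptions, to be adjusted per metric; the dicts use the same keys as the YAML config, so any YAML library can dump them.

```python
import json


def config_entries(list_metrics_json: str, statistic: str = "Average",
                   period_seconds: int = 300) -> list:
    """Build cloudwatch_exporter-style metric entries from `aws cloudwatch list-metrics` output."""
    data = json.loads(list_metrics_json)
    seen = set()
    entries = []
    for metric in data.get("Metrics", []):
        # Only dimension *names* go into the config; their values end up as label values.
        dims = tuple(sorted(d["Name"] for d in metric.get("Dimensions", [])))
        key = (metric["Namespace"], metric["MetricName"], dims)
        if key in seen:
            continue  # the same metric is listed once per combination of dimension values
        seen.add(key)
        entries.append({
            "aws_namespace": metric["Namespace"],
            "aws_metric_name": metric["MetricName"],
            "aws_dimensions": list(dims),
            "aws_statistics": [statistic],
            "period_seconds": period_seconds,
        })
    return entries


# Made-up sample output for two EC2 instances.
sample = """{"Metrics": [
  {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
   "Dimensions": [{"Name": "InstanceId", "Value": "i-0123"}]},
  {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
   "Dimensions": [{"Name": "InstanceId", "Value": "i-4567"}]}
]}"""

for entry in config_entries(sample):
    print(entry)
```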







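The output lists the available metric names and their dimensions. Here is a sample metrics config for the exporter: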





region: eu-central-1
metrics:
  - aws_namespace: AWS/NetworkELB
    aws_metric_name: HealthyHostCount
    aws_dimensions:
    - LoadBalancer
    - TargetGroup
    aws_statistics:
    - Sum
    period_seconds: 60
      
      



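This collects the HealthyHostCount metric from the AWS/NetworkELB namespace with a 60-second period, broken down by the LoadBalancer and TargetGroup dimensions, using the Sum statistic.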
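The alerts below query series such as aws_networkelb_healthy_host_count_sum, so it helps to know how a config entry turns into Prometheus names. Judging by those names, the convention is: the lower-cased namespace with non-alphanumeric characters replaced by `_`, then the metric name in snake_case, then the statistic; dimensions become snake_case labels (load_balancer, target_group). The sketch reproduces that convention; it is our reconstruction from the names, not the exporter's own code.

```python
import re


def to_snake_case(name: str) -> str:
    """CamelCase -> snake_case: add _ where a lowercase letter/digit meets an uppercase letter."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


def safe_name(name: str) -> str:
    """Replace anything that is not valid in a Prometheus name with _."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)


def prometheus_metric_name(namespace: str, metric_name: str, statistic: str) -> str:
    base = safe_name(namespace.lower() + "_" + to_snake_case(metric_name))
    return f"{base}_{to_snake_case(statistic)}"


def prometheus_label_name(dimension: str) -> str:
    return safe_name(to_snake_case(dimension))


print(prometheus_metric_name("AWS/NetworkELB", "HealthyHostCount", "Sum"))
print(prometheus_metric_name("AWS/ElastiCache", "CPUUtilization", "Average"))
print(prometheus_label_name("CacheClusterId"))
```

Note that CPUUtilization becomes cpuutilization rather than c_p_u_utilization: an underscore is only inserted where a lowercase letter or digit is followed by an uppercase one, which matches aws_elasticache_cpuutilization_average in the Redis alert.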





Now we can set up alerts!

Alert on Redis CPU usage in ElastiCache:





  - alert: RedisCPUUsage
    annotations:
      description: |
        Redis CPU utilization on {{`{{$labels.cache_cluster_id}}`}} in cluster is at or above 60%
      summary: Redis CPU utilization on {{`{{$labels.cache_cluster_id}}`}} in cluster is at or above 60%
    expr: |
      aws_elasticache_cpuutilization_average >= 60
    for: 5m
      
      



Alert on unhealthy targets in a LoadBalancer target group:





  - alert: LBTargetGroupIsUnhealthy
    annotations:
      description: Some hosts in target group {{`{{$labels.target_group}}`}} in cluster are unhealthy!
      summary: Some hosts in target group {{`{{$labels.target_group}}`}} in cluster are unhealthy!
    expr: |
      aws_networkelb_healthy_host_count_sum{load_balancer=~".*someservice.*",target_group=~".*someservice.*"} < 3
    for: 1m
      
      



Alert on EBS burst balance:





  - alert: EBSBurst_balance
    annotations:
      description: EBS Burst balance in cluster is at or below 60%
      summary: EBS Burst balance in cluster is at or below 60%
    expr: |
      aws_ebs_burst_balance_average <= 60
    for: 5m
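The `for` field keeps a single spike from triggering an alert: Prometheus marks the alert as pending the first time the expression matches and only switches it to firing once it has kept matching for the whole `for` duration; if the expression stops matching in between, the pending state is dropped. A single-series simulation of that behaviour, assuming the rule is evaluated once per sample (`first_firing_time` is a hypothetical helper, and the readings are made up):

```python
import operator


def first_firing_time(samples, threshold, hold_seconds, compare=operator.ge):
    """Return the timestamp at which a `for:`-style alert would start firing, or None.

    samples: (timestamp_seconds, value) pairs, one per rule evaluation.
    compare: how a value is tested against threshold (>= for CPU, <= for burst balance).
    """
    pending_since = None
    for t, value in samples:
        if compare(value, threshold):
            if pending_since is None:
                pending_since = t  # expression matched: alert becomes pending
            if t - pending_since >= hold_seconds:
                return t  # matched for the whole hold period: alert fires
        else:
            pending_since = None  # expression stopped matching: pending state is reset
    return None


# Made-up ElastiCache CPU readings, one per minute, checked against ">= 60 for 5m".
cpu = [50, 65, 70, 72, 68, 61, 63, 64, 40]
samples = list(zip(range(0, 60 * len(cpu), 60), cpu))
print(first_firing_time(samples, 60, 5 * 60))
```

For the EBS alert the comparison flips, so the same function would be called with `compare=operator.le`.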
      
      








And here is how these metrics look in Prometheus:





(Graphs: AWS EC2 EBS IO balance (average); AWS ElastiCache CPU utilization (average))

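That is how we monitor AWS services with Prometheus exporters. With this setup, even managed services, which have no endpoint of their own to scrape, end up in Prometheus.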





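Once the metrics are in Prometheus (scraped at whatever interval suits you), they can be visualized in Grafana. (Data from prometheus_aws_cost_exporter can go on a Grafana dashboard as well.)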




