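If you use managed services such as RDS or ElastiCache in AWS, you will sooner or later want to monitor them. That raises a few questions:
How do you keep an eye on the services' endpoints?
How do you get their metrics into Prometheus, if that is what you already use?
How do you keep track of costs and usage?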
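This article walks through one approach: connecting Prometheus to CloudWatch with cloudwatch_exporter and prometheus_aws_cost_exporter to collect AWS metrics, and deploying both to Kubernetes with a Helm chart. (K8s is not a hard requirement.)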
CloudWatch — . , , . Prometheus, .
AWS CPU IO. , . , , . . , . CPU/IO, , , .
Two exporters are involved: prometheus_aws_cost_exporter collects cost data, and cloudwatch_exporter collects CloudWatch metrics.
Let's get started!
1. IAM
First, set up access in IAM (AWS Identity and Access Management). cloudwatch_exporter needs the following permissions:
cloudwatch:ListMetrics
cloudwatch:GetMetricStatistics
tag:GetResources
prometheus_aws_cost_exporter needs a different, broader set of permissions. The policy in JSON:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricData",
        "ec2:DescribeVolumes",
        "ec2:DescribeTags",
        "logs:PutLogEvents",
        "logs:DescribeLogStreams",
        "logs:DescribeLogGroups",
        "logs:CreateLogStream",
        "logs:CreateLogGroup",
        "ce:GetCostAndUsage"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter"
      ],
      "Resource": [
        "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*",
        "arn:aws:ce:*:*:/GetCostAndUsage"
      ]
    }
  ]
}
You will also need an access key ID and a secret access key, which are passed to the exporters as environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY).
2. Creating the IAM user and policy
In the AWS IAM console, create a new user, for example cloudwatch_users.
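For Access Type, choose Programmatic access. This issues an access key ID and a secret access key (for API access rather than the web console). In the next step choose Attach existing policies directly and create a new policy. The IAM user needs the ListMetrics and GetMetricStatistics permissions.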
To grant access to the API, use the following JSON policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}
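If you would rather generate this policy document than paste it by hand, the structure is simple enough to build in a few lines. The following is a minimal sketch, not part of the original setup; `build_policy` and `CLOUDWATCH_ACTIONS` are names made up for this illustration, and the Sid field is left out because it is optional.

```python
import json

# The two actions the article grants to the cloudwatch_exporter IAM user.
CLOUDWATCH_ACTIONS = ["cloudwatch:GetMetricStatistics", "cloudwatch:ListMetrics"]


def build_policy(actions, resource="*"):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),  # stable order keeps diffs readable
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the policy editor.
    print(json.dumps(build_policy(CLOUDWATCH_ACTIONS), indent=2))
```

The printed JSON has the same Version, Effect, Action and Resource values as the policy above.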
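Click Review policy and save the policy (Create policy). Then finish creating the IAM user with this policy attached. Save the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values: they go into the values.yaml of the Helm chart.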
If you use Terraform, a Terraform recipe can create the IAM user and policy instead. The API keys can then be read from terraform.tfstate with jq:
jq '.resources[].instances[].attributes | {(.id): .secret}' terraform.tfstate
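For readers without jq at hand, the same lookup can be sketched in Python. This assumes the state file has the `resources[].instances[].attributes` layout the jq filter relies on; `extract_secrets` and the sample state are hypothetical and exist only for this illustration. Unlike the jq one-liner, the sketch skips resources that have no `secret` attribute.

```python
import json


def extract_secrets(state):
    """Map each resource instance id to its `secret` attribute.

    Mirrors the jq filter, but skips instances without a secret
    (for example the IAM user or the policy itself).
    """
    result = {}
    for resource in state.get("resources", []):
        for instance in resource.get("instances", []):
            attrs = instance.get("attributes", {})
            if attrs.get("secret") is not None:
                result[attrs["id"]] = attrs["secret"]
    return result


# Hypothetical, heavily trimmed state used only for illustration.
SAMPLE_STATE = {
    "resources": [
        {
            "type": "aws_iam_access_key",
            "instances": [
                {"attributes": {"id": "AKIAEXAMPLEKEYID", "secret": "example-secret-value"}}
            ],
        },
        {
            "type": "aws_iam_user",
            "instances": [{"attributes": {"id": "cloudwatch_users"}}],
        },
    ]
}

if __name__ == "__main__":
    # With a real file: extract_secrets(json.load(open("terraform.tfstate")))
    print(extract_secrets(SAMPLE_STATE))
```

Keep in mind that the state file holds the secret in plain text, so treat it as carefully as the keys themselves.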
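Note: requests to the CloudWatch API are not free; AWS bills for them. The more metrics you collect through the API and the more often you collect them, the higher the bill.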
3. The Helm chart
Next, deploy cloudwatch-exporter and cost-exporter to Kubernetes. We do this with a Helm chart, described below.
The values.yaml file:
---
aws_access_key_id: <AWS_ACCESS_KEY_ID>
aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>
region: eu-central-1
replicas: 1
resources:
  requests:
    cpu: 1m
    memory: 512Mi
env:
  metric_today_daily_costs: "yes"
  metric_yesterday_daily_costs: "yes"
  query_period: "1800"
  metric_today_daily_usage: "yes"
  metric_today_daily_usage_norm: "yes"
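Where:
aws_access_key_id and aws_secret_access_key are the keys of the IAM user created earlier;
region is the AWS region to query;
query_period is how often AWS is queried (in seconds);
metric_today_daily_costs, metric_yesterday_daily_costs, metric_today_daily_usage and metric_today_daily_usage_norm switch the corresponding costs and usage metrics on or off (set to no to disable);
the env section applies to cost-exporter only (cloudwatch-exporter does not read it).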
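The cost-exporter Deployment in this chart passes every key of the env section to the container as an environment variable with the same name in upper case. The sketch below models that mapping in Python so the relationship between values.yaml and the container environment is explicit. `env_from_values` is a hypothetical helper written for this illustration, and the key values are placeholders.

```python
def env_from_values(values):
    """Build the container env list for cost-exporter from chart values.

    Credentials come from the top level of the values; every key under
    `env` becomes an upper-cased environment variable.
    """
    env = [
        {"name": "AWS_ACCESS_KEY_ID", "value": values["aws_access_key_id"]},
        {"name": "AWS_SECRET_ACCESS_KEY", "value": values["aws_secret_access_key"]},
    ]
    for key, value in values.get("env", {}).items():
        env.append({"name": key.upper(), "value": str(value)})
    return env


# Placeholder values mirroring the structure of values.yaml.
VALUES = {
    "aws_access_key_id": "<AWS_ACCESS_KEY_ID>",
    "aws_secret_access_key": "<AWS_SECRET_ACCESS_KEY>",
    "env": {
        "metric_today_daily_costs": "yes",
        "metric_yesterday_daily_costs": "yes",
        "query_period": "1800",
        "metric_today_daily_usage": "yes",
        "metric_today_daily_usage_norm": "yes",
    },
}

if __name__ == "__main__":
    for item in env_from_values(VALUES):
        print(f'{item["name"]}={item["value"]}')
```

All values stay strings, which is why they are quoted in values.yaml: Kubernetes rejects non-string env values.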
The first template is the Deployment for cloudwatch-exporter:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudwatch-exporter
spec:
  selector:
    matchLabels:
      app: cloudwatch-exporter
  template:
    metadata:
      labels:
        app: cloudwatch-exporter
    spec:
      containers:
        - name: cloudwatch-exporter
          image: prom/cloudwatch-exporter:cloudwatch_exporter-0.9.0
          env:
            - name: AWS_ACCESS_KEY_ID
              value: "{{ .Values.aws_access_key_id }}"
            - name: AWS_SECRET_ACCESS_KEY
              value: "{{ .Values.aws_secret_access_key }}"
          volumeMounts:
            - name: config
              subPath: config.yml
              mountPath: /config/config.yml
      volumes:
        - name: config
          configMap:
            name: config
The second Deployment is for cost-exporter:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cost-exporter
spec:
  selector:
    matchLabels:
      app: cost-exporter
  template:
    metadata:
      labels:
        app: cost-exporter
    spec:
      containers:
        - name: cost-exporter
          image: nachomillangarcia/prometheus_aws_cost_exporter:latest
          args:
            - --host
            - 0.0.0.0
          env:
            - name: AWS_ACCESS_KEY_ID
              value: "{{ .Values.aws_access_key_id }}"
            - name: AWS_SECRET_ACCESS_KEY
              value: "{{ .Values.aws_secret_access_key }}"
            - name: METRIC_TODAY_DAILY_COSTS
              value: "{{ .Values.env.metric_today_daily_costs }}"
            - name: METRIC_YESTERDAY_DAILY_COSTS
              value: "{{ .Values.env.metric_yesterday_daily_costs }}"
            - name: QUERY_PERIOD
              value: "{{ .Values.env.query_period }}"
            - name: METRIC_TODAY_DAILY_USAGE
              value: "{{ .Values.env.metric_today_daily_usage }}"
            - name: METRIC_TODAY_DAILY_USAGE_NORM
              value: "{{ .Values.env.metric_today_daily_usage_norm }}"
4. Configuring metrics
— , Prometheus .
cloudwatch_exporter has to be told which metrics to collect. To see what is available for a service, for example EC2, use the aws CLI:
aws cloudwatch list-metrics --namespace AWS/EC2
This requires the aws CLI. An example entry for the exporter's config:
- aws_namespace: AWS/NetworkELB
  aws_metric_name: HealthyHostCount
  aws_dimensions:
    - LoadBalancer
    - TargetGroup
  aws_statistics:
    - Sum
  period_seconds: 60
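This entry collects the HealthyHostCount metric from the AWS/NetworkELB namespace with a 60-second period, broken down by the LoadBalancer and TargetGroup dimensions, using the Sum statistic.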
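It helps to know what the resulting Prometheus metric will be called before writing alert expressions. The sketch below approximates the naming scheme that the metric names in this article follow (lower-cased namespace, snake-cased metric name, statistic as a suffix). The helper names are made up for this illustration, and the real exporter may differ in edge cases, so treat it as a rule of thumb and check the exporter's /metrics output.

```python
import re


def to_snake_case(name):
    """Insert '_' between a lowercase letter or digit and an uppercase letter, then lowercase."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


def safe_name(name):
    """Replace characters that are invalid in metric names and merge repeated underscores."""
    name = re.sub(r"[^a-zA-Z0-9:_]", "_", name)
    return re.sub(r"__+", "_", name)


def prometheus_metric_name(aws_namespace, aws_metric_name, aws_statistic):
    """Approximate the Prometheus metric name for one CloudWatch metric and statistic."""
    base = safe_name(aws_namespace.lower() + "_" + to_snake_case(aws_metric_name))
    return base + "_" + to_snake_case(aws_statistic)


if __name__ == "__main__":
    print(prometheus_metric_name("AWS/NetworkELB", "HealthyHostCount", "Sum"))
    # aws_networkelb_healthy_host_count_sum
    print(to_snake_case("TargetGroup"))
    # target_group
```

Dimension names are converted the same way, which is why the LoadBalancer and TargetGroup dimensions show up as the load_balancer and target_group labels.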
Now for the alerts!
CPU utilization of Redis in ElastiCache:
- alert: RedisCPUUsage
  annotations:
    description: |
      Redis CPU utilization on {{`{{$labels.cache_cluster_id}}`}} in cluster is over 60%
    summary: Redis CPU utilization on {{`{{$labels.cache_cluster_id}}`}} in cluster is over 60%
  expr: |
    aws_elasticache_cpuutilization_average >= 60
  for: 5m
Unhealthy targets behind a LoadBalancer:
- alert: LBTargetGroupIsUnhealthy
  annotations:
    description: Some hosts in target group {{`{{$labels.target_group}}`}} in cluster are unhealthy!
    summary: Some hosts in target group {{`{{$labels.target_group}}`}} in cluster are unhealthy!
  expr: |
    aws_networkelb_healthy_host_count_sum{load_balancer=~".*someservice.*",target_group=~".*someservice.*"} < 3
  for: 1m
EBS Burst balance:
- alert: EBSBurst_balance
  annotations:
    description: EBS Burst balance in cluster is less than 60%
    summary: EBS Burst balance in cluster is less than 60%
  expr: |
    aws_ebs_burst_balance_average <= 60
  for: 5m
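Each of these rules combines a threshold with a `for` duration, so a single bad sample does not trigger a notification. The sketch below is a simplified model of that behaviour, not Prometheus code: the condition has to hold on every evaluation for at least the given duration, and any evaluation where it does not hold resets the timer. `alert_fires` and the sample data are hypothetical.

```python
def alert_fires(samples, condition, for_seconds):
    """Return True if `condition` holds continuously for at least `for_seconds`.

    `samples` is a list of (timestamp_seconds, value) pairs in evaluation order.
    """
    active_since = None
    for timestamp, value in samples:
        if condition(value):
            if active_since is None:
                active_since = timestamp  # condition just became true: pending
            if timestamp - active_since >= for_seconds:
                return True  # held long enough: firing
        else:
            active_since = None  # condition broke: reset the timer
    return False


if __name__ == "__main__":
    # EBS burst balance evaluated once a minute; rule is "<= 60 for 5m".
    steady_low = [(t * 60, 55) for t in range(6)]
    brief_recovery = [(0, 55), (60, 55), (120, 80), (180, 55), (240, 55), (300, 55)]
    print(alert_fires(steady_low, lambda v: v <= 60, 300))      # True
    print(alert_fires(brief_recovery, lambda v: v <= 60, 300))  # False
```

The second series never fires because the recovery at 120 seconds restarts the five-minute window.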
, Prometheus:
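That is how AWS can be monitored with Prometheus exporters. With this setup, the state of the managed services is visible in the same place as the rest of your monitoring.
Once the metrics are in Prometheus (provided it is configured to scrape the exporters), they can be visualized in Grafana. (For instance, there is a dashboard for prometheus_aws_cost_exporter.)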