Project Icon

alertmanager

智能告警管理与路由工具

Alertmanager是一款专业的告警处理工具,主要用于接收和管理来自Prometheus等监控系统的告警信息。它能够对告警进行去重、分组和智能路由,并将其发送到电子邮件、PagerDuty、OpsGenie等多种接收端。Alertmanager还提供告警静默、抑制等高级功能,并支持高可用性配置,是大规模监控系统中的关键组件。

Alertmanager [CircleCI][circleci]

[Docker Repository on Quay][quay] [Docker Pulls][hub]

The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integrations such as email, PagerDuty, OpsGenie, or many other mechanisms thanks to the webhook receiver. It also takes care of silencing and inhibition of alerts.

Install

There are various ways of installing Alertmanager.

Precompiled binaries

Precompiled binaries for released versions are available in the download section on prometheus.io. Using the latest production release binary is the recommended way of installing Alertmanager.

Docker images

Docker images are available on Quay.io or Docker Hub.

You can launch an Alertmanager container for trying it out with

$ docker run --name alertmanager -d -p 127.0.0.1:9093:9093 quay.io/prometheus/alertmanager

Alertmanager will now be reachable at http://localhost:9093/.

Compiling the binary

You can either go get it:

$ GO15VENDOREXPERIMENT=1 go get github.com/prometheus/alertmanager/cmd/...
# cd $GOPATH/src/github.com/prometheus/alertmanager
$ alertmanager --config.file=<your_file>

Or clone the repository and build manually:

$ mkdir -p $GOPATH/src/github.com/prometheus
$ cd $GOPATH/src/github.com/prometheus
$ git clone https://github.com/prometheus/alertmanager.git
$ cd alertmanager
$ make build
$ ./alertmanager --config.file=<your_file>

You can also build just one of the binaries in this repo by passing a name to the build function:

$ make build BINARIES=amtool

Example

This is an example configuration that should cover most relevant aspects of the new YAML configuration format. The full documentation of the configuration can be found here.

global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: 'alertmanager@example.org'

# The root route on which each incoming alert enters.
route:
  # The root route must not have any matchers as it is the entry point for
  # all alerts. It needs to have a receiver configured so alerts that do not
  # match any of the sub-routes are sent to someone.
  receiver: 'team-X-mails'

  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  #
  # To aggregate by all possible labels use '...' as the sole label name.
  # This effectively disables aggregation entirely, passing through all
  # alerts as-is. This is unlikely to be what you want, unless you have
  # a very low alert volume or your upstream notification system performs
  # its own grouping. Example: group_by: [...]
  group_by: ['alertname', 'cluster']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 30s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 3h

  # All the above attributes are inherited by all child routes and can
  # overwritten on each.

  # The child route trees.
  routes:
  # This route performs a regular expression match on alert labels to
  # catch alerts that are related to a list of services.
  - matchers:
    - service=~"^(foo1|foo2|baz)$"
    receiver: team-X-mails

    # The service has a sub-route for critical alerts, any alerts
    # that do not match, i.e. severity != critical, fall-back to the
    # parent node and are sent to 'team-X-mails'
    routes:
    - matchers:
      - severity="critical"
      receiver: team-X-pager

  - matchers:
    - service="files"
    receiver: team-Y-mails

    routes:
    - matchers:
      - severity="critical"
      receiver: team-Y-pager

  # This route handles all alerts coming from a database service. If there's
  # no team to handle it, it defaults to the DB team.
  - matchers:
    - service="database"

    receiver: team-DB-pager
    # Also group alerts by affected database.
    group_by: [alertname, cluster, database]

    routes:
    - matchers:
      - owner="team-X"
      receiver: team-X-pager

    - matchers:
      - owner="team-Y"
      receiver: team-Y-pager


# Inhibition rules allow to mute a set of alerts given that another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_matchers:
    - severity="critical"
  target_matchers:
    - severity="warning"
  # Apply inhibition if the alertname is the same.
  # CAUTION: 
  #   If all label names listed in `equal` are missing 
  #   from both the source and target alerts,
  #   the inhibition rule will apply!
  equal: ['alertname']


receivers:
- name: 'team-X-mails'
  email_configs:
  - to: 'team-X+alerts@example.org, team-Y+alerts@example.org'

- name: 'team-X-pager'
  email_configs:
  - to: 'team-X+alerts-critical@example.org'
  pagerduty_configs:
  - routing_key: <team-X-key>

- name: 'team-Y-mails'
  email_configs:
  - to: 'team-Y+alerts@example.org'

- name: 'team-Y-pager'
  pagerduty_configs:
  - routing_key: <team-Y-key>

- name: 'team-DB-pager'
  pagerduty_configs:
  - routing_key: <team-DB-key>

API

The current Alertmanager API is version 2. This API is fully generated via the OpenAPI project and Go Swagger with the exception of the HTTP handlers themselves. The API specification can be found in api/v2/openapi.yaml. A HTML rendered version can be accessed here. Clients can be easily generated via any OpenAPI generator for all major languages.

With the default config, endpoints are accessed under a /api/v1 or /api/v2 prefix. The v2 /status endpoint would be /api/v2/status. If --web.route-prefix is set then API routes are prefixed with that as well, so --web.route-prefix=/alertmanager/ would relate to /alertmanager/api/v2/status.

API v2 is still under heavy development and thereby subject to change.

amtool

amtool is a cli tool for interacting with the Alertmanager API. It is bundled with all releases of Alertmanager.

Install

Alternatively you can install with:

$ go install github.com/prometheus/alertmanager/cmd/amtool@latest

Examples

View all currently firing alerts:

$ amtool alert
Alertname        Starts At                Summary
Test_Alert       2017-08-02 18:30:18 UTC  This is a testing alert!
Test_Alert       2017-08-02 18:30:18 UTC  This is a testing alert!
Check_Foo_Fails  2017-08-02 18:30:18 UTC  This is a testing alert!
Check_Foo_Fails  2017-08-02 18:30:18 UTC  This is a testing alert!

View all currently firing alerts with extended output:

$ amtool -o extended alert
Labels                                        Annotations                                                    Starts At                Ends At                  Generator URL
alertname="Test_Alert" instance="node0"       link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local
alertname="Test_Alert" instance="node1"       link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local
alertname="Check_Foo_Fails" instance="node0"  link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local
alertname="Check_Foo_Fails" instance="node1"  link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local

In addition to viewing alerts, you can use the rich query syntax provided by Alertmanager:

$ amtool -o extended alert query alertname="Test_Alert"
Labels                                   Annotations                                                    Starts At                Ends At                  Generator URL
alertname="Test_Alert" instance="node0"  link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local
alertname="Test_Alert" instance="node1"  link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local

$ amtool -o extended alert query instance=~".+1"
Labels                                        Annotations                                                    Starts At                Ends At                  Generator URL
alertname="Test_Alert" instance="node1"       link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local
alertname="Check_Foo_Fails" instance="node1"  link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local

$ amtool -o extended alert query alertname=~"Test.*" instance=~".+1"
Labels                                   Annotations                                                    Starts At                Ends At                  Generator URL
alertname="Test_Alert" instance="node1"  link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local

Silence an alert:

$ amtool silence add alertname=Test_Alert
b3ede22e-ca14-4aa0-932c-ca2f3445f926

$ amtool silence add alertname="Test_Alert" instance=~".+0"
e48cb58a-0b17-49ba-b734-3585139b1d25

View silences:

$ amtool silence query
ID                                    Matchers              Ends At                  Created By  Comment
b3ede22e-ca14-4aa0-932c-ca2f3445f926  alertname=Test_Alert  2017-08-02 19:54:50 UTC  kellel

$ amtool silence query instance=~".+0"
ID                                    Matchers                            Ends At                  Created By  Comment
e48cb58a-0b17-49ba-b734-3585139b1d25  alertname=Test_Alert instance=~.+0  2017-08-02 22:41:39 UTC  kellel

Expire a silence:

$ amtool silence expire b3ede22e-ca14-4aa0-932c-ca2f3445f926

Expire all silences matching a query:

$ amtool silence query instance=~".+0"
ID                                    Matchers                            Ends At                  Created By  Comment
e48cb58a-0b17-49ba-b734-3585139b1d25  alertname=Test_Alert instance=~.+0  2017-08-02 22:41:39 UTC  kellel

$ amtool silence expire $(amtool silence query -q instance=~".+0")

$ amtool silence query instance=~".+0"

Expire all silences:

$ amtool silence expire $(amtool silence query -q)

Try out how a template works. Let's say you have this in your configuration file:

templates:
  - '/foo/bar/*.tmpl'

Then you can test out how a template would look like with example by using this command:

amtool template render --template.glob='/foo/bar/*.tmpl' --template.text='{{ template "slack.default.markdown.v1" . }}'

Configuration

amtool allows a configuration file to specify some options for convenience. The default configuration file paths are $HOME/.config/amtool/config.yml or /etc/amtool/config.yml

An example configuration file might look like the following:

# Define the path that `amtool` can find your `alertmanager` instance
alertmanager.url: "http://localhost:9093"

# Override the default author. (unset defaults to your username)
author: me@example.com

# Force amtool to give you an error if you don't include a comment on a silence
comment_required: true

# Set a default output format. (unset defaults to simple)
output: extended

# Set a default receiver
receiver: team-X-pager

Routes

amtool allows you to visualize the routes of your configuration in form of text tree view. Also you can use it to test the routing by passing it label set of an alert and it prints out all receivers the alert would match ordered and separated by ,. (If you use --verify.receivers amtool returns error code 1 on mismatch)

Example of usage:

# View routing tree of remote Alertmanager
$ amtool config routes --alertmanager.url=http://localhost:9090

# Test if alert matches expected receiver
$ amtool config routes test --config.file=doc/examples/simple.yml --tree --verify.receivers=team-X-pager service=database owner=team-X

High Availability

Alertmanager's high availability is in production use at many companies and is enabled by default.

Important: Both UDP and TCP are needed in alertmanager 0.15 and higher for the cluster to work.

  • If you are using a firewall, make sure to whitelist the clustering port for both protocols.
  • If you are running in a container, make sure to expose the clustering port for both protocols.

To create a highly available cluster of the Alertmanager the instances need to be configured to communicate with each other. This is configured using the --cluster.* flags.

  • --cluster.listen-address string: cluster listen address (default "0.0.0.0:9094"; empty string disables HA mode)
  • --cluster.advertise-address string: cluster advertise address
  • --cluster.peer value: initial peers (repeat flag for each additional peer)
  • --cluster.peer-timeout value: peer timeout period (default "15s")
  • --cluster.gossip-interval value: cluster message propagation speed (default "200ms")
  • --cluster.pushpull-interval value: lower values will increase convergence speeds at expense of bandwidth (default "1m0s")
  • --cluster.settle-timeout value: maximum time to wait for cluster connections to settle before evaluating notifications.
  • --cluster.tcp-timeout value: timeout value for tcp connections, reads and writes (default "10s")
  • --cluster.probe-timeout value: time to wait for ack before marking node unhealthy (default "500ms")
  • --cluster.probe-interval value: interval between random node probes (default "1s")
  • --cluster.reconnect-interval value: interval between attempting to reconnect to lost peers (default "10s")
  • --cluster.reconnect-timeout value: length of time to attempt to reconnect to a lost peer (default: "6h0m0s")
  • --cluster.label value: the label is an optional string to include on
项目侧边栏1项目侧边栏2
推荐项目
Project Cover

豆包MarsCode

豆包 MarsCode 是一款革命性的编程助手,通过AI技术提供代码补全、单测生成、代码解释和智能问答等功能,支持100+编程语言,与主流编辑器无缝集成,显著提升开发效率和代码质量。

Project Cover

AI写歌

Suno AI是一个革命性的AI音乐创作平台,能在短短30秒内帮助用户创作出一首完整的歌曲。无论是寻找创作灵感还是需要快速制作音乐,Suno AI都是音乐爱好者和专业人士的理想选择。

Project Cover

有言AI

有言平台提供一站式AIGC视频创作解决方案,通过智能技术简化视频制作流程。无论是企业宣传还是个人分享,有言都能帮助用户快速、轻松地制作出专业级别的视频内容。

Project Cover

Kimi

Kimi AI助手提供多语言对话支持,能够阅读和理解用户上传的文件内容,解析网页信息,并结合搜索结果为用户提供详尽的答案。无论是日常咨询还是专业问题,Kimi都能以友好、专业的方式提供帮助。

Project Cover

阿里绘蛙

绘蛙是阿里巴巴集团推出的革命性AI电商营销平台。利用尖端人工智能技术,为商家提供一键生成商品图和营销文案的服务,显著提升内容创作效率和营销效果。适用于淘宝、天猫等电商平台,让商品第一时间被种草。

Project Cover

吐司

探索Tensor.Art平台的独特AI模型,免费访问各种图像生成与AI训练工具,从Stable Diffusion等基础模型开始,轻松实现创新图像生成。体验前沿的AI技术,推动个人和企业的创新发展。

Project Cover

SubCat字幕猫

SubCat字幕猫APP是一款创新的视频播放器,它将改变您观看视频的方式!SubCat结合了先进的人工智能技术,为您提供即时视频字幕翻译,无论是本地视频还是网络流媒体,让您轻松享受各种语言的内容。

Project Cover

美间AI

美间AI创意设计平台,利用前沿AI技术,为设计师和营销人员提供一站式设计解决方案。从智能海报到3D效果图,再到文案生成,美间让创意设计更简单、更高效。

Project Cover

AIWritePaper论文写作

AIWritePaper论文写作是一站式AI论文写作辅助工具,简化了选题、文献检索至论文撰写的整个过程。通过简单设定,平台可快速生成高质量论文大纲和全文,配合图表、参考文献等一应俱全,同时提供开题报告和答辩PPT等增值服务,保障数据安全,有效提升写作效率和论文质量。

投诉举报邮箱: service@vectorlightyear.com
@2024 懂AI·鲁ICP备2024100362号-6·鲁公网安备37021002001498号