
Install prometheus-es-exporter for prometheus <-> graylog integration
Closed, Resolved (Public)

Description

In T6979#138313, @John wrote:

https://grafana.com/docs/grafana/latest/datasources/elasticsearch/ data could alternatively be collected directly via ES potentially

I told you on IRC yesterday, but it's good practice to document this on Phabricator: my suggestion is to use https://github.com/braedon/prometheus-es-exporter, a tool that runs on the graylog hosts. The tool takes an elasticsearch query, performs the search and returns the result in prometheus format. Prometheus collects the metrics, after which we can use the metrics in Grafana dashboards.

ACLs in Elasticsearch are hard to set up, but they have not been needed so far because Elasticsearch only listens on 127.0.0.1. Since prometheus-es-exporter runs on the same host, it is a nice solution that does not require them.
Wikimedia queries: https://github.com/wikimedia/puppet/tree/f9bdcb97b5d1a6154bfa033f0cac292ede3710a1/modules/prometheus/files/es_exporter
Puppet classes: https://github.com/wikimedia/puppet/blob/f9bdcb97b5d1a6154bfa033f0cac292ede3710a1/modules/prometheus/manifests/es_exporter.pp

Wikimedia has debian packages: https://apt.wikimedia.org/wikimedia/pool/main/p/prometheus-es-exporter/. prometheus-es-exporter must be installed on the graylog server, then we'll have to find out how to ingest the output into prometheus.

Event Timeline

Southparkfan triaged this task as Normal priority.

Is there a use case for this that the ES data source wouldn’t fulfil? Is this the approach Technology-Team (MediaWiki) wish to take? If so this would fall under the MW team to implement as part of their task as without a use case for Infra, what’s the point in implementing something unused?

Proof of concept:
/etc/prometheus-es-exporter/mediawiki.cfg:

[query_log_mediawiki]
QueryIntervalSecs = 900
QueryIndices = <graylog_deflector>
QueryJson = {
        "size": 0,
        "track_total_hits": true,
        "query": {
            "bool": {
                "must": [
                    { "match": { "application_name": "mediawiki" } }
                ],
                "filter": [
                    { "range": { "timestamp": { "gte": "now-15m", "lte": "now" } } }
                ]
            }
        },
        "aggs": {
            "mediawiki-channels": {
                "terms": { "field": "mediawiki_channel" }
            }
        }
    }

This searches for all entries from the last 15 minutes where application_name is mediawiki, then aggregates them by mediawiki_channel to count how many times each value was seen.
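In the proof of concept, QueryIntervalSecs (900) and the range window (now-15m) cover the same 15 minutes. If more queries of this shape are added, generating the section from a small helper keeps the two values in step. The following is only a sketch: build_query, render_section and parse_query_json are hypothetical names, not part of prometheus-es-exporter, and it assumes the config file is read with INI semantics like Python's configparser, where continuation lines of a multi-line value must be indented. It writes the window as now-900s, which is equivalent to now-15m in Elasticsearch date math.

```python
import configparser
import json


def build_query(application, field, agg_name, interval_secs):
    """Build the search body: filter on application and time window, then count per field value."""
    return {
        "size": 0,
        "track_total_hits": True,
        "query": {
            "bool": {
                "must": [{"match": {"application_name": application}}],
                "filter": [
                    # The window length is derived from the query interval.
                    {"range": {"timestamp": {"gte": "now-%ds" % interval_secs, "lte": "now"}}}
                ],
            }
        },
        "aggs": {agg_name: {"terms": {"field": field}}},
    }


def render_section(name, indices, query, interval_secs):
    """Render one [query_<name>] section as INI text."""
    body = json.dumps(query, indent=4).splitlines()
    # Indent every continuation line so it stays part of the QueryJson value.
    indented = [body[0]] + ["    " + line for line in body[1:]]
    return "\n".join([
        "[query_%s]" % name,
        "QueryIntervalSecs = %d" % interval_secs,
        "QueryIndices = %s" % indices,
        "QueryJson = " + "\n".join(indented),
    ]) + "\n"


def parse_query_json(section_text, name):
    """Read the section back with configparser and decode the JSON value."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(section_text)
    return json.loads(parser["query_%s" % name]["QueryJson"])


if __name__ == "__main__":
    query = build_query("mediawiki", "mediawiki_channel", "mediawiki-channels", 900)
    # "<graylog_deflector>" is the placeholder used in the proof of concept.
    print(render_section("log_mediawiki", "<graylog_deflector>", query, 900))
```

The round trip through parse_query_json is a cheap way to check that a rendered section still decodes to the intended query before it is deployed.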

Output of curl http://localhost:9206 (cut down to the relevant parts):

log_mediawiki_hits 3381.0
# HELP log_mediawiki_took_milliseconds
# TYPE log_mediawiki_took_milliseconds gauge
log_mediawiki_took_milliseconds 5.0
# HELP log_mediawiki_mediawiki_channels_doc_count_error_upper_bound
# TYPE log_mediawiki_mediawiki_channels_doc_count_error_upper_bound gauge
log_mediawiki_mediawiki_channels_doc_count_error_upper_bound 0.0
# HELP log_mediawiki_mediawiki_channels_sum_other_doc_count
# TYPE log_mediawiki_mediawiki_channels_sum_other_doc_count gauge
log_mediawiki_mediawiki_channels_sum_other_doc_count 24.0
# HELP log_mediawiki_mediawiki_channels_doc_count
# TYPE log_mediawiki_mediawiki_channels_doc_count gauge
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="CentralAuth"} 153.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="FlowDebug"} 25.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="Parsoid"} 36.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="captcha"} 338.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="deprecated"} 350.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="error"} 63.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="exec"} 194.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="http"} 2151.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="session"} 11.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="visualeditor"} 35.0
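
The sample lines above follow a regular pattern: the section name without its query_ prefix, then the aggregation name with "-" replaced by "_", then the field of the response. The mapping can be sketched roughly as below. This is not the exporter's actual code: flatten_response and SAMPLE_RESPONSE are hypothetical, the response is hand-written and trimmed to two buckets using values copied from the output above, and it assumes the Elasticsearch 7-style response in which hits.total is an object with a value field.

```python
def flatten_response(prefix, response):
    """Flatten an Elasticsearch search response into Prometheus sample lines."""
    lines = [
        "%s_hits %s" % (prefix, float(response["hits"]["total"]["value"])),
        "%s_took_milliseconds %s" % (prefix, float(response["took"])),
    ]
    for agg_name, agg in response.get("aggregations", {}).items():
        # Prometheus metric and label names may not contain "-".
        label = agg_name.replace("-", "_")
        metric = "%s_%s" % (prefix, label)
        lines.append("%s_doc_count_error_upper_bound %s"
                     % (metric, float(agg["doc_count_error_upper_bound"])))
        lines.append("%s_sum_other_doc_count %s"
                     % (metric, float(agg["sum_other_doc_count"])))
        # One sample per bucket; the bucket key becomes the label value.
        for bucket in agg["buckets"]:
            lines.append('%s_doc_count{%s="%s"} %s'
                         % (metric, label, bucket["key"], float(bucket["doc_count"])))
    return lines


# Hand-written, trimmed response (assumption: Elasticsearch 7-style layout).
SAMPLE_RESPONSE = {
    "took": 5,
    "hits": {"total": {"value": 3381, "relation": "eq"}},
    "aggregations": {
        "mediawiki-channels": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 24,
            "buckets": [
                {"key": "CentralAuth", "doc_count": 153},
                {"key": "http", "doc_count": 2151},
            ],
        }
    },
}

if __name__ == "__main__":
    for line in flatten_response("log_mediawiki", SAMPLE_RESPONSE):
        print(line)
```

The sketch leaves out the # HELP and # TYPE lines. A terms aggregation returns the ten largest buckets by default, which is why the output lists ten channels and reports the remainder in sum_other_doc_count.
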

In T7073#140060, @John wrote:

Is there a use case for this that the ES data source wouldn’t fulfil? Is this the approach Technology-Team (MediaWiki) wish to take? If so this would fall under the MW team to implement as part of their task as without a use case for Infra, what’s the point in implementing something unused?

There are more use cases than MediaWiki only. For example, I would like to monitor SSH authentication attempts and access logs of non-MediaWiki services, which is a task for us, not for the MediaWiki team. The proof of concept above was tailored for MediaWiki logs, because said logs have a higher priority.

prometheus-es-exporter uses the same Elasticsearch data that everyone currently uses via Graylog. The difference is the implementation method. Requiring authentication in Elasticsearch needs a license, which was not OSI-approved; neither is their new licensing system, so sooner or later we'll have to revisit our use of Elasticsearch either way. Instead, we run an exporter on the Graylog server that fetches data directly from Elasticsearch and transforms it into Prometheus format. Not only does the exporter avoid the need for the proprietary plug-in, it also allows quick rendering of Grafana graphs, since looking up a few integers for a time series is less intensive than searching an Elasticsearch index that ingests 10G+ of data daily.

Unknown Object (User) unsubscribed. Apr 3 2021, 19:55

Since there are more uses than MediaWiki, should this be tagged as Technology-Team (MediaWiki) only?

There are currently no open tasks or plans for any uses outside MediaWiki.

I could work on adding the metrics to prometheus. Which metrics would you like to collect? (a counter of <this> in unit <that>)

T6979#138317

Unknown Object (User) claimed this task. Oct 14 2021, 22:31
Unknown Object (User) subscribed. Oct 14 2021, 22:38
Unknown Object (User) moved this task from Unsorted to Goals on the Universal Omega board. Oct 15 2021, 21:05
Unknown Object (User) added a comment. Oct 15 2021, 21:07

https://github.com/miraheze/puppet/pull/2032 should complete this task I believe. It should also allow us to finalise T6979 if I did this one correctly.

Unknown Object (User) closed this task as Resolved. Oct 16 2021, 19:28