Wikimedia has Debian packages: https://apt.wikimedia.org/wikimedia/pool/main/p/prometheus-es-exporter/. prometheus-es-exporter must be installed on the Graylog server; then we will have to find out how to ingest its output into Prometheus.
Description
Revisions and Commits
rPUPC Puppet Configuration
rPUPCa5d9b040b501 Merge pull request #2032 from Universal-Omega/patch-130
rPUPCbf529236b458 Merge pull request #2031 from Universal-Omega/patch-129
| Status | Assigned | Task |
|---|---|---|
| Resolved | Unknown Object (User) | T6979 Collect Statistics for API Requests (Including Module Type) |
| Resolved | Unknown Object (User) | T7073 Install prometheus-es-exporter for prometheus <-> graylog integration |
Event Timeline
Is there a use case for this that the ES data source wouldn't fulfil? Is this the approach Technology-Team (MediaWiki) wishes to take? If so, this would fall to the MW team to implement as part of their task; without a use case for Infra, what's the point in implementing something unused?
Proof of concept:
/etc/prometheus-es-exporter/mediawiki.cfg:
```ini
[query_log_mediawiki]
QueryIntervalSecs = 900
QueryIndices = <graylog_deflector>
QueryJson = {
        "size": 0,
        "track_total_hits": true,
        "query": {
            "bool": {
                "must": [
                    { "match": { "application_name": "mediawiki" } }
                ],
                "filter": [
                    { "range": { "timestamp": { "gte": "now-15m", "lte": "now" } } }
                ]
            }
        },
        "aggs": {
            "mediawiki-channels": {
                "terms": { "field": "mediawiki_channel" }
            }
        }
    }
```
(search for all entries from the last 15 minutes where application_name is mediawiki, make an aggregation: how many times was each value of mediawiki_channel seen?)
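For illustration, the query body can be assembled programmatically before being pasted into the config. A minimal Python sketch; the field names `application_name` and `mediawiki_channel` come from the config above, and the 15-minute window is chosen to match `QueryIntervalSecs = 900`:

```python
import json

# The range filter window should match QueryIntervalSecs (900 s = 15 min)
# so each query run covers exactly one interval.
window = "now-15m"

query = {
    "size": 0,                 # return no raw documents, only counts/aggregations
    "track_total_hits": True,  # report the exact hit count, not a lower bound
    "query": {
        "bool": {
            "must": [{"match": {"application_name": "mediawiki"}}],
            "filter": [{"range": {"timestamp": {"gte": window, "lte": "now"}}}],
        }
    },
    "aggs": {
        # terms aggregation: one bucket per distinct mediawiki_channel value
        "mediawiki-channels": {"terms": {"field": "mediawiki_channel"}}
    },
}

print(json.dumps(query))
```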
Output of `curl http://localhost:9206` (cut down to the relevant parts):
```
log_mediawiki_hits 3381.0
# HELP log_mediawiki_took_milliseconds
# TYPE log_mediawiki_took_milliseconds gauge
log_mediawiki_took_milliseconds 5.0
# HELP log_mediawiki_mediawiki_channels_doc_count_error_upper_bound
# TYPE log_mediawiki_mediawiki_channels_doc_count_error_upper_bound gauge
log_mediawiki_mediawiki_channels_doc_count_error_upper_bound 0.0
# HELP log_mediawiki_mediawiki_channels_sum_other_doc_count
# TYPE log_mediawiki_mediawiki_channels_sum_other_doc_count gauge
log_mediawiki_mediawiki_channels_sum_other_doc_count 24.0
# HELP log_mediawiki_mediawiki_channels_doc_count
# TYPE log_mediawiki_mediawiki_channels_doc_count gauge
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="CentralAuth"} 153.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="FlowDebug"} 25.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="Parsoid"} 36.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="captcha"} 338.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="deprecated"} 350.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="error"} 63.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="exec"} 194.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="http"} 2151.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="session"} 11.0
log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="visualeditor"} 35.0
```
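On the ingestion question from the description: Prometheus only needs a scrape job pointed at the exporter's HTTP endpoint. A minimal sketch of a `prometheus.yml` fragment, assuming the exporter runs on the Graylog host on its default port 9206 (the job name and hostname are placeholders, not real infrastructure names):

```yaml
scrape_configs:
  - job_name: 'prometheus-es-exporter'    # hypothetical job name
    scrape_interval: 60s
    static_configs:
      - targets: ['graylog.example.org:9206']  # placeholder Graylog host
```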
There are more use cases than MediaWiki alone. For example, I would like to monitor SSH authentication attempts and the access logs of non-MediaWiki services, which is a task for us, not for the MediaWiki team. The proof of concept above was tailored to MediaWiki logs because those logs have a higher priority.
prometheus-es-exporter uses the same Elasticsearch data that everyone currently accesses via Graylog. The difference is the implementation method: instead of requiring authentication in Elasticsearch, which requires a license (the old license was not OSI-approved, but neither is their new licensing system, so sooner or later we'll have to revisit the usage of Elasticsearch either way), we run an exporter on the Graylog server that fetches data directly from Elasticsearch and transforms it into Prometheus format. Not only does the exporter avoid the need for the proprietary plug-in, it also allows quick rendering of Grafana graphs, since looking up a few integers for a time series is far less intensive than searching an Elasticsearch index that ingests 10 GB+ of data daily.
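Conceptually, the transformation step is simple: flatten the Elasticsearch hit count and aggregation buckets into Prometheus exposition-format lines. A minimal sketch of the idea; the function name and structure are illustrative, not the exporter's actual internals:

```python
def to_prometheus(prefix, response):
    """Flatten an Elasticsearch aggregation response into
    Prometheus exposition-format lines (illustrative sketch only)."""
    lines = [f'{prefix}_hits {float(response["hits"]["total"]["value"])}']
    for agg_name, agg in response.get("aggregations", {}).items():
        label = agg_name.replace("-", "_")  # bucket name becomes the label name
        for bucket in agg["buckets"]:
            lines.append(
                f'{prefix}_{label}_doc_count'
                f'{{{label}="{bucket["key"]}"}} {float(bucket["doc_count"])}'
            )
    return lines

# Example response shaped like the proof of concept above
sample = {
    "hits": {"total": {"value": 3381}},
    "aggregations": {
        "mediawiki-channels": {
            "buckets": [
                {"key": "captcha", "doc_count": 338},
                {"key": "error", "doc_count": 63},
            ]
        }
    },
}

for line in to_prometheus("log_mediawiki", sample):
    print(line)
# → log_mediawiki_hits 3381.0
# → log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="captcha"} 338.0
# → log_mediawiki_mediawiki_channels_doc_count{mediawiki_channels="error"} 63.0
```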
Since there are more uses than MediaWiki, should this be tagged as Technology-Team (MediaWiki) only?
I could work on adding the metrics to Prometheus. Which metrics would you like to collect? (a counter of <this> in unit <that>)
https://github.com/miraheze/puppet/pull/2032 should complete this task, I believe. It should also allow us to finalise T6979, if I did this one correctly.