
4 Sept PowerDNS outage on some servers
Closed, Resolved (Public)

Event Timeline

RhinosF1 triaged this task as Unbreak Now! priority.
Void removed Void as the assignee of this task. Wed, Sep 4, 07:49
Void subscribed.

Tried rebooting some servers, but that doesn't seem to be a lasting solution. Not sure what the exact problem is. I suspect something network-related, but all I really have to go off of is:

Sep  4 05:42:49 swiftproxy171 pdns-recursor[608]: msg="Failed to update . records" error="Too much time waiting for .|NS, timeouts: 5, throttles: 0, queries: 6, 7508msec" subsystem="housekeeping" level="0" prio="Error" tid="0" ts="1725428569.875" exception="ImmediateServFailException"
Sep  4 05:42:49 swiftproxy171 pdns-recursor[608]: msg="Failed to update . records" subsystem="housekeeping" level="0" prio="Warning" tid="0" ts="1725428569.875" rcode="-1"
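When triaging an outage like this across several hosts, it can help to pull the structured fields out of these pdns-recursor lines instead of reading them by eye. Below is a minimal Python sketch. It assumes the key="value" layout shown above and a classic syslog prefix. The function names (parse_pdns_fields, host_of, event_time) are hypothetical and are not part of PowerDNS or Miraheze tooling. Converting the ts field confirms that it matches the syslog timestamp, 05:42:49 UTC on 4 September 2024.

```python
import re
from datetime import datetime, timezone

# Matches key="value" pairs. Assumes values never contain an escaped
# double quote, which holds for the lines shown in this task.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_pdns_fields(line: str) -> dict:
    """Return the key="value" fields of one pdns-recursor log line as a dict."""
    return dict(FIELD_RE.findall(line))


def host_of(line: str) -> str:
    """Return the hostname from a classic syslog prefix ("Sep  4 05:42:49 host prog[pid]: ...")."""
    # split() collapses the double space syslog uses before single-digit days.
    return line.split()[3]


def event_time(fields: dict) -> datetime:
    """Convert the recursor's 'ts' field (Unix epoch seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(float(fields["ts"]), tz=timezone.utc)


if __name__ == "__main__":
    sample = (
        'Sep  4 05:42:49 swiftproxy171 pdns-recursor[608]: '
        'msg="Failed to update . records" subsystem="housekeeping" '
        'level="0" prio="Warning" tid="0" ts="1725428569.875" rcode="-1"'
    )
    fields = parse_pdns_fields(sample)
    print(host_of(sample), fields["prio"], fields["msg"], event_time(fields).isoformat())
```

Grouping the parsed lines by host and subsystem gives a quick way to tell which servers were hitting the same housekeeping failure.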

Finished rebooting affected servers and we seem to be all up now. Pending a proper post-mortem, this is hopefully resolved. cc @OrangeStar @Universal_Omega

Tentatively resolved again, waiting for confirmation from the datacenter on what exactly happened. They did say it was resolved, but I would like to continue monitoring for a bit to ensure services don't go back down.

RhinosF1 lowered the priority of this task from Unbreak Now! to High. Thu, Sep 5, 07:25

@Void yep, this is resolved and we have an explanation of what happened from FiberState.

It was related to neighbor discovery on one of our distribution switches. We've made a change to the VLAN that should take care of it.

Closing this.

In reply to T12538#250747, where @Void posted the pdns-recursor "Failed to update . records" errors from swiftproxy171:

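This is likely due to those servers losing IPv6 connectivity and thus being unable to reach the DNS root servers. The failing .|NS query in the log is pdns-recursor's periodic refresh of the root NS records.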
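A quick way to test this hypothesis on an affected host is to send the same root NS query over IPv6 and see whether anything answers. The usual tool is dig -6 @a.root-servers.net . NS. The following Python sketch does the equivalent with only the standard library. It is a hypothetical helper, not something used in this task. It assumes that a.root-servers.net's IPv6 address (2001:503:ba3e::2:30) is a representative target, and it checks only that a reply arrives. It does not validate the answer.

```python
import random
import socket
import struct

# IPv6 address of a.root-servers.net; any root server letter would do.
A_ROOT_V6 = "2001:503:ba3e::2:30"


def build_root_ns_query(query_id: int) -> bytes:
    """Build a minimal DNS query for ". IN NS" (the root NS lookup seen in the log)."""
    # Header: ID, flags=0 (standard query, recursion not requested),
    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # Question: the root name is a single zero-length label,
    # followed by QTYPE=NS (2) and QCLASS=IN (1).
    question = b"\x00" + struct.pack("!HH", 2, 1)
    return header + question


def root_reachable_v6(server: str = A_ROOT_V6, timeout: float = 3.0) -> bool:
    """Return True if `server` answers the root NS query over IPv6 UDP within `timeout` seconds."""
    query_id = random.randint(0, 0xFFFF)
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_root_ns_query(query_id), (server, 53))
            reply, _ = sock.recvfrom(4096)
    except OSError:
        # A missing IPv6 stack, "Network is unreachable", ICMP errors and
        # timeouts (socket.timeout is an OSError subclass) all count as unreachable.
        return False
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    # A genuine response echoes our query ID and has the QR (response) bit set.
    return reply_id == query_id and bool(flags & 0x8000)
```

If the IPv6 explanation is correct, root_reachable_v6() on an affected server would be expected to return False while the problem persists. The same check over IPv4 would still succeed.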