Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

greygit
Rising Star
Posts: 196
Thanks: 20
Fixes: 1
Registered: ‎13-11-2021

Re: Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

That's "whooshed" me.

 

The only commonality between what you've posted and what is here is an Android smartphone.

 

I think my original comment on this subject became entangled/crossed.

 

Although I'm happy to sit and watch this. It is certainly better/more intriguing than much of what is on broadcast (and streaming) TV!

 

Smiley

 

(P.S. I still think Hub2 has undiscovered problems, and there may yet be a connection).

 

P.P.S.

(tried to post this and got an error code 7509B6FE.)

amalon
Dabbler
Posts: 12
Thanks: 2
Registered: ‎24-05-2022

Re: Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

Awesome, thanks bobpullen.

Yeh, mine uses a fair bit of CPU too, running in a smallish VM (they're making a second gen server called dendrite which is claimed to be less resource intensive than synapse). I should look into switching to postgresql as I never did get around to it.

bobpullen
Community Gaffer
Posts: 16,927
Thanks: 5,014
Fixes: 317
Registered: ‎04-04-2007

Re: Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

@amalon - checking again this morning, I've a little more work to do it seems. I have my Unbound DNS requests running through DNSMasq/Pihole and it seems simply opening the Element client causes Pihole to start rejecting DNS requests due to the default rate limits (1000 DNS requests per minute per client).

That could be clouding my symptoms, so I'll up the limit later and carry out some more testing.

I've only joined two Matrix groups and it generates thousands of DNS requests in an extremely short period of time! I understand why it's doing this - but it's somewhat questionable when you consider the fact that an extremely popular ad-blocking solution has out of the box config that cripples it. The maintainers of Pihole clearly don't consider it 'typical' for a single client to be generating this much DNS traffic Wink
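To illustrate why a burst like this trips a "1000 requests per minute per client" limit, here is a minimal sketch of a per-client fixed-window limiter. It is not Pi-hole's actual implementation; the class, the `simulate` helper, the client address 192.168.1.10 and the burst sizes are all hypothetical and only show the arithmetic.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Per-client limiter: at most `limit` queries per `window` seconds."""

    def __init__(self, limit=1000, window=60.0):
        self.limit = limit
        self.window = window
        self._window_start = defaultdict(float)
        self._count = defaultdict(int)

    def allow(self, client, now):
        # Start a fresh window once the current one has expired.
        if now - self._window_start[client] >= self.window:
            self._window_start[client] = now
            self._count[client] = 0
        self._count[client] += 1
        return self._count[client] <= self.limit


def simulate(queries, duration, limiter, client="192.168.1.10"):
    """Spread `queries` evenly over `duration` seconds; return how many are refused."""
    refused = 0
    for i in range(queries):
        now = i * duration / queries
        if not limiter.allow(client, now):
            refused += 1
    return refused


# Hypothetical burst: 3000 lookups in 5 seconds from one client.
print(simulate(3000, 5.0, FixedWindowLimiter()))
```

With these assumed numbers, everything after the first 1000 queries in the window is refused, while the same client spreading 900 queries over a full minute would never hit the limit.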

That said, I guess it doesn't detract from the fact that it 'works' with other hubs, and not the Hub Two. 

Bob Pullen
Plusnet Product Team
If I've been helpful then please give thanks ⤵

amalon
Dabbler
Posts: 12
Thanks: 2
Registered: ‎24-05-2022

Re: Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

Yeh. There are rate limiting options in synapse, such as the read receipt rate limit which can slow it down a bit when you just look at a room, but any sending of events still happens at top speed so it doesn't really work around it.

I did wonder if it was just a case of having a randomised TTL for each federated domain, but then if you leave it and come back to it, it wants to do them all at once anyway. Unbound's cache-min-ttl helps a bit for domains with unreasonably short TTLs.
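As a rough sketch of what a TTL floor such as Unbound's cache-min-ttl buys, the snippet below clamps a record's TTL and counts how many upstream lookups one continuously used name would need over a period. The function names, the 5 second TTL and the 300 second floor are assumptions for illustration, and the model ignores prefetching and cache eviction.

```python
import math


def effective_ttl(record_ttl, cache_min_ttl=0, cache_max_ttl=86400):
    """Clamp a record's TTL between a floor and a ceiling (both in seconds)."""
    return max(cache_min_ttl, min(record_ttl, cache_max_ttl))


def upstream_lookups(record_ttl, period, cache_min_ttl=0):
    """Upstream queries for one name that is in constant use for `period` seconds."""
    ttl = effective_ttl(record_ttl, cache_min_ttl)
    if ttl <= 0:
        raise ValueError("a zero TTL is never cached; the count depends on query rate")
    return math.ceil(period / ttl)


# Hypothetical domain publishing a 5 second TTL, observed for one hour.
print(upstream_lookups(5, 3600))                     # no floor
print(upstream_lookups(5, 3600, cache_min_ttl=300))  # floor of 5 minutes
```

This only reduces steady-state refreshes. It does nothing for the case described above where every entry has expired after an idle period and all domains are looked up at once.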

bobpullen
Community Gaffer
Posts: 16,927
Thanks: 5,014
Fixes: 317
Registered: ‎04-04-2007

Re: Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

Do we know yet if it happens without Unbound in the equation? i.e. with the Hub dishing out 192.168.1.254 or a public DNS server address directly to the connected clients over DHCP?

If not, it's another scenario that will need testing.

amalon
Dabbler
Posts: 12
Thanks: 2
Registered: ‎24-05-2022

Re: Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

I have systemd-resolved running, and had 127.0.0.1 (unbound) set in /etc/resolv.conf.

When I set /etc/resolv.conf to point directly at plusnet's servers, I don't appear to have the same issues. If I switch back to unbound, and disable DNSSEC in unbound and compare packet captures, the only notable difference I can see in the packets is:

DNS -> Additional Records -> Z -> DO bit

  systemd-resolved: DO=0 (cannot handle DNSSEC security RRs)

  unbound: DO=1 (Accepts DNSSEC security RRs)

and when I use plusnet's dns servers directly and enable DNSSEC=true in /etc/systemd/resolved.conf I don't see the same issues, though there is some background packet loss detectable with ping, just not of the "dead for 5 minutes" variety.
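For anyone wanting to see where that DO bit lives, the following is a minimal sketch that builds a DNS query carrying an EDNS0 OPT record and reads the flag back. It uses only the standard library; the query name, the 1232 byte advertised UDP size and the fixed query ID are assumptions for illustration, and the parser only understands packets built by this sketch.

```python
import struct

DO_FLAG = 0x8000  # top bit of the 16-bit EDNS flags field ("DNSSEC OK")
OPT_TYPE = 41     # RR type of the EDNS0 OPT pseudo-record


def encode_name(name):
    """Encode a dotted name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, do_bit, qtype=1, udp_size=1232, query_id=0x1234):
    """Build a query with one question and one OPT record in the additional section."""
    # Header: id, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    flags = DO_FLAG if do_bit else 0
    # OPT: root name, type 41, class = UDP size, ext-rcode, version, flags, rdlen 0
    opt = b"\x00" + struct.pack("!HHBBHH", OPT_TYPE, udp_size, 0, 0, flags, 0)
    return header + question + opt


def do_bit_set(packet):
    """Read the DO bit from a packet produced by build_query."""
    pos = 12                      # skip the fixed 12-byte header
    while packet[pos] != 0:       # walk the question name labels
        pos += 1 + packet[pos]
    pos += 1 + 4                  # terminating zero, then qtype and qclass
    (rtype,) = struct.unpack_from("!H", packet, pos + 1)
    if rtype != OPT_TYPE:
        raise ValueError("no OPT record where expected")
    (flags,) = struct.unpack_from("!H", packet, pos + 7)
    return bool(flags & DO_FLAG)


print(do_bit_set(build_query("matrix.org", do_bit=True)))
print(do_bit_set(build_query("matrix.org", do_bit=False)))
```

The query itself is the same size either way. What changes is that a resolver seeing DO=1 may include signature records in its answers, so the responses coming back through the hub get larger.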

The more I look at this, the more I think unbound is getting rate limited by plusnet's DNS server, and then it's retrying every 50ms, and then building up a backlog. I'm guessing TCP rate limiting works differently. I can't figure out why the router would be unable to cope with a whole load of UDP packets though. Even if it's sending 1000s of 200-byte packets per second, it's nothing on gigabit ethernet.
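The bandwidth point is easy to check with a couple of lines of arithmetic. The sketch below assumes 5000 packets per second as a stand-in for "1000s", counts only the 200 byte payloads, and ignores Ethernet/IP/UDP header overhead.

```python
def udp_load(packets_per_second, payload_bytes, link_bps=1_000_000_000):
    """Return (bits per second, fraction of the link) for a steady packet stream."""
    bps = packets_per_second * payload_bytes * 8
    return bps, bps / link_bps


bps, share = udp_load(5000, 200)  # 5000 pps is an assumed figure
print(f"{bps / 1e6:.1f} Mbit/s, {share:.1%} of a gigabit link")
```

Under those assumptions the stream is about 8 Mbit/s, under 1% of the link, which supports the view that raw throughput is not what the hub is struggling with.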

And the more I look at it the more I think I should simply use systemd-resolved for this. Unbound is overkill (I just use it for something else so it was conveniently familiar).

amalon
Dabbler
Posts: 12
Thanks: 2
Registered: ‎24-05-2022

Re: Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

*sigh* I take it back. systemd-resolved with DNSSEC on can certainly take out the router for a few minutes in addition to a lot of packet loss. Back to unbound with TCP!

bobpullen
Community Gaffer
Posts: 16,927
Thanks: 5,014
Fixes: 317
Registered: ‎04-04-2007

Re: Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

Just so I'm clear: configuring the client running Element to go direct to the hub, Plusnet or Google DNS resolvers, either with or without DNSSEC configured, also kills the network? Correct?

The more I look at this, the more I think unbound is getting rate limited by plusnet's DNS server, and then it's retrying every 50ms, and then building up a backlog. I'm guessing TCP rate limiting works differently. I can't figure out why the router would be unable to cope with a whole load of UDP packets though. Even if it's sending 1000s of 200-byte packets per second, it's nothing on gigabit ethernet.

That's a reasonable assumption, however it wouldn't explain my ability to replicate the problem when my DNS traffic goes nowhere near Plusnet's servers. If solely an external DNS rate limiting issue, then it would also be odd for the hub interface to be non-responsive for the duration of the problem (although I think that's what you're saying here anyway).

amalon
Dabbler
Posts: 12
Thanks: 2
Registered: ‎24-05-2022

Re: Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

Sorry I was vague before. I haven't been changing the DNS setup on the client end (since all that needs to resolve is the synapse server domain), only on the synapse homeserver. So enabling DNSSEC in systemd-resolved on the server and pointing the server's system DNS at plusnet in /etc/resolv.conf and the network configuration (completely ignoring Unbound on localhost 127.0.0.1) also triggered it. In other words it isn't unbound specific, but DNSSEC either triggered it or increased the number of requests enough to trigger it.

Yeh that is what I was saying. My thinking was that the build up of a DNS request backlog increases UDP traffic because of retries, enough that it then triggers whatever bug in the router makes it unresponsive (and presumably for TCP the DNS server ACKs each request and the client waits patiently for the response in the knowledge the server's working on it, rather than UDP where it keeps retrying until the server has resolved it and responded). I can imagine if the hub is trying to keep track of DNS requests for some reason (e.g. hijacking them) it might accumulate state for DNS requests without responses.

The reason I thought it might be rate limited was because of a 10 second packet capture after the DNS request rate had already built up, filtered for a specific domain (which in that window got lots of retries), where it was sending several retries at 50ms intervals (along with retries of other records A, AAAA etc) before the DNS server finally responded to the last one, then the client would do another bunch of retries for the remaining records until the next reply.
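The retry behaviour described in that capture can be put into rough numbers. This is a deliberately simplified model, assuming a fixed 50ms retry interval with no backoff (real resolvers adapt their timeouts), a hypothetical 300ms upstream response time and a hypothetical backlog of 2000 pending lookups.

```python
def udp_datagrams(latency_ms, retry_ms=50):
    """Datagrams sent for one lookup: the original query plus one retry
    for every full retry interval that passes before the answer arrives."""
    if latency_ms <= 0:
        return 1
    return 1 + (latency_ms - 1) // retry_ms


def backlog_datagrams(pending_lookups, latency_ms, retry_ms=50):
    """Total query datagrams for a backlog of lookups that all see the same latency."""
    return pending_lookups * udp_datagrams(latency_ms, retry_ms)


print(udp_datagrams(300))            # one slow lookup
print(backlog_datagrams(2000, 300))  # a backlog of slow lookups
```

In this model each slow lookup costs six datagrams instead of one, so the backlog multiplies the outbound query count sixfold. Over TCP the same backlog would need one query message per lookup, leaving aside the TCP segments themselves.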

bobpullen
Community Gaffer
Posts: 16,927
Thanks: 5,014
Fixes: 317
Registered: ‎04-04-2007

Re: Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

No need to apologise, it was me that wasn't clear. What I meant to ask was...

Just so I'm clear: configuring the client running Synapse to go direct to the hub, Plusnet or Google DNS resolvers, either with or without DNSSEC configured, also kills the network? Correct?


Note the distinction between the three options: Plusnet, Google or the hub. They will return different DNS response sizes and I'm trying to establish if that's playing a part (enabling DNSSEC will also increase response size). So it would be useful to know what the situation is when using systemd-resolved with:

  1. Plusnet DNS servers 212.159.6.10/212.159.6.9 without DNSSEC
  2. Plusnet DNS servers 212.159.6.10/212.159.6.9 with DNSSEC
  3. Google DNS servers 8.8.8.8/8.8.4.4 without DNSSEC
  4. Google DNS servers 8.8.8.8/8.8.4.4 with DNSSEC
  5. Hub forwarder 192.168.1.254 without DNSSEC
  6. Hub forwarder 192.168.1.254 with DNSSEC
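The six cases above can be generated mechanically, which may help keep the test runs consistent. The sketch below emits a `[Resolve]` fragment for /etc/systemd/resolved.conf for each combination. It assumes the global `DNS=` and `DNSSEC=` settings are what gets changed between runs; any per-link DNS configuration or a hand-edited /etc/resolv.conf would need handling separately.

```python
from itertools import product

# Resolver addresses as listed in the thread.
RESOLVERS = {
    "Plusnet": ["212.159.6.10", "212.159.6.9"],
    "Google": ["8.8.8.8", "8.8.4.4"],
    "Hub forwarder": ["192.168.1.254"],
}


def resolved_conf(servers, dnssec):
    """Render a [Resolve] section for one test case."""
    return "[Resolve]\nDNS={}\nDNSSEC={}\n".format(
        " ".join(servers), "true" if dnssec else "false"
    )


def test_matrix():
    """Return (label, dnssec, config_text) for every resolver/DNSSEC pairing."""
    cases = []
    for (label, servers), dnssec in product(RESOLVERS.items(), (False, True)):
        cases.append((label, dnssec, resolved_conf(servers, dnssec)))
    return cases


for number, (label, dnssec, conf) in enumerate(test_matrix(), start=1):
    print(f"# Case {number}: {label}, DNSSEC {'on' if dnssec else 'off'}")
    print(conf)
```

The cases come out in the same order as the numbered list, so case numbers can be quoted directly when reporting results.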

It also helps simplify the steps for replication if/when I push this over to the hub development teams for further investigation. Anything we can do to make their lives easier will result in a quicker grasp of the situation Wink

We already know that going > Unbound > Root servers/upstream DNS resolver either with or without DNSSEC seems to kill things (pending further testing on my setup), and that using explicit TCP seems to negate the issue.

I've had to swap my hub out to help with something else for a few days so I might not be able to spend much more time on it at my side until after the Jubilee.

amalon
Dabbler
Posts: 12
Thanks: 2
Registered: ‎24-05-2022

Re: Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

Right, systemd-resolved (no unbound), with:

  1. Plusnet DNS servers 212.159.6.10/212.159.6.9 without DNSSEC:
    PASS - haven't succeeded in reproducing any issues so far, will test again tomorrow evening
  2. Plusnet DNS servers 212.159.6.10/212.159.6.9 with DNSSEC
    FAIL - router goes unresponsive
  3. Google DNS servers 8.8.8.8/8.8.4.4 without DNSSEC
    will test tomorrow evening
  4. Google DNS servers 8.8.8.8/8.8.4.4 with DNSSEC
    FAIL - router goes unresponsive
  5. Hub forwarder 192.168.1.254 without DNSSEC
    will test tomorrow evening
  6. Hub forwarder 192.168.1.254 with DNSSEC
    FAIL - router has gone unresponsive, but only for short periods so far (e.g. 12 seconds)

Something else I have noticed is that when the router goes unresponsive, my server is still getting DNS responses (e.g. from google dns), suggesting perhaps a prioritization issue.
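To compare "dead for 5 minutes" against "12 seconds" outages across runs, it can help to turn a ping log into outage windows. The sketch below is a hypothetical helper that works on already-collected samples of (timestamp in seconds, reply received); how the samples are gathered, for example pinging the hub at 192.168.1.254 once a second, is left as an assumption.

```python
def outage_windows(samples, min_duration=0.0):
    """Return (start, end) pairs for each run of failed probes.

    `samples` is an iterable of (timestamp_seconds, replied) sorted by time.
    An outage starts at the first failed probe and ends at the next success.
    """
    windows = []
    start = None
    last_fail = None
    for ts, replied in samples:
        if not replied:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            if ts - start >= min_duration:
                windows.append((start, ts))
            start = None
    # The log may end while the router is still unresponsive.
    if start is not None and last_fail - start >= min_duration:
        windows.append((start, last_fail))
    return windows


# Hypothetical log: one probe per second, no replies from t=1 to t=12.
log = [(t, not (1 <= t <= 12)) for t in range(15)]
for start, end in outage_windows(log):
    print(f"unresponsive from {start}s to {end}s ({end - start}s)")
```

Running the same analysis on probes sent to an external address at the same time would show whether forwarded traffic keeps flowing while the hub itself stops answering, which is the prioritisation question raised above.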

No worries about other demands. I know how it is. I have a workaround so am just glad to be able to help get the issue squashed eventually.

bobpullen
Community Gaffer
Posts: 16,927
Thanks: 5,014
Fixes: 317
Registered: ‎04-04-2007

Re: Plusnet Hub 2 becomes unresponsive with Matrix Homeserver

Thanks @amalon - appreciate your efforts! Smiley
