Last Friday we set up a LAN-to-LAN VPN between two sites using a pair of Juniper firewalls connected to Plus.net lines. We stress-tested the link over the weekend and put it into production on Monday. On Tuesday morning I got a call from the customer, full of praise for the improvements in speed and reliability compared to their old BT Openworld / Cisco 800 VPN setup.
Tuesday lunchtime the customer rang again to say the VPN was down and they had lost internet access as well. I could still reach both sites from my office without any problem, and checking the obvious turned up nothing. We managed to set up a limited sniff of the network, and it seemed that traffic was getting fragmented somewhere. Reducing the tcp-mss (TCP maximum segment size) to 1400 on the Junipers restored the internet connection. The VPN came back up for a few minutes but then went down again. We worked around the problem, intending to put a decent sniffer on the case in the evening, when both internal networks would have calmed down. At 6:00pm, when the customer left work, the VPN was still down. When we had a look at 8:00pm the VPN was working again, and it has remained up since.
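For anyone who wants to check the arithmetic behind the 1400 figure, here is a rough sketch of an MSS budget for TCP carried inside an IPsec ESP tunnel. Everything in it is an assumption for illustration: I have not confirmed the real link MTU on these lines (1500 and 1492 are just the usual PPPoA and PPPoE values), the cipher figures are generic 3DES/SHA-1 and AES-CBC/SHA-1 numbers rather than what these Junipers are actually negotiating, and the function names are made up.

```python
# Rough worst-case TCP MSS budget for traffic inside an IPsec ESP tunnel.
# All figures are illustrative assumptions, not measurements from this link.

IP_HEADER = 20    # IPv4 header without options
TCP_HEADER = 20   # TCP header without options
ESP_HEADER = 8    # SPI (4 bytes) + sequence number (4 bytes)
ESP_TRAILER = 2   # pad-length byte + next-header byte
NAT_T_UDP = 8     # extra UDP header if NAT traversal is in use


def esp_overhead(iv_len, block, icv_len=12, nat_t=False):
    """Worst-case bytes added by ESP tunnel mode (maximum padding assumed)."""
    max_padding = block - 1
    total = IP_HEADER + ESP_HEADER + iv_len + max_padding + ESP_TRAILER + icv_len
    if nat_t:
        total += NAT_T_UDP
    return total


def tcp_mss(link_mtu, tunnel_overhead=0):
    """Largest TCP payload that fits in one unfragmented packet."""
    return link_mtu - tunnel_overhead - IP_HEADER - TCP_HEADER


if __name__ == "__main__":
    # Assumed cipher suites: (label, IV length, cipher block size)
    suites = [("3DES/SHA-1", 8, 8), ("AES-CBC/SHA-1", 16, 16)]
    for mtu in (1500, 1492):  # assumed PPPoA and PPPoE MTUs
        for label, iv_len, block in suites:
            overhead = esp_overhead(iv_len, block)
            print(mtu, label, "overhead", overhead, "mss", tcp_mss(mtu, overhead))
```

Under these assumptions, 3DES/SHA-1 on a 1500-byte link leaves 1403 bytes, so 1400 is only just inside the budget. A larger cipher block, NAT traversal or a smaller link MTU would push the safe value below 1400.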
So the questions...
1. Can anyone suggest a reason other than tcp-mss that might have caused the failure?
2. Can anyone suggest a tcp-mss setting that will ensure a reliable VPN link over Plus.net?
3. Did anything happen to the routing through Dorset on Tuesday lunchtime?
Any help appreciated. The customer relies on this link to do business. Maybe they shouldn't, but until SDSL gets to Dorset it is all they have.