Router randomly stops passing traffic and needs a reboot

Hey Guys,

I have a location that randomly goes down and I can't figure out why. It will run fine for months at a time, or go down a few times in one week and then stop acting up again. The site uses an edge4 (EdgeRouter 4). What happens is that the LAN port on the router randomly stops passing traffic. When this happens, I can still VPN into the router over PPTP (I know), log into the GUI, and reboot it, and then everything is fine.

Here is what I have done so far to try to stop it: I replaced the router, with no change. I rebooted the switches, disabled PoE on the uplink to the router, changed patch cords, and changed the outlets everything is plugged into. Today or tomorrow I am going to update the firmware on the switches (Netgear GS752TPV2). I think the edge4 is up to date, but I will check it too. This is a multi-tenant facility with everyone sharing the same internet connection, so it's one of those places where I can usually only make changes in the evening. Any ideas are appreciated.

Hrm… have you checked the logs on the router or done any traffic captures?

This sounds internal: a rogue device, a spanning tree/routing loop, or a duplicate IP. Did you replace the router like for like, edge4 to edge4? Have you tried something like an EdgeRouter X? It could also be an autonegotiation issue between the switch and the router. That looks like a smart switch, so try disabling switch ports and see if the router comes good. This would have to be done onsite, though.
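For the duplicate-IP theory, one quick check is to collect (IP, MAC) pairs from the router's ARP table or a scan of each VLAN over a few days and flag any IP that shows up with more than one MAC. Below is a minimal Python sketch, assuming you have already exported those pairs into a list; `find_duplicate_ips` and the sample addresses are hypothetical. A DHCP lease that legitimately changes hands will also show up, so treat hits as leads rather than proof.

```python
from collections import defaultdict


def find_duplicate_ips(arp_entries):
    """Return {ip: sorted MACs} for every IP seen with more than one MAC.

    arp_entries: iterable of (ip, mac) string pairs, e.g. gathered from
    periodic ARP table dumps (hypothetical input format).
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in arp_entries:
        # Normalise MAC formatting so "AA-BB-..." and "aa:bb:..." compare equal.
        macs_by_ip[ip].add(mac.lower().replace("-", ":"))
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical sample data: 192.168.10.1 answers from two different devices.
entries = [
    ("192.168.10.1", "aa:bb:cc:00:00:01"),
    ("192.168.10.50", "aa:bb:cc:00:00:32"),
    ("192.168.10.1", "DE-AD-BE-EF-00-01"),
]
print(find_duplicate_ips(entries))
# {'192.168.10.1': ['aa:bb:cc:00:00:01', 'de:ad:be:ef:00:01']}
```

An IP that appears with the router's own MAC and some other device's MAC is the case worth chasing down onsite.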

Hey @williehowe,
No, I have not, but I'm looking at them now. I have done captures, but never while this is happening. It only ever happens when I am not available! :)
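Since the drops only seem to happen when nobody is around, it can help to leave a simple probe running (for example, pinging a LAN host every 10 seconds and logging the result) so you get exact outage times to line up against router logs and captures. This sketch only covers the analysis side: it assumes you already have time-ordered (timestamp, reachable) samples, and `outage_windows` and its `min_failures` noise filter are hypothetical.

```python
from datetime import datetime, timedelta


def outage_windows(samples, min_failures=3):
    """Group consecutive failed probes into (start, end) outage windows.

    samples: time-ordered (timestamp, reachable) pairs from a periodic probe
    (hypothetical logging setup). end is the timestamp of the first
    successful probe after the outage, or None if the outage is still
    ongoing at the end of the log. Runs shorter than min_failures are
    treated as noise and skipped.
    """
    windows = []
    start, failures = None, 0
    for ts, reachable in samples:
        if not reachable:
            if start is None:
                start = ts
            failures += 1
        else:
            if start is not None and failures >= min_failures:
                windows.append((start, ts))
            start, failures = None, 0
    if start is not None and failures >= min_failures:
        windows.append((start, None))
    return windows


# Example: probes every 10 s; one single-probe blip (ignored) and one real outage.
t0 = datetime(2024, 1, 1, 22, 0, 0)
pattern = [True, False, True, False, False, False, False, True]
samples = [(t0 + timedelta(seconds=10 * i), ok) for i, ok in enumerate(pattern)]
for start, end in outage_windows(samples):
    print("down from", start, "until", end)
# down from 2024-01-01 22:00:30 until 2024-01-01 22:01:10
```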

Hey @datalinqsolutions, thanks for the reply and suggestions. I replaced the router with the same model, an edge4, and loaded the same config. There are lots of firewall rules and VLANs, so I was not ready to redo everything from scratch, though that may be what is needed. I have been looking for rogue devices, scanning all the VLANs to see if anything weird is going on or if something is eating up a lot of data. There is a GS phone system, and I have remote Wave users.

Around April of last year, I upgraded all the Netgear switches to the PoE versions of the same models. I set all the VLAN settings the same as on the old switches, and everything worked fine. I installed a UCM6304 and lots of phones. We switched from Comcast to Windstream fiber around the same time, and we also upgraded a lot of the UniFi APs to Wi-Fi 6 models. This was all done around mid-May 2023. I think the disconnect issues started around July.

STP/broadcast storm is the route I was going down too.
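To test the STP/broadcast storm theory against a capture, you can count Ethernet broadcast frames (destination ff:ff:ff:ff:ff:ff) per second and look for a spike right before the LAN port stops passing traffic. This is a rough sketch using only the Python standard library, assuming the capture is in classic libpcap format (what `tcpdump -w` writes by default, not pcapng) on an Ethernet interface; the function names and the default threshold are hypothetical placeholders.

```python
import struct
from collections import Counter

BROADCAST_MAC = b"\xff" * 6
LINKTYPE_ETHERNET = 1


def broadcasts_per_second(path):
    """Count Ethernet broadcast frames per capture second in a classic .pcap file."""
    counts = Counter()
    with open(path, "rb") as f:
        header = f.read(24)  # pcap global header
        if len(header) < 24:
            raise ValueError("file too short to be a pcap")
        magic = header[:4]
        # Microsecond or nanosecond magic, in either byte order.
        if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"
        elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"
        else:
            raise ValueError("not a classic pcap file (pcapng is not handled)")
        (linktype,) = struct.unpack(endian + "I", header[20:24])
        if linktype != LINKTYPE_ETHERNET:
            raise ValueError(f"expected Ethernet link type, got {linktype}")
        record = struct.Struct(endian + "IIII")  # ts_sec, ts_frac, incl_len, orig_len
        while True:
            raw = f.read(record.size)
            if len(raw) < record.size:
                break  # end of file
            ts_sec, _frac, incl_len, _orig_len = record.unpack(raw)
            frame = f.read(incl_len)
            if len(frame) < incl_len:
                break  # truncated last record
            # The destination MAC is the first 6 bytes of an Ethernet frame.
            if frame[:6] == BROADCAST_MAC:
                counts[ts_sec] += 1
    return counts


def storm_seconds(counts, threshold=1000):
    """Return the capture seconds whose broadcast count exceeds threshold.

    The default of 1000 frames/s is an arbitrary placeholder; pick a value
    based on what a normal day looks like on this network.
    """
    return sorted(sec for sec, n in counts.items() if n > threshold)
```

For example, `storm_seconds(broadcasts_per_second("lan.pcap00"), threshold=500)` would list the seconds worth a closer look in Wireshark (the file name is just an example). The keys are Unix timestamps, so they can be lined up against the router logs.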

Get a Packet Squirrel and leave it inline to see when it happens.

Thanks @jhippl. I've never used one before, but I'll check it out. Does it affect bandwidth at all?

This has crossed my mind. I just feel like it would happen more often if that were the cause. I am working on getting more logs to review.

The gen 1 was only 100 Mbps. I'm not sure about the new one, but it's a nice tool to have around for issues like this. Toss it inline with a large USB drive and have it take constant pcaps.
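For constant pcaps without filling the drive, tcpdump can write a rotating ring buffer on its own: `-C` starts a new file once the current one passes the given size (in millions of bytes), and `-W` limits how many files are kept before it wraps around and overwrites the first one. The sketch below just builds that command line in Python; `ring_buffer_capture_cmd`, the interface name `br-lan`, and the mount point `/mnt/usb` are hypothetical, so check what your device actually uses.

```python
import shlex


def ring_buffer_capture_cmd(interface, out_path, file_mb=100, file_count=50):
    """Build a tcpdump argv that writes a rotating ring buffer of pcap files.

    -C rotates to a new file once the current one exceeds file_mb million
    bytes; -W caps the number of files, after which tcpdump starts
    overwriting from the first file, so the disk never fills while the
    capture runs unattended.
    """
    if file_mb <= 0 or file_count <= 0:
        raise ValueError("file_mb and file_count must be positive")
    return [
        "tcpdump",
        "-i", interface,        # interface to capture on
        "-w", out_path,         # base file name; tcpdump appends a file number
        "-C", str(file_mb),
        "-W", str(file_count),
    ]


# Hypothetical interface and USB mount point.
cmd = ring_buffer_capture_cmd("br-lan", "/mnt/usb/lan.pcap")
print(shlex.join(cmd))
# tcpdump -i br-lan -w /mnt/usb/lan.pcap -C 100 -W 50
```

With the defaults that is at most about 5 GB on the drive (100 MB × 50 files). You can paste the printed command into a shell or pass `cmd` to `subprocess.run(cmd, check=True)`; size the buffer so it still holds the outage by the time someone gets to it.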

Thanks @jhippl
I went ahead and ordered one, as I can see it being a useful tool.


Got the Packet Squirrel in, and I'm testing tcpdump right now. I finally got it to do transparent passthrough. I keep getting the ports confused, so it's time to put a label on it. The Mark II still only has 10/100 Ethernet, which sucks, but I can live with it.

Yeah, it's a nice tool to keep in your back pocket for gathering a bunch of pcaps.