Welcome Guest, Not a member yet? Create Account  




BGP Session disconnects - SLOW THREAD: task smux_connect

#1
(This post was last modified: 04/07/2017, 14:09 by Phinitris.)

Hello,
I'm currently testing VyOS and the BGP session drops once every few hours because the BGP process freezes and therefore doesn't send keepalive messages in 40s intervals.

Keep Alive messages:

Code:
Jul  4 11:51:06 vyos bgpd[3791]: $IPv4_PEER sending KEEPALIVE
Jul  4 11:51:06 vyos bgpd[3791]: $IPv6_PEER sending KEEPALIVE
Jul  4 11:51:46 vyos bgpd[3791]: $IPv4_PEER sending KEEPALIVE
Jul  4 11:51:46 vyos bgpd[3791]: $IPv6_PEER sending KEEPALIVE
Jul  4 11:52:26 vyos bgpd[3791]: $IPv4_PEER sending KEEPALIVE
Jul  4 11:52:26 vyos bgpd[3791]: $IPv4_PEER sending KEEPALIVE
Jul  4 11:53:30 vyos bgpd[3791]: $IPv4_PEER sending KEEPALIVE
Jul  4 11:53:30 vyos bgpd[3791]: $IPv6_PEER sending KEEPALIVE
Jul  4 11:54:23 vyos bgpd[3791]: $IPv4_PEER sending KEEPALIVE
Jul  4 11:54:23 vyos bgpd[3791]: $IPv6_PEER sending KEEPALIVE
Jul  4 11:55:03 vyos bgpd[3791]: $IPv4_PEER sending KEEPALIVE
Jul  4 11:55:03 vyos bgpd[3791]: $IPv6_PEER sending KEEPALIVE
Jul  4 11:55:53 vyos bgpd[3791]: $IPv4_PEER sending KEEPALIVE
Jul  4 11:55:53 vyos bgpd[3791]: $IPv6_PEER sending KEEPALIVE

Code:
Jul  4 11:45:16 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 15029ms (cpu time 0ms)
Jul  4 11:45:54 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 7010ms (cpu time 0ms)
Jul  4 11:46:33 vyos ospfd[3785]: SLOW THREAD: task smux_connect (7f19839e6440) ran for 7009ms (cpu time 0ms)
Jul  4 11:47:10 vyos ospfd[3785]: SLOW THREAD: task smux_connect (7f19839e6440) ran for 7009ms (cpu time 0ms)
Jul  4 11:47:47 vyos ospfd[3785]: SLOW THREAD: task smux_connect (7f19839e6440) ran for 7020ms (cpu time 0ms)
Jul  4 11:47:53 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 15019ms (cpu time 0ms)
Jul  4 11:48:31 vyos bgpd[3791]: SLOW THREAD: task smux_connect (7fbceda2c440) ran for 7008ms (cpu time 0ms)
Jul  4 11:50:16 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 7006ms (cpu time 0ms)
Jul  4 11:50:20 vyos bgpd[3791]: SLOW THREAD: task smux_connect (7fbceda2c440) ran for 7007ms (cpu time 0ms)
Jul  4 11:51:06 vyos bgpd[3791]: SLOW THREAD: task smux_connect (7fbceda2c440) ran for 15019ms (cpu time 0ms)
Jul  4 11:51:07 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 7009ms (cpu time 0ms)
Jul  4 11:51:14 vyos ospfd[3785]: SLOW THREAD: task smux_connect (7f19839e6440) ran for 7010ms (cpu time 0ms)
Jul  4 11:51:45 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 15030ms (cpu time 0ms)
Jul  4 11:52:04 vyos bgpd[3791]: SLOW THREAD: task smux_connect (7fbceda2c440) ran for 7007ms (cpu time 0ms)
Jul  4 11:52:26 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 31050ms (cpu time 0ms)
Jul  4 11:53:30 vyos bgpd[3791]: SLOW THREAD: task smux_connect (7fbceda2c440) ran for 63099ms (cpu time 0ms)
Jul  4 11:53:41 vyos ospfd[3785]: SLOW THREAD: task smux_connect (7f19839e6440) ran for 127260ms (cpu time 0ms)
Jul  4 11:53:47 vyos bgpd[3791]: SLOW THREAD: task smux_connect (7fbceda2c440) ran for 7001ms (cpu time 0ms)
Jul  4 11:53:50 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 63099ms (cpu time 0ms)
Jul  4 11:54:23 vyos bgpd[3791]: SLOW THREAD: task smux_connect (7fbceda2c440) ran for 15023ms (cpu time 0ms)
Jul  4 11:54:42 vyos ospfd[3785]: SLOW THREAD: task smux_connect (7f19839e6440) ran for 31090ms (cpu time 0ms)
Jul  4 11:54:48 vyos bgpd[3791]: SLOW THREAD: task smux_connect (7fbceda2c440) ran for 15038ms (cpu time 0ms)
Jul  4 11:55:02 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 31080ms (cpu time 0ms)
Jul  4 11:55:23 vyos ospfd[3785]: SLOW THREAD: task smux_connect (7f19839e6440) ran for 7010ms (cpu time 0ms)
Jul  4 11:55:53 vyos bgpd[3791]: SLOW THREAD: task smux_connect (7fbceda2c440) ran for 15019ms (cpu time 0ms)
Jul  4 11:56:04 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 31090ms (cpu time 0ms)
Jul  4 11:56:13 vyos ospfd[3785]: SLOW THREAD: task smux_connect (7f19839e6440) ran for 7009ms (cpu time 0ms)
Jul  4 11:56:21 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 7009ms (cpu time 0ms)
Jul  4 11:57:05 vyos ospfd[3785]: SLOW THREAD: task smux_connect (7f19839e6440) ran for 31090ms (cpu time 0ms)
Jul  4 11:57:51 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 15029ms (cpu time 0ms)
Jul  4 11:57:54 vyos bgpd[3791]: SLOW THREAD: task smux_connect (7fbceda2c440) ran for 15025ms (cpu time 0ms)
Jul  4 11:58:28 vyos ripd[3780]: SLOW THREAD: task smux_connect (7f25dc700440) ran for 7009ms (cpu time 0ms)
Jul  4 11:58:43 vyos bgpd[3791]: SLOW THREAD: task smux_connect (7fbceda2c440) ran for 15036ms (cpu time 0ms)
Jul  4 11:59:05 vyos ospfd[3785]: SLOW THREAD: task smux_connect (7f19839e6440) ran for 7009ms (cpu time 0ms)
Jul  4 11:59:40 vyos ospfd[3785]: SLOW THREAD: task smux_connect (7f19839e6440) ran for 15019ms (cpu time 0ms)

Hardware Info:
Code:
----------------
Vyatta Version and Package Changes
----------------
Version:      VyOS 1.1.7
Description:  VyOS 1.1.7 (helium)
Copyright:    2016 VyOS maintainers and contributors
Built by:     maintainers@vyos.net
Built on:     Wed Feb 17 09:57:31 UTC 2016
Build ID:     1602170957-4459750
System type:  x86 64-bit
Boot via:     image
HW model:     ProLiant DL360e Gen8
HW S/N:       CZJ3500BFQ
HW UUID:      34373030-3635-435A-4A33-353030424651
Uptime:       12:01:36 up  8:11,  2 users,  load average: 0.00, 0.02, 0.05

BGP Configuration:
Code:
bgp $ASNUM {
   address-family {
       ipv6-unicast {
           network $IPv6_/29 {
           }
       }
   }
   neighbor $IPv4_PEER {
       remote-as $REM_ASNUM
       update-source $IPv4_IP
   }
   neighbor $IPv6_PEER {
       address-family {
           ipv6-unicast {
           }
       }
       remote-as $REM_ASNUM
       update-source $IPv6_IP
   }
   network $NET1_/24 {
   }
   network $NET2_/24 {
   }
   network $NET3_/24 {
   }
   network $NET4_/24 {
   }
   network $NET5_/24 {
   }
   network $NET6_/24 {
   }
   network $NET7_/22 {
   }
   parameters {
       router-id $IPv4_IP
   }
}

top:
Code:
pu0  :  0.1%us,  0.0%sy,  0.0%ni, 91.7%id,  0.0%wa,  0.0%hi,  8.2%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.1%us,  0.1%sy,  0.0%ni, 94.6%id,  0.0%wa,  0.0%hi,  5.1%si,  0.0%st
Cpu3  :  0.1%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.0%sy,  0.0%ni, 98.5%id,  0.0%wa,  0.0%hi,  1.4%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  1.7%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.1%us,  0.1%sy,  0.0%ni, 98.1%id,  0.0%wa,  0.0%hi,  1.7%si,  0.0%st
Cpu9  :  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.1%us,  0.0%sy,  0.0%ni, 98.1%id,  0.0%wa,  0.0%hi,  1.7%si,  0.0%st
Cpu11 :  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  0.0%us,  0.0%sy,  0.0%ni, 98.4%id,  0.0%wa,  0.0%hi,  1.6%si,  0.0%st
Cpu13 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.1%us,  0.0%sy,  0.0%ni, 98.1%id,  0.0%wa,  0.0%hi,  1.8%si,  0.0%st
Cpu15 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu16 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu17 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu18 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu19 :  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu20 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu21 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu22 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu23 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

vmstat:
Code:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
0  0      0 7546832  83080 108264    0    0     0     0    1   18  0  1 99  0
0  0      0 7547668  83080 108264    0    0     0     8 137149  372  0  1 99  0
0  0      0 7550036  83080 108264    0    0     0     0 125155  266  0  0 100  0
0  0      0 7548612  83080 108264    0    0     0     0 126945  314  0  0 100  0
0  0      0 7545940  83080 108264    0    0     0     0 119227  307  0  1 99  0
0  0      0 7551220  83080 108264    0    0     0     0 117026  304  0  0 100  0
0  0      0 7548992  83080 108264    0    0     0     0 120597  389  0  1 99  0
0  0      0 7549240  83080 108264    0    0     0     0 136031  206  0  0 100  0
0  0      0 7550232  83088 108256    0    0     0    32 148895  315  0  0 100  0
0  0      0 7547544  83088 108264    0    0     0     0 147729  339  0  1 99  0
0  0      0 7546764  83088 108264    0    0     0    20 156677  362  0  1 99  0
0  0      0 7546668  83088 108264    0    0     0     0 159350  263  0  0 100  0
0  0      0 7545548  83088 108264    0    0     0     0 168053  341  0  1 99  0
0  0      0 7547564  83088 108264    0    0     0     0 173445  280  0  1 99  0
1  0      0 7547532  83088 108264    0    0     0     0 173424  232  0  1 99  0
0  0      0 7547788  83096 108256    0    0     0    60 171006  327  0  1 99  0
0  0      0 7547276  83096 108264    0    0     0     0 173452  350  0  1 99  0
0  0      0 7547340  83096 108268    0    0     0     0 167156  335  0  1 99  0
0  0      0 7545996  83096 108268    0    0     0     0 171909  307  0  1 99  0
0  0      0 7546348  83096 108268    0    0     0     0 163887  295  0  1 99  0
0  0      0 7547116  83104 108260    0    0     0    20 164847  376  0  1 99  0
0  0      0 7546412  83104 108260    0    0     0     8 159988  302  0  1 99  0
0  0      0 7545644  83104 108268    0    0     0     0 153223  339  0  0 100  0
0  0      0 7548268  83104 108268    0    0     0     0 163509  284  0  0 100  0
0  0      0 7548332  83104 108268    0    0     0     0 159174  297  0  1 99  0
0  0      0 7547308  83104 108268    0    0     0     0 165399  286  0  1 99  0
0  0      0 7548204  83104 108268    0    0     0     0 166225  237  0  1 99  0


I have even configured the remote router to only send the default route 0.0.0.0/0 but even then this issue occurs. The whole routing table has only 10 entries.
HP Server has Hyperthreading enabled and Power Mode set to High Static Performance.
8 GB RAM installed and CPU is 2x Intel® Xeon® CPU E5-2430 v2 @ 2.50GHz.

Traffic: 100k pps in / 100k pps out, 150 Mbit/s in, 200 Mbit/s out.

Any Ideas why this issue occurs?
Reply

#2

Hello,
most likely it's some obscure bug in quagga,
I found similar reports on ERL devices
============
Open source routing platform for everyone!
Donate to VyOS project!

Looking for professional services?
Get them here!
Reply





Users browsing this thread:
1 Guest(s)