OpenWrt swconfig stack trace analysis
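Following up on my previous post, "Debugging system reboots caused by watchdog bite": after turning on the debug features I started stress testing, and during the test the DUT printed the following error every 2 s: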
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:616
in_atomic(): 1, irqs_disabled(): 0, pid: 9465, name: swconfig
INFO: lockdep is turned off.
CPU: 2 PID: 9465 Comm: swconfig Tainted: P W 3.14.77 #1
[<c021561c>] (unwind_backtrace) from [<c0211d44>] (show_stack+0x18/0x1c)
[<c0211d44>] (show_stack) from [<c062ea98>] (dump_stack+0x9c/0xd4)
[<c062ea98>] (dump_stack) from [<c06312d8>] (mutex_lock_nested+0x2c/0x450)
[<c06312d8>] (mutex_lock_nested) from [<c0499df8>] (swconfig_get_dev+0x70/0x88)
[<c0499df8>] (swconfig_get_dev) from [<c049a808>] (swconfig_list_attrs+0x20/0x20c)
[<c049a808>] (swconfig_list_attrs) from [<c054fde8>] (genl_rcv_msg+0x260/0x2e0)
[<c054fde8>] (genl_rcv_msg) from [<c054f2d0>] (netlink_rcv_skb+0x60/0xbc)
[<c054f2d0>] (netlink_rcv_skb) from [<c054fb74>] (genl_rcv+0x28/0x3c)
[<c054fb74>] (genl_rcv) from [<c054ec94>] (netlink_unicast+0x11c/0x1d0)
[<c054ec94>] (netlink_unicast) from [<c054f114>] (netlink_sendmsg+0x30c/0x368)
[<c054f114>] (netlink_sendmsg) from [<c050fb78>] (sock_sendmsg+0x78/0x8c)
[<c050fb78>] (sock_sendmsg) from [<c0511310>] (___sys_sendmsg.part.3+0x184/0x20c)
[<c0511310>] (___sys_sendmsg.part.3) from [<c0512340>] (__sys_sendmsg+0x54/0x78)
[<c0512340>] (__sys_sendmsg) from [<c020df40>] (ret_fast_syscall+0x0/0x50)
Problem analysis
The 2 s interval comes from the main loop of the detcable module, which runs the following command inside a while loop once every 2 s:
system("/sbin/swconfig dev switch0 show | grep \"link: port\" > /tmp/switch");
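To reproduce the load without detcable, the loop can be approximated as below. This is a hypothetical sketch, not detcable's actual source: `poll_switch_status()` and `SWITCH_STATUS_CMD` are names made up for illustration, and only the shell command comes from the module. Each iteration forks a shell and runs swconfig, whose netlink request goes through the path shown in the stack trace, which matches the 2 s cadence of the messages.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Command taken from the detcable main loop (closing quote restored). */
#define SWITCH_STATUS_CMD \
    "/sbin/swconfig dev switch0 show | grep \"link: port\" > /tmp/switch"

/* Hypothetical helper: run the status command `iterations` times, pausing
 * `interval_s` seconds after each run. Returns how many runs exited non-zero
 * (for this pipeline that includes grep finding no "link: port" line). */
static int poll_switch_status(unsigned int iterations, unsigned int interval_s)
{
    int failures = 0;

    assert(interval_s > 0); /* an interval of 0 would hammer swconfig in a tight loop */
    for (unsigned int i = 0; i < iterations; i++) {
        if (system(SWITCH_STATUS_CMD) != 0)
            failures++;
        sleep(interval_s);
    }
    return failures;
}
```

For example, `poll_switch_status(30, 2)` approximates one minute of the module's behavior; detcable itself keeps running the equivalent loop indefinitely.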
The first line of the log points to kernel/locking/mutex.c:616; the code at that line is the might_sleep() call in mutex_lock_nested().
Searching the kernel source for might_sleep shows that it is defined in include/linux/kernel.h.
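According to the comment accompanying that definition, these stack traces indicate that the swconfig process, while executing in kernel mode, called a function that may sleep in a context where sleeping is not allowed. The messages are only printed when CONFIG_DEBUG_ATOMIC_SLEEP is enabled; that option was switched on together with the lockup debugging features, so disabling it would stop the messages.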
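To make the mechanism concrete, here is a small userspace model in C. It is only a sketch under simplifying assumptions: `preempt_count_model` stands in for the kernel's preemption counter, `model_spin_lock()`, `model_spin_unlock()` and `model_might_sleep()` are hypothetical names, and the `MODEL_DEBUG_ATOMIC_SLEEP` macro plays the role of CONFIG_DEBUG_ATOMIC_SLEEP. It shows why the report appears only in atomic context and why turning the option off silences it without changing what the code does.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for CONFIG_DEBUG_ATOMIC_SLEEP. Remove this line to model a
 * kernel built without the option: the check then disappears. */
#define MODEL_DEBUG_ATOMIC_SLEEP 1

/* Stand-in for the preemption counter; nonzero means atomic context. */
static int preempt_count_model;

/* spin_lock() disables preemption, i.e. raises the counter. */
static void model_spin_lock(void)
{
    preempt_count_model++;
}

static void model_spin_unlock(void)
{
    assert(preempt_count_model > 0); /* unlock without a matching lock */
    preempt_count_model--;
}

static int model_in_atomic(void)
{
    return preempt_count_model != 0;
}

/* Simplified might_sleep(): returns 1 if it reported a bug, 0 otherwise.
 * The real kernel check also looks at irqs_disabled(), rate-limits the
 * message and dumps the stack. */
static int model_might_sleep(const char *file, int line)
{
#ifdef MODEL_DEBUG_ATOMIC_SLEEP
    if (model_in_atomic()) {
        printf("BUG: sleeping function called from invalid context at %s:%d\n",
               file, line);
        printf("in_atomic(): %d\n", model_in_atomic());
        return 1;
    }
#endif
    (void)file;
    (void)line;
    return 0;
}
```

Calling `model_might_sleep("kernel/locking/mutex.c", 616)` between `model_spin_lock()` and `model_spin_unlock()` prints the BUG line; outside the lock, or with the macro removed, it stays silent.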
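In reality, though, the problem is caused by the swconfig kernel driver. The trace shows swconfig_get_dev() calling mutex_lock_nested() (what mutex_lock() expands to when lock debugging is enabled), and it does so while still holding the spinlock taken by swconfig_lock(). A spinlock's critical section must not call anything that can sleep, and mutex_lock is exactly such a function; the in_atomic(): 1 in the log confirms the task was in atomic context at that point. That is what triggers this stack trace.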
mutex_lock — acquire the mutex
Lock the mutex exclusively for this task. If the mutex is not available right now, it will sleep until it can get it.
To fix the problem, the lock can be changed from spin_lock to mutex_lock. Credit for this fix goes to my team lead, who found it on Google; I'm just passing it along, haha.
index 78569a9..e8a6847 100644
--- a/target/linux/generic/files/drivers/net/phy/swconfig.c
+++ b/target/linux/generic/files/drivers/net/phy/swconfig.c
@@ -36,7 +36,7 @@ MODULE_LICENSE("GPL");
static int swdev_id;
static struct list_head swdevs;
-static DEFINE_SPINLOCK(swdevs_lock);
+static DEFINE_MUTEX(swdevs_lock);
struct swconfig_callback;
struct swconfig_callback {
@@ -296,13 +296,13 @@ static struct nla_policy link_policy[SWITCH_LINK_ATTR_MAX] = {
static inline void
swconfig_lock(void)
{
- spin_lock(&swdevs_lock);
+ mutex_lock(&swdevs_lock);
}
static inline void
swconfig_unlock(void)
{
- spin_unlock(&swdevs_lock);
+ mutex_unlock(&swdevs_lock);
}
static struct switch_dev *
With the patch applied, the problem was completely resolved.
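The effect of the patch can be checked with a similar userspace model. Again this is a simplified sketch, not driver code: `struct lock_model`, `model_swconfig_get_dev()` and `atomic_depth` are hypothetical, and the flow assumed for swconfig_get_dev() (take the global lock, take the device's mutex, release the global lock) is reconstructed from the stack trace and the patch rather than copied from the source. With a spinlock as the outer lock, the inner mutex acquisition happens in atomic context and is flagged; with a mutex, as in the patch, it is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the preemption counter (nonzero = atomic context). */
static int atomic_depth;

/* Hypothetical model of swdevs_lock: a spinlock (before the patch)
 * or a mutex (after the patch). */
struct lock_model {
    bool is_spinlock;
};

static void model_swconfig_lock(const struct lock_model *l)
{
    /* spin_lock() enters atomic context; mutex_lock() may sleep but
     * leaves the caller preemptible, so it does not touch the counter. */
    if (l->is_spinlock)
        atomic_depth++;
}

static void model_swconfig_unlock(const struct lock_model *l)
{
    if (l->is_spinlock) {
        assert(atomic_depth > 0);
        atomic_depth--;
    }
}

/* The mutex_lock() that swconfig_get_dev() performs (see the trace),
 * reduced to its might_sleep() check. Returns true if the check fires. */
static bool model_lock_dev_mutex(void)
{
    if (atomic_depth != 0) {
        puts("BUG: sleeping function called from invalid context");
        return true;
    }
    return false;
}

/* Assumed, simplified flow of swconfig_get_dev(). */
static bool model_swconfig_get_dev(const struct lock_model *l)
{
    bool warned;

    model_swconfig_lock(l);
    warned = model_lock_dev_mutex();
    model_swconfig_unlock(l);
    return warned;
}
```

With `is_spinlock = true` the model prints the BUG line and returns true; with `is_spinlock = false` it returns false. The trade-off of the real fix is that swconfig_lock() may now sleep itself, so it must not be called from atomic context either.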
Copyright notice: unless otherwise stated, all posts on this blog are licensed under CC BY-NC 4.0. Please credit litreily's blog when reposting!