"failed: No such process" when mounting CephFS

Today we ran into a rather strange problem in our environment: mounting a CephFS on a virtual machine failed with the odd message failed: No such process. This is what it looked like:

]# mount -t ceph mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ /tmp/data
mount: mount mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ on /tmp/data failed: No such process

It was puzzling that the error would be No such process. My first guess was a kernel module loading problem, so I checked which modules were loaded:

]# lsmod |grep ceph
ceph                  358802  0
libceph               306625  1 ceph
libcrc32c              12644  1 libceph
dns_resolver           13140  1 libceph

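The kernel modules were not the problem: they were already loaded, and if a module had been missing the error would have been unknown filesystem type rather than the one above.
So what next? I ran the mount under strace to look at the actual system calls: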

]# strace -f mount -t ceph mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ /tmp/data
...
stat("/sbin/mount.ceph", 0x7ffd39101680) = -1 ENOENT (No such file or directory)
stat("/sbin/fs.d/mount.ceph", 0x7ffd39101680) = -1 ENOENT (No such file or directory)
stat("/sbin/fs/mount.ceph", 0x7ffd39101680) = -1 ENOENT (No such file or directory)
mount("mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/", "/tmp/data", "ceph", MS_MGC_VAL, NULL) = -1 ESRCH (No such process)
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2502, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb3a36e4000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2502
read(3, "", 4096)                       = 0
close(3)                                = 0
...

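The failure is in the mount system call, which really does return ESRCH, and the message for that errno is the No such process we saw. A quick search showed that this error is mostly associated with kill failing to find the target process, so why would mount return it? I was about to read the relevant code, but before doing so I checked the kernel log with dmesg and found some clues: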

]# dmesg
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
...
[  126.629937] Key type dns_resolver registered
[  126.685004] Key type ceph registered
[  126.685817] libceph: loaded (mon/osd proto 15/24)
[  126.717441] ceph: loaded (mds proto 32)
[  126.718859] libceph: resolve 'mon1.ichenfu.com' (ret=-3): failed
[  126.718862] libceph: parse_ips bad ip 'mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789'
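The ret=-3 in the libceph line is a negated errno value, the same ESRCH that mount reports as No such process. As a small aid (not part of the original debugging session), the sketch below uses Python's standard errno and os modules to turn such a value into its symbolic name and message. The helper name describe_errno is made up for illustration.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an errno value; kernel-style negatives are accepted."""
    code = abs(code)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # The ret=-3 logged by libceph in the dmesg output above.
    print(describe_errno(-3))
```

The same helper works for any other negative return value that shows up in kernel logs.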

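The kernel messages seemed to contain something important: it looked like hostname resolution was failing. But I had already verified the local DNS configuration, and name resolution on the machine worked fine, so why this error? To check whether resolution really was the culprit, I tried mounting by IP address instead: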

]# mount -v -t ceph 192.168.1.100:6789,192.168.1.101:6789,192.168.1.102:6789:/ /tmp/data
mount: 192.168.1.100:6789,192.168.1.101:6789,192.168.1.102:6789:/ mounted on /tmp/data.

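It worked! So hostname resolution was definitely the problem, and I could now read the code with a clear target. Searching the kernel source for the keywords resolve and failed from the message resolve 'mon1.ichenfu.com' (ret=-3): failed led to the relevant logic in net/ceph/messenger.c: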

static int ceph_dns_resolve_name(const char *name, size_t namelen,
		struct ceph_entity_addr *addr, char delim, const char **ipend)
{
	const char *end, *delim_p;
	char *colon_p, *ip_addr = NULL;
	int ip_len, ret;

	/*
	 * The end of the hostname occurs immediately preceding the delimiter or
	 * the port marker (':') where the delimiter takes precedence.
	 */
	delim_p = memchr(name, delim, namelen);
	colon_p = memchr(name, ':', namelen);

	if (delim_p && colon_p)
		end = delim_p < colon_p ? delim_p : colon_p;
	else if (!delim_p && colon_p)
		end = colon_p;
	else {
		end = delim_p;
		if (!end) /* case: hostname:/ */
			end = name + namelen;
	}

	if (end <= name)
		return -EINVAL;

	/* do dns_resolve upcall */
	// call dns_query to perform the DNS lookup
	ip_len = dns_query(current->nsproxy->net_ns,
			   NULL, name, end - name, NULL, &ip_addr, NULL, false);
	if (ip_len > 0)
		ret = ceph_pton(ip_addr, ip_len, addr, -1, NULL);
	else
		// on failure ESRCH is returned, but the actual value dns_query returned in ip_len is lost
		ret = -ESRCH;

	kfree(ip_addr);

	*ipend = end;

	pr_info("resolve '%.*s' (ret=%d): %s\n", (int)(end - name), name,
			ret, ret ? "failed" : ceph_pr_addr(addr));

	return ret;
}
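To make the control flow of the function above easier to follow, here is a small userspace model of two things it does: finding where the hostname ends, and collapsing every non-positive dns_query result into -ESRCH. It is a sketch only, not kernel code. The function names are invented, and the ',' delimiter is an assumption about what the mount path passes as delim.

```python
import errno


def hostname_end(text: str, delim: str = ",") -> int:
    """Index where the hostname ends: at the delimiter or the ':' port marker,
    whichever comes first, or at the end of the string if neither is present.
    (The kernel function rejects a result of 0 with -EINVAL.)"""
    delim_pos = text.find(delim)
    colon_pos = text.find(":")
    if delim_pos != -1 and colon_pos != -1:
        return min(delim_pos, colon_pos)
    if colon_pos != -1:
        return colon_pos
    return delim_pos if delim_pos != -1 else len(text)


def resolve_ret(dns_query_ret: int) -> int:
    """Mirror the if/else around dns_query: a positive length is treated as
    success here (the real code then calls ceph_pton), anything else is -ESRCH."""
    return 0 if dns_query_ret > 0 else -errno.ESRCH


if __name__ == "__main__":
    mons = "mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789"
    idx = hostname_end(mons)
    print(mons[:idx], idx)
    # Whatever dns_query reports, for example -ENOENT, the caller only sees -ESRCH.
    print(resolve_ret(-errno.ENOENT))
```

The second function is the reason the error message is misleading: the original error code from dns_query never reaches the mount caller.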

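This should be where ESRCH comes from, but there still isn't enough information: we don't know what dns_query actually returned. So let's look at the implementation of dns_query, which lives in net/dns_resolver/dns_query.c as part of the dns_resolver module: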

int dns_query(struct net *net,
	      const char *type, const char *name, size_t namelen,
	      const char *options, char **_result, time64_t *_expiry,
	      bool invalidate)
{
	struct key *rkey;
	struct user_key_payload *upayload;
	const struct cred *saved_cred;
	size_t typelen, desclen;
	char *desc, *cp;
	int ret, len;

	// log on function entry
	kenter("%s,%*.*s,%zu,%s",
	       type, (int)namelen, (int)namelen, name, namelen, options);

	if (!name || namelen == 0)
		return -EINVAL;

	/* construct the query key description as "[<type>:]<name>" */
	typelen = 0;
	desclen = 0;
	if (type) {
		typelen = strlen(type);
		if (typelen < 1)
			return -EINVAL;
		desclen += typelen + 1;
	}

	if (namelen < 3 || namelen > 255)
		return -EINVAL;
	desclen += namelen + 1;

	desc = kmalloc(desclen, GFP_KERNEL);
	if (!desc)
		return -ENOMEM;

	cp = desc;
	if (type) {
		memcpy(cp, type, typelen);
		cp += typelen;
		*cp++ = ':';
	}
	memcpy(cp, name, namelen);
	cp += namelen;
	*cp = '\0';

	if (!options)
		options = "";
	// kernel debug log
	kdebug("call request_key(,%s,%s)", desc, options);

	/* make the upcall, using special credentials to prevent the use of
	 * add_key() to preinstall malicious redirections
	 */
	saved_cred = override_creds(dns_resolver_cache);
	rkey = request_key_net(&key_type_dns_resolver, desc, net, options);
	revert_creds(saved_cred);
	kfree(desc);
	if (IS_ERR(rkey)) {
		ret = PTR_ERR(rkey);
		goto out;
	}

	down_read(&rkey->sem);
	set_bit(KEY_FLAG_ROOT_CAN_INVAL, &rkey->flags);
	rkey->perm |= KEY_USR_VIEW;

	ret = key_validate(rkey);
	if (ret < 0)
		goto put;

	/* If the DNS server gave an error, return that to the caller */
	ret = PTR_ERR(rkey->payload.data[dns_key_error]);
	if (ret)
		goto put;

	upayload = user_key_payload_locked(rkey);
	len = upayload->datalen;

	if (_result) {
		ret = -ENOMEM;
		*_result = kmemdup_nul(upayload->data, len, GFP_KERNEL);
		if (!*_result)
			goto put;
	}

	if (_expiry)
		*_expiry = rkey->expiry;

	ret = len;
put:
	up_read(&rkey->sem);
	if (invalidate)
		key_invalidate(rkey);
	key_put(rkey);
out:
	// log on function exit
	kleave(" = %d", ret);
	return ret;
}
EXPORT_SYMBOL(dns_query);
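The first half of dns_query only builds the key description "[<type>:]<name>" that is handed to request_key_net. A rough Python rendering of that part is shown below, assuming plain ASCII hostnames so that string length equals byte length. The function name and the use of ValueError in place of -EINVAL are illustrative choices. Since ceph_dns_resolve_name passes NULL for type, the description in our case is just the hostname.

```python
from typing import Optional


def build_key_description(name: str, key_type: Optional[str] = None) -> str:
    """Build "[<type>:]<name>" with the same length checks as the C code above."""
    if not name:
        raise ValueError("empty name (-EINVAL)")
    if key_type is not None and len(key_type) < 1:
        raise ValueError("empty type (-EINVAL)")
    if len(name) < 3 or len(name) > 255:
        raise ValueError("name length must be within 3..255 (-EINVAL)")
    return f"{key_type}:{name}" if key_type is not None else name


if __name__ == "__main__":
    # With no type, as in the ceph call, the description is the bare hostname.
    print(build_key_description("mon1.ichenfu.com"))
```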

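Leaving the overall logic of the function aside for now, look at the places where it logs. The goal is to turn the debug log on and get more detail. kenter and kleave are two macros: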

/*
 * debug tracing
 */
extern unsigned int dns_resolver_debug;

#define	kdebug(FMT, ...)				\
do {							\
	if (unlikely(dns_resolver_debug))		\
		printk(KERN_DEBUG "[%-6.6s] "FMT"\n",	\
		       current->comm, ##__VA_ARGS__);	\
} while (0)

#define kenter(FMT, ...) kdebug("==> %s("FMT")", __func__, ##__VA_ARGS__)
#define kleave(FMT, ...) kdebug("<== %s()"FMT"", __func__, ##__VA_ARGS__)

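In other words, when dns_resolver_debug is non-zero, debug messages are emitted via printk. dns_resolver_debug is most likely passed in as a module parameter at load time, and net/dns_resolver/dns_key.c does contain the module's parameter definitions: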

...
MODULE_DESCRIPTION("DNS Resolver");
MODULE_AUTHOR("Wang Lei");
MODULE_LICENSE("GPL");

unsigned int dns_resolver_debug;
module_param_named(debug, dns_resolver_debug, uint, 0644);
MODULE_PARM_DESC(debug, "DNS Resolver debugging mask");
...

So dns_resolver_debug is controlled by the debug parameter. That makes things simple: reload the module by hand with the debug parameter set:

]# rmmod ceph           # unload the ceph module first, since it depends on the others
]# rmmod libceph        # unload the libceph module
]# rmmod dns_resolver   # unload dns_resolver
]# modprobe dns_resolver debug=1    # load dns_resolver with the parameter debug=1
]# modprobe ceph    # load the ceph module
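A possible alternative to reloading the modules: a parameter declared with module_param_named(..., 0644) is normally also exposed as a root-writable file under /sys/module/<module>/parameters/. The sketch below writes to such a file and reads the value back. The sysfs path for dns_resolver is an assumption based on that convention and was not used in the original session, and set_module_param is a made-up helper name.

```python
from pathlib import Path

# Assumed sysfs location of the "debug" parameter of the dns_resolver module.
DEBUG_PARAM = Path("/sys/module/dns_resolver/parameters/debug")


def set_module_param(path: Path, value: int) -> int:
    """Write an integer to a module parameter file and return the value read back."""
    path.write_text(f"{value}\n")
    return int(path.read_text().strip())
```

Run as root, set_module_param(DEBUG_PARAM, 1) would then enable the debug output without unloading ceph and libceph, and set_module_param(DEBUG_PARAM, 0) would turn it off again.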

Try the mount again and look at the dmesg output:

]# mount -t ceph mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ /tmp/data
mount: mount mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ on /tmp/data failed: No such process
]# dmesg
...
[ 3056.185724] [mount ] ==> dns_query((null),mon1.ichenfu.com,16,(null))
[ 3056.185733] [mount ] call request_key(,mon1.ichenfu.com,)
[ 3056.185910] [mount ] <== dns_query() = -2
[ 3056.185916] libceph: resolve 'mon1.ichenfu.com' (ret=-3): failed
[ 3056.185921] libceph: parse_ips bad ip 'mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789'

dns_query returned -2. Looking up that errno with perror gives OS error code 2: No such file or directory. So a file was not found, but which file? Time to read the module's documentation, which is at https://www.kernel.org/doc/Documentation/networking/dns_resolver.txt

========
OVERVIEW

The DNS resolver module provides a way for kernel services to make DNS queries
by way of requesting a key of key type dns_resolver. These queries are
upcalled to userspace through /sbin/request-key.

These routines must be supported by userspace tools dns.upcall, cifs.upcall and
request-key. It is under development and does not yet provide the full feature
set. The features it does support include:

(*) Implements the dns_resolver key_type to contact userspace.

It does not yet support the following AFS features:

(*) Dns query support for AFSDB resource record.

This code is extracted from the CIFS filesystem.

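This module gives the kernel a way to query DNS records, and the queries are carried out in userspace through /sbin/request-key. In other words, the module depends on the /sbin/request-key program.
I checked the machine and, sure enough, that program did not exist, which matches the -2 (No such file or directory) returned by dns_query. A bit more searching showed that it is provided by the keyutils package. After installing it with yum install -y keyutils, the problem was solved.
The documentation also mentions a configuration file, /etc/request-key.conf, so let's look at what it contains: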

...
#OP     TYPE    DESCRIPTION     CALLOUT INFO    PROGRAM ARG1 ARG2 ARG3 ...
#====== ======= =============== =============== ===============================
create  dns_resolver *          *               /sbin/key.dns_resolver %k
...

So there is a further dependency on /sbin/key.dns_resolver, but that is also part of the keyutils package.
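A quick way to check for this class of problem up front is to read the handler program out of the configuration and test whether it, and /sbin/request-key itself, are actually present. The sketch below does that with a deliberately simplified parser: it assumes whitespace-separated columns as in the excerpt above and ignores wildcard TYPE entries and any other syntax the real file format may allow. Both function names are made up for illustration.

```python
import os
from typing import List


def handler_programs(conf_text: str, key_type: str, op: str = "create") -> List[str]:
    """Return the PROGRAM column of every rule matching the given OP and TYPE."""
    programs = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Columns: OP TYPE DESCRIPTION CALLOUT-INFO PROGRAM [ARGS...]
        if len(fields) >= 5 and fields[0] == op and fields[1] == key_type:
            programs.append(fields[4])
    return programs


def missing_executables(paths: List[str]) -> List[str]:
    """Return the paths that do not exist or are not executable."""
    return [p for p in paths if not os.access(p, os.X_OK)]


if __name__ == "__main__":
    conf = "/etc/request-key.conf"
    text = ""
    if os.path.exists(conf):
        with open(conf) as fh:
            text = fh.read()
    needed = ["/sbin/request-key"] + handler_programs(text, "dns_resolver")
    print("missing:", missing_executables(needed))
```

On the broken machine from this post, a check like this would have listed /sbin/request-key as missing.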
Well, the problem is finally solved.