Mounting CephFS fails with "wrong fs type"

Recently we hit a problem in our cluster: on one machine, an attempt to mount CephFS failed with the following error:

~]# mount -t ceph mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ /tmp/data
mount: wrong fs type, bad option, bad superblock on mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

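I had previously run into a "failed: No such process" error when mounting CephFS. The two problems look somewhat similar, but they are not the same. With that earlier experience in mind, I followed the same approach and started with what the error message suggests: checking dmesg for anything useful: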

[  11457.592011] FS-Cache: Loaded
[  11457.617265] Key type ceph registered
[  11457.617686] libceph: loaded (mon/osd proto 15/24)
[  11457.640554] FS-Cache: Netfs 'ceph' registered for caching
[  11457.640558] ceph: loaded (mds proto 32)
[  11457.640978] libceph: parse_ips bad ip 'mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789'

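This looks very similar to the earlier error: the failure is in hostname resolution. Using the same trick as before, I replaced the hostnames with IP addresses, and the mount succeeded. So the problem is again in hostname resolution, just a different failure from last time.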
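The "replace the hostnames with IP addresses" step can be done in userspace before calling mount. The following is a minimal sketch, not part of the original troubleshooting: the function names `resolve_ipv4` and `resolve_mon_list` are hypothetical, the second argument is an assumed hook for swapping in a different resolver, and the default resolver assumes a glibc `getent` that supports the `ahostsv4` database (IPv4 only).

```shell
# Default resolver: print the first IPv4 address getent reports for a name.
# Assumes glibc's getent with the "ahostsv4" database.
resolve_ipv4() {
    getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

# Rewrite "host:port,host:port" into "ip:port,ip:port".
# $1 = comma-separated monitor list
# $2 = resolver function to call per hostname (optional, hypothetical hook)
# The body runs in a subshell so IFS and the variables do not leak.
resolve_mon_list() (
    resolver=${2:-resolve_ipv4}
    out=
    IFS=,
    for entry in $1; do
        host=${entry%%:*}          # text before the first ':'
        port=${entry#"$host"}      # ":6789", or empty if no port was given
        ip=$("$resolver" "$host") || exit 1
        [ -n "$ip" ] || exit 1     # treat an empty answer as a failure
        out=${out:+$out,}$ip$port
    done
    printf '%s\n' "$out"
)
```

It could then be used along the lines of `mount -t ceph "$(resolve_mon_list mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789):/" /tmp/data`. The addresses are fixed at mount time, so a later DNS change would not be picked up.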

The strace output also shows a difference: last time the mount call returned ESRCH, while this time it returns EINVAL:

stat("/sbin/fs/mount.ceph", 0x7ffd47a8b8d0) = -1 ENOENT (No such file or directory)
mount("mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/", "/tmp/ceph", "ceph", MS_MGC_VAL, NULL) = -1 EINVAL (Invalid argument)
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3

So I tried to turn on debug output for the relevant kernel modules, hoping for some useful log messages, but this time something looked different:

~]# lsmod |grep ceph
ceph                  327680  0
libceph               245760  1 ceph
libcrc32c              16384  1 libceph
fscache                65536  1 ceph

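lsmod shows that libceph no longer depends on the dns_resolver module. What does that mean? I should mention one more difference from the previous machine: this one runs a newer kernel provided by centos-altarch. The current kernel version is 4.19.113-300.el7.x86_64. Did the code in the newer kernel change so that it no longer depends on dns_resolver? I quickly pulled up the source for the current kernel to see whether it differs from before: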

/*
 * Parse an ip[:port] list into an addr array.  Use the default
 * monitor port if a port isn't specified.
 */
int ceph_parse_ips(const char *c, const char *end,
                   struct ceph_entity_addr *addr,
                   int max_count, int *count)
{
        int i, ret = -EINVAL;
        const char *p = c;

        dout("parse_ips on '%.*s'\n", (int)(end-c), c);
        for (i = 0; i < max_count; i++) {
                const char *ipend;
                struct sockaddr_storage *ss = &addr[i].in_addr;
                int port;
                char delim = ',';

                if (*p == '[') {
                        delim = ']';
                        p++;
                }
                // ceph_parse_ips calls ceph_parse_server_name
                ret = ceph_parse_server_name(p, end - p, ss, delim, &ipend);
                if (ret)
                        goto bad;
                ret = -EINVAL;
        ...
        }

bad:
        // On failure an error is written to dmesg, in exactly the format we saw
        pr_err("parse_ips bad ip '%.*s'\n", (int)(end - c), c);
        return ret;
}
EXPORT_SYMBOL(ceph_parse_ips);

Next, ceph_parse_server_name: when the name is not a valid IP address, it goes on to call ceph_dns_resolve_name:

/*
 * Parse a server name (IP or hostname). If a valid IP address is not found
 * then try to extract a hostname to resolve using userspace DNS upcall.
 */
static int ceph_parse_server_name(const char *name, size_t namelen,
                        struct sockaddr_storage *ss, char delim, const char **ipend)
{
        int ret;

        ret = ceph_pton(name, namelen, ss, delim, ipend);
        if (ret)
                ret = ceph_dns_resolve_name(name, namelen, ss, delim, ipend);

        return ret;
}

Finally, ceph_dns_resolve_name turns out to have two implementations in the kernel:

/*
 * Extract hostname string and resolve using kernel DNS facility.
 */
#ifdef CONFIG_CEPH_LIB_USE_DNS_RESOLVER
static int ceph_dns_resolve_name(const char *name, size_t namelen,
                struct sockaddr_storage *ss, char delim, const char **ipend)
{
        const char *end, *delim_p;
        char *colon_p, *ip_addr = NULL;
        int ip_len, ret;

        /*
         * The end of the hostname occurs immediately preceding the delimiter or
         * the port marker (':') where the delimiter takes precedence.
         */
        delim_p = memchr(name, delim, namelen);
        colon_p = memchr(name, ':', namelen);

        if (delim_p && colon_p)
                end = delim_p < colon_p ? delim_p : colon_p;
        else if (!delim_p && colon_p)
                end = colon_p;
        else {
                end = delim_p;
                if (!end) /* case: hostname:/ */
                        end = name + namelen;
        }

        if (end <= name)
                return -EINVAL;

        /* do dns_resolve upcall */
        ip_len = dns_query(NULL, name, end - name, NULL, &ip_addr, NULL);
        if (ip_len > 0)
                ret = ceph_pton(ip_addr, ip_len, ss, -1, NULL);
        else
                ret = -ESRCH;

        kfree(ip_addr);

        *ipend = end;

        pr_info("resolve '%.*s' (ret=%d): %s\n", (int)(end - name), name,
                        ret, ret ? "failed" : ceph_pr_addr(ss));

        return ret;
}
#else
static inline int ceph_dns_resolve_name(const char *name, size_t namelen,
                struct sockaddr_storage *ss, char delim, const char **ipend)
{
        return -EINVAL;
}
#endif
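As a rough userspace illustration of how the first implementation decides where the hostname ends (at the first delimiter or the first ':', whichever comes first, otherwise at the end of the string), here is a small sketch. The function name `extract_hostname` is hypothetical, and it only mimics the string handling, not the DNS upcall.

```shell
# Print the hostname part of a "host[:port]<delim>..." string.
# $1 = input string, $2 = delimiter character (',' or ']' in the kernel code)
# Fails on an empty hostname, mirroring the kernel's "end <= name" check.
# The body runs in a subshell so the variables do not leak.
extract_hostname() (
    upto_delim=${1%%"$2"*}     # text before the first delimiter
    upto_colon=${1%%:*}        # text before the first ':'
    # Whichever marker comes first ends the hostname.
    if [ "${#upto_delim}" -le "${#upto_colon}" ]; then
        host=$upto_delim
    else
        host=$upto_colon
    fi
    [ -n "$host" ] || exit 1
    printf '%s\n' "$host"
)
```

For the mount source used in this post, `extract_hostname 'mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/' ','` prints `mon1.ichenfu.com`.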

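If the CONFIG_CEPH_LIB_USE_DNS_RESOLVER macro is defined, the function uses the interface provided by dns_resolver to look up the hostname. If it is not defined, the other implementation is compiled in instead, and it simply returns -EINVAL. Judging from the code, the implementation of ceph_dns_resolve_name has not changed, so could it be that CONFIG_CEPH_LIB_USE_DNS_RESOLVER is just not enabled? I turned to the kernel's config file to see which options the kernel was built with: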

~]# grep CONFIG_CEPH_LIB_USE_DNS_RESOLVER /boot/config-4.19.113-300.el7.x86_64
# CONFIG_CEPH_LIB_USE_DNS_RESOLVER is not set

Indeed, the 4.19.113-300.el7.x86_64 kernel currently in use does not have this option enabled. What about the other kernels?

~]# grep CONFIG_CEPH_LIB_USE_DNS_RESOLVER /boot/config-*
/boot/config-3.10.0-1062.el7.x86_64:CONFIG_CEPH_LIB_USE_DNS_RESOLVER=y
/boot/config-4.19.113-300.el7.x86_64:# CONFIG_CEPH_LIB_USE_DNS_RESOLVER is not set
/boot/config-4.4.235-1.el7.elrepo.x86_64:# CONFIG_CEPH_LIB_USE_DNS_RESOLVER is not set
/boot/config-5.8.6-2.el7.elrepo.x86_64:# CONFIG_CEPH_LIB_USE_DNS_RESOLVER is not set

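It turns out that, apart from the stock CentOS 7 kernel 3.10.0-1062.el7.x86_64, none of the kernels provided by centos-altarch and ELRepo have this option enabled. In other words, with these kernels you cannot use hostnames when mounting CephFS; the monitor addresses have to be given as IP addresses.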
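The grep check above can be wrapped in a small helper so that a mount script can decide up front whether hostnames are usable. This is only a sketch: the function name `ceph_dns_enabled` is hypothetical, and it assumes the kernel config is installed as /boot/config-$(uname -r), as on the machines shown here.

```shell
# Succeed if the given kernel config enables the libceph DNS resolver.
# $1 = path to a kernel config file (defaults to the running kernel's
#      config under /boot, which is an assumption about the distro layout)
ceph_dns_enabled() {
    grep -qx 'CONFIG_CEPH_LIB_USE_DNS_RESOLVER=y' "${1:-/boot/config-$(uname -r)}"
}
```

A script might call it as `ceph_dns_enabled || echo "use monitor IP addresses on this kernel"`.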

I don't quite understand why these kernels leave this option disabled.