Introduction to the Container Network Interface (CNI) Plugin Specification

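This article introduces the Container Network Interface (CNI) specification. It is based mainly on spec v0.3.1, with reference to the latest spec at the time of writing. The latest spec differs little from v0.3.1, particularly in the inputs and outputs of the plugin interface, so the two can be treated as equivalent here.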

Overview

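Every CNI plugin must be implemented as an executable that can be invoked by the container management system (e.g. rkt or Kubernetes).
A CNI plugin is responsible for inserting a network interface into the container network namespace (e.g. one end of a veth pair) and making any necessary changes on the host (e.g. attaching the other end of the veth pair to a bridge). It should then assign an IP address to the interface and set up the corresponding routes, consistent with the IP Address Management section, by invoking the appropriate IPAM plugin.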

Parameters

Every CNI plugin must support the following operations:

  • Add a container to a network

    • Parameters:
      • Version. The version of CNI spec that the caller is using (container management system or the invoking plugin).
      • Container ID. A unique plaintext identifier for a container, allocated by the runtime. Must not be empty.
      • Network namespace path. This represents the path to the network namespace to be added, i.e. /proc/[pid]/ns/net or a bind-mount/link to it.
      • Network configuration. This is a JSON document describing a network to which a container can be joined. The schema is described below.
      • Extra arguments. This provides an alternative mechanism to allow simple configuration of CNI plugins on a per-container basis.
      • Name of the interface inside the container. This is the name that should be assigned to the interface created inside the container (network namespace); consequently it must comply with the standard Linux restrictions on interface names.
    • Result:
      • Interfaces list. Depending on the plugin, this can include the sandbox (eg, container or hypervisor) interface name and/or the host interface name, the hardware addresses of each interface, and details about the sandbox (if any) the interface is in.
      • IP configuration assigned to each interface. The IPv4 and/or IPv6 addresses, gateways, and routes assigned to sandbox and/or host interfaces.
      • DNS information. Dictionary that includes DNS information for nameservers, domain, search domains and options.
  • Remove a container from a network

    • Parameters:
      • Version. The version of CNI spec that the caller is using (container management system or the invoking plugin).
      • Container ID, as defined above.
      • Network namespace path, as defined above.
      • Network configuration, as defined above.
      • Extra arguments, as defined above.
      • Name of the interface inside the container, as defined above.
    • All parameters should be the same as those passed to the corresponding add operation.
    • A delete operation should release all resources held by the supplied containerid in the configured network.
  • Report the CNI versions supported by the plugin

    • Parameters: NONE.

    • Result: information about the CNI spec versions supported by the plugin

      {
        "cniVersion": "0.3.1", // the version of the CNI spec in use for this output
        "supportedVersions": [ "0.1.0", "0.2.0", "0.3.0", "0.3.1" ] // the list of CNI spec versions that this plugin supports
      }
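As a minimal sketch of the VERSION operation, the snippet below builds the JSON text shown above. The function name `version_response` and the `SUPPORTED_VERSIONS` constant are illustrative assumptions, not part of the spec.

```python
import json

# Hypothetical list of spec versions that this example plugin claims to support.
SUPPORTED_VERSIONS = ["0.1.0", "0.2.0", "0.3.0", "0.3.1"]


def version_response(cni_version="0.3.1", supported=SUPPORTED_VERSIONS):
    """Return the JSON text a plugin would print to stdout for VERSION."""
    return json.dumps({
        "cniVersion": cni_version,
        "supportedVersions": list(supported),
    })
```

A plugin would typically print this string to stdout and exit with code 0.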

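The container runtime must look for an executable with the same name as the network type in a predefined list of directories. Once found, the executable must be invoked with the following environment variables: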

  • CNI_COMMAND: indicates the desired operation; ADD, DEL or VERSION.
  • CNI_CONTAINERID: Container ID
  • CNI_NETNS: Path to network namespace file
  • CNI_IFNAME: Interface name to set up; if the plugin is unable to use this interface name it must return an error
  • CNI_ARGS: Extra arguments passed in by the user at invocation time. Alphanumeric key-value pairs separated by semicolons; for example, “FOO=BAR;ABC=123”
  • CNI_PATH: List of paths to search for CNI plugin executables. Paths are separated by an OS-specific list separator; for example ‘:’ on Linux and ‘;’ on Windows
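The two variables with internal structure, CNI_ARGS and CNI_PATH, could be handled roughly as follows. This is a sketch under assumptions: the helper names `parse_cni_args` and `find_plugin` are hypothetical, and rejecting malformed entries is a design choice the spec does not mandate.

```python
import os


def parse_cni_args(raw):
    """Parse a CNI_ARGS string such as 'FOO=BAR;ABC=123' into a dict."""
    pairs = {}
    for item in raw.split(";"):
        if not item:
            continue  # tolerate empty segments such as a trailing ';'
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed CNI_ARGS entry: {item!r}")
        pairs[key] = value
    return pairs


def find_plugin(plugin_type, cni_path):
    """Return the first executable named plugin_type found in CNI_PATH, or None."""
    for directory in cni_path.split(os.pathsep):
        candidate = os.path.join(directory, plugin_type)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Using `os.pathsep` gives ':' on Linux and ';' on Windows, which matches the OS-specific separator described above.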

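The network configuration, in JSON format, is passed to the plugin through stdin. This means it is not tied to a particular file on disk and may change between invocations.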
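Putting the environment variables and stdin together, a plugin entry point might be sketched as below. The function `run_plugin` and the `handlers` mapping are assumptions made for illustration; the environment and stdin text are passed in as arguments so the logic can be exercised without a real runtime.

```python
import json


def run_plugin(environ, stdin_text, handlers):
    """Dispatch one plugin invocation.

    `handlers` is a hypothetical mapping from command name to a function.
    """
    command = environ.get("CNI_COMMAND")
    if command not in ("ADD", "DEL", "VERSION"):
        raise ValueError(f"unsupported CNI_COMMAND: {command!r}")
    if command == "VERSION":
        return handlers["VERSION"]()
    config = json.loads(stdin_text)  # the network configuration arrives on stdin
    return handlers[command](
        config,
        container_id=environ["CNI_CONTAINERID"],
        netns=environ["CNI_NETNS"],
        ifname=environ["CNI_IFNAME"],
    )
```

In a real executable this would be called with `os.environ` and `sys.stdin.read()`.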

Result

An IPAM plugin should return a Result structure, as described later in this document.

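On success, every plugin must exit with code 0 and, for the ADD operation, print the following JSON to stdout. The ips and dns items should be the same as the output of the IPAM plugin. In addition, the plugin must apply the IP addresses returned by the IPAM plugin to the corresponding interfaces and fill in the interface index, because the IPAM plugin does not know which interfaces exist.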

{
  "cniVersion": "0.3.1",
  "interfaces": [                                            (this key omitted by IPAM plugins)
      {
          "name": "<name>",
          "mac": "<MAC address>",                            (required if L2 addresses are meaningful)
          "sandbox": "<netns path or hypervisor identifier>" (required for container/hypervisor interfaces, empty/omitted for host interfaces)
      }
  ],
  "ips": [
      {
          "version": "<4-or-6>",
          "address": "<ip-and-prefix-in-CIDR>",
          "gateway": "<ip-address-of-the-gateway>",          (optional)
          "interface": <numeric index into 'interfaces' list>
      },
      ...
  ],
  "routes": [                                                (optional)
      {
          "dst": "<ip-and-prefix-in-cidr>",
          "gw": "<ip-of-next-hop>"                           (optional)
      },
      ...
  ],
  "dns": {
    "nameservers": <list-of-nameservers>,                    (optional)
    "domain": <name-of-local-domain>,                        (optional)
    "search": <list-of-additional-search-domains>,           (optional)
    "options": <list-of-options>                             (optional)
  }
}

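cniVersion specifies the Semantic Version 2.0 of the CNI specification used by the plugin.
interfaces describes the network interfaces created by the plugin.
If the CNI_IFNAME environment variable is set, the plugin must use that name for the interface and must return an error if it cannot.

  • mac (string): the hardware (MAC) address of the interface. It may be omitted if L2 addresses are not meaningful for the interface.
  • sandbox (string): container/namespace-based environments should return the full path to the network namespace. Hypervisor/VM-based environments should return an ID unique to the virtualized sandbox. This item is required for interfaces created in a sandbox or a virtual machine.

ips is a list of IP configuration entries.
dns is the DNS configuration. Both structures are described in detail later in this document.

The specification does not prescribe how plugins must process the DNS information. For example, a plugin may generate an /etc/resolv.conf file or run a DNS forwarder.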
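The relationship between the IPAM output and the final ADD result can be sketched as follows. The helper `build_add_result` is hypothetical, and attaching every IP to a single interface index is a simplifying assumption made only for this example.

```python
def build_add_result(cni_version, interfaces, ipam_result, sandbox_index=0):
    """Combine a plugin's interface list with the IPAM plugin's output.

    Every IP from the IPAM result is attached to interfaces[sandbox_index];
    this single-interface assumption is a simplification for the sketch.
    """
    if not 0 <= sandbox_index < len(interfaces):
        raise IndexError("sandbox_index does not refer to an entry in interfaces")
    # Copy each IPAM entry and add the interface index the IPAM plugin left out.
    ips = [dict(ip, interface=sandbox_index) for ip in ipam_result.get("ips", [])]
    result = {"cniVersion": cni_version, "interfaces": interfaces, "ips": ips}
    for key in ("routes", "dns"):
        if key in ipam_result:
            result[key] = ipam_result[key]  # passed through unchanged
    return result
```

The IPAM entries are copied rather than modified in place, so the original IPAM result stays untouched.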

On error, the plugin must exit with a non-zero return code and print the following JSON to stdout:

{
  "cniVersion": "0.3.1",
  "code": <numeric-error-code>,
  "msg": <short-error-message>,
  "details": <long-error-message> (optional)
}

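cniVersion specifies the Semantic Version 2.0 of the CNI specification used by the plugin.
Error codes 0-99 are reserved for well-known errors (see the error code list at the end of this document); codes 100 and above may be used for plugin-specific errors.

In addition, stderr can be used for unstructured output such as logs.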
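A small helper for producing the error structure might look like this. The name `error_response` is an assumption for illustration; only the JSON keys come from the spec.

```python
import json


def error_response(code, msg, details=None, cni_version="0.3.1"):
    """Return the JSON text a plugin would print to stdout when it fails."""
    body = {"cniVersion": cni_version, "code": code, "msg": msg}
    if details is not None:
        body["details"] = details  # optional long error message
    return json.dumps(body)
```

A failing plugin would print this string to stdout, write any free-form log lines to stderr, and exit with a non-zero code.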

Network Configuration

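The network configuration is described in JSON format. It may be stored on disk or generated by the container runtime. The following fields are well-known and have the following meanings: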

  • cniVersion (string): Semantic Version 2.0 of CNI specification to which this configuration conforms.
  • name (string): Network name. This should be unique across all containers on the host (or other administrative domain).
  • type (string): Refers to the filename of the CNI plugin executable.
  • args (dictionary): Optional additional arguments provided by the container runtime. For example a dictionary of labels could be passed to CNI plugins by adding them to a labels field under args.
  • ipMasq (boolean): Optional (if supported by the plugin). Set up an IP masquerade on the host for this network. This is necessary if the host will act as a gateway to subnets that are not able to route to the IP assigned to the container.
  • ipam: Dictionary with IPAM specific values:
    • type (string): Refers to the filename of the IPAM plugin executable.
  • dns: Dictionary with DNS specific values:
    • nameservers (list of strings): a priority-ordered list of DNS nameservers that this network is aware of. Each entry in the list is a string containing either an IPv4 or an IPv6 address.
    • domain (string): the local domain used for short hostname lookups.
    • search (list of strings): list of priority ordered search domains for short hostname lookups. Will be preferred over domain by most resolvers.
    • options (list of strings): list of options that can be passed to the resolver

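Plugins may define additional fields of their own and may return an error when they receive a field they do not recognize. The one exception is the args field: if a plugin does not recognize some of the keys inside it, it should simply ignore them.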

Example Network Configurations

{
  "cniVersion": "0.3.1",
  "name": "dbnet",
  "type": "bridge",
  // type (plugin) specific
  "bridge": "cni0",
  "ipam": {
    "type": "host-local",
    // ipam specific
    "subnet": "10.1.0.0/16",
    "gateway": "10.1.0.1"
  },
  "dns": {
    "nameservers": [ "10.1.0.1" ]
  }
}

{
  "cniVersion": "0.3.1",
  "name": "pci",
  "type": "ovs",
  // type (plugin) specific
  "bridge": "ovs0",
  "vxlanID": 42,
  "ipam": {
    "type": "dhcp",
    "routes": [ { "dst": "10.3.0.0/16" }, { "dst": "10.4.0.0/16" } ]
  },
  // args may be ignored by plugins
  "args": {
    "labels" : {
        "appVersion" : "1.0"
    }
  }
}

{
  "cniVersion": "0.3.1",
  "name": "wan",
  "type": "macvlan",
  // ipam specific
  "ipam": {
    "type": "dhcp",
    "routes": [ { "dst": "10.0.0.0/8", "gw": "10.0.0.1" } ]
  },
  "dns": {
    "nameservers": [ "10.0.0.1" ]
  }
}

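Network Configuration Lists

A network configuration list provides a way to run multiple CNI plugins for a single container in a defined order, passing the result of each plugin to the next one. The list consists of a few well-known fields and a list of one or more of the CNI network configurations described above.

The list is described in JSON format. It may be stored on disk or generated by the container runtime. The following fields are well-known and have the following meanings: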

  • cniVersion (string): Semantic Version 2.0 of CNI specification to which this configuration list and all the individual configurations conform.
  • name (string): Network name. This should be unique across all containers on the host (or other administrative domain).
  • plugins (list): A list of standard CNI network configuration dictionaries (see above).

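When executing a plugin list, the runtime must replace the name and cniVersion fields of each network configuration in the list with the name and cniVersion fields of the list itself. This avoids version conflicts caused by differences between the individual configurations.
The runtime should also pass capability-based settings, as the top-level runtimeConfig key, to plugins that declare support for a given capability through the capabilities key of their configuration.

For the ADD operation, the runtime must add a prevResult field to the configuration of every plugin after the first one. The field must contain the output of the previous plugin, in JSON format.

The runtime must also execute every plugin in the list with the same environment variables.

For the DEL operation, the runtime must execute the plugins in reverse order.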
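The rules above can be sketched from the runtime's side as follows. The function `run_plugin_list` and the `invoke(command, config)` callback are hypothetical; the callback stands in for actually executing a plugin and parsing its stdout.

```python
import copy


def run_plugin_list(conf_list, command, invoke):
    """Run every plugin in a configuration list for an ADD or DEL command.

    `invoke(command, config)` is a caller-supplied function (an assumption of
    this sketch) that executes one plugin and returns its parsed result.
    """
    plugins = conf_list["plugins"]
    if command == "DEL":
        plugins = list(reversed(plugins))  # DEL runs in reverse order
    prev_result = None
    for plugin in plugins:
        config = copy.deepcopy(plugin)
        # The list's own name and cniVersion replace any per-plugin values.
        config["name"] = conf_list["name"]
        config["cniVersion"] = conf_list["cniVersion"]
        if command == "ADD" and prev_result is not None:
            config["prevResult"] = prev_result
        prev_result = invoke(command, config)
    return prev_result
```

Each plugin configuration is deep-copied before being modified, so the configuration list itself is never mutated between invocations.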

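Network Configuration List Error Handling

When an error occurs while executing a plugin list, the runtime must stop and must not run the remaining plugins.

If an ADD operation fails and the runtime decides to handle the failure, it should execute the DEL operation for all plugins in the list, in reverse order, including plugins that were never called during the ADD.

A plugin should complete a DEL operation without error even if the corresponding resources do not exist.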
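One way a runtime might implement this cleanup is sketched below. The names `add_with_rollback` and `invoke` are hypothetical, `plugin_configs` is assumed to be the list of per-plugin configurations with name and cniVersion already filled in, and re-raising the original ADD error after cleanup is a design choice of this example.

```python
def add_with_rollback(plugin_configs, invoke):
    """ADD each plugin in order; on failure, DEL all plugins in reverse order.

    `invoke(command, config)` is an assumed helper that raises on failure.
    """
    prev_result = None
    try:
        for config in plugin_configs:
            conf = dict(config)
            if prev_result is not None:
                conf["prevResult"] = prev_result
            prev_result = invoke("ADD", conf)
    except Exception:
        # Stop at the first error, then clean up every plugin in the list,
        # including those whose ADD was never reached.
        for config in reversed(plugin_configs):
            try:
                invoke("DEL", dict(config))
            except Exception:
                break  # an error during DEL also stops execution of the list
        raise
    return prev_result
```

Because DEL is expected to succeed even when resources are missing, calling it for plugins that never ran ADD is safe.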

Example Network Configuration List

{
  "cniVersion": "0.3.1",
  "name": "dbnet",
  "plugins": [
    {
      "type": "bridge",
      // type (plugin) specific
      "bridge": "cni0",
      // args may be ignored by plugins
      "args": {
        "labels" : {
            "appVersion" : "1.0"
        }
      },
      "ipam": {
        "type": "host-local",
        // ipam specific
        "subnet": "10.1.0.0/16",
        "gateway": "10.1.0.1"
      },
      "dns": {
        "nameservers": [ "10.1.0.1" ]
      }
    },
    {
      "type": "tuning",
      "sysctl": {
        "net.core.somaxconn": "500"
      }
    }
  ]
}

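Example of a Runtime Executing a Network Configuration List

Note that the runtime adds the cniVersion and name fields to each plugin's configuration to ensure consistency.

  1. First, call the bridge plugin with the following JSON: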
{
  "cniVersion": "0.3.1",
  "name": "dbnet",
  "type": "bridge",
  "bridge": "cni0",
  "args": {
    "labels" : {
        "appVersion" : "1.0"
    }
  },
  "ipam": {
    "type": "host-local",
    // ipam specific
    "subnet": "10.1.0.0/16",
    "gateway": "10.1.0.1"
  },
  "dns": {
    "nameservers": [ "10.1.0.1" ]
  }
}
  2. Next, call the tuning plugin with the following JSON. The prevResult field contains the output of the bridge plugin from the previous step:
{
  "cniVersion": "0.3.1",
  "name": "dbnet",
  "type": "tuning",
  "sysctl": {
    "net.core.somaxconn": "500"
  },
  "prevResult": {
    "ips": [
        {
          "version": "4",
          "address": "10.0.0.5/32",
          "interface": 0
        }
    ],
    "dns": {
      "nameservers": [ "10.1.0.1" ]
    }
  }
}

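For the same list, the container runtime performs the DEL operation with the following steps. Note that the prevResult field is no longer needed and that the plugins are called in reverse order.

  1. First, call the tuning plugin with the following JSON: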
{
  "cniVersion": "0.3.1",
  "name": "dbnet",
  "type": "tuning",
  "sysctl": {
    "net.core.somaxconn": "500"
  }
}
  2. Next, call the bridge plugin with the following JSON:
{
  "cniVersion": "0.3.1",
  "name": "dbnet",
  "type": "bridge",
  "bridge": "cni0",
  "args": {
    "labels" : {
        "appVersion" : "1.0"
    }
  },
  "ipam": {
    "type": "host-local",
    // ipam specific
    "subnet": "10.1.0.0/16",
    "gateway": "10.1.0.1"
  },
  "dns": {
    "nameservers": [ "10.1.0.1" ]
  }
}

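IP Allocation

As part of its operation, a CNI plugin is expected to assign (and maintain) an IP address for the network interface and to install any necessary routes.

To lessen this burden and to keep the IP management strategy separate from the CNI plugin, the spec introduces the IP Address Management plugin (IPAM plugin). The CNI plugin is responsible for invoking the IPAM plugin during its execution. The IPAM plugin determines the IP and subnet, the gateway and the routes, and returns this information to the main plugin. The IPAM plugin may obtain this information through a protocol (e.g. DHCP), from data stored on the local filesystem, from the ipam section of the network configuration, or from a combination of these.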

IP Address Management (IPAM) Interface

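Like CNI plugins, IPAM plugins are invoked as executables. The executable is likewise searched for in a list of paths, it receives all the environment variables that were passed to the CNI plugin, and it receives the network configuration through stdin.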
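A main plugin might delegate to its IPAM plugin roughly as sketched here. The function `invoke_ipam` is hypothetical, `plugin_path` is assumed to have been resolved already by searching CNI_PATH, and the `runner` parameter exists only so the sketch can be exercised without a real executable.

```python
import json
import subprocess


def invoke_ipam(plugin_path, config, environ, runner=subprocess.run):
    """Run an IPAM plugin executable and return its parsed stdout.

    The same environment and the same network configuration that the main
    plugin received are passed through unchanged.
    """
    proc = runner(
        [plugin_path],
        input=json.dumps(config),  # network configuration goes to stdin
        env=environ,
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # On failure, stdout is expected to carry the error JSON structure.
        raise RuntimeError(f"IPAM plugin failed: {proc.stdout}")
    return json.loads(proc.stdout)
```

The returned dictionary is the abbreviated result (ips, routes, dns) that the main plugin then merges into its own output.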

On success, the plugin must exit with code 0 and, for the ADD operation, print the following structure to stdout:

{
  "cniVersion": "0.3.1",
  "ips": [
      {
          "version": "<4-or-6>",
          "address": "<ip-and-prefix-in-CIDR>",
          "gateway": "<ip-address-of-the-gateway>"  (optional)
      },
      ...
  ],
  "routes": [                                       (optional)
      {
          "dst": "<ip-and-prefix-in-cidr>",
          "gw": "<ip-of-next-hop>"                  (optional)
      },
      ...
  ],
  "dns": {
    "nameservers": <list-of-nameservers>,           (optional)
    "domain": <name-of-local-domain>,               (optional)
    "search": <list-of-search-domains>,             (optional)
    "options": <list-of-options>                    (optional)
  }
}

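Note that, unlike CNI plugins, IPAM plugins do not need to return interfaces in the Result, because an IPAM plugin is not concerned with the network interfaces configured by its caller (except for plugins such as dhcp that need to know them).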

cniVersion specifies a Semantic Version 2.0 of CNI specification used by the plugin.

The ips field is a list of IP configuration information.

The dns field contains a dictionary consisting of common DNS information.

Errors and logs are handled in the same way as for CNI plugins.

IPAM plugin examples:

  • host-local: Select an unused (by other containers on the same host) IP within the specified range.
  • dhcp: Use DHCP protocol to acquire and maintain a lease. The DHCP requests will be sent via the created container interface; therefore, the associated network must support broadcast.

Notes

  • Routes are expected to be added with a 0 metric.
  • A default route may be specified via “0.0.0.0/0”. Since another network might have already configured the default route, the CNI plugin should be prepared to skip over its default route definition.

Common Data Structures

IPs

"ips": [
    {
        "version": "<4-or-6>",
        "address": "<ip-and-prefix-in-CIDR>",
        "gateway": "<ip-address-of-the-gateway>",      (optional)
        "interface": <numeric index into 'interfaces' list> (not required for IPAM plugins)
    },
    ...
]

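The ips field is a list of IP configuration entries, each describing the IP configuration of one network interface.
IP configurations for multiple network interfaces, and multiple IP configurations for a single interface, are returned as separate entries in the list.
The fields of an IP configuration entry are as follows: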

  • version (string): either “4” or “6” and corresponds to the IP version of the addresses in the entry.
    All IP addresses and gateways provided must be valid for the given version.
  • address (string): an IP address in CIDR notation (eg “192.168.1.3/24”).
  • gateway (string): the default gateway for this subnet, if one exists.
    It does not instruct the CNI plugin to add any routes with this gateway: routes to add are specified separately via the routes field.
    An example use of this value is for the CNI bridge plugin to add this IP address to the Linux bridge to make it a gateway.
  • interface (uint): the index into the interfaces list for a CNI Plugin Result indicating which interface this IP configuration should be applied to.
    IPAM plugins should not return this key since they have no information about network interfaces.
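The constraints on an ips entry can be checked with Python's standard `ipaddress` module. The function `validate_ip_entry` and its `interface_count` parameter are assumptions for this sketch, which is not an exhaustive validator.

```python
import ipaddress


def validate_ip_entry(entry, interface_count=None):
    """Raise ValueError if an 'ips' entry is inconsistent; otherwise return it."""
    version = entry["version"]
    if version not in ("4", "6"):
        raise ValueError(f"version must be '4' or '6', got {version!r}")
    address = ipaddress.ip_interface(entry["address"])  # address in CIDR notation
    if str(address.version) != version:
        raise ValueError("address does not match the declared IP version")
    if "gateway" in entry:
        gateway = ipaddress.ip_address(entry["gateway"])
        if str(gateway.version) != version:
            raise ValueError("gateway does not match the declared IP version")
    # IPAM plugins omit 'interface', so the index is checked only on request.
    if interface_count is not None and not 0 <= entry["interface"] < interface_count:
        raise ValueError("interface index is out of range")
    return entry
```

Passing `interface_count=None` suits IPAM output, while a main plugin would pass the length of its interfaces list.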

Routes

"routes": [
    {
        "dst": "<ip-and-prefix-in-cidr>",
        "gw": "<ip-of-next-hop>"               (optional)
    },
    ...
]
  • Each routes entry is a dictionary with the following fields. All IP addresses in the routes entry must be the same IP version, either 4 or 6.
    • dst (string): destination subnet specified in CIDR notation.
    • gw (string): IP of the gateway. If omitted, a default gateway is assumed (as determined by the CNI plugin).
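The same-version rule for routes can be checked in a similar way. The helper name `validate_route` is hypothetical, and accepting a dst with host bits set (`strict=False`) is a leniency chosen for this sketch.

```python
import ipaddress


def validate_route(route):
    """Check a 'routes' entry and return the IP version (4 or 6) it uses."""
    dst = ipaddress.ip_network(route["dst"], strict=False)
    gw = route.get("gw")  # optional next hop
    if gw is not None and ipaddress.ip_address(gw).version != dst.version:
        raise ValueError("dst and gw must use the same IP version")
    return dst.version
```

A default route such as "0.0.0.0/0" passes this check like any other destination.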

DNS

"dns": {
  "nameservers": <list-of-nameservers>,                (optional)
  "domain": <name-of-local-domain>,                    (optional)
  "search": <list-of-additional-search-domains>,       (optional)
  "options": <list-of-options>                         (optional)
}

The dns field contains a dictionary consisting of common DNS information.

  • nameservers (list of strings): a priority-ordered list of DNS nameservers that this network is aware of. Each entry in the list is a string containing either an IPv4 or an IPv6 address.
  • domain (string): the local domain used for short hostname lookups.
  • search (list of strings): list of priority ordered search domains for short hostname lookups. Will be preferred over domain by most resolvers.
  • options (list of strings): list of options that can be passed to the resolver.
    See CNI Plugin Result section for more information.
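Since the spec leaves DNS handling to the plugin, one possible choice is to render the dns dictionary as resolv.conf text. This is only a sketch of that option; the function name `render_resolv_conf` is an assumption.

```python
def render_resolv_conf(dns):
    """Render a CNI 'dns' dictionary as the text of a resolv.conf file."""
    lines = []
    if dns.get("domain"):
        lines.append(f"domain {dns['domain']}")
    if dns.get("search"):
        lines.append("search " + " ".join(dns["search"]))
    for server in dns.get("nameservers", []):
        lines.append(f"nameserver {server}")  # order is preserved: it encodes priority
    if dns.get("options"):
        lines.append("options " + " ".join(dns["options"]))
    return ("\n".join(lines) + "\n") if lines else ""
```

All four keys are optional, so an empty dictionary produces an empty string.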

Well-known Error Codes

Error codes 1-99 must not be used other than as specified here.

  • 1 - Incompatible CNI version
  • 2 - Unsupported field in network configuration. The error message must contain the key and value of the unsupported field.
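The two well-known codes might be produced by a configuration check like the one below. This is a sketch of a strict plugin: `check_config`, `supported_versions`, and `known_fields` are hypothetical names, and rejecting unknown top-level fields is optional behaviour that a plugin may choose not to adopt.

```python
INCOMPATIBLE_CNI_VERSION = 1
UNSUPPORTED_FIELD = 2


def check_config(config, supported_versions, known_fields):
    """Return None if the config is acceptable, else an (error code, message) tuple."""
    version = config.get("cniVersion")
    if version not in supported_versions:
        return INCOMPATIBLE_CNI_VERSION, f"incompatible CNI version: {version!r}"
    for key, value in config.items():
        # 'args' is always accepted; unknown keys inside it are simply ignored.
        if key not in known_fields and key != "args":
            return UNSUPPORTED_FIELD, f"unsupported field {key!r} with value {value!r}"
    return None
```

The message for code 2 includes both the key and the value of the offending field, as the list above requires.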