Packet: 通过网卡收发的报文,包括链路层、网络层、传输层的协议头和携带的数据
Data Buffer:用于存储 packet 的内存空间
SKB: struct sk_buffer 的简写
2. 概述
Struct sk_buffer 是 linux TCP/IP stack 中,用于管理Data Buffer的结构。Sk_buffer 在数据包的发送和接收中起着重要的作用。
为了提高网络处理的性能,应尽量避免数据包的拷贝。Linux 内核开发者们在设计 sk_buffer 结构的时候,充分考虑到这一点。目前 Linux 协议栈在接收数据的时候,需要拷贝两次:数据包进入网卡驱动后拷贝一次,从内核空间递交给用户空间的应用时再拷贝一次。
Sk_buffer结构随着内核版本的升级,也一直在改进。
学习和理解 sk_buffer 结构,不仅有助于更好的理解内核代码,而且也可以从中学到一些设计技巧。
3. Sk_buffer 定义
struct sk_buff {
struct sk_buff *next;
struct sk_buff *prev;
struct sock *sk;
struct skb_timeval tstamp;
struct net_device *dev;
struct net_device *input_dev;
union {
struct tcphdr *th;
struct udphdr *uh;
struct icmphdr *icmph;
struct igmphdr *igmph;
struct iphdr *ipiph;
struct ipv6hdr *ipv6h;
unsigned char *raw;
} h;
union {
struct iphdr *iph;
struct ipv6hdr *ipv6h;
struct arphdr *arph;
unsigned char *raw;
} nh;
union {
unsigned char *raw;
} mac;
struct dst_entry *dst;
struct sec_path *sp;
char cb[40];
unsigned int len,
data_len,
mac_len,
csum;
__u32 priority;
__u8 local_df:1,
cloned:1,
ip_summed:2,
nohdr:1,
nfctinfo:3;
__u8 pkt_type:3,
fclone:2;
__be16 protocol;
void (*destructor)(struct sk_buff *skb);
/* These elements must be at the end, see alloc_skb() for details. */
unsigned int truesize;
atomic_t users;
unsigned char *head,
*data,
*tail,
*end;
};
4. 成员变量
· struct skb_timeval tstamp;
此变量用于记录 packet 的到达时间或发送时间。由于计算时间有一定开销,因此只在必要时才使用此变量。需要记录时间时,调用net_enable_timestamp(),不需要时,调用net_disable_timestamp() 。
tstamp 主要用于包过滤,也用于实现一些特定的 socket 选项,一些 netfilter 的模块也要用到这个域。
· struct net_device *dev;
· struct net_device *input_dev;
这几个变量都用于跟踪与 packet 相关的 device。由于 packet 在接收的过程中,可能会经过多个 virtual driver 处理,因此需要几个变量。
接收数据包的时候, dev 和 input_dev 都指向最初的 interface,此后,如果需要被 virtual driver 处理,那么 dev 会发生变化,而 input_dev 始终不变。
(These three members help keep track of the devices assosciated with a packet. The reason we have three different device pointers is that the main 'skb->dev' member can change as we encapsulate and decapsulate via a virtual device.
So if we are receiving a packet from a device which is part of a bonding device instance, initially 'skb->dev' will be set to point the real underlying bonding slave. When the packet enters the networking (via 'netif_receive_skb()') we save 'skb->dev' away in 'skb->real_dev' and update 'skb->dev' to point to the bonding device.
Likewise, the physical device receiving a packet always records itself in 'skb->input_dev'. In this way, no matter how many layers of virtual devices end up being decapsulated, 'skb->input_dev' can always be used to find the top-level device that actually received this packet from the network. )
· char cb[40];
此数组作为 SKB 的控制块,具体的协议可用它来做一些私有用途,例如 TCP 用这个控制块保存序列号和重传状态。
· unsigned int len,
· data_len,
· mac_len,
· csum;
‘len’ 表示此 SKB 管理的 Data Buffer 中数据的总长度;
通常,Data Buffer 只是一个简单的线性 buffer,这时候 len 就是线性 buffer 中的数据长度;
但在有 ‘paged data’ 情况下, Data Buffer 不仅包括第一个线性 buffer ,还包括多个 page buffer;这种情况下, ‘data_len’ 指的是 page buffer 中数据的长度,’len’ 指的是线性 buffer 加上 page buffer 的长度;len – data_len 就是线性 buffer 的长度。
‘mac_len’ 指 MAC 头的长度。目前,它只在 IPSec 解封装的时候被使用。将来可能从 SKB 结构中
去掉。
‘csum’ 保存 packet 的校验和。
(Finally, 'csum' holds the checksum of the packet. When building send packets, we copy the data in from userspace and calculate the 16-bit two's complement sum in parallel for performance. This sum is accumulated in 'skb->csum'. This helps us compute the final checksum stored in the protocol packet header checksum field. This field can end up being ignored if, for example, the device will checksum the packet for us.
On input, the 'csum' field can be used to store a checksum calculated by the device. If the device indicates 'CHECKSUM_HW' in the SKB 'ip_summed' field, this means that 'csum' is the two's complement checksum of the entire packet data area starting at 'skb->data'. This is generic enough such that both IPV4 and IPV6 checksum offloading can be supported. )
· __u32 priority;
“priority”用于实现 QoS,它的值可能取之于 IPv4 头中的 TOS 域。Traffic Control 模块需要根据这个域来对 packet 进行分类,以决定调度策略。
· __u8 local_df:1,
· cloned:1,
· ip_summed:2,
· nohdr:1,
· nfctinfo:3;
为了能迅速的引用一个 SKB 的数据,
当 clone 一个已存在的 SKB 时,会产生一个新的 SKB,但是这个 SKB 会共享已有 SKB 的数据区。
当一个 SKB 被 clone 后,原来的 SKB 和新的 SKB 结构中,”cloned” 都要被设置为1。
(The 'local_df' field is used by the IPV4 protocol, and when set allows us to locally fragment frames which have already been fragmented. This situation can arise, for example, with IPSEC.
The 'nohdr' field is used in the support of TCP Segmentation Offload ('TSO' for short). Most devices supporting this feature need to make some minor modifications to the TCP and IP headers of an outgoing packet to get it in the right form for the hardware to process. We do not want these modifications to be seen by packet sniffers and the like. So we use this 'nohdr' field and a special bit in the data area reference count to keep track of whether the device needs to replace the data area before making the packet header modifications.
The type of the packet (basically, who is it for), is stored in the 'pkt_type' field. It takes on one of the 'PACKET_*' values defined in the 'linux/if_packet.h' header file. For example, when an incoming ethernet frame is to a destination MAC address matching the MAC address of the ethernet device it arrived on, this field will be set to 'PACKET_HOST'. When a broadcast frame is received, it will be set to 'PACKET_BROADCAST'. And likewise when a multicast packet is received it will be set to 'PACKET_MULTICAST'.
The 'ip_summed' field describes what kind of checksumming assistence the card has provided for a receive packet. It takes on one of three values: 'CHECKSUM_NONE' if the card provided no checksum assistence, 'CHECKSUM_HW' if the two's complement checksum over the entire packet has been provides in 'skb->csum', and 'CHECKSUM_UNNECESSARY' if it is not necessary to verify the checksum of this packet. The latter usually occurs when the packet is received over the loopback device. 'CHECKSUM_UNNECESSARY' can also be used when the device only provides a 'checksum OK' indication for receive packet checksum offload. )
· void (*destructor)(struct sk_buff *skb);
· unsigned int truesize;
