6 WHITE PAPER | Best Practices for Deployments using DCB and RoCE
• iSCSI Extensions for RDMA
Performance for Internet Small Computer System Interface (iSCSI) storage has also been enhanced with iSCSI extensions for RDMA
(iSER). The iSER protocols are defined in RFCs 5047 and 7145 and enable RDMA to be used to transfer data directly between memory
buffers for computers and storage devices.
iSER promises significant performance improvements over iSCSI by eliminating the TCP/IP processing overhead, which
becomes increasingly significant at Ethernet speeds of 10GbE and beyond. iSER will provide higher throughput for storage
applications, lower latency and more efficient use of server and storage controller processing resources.
• NFS over RDMA
Network File System (NFS) is a distributed file system protocol that allows users on client computers to access files over a network
as if they were on local storage. NFS is an open standard defined in Requests for Comments (RFCs), which enables ongoing development
and implementation of new technologies. One focus area has been the remote procedure call (RPC) layer for NFS that provides
communication between the client and server. RDMA support has been added to the RPC layer with RFCs 5532, 5666 and 5667 to
provide enhanced data transfer performance.
Using RoCE for NFS over RDMA has the potential to deliver performance benefits similar to those of SMB Direct for
application servers that use network file storage. NFS clients and servers can expect higher throughput at smaller data block sizes as
well as increased I/O operations per second (IOPS), lower latency and reduced NFS client and server CPU consumption.
RoCE Evaluation Design
Ethernet uses best-effort delivery for network traffic, so packet delivery depends on the traffic load at the moment of sending. As a result,
there is no guarantee that traffic with specific quality of service (QoS) requirements will be preserved or prioritized. To support RoCE, the
network must be lossless, ensuring that no packets are lost to congestion and have to be resent.
To achieve lossless delivery over Ethernet, multiple mechanisms are used. Link-level flow control (IEEE 802.3x) signals the sender that the
receiver is congested and that traffic needs to be reduced. A more granular way to control the amount of traffic coming from the sender is
Priority-based Flow Control, or PFC (IEEE 802.1Qbb), which sends pause frames for a specific Class of Service (CoS). In this way, only the
class of traffic that is under congestion is slowed down. To control the bandwidth of each class, Enhanced Transmission Selection, or ETS
(IEEE 802.1Qaz), is used. With ETS, a specific share of bandwidth is assigned to each CoS. These bandwidth settings are propagated
through DCBX.
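The difference between per-class pausing and plain best-effort queuing can be illustrated with a small model. The following is a minimal sketch, not an implementation of IEEE 802.1Qbb: the `Receiver` class, the `xoff_threshold` parameter and the returned strings are hypothetical names chosen for illustration, and the model assumes that a no-drop class always has enough buffer headroom to absorb frames that arrive while the pause takes effect.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Toy model of a receiver with one ingress queue per CoS (illustrative only)."""
    xoff_threshold: int                  # hypothetical queue depth that triggers a reaction
    no_drop: frozenset = frozenset()     # CoS values that have PFC enabled
    depth: dict = field(default_factory=dict)

    def receive(self, cos: int) -> str:
        """Queue one frame for `cos` and report how the receiver reacts."""
        queued = self.depth.get(cos, 0)
        if cos in self.no_drop:
            # No-drop class: always buffer the frame (headroom is assumed) and
            # ask the sender to pause this class only once the threshold is hit.
            self.depth[cos] = queued + 1
            return "pause" if self.depth[cos] >= self.xoff_threshold else "accept"
        if queued >= self.xoff_threshold:
            # Best-effort class: a full queue means the frame is lost.
            return "drop"
        self.depth[cos] = queued + 1
        return "accept"


rx = Receiver(xoff_threshold=2, no_drop=frozenset({5}))
print([rx.receive(5) for _ in range(3)])  # CoS 5 is paused, never dropped
print([rx.receive(0) for _ in range(3)])  # CoS 0 is unaffected until its own queue fills
```

In this model, congestion in CoS 5 produces a pause for CoS 5 only, while frames in CoS 0 continue to be accepted until that queue fills on its own.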
RoCE provides low-latency, high-throughput data transfer and relies on the same standards that are used to manage network congestion. RoCE
traffic is marked with priority CoS value 5, the default marking applied by the Emulex network adapter, for identification and classification
purposes. Bandwidth is allocated using ETS and propagated to network adapters with DCBX. A scheduling algorithm decides which queue will be
serviced next. Traffic that is queued in a priority CoS is served more often than traffic in the default CoS, which preserves low latency for
the traffic that belongs to a priority CoS.
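The effect of such a scheduler can be sketched with a simple weighted round-robin loop. This is an illustration only: the paper does not name the scheduling algorithm used by the switch or adapter, and the function name, the queue contents and the 3:1 weights below are assumptions chosen to make the behavior visible.

```python
from collections import deque


def weighted_round_robin(queues, weights, rounds):
    """Return the order in which frames are serviced.

    queues  -- dict mapping a CoS value to a deque of queued frames
    weights -- dict mapping a CoS value to the frames it may send per round
    rounds  -- number of scheduling rounds to simulate
    """
    serviced = []
    for _ in range(rounds):
        for cos, weight in weights.items():
            for _ in range(weight):
                if queues[cos]:  # an empty queue simply gives up its turn
                    serviced.append(queues[cos].popleft())
    return serviced


queues = {
    5: deque(["roce-1", "roce-2", "roce-3", "roce-4"]),  # priority CoS
    0: deque(["dflt-1", "dflt-2", "dflt-3", "dflt-4"]),  # default CoS
}
# Hypothetical weights: the priority queue is visited three times as often.
print(weighted_round_robin(queues, {5: 3, 0: 1}, rounds=2))
```

Because the priority queue gets more service opportunities per round, its frames wait behind fewer frames from other classes, which is what keeps latency low for that class.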
For this evaluation, the network was configured to carry traffic over a designated VLAN with RoCE transmissions assigned to CoS 5. RoCE traffic
was marked as a non-drop class and maximum transmission unit (MTU) values were assigned. As shown in Figure 3, 50% of bandwidth was allocated
to RoCE CoS 5 and the remaining bandwidth was allocated to the default CoS (all other classes of traffic except the RoCE class). No bandwidth
was allocated to traffic marked with CoS 3 because that traffic was not used in the test.
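As a worked example of the allocation described above, the ETS percentages can be converted into guaranteed bandwidth per class. The 10 Gb/s link speed is an assumption made for illustration, since the link speed is not stated in this section, and `ets_guarantees` is a hypothetical helper rather than part of any vendor tool.

```python
def ets_guarantees(link_gbps, shares_percent):
    """Convert ETS percentage shares into guaranteed bandwidth in Gb/s."""
    total = sum(shares_percent.values())
    if total != 100:
        raise ValueError(f"ETS shares must add up to 100%, got {total}%")
    return {name: link_gbps * pct / 100 for name, pct in shares_percent.items()}


# Shares taken from the evaluation design; the link speed is assumed.
allocation = ets_guarantees(
    link_gbps=10,
    shares_percent={"RoCE (CoS 5)": 50, "default": 50, "CoS 3": 0},
)
for traffic_class, gbps in allocation.items():
    print(f"{traffic_class}: {gbps} Gb/s guaranteed")
```

Under that assumption, each of the two active classes is guaranteed 5 Gb/s. ETS shares are minimum guarantees under contention, so a class can generally use more than its share when the other classes are idle.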