Eye on Blades Blog: Trends in Infrastructure
Get HP BladeSystem news, upcoming event information, technology trends, and product information to stay up to date with what is happening in the world of blades.

Network Throughput Testing

Richard is known as "Mr. NetPerf" to everyone who seeks his advice on networking standards, performance, and all the nuances they entail. Recently he wrote a great reply to a question about network performance testing, and I wanted to share it with you.

 

The original question came from Glen:

I am mucking about with some lab gear, trying to show server  Virtual Connect to IRF throughput.  So far I have managed to generate about 3 Gbps of traffic with just three VMs on a single BL460c Gen8 server using iPerf.  I am curious to know what tools folks have used for server traffic generation that might be better than iPerf for lab testing?

 

And here was Richard's reply:

 

********************

Actually, if one's goal is strictly bits per second, iperf is probably no worse than anything else.  I will point out that there is more to networking than bits per second.

 

If the goal is to show the throughput achievable by the combination of server, VC and IRF, why the VMs?  It may not be iperf holding things back...

 

Now, having said that, here is some boilerplate I trot out from time to time on the topic of networking performance, which may have some useful tidbits.  You may have to "translate" some of the Linux utilities to your OS of choice:

 

Some of my checklist items when presented with assertions of poor network performance, in no particular order, numbered only for convenience of reference:

 

1) Is *any one* CPU on either end of the transfer at or close to 100%

   utilization?  A given TCP connection cannot really take advantage

   of more than the services of a single core in the system, so

   average CPU utilization being low does not a priori mean things are

   OK.
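As a rough illustration of item 1, the sketch below compares two snapshots of Linux's /proc/stat and reports any single core that was nearly saturated between them. The function names, the 90% threshold and the sample numbers are assumptions made for this example; on a live system the two strings would come from reading /proc/stat before and after the transfer.

```python
def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, total_ticks)} for each per-CPU line."""
    cpus = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-CPU lines look like: cpu0 user nice system idle iowait irq softirq steal ...
        # The aggregate "cpu" line is skipped because the average can hide one busy core.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(v) for v in fields[1:]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks[:8])  # guest time is already counted inside user/nice
        cpus[fields[0]] = (total - idle, total)
    return cpus


def busiest_cpus(before, after, threshold=90.0):
    """Return {cpu_name: percent_busy} for cores at or above the threshold."""
    b, a = parse_proc_stat(before), parse_proc_stat(after)
    hot = {}
    for name, (busy_after, total_after) in a.items():
        busy_before, total_before = b.get(name, (0, 0))
        elapsed = total_after - total_before
        if elapsed <= 0:
            continue
        pct = 100.0 * (busy_after - busy_before) / elapsed
        if pct >= threshold:
            hot[name] = pct
    return hot


# Made-up snapshots: cpu0 is pegged while cpu1 is nearly idle,
# so the two-core average is only about 50%.
BEFORE = """
cpu  200 0 200 3600 0 0 0 0 0 0
cpu0 100 0 100 1800 0 0 0 0 0 0
cpu1 100 0 100 1800 0 0 0 0 0 0
"""
AFTER = """
cpu  270 0 1160 4570 0 0 0 0 0 0
cpu0 150 0 1050 1800 0 0 0 0 0 0
cpu1 120 0 110 2770 0 0 0 0 0 0
"""

for name, pct in sorted(busiest_cpus(BEFORE, AFTER).items()):
    print(f"{name}: {pct:.0f}% busy")
```

In this made-up case the average looks comfortable, yet one core is the bottleneck, which is exactly the situation item 1 warns about.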

 

2) Are there TCP retransmissions being registered in netstat

   statistics on the sending system?  Take a snapshot of netstat -s -t

   from just before the transfer, and one from just after and run it

   through beforeafter tools:

 

   netstat -s -t > before

   # ...run the transfer, or wait 60 or so seconds if the transfer was already going...

   netstat -s -t > after

   beforeafter before after > delta

 

3) Are there packet drops registered in ethtool -S statistics on

   either side of the transfer?  Take snapshots in a manner similar to

   that with netstat.
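If the beforeafter tool is not to hand, the same before/after idea for item 3 can be sketched in a few lines. This assumes the snapshots are text in the "name: value" layout that ethtool -S prints; the function names are made up for this example and the counter names in the sample are illustrative only, since real names vary by driver.

```python
import re

# Matches lines such as "     rx_dropped: 47"; the "NIC statistics:" header has no value and is skipped.
COUNTER_LINE = re.compile(r"^\s*(\S.*?):\s*(-?\d+)\s*$")


def parse_counters(text):
    """Return {counter_name: value} for every 'name: integer' line."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def counter_delta(before, after):
    """Return only the counters that changed between the two snapshots."""
    b, a = parse_counters(before), parse_counters(after)
    return {name: a[name] - b.get(name, 0)
            for name in a if a[name] != b.get(name, 0)}


# Illustrative snapshots; counter names differ from driver to driver.
BEFORE = """NIC statistics:
     rx_packets: 1000
     rx_dropped: 5
     tx_packets: 900
"""
AFTER = """NIC statistics:
     rx_packets: 51000
     rx_dropped: 47
     tx_packets: 900
"""

for name, change in sorted(counter_delta(BEFORE, AFTER).items()):
    print(f"{name}: +{change}")
```

Any drop or error counter that moves during the transfer is worth a closer look.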

 

4) Are there packet drops registered in the stats for the switch(es)

   being traversed by the transfer?  These would be retrieved via

   switch-specific means.

 

5) What is the latency between the two end points?  Install netperf on

   both sides, start netserver on one side and on the other side run:

 

   netperf -t TCP_RR -l 30 -H <remote>

 

   and invert the transaction/s rate to get the RTT latency.  There

   are caveats involving NIC interrupt coalescing settings defaulting

   in favor of throughput/CPU util over latency, but when the connections are over a WAN, latency is important and

   may not be clouded as much by NIC settings.
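The inversion in item 5 is simple arithmetic, shown here as a small sketch. It assumes the default TCP_RR behaviour of one transaction outstanding at a time, so each transaction costs roughly one round trip; the function name and the 8000 transactions/s figure are made up for illustration.

```python
def rtt_from_transaction_rate(transactions_per_second):
    """Average round-trip time in seconds, assuming one transaction in flight at a time."""
    if transactions_per_second <= 0:
        raise ValueError("transaction rate must be positive")
    return 1.0 / transactions_per_second


rate = 8000.0  # hypothetical netperf TCP_RR result, in transactions per second
rtt = rtt_from_transaction_rate(rate)
print(f"{rate:.0f} trans/s -> RTT of about {rtt * 1e6:.0f} microseconds")
```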

 

   This all leads into:

 

6) What is the *effective* TCP (or other) window size for the

   connection?  One limit to the performance of a TCP bulk transfer

   is:

 

   Tput <= W(eff)/RTT

 

   The effective window size will be the lesser of:

 

   a) The classic TCP window advertised by the receiver. This is the

      value in the TCP header's window field shifted by the window

      scaling factor which was exchanged during connection

      establishment. The window scale factor is why one wants to get

      traces including the connection establishment.

  

      The size of the classic window will depend on whether/what the

      receiving application has requested via a setsockopt(SO_RCVBUF)

      call and the sysctl limits set in the OS.  If the receiving

      application does not call setsockopt(SO_RCVBUF) then under Linux

      the stack will "autotune" the advertised window based on other

      sysctl limits in the OS.  Other stacks may or may not autotune.

 

   b) The computed congestion window on the sender - this will be

      affected by the packet loss rate over the connection, hence the

      interest in the netstat and ethtool stats.

 

   c) The quantity of data to which the sending TCP can maintain a

      reference while waiting for it to be ACKnowledged by the

      receiver - this will be akin to the classic TCP window case

      above, but on the sending side, and concerning

      setsockopt(SO_SNDBUF) and sysctl settings.

 

   d) The quantity of data the sending application is willing/able to

      send at any one time before waiting for some sort of

      application-level acknowledgement.  FTP and rcp will just blast

      all the data of the file into the socket as fast as the socket

      will take it.  Scp has some application-layer "windowing" which

      may cause it to put less data out onto the connection than TCP

      might otherwise have permitted.  NFS has the maximum number of

      outstanding requests it will allow at one time acting as a

      de facto "window", and so on.
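To put numbers on item 6, here is a minimal sketch of the bound Tput <= W(eff)/RTT, where the effective window is the smallest of the four limits (a) through (d). All of the window sizes, the 50 ms RTT and the function names are made-up values for illustration.

```python
def effective_window(receiver_window, congestion_window, send_buffer, app_window):
    """Effective window in bytes: the lesser of limits (a) through (d)."""
    return min(receiver_window, congestion_window, send_buffer, app_window)


def throughput_bound_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput in bits per second: W(eff) / RTT."""
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(target_bps, rtt_seconds):
    """Window in bytes required to sustain target_bps at the given RTT."""
    return target_bps * rtt_seconds / 8


# Hypothetical connection: a 64 KiB receive window is the smallest limit.
w_eff = effective_window(
    receiver_window=65536,     # (a) advertised by the receiver
    congestion_window=200000,  # (b) computed by the sender
    send_buffer=262144,        # (c) SO_SNDBUF / sysctl limit
    app_window=1048576,        # (d) application-level limit
)
rtt = 0.05  # 50 ms round trip, for example over a WAN

print(f"effective window: {w_eff} bytes")
print(f"bound: {throughput_bound_bps(w_eff, rtt) / 1e6:.1f} Mbit/s")
print(f"window to fill 10 Gbit/s: {window_needed_bytes(10e9, rtt) / 1e6:.1f} MB")
```

With these numbers a 64 KiB window over a 50 ms path tops out near 10.5 Mbit/s no matter how fast the links are, while filling 10 Gbit/s at that RTT would take a window of roughly 62.5 MB.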

 

7) Another magic formula for TCP bulk transfer performance comes from

   Mathis, Semke, Mahdavi & Ott

   http://www.psc.edu/networking/papers/model_ccr97.ps

 

   Tput <= (MSS/RTT) * (1/sqrt(p))

 

   MSS is Maximum Segment Size

   RTT is Round Trip Time

   p   is the packet loss rate as a probability (e.g. values from 0 to 1.0)

 

   Which assumes a few things about the congestion control algorithm

   being used and that there is no classic TCP window limitation as

   mentioned in item 6.
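A quick worked example of the formula in item 7, as a sketch only. It uses the simplified form quoted above; the paper's version carries an additional constant factor of order one, which is left out here as it is in the text. The MSS, RTT and loss rate are made-up inputs and the function name is an assumption.

```python
import math


def mathis_bound_bps(mss_bytes, rtt_seconds, loss_probability):
    """Tput <= (MSS/RTT) * (1/sqrt(p)), returned in bits per second."""
    if not 0 < loss_probability <= 1:
        raise ValueError("loss probability must be in (0, 1]")
    return (mss_bytes * 8 / rtt_seconds) / math.sqrt(loss_probability)


# Hypothetical path: 1460-byte MSS, 50 ms RTT, one packet in 10,000 lost.
bound = mathis_bound_bps(mss_bytes=1460, rtt_seconds=0.05, loss_probability=0.0001)
print(f"bound: {bound / 1e6:.2f} Mbit/s")
```

Because the loss rate sits under a square root, cutting loss by a factor of 100 raises the bound only by a factor of 10, whereas halving the RTT doubles it.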

 

8) Is the link/path between the sender and the receiver composed of

   single-link hops, or might some be aggregated link hops?  If the

   latter, does traffic from a single flow (eg TCP connection) get

   striped across each link, or does it stay on just one link in the

   aggregation(s)?  Striping across multiple links can lead to packet

   re-ordering which will affect TCP performance.  If there are

   aggregated links and no striping then the advertised "N Gbit/s" may

   really be "N/n Gbit/s" per flow, where n is the number of links in

   the aggregation.
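The per-flow limit in item 8 can be illustrated with a toy model of how an aggregation might pin each flow to one member link. This is only a sketch: real switches and bonding drivers use their own hash policies, and the CRC32 hash, the function names and the addresses below are stand-ins chosen for the example.

```python
import zlib


def pick_link(src_ip, src_port, dst_ip, dst_port, link_count):
    """Map a flow to one member link; every packet of the flow takes the same link."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % link_count


def per_flow_ceiling_gbps(aggregate_gbps, link_count):
    """Best case for a single flow when there is no striping: one link's worth."""
    return aggregate_gbps / link_count


LINKS = 2
AGGREGATE_GBPS = 20.0  # hypothetical 2 x 10 Gbit/s aggregation

flow = ("10.0.0.1", 50000, "10.0.0.2", 5001)
first = pick_link(*flow, LINKS)
again = pick_link(*flow, LINKS)
print(f"flow stays on one link: {first == again}")
print(f"per-flow ceiling: {per_flow_ceiling_gbps(AGGREGATE_GBPS, LINKS):.0f} Gbit/s "
      f"of the advertised {AGGREGATE_GBPS:.0f} Gbit/s")
```

Getting the full aggregate rate in a test therefore takes several flows whose hashes land on different links, not one fast flow.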

 

************

 

Thanks to Richard. For more info on Virtual Connect, go to: www.hp.com/go/vitrtualconnect

 

 
