Technical Support Services Blog
Discover the latest trends in technology, and the technical issues customers are overcoming with the aid of HP Technology Services.

Memory performance and its effect on hypervisors, guests and Big Data Analytics

Memory Performance

I was recently engaged on a case in which a customer could not achieve the network throughput required to satisfy business requirements, even though the network was a virtual switch (vswitch) within a VMware hypervisor and the communication was between two guests on the same hypervisor (HOST).

 

Overview:

Thanks to advancements in technology, computers are becoming significantly denser while shrinking in physical size.  Not too long ago, a machine with 512 cores would require significant real estate within the datacenter.  I’m sure some of us remember a time when computers were the size of entire rooms, even buildings.  The legacy HP Superdome is one such example of “big iron”: 64 processors (128 cores) with 1TB of RAM took up the space of two refrigerators.

 

Today, the new Superdome-2 Itanium blades can scale beyond 256 Itanium cores, and the HP ProLiant G7 & G8 blades can scale to four 16-core AMD Opteron 6200 series processors per blade, allowing for a density unparalleled in today’s enterprise computing: 512 cores in one C7000 chassis (roughly the size of a college dorm room refrigerator) and a memory footprint of 2TB.  See http://www.hp.com/go/blades for Integrity, Non-Stop, and ProLiant.

 

Today's density levels are bringing back a problem which I have seen many times over the years: CPU-to-memory performance.  These CPU densities, combined with BUS speeds and vast amounts of memory, lead to page faults, which require the CPUs to fetch memory.  Yes, this is incredibly fast; however, in the field of High Performance Computing, microseconds tend to be a significant cost when they are added to every single transaction.

 

Memory BUS saturation and latency:

This case initially presented with an observed delta in network throughput of approximately 50-75% when testing with AMD and INTEL VMware hypervisor HOSTs (INTEL being the faster of the two).  This initial observation led to a hypothesis that further scrutiny debunked, because of testing variables that had been thought to be of no consequence: 48 cores vs 64 cores, 100GB of RAM vs 1TB of RAM, and AMD vs INTEL (core counts were not the same, and INTEL had Hyper-Threading enabled).  Other theories emerged surrounding processor architecture, which led to core count (CPU switches and kernel scheduling for SMP); however, there was one relatively simple, albeit obscure, fact when comparing the different architectures, blades and testing paradigms: the amount of memory and the BUS layout.  My testing clearly illustrated a problem with the CPU's ability to fetch data from memory.  The memory BUS speed varies from 800MHz to 1600MHz depending on these options.
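To put those memory BUS speeds in perspective, here is a rough sketch of the peak theoretical bandwidth of a single DDR3 channel. It assumes the 800 to 1600 figures are transfer rates in MT/s and that each channel has a 64-bit (8-byte) data path, which is standard for DDR3. The function name is illustrative only, and sustained throughput on a real system will be lower than these peaks.

```python
# Peak theoretical bandwidth of one DDR3 memory channel.
# Assumptions (not taken from the case data): the quoted bus speed is a
# transfer rate in MT/s, and the channel data path is 64 bits (8 bytes).

BYTES_PER_TRANSFER = 8  # 64-bit DDR3 channel


def peak_channel_bandwidth_gbs(mega_transfers_per_sec: int) -> float:
    """Return the peak bandwidth of one channel in GB/s (decimal)."""
    bytes_per_sec = mega_transfers_per_sec * 1_000_000 * BYTES_PER_TRANSFER
    return bytes_per_sec / 1e9


if __name__ == "__main__":
    for speed in (800, 1066, 1333, 1600):
        gbs = peak_channel_bandwidth_gbs(speed)
        print(f"DDR3-{speed}: {gbs:.1f} GB/s per channel")
```

Under these assumptions, dropping from the fastest to the slowest speed halves the peak bandwidth of each channel, which is why DIMM population choices that force a lower bus speed matter.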

 

Today there are two main BUS architectures for the CPU-to-memory transport: the QuickPath Interconnect (QPI) for INTEL (G7 & G8), which replaced the legacy Intel Front Side Bus (FSB) (G5), and the AMD HyperTransport BUS (G7 & G8).  These BUS architectures are capable of 6.4GT/sec; however, this customer’s server was not achieving this level of performance.  The customer’s testing revealed that the AMD system had a problem while the INTEL system did not seem to have a problem with network performance.  Given that the AMD architecture allowed for a higher density core count and the overall architecture met all business requirements, the challenge was for my team to work with our partners and resolve the network latency issue.

 

The biggest issue with the test was that the INTEL system did not have the same amount of memory.

 

 

Identification of problem:

Further instrumentation was utilized and testing procedures were identified to account for the delays observed in the network packet generator.  Network traces showed delays of only microseconds, with a delta of 40 microseconds between the platforms.  With our instrumentation and testing, we knew the problem was closely linked to the CPU’s ability to fetch (load/store) and execute code; the evidence therefore pointed to page faults rather than to the CPU's ability to execute instructions or to instruction set optimizations.  Not finding any issue with the execution stack, I turned my focus toward the amount of memory on the HOST and the configuration of that memory.

 

It is not as easy as just throwing memory into a system.  Variables exist: the frequency at which the memory operates, the size of the DIMMs, the slots occupied, the type of memory, etc.

 

An example chart: 

DDR3 memory comparison

| | RDIMMs | LRDIMM | UDIMMs | HDIMMs** |
|---|---|---|---|---|
| Maximum DIMM capacity | 16 GB | 32 GB | 8 GB | 16 GB |
| Maximum server capacity | AMD: 1 TB max capacity* (48 slots; 32 GB quad rank DIMMs) / Intel: 2 TB max capacity (64 slots; 16 GB quad rank DIMMs) | N/A / Intel 2 socket: 768 GB (24 slots; 32 GB LRDIMMs) | AMD: 64 GB (16 slots; 4 GB dual rank DIMMs) / Intel: 48 GB max capacity (12 slots; 4 GB dual rank DIMMs) | N/A / Intel 2 socket: 384 GB (24 slots; 16 GB HDIMMs) |
| Maximum # of DIMMs/channel | 3 dual rank | 3 quad rank (LRDIMM only) | 2 dual rank | 3 dual rank |
| Low power option | 4 GB, 8 GB, 16 GB, 32 GB | 32 GB | 2 GB, 4 GB, 8 GB | N/A |
| Address error detection | Yes | Yes | No | Yes |

 

Using the HP Memory Configuration tool (http://h18004.www1.hp.com/products/servers/options/tool/hp_memtool.html), we were able to quantify the exact BUS speeds of the INTEL and AMD systems, given that the INTEL had only a fraction of the memory installed on the AMD, not to mention different DIMM types.  Even after accounting for this, I was still far short of the performance required by the business.

 

 

BIOS settings and interleaving

If you assume that the default memory layout is not interleaved at the NODE level, then you would expect the L1 cache to remain valid as long as the thread does not perform a CPU switch.  When I went into the BIOS, it turned out that the customer’s machines had NODE memory interleaving enabled (this was not the case on the INTEL system, another data point that explained the differences in performance).  Upon this finding, I immediately disabled the setting, rebooted and re-ran the test.  The application's test results more than doubled, passing the business requirements for performance.  This means that every time the CPU had a page fault, the request most likely had to traverse the HyperTransport BUS to fetch the page from a location near another CPU.
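The cost of those remote fetches can be approximated with simple arithmetic. The sketch below assumes that NODE interleaving spreads pages evenly across all nodes, so a thread running on one node finds only 1/N of its pages in local memory. The function names and the latency figures in the usage lines are hypothetical placeholders for illustration, not measurements from this case.

```python
# Rough model of local vs. remote memory access under node interleaving.
# Assumption: pages are distributed evenly across num_nodes, so a thread
# on any one node sees (num_nodes - 1) / num_nodes of its pages as remote.


def remote_fraction(num_nodes: int) -> float:
    """Fraction of pages that live on a node other than the local one."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return (num_nodes - 1) / num_nodes


def average_latency_ns(num_nodes: int, local_ns: float, remote_ns: float) -> float:
    """Expected access latency given local and remote latencies."""
    remote = remote_fraction(num_nodes)
    return (1 - remote) * local_ns + remote * remote_ns


if __name__ == "__main__":
    # Hypothetical latencies, chosen only to show the shape of the effect.
    nodes, local_ns, remote_ns = 4, 100.0, 200.0
    print(f"Remote fraction with {nodes} nodes: {remote_fraction(nodes):.2f}")
    print(f"Average latency: {average_latency_ns(nodes, local_ns, remote_ns):.1f} ns")
```

With four nodes in this model, three out of four pages are remote, so the average latency sits much closer to the remote figure than to the local one.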

 

 

Though server technology advancements are keeping up with Big Data analytic requirements, we are not yet at a “plug & play” configuration for High Performance Computing clusters.  Teams of highly skilled programmers, professional technicians, Master Technologists and business leaders are required to define solutions as well as business requirements, so that Solution Architects are able to design IT solutions which are successful.  HP Technical Services has the expertise to address these challenges, helping you make the most of today's technology.

 

Whether or not to use memory interleaving depends on the use case.  In this situation, the application required node memory interleaving to be disabled because the working set size of the application aligned with this mode; however, there are cases where interleaving would be required.

 

 

What are the types of memory interleaving?

Memory bank interleaving

When you use memory bank interleaving, data goes alternately to memory banks through the common memory channel connecting the DIMM banks and the integrated memory controller. Memory bank interleaving increases the probability that more DIMMs will remain in an active state (requiring more power) because the memory controller alternates between memory banks and between DIMMs.

Memory bank interleaving is automatically enabled on a processor node under the following conditions:

• Two single-rank DIMMs per channel result in two-way bank interleaving.

• Two dual-rank DIMMs per channel result in four-way bank interleaving.

• Two quad-rank DIMMs per channel result in eight-way bank interleaving.

• Two dual-rank DIMMs and one quad-rank DIMM result in eight-way bank interleaving, in servers using three DIMMs per channel.
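The conditions listed above can be captured as a small lookup table. This is only a sketch: it encodes the four listed configurations and nothing else, and the function name and the choice to return None for unlisted populations are assumptions made for illustration.

```python
# Bank interleaving by DIMM population on one channel, encoding only the
# four configurations listed in the text. Keys are the ranks of each DIMM
# on the channel, sorted in ascending order.
BANK_INTERLEAVE_WAYS = {
    (1, 1): 2,     # two single-rank DIMMs -> two-way
    (2, 2): 4,     # two dual-rank DIMMs   -> four-way
    (4, 4): 8,     # two quad-rank DIMMs   -> eight-way
    (2, 2, 4): 8,  # two dual-rank + one quad-rank -> eight-way
}


def bank_interleave_ways(ranks_per_dimm):
    """Return the interleave factor for a listed population, else None."""
    key = tuple(sorted(ranks_per_dimm))
    return BANK_INTERLEAVE_WAYS.get(key)


if __name__ == "__main__":
    print(bank_interleave_ways([2, 2]))     # two dual-rank DIMMs
    print(bank_interleave_ways([4, 2, 2]))  # order of DIMMs does not matter
    print(bank_interleave_ways([1]))        # not one of the listed cases
```

Populations that are not in the list are deliberately left unanswered instead of being extrapolated from the four known cases.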

 

Memory channel interleaving

Memory channel interleaving transfers data by alternate routing through the two available memory channels. As a result, when the memory controller must access a block of logically contiguous memory, the requests don’t stack up in the queue of a single channel. Alternate routing decreases memory access latency and increases performance. However, memory channel interleaving increases the probability that more DIMMs must remain in an active state.

Memory channel interleaving is always active on AMD Opteron 6200 Series processors.
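As an illustration of why alternate routing keeps requests from stacking up on one channel, the sketch below maps a contiguous address range onto two channels. The 64-byte interleave granularity and the simple modulo mapping are assumptions made for this example. Real memory controllers may select the channel from different address bits.

```python
# Toy model of two-channel interleaving over a contiguous address range.
# Assumptions: 64-byte interleave granularity and a plain modulo mapping.
INTERLEAVE_BYTES = 64
NUM_CHANNELS = 2


def channel_for_address(addr: int) -> int:
    """Return the channel that serves the block containing addr."""
    return (addr // INTERLEAVE_BYTES) % NUM_CHANNELS


def channel_load(start: int, length: int) -> list:
    """Count how many blocks of a contiguous range land on each channel."""
    if length <= 0:
        raise ValueError("length must be positive")
    counts = [0] * NUM_CHANNELS
    first_block = start // INTERLEAVE_BYTES
    last_block = (start + length - 1) // INTERLEAVE_BYTES
    for block in range(first_block, last_block + 1):
        counts[block % NUM_CHANNELS] += 1
    return counts


if __name__ == "__main__":
    # A 4 KB contiguous read is split evenly between the two channels.
    print(channel_load(0, 4096))
```

In this model a contiguous read never queues more than half of its blocks on a single channel, which is the behavior the paragraph above describes.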

 

Memory node interleaving

Node interleaving can interleave memory across any subset of nodes in the multi-processor system.

 

Node interleaving breaks memory into 4 KB addressable entities and assigns blocks of addresses to the nodes in the sequence indicated in the following table.

Sequencing of memory node interleaving across multiprocessor systems

| Node | Assigned addresses |
|---|---|
| 0 | 0–4095 |
| 1 | 4096–8191 |
| 2 | 8192–12287 |
| 3 | 12288–16383 |
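The address sequence in the table can be expressed as a one-line mapping. The sketch below assumes four nodes, as in the table, and assumes that the sequence wraps back to node 0 after the last node. The table only shows the first four blocks, so the wrap-around is an inference, and the function name is illustrative.

```python
# Map a physical address to its node under 4 KB node interleaving.
# Assumption: blocks are assigned round-robin and the sequence wraps
# back to node 0 after the last node (the table shows only one pass).
BLOCK_BYTES = 4096


def node_for_address(addr: int, num_nodes: int = 4) -> int:
    """Return the node that owns the 4 KB block containing addr."""
    if addr < 0 or num_nodes < 1:
        raise ValueError("addr must be >= 0 and num_nodes >= 1")
    return (addr // BLOCK_BYTES) % num_nodes


if __name__ == "__main__":
    for addr in (0, 4095, 4096, 8192, 12288, 16383, 16384):
        print(f"address {addr:>5} -> node {node_for_address(addr)}")
```

Any buffer larger than 4 KB therefore spans more than one node, which is why a thread pinned to a single CPU still ends up fetching most of its pages across the interconnect when this mode is enabled.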
