Technical Support Services Blog
Discover the latest trends in technology, and the technical issues customers are overcoming with the aid of HP Technology Services.

FCoE, data corruption, server will not boot, business is offline.

2:37 AM: The Chief Technology Officer and other executives have their pagers, cell phones, and home telephones light up. On the other end of the line is the data center manager, who begins to explain, frantically, that the business is offline and there is no ETA on a fix.


Sounds like the making of a good novel, right? Just imagine compromised data integrity, operating systems unable to boot, and hundreds if not thousands of virtual servers and desktops impacted by a physical layer outage. Not just a single point of failure, but all redundancy rendered useless by the failure mode. Data integrity lost, servers offline, a business unit crippled: any one of these issues would make for a bad day; having all of them at the same time is indeed a nightmare. As I am sure most of you have heard at one time or another, Murphy's law says that what can go wrong, will go wrong. So ask yourself: do you need a higher level of support?

 

 

Recently I was engaged on an ongoing problem that involved all three of the aforementioned catastrophic events. To make matters worse, the customer had just bought a new solution built on a new technology, Fibre Channel over Ethernet (FCoE), which uses converged adapters; in this case, they were integrated Emulex dual-port adapters.

 

To add complexity, as if that were possible, the customer opted for a multivendor hardware, software, and support strategy, which, as I'm sure you can imagine, led to some finger pointing. In this particular case, only the HP c7000 with BL620 and BL685 G7 blades was under HP support; the rest of the environment was vendor X for network and vendor Y for storage. Feel free to fill in X and Y :)

 

Prior to my engagement, the third-party support teams had been at it for more than 12 days. Needless to say, nerves were frayed and the customer's patience was all but non-existent. This was further compounded by the executive staff questioning the IT director's decision to adopt such a vulnerable solution. The technical team, which comprised technologists from many leading companies as well as the customer's own technical staff, was convinced the problem revolved around the new FCoE technology. In fact, the customer stated on numerous occasions that this new bleeding-edge technology was riddled with bugs. Suffice it to say, any bleeding-edge technology will inevitably encounter a defect, and its safety logic will be scrutinized when it does. Though the customer and third-party support teams felt confident that the underlying issue lay solely with the new FCoE technology, none of the traces captured showed any sign of ill behavior by the technology. With that understanding, you can imagine my difficulty in turning around a ship lost in trying seas.

 

The key to overcoming a complex business-impacting event such as this is having a technical support partner that understands both the business and the technology applied. With my team's engagement, we cordially requested that leads from all parties collaborate on the findings thus far and on all technical troubleshooting conducted, in order to clearly understand the full gamut of the technology failure. Hardware protocol analyzers were deployed, software-based network sniffers were engaged, and system memory dumps were being captured by the dozens. Although I could easily write a dissertation on the events that transpired over the hours that followed, I will sum it up by stating that the troubleshooting efforts had gone awry, chasing the wrong rabbit down a very dark hole.

We asked everyone to step back, regroup, and think about Occam's Razor: the simplest hypothesis, with the fewest assumptions, that explains the behavior is most likely the culprit. Within hours, a hypothesis was derived and root cause was isolated to LUN masking on the array. Wow, I know what you're thinking: seriously, something as simple as LUN masking took this much time to resolve? Unfortunately, this is all too often what happens in a multivendor solution. When one vendor (using known technology) tells a customer that their product is configured correctly, everyone then tends to steer blindly toward the newest technology as the root of all ill behavior.

All in all, a rather simplistic LUN masking configuration allowed multiple hosts to see the same storage LUN. What transpired next was one host's failure to boot, followed by another host overwriting the boot sector, followed by... well, I do not need to continue that vicious cycle. As most technologists will know, it is ill-advised to have multiple hosts that are not cluster members accessing the same storage LUNs. In this situation, every time a system was rebuilt, a separate system would once again overwrite its boot sector, making it appear as if the new technology was failing.
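
For readers who want to sanity-check their own masking, here is a minimal sketch of the kind of audit that would have flagged this condition early. It is not the tooling used during the engagement. The function name, the host and LUN names, and the dictionary layout are all hypothetical; in practice you would populate them from whatever export your array and cluster software provide.

```python
from collections import defaultdict


def find_unsafe_shared_luns(masking, clusters):
    """Return {lun_id: sorted host names} for LUNs seen by hosts that do not share one cluster.

    masking:  dict mapping host name -> iterable of LUN ids presented to that host
    clusters: dict mapping cluster name -> iterable of member host names
    (Both structures are assumptions made for this sketch.)
    """
    # Invert the masking view: which hosts can see each LUN?
    hosts_by_lun = defaultdict(set)
    for host, luns in masking.items():
        for lun in luns:
            hosts_by_lun[lun].add(host)

    cluster_sets = [set(members) for members in clusters.values()]

    unsafe = {}
    for lun, hosts in hosts_by_lun.items():
        if len(hosts) < 2:
            continue  # a LUN presented to a single host is fine
        # Sharing is acceptable only if one cluster contains every host that sees the LUN.
        if not any(hosts <= members for members in cluster_sets):
            unsafe[lun] = sorted(hosts)
    return unsafe


# Hypothetical example data, not taken from the incident.
masking = {
    "blade01": ["lun_boot_01"],
    "blade02": ["lun_boot_02", "lun_boot_01"],  # misconfiguration: also sees blade01's boot LUN
    "esx01": ["lun_vmfs_10"],
    "esx02": ["lun_vmfs_10"],
}
clusters = {"esx_cluster": ["esx01", "esx02"]}

result = find_unsafe_shared_luns(masking, clusters)
print(result)  # {'lun_boot_01': ['blade01', 'blade02']}
```

The shared datastore LUN passes because both hosts that see it belong to the same cluster, while the boot LUN visible to two standalone blades is reported. A check like this only inspects configuration; it says nothing about whether the cluster software itself is coordinating access correctly.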

 

Key Lesson:

Take your time when designing your IT infrastructure, think about who will support it, and consider whether or not you have the skilled technologists in your organization to handle the most critical situations conceivable. Remember that always-on support from HP is just that, always on. HP has technologists with the technical depth and business acumen to handle complex issues on a daily basis, tackling the world's biggest crises with speed and passion for our customers. And lastly, remember to focus on the real technology problem.

Comments
TedSaul(anon) | ‎06-30-2012 01:59 AM

Nice write-up ... Thanks!!!

GregTinker | ‎07-31-2012 05:21 PM

Thanks Ted

About the Author
  • I am an identical twin. My brother's name is Chris Tinker and we have been extremely fortunate to work in similar careers within HP, known to our HP colleagues and many of our customers as "The Tinkers". Our job is to be the technical lead on major business operational outages with millions of dollars/euros hanging in the balance. We both have a complete background in architectural, infrastructure, and application environments from both the proactive and reactive sides of HP Enterprise Service (HP ES) and HP Enterprise Business (HP EB). We have always attended the same schools, studied the same material (big surprise, as we are identical twins), and have always worked as a close team, striving to demonstrate our teaming abilities to others. We each have more than 11 years of experience supporting mission-critical enterprise customers on a broad range of technologies. We've both won the HP MVP award multiple times and have coauthored books, programs, and whitepapers in our spare time.