My 2TB or larger storage attached to an IBM X Series server is being corrupted. What is causing this?

Solution Verified - Updated -

Issue

We have 100 nodes (IBM System x3550) running Red Hat Enterprise Linux 4.3 (kernel 2.6.9-34.ELsmp). These nodes are connected to SAN storage (IBM DCS 9550) through QLogic QLE2460 HBAs and use SNFS as a parallel file system.

All nodes are interconnected through 1 Gb/s Force10 S50 switches for normal communication. The same switches also carry the file system metadata from the metadata servers to the nodes.

At one point, the link speed on one of the Force10 switches dropped from 1 Gb/s to 100 Mb/s, making all the nodes connected to that switch inaccessible.

At that time, users reported data corruption on the SNFS file system.

Environment

  • Red Hat Enterprise Linux
  • IBM X-Series servers (full list of affected models can be found in IBM's notice)
  • Storage larger than 2 TB
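
To check whether a given volume falls under the "larger than 2 TB" condition above, the size comparison can be sketched as follows. This is a minimal illustration, not part of the original solution. It assumes that "2 TB" means 2 TiB (2 × 1024⁴ bytes) and that the device uses 512-byte sectors; the function name `exceeds_two_tib` is hypothetical. On Linux, the sector count would typically come from `/sys/block/<device>/size`.

```python
# Assumption: traditional 512-byte logical sectors.
SECTOR_BYTES = 512

# Assumption: "2 TB" is read as 2 TiB (binary), i.e. 2 * 1024**4 bytes.
TWO_TIB_BYTES = 2 * 1024**4


def exceeds_two_tib(sector_count, sector_bytes=SECTOR_BYTES):
    """Return True if a device with this many sectors is strictly larger than 2 TiB."""
    return sector_count * sector_bytes > TWO_TIB_BYTES


if __name__ == "__main__":
    # 2**32 sectors of 512 bytes is exactly 2 TiB, so it does not exceed the limit.
    print(exceeds_two_tib(2**32))
    # One more sector pushes the device past 2 TiB.
    print(exceeds_two_tib(2**32 + 1))
```

The comparison is strict because the environment list says "larger than 2 TB"; use `>=` instead if a volume of exactly 2 TB should also be treated as affected, as the title suggests.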
