Inventors:
Cynthia A. McGuire - San Jose CA, US
Timothy P. Haley - Boulder CO, US
Andrew M. Rudoff - Boulder CO, US
Michael W. Shapiro - San Francisco CA, US
Matthew T. Simmons - San Francisco CA, US
Assignee:
Sun Microsystems, Inc. - Santa Clara CA
International Classification:
G06F 11/00
G06F 11/07
US Classification:
714/48, 714/57, 714/26, 714/25
Abstract:
A method, apparatus, and computer program product for diagnosing and resolving faults are disclosed. A disclosed fault management architecture includes a fault manager having diagnostic engines and fault correction agents. The diagnostic engines receive error information and identify associated fault possibilities. The fault possibility information is passed to fault correction agents, which diagnose and resolve the associated faults. The architecture uses logs to track the status of error information, the status of fault management exercises, and the fault status of system resources. Additionally, a soft error rate discriminator can be employed to track and resolve soft (correctable) errors in the system. The architecture is extensible, allowing additional diagnostic engines and agents to be plugged into the architecture without interrupting the normal operational flow of the computer system.
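The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: every name below (ErrorReport, SoftErrorRateDiscriminator, FaultManager, make_soft_error_engine) is hypothetical, and the "more than N soft errors within T seconds" rule is an assumed discrimination policy that the abstract does not specify. The sketch only shows how pluggable diagnostic engines, correction agents, and simple logs might fit together.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class ErrorReport:
    resource: str     # hypothetical resource name, e.g. "dimm0"
    kind: str         # hypothetical error class, e.g. "correctable"
    timestamp: float  # seconds


class SoftErrorRateDiscriminator:
    """Assumed policy: trip once more than `limit` soft errors on one
    resource fall within a sliding window of `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._seen: Dict[str, Deque[float]] = {}

    def record(self, report: ErrorReport) -> bool:
        times = self._seen.setdefault(report.resource, deque())
        times.append(report.timestamp)
        # Drop events that have aged out of the window.
        while times and report.timestamp - times[0] > self.window:
            times.popleft()
        return len(times) > self.limit


# An engine maps an error report to a suspected faulty resource (or None).
Engine = Callable[[ErrorReport], Optional[str]]
# An agent acts on a suspected faulty resource, e.g. by retiring it.
Agent = Callable[[str], None]


class FaultManager:
    """Dispatches error reports to engines and fault suspects to agents."""

    def __init__(self) -> None:
        self.engines: List[Engine] = []
        self.agents: List[Agent] = []
        self.error_log: List[ErrorReport] = []   # status of error information
        self.resource_status: Dict[str, str] = {}  # fault status of resources

    def add_engine(self, engine: Engine) -> None:
        # Engines can be registered at any time, without stopping the manager.
        self.engines.append(engine)

    def add_agent(self, agent: Agent) -> None:
        self.agents.append(agent)

    def post(self, report: ErrorReport) -> None:
        self.error_log.append(report)
        for engine in self.engines:
            suspect = engine(report)
            if suspect is not None:
                self.resource_status[suspect] = "faulty"
                for agent in self.agents:
                    agent(suspect)


def make_soft_error_engine(serd: SoftErrorRateDiscriminator) -> Engine:
    """Builds an engine that only reacts to correctable (soft) errors."""

    def engine(report: ErrorReport) -> Optional[str]:
        if report.kind == "correctable" and serd.record(report):
            return report.resource
        return None

    return engine
```

In this sketch an engine and an agent are plain callables, so adding a new one is a single add_engine or add_agent call on a running FaultManager. That is one simple way to model the plug-in extensibility the abstract describes.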