If the behavior of a system deviates from the expected behavior, this is called a fault. A transient fault is an intermittent or short-lived fault. For example, when a user tries to update a table, the operation may fail because of a timeout, a network-related error, or an intermittent database connection issue. Another scenario is a deadlock on database resources. These issues are short-lived, and if you retry the operation shortly afterward, it will usually succeed.
An enterprise-level application that is designed for high availability has to tolerate transient faults. For example, if a deadlock occurs, the SQL engine picks one transaction as the deadlock victim, rolls it back, and raises error 1205. Instead of passing this error to the application immediately, the code should catch the exception and retry the statement a specified number of times before raising the error to the user.
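The catch-and-retry idea described above can be sketched in a few lines. This is only an illustration in Python, not the Enterprise Library API: the DatabaseError class, the run_with_retry helper, and its parameters are hypothetical names made up for this example, and a real data access layer would expose the error number in its own way.

```python
import time

DEADLOCK_ERROR = 1205  # error number raised for the deadlock victim


class DatabaseError(Exception):
    """Hypothetical exception that carries a database error number."""

    def __init__(self, number, message=""):
        super().__init__(message)
        self.number = number


def run_with_retry(operation, max_attempts=3, delay_seconds=0.5, sleep=time.sleep):
    """Call operation(), retrying only when it fails with the deadlock error.

    Any other error, or a deadlock on the final attempt, is re-raised
    to the caller unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DatabaseError as exc:
            if exc.number != DEADLOCK_ERROR or attempt == max_attempts:
                raise
            # Give the competing transaction time to finish before retrying.
            sleep(delay_seconds)
```

The sleep function is passed in as a parameter so the wait can be replaced in tests. Only error 1205 is treated as transient here; which other error numbers deserve a retry is a decision for each application.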
I’m currently reading the documentation for Enterprise Library 6.0, which Microsoft released in April 2013. It includes a Transient Fault Handling block that you can use in your applications. With this block you can define retry intervals and strategies targeting SQL Databases and Windows Azure services.
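To give a feel for what "retry intervals and strategies" means, here is a small sketch of three common ways to space out retries: a fixed interval, an incrementally growing interval, and an exponential back-off. It is written in Python for illustration only. The function names and parameters are my own assumptions and are not the EntLib classes or their signatures, and the exponential formula is one simple variant among several.

```python
def fixed_interval(retry_count, interval):
    """Wait the same number of seconds before every retry."""
    for _ in range(retry_count):
        yield interval


def incremental(retry_count, initial, increment):
    """Wait a little longer, by a constant step, before each retry."""
    for i in range(retry_count):
        yield initial + i * increment


def exponential_backoff(retry_count, min_backoff, max_backoff, delta):
    """Grow the wait exponentially, capped at max_backoff.

    This variant is deterministic; implementations often add some
    random jitter so that many clients do not retry at the same moment.
    """
    for i in range(retry_count):
        yield min(min_backoff + delta * (2 ** i - 1), max_backoff)


print(list(fixed_interval(3, 1.0)))             # [1.0, 1.0, 1.0]
print(list(incremental(4, 1, 2)))               # [1, 3, 5, 7]
print(list(exponential_backoff(5, 1, 30, 2)))   # [1, 3, 7, 15, 30]
```

Each function yields the delays in seconds, so a retry loop can simply iterate over the chosen strategy and sleep for each value between attempts.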
You can read more on EntLib and Transient Fault handling here:
Apart from the Transient Fault Handling block, EntLib 6.0 also contains the Data Access application block, the Exception Handling block, the Semantic Logging block, the Validation block, and others. Even if you decide not to use this library, reading the documentation and walking through the code will give you some insight into how to handle the scenarios it describes.