Consider the following setting: the availability of a critical computer system is monitored by two independent mechanisms. The first sends ping packets over the network and raises an alarm if no return packet arrives within one second. The second measures the temperature of the computer system and raises an alarm if it is below a threshold (as the system is then probably not running).
The computer system itself is certified by the manufacturer to be up and running 99.9% of the time.
The ping mechanism can itself be faulty; the likelihood of this is 1%. If the ping mechanism works correctly, it is reliable in showing uptime: if the critical computer system is up and the ping mechanism is not faulty, then in 9999 out of 10000 cases the ping mechanism will report the computer system to be up. It is less precise when the computer system is down: in that case, even if the ping mechanism is not faulty, it will report the computer system to be down in only 700 out of 1000 cases. If the ping mechanism is faulty and the computer system is up, it will report the system to be up in only 1 out of 10000 cases. If the ping mechanism is faulty and the computer system is down, it will nevertheless report the computer system to be unavailable in 800 out of 1000 cases.
The reliability of the thermometer (the second mechanism) is influenced by the computer system, as it tends to be faulty more often at high temperature (that is, when the system is available). In particular, when the temperature is above the threshold, the thermometer is faulty in 30% of all cases, while it is faulty in only 3% of the cases when the temperature is below the threshold. When the thermometer works correctly, it will identify high temperature correctly in 95 out of 100 cases, and it will identify low temperature correctly in 999 out of 1000 cases. If the thermometer is faulty, it will still identify high temperature correctly in 93% of all cases, but low temperature will be misidentified in 98% of all cases.
Create a Bayesian network that models this situation.
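One possible network structure, given here only as a sketch, uses five binary variables. The symbol names are assumptions introduced for illustration and do not appear in the task: S (system is up), F_P (ping mechanism is faulty), P (ping mechanism reports "up"), F_T (thermometer is faulty), T (thermometer reports high temperature). The sketch further assumes that the temperature is above the threshold exactly when the system is up, since the description gives no separate distribution for the temperature. Under these assumptions the joint distribution would factorize as:

```latex
\[
  \Pr(S, F_P, P, F_T, T)
  = \Pr(S)\,\Pr(F_P)\,\Pr(P \mid S, F_P)\,\Pr(F_T \mid S)\,\Pr(T \mid S, F_T)
\]
```

In this reading S and F_P have no parents, F_T depends on S, and each report depends on the system state and on the fault state of its own mechanism.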
Answer the following questions:
- How likely is it that the ping mechanism raises an alarm when the system is available?
- Both mechanisms signal unavailability of the computer system. How likely is it that the computer system is really unavailable?
- How likely is it that computer system unavailability goes undetected (i.e. neither the ping mechanism nor the thermometer indicates unavailability when the system is actually unavailable)?
- The computer system is unavailable, but neither the ping mechanism nor the thermometer indicates unavailability. How likely is it that the ping mechanism is faulty? How likely is it that the thermometer is faulty?
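The questions above can be checked numerically by brute-force enumeration over all variable assignments. The following is a minimal sketch, not a definitive solution: the variable and function names are hypothetical, it assumes that "temperature above the threshold" coincides with "system up", and it reads the third question as a probability conditioned on the system being down.

```python
from itertools import product

# All numbers come from the task description; names are illustrative assumptions.
P_UP = 0.999                                   # P(system up)
P_PING_FAULTY = 0.01                           # P(ping mechanism faulty)
P_THERMO_FAULTY = {True: 0.30, False: 0.03}    # key: system up (assumed = high temperature)
# key: (system up, ping faulty) -> P(ping reports "up")
P_PING_SAYS_UP = {(True, False): 0.9999, (False, False): 0.3,
                  (True, True): 0.0001, (False, True): 0.2}
# key: (system up, thermometer faulty) -> P(thermometer reports "high")
P_THERMO_SAYS_HIGH = {(True, False): 0.95, (False, False): 0.001,
                      (True, True): 0.93, (False, True): 0.98}

VARS = ("up", "ping_faulty", "ping_up", "thermo_faulty", "thermo_high")


def bern(p, value):
    """Probability of a binary outcome, given P(True) = p."""
    return p if value else 1.0 - p


def joint(up, ping_faulty, ping_up, thermo_faulty, thermo_high):
    """Joint probability of one full assignment (product of the CPT entries)."""
    return (bern(P_UP, up)
            * bern(P_PING_FAULTY, ping_faulty)
            * bern(P_PING_SAYS_UP[(up, ping_faulty)], ping_up)
            * bern(P_THERMO_FAULTY[up], thermo_faulty)
            * bern(P_THERMO_SAYS_HIGH[(up, thermo_faulty)], thermo_high))


def prob(**fixed):
    """Marginal probability of a partial assignment, summing out the rest."""
    total = 0.0
    for values in product([True, False], repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if all(world[name] == value for name, value in fixed.items()):
            total += joint(**world)
    return total


def cond(query, given):
    """Conditional probability P(query | given); the two must not share keys."""
    return prob(**query, **given) / prob(**given)


# An alarm corresponds to ping_up=False or thermo_high=False.
q1 = cond({"ping_up": False}, {"up": True})
q2 = cond({"up": False}, {"ping_up": False, "thermo_high": False})
q3 = cond({"ping_up": True, "thermo_high": True}, {"up": False})
missed = {"up": False, "ping_up": True, "thermo_high": True}
q4_ping = cond({"ping_faulty": True}, missed)
q4_thermo = cond({"thermo_faulty": True}, missed)

for label, value in [("Q1", q1), ("Q2", q2), ("Q3", q3),
                     ("Q4 ping faulty", q4_ping), ("Q4 thermometer faulty", q4_thermo)]:
    print(f"{label}: {value:.6f}")
```

Under these assumptions the sketch yields roughly 0.0101 for the first question, 0.546 for the second, 0.0091 for the third, and about 0.0067 (ping faulty) and 0.968 (thermometer faulty) for the fourth. Enumeration is used here because the network has only 32 joint states; a different reading of the task, for example with an explicit temperature node, would change the numbers.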