Spiegelhalter et al. (1993) describe techniques for estimating the parameters (i.e. the conditional probabilities) of such a network. The parameters can be represented by additional nodes connected to a set of replicated networks, one for each of a set of cases. The parameters can be given independent Dirichlet distributions and, with complete data, standard conjugate Bayesian updating is straightforward. With incomplete data a number of different analytic approximations have been suggested, but in fact a simulation solution is easily implemented.
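For reference, the complete-data conjugate update mentioned above can be written out as follows. The notation is an assumption introduced here and does not come from the original text: theta_j is the vector of conditional probabilities for parent configuration j, alpha_k are the Dirichlet prior parameters, and n_jk is the number of cases with parent configuration j and child value k.

```latex
\theta_j \sim \mathrm{Dirichlet}(\alpha_1, \dots, \alpha_K)
\quad\Longrightarrow\quad
\theta_j \mid \text{data} \sim \mathrm{Dirichlet}(\alpha_1 + n_{j1}, \dots, \alpha_K + n_{jK}),
\qquad
\mathrm{E}[\theta_{jk} \mid \text{data}] = \frac{\alpha_k + n_{jk}}{\sum_{l=1}^{K} (\alpha_l + n_{jl})}.
```

With a uniform prior (all alpha_k = 1) the posterior mean is simply (1 + n_jk) / (K + n_j), where n_j is the total number of cases with parent configuration j.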
Figure 24 illustrates the asia2 network, in which theta.b represents the unknown conditional probability of bronchitis? given smoking?. Note that theta.b replaces the known conditional probability matrix p.bronchitis used in the first asia network described above. The observed part of the network is represented by the replicated plates.
We illustrate learning about theta.b with a dataset of five cases, in which the true value for smoking is not observed for case 2, who has bronchitis and dyspnoea, and case 3, whose only positive feature is the x-ray.
Figure 24: Graphical model for asia2 example, with additional node theta.b representing the unknown conditional probability of bronchitis? given smoking?
The data file no longer contains values for p.bronchitis, but does contain observed data for five cases.
Data for asia2 example
list(p.asia = c(0.99, 0.01),
     p.tuberculosis = c(0.99, 0.01,
                        0.95, 0.05),
     p.smoking = c(0.50, 0.50),
     p.lung.cancer = c(0.99, 0.01,
                       0.90, 0.10),
     p.xray = c(0.95, 0.05,
                0.02, 0.98),
     p.dyspnoea = c(0.9, 0.1,
                    0.2, 0.8,
                    0.3, 0.7,
                    0.1, 0.9),
     asia = c(1,1,1,1,1),
     smoking = c(2,NA,NA,1,1),
     tuberculosis = c(1,1,1,1,1),
     lung.cancer = c(2,1,1,1,1),
     bronchitis = c(2,2,1,2,1),
     xray = c(2,1,2,2,1),
     dyspnoea = c(2,2,1,2,2))
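As a rough point of comparison, the conjugate update can be applied to only the three cases whose smoking status is observed (cases 1, 4 and 5). The Python sketch below is an illustration and not part of the BUGS analysis: the function names are hypothetical, and it deliberately ignores the two cases with missing smoking values. It follows the coding used in the data file (1 = no, 2 = yes).

```python
# Coding follows the data file: 1 = no, 2 = yes; None marks an unobserved value.
smoking = [2, None, None, 1, 1]
bronchitis = [2, 2, 1, 2, 1]


def complete_case_posterior(smoking, bronchitis, prior=(1.0, 1.0)):
    """Hypothetical helper: Dirichlet parameters over bronchitis = (no, yes),
    one vector per smoking level, using only cases with observed smoking."""
    params = {1: list(prior), 2: list(prior)}
    for s, b in zip(smoking, bronchitis):
        if s is None:           # skip cases whose smoking status is missing
            continue
        params[s][b - 1] += 1   # add one count to the observed bronchitis level
    return params


def dirichlet_mean(alpha):
    """Posterior mean of a Dirichlet distribution with parameters alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]


post = complete_case_posterior(smoking, bronchitis)
for level, name in ((1, "smoking=no"), (2, "smoking=yes")):
    print(name, post[level], dirichlet_mean(post[level]))
```

This complete-case calculation discards whatever cases 2 and 3 say about theta.b; the full model retains that information by treating the missing smoking values as unknown quantities.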
The BUGS code (shown below) now requires the observables to be vectors, and places independent Dirichlet prior distributions with parameters (1,1) (i.e. uniform priors) on each of the unknown conditional distributions p(bronchitis | smoking=no) and p(bronchitis | smoking=yes).
Asia2: model specification in BUGS
model Asia2;
const
   N = 5;   # number of cases
var
   asia[N],smoking[N],tuberculosis[N],lung.cancer[N],
   bronchitis[N],either[N],xray[N],dyspnoea[N],
   p.asia[2],p.smoking[2],p.tuberculosis[2,2],theta.b[2,2],
   p.lung.cancer[2,2],p.xray[2,2],p.dyspnoea[2,2,2],prior[2];
data in "asia2.dat";
{
   for (i in 1:N){
      smoking[i]      ~ dcat(p.smoking[]);
      tuberculosis[i] ~ dcat(p.tuberculosis[asia[i],]);
      lung.cancer[i]  ~ dcat(p.lung.cancer[smoking[i],]);
      bronchitis[i]   ~ dcat(theta.b[smoking[i],]);
      # either = 2 (yes) if tuberculosis or lung cancer is present
      either[i]      <- max(tuberculosis[i],lung.cancer[i]);
      xray[i]         ~ dcat(p.xray[either[i],]);
      dyspnoea[i]     ~ dcat(p.dyspnoea[either[i],bronchitis[i],]);
   }
   # priors for unknown probabilities
   for (j in 1:2){
      theta.b[j,] ~ ddirch(prior[]);   # theta.b = p(bronchitis | smoking)
      prior[j] <- 1;                   # uniform Dirichlet(1,1) prior
   }
}
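To make the simulation concrete, here is a minimal Gibbs sampler for this model in Python. It is a sketch under stated assumptions, not a description of how BUGS itself samples: the function name and returned keys are hypothetical. Because tuberculosis, lung.cancer, bronchitis, xray and dyspnoea are all observed, the only unknowns are theta.b and the two missing smoking values, so each smoking value depends only on p.smoking, p.lung.cancer and theta.b.

```python
import random

# Data from the asia2 example; coding 1 = no, 2 = yes, None = not observed.
P_SMOKING = [0.5, 0.5]                    # p.smoking
P_LUNG = [[0.99, 0.01], [0.90, 0.10]]     # p.lung.cancer[smoking][lung.cancer]
SMOKING = [2, None, None, 1, 1]
LUNG_CANCER = [2, 1, 1, 1, 1]
BRONCHITIS = [2, 2, 1, 2, 1]


def gibbs_asia2(n_iter=20000, burn_in=1000, seed=1):
    """Hypothetical helper: Gibbs sampler for theta.b and the missing smoking values."""
    rng = random.Random(seed)
    missing = [i for i, s in enumerate(SMOKING) if s is None]
    smoking = [1 if s is None else s for s in SMOKING]  # arbitrary starting values
    theta_sum = [0.0, 0.0]                 # sums of P(bronchitis = yes | smoking = no, yes)
    smoker_sum = {i: 0 for i in missing}   # iterations in which smoking[i] = yes

    for it in range(burn_in + n_iter):
        # theta.b[j,] | rest is Dirichlet(1 + counts); with two levels this is a Beta draw.
        theta_yes = []
        for level in (1, 2):
            b_vals = [b for s, b in zip(smoking, BRONCHITIS) if s == level]
            theta_yes.append(rng.betavariate(1 + b_vals.count(2), 1 + b_vals.count(1)))

        # smoking[i] | rest is proportional to
        # p.smoking * p.lung.cancer[smoking, lung.cancer[i]] * theta.b[smoking, bronchitis[i]].
        for i in missing:
            weights = []
            for level in (1, 2):
                t = theta_yes[level - 1]
                p_b = t if BRONCHITIS[i] == 2 else 1.0 - t
                weights.append(P_SMOKING[level - 1]
                               * P_LUNG[level - 1][LUNG_CANCER[i] - 1] * p_b)
            p_smoker = weights[1] / (weights[0] + weights[1])
            smoking[i] = 2 if rng.random() < p_smoker else 1

        if it >= burn_in:                  # accumulate only after the burn-in
            for j in range(2):
                theta_sum[j] += theta_yes[j]
            for i in missing:
                smoker_sum[i] += smoking[i] == 2

    return {
        "p_bronchitis_given_nonsmoker": theta_sum[0] / n_iter,
        "p_bronchitis_given_smoker": theta_sum[1] / n_iter,
        "p_smoker": {i + 1: smoker_sum[i] / n_iter for i in missing},  # 1-based cases
    }


result = gibbs_asia2()
print(result)
```

The run length, burn-in and seed are arbitrary choices for illustration. The sketch tracks only the "yes" column of theta.b, that is theta.b[1,2] and theta.b[2,2]; the "no" column is its complement.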
Analysis
100000 iterations after a 1000 iteration burn-in took 36 seconds and led to posterior mean estimates (standard deviations) of p(bronchitis | smoking=no) = .52 (.21) and p(bronchitis | smoking=yes) = .66 (.23). In addition, the estimated probabilities that cases 2 and 3 are smokers are .56 and .37 respectively.
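With only two missing values, the posterior can also be computed exactly by enumerating the four possible assignments of smoking for cases 2 and 3, which gives a check on the simulation that is free of Monte Carlo error. The Python sketch below is an illustration and not part of the original analysis; the function name is hypothetical. It weights each assignment by p.smoking, p.lung.cancer and the marginal likelihood of the bronchitis values under the uniform Dirichlet prior, with coding 1 = no, 2 = yes.

```python
from itertools import product
from math import factorial

# Data from the asia2 example; coding 1 = no, 2 = yes, None = not observed.
P_SMOKING = [0.5, 0.5]                    # p.smoking
P_LUNG = [[0.99, 0.01], [0.90, 0.10]]     # p.lung.cancer[smoking][lung.cancer]
SMOKING = [2, None, None, 1, 1]
LUNG_CANCER = [2, 1, 1, 1, 1]
BRONCHITIS = [2, 2, 1, 2, 1]


def exact_asia2():
    """Hypothetical helper: exact posterior summaries by enumeration."""
    missing = [i for i, s in enumerate(SMOKING) if s is None]
    total = 0.0
    p_smoker = {i + 1: 0.0 for i in missing}   # keyed by 1-based case number
    theta_yes = [0.0, 0.0]                     # P(bronchitis = yes | smoking = no, yes)
    for fill in product((1, 2), repeat=len(missing)):
        smoking = list(SMOKING)
        for i, value in zip(missing, fill):
            smoking[i] = value
        # Terms that involve smoking directly: p.smoking and p.lung.cancer.
        weight = 1.0
        for s, lc in zip(smoking, LUNG_CANCER):
            weight *= P_SMOKING[s - 1] * P_LUNG[s - 1][lc - 1]
        means = []
        for level in (1, 2):
            b_vals = [b for s, b in zip(smoking, BRONCHITIS) if s == level]
            n_yes, n_no = b_vals.count(2), b_vals.count(1)
            # Integral of theta^n_yes * (1 - theta)^n_no under a uniform prior.
            weight *= factorial(n_yes) * factorial(n_no) / factorial(n_yes + n_no + 1)
            # Posterior mean of theta given this assignment: Beta(1 + n_yes, 1 + n_no).
            means.append((1 + n_yes) / (2 + n_yes + n_no))
        total += weight
        for j in range(2):
            theta_yes[j] += weight * means[j]
        for i, value in zip(missing, fill):
            if value == 2:
                p_smoker[i + 1] += weight
    return {
        "p_smoker": {case: w / total for case, w in p_smoker.items()},
        "p_bronchitis_given_nonsmoker": theta_yes[0] / total,
        "p_bronchitis_given_smoker": theta_yes[1] / total,
    }


exact = exact_asia2()
print(exact)
```

Under this coding the enumeration gives about 0.56 and 0.37 for the smoking probabilities of cases 2 and 3, about 0.66 for theta.b[2,2], and about 0.52 for theta.b[1,1] (equivalently 0.48 for theta.b[1,2]).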