You can install the development version of likelihood.model
from GitHub with:
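(The chunk below is a sketch of the usual `devtools` workflow; the GitHub account hosting the package isn't shown in this vignette, so substitute the actual repository path.)

```r
# install.packages("devtools")  # if devtools is not already installed
devtools::install_github("<github-account>/likelihood.model")  # placeholder path
```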
```r
# setup (assumed): attach the packages whose startup messages appear below
require(devtools)
#> Loading required package: devtools
#> Loading required package: usethis
require(algebraic.mle)
#> Loading required package: algebraic.mle
#> Registered S3 method overwritten by 'algebraic.dist':
#>   method     from 
#>   print.dist stats
require(likelihood.model)
#> Loading required package: likelihood.model
```
Consider a series system with \(m\) components, where the \(p\)-th component in the \(i\)-th system has exponentially distributed lifetime \(T_{i p}\) with rate \(\lambda_p\), and \(T_{i j}\) for all \(i,j\) are independent. The system lifetime is the minimum of the component lifetimes, i.e., \(T_i = \min\{T_{i 1}, \ldots, T_{i m}\}\).
Let’s draw a sample from this model with \(m=3\), \(\lambda = (1.1, 1.2, 1.3)\), and a sample size of \(n=150\).
```r
n <- 150
rates <- c(1.1, 1.2, 1.3)
set.seed(1235)
df <- data.frame(
  t1 = rexp(n, rates[1]),
  t2 = rexp(n, rates[2]),
  t3 = rexp(n, rates[3])
)
# series system lifetime: the minimum of the component lifetimes
df$t <- apply(df[, 1:3], 1, min)
# map each observation to the corresponding index (component) that was minimum
df$k <- apply(df[, 1:3], 1, which.min)
# mask the component lifetimes that were not observed
for (i in 1:n) {
  for (p in 1:3) {
    if (p == df$k[i]) next
    df[i, p] <- NA
  }
}
head(df)
#>      t1    t2   t3     t k
#> 1    NA 0.320   NA 0.320 2
#> 2    NA 0.045   NA 0.045 2
#> 3    NA 0.049   NA 0.049 2
#> 4 0.083    NA   NA 0.083 1
#> 5    NA    NA 0.14 0.141 3
#> 6 0.035    NA   NA 0.035 1
```
Column \(t_k\) gives the failure time of the \(k\)-th component, column \(t\) is the series system lifetime (which equals the observed \(t_k\)), and column \(k\) identifies the component that caused the system failure.
The failed component's lifetime is observed, while the other component lifetimes are not (NA).
In what follows, we consider two cases of masking, censoring, or otherwise incomplete data:

1. We observe the system lifetime and the component cause of failure (columns \(t\) and \(k\)).
2. We observe the system lifetime subject to a right-censoring mechanism, along with the component cause of failure when the system lifetime is uncensored.
Consider again the series system: whenever a component fails, the system fails, and thus the system lifetime equals the minimum of the component lifetimes.
Suppose the component lifetimes are exponentially distributed with rate \(\lambda_k\) for component \(k\).
In this case, we assume we can observe the system lifetime and the component that caused the system failure. This means we know a few things about each system in the sample: its failure time \(t_i\), the component \(k_i\) that caused the failure, and the fact that the other components survived past \(t_i\).
We might be tempted to think that we can consider each observation, where we know the system lifetime and the component cause, and estimate that component's parameter from its failure time alone. However, this is incorrect: when we condition on the component that caused the system failure, we are not observing a random sample of that component's lifetime, but a conditional sample. For instance, if \(k = 1\), then we are observing \(T_{i 1} \mid T_{i 1} < T_{i 2} \text{ and } T_{i 1} < T_{i 3}\).
What is this distribution? We can derive it as: \[ f_{T_i|K_i}(t_i | k_i) = f_{T_i,K_i}(t_i,k_i) / f_{K_i}(k_i) = f_{K_i|T_i}(k_i|t_i) f_{T_i}(t_i) / f_{K_i}(k_i). \] By the memoryless property of the exponential distribution, we have \[ f_{K_i|T_i}(k_i|t_i) = f_{K_i}(k_i), \] since the failure rates are constant (and thus independent of time), so the probability that a component is the cause of a system failure does not depend on when the system fails. (For exponential components, \(f_{K_i}(k_i) = \lambda_{k_i} / \sum_{p=1}^m \lambda_p\).)
That means that \[ f_{T_i|K_i}(t_i | k_i) = f_{T_i}(t_i), \] which in the exponential case is just the pdf of the series system. Since the minimum of independent exponential random variables is exponentially distributed with a failure rate equal to the sum of the component failure rates, the naive estimator described above actually estimates the sum of the failure rates (i.e., the failure rate of the series system), not the component's rate.
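We can verify this numerically with the simulated data (a quick sketch using only base R; the exact value depends on the seed). The MLE of an exponential rate is the reciprocal of the sample mean, so naively fitting component 1's failure times, conditioned on component 1 being the cause, recovers roughly the sum of the rates rather than \(\lambda_1\):

```r
# Naive (incorrect) approach: treat the failure times where component 1
# was the cause as an ordinary exponential sample from component 1
t.given.k1 <- df$t[df$k == 1]
1 / mean(t.given.k1)
# roughly sum(rates) = 3.6, the series system's rate, not rates[1] = 1.1
```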
Instead, we treat this as a kind of right-censoring problem: with \(m=3\) components, when component \(1\) is the component cause of failure at time \(t_i\), we observe an exact failure time \(T_{i 1} = t_i\) and right-censor the other two components at \(t_i\). The log-likelihood contribution for this observation is \[ L_i = \log f_1(t_i|\lambda_1) + \log R_2(t_i|\lambda_2) + \log R_3(t_i|\lambda_3), \] where \(R_p\) is the survival function of component \(p\). This simplifies to \[ L_i = \log \lambda_1 - (\lambda_1 + \lambda_2 + \lambda_3) t_i, \] which has score \[ s_i = \frac{1}{\lambda_1} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} - t_i \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \] and Hessian \[ H_i = -\frac{1}{\lambda_1^2} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]
To get the total log-likelihood, score, and Hessian, we simply sum the contributions, since the observations are i.i.d.
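As a sanity check on the derivation, we can compare the analytical score of a single contribution against a numerical gradient. This is a minimal sketch; it assumes the `numDeriv` package is installed, and the helper names `loglik1` and `score1` are ours, not part of the package:

```r
# Log-likelihood contribution when component 1 causes failure at time t
loglik1 <- function(lambda, t) log(lambda[1]) - sum(lambda) * t
# Analytical score from the derivation above
score1 <- function(lambda, t) c(1 / lambda[1], 0, 0) - t

lam <- c(1.1, 1.2, 1.3)
numDeriv::grad(loglik1, lam, t = 0.3)  # numerical gradient
score1(lam, t = 0.3)                   # analytical score; should match
```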
In fact, that’s what we do here. We show how to use the likelihood contribution model, and two ways to configure it to dispatch the `loglik` and `score` functions by observation type. We implement methods for the `loglik` and `score` generics; the model dispatches to them based on the observation type, as defined by `obs_type`.
```r
# Score contribution for fully observed systems: for component j,
# d/d(lambda_j) = n_j / lambda_j - sum(t), where n_j is the number of
# systems whose failure was caused by component j
score.observed <- function(df, rates, ...) {
  rep(-sum(df$t), length(rates)) + (df |> dplyr::count(k))$n / rates
}

# Log-likelihood contribution for fully observed systems
loglik.observed <- function(df, rates, ...) {
  sum(log(rates[df$k])) - sum(rates) * sum(df$t)
}

model.observed <- likelihood_contr_model$new(
  obs_type = function(df) { "observed" }
)
```
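With the model in hand, we can evaluate the total log-likelihood and score at the true rates as a quick sanity check (a sketch; the values depend on the simulated data, and the score is zero only at the MLE, not at the true rates):

```r
loglik(model.observed)(df, rates)  # total log-likelihood at the true rates
score.observed(df, rates)          # total score at the true rates
```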
Now that we have the likelihood model, we can use it to estimate the parameters. We use the `optim` function for the numerical optimization, with `control = list(fnscale = -1)` so that it maximizes rather than minimizes. (Passing `hessian = TRUE` would additionally return the Hessian matrix, which can be used to compute standard errors of the estimates; we do this later for the censored model.)
```r
optim(rates, fn = loglik(model.observed), df = df, control = list(fnscale = -1))
#> $par
#> [1] 1.4 1.3 1.2
#> 
#> $value
#> [1] -112
#> 
#> $counts
#> function gradient 
#>       66       NA 
#> 
#> $convergence
#> [1] 0
#> 
#> $message
#> NULL
```
Note that `likelihood_model` objects have a default method for fitting MLEs, `fit`. Internally, this method uses `optim` to maximize the log-likelihood, but it packs in some additional information and returns the result as an `mle_numerical` object.
```r
rates.hat <- fit(model.observed)(df, par = rates)
summary(rates.hat)
#> Maximum likelihood estimator of type mle_likelihood_model is normally distributed.
#> The estimates of the parameters are given by:
#> [1] 1.4 1.3 1.2
#> The standard error is 0.19 0.18 0.18 .
#> The asymptotic 95% confidence interval of the parameters are given by:
#>        2.5% 97.5%
#> param1 1.00   1.7
#> param2 0.91   1.6
#> param3 0.89   1.6
#> The MSE of the individual components in a multivariate estimator is:
#>          [,1]     [,2]     [,3]
#> [1,]  3.5e-02  1.1e-13  1.7e-12
#> [2,]  1.1e-13  3.3e-02 -1.1e-12
#> [3,]  1.7e-12 -1.1e-12  3.2e-02
#> The log-likelihood is -112 .
#> The AIC is 229 .
```
We implement `hess_loglik.observed` for completeness.
```r
# Hessian contribution for fully observed systems: diagonal with entries
# -n_j / lambda_j^2, where n_j is the number of failures caused by component j
hess_loglik.observed <- function(df, rates, ...) {
  p <- length(rates)
  H <- matrix(0, p, p)
  counts <- df |> dplyr::count(k)
  for (j in 1:p) {
    H[j, j] <- -counts$n[j] / rates[j]^2
  }
  H
}
```
We can compare the observed FIM from the numerical fit against our analytical Hessian; they should agree up to sign.

```r
observed_fim(rates.hat)
#>          [,1]     [,2]     [,3]
#> [1,]  2.8e+01 -9.9e-11 -1.5e-09
#> [2,] -9.9e-11  3.1e+01  1.0e-09
#> [3,] -1.5e-09  1.0e-09  3.1e+01
hess_loglik(model.observed)(df, params(rates.hat))
#>      [,1] [,2] [,3]
#> [1,]  -28    0    0
#> [2,]    0  -31    0
#> [3,]    0    0  -31
```
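Since the two matrices agree up to sign, inverting the observed FIM gives the asymptotic covariance of the MLE. A minimal sketch using base R's `solve`:

```r
# Asymptotic covariance of the MLE: inverse of the observed FIM.
# The diagonal (approximately 0.035, 0.033, 0.032) matches the MSE
# matrix reported by summary(rates.hat) above.
solve(observed_fim(rates.hat))
```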
Now, we use the bootstrap method to approximate the sampling distribution of the MLE.
```r
model.samp <- sampler(model.observed, df = df, par = rates)
dist <- empirical_dist(model.samp(n = 5000)$t)
print(expectation(dist, function(x) x))
#> [1] 1.4 1.3 1.2
print(vcov(dist))
#>          [,1]     [,2]     [,3]
#> [1,]  0.03483 -0.00063 -0.00178
#> [2,] -0.00063  0.03377 -0.00051
#> [3,] -0.00178 -0.00051  0.03158
print(ginv(vcov(dist)))  # ginv() from the MASS package
#>       [,1]  [,2]  [,3]
#> [1,] 28.80  0.56  1.64
#> [2,]  0.56 29.63  0.51
#> [3,]  1.64  0.51 31.76
```

The inverse of the bootstrap covariance matrix is close to the observed FIM we computed earlier, as expected.
We add right-censoring to the model. We assume that the right-censoring times are independent of the component lifetimes and the candidate sets.
In particular, we consider the data described earlier, but with right-censoring at time \(\tau = 0.5\). The presence of right-censoring means that for censored observations we know the event occurs after time \(\tau\), but we do not know exactly when (it may still be ongoing).
As a first stab, you might be inclined to simply ignore the censored observations. However, this is not a good idea, because it leads to biased estimates: we would be estimating the parameters of the truncated distribution, \(T_i \mid T_i < \tau\), rather than the original distribution. What we actually observe is \(S_i\), where \(S_i = T_i\) if \(T_i < \tau\) and \(S_i = \tau\) otherwise, along with an indicator of whether the observation was censored. This is a well-known problem in statistics, and it is why we need survival analysis techniques to handle right-censored data (we illustrate the bias numerically below).
This is similar to the earlier problem, where we said it was a mistake to focus only on the component cause of failure and estimate each component's parameters independently. In that case, we would be estimating component \(j\)'s parameters from the truncated distribution, \(T_{i j} \mid K_i = j\), rather than from the original distribution of \(T_{i j}\). In either case, the censoring or masking mechanism must not be ignored, otherwise we will get biased estimates.
Here’s what that right-censored data looks like, with right-censoring time of \(\tau = 0.5\).
```r
# if df$t > 0.5, then the system was right-censored at tau = 0.5
df.censored <- df |> dplyr::mutate(
  censored = t > 0.5,
  t = ifelse(censored, 0.5, t)
)
# for censored systems, the component cause and lifetimes are unobserved
df.censored[df.censored$censored, "k"] <- NA
df.censored[df.censored$censored, paste0("t", 1:3)] <- NA
head(df.censored)
#>      t1    t2   t3     t k censored
#> 1    NA 0.320   NA 0.320 2    FALSE
#> 2    NA 0.045   NA 0.045 2    FALSE
#> 3    NA 0.049   NA 0.049 2    FALSE
#> 4 0.083    NA   NA 0.083 1    FALSE
#> 5    NA    NA 0.14 0.141 3    FALSE
#> 6 0.035    NA   NA 0.035 1    FALSE
```
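Before handling the censoring properly, we can illustrate the bias described above. This is a sketch reusing the uncensored model and the `fit` and `params` helpers from earlier: drop the censored rows and fit as if the remaining data were a complete sample.

```r
# Naive (biased) approach: discard the censored systems and fit the
# fully-observed model to what remains
df.naive <- df.censored[!df.censored$censored, ]
params(fit(model.observed)(df.naive, par = rates))
# typically biased upward, since the longest lifetimes were thrown away
```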
We construct the likelihood contributions for censored observations with:
```r
# A system censored at time t contributes log R(t) = -sum(rates) * t
loglik.censored <- function(df, rates) {
  -sum(rates) * sum(df$t)
}

score.censored <- function(df, rates) {
  rep(-sum(df$t), length(rates))
}

# The censored contribution is linear in the rates, so its Hessian is zero
hess_loglik.censored <- function(df, rates) {
  p <- length(rates)
  matrix(0, p, p)
}
```
```r
model.censored <- likelihood_contr_model$new(
  obs_type = function(df) {
    ifelse(df$censored, "censored", "observed")
  }
)
```
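The `obs_type` function now routes each row to the matching contribution: censored rows to `loglik.censored` and the rest to `loglik.observed`. As a quick check (a sketch; the exact value depends on the data), we can evaluate the combined log-likelihood at the true rates:

```r
# Total log-likelihood under the censored model at the true rates:
# exact-failure rows contribute log f, censored rows contribute log R
loglik(model.censored)(df.censored, rates)
```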
```r
mle.censored <- mle_numerical(
  optim(
    rates,
    fn = loglik(model.censored),
    df = df.censored, hessian = TRUE,
    control = list(fnscale = -1)
  )
)
summary(mle.censored)
#> Maximum likelihood estimator of type mle_numerical is normally distributed.
#> The estimates of the parameters are given by:
#> [1] 1.3 1.3 1.2
#> The standard error is 0.2 0.2 0.19 .
#> The asymptotic 95% confidence interval of the parameters are given by:
#>        2.5% 97.5%
#> param1 0.93   1.7
#> param2 0.88   1.6
#> param3 0.83   1.6
#> The MSE of the individual components in a multivariate estimator is:
#>          [,1]     [,2]     [,3]
#> [1,]  4.0e-02 -5.4e-12 -2.6e-12
#> [2,] -5.4e-12  3.8e-02 -2.6e-18
#> [3,] -2.6e-12 -2.6e-18  3.6e-02
#> The log-likelihood is -96 .
#> The AIC is 199 .
```
Recall that the true parameters are \(\lambda = (1.1, 1.2, 1.3)'\). The estimates are reasonable, particularly given the relatively modest sample size.