Usman Yusuf ABUBAKAR^{1}, Sunday A. REJU^{2} and Bamidele O. AWOJOYOGBE^{3}
^{1}Department of Mathematics and Computer Science, Federal University of Technology, Minna, Nigeria
^{2}National Open University, Lagos, Nigeria
^{3}Department of Physics, Federal University of Technology, Minna, Nigeria
Abstract
A Markov decision model is applied to determine the optimal cost of drugs in the treatment of leprosy. The results show that the minimum cost is obtained if a doctor is visited and the patient is placed on multidrug therapy (MDT) rather than on traditional treatment or Dapsone monotherapy. The paper presents a theoretical framework for studying the cost of leprosy treatment wherever the relevant data can be obtained.
Keywords
Markov Chain; Decision; Leprosy; Treatment; Cost; Dapsone Monotherapy; MultiDrug Therapy
Introduction
Leprosy is an infectious disease of the skin and nerves caused by Mycobacterium leprae. It is widespread in tropical and subtropical countries, including Nigeria. Leprosy can be treated and cured with multidrug therapy (MDT) or Dapsone monotherapy, even without the associated deformity if it is detected early [1],[2]. The prevalence rate and the incidence of new cases of leprosy have been studied [3]. In addition, the kinship coefficient has been used to study leprosy rates in villages at different distances apart [4]. In a related study, [5] examined relapses in leprosy after treatment with Dapsone monotherapy and multidrug therapy (MDT) respectively. In this paper, we determine the best alternative, in terms of drug type, in the treatment of leprosy.
Consider a process that is observed at time points n = 0, 1, 2, … and classified at each observation into one of a finite number of states; after each observation an action is chosen. Let X_{n} denote the state of the process at time n and a_{n} the action chosen at time n. The Markov decision framework then states that:

P(X_{n+1} = j | X_{0}, a_{0}, X_{1}, a_{1}, …, X_{n} = i, a_{n} = a) = P_{ij}(a)    (1)

This is the transition probability of a visit to state j given that state i was last visited and action a was chosen. Thus the transition probabilities depend only on the present state and the subsequent action [6].
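As a small illustration of (1), the action-dependent transition law P_{ij}(a) can be stored as one row-stochastic matrix per action. The numbers below are hypothetical, chosen only to show the data structure:

```python
import numpy as np

# Hypothetical action-dependent transition law P_ij(a) for a
# two-state process with two actions: one stochastic matrix per action.
P = {
    "a1": np.array([[0.7, 0.3],
                    [0.4, 0.6]]),
    "a2": np.array([[0.9, 0.1],
                    [0.2, 0.8]]),
}

# From state i under action a the next state is j with probability
# P[a][i, j]; each row must therefore sum to one.
for a, mat in P.items():
    assert np.allclose(mat.sum(axis=1), 1.0)
```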
By a policy we mean a rule for choosing actions: a sequence of decisions, one for each state of the process. It is denoted by d_{i}^{(n)}, where n = 1, 2, … represents time and i = 1, 2, … represents the state.
Following [7], we consider an aperiodic, irreducible Markov chain with m states (m < ∞) and the transition probability matrix P = (p_{ij}), i, j = 1, 2, …, m.
With every transition from i to j we associate a reward R_{ij}. Let V_{i}^{(n)} be the expected total earnings (reward) in the next n transitions, given that the system is in state i at present. A simple recurrence relation can be given for V_{i}^{(n)} as follows:

V_{i}^{(n)} = Σ_{j=1}^{m} p_{ij} (R_{ij} + V_{j}^{(n-1)}),  i = 1, 2, …, m; n = 1, 2, …    (2)
Let q_{i} = Σ_{j=1}^{m} p_{ij} R_{ij}, the expected reward from the next transition out of state i.
Equation (2) can now be written as

V_{i}^{(n)} = q_{i} + Σ_{j=1}^{m} p_{ij} V_{j}^{(n-1)}    (3)
Setting n = 1, 2, …, we get

V_{i}^{(1)} = q_{i} + Σ_{j=1}^{m} p_{ij} V_{j}^{(0)}

V_{i}^{(2)} = q_{i} + Σ_{j=1}^{m} p_{ij} q_{j} + Σ_{j=1}^{m} p_{ij}^{(2)} V_{j}^{(0)}

and, in general,

V_{i}^{(n)} = Σ_{k=0}^{n-1} Σ_{j=1}^{m} p_{ij}^{(k)} q_{j} + Σ_{j=1}^{m} p_{ij}^{(n)} V_{j}^{(0)}

where p_{ij}^{(n)} is the (i, j)th element of the matrix P^{n} (with P^{0} = I).
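The recurrence (2)–(3) is easy to evaluate numerically. The sketch below uses hypothetical 3×3 matrices P and R (not the paper's data) and V^{(0)} = 0, and checks the result against the closed form V^{(n)} = Σ_{k=0}^{n-1} P^{k} q:

```python
import numpy as np

# Hypothetical transition and reward matrices (illustration only)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
R = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 2.0],
              [3.0, 2.0, 1.0]])

# q_i = sum_j p_ij * R_ij: expected reward on the next transition out of i
q = (P * R).sum(axis=1)

# Recurrence (3): V^(n) = q + P V^(n-1), starting from V^(0) = 0
n_steps = 5
V = np.zeros(3)
for _ in range(n_steps):
    V = q + P @ V

# Closed form: V^(n) = sum_{k=0}^{n-1} P^k q  (P^0 = I)
V_closed = sum(np.linalg.matrix_power(P, k) @ q for k in range(n_steps))
assert np.allclose(V, V_closed)
```

The agreement between the loop and the closed form is exactly the identity derived above.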
Let V^{(n)} = (V_{1}^{(n)}, V_{2}^{(n)}, …, V_{m}^{(n)})′ and q = (q_{1}, q_{2}, …, q_{m})′.
Equation (3) can be put in matrix notation as

V^{(n)} = q + P V^{(n-1)}
Extending this to a general n, we have

V^{(n)} = q + Pq + P^{2}q + … + P^{n-1}q + P^{n}V^{(0)} = Σ_{k=0}^{n-1} P^{k} q + P^{n} V^{(0)}
Let p_{ij}^{(n)} be the n-step transition probability of an aperiodic, irreducible, m-state Markov chain, and let π_{j} = lim_{n→∞} p_{ij}^{(n)} be the limiting probabilities. Then there exist constants c and r with c > 0 and 0 < r < 1, such that |p_{ij}^{(n)} − π_{j}| ≤ c r^{n} for all i, j = 1, 2, …, m.
Proof
From [7], we know that there exists an integer N such that ε_{N} = min_{i,j} p_{ij}^{(N)} > 0, and hence 0 < 1 − 2ε_{N} < 1.

From [7], we also have |p_{ij}^{(kN+1)} − π_{j}| ≤ (1 − 2ε_{N})^{k}; setting n = kN + 1, we get k = (n − 1)/N and

|p_{ij}^{(n)} − π_{j}| ≤ (1 − 2ε_{N})^{(n-1)/N} = (1 − 2ε_{N})^{-1/N} [(1 − 2ε_{N})^{1/N}]^{n}

The theorem now follows by setting c = (1 − 2ε_{N})^{-1/N} and r = (1 − 2ε_{N})^{1/N}.
Consequently, p_{ij}^{(n)} = π_{j} + e_{ij}^{(n)},

where (π_{1}, π_{2}, …, π_{m}) is the limiting distribution of the Markov chain and |e_{ij}^{(n)}| ≤ c r^{n} with c > 0 and 0 < r < 1. Therefore, as n → ∞, p_{ij}^{(n)} → π_{j} geometrically.
Let Π be the m × m matrix each of whose rows is (π_{1}, π_{2}, …, π_{m}), and let E^{(n)} = (e_{ij}^{(n)}).

In matrix notation, we can write P^{n} as P^{n} = Π + E^{(n)}.
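The geometric decay of E^{(n)} = P^{n} − Π can be observed numerically. In this sketch (hypothetical matrix, illustration only) the limiting distribution π is approximated by a high power of P, and the largest element of |P^{n} − Π| shrinks as n grows:

```python
import numpy as np

# Hypothetical aperiodic, irreducible 3-state transition matrix
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# For such a chain every row of P^n converges to pi, so a high power
# gives a good numerical approximation of the limiting distribution.
pi = np.linalg.matrix_power(P, 200)[0]
Pi = np.tile(pi, (3, 1))  # the matrix with identical rows pi

# max |e_ij^(n)| should decrease geometrically in n
errs = [np.abs(np.linalg.matrix_power(P, n) - Pi).max() for n in (1, 4, 8, 16)]
assert errs[0] > errs[1] > errs[2] > errs[3]
```

For this matrix the subdominant eigenvalues are well inside the unit circle, so the decay rate r in the theorem is visibly at work after only a few steps.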
Substituting this in (3) we get:

V^{(n)} = n Π q + (Σ_{k=0}^{n-1} E^{(k)}) q + Π V^{(0)} + E^{(n)} V^{(0)}    (4)

In deriving (4), we have noted that P^{k} = Π + E^{(k)} for k = 0, 1, 2, …, with E^{(0)} = I − Π.
Now consider the sum

Σ_{k=0}^{n-1} E^{(k)} = Σ_{k=0}^{∞} E^{(k)} − Σ_{k=n}^{∞} E^{(k)}    (5)
It should be noted that each element of E^{(k)} is less than or equal to c r^{k} (c > 0, 0 < r < 1) in absolute value, and hence for large n the second term on the right-hand side of (5) approaches an m × m matrix with zero elements. For the same reason, the last term in (4) approaches a null matrix for large n.
Thus, asymptotically we have

V^{(n)} ≈ n Π q + (Σ_{k=0}^{∞} E^{(k)}) q + Π V^{(0)}

which gives

V_{i}^{(n)} ≈ n q̄ + v_{i},  i = 1, 2, …, m    (6)

where we have written q̄ = Σ_{j=1}^{m} π_{j} q_{j}, the common value of the elements of Π q.
Writing

v = (v_{1}, v_{2}, …, v_{m})′ = (Σ_{k=0}^{∞} E^{(k)}) q + Π V^{(0)}

(6) can be put in the form

V_{i}^{(n)} ≈ n q̄ + v_{i}

which shows that for large n, V_{i}^{(n)} is a linear function of n for every i. Further, for different values of i, the V_{i}^{(n)} are represented by parallel straight lines with slope q̄ and intercept v_{i}.
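The asymptotic linearity can also be checked numerically: after many transitions the one-step increment V_{i}^{(n)} − V_{i}^{(n-1)} is the same for every state i and equals the slope q̄ = Σ_{j} π_{j} q_{j}. The matrices below are hypothetical, as before:

```python
import numpy as np

# Hypothetical transition and reward matrices (illustration only)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
R = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 2.0],
              [3.0, 2.0, 1.0]])
q = (P * R).sum(axis=1)

# q_bar = sum_j pi_j q_j, using a high power of P to approximate pi
pi = np.linalg.matrix_power(P, 200)[0]
q_bar = pi @ q

# Iterate V^(n) = q + P V^(n-1); the increments approach the slope q_bar
V = np.zeros(3)
for _ in range(200):
    V_prev, V = V, q + P @ V

# For large n each V_i^(n) grows along a straight line with slope q_bar,
# and the lines for different i are parallel.
assert np.allclose(V - V_prev, q_bar)
```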
So far, we have considered the transition probability matrix P and the reward matrix R as given.
Instead, suppose that the decision maker has other alternatives and so is able to alter the elements of P and R. To incorporate this feature, we define D as the decision set and, under decision K ∈ D, let ^{K}p_{ij} and ^{K}R_{ij} be the probability of the transition from i to j and the corresponding reward, respectively, and let ^{K}V_{i}^{(n)} be the expected total earnings in n transitions under decision K. We then have the recurrence relations (K = 0 represents the optimal decision)

^{0}V_{i}^{(n)} = min_{K∈D} [ ^{K}q_{i} + Σ_{j=1}^{m} ^{K}p_{ij} ^{0}V_{j}^{(n-1)} ],  i = 1, 2, …, m; n = 1, 2, …    (7)

giving

d_{i}^{(n)} = the decision K ∈ D that attains the minimum in (7)

where we have written ^{K}q_{i} = Σ_{j=1}^{m} ^{K}p_{ij} ^{K}R_{ij}. (Since the rewards in our application are costs, the optimum is a minimum; for earnings one would maximize instead.)
The recursive relation (7) gives an iteration procedure to determine the optimum decision d_{i}^{(n)} ∈ D for i = 1, 2, …, m and n = 1, 2, … This is a standard technique in dynamic programming, and it has been shown by Bellman [8],[9] that the iteration process converges on the best alternative for each state as n → ∞. The method is based on recursively determining, for every n, the optimum policy, that is, the one that yields the optimal value. However, one major drawback of the method is that there is no way to tell when the policy settles into a stable policy; therefore, the value-iteration procedure is useful only when n is fairly small.
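A minimal sketch of the value-iteration recursion (7), assuming hypothetical transition and cost data (not the paper's) for two alternatives in a 3-state model; at each n the alternative with the smaller expected cost is recorded as the decision d_{i}^{(n)}:

```python
import numpy as np

# Hypothetical data: P[k] and R[k] are the transition-probability and
# cost matrices under alternative k (k = 0, 1 here; reported 1-based).
P = np.array([[[0.6, 0.3, 0.1],
               [0.3, 0.5, 0.2],
               [0.2, 0.3, 0.5]],
              [[0.8, 0.1, 0.1],
               [0.5, 0.4, 0.1],
               [0.4, 0.4, 0.2]]])
R = np.array([[[1.0, 3.0, 5.0],
               [2.0, 3.0, 5.0],
               [3.0, 4.0, 6.0]],
              [[2.0, 4.0, 6.0],
               [3.0, 4.0, 6.0],
               [4.0, 5.0, 7.0]]])

q = (P * R).sum(axis=2)   # q[k, i]: expected one-step cost in state i under k

V = np.zeros(3)           # V^(0) = 0
policies = []
for n in range(1, 6):
    candidates = q + P @ V              # candidates[k, i] as in equation (7)
    d = candidates.argmin(axis=0) + 1   # best alternative per state (1-based)
    V = candidates.min(axis=0)          # optimal expected cost 0V_i^(n)
    policies.append(d)
```

Because the horizon is finite and small, the loop simply records the best decision vector for each n, mirroring the iteration used to build Table 1.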
We consider a new case of leprosy, that is, a person showing clinical signs of leprosy, with or without bacteriological confirmation of the diagnosis (state 1); a new case with WHO disability grade 1 (state 2); or a new case with WHO disability grade 2 (state 3) [1],[2]. Thus, we have a three-state model. We assume that all the states communicate (the chain is ergodic). Let the transitions between the states be represented by the transition diagram shown in Figure 1, governed by the following transition probabilities of the Markov chain, as in (1), and the Markov reward matrices respectively.
Figure 1. The Transition Diagram for Leprosy Patients
P = (p_{ij}), where 0 ≤ p_{ij} ≤ 1 and Σ_{j=1}^{3} p_{ij} = 1 for i = 1, 2, 3, and R = (R_{ij}), i, j = 1, 2, 3.
Figure 2 presents a possible realization of the state transitions.
Figure 2. A possible realization of the state transitions
At any given point in time and state of the disease, the patient has the opportunity to make a choice of drugs. The choice of drugs may, however, depend on the availability of the drugs, the resources available to the patient, and the state or severity of the disease. The two major types of drugs used for the treatment of leprosy are Dapsone monotherapy and multidrug therapy (MDT) [1],[2].
For the sake of simplicity we may further divide the drugs into low-priced and high-priced drugs, corresponding to Dapsone and MDT respectively. We shall have the following alternatives for the states [10]:
· State 1:
o Alternative 1: Self-medication (traditional treatment)
o Alternative 2: Go to see a doctor (cost of seeing a doctor)
· State 2:
o Alternative 1: Low-priced drugs
o Alternative 2: High-priced drugs
· State 3:
o Alternative 1: Continue without change of drugs
o Alternative 2: Change drugs (from low- to high-priced or vice versa)
Costs are associated with each of these alternatives.
Our objective is to obtain the policy or alternative that minimizes the costs of the treatment at any given state and time.
Let the transition probabilities and the corresponding rewards be given as follows:

P = (p_{ij}), i, j = 1, 2, 3

and

R = (R_{ij}), i, j = 1, 2, 3.
Let D be the decision set; two alternative decisions are available to the patient, Alternative 1 and Alternative 2. Thus in every state we have K = 1, 2 ∈ D.
We shall now determine the best policies for every n using equation (7).
Since we are concerned with minimizing the cost of the drugs to be administered, the alternative that yields the least value constitutes the best policy for that state and time.
Results
Suppose we have the following stationary transition probabilities and the corresponding cost matrices (costs in N 1,000):

^{1}P = (^{1}p_{ij}) and ^{1}R = (^{1}R_{ij}), i, j = 1, 2, 3, for K = 1

^{2}P = (^{2}p_{ij}) and ^{2}R = (^{2}R_{ij}), i, j = 1, 2, 3, for K = 2
We shall use these values to determine the best policies for each n. The summary of the results is presented in Table 1.
Table 1. The best policies d_{i}^{(n)} and the corresponding expected minimum costs ^{0}V_{i}^{(n)}

n   d_1^{(n)}   d_2^{(n)}   d_3^{(n)}   ^{0}V_1^{(n)}   ^{0}V_2^{(n)}   ^{0}V_3^{(n)}
1   2           1           2           1,200           2,300           2,400
2   2           2           2           2,850           4,450           4,160
3   2           2           2           4,340           6,260           6,020
4   2           2           2           5,900           7,880           7,680
5   2           2           2           7,480           9,470           9,270
Discussion
The results indicate the best policies d_{i}^{(n)} for each n, where n = 1, 2, 3, 4, 5 and i = 1, 2, 3. Thus, we have obtained the best policies for the three states for five years. In addition to the best policies, the corresponding expected minimum costs are also provided. For instance, d_{1}^{(1)} = 2 with ^{0}V_{1}^{(1)} = 1,200 means that the best policy for state 1 in the first year is to see the doctor, and the corresponding expected minimum cost is one thousand two hundred Naira (N 1,200.00).

The results reveal that, except for d_{2}^{(1)} = 1 with the corresponding ^{0}V_{2}^{(1)} = 2,300, the best policy for every state at each time is Alternative 2. This is a kind of convergence to a stable policy, and the convergence is so smooth that further iteration beyond five years is not necessary. This type of convergence is not generally true of this iterative algorithm; however, it makes this illustration very interesting.
References
[1]. WHO, WHO Expert Committee on Leprosy, Seventh Report, Technical Report Series 874, 1997.
[2]. ILEP, ILEP Medical Bulletin No. 14: Operational Guidelines for the Introduction of New MDT Regimens for the Treatment of Leprosy, 1998.
[3]. Lechat M. F., The torments and blessings of the leprosy epidemometric model, Leprosy Review, 1981, 52, p. 187-196.
[4]. Bechelli L. M., Barrai I., Gallego P. et al., Correlation between leprosy rates in villages different distances apart, Bulletin of the World Health Organization, 1973, 48, p. 257-378.
[5]. Abubakar U. Y., A Study of Relapses in Leprosy: A Semi-Markov Approach, Abacus, The Journal of the Mathematical Association of Nigeria, 2004, 31(2B), p. 370-377.
[6]. Ross S. M., Introduction to Probability Models. Academic Press, Inc. New York, 1989.
[7]. Bhat U. N., Elements of Applied Stochastic Processes, John Wiley, New York, 1984.
[8]. Bellman R., Dynamic Programming. Princeton University Press, Princeton, 1957.
[9]. Bellman R., A Markov Decision Process, J. Math. Mech., 1957, 6, p. 679-684.
[10]. Abubakar U. Y., The Application of Markov and Semi-Markov Models to the Control of Catarrh and Leprosy Diseases, PhD Thesis (Unpublished), Federal University of Technology, Minna, 2004.