A Case-Based Retrieval System for Diabetic Patients Therapy
Stefania Montani 1, Riccardo Bellazzi 1, Luigi Portinale 2, Stefano Fiocchi 3 and Mario Stefanelli 1
Abstract. We propose a decision support tool based on the Case-Based Reasoning technique, meant to help physicians retrieve similar past cases that can suggest how to revise a diabetic patient's therapy scheme. A case is defined as a set of features collected during a visit. A taxonomy of prototypical situations, or classes, has been formalized; a set of cases belonging to these classes has been stored in a relational data-base. For each input case, the system allows the physician to find similar situations that already took place in the past, both for the same patient and for different ones. The reasoning process consists of two steps: 1) finding the classes to which the input case could belong; 2) finding the most similar cases from these classes, through a nearest neighbor technique. The tool is integrated in the EU funded T-IDDM (Telematic management of Insulin Dependent Diabetes Mellitus) project.

1 INTRODUCTION
Case-based Reasoning (CBR) [1, 2, 3] is a problem solving paradigm that utilizes the speciﬁc knowledge of previously experienced situations, called cases. Each case is usually described by a set of features, and is associated to a solution (or decision) and an outcome. CBR basically consists in retrieving past cases that are similar to the current one and in reusing (by, if necessary, adapting) past successful solutions; the current case can be retained and put into the base of cases.
CBR can hence be viewed as a methodology able to combine reasoning and learning steps and to produce solutions to problems by taking into account past experience.
The use of CBR technologies is particularly appealing in health care, where the capability of intelligently retrieving similar situations in a data-base of past cases can be a useful instrument for maintaining and improving the “intellectual capital” of the hospital institution over years. In several application contexts, and in particular in chronic patients’ management, a large amount of data is collected for a long time; in this situation, CBR may considerably decrease the probability of making the same wrong decision in similar situations, even when the decision-maker is different (different physicians in the same or in different hospitals).
In this paper we will describe an application of case-based retrieval techniques in the context of clinical monitoring of insulin-dependent diabetic patients. In current medical practice, diabetic patients are visited several times a year. During these visits the patient condition is assessed by analyzing the self-monitoring data; the therapeutic protocol is then revised. In the control visits, physicians are interested in detecting if the same situation has already been experienced by the patient or by a similar patient, and, in that case, in seeing what decision was taken and what the outcome of that decision was. This kind of support may be particularly interesting, considering that each diabetologist may visit more than a hundred patients every month; in this context, keeping track of the history of all patients is quite complex. For these reasons, our proposed tool was designed to select from the data-base past cases that are in some sense similar to the current one; the result of the search is shown to physicians for comparison and further adaptation to the current case.
1 Dipartimento di Informatica e Sistemistica, Università di Pavia, via Ferrata 1, I-27100 Pavia, Italy, stefania@aim.unipv.it, ric@aim.unipv.it, mstefa@aim.unipv.it
2 Dipartimento di Informatica, Università di Torino, corso Svizzera 185, I-10149 Torino, Italy, portinal@di.unito.it
3 I.R.C.C.S. Policlinico S. Matteo, P.le Golgi 2, I-27100 Pavia, Italy, gidan@ipv36.unipv.it
A crucial capability of the proposed tool is the possibility of analyzing the overall history related to the retrieved cases, showing the sequence of decision steps that precede and follow the retrieved situation. The paper is structured as follows: Section 2 presents, from a methodological point of view, the peculiarities of the CBR system herein proposed and its relationships with IDA. Section 3 gives a short description of the application domain, while Sections 4 and 5 describe in detail the current implementation of the system. Section 6 presents some results obtained on a data-base of 100 cases collected at the Policlinico S. Matteo Hospital of Pavia. Finally, Section 7 draws some conclusions and outlines some directions for future work.

2 CBR AND IDA: THE SYSTEM DESIGN
Several medical applications may require the improvement of classical CBR solutions, in order to explicitly take into account the prior domain knowledge. In fact, the “intelligence” of a CBR system resides in its case base, since the knowledge is implicit in cases [4].
While classical approaches restrict the role of prior knowledge to the task of feature selection, a further step is to encode background knowledge as a set of prototypical cases. Such knowledge may be helpful in all phases of the CBR process, from feature selection to adaptation. In particular, case retrieval, the most computationally expensive step of the method, can gain in efficiency from a knowledge-based structuring of the case-base. Rather interestingly, case retrieval involves different substeps that are related to an intelligent analysis of the available data (see footnote 4); such steps may be summarized as follows [1]:
1. situation assessment, to elaborate case description and to clarify the relevant context to work with;
2. case memory search, to retrieve partially matching cases;
3. best case selection.
In biomedical applications, situation assessment may be exploited to identify the contextual framework on which the actual retrieval has to take place; therefore, this step may be very critical in order to focus the search of the case library on relevant parts. Moreover, as a by-product, retrieval may perform a context recognition that can be useful for decision making.
4 CBR can be viewed as a generalization of Instance-Based learning [7]

Situation assessment and case search are strongly influenced by the organization structures on which the case memory is based.
Such structures range from flat memories where cases are stored sequentially in a very simple way, such as in lists or feature vectors, to more complex organizations based on graphs and trees like shared feature nets and discrimination nets [2]. Obviously there is a trade-off between memory structure maintenance and computational cost of retrieval; in fact, while simple structures like those used in flat memory organizations are easy to update, more complex structures may need substantial reorganization when an update takes place.
In our application we have implemented a method to make case retrieval more flexible, by structuring the case memory through the partitioning induced by a set of prototypical classes. A taxonomy representing such classes is assumed to be available and cases are suitably associated with each basic class. Background knowledge is hence exploited to define a retrieval strategy in which the situation assessment step is performed through a classification procedure that associates the current case with a class in the taxonomy.
In more detail, we consider a tree-structured taxonomic knowledge. Each class in the hierarchy is a prototypical description of the set of problems or situations it summarizes and is connected to classes representing its specializations through sub-class links.
Classes corresponding to leaves in the taxonomy are called basic classes: each case in the case base is associated to a unique basic class and can be retrieved through such a class (see Figure 1).
Figure 1 shows a schematic view of this organization.
[Figure 1. Case Base Organization. Diagram labels: GENERIC CASE (root class); C1, C2, ..., CN (basic classes); CASE BASE.]
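
The tree-structured taxonomy described above can be pictured with a small data structure. The following is only a minimal sketch, not the system's actual implementation: the names `TaxonomyClass`, `basic_classes` and `case_base` are hypothetical, and the toy hierarchy simply mirrors the labels of Figure 1.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaxonomyClass:
    """A node of the tree-structured taxonomy (hypothetical name)."""
    name: str
    children: list[TaxonomyClass] = field(default_factory=list)

    def is_basic(self) -> bool:
        # Basic classes are the leaves of the hierarchy.
        return not self.children

    def basic_classes(self) -> list[TaxonomyClass]:
        # Collect every leaf reachable from this node, left to right.
        if self.is_basic():
            return [self]
        leaves = []
        for child in self.children:
            leaves.extend(child.basic_classes())
        return leaves


# Toy hierarchy mirroring Figure 1: a generic root with three basic classes.
root = TaxonomyClass(
    "GENERIC CASE",
    [TaxonomyClass("C1"), TaxonomyClass("C2"), TaxonomyClass("C3")],
)

# The case base is partitioned by basic class: each case is filed under one leaf.
case_base = {leaf.name: [] for leaf in root.basic_classes()}
case_base["C2"].append({"id": 1})
```

Indexing the case base by leaf name is what lets retrieval be restricted to the cases of one basic class once the input case has been classified.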

The root class of the hierarchy (the CASE class) represents the most general class describing the whole set of cases we may store into the case base. Cases in this organization represent instances of basic classes, with the constraint that a given case can be an instance of at most one class (see footnote 5).
Finally, a single case is viewed as a collection of features or variables describing the problem represented by the case, a possible solution applied in the case under examination and the outcome obtained by applying the solution itself.
More formally, a case C can be viewed as a triple

$$C = \{\langle V : v\rangle,\ \langle S : s\rangle,\ \langle O : o\rangle\}$$

where v is a vector of values for the set of descriptive variables V, s is the solution schema selected from the solution space S and o is the outcome of the solution selection in the space of the possible outcomes O. In general the outcome is expressed as a change in some of the variables in V, but it may also be more qualitative, such as a positive or negative outcome.
In the following, we will apply the above presented methodology to the problem of IDDM patient monitoring.
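
As an illustration of the triple defined above, a case can be held in a small record type. This is a sketch under stated assumptions: the class name `Case`, its field names and the example values are all hypothetical and are not taken from the system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    """A case C = {<V : v>, <S : s>, <O : o>} filed under one basic class."""
    features: dict                     # v: values of the descriptive variables V
    solution: Optional[dict] = None    # s: chosen solution, None for a new input case
    outcome: Optional[dict] = None     # o: observed outcome, None until it is known
    basic_class: Optional[str] = None  # a case is an instance of at most one class

    def is_solved(self) -> bool:
        # Only cases with a recorded solution and outcome can be reused.
        return self.solution is not None and self.outcome is not None


# Illustrative values only, not data from the paper.
past = Case(
    features={"sex": "female", "age": 12.0},
    solution={"number of injections": 3},
    outcome={"HbA1c": 7.5},
    basic_class="stabilized metabolism",
)
new = Case(features={"sex": "male", "age": 14.0})
```

A new input case starts with only its features filled in; the solution and outcome are added when the case is retained.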

3 INFORMATION TECHNOLOGIES FOR MANAGING DIABETES MELLITUS
Diabetes Mellitus is one of the major chronic diseases in the industrialized countries, characterized by an insufficient secretion of insulin by the pancreatic beta-cells. In particular, patients affected by Insulin Dependent Diabetes Mellitus (IDDM) need exogenous insulin injections to regulate glucose metabolism, thus preventing hyperglycemia, ketoacidosis and coma. Moreover, IDDM may lead to long-term invalidating complications, such as nephropathies, neuropathies and retinopathies. In 1993 the DCCT prospective clinical trial [5] proved that complications may be prevented or at least delayed by keeping the blood glucose level (BGL) within normal ranges.
This result can be achieved by the application of Intensive Diabetes Therapy (IDT), consisting of three to four insulin injections a day. To apply IDT, an IDDM patient has to measure BGL three to four times a day, before insulin injections and meals, and to record BGL and insulin dose data in hand-written diaries. Every two to four months, the physician visits the patient and evaluates the diary data, together with additional parameters coming from physical examination and from blood analysis. On the basis of these indicators, therapy has to be revised, in order to correct any metabolic alterations. To help patients and physicians in the phases of data collection, data analysis and therapy revision, the European Community has funded the T-IDDM (Telematic management of Insulin Dependent Diabetes Mellitus) project [6], within which we have implemented two modules, a Patient Unit and a Medical Unit, connected by a telecommunication system. The Patient Unit is a set of tools to be used at home, meant to guide the patient in the self-monitoring activity and to allow the automatic download of the BGL data collected by commercial instruments for BGL measurement, such as reflectometers. Data can then be sent to the Medical Unit, which provides the physician with data analysis and decision support tools, to help in the identification of the patient's problems and therefore in the revision of the therapy.
Finally the new therapeutic protocol chosen after the visit can be sent back from the hospital to the Patient Unit.
During a visit, the physician may identify a metabolic condition similar to one that already occurred to the same patient, or to another patient she is following. Retrieving the therapy scheme adopted on that occasion, and evaluating whether it proved to be helpful or not, may give a useful indication on how to solve the current situation. Therefore we are providing the Medical Unit with a Case Based Reasoning (CBR) tool for decision support.

4 REPRESENTING AND RETRIEVING CASES IN THE DIABETES DOMAIN
4.1 Cases representation

Our approach is based on an analysis of the decision making process involved in the periodic control visits. In this context, the variables of set V (see Section 2) are extracted from three sources of information:
- Historical characterization, summarized by variables that generically describe the patient, such as sex, age, distance from diabetes onset, and so on.
- Mid-term metabolic control, expressed through variables collected during the visits, like weight and Glycated Hemoglobin (HbA1c) values (see footnote 6).
- Short-term (day-by-day) metabolic control, derived from the collection of home monitoring data, like BGL and insulin dose data.
On the basis of this information, the physician decides the therapeutic protocol to be applied, as well as the diet and/or some recommendations about the patient's life-style. The outcome of the therapeutic decision is then evaluated at the following visit, usually considering HbA1c and the number of hypoglycemic events.
In more detail, we have defined 27 features for pediatric patients, 21 nominal and 6 linear (continuous); these features cover the three sources of information listed above, and in particular 11 refer to the historical characterization, 13 to the mid-term metabolic control and 3 to the short-term one. It is important to remark that some features are abstractions of raw data; for example, we exploited the control trend and the requirement trend features to describe the variation of requirement and HbA1c during the last months. The set can be refined by applying temporal abstractions on the patients' monitoring data (see some details in [9]).
The decision is expressed considering only insulin protocols. In particular, it is represented by an array containing the insulin types and doses to be injected.
Finally, as happens in clinical practice, the case outcome is summarized by the HbA1c value and the number of hypoglycemic events at the following visit.

As described in Section 2, the case-base has been structured into a set of classes, whose taxonomical organization is shown in Figure 2 (see footnote 7). The classes describe the set of prototypical situations that may occur during the periodic control visits.
The classification step is realized on a subset of the case features, and in particular on the set that has been considered "a priori" more useful to discriminate the classes. The following features have been chosen for this task: sex, job, puberal stage, other chronic diseases, distance from onset, weight excess, diet, control trend, requirement trend, metabolic control, hypoglycemias, physical activity.

[Figure 2. Taxonomy of classes of prototypical situations that may happen during monitoring of pediatric IDDM patients. Node labels: Patient's problems; Puberty; Puberty with associated diseases; Typical puberal problems; Behavioural problems; No motivation; Falsifier; Change life style; Clinical course; Clinical remission; Insulin resistant; Stabilized metabolism; Nutritional disorders; Anorexia; Bulimia; Other alterations; Celiac disease; Hormonal disorders.]

5 This constraint, together with the assumption of a strict hierarchy among classes, can clearly be a limitation for some applications (when a case may belong to more than one class or when a class may be a subclass of more than one class); however, this kind of approach is reasonable in our application domain and greatly simplifies both retrieval and classification.
6 HbA1c expresses the percentage of hemoglobin that is bound to glucose. Since the half-life of red blood cells (and hence of hemoglobin) is around 60 days, HbA1c is an index of the metabolic control achieved in the last two months: if HbA1c is higher than normal, this means that the mean level of BGL has been excessively high in the last monitoring period.


4.2 Retrieving similar cases in IDDM patients monitoring
Classification. As explained above, the retrieval phase is not carried out on the whole case data-base; a classification algorithm reduces the search space to the most probable classes to which the input case could belong.
Various approaches are now available for coping with classification tasks [8]. Among them, we have chosen a Bayesian classification method that allowed us to explicitly consider the available prior knowledge, thus making up for reduced training data sets.
In particular, we have used a Naive Bayes strategy [10], that makes the hypothesis of conditional independence among the different features given a certain class. Although this approach makes such a strong assumption, it is known to be quite robust in a variety of situations [10, 11]. Moreover, in the same Bayesian context it will be possible to improve the classification performance by moving towards a Bayesian Network representation [12].
The probability that a case belongs to class $c_i$, given that the set of its features $f = \{f_1, \ldots, f_M\}$ is $\hat{f}$, may be calculated as:

$$P(c_i \mid f = \hat{f}) \propto p(c_i) \prod_{j=1}^{M} p(f_j = \hat{f}_j \mid c_i)$$

The method classifies a case as belonging to the class that maximizes $P(c_i \mid f = \hat{f})$.
The conditional probabilities $p(f_j = \hat{f}_j \mid c_i)$ are calculated by applying the Bayesian update formula for discrete distributions [13, 14]; in particular, we used a re-parameterized version of the update formula known as m-estimate of probability [8], that modifies the prior knowledge with the information coming from the cases of the data-base as follows:

$$p(f_j = \hat{f}_j \mid c_i) = \frac{m\,\hat{p}_{ij} + N_{ij}}{m + D_i}$$

where $N_{ij}$ represents the number of cases in the data-base of class i whose feature $f_j$ assumes the value $\hat{f}_j$, while $D_i$ is the total number of cases in class i. The medical knowledge is synthesized by the prior probability distribution ($\hat{p}_{ij}$), whose reliability is expressed by the implicit number of samples m. This means that the expert judging the prior states the probability by relying on a fictitious sample size of m; thus the larger m is, the greater the confidence of the expert in the prior.
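
The classification rule and the m-estimate update above can be put together in a few lines. The following is a minimal sketch, not the system's code: the function names, the dictionary layout of the priors and the toy data are assumptions made for illustration, and all priors are assumed to be strictly positive so that logarithms are defined.

```python
import math


def m_estimate(n_ij, d_i, prior, m):
    """p(f_j = value | c_i) = (m * prior + N_ij) / (m + D_i)."""
    return (m * prior + n_ij) / (m + d_i)


def classify(case, cases_by_class, class_priors, feature_priors, m):
    """Return (best_class, scores), where scores are unnormalised log-posteriors.

    cases_by_class: {class: [feature dict, ...]}             stored cases
    class_priors:   {class: p(c_i)}                          assumed > 0
    feature_priors: {class: {feature: {value: prior}}}       expert priors, assumed > 0
    """
    scores = {}
    for c, stored in cases_by_class.items():
        d_i = len(stored)  # D_i: number of stored cases in class c
        log_p = math.log(class_priors[c])
        for feat, value in case.items():
            # N_ij: stored cases of class c sharing this feature value
            n_ij = sum(1 for s in stored if s.get(feat) == value)
            prior = feature_priors[c][feat][value]
            log_p += math.log(m_estimate(n_ij, d_i, prior, m))
        scores[c] = log_p
    best = max(scores, key=scores.get)
    return best, scores


# Toy data, purely illustrative.
cases_by_class = {
    "A": [{"diet": "free"}, {"diet": "free"}, {"diet": "controlled"}],
    "B": [{"diet": "controlled"}, {"diet": "controlled"}],
}
class_priors = {"A": 0.5, "B": 0.5}
uniform = {"diet": {"free": 0.5, "controlled": 0.5}}
feature_priors = {"A": uniform, "B": uniform}
best, scores = classify({"diet": "free"}, cases_by_class, class_priors, feature_priors, m=2)
```

Working in log space avoids numerical underflow when many features are multiplied; it does not change which class attains the maximum.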
7 The taxonomy should of course be revised for adults. Note for example the presence of the puberal stage class and the absence of cardiovascular complications, which are frequent in adult patients.

Table 1. The features defining a case

feature name               type                  values

historical characterization
sex                        nominal               [male, female]
height                     linear (continuous)
age                        linear (continuous)
neuropathies               nominal               [yes, no]
other chronic diseases     nominal               [unrelated, related to hyperglycemia, related to hypoglycemia, absent]
puberal stage              nominal               [infant, beginning puberal, puberal, adult]
job                        nominal               [not-sedentary-worker, sedentary-worker, not-sedentary-student, sedentary-student]
retinopathies              nominal               [yes, no]
anti insulin antibodies    nominal               [yes, no]
nephropathies              nominal               [yes, no]
distance from onset        nominal               [short, long]

mid-term features
weight                     linear (continuous)
weight excess              nominal               [overweight, underweight, normal]
HbA1c                      linear (continuous)
other hormonal disorders   nominal               [yes, no]
requirement trend          nominal               [increase, decrease, stationarity]
control trend              nominal               [increase, decrease, stationarity]
regular insulin            nominal               [Regular, Actrapid]
NPH insulin                nominal               [monotard, protaphane, intermediate]
premixed insulin           nominal               [isophane, actraphane]
premixed ratio             nominal               [90/10, 80/20, 70/30, 60/40, 50/50]
number of injections       linear (continuous)
diet                       nominal               [free, prescribed, controlled]
requirement                linear (continuous)

short-term features
metabolic control          nominal               [good, hypoglycemias, hyperglycemias, instable]
hypoglycemias              nominal               [none, some, many]
physical activity          nominal               [none, intensive-continuous, medium-continuous, light-continuous, intensive-occasional, medium-occasional, light-occasional]

4.3 Retrieval
Physicians will be allowed to retrieve cases of the same patient (intra-patient retrieval) and of other patients (inter-patient retrieval). To keep track of the case history, each case belonging to the same patient is connected to the previous and to the following one by two chains of pointers: in this way, it is possible to retrieve the whole history of the patient, thus verifying the outcomes of the therapeutic choice on the metabolic control, in both the short and the long term. Once the case has been classified, similar cases belonging to the most probable class are retrieved through a nearest-neighbor technique, implementing the Heterogeneous Euclidean-Overlap Metric (HEOM) [15]:

$$HEOM(x, y) = \sqrt{\sum_{f} d_f(x, y)^2}$$

where
- $d_f(x, y) = 1$, if x or y are missing
- $d_f(x, y) = overlap(x, y)$, if f is a symbolic feature (i.e. 0 if x = y, 1 otherwise)
- $d_f(x, y) = \frac{|x - y|}{range_f}$, if f is a numeric and continuous feature.

The search for similar cases can be extended to the set of most probable classes by using the Heterogeneous Value Difference Metric (HVDM) formula [15], which is able to take into account numeric and symbolic variables as well:

$$HVDM(x, y) = \sqrt{\sum_{f} d_f(x, y)^2}$$

where
- $d_f(x, y) = 1$, if x or y are missing
- $d_f(x, y) = norm_f(x, y)$, if f is a symbolic variable
- $d_f(x, y) = \frac{|x - y|}{4\sigma_f}$, if f is a numeric and continuous variable, $\sigma_f$ being the standard deviation of f as in [15].
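
The two distance functions can be sketched as follows. This is a minimal illustration, not the system's code: the function names and the dictionary-based case layout are assumptions, a missing value is represented by `None` or an absent key, and the symbolic term of HVDM uses the normalized value difference of [15], i.e. the sum over classes of |N_fxc / N_fx - N_fyc / N_fy|. Both symbolic values are assumed to occur at least once in the counts, and all ranges and standard deviations are assumed to be positive.

```python
import math
from collections import defaultdict


def heom(x, y, numeric_ranges):
    """HEOM between two cases given as {feature: value} dicts.

    numeric_ranges: {feature: max - min} for the numeric features;
    any feature not listed there is treated as symbolic.
    """
    total = 0.0
    for f in set(x) | set(y):
        a, b = x.get(f), y.get(f)
        if a is None or b is None:
            d = 1.0                                # missing value
        elif f in numeric_ranges:
            d = abs(a - b) / numeric_ranges[f]     # range-normalised difference
        else:
            d = 0.0 if a == b else 1.0             # overlap metric
        total += d * d
    return math.sqrt(total)


def value_class_counts(labelled_cases, feature):
    """Count N_fxc (value x in class c) and N_fx (value x overall) for one feature."""
    n_fxc = defaultdict(lambda: defaultdict(int))
    n_fx = defaultdict(int)
    for case, c in labelled_cases:
        v = case.get(feature)
        if v is not None:
            n_fxc[v][c] += 1
            n_fx[v] += 1
    return n_fxc, n_fx


def norm_f(x, y, n_fxc, n_fx, classes):
    """Sum over the considered classes of |N_fxc / N_fx - N_fyc / N_fy|."""
    return sum(abs(n_fxc[x][c] / n_fx[x] - n_fxc[y][c] / n_fx[y]) for c in classes)


def hvdm(x, y, sigmas, counts, classes):
    """HVDM; sigmas: {numeric feature: std dev}, counts: {symbolic feature: (N_fxc, N_fx)}."""
    total = 0.0
    for f in set(x) | set(y):
        a, b = x.get(f), y.get(f)
        if a is None or b is None:
            d = 1.0
        elif f in sigmas:
            d = abs(a - b) / (4 * sigmas[f])
        else:
            d = norm_f(a, b, *counts[f], classes)
        total += d * d
    return math.sqrt(total)
```

HEOM needs only the range of each numeric feature, whereas HVDM needs per-class value counts, which is why the latter is the natural choice when the search spans several classes.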

For a symbolic variable f, the normalized difference used by HVDM is defined as

$$norm_f(x, y) = \sum_{c} \left| \frac{N_{fxc}}{N_{fx}} - \frac{N_{fyc}}{N_{fy}} \right|$$

where $N_{fxc}$ is the number of cases in which f = x in class c, and $N_{fx}$ is the number of cases in which f = x in all the considered classes; the same applies to value y.
When the number of cases in the data-base is very high, a Pivoting strategy [16] can be adopted. The mechanism consists in:
- computing the distance between a representative case and all the other cases of the class at hand;
- computing the distance between the representative case and the input case;
- estimating the distance between the input case and all the remaining cases in the class by using the triangle inequality, thus finding a lower and an upper bound for the distance value.
The intervals whose lower bound is higher than the minimum of all the upper bounds can be pruned. Then the following iterative procedure is applied:
1. Initialization: BESTp = $\infty$ and SOL = {}
2. Choose the Pivot case as the minimum of the midpoints of the intervals; compute the distance between the input case and the Pivot (DIST); set BEST = DIST
3. If BESTp > BEST set SOL = PIVOT and BESTp = BEST
4. If BESTp = BEST set SOL = {PIVOT, SOL}
5. Prune the intervals whose lower bound is bigger than BEST, and remove the Pivot from the set of cases
6. Back to step 2.
We are currently testing an algorithm able to adapt the past effective solutions to the current situation, by implementing a voting strategy. The therapeutic adjustments applied in the past cases are weighted proportionally to the opposite of the distance between the retrieved situation and the current one. The more similar the cases are, the more similar the therapy revision suggested by the system will be to the one adopted in the past. Our tool will just propose a possible solution, but the physician will decide whether to accept it or to edit a new one.

5 IMPLEMENTATION
The CBR decision support tool is integrated in the T-IDDM Web-based environment, managed by lispweb, an HTML/HTTP server written in Common Lisp (see [17] for further details). The interaction between the CBR reasoning tool and the user (definition of a new case, classification and retrieval) takes place through a set of HTML pages, containing dynamically generated information, such as multi-column tables and forms. Figure 3 shows the list of retrieved cases from the "celiac disease" class after the HEOM distance formula has been applied to an example case. Each case is summarized by the HbA1c value measured at the control visit time and by the insulin protocol adopted since the case date; the new prescribed protocol represents the case solution, whose outcome is synthesized by the HbA1c measured at the following periodic visit. The search space for retrieval is the collection of cases stored in an Oracle(TM) database, whose table structure mirrors the class taxonomy; each leaf of the taxonomy tree matches a table, whose columns correspond to the case features, and whose rows are instances of the class at hand.

Figure 3. Output of the case retrieval.

6 RESULTS
An analysis of the system performance has been carried out on a data set that was made available by the Endocrinological Unit of the Pediatric Department of the Policlinico S. Matteo Hospital of Pavia.
About 100 cases, extracted from the history of 11 patients, have been used to construct the initial case-base; 3 physicians were interviewed for defining the prior probability distributions. In order to validate our classification approach, we first applied a cross-validation technique on the case data-base. Each case has been removed from its class and re-classified; the probability distributions have been updated every time, so that the case at hand was not taken into account.
Classification proved to be correct in more than 90% of the cases, a result that improves to 96% when the class "falsifier" is excluded. This class identifies young patients that report wrong values of Blood Glucose Levels, in order to avoid complaints by parents and physicians about their life-style. It is hence clear that the classification step is quite difficult in this situation, even for physicians. In any case, the poor performance with this class of patients suggests that the prior knowledge should be revised, in order to express more clearly which combinations of feature values are relevant. As a second step, we have automatically generated more than 10000 random cases, starting from the probability distribution of the case-base. The results of validation on this expanded data-base proved to be encouraging as well: an error of less than 10% occurred in the validation of each class, excluding again class "falsifier", where the error rate reached 14%. The results of the cross-validation and validation experiments are shown in Table 2. As a further verification step, we plan to conduct a prospective validation on real patients' cases, to compare the Bayesian classifier suggestions with a physician's opinion.
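
The cross-validation described above, in which each case is removed, the distributions are re-estimated without it, and the case is then re-classified, amounts to a leave-one-out loop. The sketch below is only an illustration: `train` and `predict` are hypothetical callables standing in for the Bayesian classifier, and the toy majority-class model is not the method used in the paper.

```python
from collections import Counter


def leave_one_out_errors(labelled_cases, train, predict):
    """Per-class (errors, total) when each case is re-classified without itself.

    labelled_cases: list of (features, true_class)
    train(cases) -> model ; predict(model, features) -> class   (hypothetical callables)
    """
    results = {}
    for i, (features, true_class) in enumerate(labelled_cases):
        rest = labelled_cases[:i] + labelled_cases[i + 1:]  # hold the case out
        model = train(rest)                                 # re-estimate without it
        errors, total = results.get(true_class, (0, 0))
        wrong = predict(model, features) != true_class
        results[true_class] = (errors + int(wrong), total + 1)
    return results


# Toy stand-in classifier: always predict the majority class of the training cases.
def train_majority(cases):
    return Counter(c for _, c in cases).most_common(1)[0][0]


def predict_majority(model, features):
    return model


toy = [({}, "A"), ({}, "A"), ({}, "A"), ({}, "B")]
report = leave_one_out_errors(toy, train_majority, predict_majority)
```

Reporting errors as (errors, total) pairs per class matches the "ratio of incorrectly classified cases" format used for the error rates.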
A final remark must be made on computational efficiency. The classification time is of the order of milliseconds, while retrieval with the HEOM formula ranges from 2 seconds (in the 100 cases data-base) to 23 seconds (in the 10000 cases data-base). The complexity of HVDM is known to be O(mnC) [15], where m is the number of features, n the number of cases and C the number of classes (proportional to the number of cases). Retrieval time with the HVDM algorithm ranges from 5 seconds (2 classes in the 100 cases data-base, i.e. about 20 cases), to more than 500 seconds on the whole 10000 cases data-base. The application of the Pivoting algorithm provides a speedup in the retrieval task, as the computation time grows linearly with the number of cases in the search space. In this situation retrieval time ranges from 3 seconds (on 1 class of the 100 cases data-base, i.e. about 10 cases) to 170 seconds on the entire 10000 cases data-base. Hence, when the most suitable algorithm is selected, the system proves to be efficient in producing results for the user.
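
The pruning behind the Pivoting speedup can be sketched by following the steps listed in Section 4.3. This is a minimal illustration under stated assumptions, not the implementation of [16]: the function name is hypothetical, `dist` must be a true metric (so that the triangle inequality holds), the list of cases is assumed to be non-empty, and the distances from the representative case, which a real system would precompute and store, are recomputed here for brevity.

```python
import math


def pivoting_retrieval(query, cases, representative, dist):
    """Nearest case(s) to `query`, pruning with triangle-inequality bounds.

    Returns (solutions, number of distances to the query actually computed).
    """
    d_rq = dist(representative, query)
    computed = 1
    bounds = {}
    for i, c in enumerate(cases):
        d_rc = dist(representative, c)
        # |d(r,q) - d(r,c)| <= d(q,c) <= d(r,q) + d(r,c)
        bounds[i] = (abs(d_rq - d_rc), d_rq + d_rc)
    # Initial pruning: drop intervals lying entirely above the smallest upper bound.
    min_upper = min(up for _, up in bounds.values())
    bounds = {i: b for i, b in bounds.items() if b[0] <= min_upper}

    best_prev, sol = math.inf, []                    # step 1
    while bounds:
        # step 2: the pivot is the case with the smallest interval midpoint
        pivot = min(bounds, key=lambda i: sum(bounds[i]) / 2)
        best = dist(query, cases[pivot])
        computed += 1
        if best < best_prev:                         # step 3
            sol, best_prev = [cases[pivot]], best
        elif best == best_prev:                      # step 4
            sol.append(cases[pivot])
        # step 5: remove the pivot and prune intervals that cannot beat BEST
        del bounds[pivot]
        bounds = {i: b for i, b in bounds.items() if b[0] <= best}
    return sol, computed                             # loop = step 6


# Toy one-dimensional example with the Euclidean metric.
points = [(0.0,), (1.0,), (5.0,), (9.0,)]
nearest, n_dist = pivoting_retrieval((1.2,), points, (0.0,), math.dist)
```

In the toy run the two distant points are discarded by their lower bounds alone, so their distance to the query is never computed; this is where the saving comes from when distances are expensive.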
Table 2: Cross-validation and Validation Error rates, expressed in terms of the ratio of incorrectly classified cases and related percentages.

Class                              Error rate, cross-validation   Error rate, validation
                                   on real data                   on generated data
stabilized metabolism              0                              69/1014 (7%)
insulin resistant                  0                              0
clinical remission                 0                              63/1011 (6%)
no motivation                      0                              0
falsifier                          6/9 (60%)                      147/1009 (14%)
change life style                  0                              89/1011 (9%)
celiac disease                     0                              72/1014 (7%)
hormones                           1/6 (15%)                      0
bulimia                            0                              0
anorexia                           0                              0
puberty with associated diseases   1/3 (30%)                      29/1003 (3%)
typical puberal problems           0                              98/1005 (10%)

7 FINAL REMARKS AND FUTURE RESEARCH DIRECTIONS

The proposed CBR system represents a melting pot where different IDA techniques are integrated for decision support. Classification, NN-retrieval, qualitative and temporal abstractions are merged for the purpose of coping with the complex problem of IDDM patient management. We believe that this tool will be useful in clinical practice, provided that it is fully integrated with the hospital information system. Of course, several issues must still be addressed, both from a methodological and a technical point of view.
i) In our proposed approach we assume that some taxonomic knowledge about classes of problems to be solved is available because an expert has provided it. When a sufficiently large database of cases is available, we plan to compare the performance of the proposed classification system with the one induced by means of machine learning techniques able to perform hierarchical structuring [18].
Another direction that we plan to investigate is related to the elicitation of prior knowledge by means of first order rules; in this case, inductive logic programming techniques could be used to perform learning and classification. Medical knowledge could therefore be integrated with information retrieved from the data, in order to identify the most discriminating features for the classification task.
ii) Prior probability elicitation still remains a crucial problem in the Bayesian approach. In our future research efforts, we plan to derive the prior probability ($\hat{p}_{ij}$) from the concept of "relevance" of a variable. The relevance $r_{ij}$ represents the importance that the variable j has in classifying a case as belonging to class i, and assumes a value between 0 and 1, that can be chosen on the basis of medical knowledge. In terms of knowledge acquisition this approach is more intuitive than asking the physician to quantify directly the values of $\hat{p}_{ij}$, as we have done so far. We propose an interpretation of the relevance in terms of the entropy $H_{ij}$ of variable j in class i:

$$H_{ij} = (1 - r_{ij})\,H_{max} = -\sum_{k=1}^{K} p_{ij}^{k} \log_2 (p_{ij}^{k})$$

where $H_{max}$ is the maximum value of $H_{ij}$, and K is the number of discrete states of the j-th variable.
The larger the value of $H_{ij}$, the more uniform the distribution of the values of feature $f_j$ in class i, and the less useful $f_j$ is for classifying a case. Given the value of $r_{ij}$, and therefore of $H_{ij}$, it is possible to find a solution of the previous equation when an order among the $p_{ij}$ is known, being derived from medical knowledge; therefore a prior probability can be chosen according to the relevance information.
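
To make the relation between relevance and entropy concrete, the sketch below computes the relevance implied by a given distribution and, in the other direction, finds one prior with a requested relevance. It is only an illustration under strong assumptions: the function names are hypothetical, $H_{max}$ is taken to be $\log_2 K$ (the entropy of the uniform distribution over K states, with K at least 2), and the inverse step restricts the search to a one-parameter family in which a single preferred state receives mass q and the others share the rest equally. The paper only requires an order among the probabilities, so this family is one possible choice, not the authors' method.

```python
import math


def entropy(p):
    """Shannon entropy in bits; states with zero probability contribute nothing."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)


def relevance(p):
    """r = 1 - H / H_max, with H_max = log2(K) for K >= 2 discrete states."""
    return 1 - entropy(p) / math.log2(len(p))


def peaked_prior(r, k, tol=1e-9):
    """One prior with relevance r: mass q on the preferred state, the rest shared equally."""
    lo, hi = 1 / k, 1.0          # q = 1/k gives r = 0, q = 1 gives r = 1
    while hi - lo > tol:
        q = (lo + hi) / 2
        p = [q] + [(1 - q) / (k - 1)] * (k - 1)
        if relevance(p) < r:
            lo = q               # still too uniform: concentrate more mass
        else:
            hi = q
    q = (lo + hi) / 2
    return [q] + [(1 - q) / (k - 1)] * (k - 1)


prior = peaked_prior(0.5, 3)
```

Bisection works here because, within this family, the relevance grows monotonically as q moves from 1/K to 1.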
The last step is to derive m, in order to apply the m-estimate update formula.
iii) To be really helpful in the decision-making process, a CBR system should suggest a suitable solution to the current problem, by adapting the solutions of the retrieved cases that proved to be effective in the past. In our implementation the adaptation strategy is still under development; adaptation will also take into account the problem of keeping past solutions up to date with changing clinical standards. Moreover, a careful evaluation of pros and cons with respect to rule-based systems will be considered; the data collection has already started within the EU project T-IDDM.

ACKNOWLEDGEMENTS
We would like to thank the anonymous referees for their comments, which helped to improve this paper.

REFERENCES
[1] D.B. Leake J.L. Kolodner. 'A tutorial introduction to CBR', In: Case Based Reasoning: Experiences, Lessons and Future Directions, AAAI Press, 31-65, 1996.
[2] J.L. Kolodner. Case-Based Reasoning, Morgan Kaufmann, 1993.
[3] A. Aamodt, E. Plaza. 'Case-Based Reasoning: foundational issues, methodological variations and systems approaches', AI Communications, 7, 39-59, (1994).
[4] C. Giraud-Carrier, S. Corley. 'Inductive CBR for customer support', Proceedings of the Second International Conference of Practical Application of Knowledge Discovery and Data Mining, Howard F. Arner Jr. and Neil Mackin eds., The Practical Application Company Ltd, London, 131-141, 1998.
[5] The Diabetes Control and Complication Trial Research Group, 'The effect of intensive treatment of diabetes on the development and progression of long-term complications in insulin-dependent diabetes mellitus', The New England Journal of Medicine, 329, 977-86, (1993).
[6] R. Bellazzi, C. Cobelli, E. Gomez and M. Stefanelli, 'The T-IDDM Project: Telematic management of Insulin Dependent Diabetes Mellitus', in: Health Telematics '95, M. Bracale and F. Denoth eds., 271-276, 1995.
[7] D. Aha, D. Kibler, M. Albert, 'Instance-based learning algorithms', Machine Learning, 6, 37-66, (1991).
[8] T. Mitchell, Machine Learning, McGraw Hill, 1997.
[9] R. Bellazzi, C. Larizza and A. Riva, 'Cooperative intelligent data analysis: an application to diabetic patients management', In: Intelligent Data Analysis in Medicine and Pharmacology, N. Lavrač, E. Keravnou, Blaž Zupan eds., Kluwer, 81-98, 1997.
[10] I. Kononenko, 'Inductive and Bayesian learning in medical diagnosis', Applied Artificial Intelligence, 7, 317-337, (1993).
[11] I. Zelic, I. Kononenko, N. Lavrač, V. Vuga, 'Induction of decision trees and Bayesian classification applied to diagnosis of sport injuries', Proceedings of IDAMAP 97 workshop, IJCAI 97, Nagoya, Japan, 61-67, 1997.
[12] D. Heckerman, 'Bayesian networks for data mining', Data Mining and Knowledge Discovery, 1, 79-119, (1997).
[13] D. Spiegelhalter, A. Dawid, S. Lauritzen, R. Cowell, 'Bayesian Analysis in Expert Systems', Statistical Science, 8, 219-283, (1993).
[14] A. Riva, R. Bellazzi, 'Learning temporal probabilistic causal models from longitudinal data', Artificial Intelligence in Medicine, 8, 217-234, (1996).
[15] D.R. Wilson, T.R. Martinez, 'Improved heterogeneous distance functions', Journal of Artificial Intelligence Research, 6, 1-34, (1997).
[16] L. Portinale, P. Torasso, D. Magro, 'Selecting most adaptable diagnostic solutions through pivoting-based retrieval', Lecture Notes in Artificial Intelligence, 1266, 277-88, (1997).
[17] A. Riva, R. Bellazzi, M. Stefanelli, 'A web-based system for the intelligent management of diabetic patients', MD Computing, 14, 360-64, (1997).
[18] B. Zupan, M. Bohanec, J. Demsar, I. Bratko, 'Feature transformation by function decomposition', IEEE Expert (to appear).
