
Case Based Reasoning using Statistical Models

The R package case-based-reasoning provides an R interface for case-based reasoning using machine learning methods.

Introduction: What is Case Based Reasoning?

Case-Based Reasoning (CBR) is an artificial intelligence (AI) and problem-solving methodology that leverages the knowledge and experience gained from previously encountered situations, known as cases, to address new and complex problems. CBR relies on the principle that similar problems often have similar solutions, and it focuses on identifying, adapting, and reusing those solutions to solve new problems.

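The CBR process consists of four main steps: retrieve similar past cases, reuse their solutions, revise the proposed solution where needed, and retain the new case for future use.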

CBR has been successfully applied in various domains, including medical diagnosis, legal reasoning, customer support, and design optimization. Its ability to learn from experience and adapt to new situations makes it a valuable approach in fields where expertise and problem-solving skills are crucial.
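The retrieve, reuse, and retain steps can be sketched in a few lines of base R. This is only an illustrative toy: the case base, its column names (`age`, `severity`, `solution`), and the plain Euclidean distance are assumptions made for the example and are not part of this package.

```r
# Toy case base: two numeric features and a known solution per case (made-up values).
case_base <- data.frame(
  age      = c(45, 60, 38, 72),
  severity = c(2, 4, 1, 5),
  solution = c("A", "B", "A", "C"),
  stringsAsFactors = FALSE
)
new_case <- data.frame(age = 62, severity = 4)

features <- c("age", "severity")

# Scale features with the case base's mean and sd so both contribute comparably.
centers <- colMeans(case_base[, features])
scales  <- apply(case_base[, features], 2, sd)
scaled_base <- scale(case_base[, features], center = centers, scale = scales)
scaled_new  <- scale(new_case[, features], center = centers, scale = scales)

# Retrieve: Euclidean distance from the new case to every stored case.
distances <- sqrt(rowSums(sweep(scaled_base, 2, as.numeric(scaled_new))^2))
nearest   <- which.min(distances)

# Reuse: propose the solution of the most similar stored case.
proposed_solution <- case_base$solution[nearest]

# Retain: once the proposal has been revised and confirmed, store the new case.
case_base <- rbind(case_base, cbind(new_case, solution = proposed_solution))
```

In practice the revise step requires domain expertise, which is why it appears here only as a comment.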

In the context of observational studies, CBR can be integrated with statistical models to improve the search for similar cases, especially in large and complex datasets. A fitted statistical model identifies which variables are associated with the outcome and how strongly, and this information can guide the similarity search. The result is a more accurate and efficient retrieval of relevant cases, which in turn improves the quality of the derived solutions (see our vignettes).
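One way a statistical model could guide the similarity search is to weight each variable by the size of its regression coefficient, so that variables with a stronger association to the outcome count more. The sketch below assumes this weighting scheme for illustration; it is not necessarily the exact formula used by the package, and `weighted_distance` is a hypothetical helper. For simplicity it uses only numeric predictors from the ovarian data set of the survival package.

```r
library(survival)

# Fit a Cox proportional hazards model with two numeric predictors.
fit <- coxph(Surv(futime, fustat) ~ age + ecog.ps, data = ovarian)

# Assumption: absolute coefficients serve as per-variable weights.
weights <- abs(coef(fit))

# Hypothetical helper: weighted absolute difference between two cases.
weighted_distance <- function(x, y, w) {
  sum(w * abs(x - y))
}

# Compare the first two patients on the model's variables.
vars   <- names(weights)
case_1 <- unlist(ovarian[1, vars])
case_2 <- unlist(ovarian[2, vars])
d <- weighted_distance(case_1, case_2, weights)
```

A variable whose coefficient is close to zero then barely affects the distance, regardless of how much the two cases differ on it.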

Installation

CRAN

install.packages("CaseBasedReasoning")

GitHub

install.packages("devtools")
devtools::install_github("sipemu/case-based-reasoning")

Features

This R package provides two methods for case-based reasoning that make use of an endpoint:

Besides searching for similar cases, the package offers some additional features:

Warning Message

“Warning: Cases with missing values in the dependent variable (Y) or predictor variables (X) have been dropped from the analysis. This may lead to a reduced dataset and potential loss of information. Please review your data and consider appropriate missing value imputation techniques to mitigate these issues.”
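To see in advance how many cases would be affected, you can check for missing values in the model variables yourself. The following is a minimal sketch in base R, assuming the ovarian data set from the survival package; the missing value is introduced artificially, and the names `dat`, `model_formula`, and `n_dropped` are chosen for this example only.

```r
library(survival)

# Copy the data and introduce one missing value on purpose for illustration.
dat <- ovarian
dat$age[3] <- NA

model_formula <- Surv(futime, fustat) ~ age + resid.ds + rx + ecog.ps
model_vars <- all.vars(model_formula)

# Flag rows that are complete in every variable the model uses.
complete <- complete.cases(dat[, model_vars])
n_dropped <- sum(!complete)

# Keep only the complete rows (or impute instead, depending on the analysis).
dat_complete <- dat[complete, ]
```

If `n_dropped` is large relative to the sample size, imputation is usually preferable to silently losing those cases.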

Example: Cox Beta Model

Initialization

In the first example, we use the Cox proportional hazards (CPH) model and the ovarian data set from the survival package. In the first step, we initialize the R6 data object.

library(tidyverse)
library(survival)
library(CaseBasedReasoning)

# convert categorical predictors to factors
ovarian$resid.ds = factor(ovarian$resid.ds)
ovarian$rx = factor(ovarian$rx)
ovarian$ecog.ps = factor(ovarian$ecog.ps)

# initialize R6 object
cph_model = CoxModel$new(Surv(futime, fustat) ~ age + resid.ds + rx + ecog.ps, data=ovarian)

Similar Cases

After the initialization, we may want to retrieve, for each case in the query data, the most similar cases from the learning data.

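# split the data into 80% training and 20% test (query) cases
set.seed(42)  # fix the seed so the split is reproducible
n <- nrow(ovarian)
trainID = sample(seq_len(n), floor(0.8 * n), replace = FALSE)
testID = seq_len(n)[-trainID]

# initialize the model on the training data only
cph_model = CoxModel$new(Surv(futime, fustat) ~ age + resid.ds + rx + ecog.ps, data=ovarian[trainID, ])

# fit model
cph_model$fit()

# get the k = 3 most similar training cases for each query case
matched_tbl = cph_model$get_similar_cases(query = ovarian[testID, ], k = 3)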

To analyze the results, you can extract the similar cases and training data and combine them:

These columns help organize and interpret the results, ensuring a clear understanding of the most similar cases and their corresponding query cases.

Distance Matrix

A distance matrix holds the pairwise distances between data points. In the context of Case-Based Reasoning (CBR), it captures the dissimilarities between cases in the training and test (or query) datasets, based on the fitted model and the values of the predictor variables. It is square only when a dataset is compared with itself; for separate training and query data, it has one row per training case and one column per query case.

The distance matrix can be helpful in various situations:

In summary, a distance matrix can provide valuable insights into the relationships between cases, facilitate the identification of similar cases for CBR, and aid in the validation of the chosen statistical models.

distance_matrix = cph_model$calc_distance_matrix()

cph_model$calc_distance_matrix() calculates the distance matrix between the training and test data. When the test data is omitted, the distances between the observations in the training data are calculated. Rows correspond to observations in the training data and columns to observations in the test data. The distance matrix is saved internally in the CoxModel object as cph_model$distMat.
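Given a matrix laid out this way (training cases in rows, query cases in columns), the nearest training cases for each query case can be read off by ordering each column. The sketch below uses a small made-up matrix rather than the output of the package, and the names `dist_mat`, `k`, and `nearest_idx` are hypothetical.

```r
# Toy distance matrix: 4 training cases (rows) by 2 query cases (columns).
dist_mat <- matrix(
  c(0.9, 0.2,
    0.1, 0.8,
    0.5, 0.4,
    0.3, 0.7),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("train_", 1:4), paste0("query_", 1:2))
)

k <- 2

# For each query column, order the training rows by distance and keep the k closest.
# The result has k rows and one column per query case; entries are training row indices.
nearest_idx <- apply(dist_mat, 2, function(d) order(d)[seq_len(k)])

# Look up the names of the closest training cases for the first query case.
rownames(dist_mat)[nearest_idx[, "query_1"]]
```

For the first query case, the two closest training cases are `train_2` and `train_4`, with distances 0.1 and 0.3.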

Contribution

Responsible for Mathematical Model Development and Programming

Medical Advisor

Funding

The Robert Bosch Foundation funded this work. Special thanks go to Professor Dr. Friedel (Thoraxchirurgie - Klinik Schillerhöhe).
