Healthcare cost analysis

DESCRIPTION

Background and Objective:

A nationwide survey of hospital costs conducted by the US Agency for Healthcare consists of hospital records of inpatient samples. The given data is restricted to the state of Wisconsin and covers patients aged 0 to 17 years. The agency wants to analyze the data to study healthcare costs and their utilization.

Dataset Description:

Here is a detailed description of the given dataset:

Attribute   Description
Age         Age of the patient discharged
Female      A binary variable that indicates if the patient is female
Los         Length of stay in days
Race        Race of the patient (specified numerically)
Totchg      Hospital discharge costs
Aprdrg      All Patient Refined Diagnosis Related Groups
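
Before running any analysis, it can help to confirm that the loaded data frame has these six columns and to see where values are missing. A minimal sketch, assuming the columns are named in upper case as in the R code in this post (AGE, FEMALE, LOS, RACE, TOTCHG, APRDRG). The `toy` data frame and its values are made up for illustration and only stand in for the real file:

```r
# Made-up rows standing in for HospitalCosts.csv (illustration only).
toy <- data.frame(
  AGE    = c(17, 0, 1, 12),
  FEMALE = c(1, 0, 1, 0),
  LOS    = c(2, 3, 1, 4),
  RACE   = c(1, 1, NA, 2),
  TOTCHG = c(2660, 1689, 1194, 5100),
  APRDRG = c(560, 640, 640, 53)
)

# Column names assumed from the code in this post.
expected_cols <- c("AGE", "FEMALE", "LOS", "RACE", "TOTCHG", "APRDRG")

# Columns from the description that are absent from the data frame.
missing_cols <- setdiff(expected_cols, names(toy))

# Number of missing values per column.
na_counts <- colSums(is.na(toy))

# Number of rows that would survive na.omit().
complete_rows <- nrow(na.omit(toy))

str(toy)
missing_cols
na_counts
complete_rows
```

The same three checks can be pointed at `hospitalData` once the CSV has been read.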

Analysis to be done: 

1. To record the patient statistics, the agency wants to find the age category of patients who visit the hospital most often and the age category with the maximum expenditure.

2. To gauge the severity of diagnoses and treatments and to identify the expensive treatments, the agency wants to find the diagnosis-related group that has the maximum hospitalization and expenditure.

3. To make sure that there is no malpractice, the agency needs to analyze if the race of the patient is related to the hospitalization costs.

4. To allocate resources properly, the agency has to analyze how hospital costs vary by age and gender.

5. Since the length of stay is the crucial factor for inpatients, the agency wants to find if the length of stay can be predicted from age, gender, and race.

6. To perform a complete analysis, the agency wants to find the variable that mainly affects hospital costs.

Read the Data from the CSV file

hospitalData <- read.csv("/Healthcare/Healthcare/HospitalCosts.csv")

hist(hospitalData$AGE)

# 1. Age category with the most admissions and the highest total charges
age <- as.factor(hospitalData$AGE)
summary(age)
ageCost <- aggregate(TOTCHG ~ AGE, FUN = sum, data = hospitalData)
ageCost
ageCost[which.max(ageCost$TOTCHG), ]
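
The age question has two parts: which age is admitted most often, and which age accounts for the highest total charges. A minimal sketch on a small made-up data frame (the values are illustrative, not taken from HospitalCosts.csv) shows one way to pull out both answers by name rather than reading them off the printed table:

```r
# Toy data for illustration only; not from HospitalCosts.csv.
toy <- data.frame(
  AGE    = c(0, 0, 0, 5, 17, 17),
  TOTCHG = c(1000, 1500, 500, 9000, 2000, 2500)
)

# Number of admissions per age, and the most frequent age.
visits <- table(toy$AGE)
most_frequent_age <- names(visits)[which.max(visits)]

# Total charges per age, and the row with the highest total.
cost_by_age <- aggregate(TOTCHG ~ AGE, data = toy, FUN = sum)
top_cost_row <- cost_by_age[which.max(cost_by_age$TOTCHG), ]

most_frequent_age
top_cost_row
```

In this toy data the most frequent age and the most expensive age differ, which is why the two quantities are computed separately.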

# 2. Diagnosis-related group with the most admissions and the highest total charges
which.max(table(hospitalData$APRDRG))
diagnosiscost <- aggregate(TOTCHG ~ APRDRG, FUN = sum, data = hospitalData)
diagnosiscost
diagnosiscost[which.max(diagnosiscost$TOTCHG), ]
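
For the diagnosis-related groups it can be convenient to keep the admission count and the total charge side by side in one table. The following is a rough sketch using `tapply`; the `toy` data frame and its group codes and charges are invented for illustration:

```r
# Toy data for illustration only; codes and charges are invented.
toy <- data.frame(
  APRDRG = c(640, 640, 640, 53, 53, 911),
  TOTCHG = c(1200, 1300, 1100, 8000, 7000, 48000)
)

# Both tapply() calls group by the same factor, so results share one order.
admissions <- tapply(toy$TOTCHG, toy$APRDRG, length)
total_cost <- tapply(toy$TOTCHG, toy$APRDRG, sum)

by_drg <- data.frame(
  APRDRG     = names(admissions),
  admissions = as.vector(admissions),
  total_cost = as.vector(total_cost)
)

# Group with the most admissions, and group with the highest total charges.
most_admitted <- by_drg[which.max(by_drg$admissions), ]
most_costly   <- by_drg[which.max(by_drg$total_cost), ]

by_drg
most_admitted
most_costly
```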

# 3. Is race related to hospitalization costs? One-way ANOVA of TOTCHG on RACE
summary(as.factor(hospitalData$RACE))
head(hospitalData)
hospitalData <- na.omit(hospitalData)   # drop rows with missing values
hospitalData$RACE <- as.factor(hospitalData$RACE)
model <- aov(TOTCHG ~ RACE, data = hospitalData)
summary(model)
summary(hospitalData$RACE)
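
The ANOVA answers the race question through its p-value: a large value means the data give no evidence that mean charges differ between race groups. As a hedged sketch, the p-value can be extracted from the summary table instead of being read from the printout. The `toy` data and the 0.05 cutoff are assumptions made for illustration:

```r
# Toy data for illustration only; three groups with similar mean charges.
toy <- data.frame(
  RACE   = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)),
  TOTCHG = c(1000, 1100, 900, 1050, 950, 1030, 980, 1020, 1010)
)

fit <- aov(TOTCHG ~ RACE, data = toy)

# summary() of an aov fit is a list; its first element is the ANOVA table.
anova_table <- summary(fit)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]

# Conventional 5% threshold (an assumption; choose the level the analysis needs).
race_matters <- p_value < 0.05

p_value
race_matters
```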
# 4. Hospital costs by age and gender
hospitalData$FEMALE <- as.factor(hospitalData$FEMALE)
model1 <- lm(TOTCHG ~ AGE + FEMALE, data = hospitalData)
summary(model1)
summary(hospitalData$FEMALE)
head(hospitalData)
# 5. Can length of stay be predicted from age, gender, and race?
model2 <- lm(LOS ~ AGE + FEMALE + RACE, data = hospitalData)

summary(model2)
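
Whether length of stay can be predicted from age, gender, and race is usually judged from the overall F-test and the R-squared of the regression. The sketch below shows how to pull both from a fitted model. It uses simulated data in which LOS is generated independently of the predictors, so it illustrates the mechanics only and says nothing about the real dataset:

```r
# Simulated data, purely illustrative; LOS is unrelated to the predictors here.
set.seed(42)
n <- 200
toy <- data.frame(
  AGE    = sample(0:17, n, replace = TRUE),
  FEMALE = factor(sample(0:1, n, replace = TRUE)),
  RACE   = factor(sample(1:3, n, replace = TRUE))
)
toy$LOS <- rpois(n, lambda = 3)

fit <- lm(LOS ~ AGE + FEMALE + RACE, data = toy)
s <- summary(fit)

# Overall F-test: do the predictors jointly explain any variation in LOS?
f <- s$fstatistic
overall_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

s$r.squared
overall_p
```

A low R-squared together with a large overall p-value would indicate that the predictors carry little information about length of stay.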

# 6. Which variable mainly affects hospital costs? Regress TOTCHG on all other columns
model3 <- lm(TOTCHG ~ ., data = hospitalData)
summary(model3)
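
One way to read off the variable that mainly affects cost is to rank the coefficients of the full model by the size of their t statistics. This is a sketch of that idea, not a definitive method: the data are simulated so that TOTCHG depends strongly on LOS and weakly on AGE, and the coefficients 700 and 20 are arbitrary choices for the example:

```r
# Simulated data, purely illustrative; the effect sizes are arbitrary.
set.seed(1)
n <- 200
toy <- data.frame(
  AGE = sample(0:17, n, replace = TRUE),
  LOS = rpois(n, lambda = 3)
)
toy$TOTCHG <- 500 + 700 * toy$LOS + 20 * toy$AGE + rnorm(n, sd = 300)

fit <- lm(TOTCHG ~ ., data = toy)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- coef(summary(fit))
coefs <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]

# Rank predictors by absolute t statistic, strongest first.
ranked <- coefs[order(abs(coefs[, "t value"]), decreasing = TRUE), , drop = FALSE]

ranked
rownames(ranked)[1]
```

Ranking by t statistic rather than by raw estimate avoids comparing coefficients that are measured in different units.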
