Introduction

In this project, we analyze the Brain Stroke Dataset (Brain Stroke Dataset, n.d.), which records patient attributes together with the occurrence of brain stroke. A stroke is a medical condition in which poor blood flow to the brain causes cell death. There are two main types of stroke: ischemic and hemorrhagic. Doctors have associated the occurrence of brain stroke with several factors, such as medical conditions, occupation, type of residence, marital status, and smoking. We intend to examine whether these attributes are correlated with brain stroke.

The dataset provides attribute information about the patient, which includes:

  1. Gender,

  2. Age,

  3. Whether the patient has hypertension,

  4. Whether the patient has heart disease,

  5. Work type,

  6. Average glucose level,

  7. Body mass index,

  8. Whether the patient smokes,

  9. Whether the patient has had a stroke.

In this project, we examine the correlation between these attributes using statistical methods (namely, regression analysis and a neural network) to classify stroke cases across the different variables. This helps us observe whether any of the attributes increases the chance of brain stroke. In other words, we want to find the dominant factors leading to brain stroke by fitting models with various combinations of candidate causes. To accomplish this goal, we first perform data cleansing and then study different models built from various combinations of factors, using several algorithms.
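The idea of fitting models with various combinations of factors can be sketched as follows. This is only an illustration, not the procedure used in the report: it enumerates candidate predictor subsets and writes each one as an R-style `response ~ predictors` formula string. The column names follow Table 1 where available; `gender` as a column name, the helper `candidate_formulas`, and the cap of two predictors per model are assumptions made for the example.

```python
from itertools import combinations

# Predictor names follow the attribute list above; "gender" is an assumed
# column name, and "stroke" is taken to be the response.
predictors = [
    "gender", "age", "hypertension", "heart_disease",
    "work_type", "avg_glucose_level", "bmi", "smoking_status",
]

def candidate_formulas(names, max_size):
    """Return one formula string per non-empty predictor subset of size <= max_size."""
    formulas = []
    for k in range(1, max_size + 1):
        for subset in combinations(names, k):
            formulas.append("stroke ~ " + " + ".join(subset))
    return formulas

# Hypothetical cap of two predictors per model, to keep the list short.
formulas = candidate_formulas(predictors, max_size=2)
print(len(formulas))
print(formulas[0])
print(formulas[-1])
```

Each formula would then be fitted and compared on held-out data. With all eight predictors allowed, the number of non-empty subsets grows to 255, so in practice some cap or a stepwise search is usually needed.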

We believe that analyzing this dataset will help us understand how knowledge gained from data analysis can be used to detect factors that amplify the risk of this potentially fatal disease. Later, we will be able to apply the same process to other related areas. Further, this study can enable collaboration with researchers in biology and the medical sciences to examine related topics.

Overview of Dataset

The following table provides a summary of our dataset. Each row reports a variable for the whole sample and separately for each gender. For instance, in the heart_disease row, the entry 112 (3.9%) indicates that 112 of the 2,907 women have heart disease.

Table 1. Summary of Dataset

                                                    Gender
Variable             N       Overall, N = 4,981¹   Female, N = 2,907¹   Male, N = 2,074¹
age                  4,981   45 (25, 61)           45 (27, 61)          46 (22, 61)
hypertension         4,981   479 (9.6%)            264 (9.1%)           215 (10%)
heart_disease        4,981   275 (5.5%)            112 (3.9%)           163 (7.9%)
ever_married         4,981   3,280 (66%)           1,948 (67%)          1,332 (64%)
work_type            4,981
  children                   673 (14%)             317 (11%)            356 (17%)
  Govt_job                   644 (13%)             390 (13%)            254 (12%)
  Private                    2,860 (57%)           1,704 (59%)          1,156 (56%)
  Self-employed              804 (16%)             496 (17%)            308 (15%)
Residence_type       4,981
  Rural                      2,449 (49%)           1,424 (49%)          1,025 (49%)
  Urban                      2,532 (51%)           1,483 (51%)          1,049 (51%)
avg_glucose_level    4,981   92 (77, 114)          91 (76, 112)         94 (78, 117)
bmi                  4,981   28 (24, 33)           28 (23, 33)          29 (24, 32)
smoking_status       4,981
  formerly smoked            867 (17%)             464 (16%)            403 (19%)
  never smoked               1,838 (37%)           1,194 (41%)          644 (31%)
  smokes                     776 (16%)             441 (15%)            335 (16%)
  Unknown                    1,500 (30%)           808 (28%)            692 (33%)
stroke               4,981   248 (5.0%)            140 (4.8%)           108 (5.2%)

¹ Median (IQR) or Frequency (%)
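As a quick check on how the percentages in Table 1 are read, the heart_disease row can be recomputed from the counts. The sketch below uses only numbers that appear in the table; the helper name `percent` is introduced for the example.

```python
# Counts copied from Table 1 (heart_disease row and the column headers).
group_sizes = {"Female": 2907, "Male": 2074}
heart_disease = {"Female": 112, "Male": 163}

def percent(count, total):
    """Share of `count` in `total`, expressed as a percentage."""
    return 100.0 * count / total

for group, n in group_sizes.items():
    cases = heart_disease[group]
    print(f"{group}: {cases} of {n} = {percent(cases, n):.1f}%")

# The overall figure pools both groups: 275 of 4,981 patients.
overall = percent(sum(heart_disease.values()), sum(group_sizes.values()))
print(f"Overall: {overall:.1f}%")
```

The percentages are taken within each gender column, so 3.9% is the share of women with heart disease, not the share of heart disease patients who are women.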

Graphical Analysis

In this section, we present graphical summaries of the relationships between the attributes that are hypothesized to be connected with brain stroke. The aim is to give a pictorial description of how the variables correlate.

Correlation plot

The following plot depicts the association between each pair of variables in the dataset. For instance, age and avg_glucose_level have a correlation of approximately 0.6, which indicates the strength of their positive association.
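For reference, each cell of such a plot is typically a Pearson correlation coefficient; that the report uses Pearson rather than another measure is an assumption here. A minimal sketch of the computation, with a hypothetical helper `pearson` and made-up toy values that are not taken from the dataset:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations; the 1/n
    # factors cancel in the ratio, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up toy values for illustration only; these are NOT rows of the dataset.
toy_age = [20, 35, 50, 65, 80]
toy_glucose = [85, 80, 100, 95, 120]
r = pearson(toy_age, toy_glucose)
print(round(r, 2))
```

The coefficient lies between -1 and 1; values near 0 indicate little linear association, and the sign gives the direction.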

Stroke counts considering various factors (numerical observations)

The following picture shows the relation between age and the percentage of people having a brain stroke. It indicates that the older population tends to have brain strokes.

The following picture shows the relation between average glucose level and the percentage of people having a brain stroke. It indicates that people with an average glucose level between 60 and 120 generally do not have a brain stroke, whereas the occurrence of brain stroke is moderately correlated with average glucose level.

The following picture shows the relation between body mass index (BMI) and the percentage of people having a brain stroke. It indicates that people with brain stroke are likely to have a BMI between 25 and 35, and that people without brain stroke tend not to have a very high or very low BMI.

Based on various categories

The following bar chart shows the proportion of brain stroke cases among various qualitative categories, such as residence type, work type, and marital status.

Divided by gender

The following graphs break down the stroke count by gender. They display the impact of each variable on stroke count for each gender. The first graphs show how age is correlated with stroke count by gender. The plot indicates that the impact of age on stroke count is independent of gender.

The following graphs show how average glucose level is correlated with stroke count by gender. The plot indicates that the impact of average glucose level on stroke count is slightly higher for females than for males.

The following graphs show how body mass index is correlated with stroke count by gender. The plot indicates that the impact of body mass index on stroke count is independent of gender.

Fitting Logistic model

## [1] "Accuracy in test set=  0.938152610441767"

## [1] "Accuracy in training set=  0.934421841541756"
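When reading these accuracy figures, it may help to keep the class balance from Table 1 in mind: only 248 of the 4,981 patients have a stroke. The sketch below shows how accuracy is computed and what a trivial predictor that always answers "no stroke" would score on the full dataset. The helper `accuracy` is introduced for the example, and the report's actual train/test split is not known here, so the baseline on its test set may differ somewhat.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Counts from Table 1: 248 stroke cases among 4,981 patients.
n_total, n_stroke = 4981, 248

# True labels for the whole dataset (1 = stroke, 0 = no stroke) and a
# trivial predictor that always answers "no stroke".
y_true = [1] * n_stroke + [0] * (n_total - n_stroke)
y_majority = [0] * n_total

baseline = accuracy(y_true, y_majority)
print(f"Majority-class baseline on the full dataset: {baseline:.3f}")
```

With classes this imbalanced, measures such as sensitivity or the confusion matrix are often reported alongside accuracy.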

Prediction using neural network

## [1] "Accuracy is=  0.723750250953624"

Concluding remarks

References


  1. Dept. of Math, Tulane University, https://sites.google.com/view/moslemuddin

  2. Dept. of Math, Tulane University, http://naufilsakran.com/