Africa Health Indicators: A Machine Learning Approach

Adeyemo Emmanuel
Jan 21, 2025

In this analysis, we explore health indicators in Africa, leveraging machine learning techniques to predict key outcomes and derive meaningful insights. This project employs a structured data pipeline, combining preprocessing, model training, evaluation, and visualization to assess health data effectively. The notebook for the analysis can be found here.

Objective

The main goal of this analysis is to predict health-related outcomes, focusing on life expectancy. By building a robust predictive model, we aim to understand the relationships between various health indicators and evaluate their importance.

Top 10 Countries by Life Expectancy

The green bar chart highlights the top-performing countries in terms of life expectancy, all exceeding the regional average. Countries like Tunisia, Cabo Verde, Seychelles, and Mauritius lead the way with life expectancies close to or above 70 years.

Factors Contributing to High Rankings:

  1. Strong Healthcare Systems: These countries have invested in robust healthcare infrastructure, ensuring better access to quality services.
  2. Economic Stability: Relatively higher GDPs per capita enable greater healthcare spending.
  3. Low Disease Burden: Effective management of communicable diseases like HIV and malaria contributes significantly.
  4. Public Health Initiatives: High vaccination rates, better maternal and child healthcare, and focus on preventative care play crucial roles.

Bottom 10 Countries by Life Expectancy

The red bar chart shows the countries with the lowest life expectancy, such as Sierra Leone, South Sudan, Somalia, and Lesotho, with figures hovering around 50 years or lower.

Challenges Leading to Low Life Expectancy:

  1. Conflict and Political Instability: Countries like South Sudan and Somalia have been plagued by long-standing conflicts, disrupting healthcare systems and displacing populations.
  2. High Disease Burden: HIV/AIDS, malaria, and tuberculosis remain rampant in countries like Lesotho and Zimbabwe.
  3. Maternal and Child Mortality: Limited access to prenatal care and skilled birth attendants contributes to high mortality rates in Sierra Leone and Central African Republic.
  4. Economic Constraints: Low government spending on healthcare due to poverty and economic instability exacerbates poor health outcomes.
  5. Undernourishment: Countries like Chad and Nigeria face significant food insecurity, which impacts overall population health.

Key Observations

  1. Regional Disparities: North African countries (part of the MENA region) generally perform better, while sub-Saharan Africa, particularly its western and central regions, struggles.
  2. Policy Implications: The disparities underline the urgent need for targeted investments in healthcare, disease prevention, and economic development, especially in the bottom-ranking countries.
  3. Success Stories: Countries like Tunisia and Seychelles demonstrate that strategic investments in healthcare and education can significantly boost life expectancy.

Bar charts showing the top 10 and bottom 10 countries by life expectancy.
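
For readers who want to reproduce this kind of ranking, here is a minimal sketch using pandas. The column names, the placeholder country labels, and the values are all hypothetical (they are not the article's dataset), and averaging each country over the available years is an assumption about how the ranking might be built.

```python
import pandas as pd

# Hypothetical toy data: placeholder countries and made-up values,
# one row per country and year.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "year": [2019, 2020] * 4,
    "life_expectancy": [71.0, 72.0, 55.0, 56.0, 63.0, 64.0, 49.0, 50.0],
})


def rank_countries(frame, column, n=10):
    """Return (top, bottom) Series of per-country means for `column`."""
    # Assumption: each country is summarised by its mean over all years.
    means = frame.groupby("country")[column].mean()
    return means.nlargest(n), means.nsmallest(n)


top, bottom = rank_countries(df, "life_expectancy", n=2)

# Unweighted average of the country means, used as the comparison line.
regional_average = df.groupby("country")["life_expectancy"].mean().mean()

print(top)
print(bottom)
print(regional_average)
```

The same helper can be pointed at any other indicator column (for example a mortality rate) to produce the remaining top 10 and bottom 10 charts.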

Correlation Analysis of Health Indicators

Positive Correlations:

Maternal Mortality, Infant Mortality, Neonatal Mortality, Under-5 Mortality: These variables show very strong positive correlations (close to +1). For example:

  1. Infant mortality and under-5 mortality: 0.98.
  2. Neonatal mortality and infant mortality: 0.92.

This suggests that higher values in one metric (e.g., infant mortality) tend to be associated with higher values in the others. This makes sense because these metrics are interconnected measures of child health.

Prevalence of HIV & Incidence of Tuberculosis: There is a strong positive correlation (0.75), indicating that regions or countries with high HIV prevalence also tend to have a high tuberculosis incidence, likely due to weakened immune systems in HIV-positive populations.

Negative Correlations:

Life Expectancy & Mortality Metrics:

Life expectancy is strongly negatively correlated with:

  1. Maternal mortality (-0.76).
  2. Infant mortality (-0.86).
  3. Neonatal mortality (-0.80).
  4. Under-5 mortality (-0.87).

This indicates that as mortality rates decrease, life expectancy improves, which is consistent with the general trend that better healthcare outcomes lead to longer lives.

Life Expectancy & Prevalence of HIV (-0.41): Countries with higher HIV prevalence tend to have lower life expectancy, which aligns with the significant health burden HIV imposes.

Weak or No Correlation:

Year & Health Metrics: The “year” variable has weak or negligible correlations with most variables. This could mean there is no strong linear trend over time in these metrics across the dataset.

Health Expenditure & Life Expectancy (-0.13): Surprisingly, there is a weak negative correlation here, suggesting that higher health expenditure does not directly correspond to higher life expectancy in this dataset. This might indicate inefficiencies in healthcare systems or other confounding factors.
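
The coefficients quoted in this section are pairwise correlations between indicator columns. As a rough sketch of how such a matrix can be computed, the snippet below uses pandas `DataFrame.corr()`. It assumes Pearson correlation, which the article does not state, and it runs on synthetic data generated only for illustration; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; not the article's dataset.
rng = np.random.default_rng(0)
n = 200
infant = rng.uniform(10, 100, size=n)                    # infant mortality
under5 = 1.4 * infant + rng.normal(0, 5, size=n)         # tracks infant mortality
life_exp = 80 - 0.3 * infant + rng.normal(0, 3, size=n)  # falls as mortality rises
year = rng.integers(2000, 2021, size=n)                  # unrelated to the rest

df = pd.DataFrame({
    "year": year,
    "infant_mortality": infant,
    "under5_mortality": under5,
    "life_expectancy": life_exp,
})

# Assumption: Pearson correlation, the pandas default.
corr = df.corr(method="pearson")

# Indicators ordered from most negative to most positive
# correlation with life expectancy.
ranked = corr["life_expectancy"].drop("life_expectancy").sort_values()

print(corr.round(2))
print(ranked.round(2))
```

The resulting `corr` frame is the kind of table a correlation heatmap is drawn from.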

Implications:

  1. Policies targeting maternal, infant, and child health could have the most substantial impact on improving life expectancy.
  2. Addressing the prevalence of HIV and tuberculosis is crucial for improving public health in affected regions.
  3. Health expenditure requires deeper analysis to understand its weak correlation with outcomes like life expectancy; stratifying by country income level or spending efficiency could shed more light.
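
To make the stratification idea concrete, here is a small sketch that compares the pooled correlation with the correlation inside each income group. The data are synthetic and deliberately constructed so that the pooled and within-group signs differ; the `income_group` labels and column names are hypothetical. It shows only why stratifying can matter, not what the real dataset contains.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 100  # rows per hypothetical income group


def make_group(name, spend_low, spend_high, baseline):
    """Build one synthetic group where spending and life expectancy rise together."""
    spend = rng.uniform(spend_low, spend_high, size=n)
    life = baseline + 1.0 * spend + rng.normal(0, 1.0, size=n)
    return pd.DataFrame({
        "income_group": name,
        "health_expenditure": spend,
        "life_expectancy": life,
    })


# Constructed on purpose: the group that spends more starts from a lower baseline.
df = pd.concat(
    [
        make_group("lower_income", 6, 10, 50),
        make_group("higher_income", 2, 6, 68),
    ],
    ignore_index=True,
)

# Correlation over all rows, ignoring the grouping.
pooled = df["health_expenditure"].corr(df["life_expectancy"])

# Correlation computed separately inside each group.
within = {
    group: g["health_expenditure"].corr(g["life_expectancy"])
    for group, g in df.groupby("income_group")
}

print(round(pooled, 2))
print({k: round(v, 2) for k, v in within.items()})
```

In this constructed example the pooled coefficient is negative while both within-group coefficients are positive, which is one way a weak or negative overall correlation can coexist with a positive relationship inside each stratum.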

Correlation heatmap.
Bar charts showing the top 10 and bottom 10 countries by maternal mortality.
Bar charts showing the top 10 and bottom 10 countries by infant mortality.
Bar charts showing the top 10 and bottom 10 countries by neonatal mortality.
Bar charts showing the top 10 and bottom 10 countries by under-5 mortality.
Bar charts showing the top 10 and bottom 10 countries by HIV prevalence.
Bar charts showing the top 10 and bottom 10 countries by tuberculosis incidence.
Bar charts showing the top 10 and bottom 10 countries by undernourishment prevalence.

Model Development

We built a model to predict health-related outcomes, focusing on life expectancy, using a Random Forest Regressor. The Random Forest Regressor strikes a good balance between performance, interpretability, and ease of use for this health dataset. It captures the complex relationships between health indicators and life expectancy while providing actionable insights into feature importance.
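
A minimal sketch of this modelling step with scikit-learn is shown below. It is not the notebook's code: the feature names, the synthetic data, the 80/20 split, and the hyperparameters (`n_estimators=200`, fixed `random_state`) are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic features for illustration only; not the article's dataset.
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "infant_mortality": rng.uniform(10, 100, size=n),
    "hiv_prevalence": rng.uniform(0, 25, size=n),
    "health_expenditure": rng.uniform(2, 10, size=n),
})
# Made-up target: driven mostly by infant mortality, slightly by HIV prevalence.
y = (
    80
    - 0.3 * X["infant_mortality"]
    - 0.2 * X["hiv_prevalence"]
    + rng.normal(0, 1.5, size=n)
)

# Hold out 20% of rows for evaluation (an assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Impurity-based importances; they sum to 1 across features.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(importances.round(3))
print(round(model.score(X_test, y_test), 3))  # R^2 on the held-out rows
```

Impurity-based importances are quick to obtain but can favour correlated or high-variance features, so they are best read as a ranking rather than as precise effect sizes.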

Chart showing Residuals vs. Actual Values.
Chart showing the comparison of Predicted vs. Actual Values.

Key Insights

  1. The model demonstrated reasonable predictive performance.
  2. Feature importance analysis highlighted critical health indicators impacting life expectancy, providing actionable insights for policymakers.
  3. Visual inspections of residuals and actual vs. predicted plots confirmed the model’s reliability, with minimal bias.
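
The checks listed above can be backed with a few summary numbers. The sketch below computes residuals (taken here as actual minus predicted, an assumed convention) along with MAE, RMSE, and R² using scikit-learn's metrics. The `evaluate` helper and the five example values are hypothetical and are not results from the article's model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(y_true, y_pred):
    """Summarise prediction quality for a regression model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred  # assumed convention: actual minus predicted
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "r2": r2_score(y_true, y_pred),
        # A mean residual near zero suggests little systematic over- or under-prediction.
        "mean_residual": float(residuals.mean()),
    }


# Made-up actual and predicted life expectancies, for illustration only.
report = evaluate([50, 55, 60, 65, 70], [51, 54, 61, 64, 70])
print(report)
```

Plotting the residuals against the actual values and the predictions against the actual values then gives the two diagnostic charts described in the model section.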

Conclusion

This analysis showcased the application of machine learning techniques to understand and predict health indicators in Africa. The results underscore the potential of data-driven solutions to address pressing health challenges, offering valuable tools for decision-making and strategic planning.
