MATHEMATICAL MODELLING OF FACTORS FOR MEDICAL INSURANCE COST IN THE UNITED STATES USING ROBUST REGRESSION

Authors

  • Associate Professor Dr. Norizan Mohamed

DOI:

https://doi.org/10.46754/jmsi.2024.06.003

Keywords:

Robust Regression, Outliers, LTS-estimator, MM-estimator, S-estimator

Abstract

The rising cost of medical insurance in the United States requires a thorough understanding of the factors that influence it. Many factors can affect the cost of medical insurance, including age, sex, BMI, smoking habits and number of children. Problems arise when analysing data that contain outliers, as individual observations can have a large impact on the results. Robust regression is a useful method for reducing the effect of outliers in modelling. Hence, this paper aims to determine the best of three robust estimators and to test the robustness of the best estimator when the data are contaminated with outliers. The estimators were applied to a dataset collected from the US Census Bureau and published by Brett Lantz in 2013. The findings showed that the R² values of the LTS-estimator, MM-estimator and S-estimator were 0.9813, 0.6735 and 0.9728, respectively. When the data were contaminated with 10%, 20% and 30% of outliers, the R² values of the LTS-estimator were 0.9399, 0.9030 and 0.8678, respectively. Thus, it can be concluded that the LTS-estimator can help produce results that are resistant to outliers.
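As a rough illustration of the idea behind the LTS-estimator and the contamination experiment, the sketch below fits a least trimmed squares line to synthetic data in which 10%, 20% and 30% of the responses have been shifted. It is not the paper's procedure and does not use the paper's dataset or reproduce its R² values: the helper names `lts_fit` and `contaminate`, the synthetic data, the shift size and the simplified random-start search with concentration steps are all assumptions made for this example.

```python
import numpy as np


def lts_fit(X, y, h=None, n_starts=50, n_csteps=20, seed=0):
    """Approximate least trimmed squares fit (hypothetical helper).

    Minimises the sum of the h smallest squared residuals using random
    elemental starts followed by concentration steps (a simplified search,
    not a full FAST-LTS implementation).
    Returns the coefficient vector [intercept, slopes...].
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    k = p + 1
    if h is None:
        h = (n + k + 1) // 2  # roughly half the data (assumed default)
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        # Start from an exact fit to k randomly chosen observations.
        subset = rng.choice(n, size=k, replace=False)
        beta = np.linalg.lstsq(A[subset], y[subset], rcond=None)[0]
        for _ in range(n_csteps):
            # Concentration step: refit on the h best-fitting observations.
            sq_res = (y - A @ beta) ** 2
            keep = np.argsort(sq_res)[:h]
            beta = np.linalg.lstsq(A[keep], y[keep], rcond=None)[0]
        obj = np.sort((y - A @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta


def contaminate(y, fraction, shift, seed=0):
    """Shift a random fraction of the responses to create outliers."""
    rng = np.random.default_rng(seed)
    y_out = y.copy()
    m = int(round(fraction * len(y)))
    idx = rng.choice(len(y), size=m, replace=False)
    y_out[idx] += shift
    return y_out


# Synthetic data (assumption): true intercept 2, true slope 3.
rng = np.random.default_rng(42)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, size=n)
X = x.reshape(-1, 1)
A_full = np.column_stack([np.ones(n), X])

lts_results, ols_results = {}, {}
for frac in (0.10, 0.20, 0.30):
    y_c = contaminate(y, frac, shift=50.0, seed=1)
    lts_results[frac] = lts_fit(X, y_c)
    ols_results[frac] = np.linalg.lstsq(A_full, y_c, rcond=None)[0]
    print(f"{frac:.0%} outliers  LTS: {lts_results[frac]}  OLS: {ols_results[frac]}")
```

In this toy setting the trimmed fit ignores the shifted observations because they never belong to the best-fitting half of the data, whereas the ordinary least squares fit is pulled toward them.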

References

Berenguer-Rico, V., Johansen, S., & Nielsen, B. (2023). A model where the Least Trimmed Squares estimator is maximum likelihood. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(3), 886-912. DOI: https://doi.org/10.1093/jrsssb/qkad028

Kula, K. S., Tank, F., & Dalkilic, T. E. (2012). A study on fuzzy robust regression and its application to insurance. Mathematical and Computational Applications, 17(3), 223-234. DOI: https://doi.org/10.3390/mca17030223

Gad, A. M., & Qura, M. E. (2016). Regression estimation in the presence of outliers: A comparative study. International Journal of Probability and Statistics, 5(3), 65-72.

Mahmudah, U., Chamdani, M., Tarmidzi, T., & Fatimah, S. (2020). Robust regression for estimating the impact of student’s social behaviors on scientific literacy. Jurnal Cakrawala Pendidikan, 39(2), 293-304. DOI: https://doi.org/10.21831/cp.v39i2.29842

Blatna, D. (2006). Outliers in regression. Trutnov, 30, 1-6.

Laurikkala, J., Juhola, M., Kentala, E., Lavrac, N., Miksch, S., & Kavsek, B. (2000). Informal identification of outliers in medical data. In Fifth international workshop on intelligent data analysis in medicine and pharmacology (Vol. 1, pp. 20-24).

Aleng, N. A., Naing, N. N., Mohamed, N., & Mokhtar, K. (2017). Outlier detection based on robust parameter estimates. International Journal of Applied Engineering Research, 12(23), 13429-13434.

Susanti, Y., Pratiwi, H., Sulistijowati, S., & Liana, T. (2014). M estimation, S estimation, and MM estimation in robust regression. International Journal of Pure and Applied Mathematics, 91(3), 349-360. DOI: https://doi.org/10.12732/ijpam.v91i3.7

Alma, O. G. (2011). Comparison of robust regression methods in linear regression. International Journal of Contemporary Mathematical Sciences, 6(9), 409-421.

Andriany, C. D., & Susanti, Y. (2021). Estimasi parameter regresi robust dengan metode estimasi Least Trimmed Squares (LTS) pada kematian ibu di Indonesia. Prosiding Seminar Nasional Aplikasi Sains & Teknologi (SNAST) 2021, 20 Maret 2021 (pp. 9-14).

Rousseeuw, P. J., & Yohai, V. J. (1984). Robust regression by means of S-estimators. In J. Franke, W. Härdle, & R. D. Martin (Eds.), Robust and nonlinear time series analysis (pp. 256-272). New York: Springer-Verlag. DOI: https://doi.org/10.1007/978-1-4615-7821-5_15

Glen, S. (2021). Linear regression: Simple steps, video. Find equation, coefficient, slope. Statistics How To. https://www.statisticshowto.com/probability-and-statistics/regressionanalysis/find-a-linear-regressionequation

Kasuya, E. (2018). On the use of r and r squared in correlation and regression. Ecological Research, 34(1), 235-236. DOI: https://doi.org/10.1111/1440-1703.1011

Lantz, B. (2013). Machine learning with R: Learn how to use R to apply powerful machine learning methods and gain an insight into real-world applications. Packt Publishing.

Friedman, J., Bridlington, E., Guarino, M., & Fisher, C. (2021). Unhealthy debt: Medical costs and bankruptcies in Oregon (pp. 1-28). OSPIRG: Frontier Group.

Rousseeuw, P. J., & Hubert, M. (2011). Robust statistics for outlier detection. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1), 73-79. DOI: https://doi.org/10.1002/widm.2

Sakata, S., & White, H. (2001). S-estimation of nonlinear regression models with dependent and heterogeneous observations. Journal of Econometrics, 103(1-2), 5-72. DOI: https://doi.org/10.1016/S0304-4076(01)00039-2

Zuo, Y., & Zuo, H. (2023). Least sum of squares of trimmed residuals regression. Electronic Journal of Statistics, 17(2), 2416-2446. DOI: https://doi.org/10.1214/23-EJS2164

Published

22-06-2024