I am trying to find which ICD-10 codes are causing a certain disease. ICD-10 uses an alphanumeric classification, e.g. A00.00. There are thousands of such codes, and I am not sure how to use them in my regression model. Any suggestions, please?
Data:

Patient  Existing ICD10  Diabetic (Y)
P1       A00.10          1
P2       A00.20          0
P1       C00.1           1
P3       Z01             1
....
An effective way to do this is to use the concept of comorbidities: instead of treating each of the thousands of codes as its own predictor, you group them into a small number of clinically meaningful categories. My R package icd does this for standardized sets of diseases, e.g. "Diabetes", "Cancer", "Heart Disease". There is a choice of comorbidity maps, so you can pick one that aligns with your interests: the PCCC maps in icd can be used for pediatrics, while the others are for adults and span a variety of disease states.
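To make the idea of a comorbidity map concrete, here is a minimal sketch in Python. It is not the icd package's API, and `TOY_MAP` and `categories_for` are hypothetical names invented for illustration. The three categories are a toy map based only on ICD-10 code prefixes, far coarser than the published maps mentioned above.

```python
# Toy comorbidity map: category -> tuple of ICD-10 code prefixes.
# Illustrative assumption only; NOT the AHRQ, Elixhauser, or PCCC maps.
TOY_MAP = {
    "Infection": ("A", "B"),            # chapter of infectious and parasitic diseases
    "Cancer": ("C",),                   # malignant neoplasms
    "IschemicHeart": ("I20", "I21", "I22", "I23", "I24", "I25"),
}


def categories_for(code):
    """Return the set of toy categories whose prefixes match an ICD-10 code."""
    # Drop the decimal point so "A00.10" and "A0010" are treated alike.
    normalized = code.replace(".", "").strip().upper()
    return {
        category
        for category, prefixes in TOY_MAP.items()
        if normalized.startswith(prefixes)  # str.startswith accepts a tuple
    }


for code in ["A00.10", "C00.1", "I21.9", "Z01"]:
    print(code, sorted(categories_for(code)))
```

A code such as Z01 that matches no category simply contributes no flag. A real map assigns codes from curated lists rather than by chapter letter, so in practice one of the published maps would replace the toy one here.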
For an example, see the introduction vignette. The codes used there are actually ICD-9, but you can use ICD-10 in the same way.
In the comorbidity columns, "DM" is Diabetes Mellitus and "DMcx" is diabetes with complications, e.g. retinopathy or renal failure. These come from the US AHRQ modification of the standard Elixhauser categories.
Once you have binary flags for the disease states, you can use them as predictors in any statistical or machine learning model.
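As a rough illustration of what "binary flags" means for the data in the question, the sketch below turns long-format rows (one row per patient and code) into one row of 0/1 flags per patient. It is plain Python, not the icd package; the function name `to_patient_flags` and the category labels are hypothetical, and it assumes each code has already been assigned to a category (or to `None` when no category applies).

```python
def to_patient_flags(rows, categories):
    """Collapse (patient_id, category) pairs into one 0/1 flag list per patient."""
    flags = {}
    for patient, category in rows:
        # Every patient gets a row, even if none of their codes map to a category.
        row = flags.setdefault(patient, [0] * len(categories))
        if category in categories:
            row[categories.index(category)] = 1
    return flags


# The question's rows, with an assumed category for each code:
# A00.10 and A00.20 -> "Infection", C00.1 -> "Cancer", Z01 -> no category.
rows = [("P1", "Infection"), ("P2", "Infection"), ("P1", "Cancer"), ("P3", None)]
categories = ["Infection", "Cancer"]

X = to_patient_flags(rows, categories)   # predictors, one row per patient
y = {"P1": 1, "P2": 0, "P3": 1}          # "Diabetic (Y)" outcome from the question

for patient in sorted(X):
    print(patient, X[patient], y[patient])
```

The point of the reshaping is that each patient ends up as a single observation with a handful of columns, rather than thousands of sparse per-code indicators, which is the shape a regression model expects.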