UKB is a repository of research data from ~500,000 participants across the UK, aged approximately 40–70 years and recruited from 22 assessment centers during 2006–2010. We used data collected for each participant from enrollment to March 26, 2021. In brief, data in the UKB repository are grouped into 277 categories, from which we retrieved those related to (i) socioeconomic factors (categories 100066, 100063, and 100064); (ii) lifestyle factors (categories 100058, 100054, 100052, 100051, 100057, and 143); (iii) environmental pollution factors (categories 114 and 115); and (iv) health outcome factors (categories 2002, 100074, 100060, 137, and 100092) (Additional file 1: Table S1). Note that although an individual's SES and lifestyle may change over time, we used the baseline survey data to define the socioeconomic and lifestyle status of each participant. Our research protocol obtained all necessary approvals from the UKB review committees. We accessed the UKB cohort of 502,462 individuals. Following Yang and Zhou [30, 31], we removed individuals (i) with mismatched sex information; (ii) whose records were redacted and who therefore lack a corresponding ID; or (iii) with missing information on socioeconomic factors or other covariates. Finally, we retained 412,258 participants in UKB for subsequent analysis (Fig. 1a).
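The exclusion procedure above amounts to a sequence of filters, with the number of remaining participants recorded after each step, as in a flow diagram. The following is a minimal Python sketch of that pattern, not the authors' pipeline. It assumes a hypothetical record layout (`Participant`, `eid`, `sex_mismatch`, and `covariates` are illustrative names, not UKB field IDs) and runs on a toy cohort of four made-up records rather than real data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Participant:
    # Hypothetical record layout for illustration only; not the UKB schema.
    eid: Optional[int]                      # None models a redacted participant with no ID
    sex_mismatch: bool                      # True if the sex information is inconsistent
    covariates: Dict[str, Optional[float]]  # SES factors and other covariates; None = missing


def apply_exclusions(
    participants: List[Participant],
) -> Tuple[List[Participant], List[Tuple[str, int]]]:
    """Apply the exclusion steps in order, recording how many remain after each."""
    steps: List[Tuple[str, Callable[[Participant], bool]]] = [
        ("sex mismatch", lambda p: not p.sex_mismatch),
        ("redacted / no ID", lambda p: p.eid is not None),
        ("missing SES or covariates",
         lambda p: all(v is not None for v in p.covariates.values())),
    ]
    remaining = list(participants)
    flow = [("enrolled", len(remaining))]
    for label, keep in steps:
        # Each filter only sees records that survived the previous steps.
        remaining = [p for p in remaining if keep(p)]
        flow.append((label, len(remaining)))
    return remaining, flow


# Toy cohort (made-up values): each of the last three records fails exactly one step.
toy = [
    Participant(1, False, {"income": 2.0, "education": 1.0}),
    Participant(2, True, {"income": 1.0, "education": 3.0}),
    Participant(None, False, {"income": 1.0, "education": 3.0}),
    Participant(4, False, {"income": None, "education": 2.0}),
]

retained, flow = apply_exclusions(toy)
for label, n in flow:
    print(f"{label}: {n}")
```

Because each filter is applied only to the records that survived the previous one, the per-step counts shown in a flow diagram depend on the order of the steps, whereas the final retained set does not. In the UKB cohort described above, the corresponding sequence reduced 502,462 individuals to 412,258.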
In the US NHANES, we included 101,316 participants surveyed from 1999 to 2018 and, following Zhang et al., removed individuals (i) younger than 20 years; (ii) who were pregnant; (iii) with missing information on socioeconomic factors or other covariates; or (iv) with non-positive sample weights for the interview or health examination. Finally, we retained 45,671 participants in US NHANES for subsequent analysis (Fig.