Missing Data


Missingness is expected to be low because assessments are completed on site before medication is administered. Participants with more than 30% missing data will be excluded before imputation. Outcome data will not be used in imputation, so that outcome information cannot leak into the imputed predictor values.

Remaining missing values will be imputed with a k-nearest neighbour (KNN) algorithm after the data have been normalised. Gower distance will be used to measure similarity between participants, and the number of neighbours, k, will be optimised by cross-validation, typically within the range 5 to 10.

The study will also compare multiple imputation with random forest-based imputation. Multiple imputation will use the aregImpute function from the R package Hmisc to generate five fully imputed datasets in each machine learning iteration. Random forest-based imputation will use the R package missForest, which predicts the missing values of each variable by regressing it on all other variables with a random forest.

Robustness will be assessed by comparing KNN imputation with multiple imputation by chained equations (MICE). Major discrepancies between the KNN and MICE results will be interpreted as evidence that the findings are sensitive to the choice of imputation method.
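A minimal sketch of the exclusion and KNN steps follows, written in pure Python rather than the R tooling the protocol names. It assumes missing values are coded as `None`, a per-column `categorical` flag, min-max normalisation of numeric columns, and Gower distance computed over the variables both participants have observed. The text does not say how the cross-validation for k is organised, so the scheme used here (mask folds of observed numeric cells, re-impute them, keep the k between 5 and 10 with the lowest range-scaled RMSE) is an assumption, and all function names are hypothetical.

```python
import random
from collections import Counter

MISSING = None  # assumption: missing values are coded as None


def drop_sparse_participants(rows, max_missing=0.30):
    """Exclude participants with more than max_missing of their values missing."""
    return [r for r in rows if sum(v is MISSING for v in r) / len(r) <= max_missing]


def normalise(rows, categorical):
    """Min-max scale numeric columns to [0, 1]; categorical columns are left unchanged."""
    scaled = [list(r) for r in rows]
    for c, is_cat in enumerate(categorical):
        observed = [r[c] for r in rows if r[c] is not MISSING]
        if is_cat or not observed:
            continue
        lo, hi = min(observed), max(observed)
        span = (hi - lo) or 1.0  # guard against constant columns
        for r in scaled:
            if r[c] is not MISSING:
                r[c] = (r[c] - lo) / span
    return scaled


def gower(a, b, categorical):
    """Gower distance averaged over variables observed in both rows.
    On [0, 1]-scaled data, |a - b| equals Gower's range-scaled numeric term."""
    terms = []
    for c, is_cat in enumerate(categorical):
        if a[c] is MISSING or b[c] is MISSING:
            continue
        terms.append(float(a[c] != b[c]) if is_cat else abs(a[c] - b[c]))
    return sum(terms) / len(terms) if terms else float("inf")


def knn_impute(rows, categorical, k):
    """Fill each missing cell from the k nearest participants who observed that variable:
    donor mean for numeric columns, most common donor value for categorical ones."""
    scaled = normalise(rows, categorical)
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is not MISSING:
                continue
            donors = [j for j, other in enumerate(rows) if j != i and other[c] is not MISSING]
            donors.sort(key=lambda j: gower(scaled[i], scaled[j], categorical))
            values = [rows[j][c] for j in donors[:k]]  # donor values on the original scale
            if not values:
                continue  # nobody observed this variable; leave it missing
            if categorical[c]:
                result[i][c] = Counter(values).most_common(1)[0][0]
            else:
                result[i][c] = sum(values) / len(values)
    return result


def choose_k(rows, categorical, candidates=range(5, 11), n_folds=5, seed=0):
    """Pick k by n-fold cross-validation: each fold of observed numeric cells is masked,
    re-imputed, and scored by RMSE on range-scaled values."""
    rng = random.Random(seed)
    cells = [(i, c) for i, r in enumerate(rows) for c, v in enumerate(r)
             if v is not MISSING and not categorical[c]]
    rng.shuffle(cells)
    folds = [cells[f::n_folds] for f in range(n_folds)]
    spans = {}
    for c, is_cat in enumerate(categorical):
        observed = [r[c] for r in rows if r[c] is not MISSING]
        if not is_cat and observed:
            spans[c] = (max(observed) - min(observed)) or 1.0
    best_k, best_rmse = None, float("inf")
    for k in candidates:
        sq_err = 0.0
        for fold in folds:
            masked = [list(r) for r in rows]
            for i, c in fold:
                masked[i][c] = MISSING
            imputed = knn_impute(masked, categorical, k)
            sq_err += sum(((imputed[i][c] - rows[i][c]) / spans[c]) ** 2 for i, c in fold)
        rmse = (sq_err / len(cells)) ** 0.5
        if rmse < best_rmse:
            best_k, best_rmse = k, rmse
    return best_k
```

Donor values are read from the unscaled data, so imputed numeric values stay on the measurement scale; normalisation only decides which participants count as nearest. Typical use is `kept = drop_sparse_participants(rows)`, then `knn_impute(kept, categorical, choose_k(kept, categorical))`.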
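The protocol does not state how the five multiply imputed datasets are combined or what counts as a "major" discrepancy. A common convention for the first is Rubin's rules: the pooled estimate is the mean of the per-dataset estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The discrepancy check below, which flags a KNN estimate lying more than a chosen number of pooled standard errors from the multiple-imputation estimate, is a hypothetical illustration with an assumed tolerance, not the study's criterion.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    estimates: per-dataset point estimates; variances: their squared standard errors.
    Returns the pooled estimate and its total variance."""
    m = len(estimates)
    q_bar = mean(estimates)   # pooled point estimate
    w = mean(variances)       # within-imputation variance
    b = variance(estimates)   # between-imputation variance (m - 1 denominator)
    return q_bar, w + (1 + 1 / m) * b


def imputation_sensitive(knn_estimate, mi_estimates, mi_variances, tolerance_se=1.0):
    """Hypothetical rule: True when the KNN estimate lies more than tolerance_se
    pooled standard errors away from the pooled multiple-imputation estimate."""
    q_bar, total_var = pool_rubin(mi_estimates, mi_variances)
    return abs(knn_estimate - q_bar) > tolerance_se * sqrt(total_var)
```

For example, five estimates of 1.0, 1.2, 0.9, 1.1 and 1.0, each with variance 0.04, pool to 1.04 with total variance 0.0556 (pooled standard error about 0.236), so a KNN estimate of 1.1 would not be flagged at a one-standard-error tolerance while 1.5 would.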
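Keeping outcome data out of imputation amounts to hiding the outcome column from whichever imputer is used and reattaching it unchanged afterwards. The helper below is a hypothetical sketch: it assumes each participant row is a Python list with the outcome in a single column, and `impute` stands for any function that maps a list of predictor rows to imputed rows.

```python
def impute_without_outcome(rows, outcome_col, impute):
    """Run `impute` on predictor columns only, then reattach the untouched outcome.

    rows: list of participant rows (lists); outcome_col: index of the outcome variable;
    impute: any function mapping a list of rows to a list of imputed rows."""
    # The imputer never sees the outcome column, so it cannot inform predictor values.
    predictors = [r[:outcome_col] + r[outcome_col + 1:] for r in rows]
    filled = impute(predictors)
    # Reinsert the original outcome value (missing or not) at its original position.
    return [list(f[:outcome_col]) + [r[outcome_col]] + list(f[outcome_col:])
            for f, r in zip(filled, rows)]
```

A missing outcome stays missing under this design, since the outcome is never imputed.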