Dataset

3 sources - 15 claims

Participants spoke in English for 1 to 3 minutes on prompts unrelated to mental health. Remote personal-device recordings increased ecological diversity but introduced uncontrolled recording variation. The dataset was proprietary and lacked external validation on an independently collected public cohort. The dataset has limited generalizability because it came from one institutional data warehouse. The small dataset size may affect performance estimate stability, hyperparameter selection, feature-selection stability, and statistical comparisons. Keyword-based labels and incomplete annotations may create annotation noise in the negative class.

The study began with 285,033 pathology reports from the University of Tennessee Health Science Center Research Enterprise Datawarehouse covering 2020 to 2023. The final cohort contained 10,677 unique samples with at least one clinical staging variable after consolidation and filtering. The binary classification dataset contained 304 human proteins balanced between Parkinson-associated and control classes. Protein records were obtained from UniProt using Homo sapiens filters. The dataset used version 2.0 from Zenodo, with 304 sequences after d…
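The counts quoted above can be put side by side with a few lines of arithmetic. This is a minimal sketch, not part of any of the summarized studies: it assumes that "balanced" means an exact 50/50 split of the 304 proteins, and it treats pathology reports and unique samples as comparable units, which the summary does not confirm. All variable names are illustrative.

```python
# Figures quoted in the summary above.
n_proteins = 304          # balanced binary protein dataset
n_reports = 285_033       # pathology reports at the start of the study
n_final_samples = 10_677  # unique samples in the final cohort

# Assumption: "balanced" means an exact 50/50 class split.
per_class = n_proteins // 2

# Assumption: reports and samples are comparable units; the summary does
# not say so, so read this only as a rough indication of filtering severity.
retained_fraction = n_final_samples / n_reports

print(f"proteins per class: {per_class}")
print(f"final cohort / initial reports: {retained_fraction:.2%}")
```

Under these assumptions, the filtering step keeps only a few percent of the initial reports, and each protein class is small enough that the stability concerns listed above apply.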