Citation Quality

1 source, 5 claims

Grok and DeepSeek achieved the highest reference scores and produced the most complete references among the audited chatbots. Even so, no chatbot produced a fully complete and accurate reference list for any prompt. Overall, the chatbots returned about 81% of the requested scientific references, and their citation outputs frequently contained errors, fabrications, hallucinations, broken links, and incomplete elements. Some models acknowledged that the references they generated might be unreliable or fictional.