Researchers develop AI lie detector

October 3, 2023

Researchers at Yale and Oxford Universities have developed an AI lie detector that can identify falsehoods produced by large language models (LLMs) by asking a series of unrelated yes-or-no questions.

The new lie detector works by first establishing what a normal truthful response looks like for an LLM. To do this, the researchers assemble a body of knowledge on which the LLM can be reliably expected to provide the correct answer.

The researchers then induce falsehoods using prompts crafted to explicitly urge the LLM to lie. Finally, they ask the LLM a series of unrelated yes-or-no questions whose answers reveal the induced falsehoods.

The researchers trained the lie detector on a dataset of 1,280 instances of prompts, questions, and false answers, along with a matching set of truthful examples. From the answers to the elicitation questions, the detector learned to score false question-answer pairs with high accuracy.
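To make the training step more concrete, here is a minimal Python sketch of a detector that scores yes-or-no answer patterns. It is an illustration only, not the researchers' code: the article does not say which classifier was used, so the logistic regression here is an assumption, and the tiny dataset and the names encode, train, and lie_score are hypothetical.

```python
import math

def encode(answers):
    """Turn a list of "yes"/"no" answers into +1.0/-1.0 features."""
    return [1.0 if a == "yes" else -1.0 for a in answers]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.1):
    """Fit a logistic regression with plain stochastic gradient descent.

    examples: list of (answers, is_lie) pairs, where is_lie is 1 or 0.
    The choice of logistic regression is an assumption for illustration.
    """
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for answers, is_lie in examples:
            x = encode(answers)
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - is_lie  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def lie_score(weights, bias, answers):
    """Return a score between 0 and 1; higher means 'more likely a lie'."""
    x = encode(answers)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Made-up data: answers to three hypothetical elicitation questions,
# collected after the model lied (label 1) or told the truth (label 0).
EXAMPLES = [
    (["yes", "yes", "no"], 1),
    (["yes", "no", "yes"], 1),
    (["yes", "yes", "yes"], 1),
    (["no", "yes", "yes"], 1),
    (["no", "no", "yes"], 0),
    (["no", "yes", "no"], 0),
    (["no", "no", "no"], 0),
    (["yes", "no", "no"], 0),
]

weights, bias = train(EXAMPLES)
print(lie_score(weights, bias, ["yes", "yes", "yes"]))
print(lie_score(weights, bias, ["no", "no", "no"]))
```

In this toy data, mostly-"yes" patterns follow lies and mostly-"no" patterns follow truthful answers, so the first printed score should come out above 0.5 and the second below it. The real detector was trained on far more data, and nothing here reflects its actual questions or accuracy.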

The researchers tested the lie detector on a variety of unseen question-and-answer pairs from diverse settings, and found that it performed well in all cases. They also found that the lie detector could effectively distinguish lies from truths in real-world scenarios, such as when a chatbot was lying to sell a product.

The researchers are not entirely sure why the elicitation questions work, but they believe it may be due to the ambiguity of some of the questions. This ambiguity, they believe, may give the lie detector an advantage against lying LLMs in the future.

The sources for this piece include an article in ZDNET.

Tags
AI

TND Newsdesk
