Researchers develop method to potentially jailbreak any AI model relying on human feedback

By: Cointime
2023/11/27 20:30

Researchers from ETH Zurich have developed a method that could potentially jailbreak any AI model that relies on human feedback, including large language models (LLMs), by bypassing the guardrails that prevent such models from generating harmful or unwanted outputs. The technique poisons the Reinforcement Learning from Human Feedback (RLHF) dataset with an attack string that forces the model to produce responses that would otherwise be blocked. The researchers describe the flaw as universal but difficult to exploit: the attacker must participate in the human feedback process, and the attack becomes harder as model size increases. Further study is needed to understand how these techniques scale and how developers can protect against them.
