Prof. Dr. Akira Otsuka,
Graduate School of Information Security, Institute of Information Security (IISEC)
The focus of security research on agentic AI is rapidly shifting from the safety of standalone models to the security of entire systems that integrate planning, tool use, memory, and multi-agent coordination. It is no longer sufficient for an LLM simply to refrain from generating harmful text. What matters increasingly is whether an agent can be manipulated—accidentally or deliberately—while formulating plans, invoking tools, consulting memory, and adaptively revising its goals in response to external inputs. Research from Stanford University, exemplified by Cybench, demonstrates that state-of-the-art LLMs can solve difficult Capture the Flag (CTF) challenges used in hacker competitions with striking speed, underscoring the inseparability of capability evaluation and misuse-risk assessment. Meanwhile, OpenAgentSafety, developed by researchers at Carnegie Mellon University and the Allen Institute for AI, introduces more than 350 multi-turn tasks in realistic tool-use environments involving browsers, code execution, file systems, bash, and messaging. Their results show that current models still exhibit unsafe or inappropriate behaviors at a high rate, revealing the limitations of existing safeguards and the need for more robust defensive techniques. This talk will examine the security threats posed by agentic AI through concrete case studies and discuss several emerging approaches to mitigation, drawing on recent research in the field.