Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation
Preprint
Zhuang, Jun, Jin, Haibo, Zhang, Ye et al. (2025). Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation. 10.48550/arxiv.2505.18556