PolyJuice Makes It Real: Black-Box, Universal Red-Teaming for Synthetic Image Detectors

Sepehr Dehdashtian*, Mashrur M. Morshed*, Jacob H. Seidman, Gaurav Bharaj, Vishnu N. Boddeti

NeurIPS 2025

Red-Teaming Setting

Intuition

[Figure: distribution shift]

PolyJuice Overview

\( \underset{\mathbf{U}}{\operatorname{arg\,max}}\ \operatorname{Tr}\Big\{\mathbf{U}^\top \mathbf{Z} \mathbf{H} \mathbf{K}_{\mathbf{YY}} \mathbf{H} \mathbf{Z}^\top \mathbf{U}\Big\} \quad \text{s.t.} \quad \mathbf{U}^\top \mathbf{U} = \mathbf{I} \)
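This trace objective has a closed-form solution. For the symmetric matrix \( \mathbf{M} = \mathbf{Z}\mathbf{H}\mathbf{K}_{\mathbf{YY}}\mathbf{H}\mathbf{Z}^\top \), Ky Fan's maximum principle says that \( \operatorname{Tr}\{\mathbf{U}^\top \mathbf{M}\mathbf{U}\} \) under \( \mathbf{U}^\top\mathbf{U} = \mathbf{I} \) is maximized by the eigenvectors of \( \mathbf{M} \) with the largest eigenvalues. The sketch below is illustrative only and rests on several assumptions not stated here: the columns of \( \mathbf{Z} \) are latent vectors (one per generated sample), \( \mathbf{Y} \) holds binary labels such as the detector's decisions, \( \mathbf{H} = \mathbf{I} - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top \) is the centering matrix, and \( \mathbf{K}_{\mathbf{YY}} \) is a linear label kernel. The helper names `centering_matrix` and `find_directions` are hypothetical.

```python
import numpy as np


def centering_matrix(n: int) -> np.ndarray:
    # H = I - (1/n) 1 1^T; right-multiplying Z by H subtracts the sample mean.
    return np.eye(n) - np.ones((n, n)) / n


def find_directions(Z: np.ndarray, y, k: int = 1) -> np.ndarray:
    """Solve argmax_U Tr(U^T Z H K_YY H Z^T U) s.t. U^T U = I.

    Z: (d, n) array, one latent vector per column (assumed layout).
    y: (n,) labels; a linear kernel K_YY = y y^T is assumed here.
    Returns U with k orthonormal columns.
    """
    d, n = Z.shape
    H = centering_matrix(n)
    y = np.asarray(y, dtype=float).reshape(n, 1)
    K_yy = y @ y.T                   # assumed linear label kernel, (n, n)
    M = Z @ H @ K_yy @ H @ Z.T       # symmetric PSD, (d, d)
    # eigh returns eigenvalues in ascending order, so reverse the columns
    # and keep the k eigenvectors with the largest eigenvalues.
    _, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, ::-1][:, :k]
```

With a linear kernel and labels in {0, 1}, \( \mathbf{M} \) has rank one and its top eigenvector is proportional to \( \mathbf{Z}\mathbf{H}\mathbf{y} \), that is, to the difference between the two class means in latent space. A nonlinear label kernel would change this, so treat the linear choice as a simplification.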
\( h_{\boldsymbol{\delta}_t}(\mathbf{z}'_t) = \mathbf{z}'_t + \lambda_t \boldsymbol{\delta}_t,\quad t=1, \ldots, T-1 \)
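The steering map simply shifts each intermediate latent by \( \lambda_t \boldsymbol{\delta}_t \). A minimal sketch of where it could sit in a reverse-diffusion loop follows. The ordering used here is an assumption: one sampler step produces \( \mathbf{z}'_t \), the shift is applied, and a final unsteered step goes to \( t = 0 \). `denoise_step` is a hypothetical stand-in for a real sampler step, and `steered_reverse_process` is likewise an illustrative name.

```python
from typing import Callable, Sequence

import numpy as np


def steer(z_t: np.ndarray, delta_t: np.ndarray, lam_t: float) -> np.ndarray:
    """h_{delta_t}(z'_t) = z'_t + lambda_t * delta_t."""
    return z_t + lam_t * delta_t


def steered_reverse_process(
    z_T: np.ndarray,
    deltas: Sequence[np.ndarray],
    lambdas: Sequence[float],
    denoise_step: Callable[[np.ndarray, int], np.ndarray],
) -> np.ndarray:
    """Apply steering at t = T-1, ..., 1 during a reverse diffusion loop.

    deltas[t - 1] and lambdas[t - 1] hold delta_t and lambda_t for
    t = 1..T-1, so T = len(deltas) + 1. denoise_step(z, t) is a
    hypothetical sampler step that returns the latent at timestep t.
    """
    T = len(deltas) + 1
    z = z_T
    for t in range(T - 1, 0, -1):
        z = denoise_step(z, t)                        # unsteered latent z'_t
        z = steer(z, deltas[t - 1], lambdas[t - 1])   # steered latent
    return denoise_step(z, 0)                         # final step, no steering
```

With an identity `denoise_step`, the output is \( \mathbf{z}_T + \sum_t \lambda_t \boldsymbol{\delta}_t \), which makes the per-step shifts easy to check in isolation before plugging in a real sampler.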

Steering Visualization


How Successful is PolyJuice in Attacking Synthetic Image Detectors (SIDs)?


How Effective is PolyJuice When Applied to a Text-to-Image (T2I) Model-Specific Detector?


How Effective is PolyJuice in Reducing the False Negative Rate of Existing SIDs?
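Under the usual convention that the positive class is "synthetic", the false negative rate is the share of synthetic images a detector labels as real: FNR = FN / (FN + TP). The minimal sketch below assumes labels 1 = synthetic and 0 = real. The function name `false_negative_rate` is illustrative and not taken from the PolyJuice code.

```python
import numpy as np


def false_negative_rate(y_true, y_pred, positive: int = 1) -> float:
    """FNR = FN / (FN + TP): fraction of truly synthetic images called real.

    Assumes `positive` (default 1) marks synthetic images; any other
    label is treated as real.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == positive
    if not pos.any():
        raise ValueError("FNR is undefined without positive (synthetic) samples")
    fn = np.sum(pos & (y_pred != positive))  # synthetic images predicted as real
    return float(fn / pos.sum())
```

For example, if four synthetic images are scored and the detector calls two of them real, the FNR is 0.5. Under this convention, an attack on a synthetic image succeeds exactly when that image becomes a false negative.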


How Transferable are the Directions from Lower to Higher Resolutions?


Summary

  • PolyJuice:
    • is the first black-box, distribution-based attack on SIDs,
    • significantly increases the attack success rate against SIDs, and
    • improves the robustness of SIDs when its attacks are used to fine-tune them.