Non-interactive K-mode clustering of high-dimensional categorical data under local differential privacy
Article
Ye, X, Zhu, Y, Zhang, S et al. (2025). Non-interactive K-mode clustering of high-dimensional categorical data under local differential privacy
. INFORMATION SCIENCES, 718 10.1016/j.ins.2025.122417
Ye, X, Zhu, Y, Zhang, S et al. (2025). Non-interactive K-mode clustering of high-dimensional categorical data under local differential privacy
. INFORMATION SCIENCES, 718 10.1016/j.ins.2025.122417
High-dimensional categorical data contains rich user-sensitive information, which poses huge privacy threats to users once leaked during data clustering and analysis. The existing K-mode method under local differential privacy (LDP) always requires multiple user-server interactions, which not only has high communication overhead and computational cost but also makes user privacy vulnerable to malicious attacks during interactions. In this paper, we propose a non-interactive LDP K-mode clustering estimation method for high-dimensional categorical data. We first perform dimensionality reduction on each user data locally through the Fsketch algorithm. Then, we perturb the sketch data, ensuring the perturbation satisfies LDP. The perturbed data is then submitted to the server for K-mode clustering. Finally, on the server, we directly estimate the Hamming distance on the perturbed data to achieve K-mode clustering analysis. It is theoretically proven that our perturbation method satisfies LDP and is unbiased. Compared with state-of-the-art methods, our scheme more accurately estimates the Hamming distance between high-dimensional categorical data, reducing communication overhead due to only one interaction. Extensive experiments demonstrate the effectiveness of our method on four real high-dimensional category data sets, in which our scheme has a smaller normalized intra-cluster variance and larger purity index under the same privacy budget.