Enhanced slicing-based anonymization approach for privacy-preserving data publishing with improved data utility

Mohammed Mahfoudh, Khamis Binjubeir (2024) Enhanced slicing-based anonymization approach for privacy-preserving data publishing with improved data utility. PhD thesis, Universiti Malaysia Pahang Al-Sultan Abdullah (Contributors, Thesis advisor: Mohd Arfian, Ismail).

Enhanced slicing-based anonymization approach for privacy-preserving data publishing with improved data utility.pdf - Accepted Version

Abstract

Data publication is a widely used way to share data, particularly in research, because it lets data mining operations extract valuable knowledge from published databases. This knowledge can be used for representation, interpretation, or the discovery of interesting patterns. However, the full potential of published partial data, derived from large datasets or a series of datasets, is yet to be realized, primarily because of the challenges scholars face when extracting knowledge from published data. One significant challenge is data privacy: publication often leads to the disclosure of individuals' identities, unauthorized access to private information, and the misuse of personal data for unintended purposes. This issue has become a major hindrance to the advancement of data publishing. To address these concerns while ensuring data utility, several anonymization-based approaches have been developed in the field of Privacy-Preserving Data Publishing (PPDP). The effectiveness of a data anonymization approach depends on the protection methods it employs to achieve privacy. However, these protection methods often either distort the data excessively or demand an impractically high level of trust in different data-sharing scenarios. Two crucial aspects of any data protection method are keeping private data away from people who must not access it and giving individuals the ability to determine or infer who can access their personal information. Improving protection methods for data publication so that they balance data utility against individuals' privacy therefore remains a significant challenge. To achieve effective data anonymization, this study proposes the Upper Lower (UL) level-based protection approach, an enhancement of the slicing approach. The UL approach aims to strike a better balance between utility and privacy.
The proposed methodology divides the data into horizontal and vertical partitions and uses the Lower Protection Level (LPL) and Upper Protection Level (UPL) to compute unique and identical attributes. Swapping these attributes safeguards the published data against disclosure risks while still preserving adequate diversity. The key idea is to choose a set of attributes that determines the required level of protection and to swap between them, improving the privacy of the published data while preserving high data utility. The approach was evaluated on the real-world Adult dataset, and the results show that it maintains the data's usefulness while offering improved privacy preservation. At an exchange level of 2% on a 4.5K education dataset, the proposed approach delivers about 92.47% data utility using LPL and 98% using UPL. With a 5% swap rate, it obtains 92.19% using LPL and 95% using UPL. In conclusion, the UL approach reduces the risk of data disclosure compared with existing works such as merging, e-DP, Mondrian, composition, probabilistic, and hybrid methods. It allows data to be published in a manner that keeps the data practically usable while protecting individuals' privacy, and thus offers a promising way to balance utility and privacy.
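
The partition-and-swap idea in the abstract can be illustrated with a small sketch. This is not the thesis's UL algorithm: the abstract does not specify how LPL and UPL are computed, so a plain `swap_rate` parameter stands in for the exchange level. The function name `slice_and_swap`, the column grouping, and the sample records are all hypothetical and chosen only for illustration. The sketch splits records vertically into column groups and horizontally into fixed-size buckets, then shuffles a fraction of each column group's values among records inside the same bucket.

```python
import random

# Hypothetical sample records loosely shaped like the Adult dataset (values are made up).
ROWS = [
    {"age": 39, "sex": "M", "education": "Bachelors", "income": "<=50K"},
    {"age": 50, "sex": "M", "education": "HS-grad", "income": "<=50K"},
    {"age": 38, "sex": "F", "education": "Masters", "income": ">50K"},
    {"age": 53, "sex": "F", "education": "Doctorate", "income": ">50K"},
]

# Vertical partition: each tuple is one column group (an assumed grouping).
GROUPS = [("age", "sex"), ("education", "income")]


def slice_and_swap(rows, column_groups, bucket_size, swap_rate, seed=0):
    """Toy slicing with swapping (an illustrative assumption, not the UL method).

    Rows are cut into horizontal buckets of `bucket_size`. Within each bucket,
    a `swap_rate` fraction of records has the values of each column group
    shuffled among themselves, which weakens the link between column groups
    while keeping every bucket's per-group values unchanged as a multiset.
    """
    rng = random.Random(seed)
    published = [dict(r) for r in rows]  # copy so the input is not mutated
    for start in range(0, len(published), bucket_size):
        bucket = published[start:start + bucket_size]
        for group in column_groups:
            k = round(swap_rate * len(bucket))  # records to swap in this bucket
            if k < 2:
                continue  # a swap needs at least two records
            chosen = rng.sample(range(len(bucket)), k)
            values = [tuple(bucket[i][c] for c in group) for i in chosen]
            rng.shuffle(values)
            for i, vals in zip(chosen, values):
                for c, v in zip(group, vals):
                    bucket[i][c] = v
    return published
```

Calling `slice_and_swap(ROWS, GROUPS, bucket_size=2, swap_rate=1.0)` returns a copy of the records in which each column group may be permuted within its two-record bucket. A lower `swap_rate` leaves more records untouched, which is the utility side of the trade-off the abstract describes.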

Item Type: Thesis (PhD)
Additional Information: Thesis (Doctor of Philosophy) -- Universiti Malaysia Pahang – 2024, SV: Dr. Mohd Arfian Ismail, NO.CD : 13616
Uncontrolled Keywords: Privacy-Preserving Data Publishing (PPDP)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Faculty/Division: Institute of Postgraduate Studies
Faculty of Computing
Depositing User: Mr. Mohd Fakhrurrazi Adnan
Date Deposited: 09 Jul 2025 07:50
Last Modified: 09 Jul 2025 07:50
URI: http://umpir.ump.edu.my/id/eprint/44920