Survey of image deduplication for cloud storage
DOI: https://doi.org/10.20535/SRIT.2308-8893.2023.4.09
Keywords: image deduplication, cloud computing, cloud storage, image copy detection
Abstract
The rapid growth of real-life digital communication has driven the creation, transmission, and storage of vast volumes of image and video data in the cloud. This explosive increase in virtual images (such as virtual machine images) and visual images on cloud servers calls for efficient storage utilization, which image deduplication technology can provide. Although virtual and visual images have different properties, the existing literature applies similar approaches to check for duplicates. This motivated us to cover both image types in this review. This article provides a detailed survey of state-of-the-art visual and virtual image deduplication techniques in cloud environments. It summarizes and organizes them in a five-dimensional taxonomy, with several non-overlapping categories in each dimension, in order to analyse their features and performance. The five dimensions are: 1) where deduplication is applied; 2) image feature extraction; 3) when deduplication is applied; 4) image data partitioning strategy; and 5) the user dataset level involved. Existing image deduplication techniques are further divided into two main categories according to whether they incorporate security. The techniques are compared across a set of functional and performance parameters. Finally, current issues are highlighted together with possible future directions to motivate further research on the topic.
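To make the distinction between exact deduplication and feature-based (near-duplicate) image deduplication concrete, here is a minimal sketch. It is an illustration, not a method taken from any of the surveyed works. It assumes that images are already decoded into grayscale pixel grids of integers (0 to 255). It uses SHA-256 over the raw pixels as an exact fingerprint and the well-known average hash (aHash) as the extracted image feature. The names `ImageDedupStore`, `exact_fingerprint`, `average_hash`, `hamming`, and the 5-bit distance threshold are all hypothetical. In the terms of the taxonomy, the sketch roughly corresponds to deduplication applied inline at upload time, on the storage side, at whole-image granularity, and without any security layer.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Assumption: an image is already decoded into a grayscale grid of ints (0-255).
Image = List[List[int]]


def exact_fingerprint(img: Image) -> str:
    """SHA-256 over the raw pixel bytes; only bit-identical images match."""
    return hashlib.sha256(bytes(v for row in img for v in row)).hexdigest()


def average_hash(img: Image, size: int = 8) -> int:
    """Average hash (aHash): one bit per grid cell, set if the cell is brighter than the mean."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for gy in range(size):
        for gx in range(size):
            # Pixel bounds of this grid cell (integer division spreads any remainder).
            y0, y1 = gy * h // size, (gy + 1) * h // size
            x0, x1 = gx * w // size, (gx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class ImageDedupStore:
    """Hypothetical store: exact check first, then a near-duplicate check."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold  # max Hamming distance treated as "near"
        self.by_digest: Dict[str, str] = {}
        self.by_ahash: List[Tuple[int, str]] = []

    def put(self, image_id: str, img: Image) -> Tuple[str, Optional[str]]:
        digest = exact_fingerprint(img)
        if digest in self.by_digest:
            return "exact", self.by_digest[digest]
        ahash = average_hash(img)
        for other_hash, other_id in self.by_ahash:  # linear scan; fine for a sketch
            if hamming(ahash, other_hash) <= self.threshold:
                return "near", other_id
        # New content: index it under both fingerprints. Duplicates are not
        # indexed; a caller would keep a reference to the returned id instead.
        self.by_digest[digest] = image_id
        self.by_ahash.append((ahash, image_id))
        return "stored", None
```

For example, suppose a 16x16 gradient is stored and then the same gradient brightened by 10 levels is submitted. The store returns `("near", <first id>)`. The SHA-256 check misses because the bytes differ, but every grid cell shifts by the same amount, so the aHash is unchanged. A real system would likely replace the linear scan with an index suited to Hamming-distance search and decide separately how near-duplicates are kept.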