A selective mitigation technique of soft errors for DNN models used in healthcare applications: DenseNet201 case study

Adam, Khalid and Izzeldin, I. Mohd and Ibrahim, Younis (2021) A selective mitigation technique of soft errors for DNN models used in healthcare applications: DenseNet201 case study. IEEE Access, 9 (9419032). 65803-65823. ISSN 2169-3536. (Published)

Pdf (Open access)
A selective mitigation technique of soft errors.pdf
Available under License Creative Commons Attribution.

DOI/Official URL: https://doi.org/10.1109/ACCESS.2021.3076716

Abstract

Deep neural networks (DNNs) have been successfully deployed in a wide range of domains, including healthcare applications. DenseNet201 is a recent DNN architecture used in healthcare systems (e.g., surgical tool presence detection). Specialized accelerators such as GPUs are used to speed up the execution of DNNs. Nevertheless, GPUs are prone to transient effects and other reliability threats, which can impact the reliability of DNN models. Safety-critical systems, such as healthcare applications, must be highly reliable because minor errors might lead to severe injury or death. In this paper, we propose a selective mitigation technique that relies on in-depth analysis. First, we inject faults into the DenseNet201 model, implemented on a GPU, using NVIDIA’s SASSIFI fault injector. Second, we perform a comprehensive analysis at the kernel and layer levels to identify the most vulnerable portions of the injected model. Finally, we validate our technique by applying it to the most vulnerable kernels, selectively protecting only the sensitive portions of the model and avoiding unnecessary overhead. Our experiments demonstrate that our mitigation technique significantly reduces the percentage of errors that cause malfunction (errors that lead to misclassification), from 6.463% to 0.21%. Moreover, the performance overhead (execution time) of our technique is compared with that of well-known protection techniques: Algorithm-Based Fault Tolerance (ABFT), Double Modular Redundancy (DMR), and Triple Modular Redundancy (TMR). The proposed solution shows only 0.3035% overhead compared to these techniques while correcting up to 84.8% of the silent data corruption (SDC) errors in DenseNet201, remarkably improving the reliability of the model in the healthcare domain.
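The workflow in the abstract (inject faults, rank the vulnerable portions, protect only the top-ranked ones) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use SASSIFI: `flip_bit` models a soft error as a single bit flip in a float32 value, and `rank_kernels` and `select_for_protection` are hypothetical helpers that order kernels by observed malfunction rate. The kernel names and counts in the usage lines are made up for illustration and are not results from the paper.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Model a soft error: flip one bit (0 = least significant) of a float32."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the chosen bit and reinterpret the result as a float32 again.
    (corrupted,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return corrupted


def rank_kernels(malfunctions: dict, injections: dict) -> list:
    """Return kernel names ordered by malfunction rate, most vulnerable first."""
    rates = {name: malfunctions[name] / injections[name] for name in malfunctions}
    return sorted(rates, key=rates.get, reverse=True)


def select_for_protection(ranked: list, top_k: int) -> set:
    """Pick only the top_k most vulnerable kernels for hardening."""
    return set(ranked[:top_k])


# Flipping the sign bit of 1.0 yields -1.0.
print(flip_bit(1.0, 31))

# Hypothetical campaign data: malfunctions observed per 100 injections.
observed = {"conv": 30, "pool": 2, "relu": 5}
injected = {"conv": 100, "pool": 100, "relu": 100}
ranked = rank_kernels(observed, injected)
print(select_for_protection(ranked, top_k=1))
```

Hardening only the selected set is what keeps the overhead low compared with duplicating or triplicating every kernel, which is the trade-off the abstract describes against DMR and TMR.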

Item Type:	Article
Additional Information:	Indexed by Scopus
Uncontrolled Keywords:	Convolutional neural networks; DenseNet201; Healthcare; GPUs; Soft error; Reliability
Subjects:	Q Science > QA Mathematics > QA76 Computer software; T Technology > T Technology (General)
Faculty/Division:	Institute of Postgraduate Studies; College of Engineering
Depositing User:	Mrs Norsaini Abdul Samat
Date Deposited:	30 Jul 2021 08:03
Last Modified:	30 Jul 2021 08:03
URI:	http://umpir.ump.edu.my/id/eprint/31730