The effect of different similarity distance measures in detecting outliers using single-linkage clustering algorithm for univariate circular biological data

Nur Syahirah, Zulkipli and Siti Zanariah, Satari and Wan Nur Syahidah, Wan Yusoff. The effect of different similarity distance measures in detecting outliers using single-linkage clustering algorithm for univariate circular biological data. Pakistan Journal of Statistics and Operation Research, 18 (3). pp. 561-573. ISSN 2220-5810. (Published)

Abstract

Clustering algorithms can be used to build an outlier detection procedure for univariate circular data. The circular distance between each pair of angular observations serves as the similarity measure used to group the observations. In this paper, we present a clustering-based procedure for detecting outliers in univariate circular biological data using various similarity distance measures. Three circular similarity distance measures, the Satari distance, the Di distance and the Chang-chien distance, were used to detect outliers with a single-linkage clustering algorithm. The Satari and Di distances have similar formulas for univariate circular data. This study aims to develop the proposed clustering-based procedure and to demonstrate its effectiveness in detecting outliers under the various similarity distance measures. The SL-Satari/Di procedure and procedures based on other similarity measures, including SL-Chang, were compared at various dendrogram cutting points. We find that a clustering-based procedure using a single-linkage algorithm with various similarity distances is a practical and promising approach to detecting outliers in univariate circular data, particularly biological data. According to the results, the SL-Satari/Di distance outperformed the SL-Chang distance for certain data conditions.
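
The general idea described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption rather than the paper's definition: the abstract does not state the distance formulas, so `arc_distance` uses the common shortest-arc form π − |π − |θi − θj|| and `cosine_distance` uses 1 − cos(θi − θj) as stand-ins; the function names, the sample angles, the cut height and the rule "anything outside the largest group is an outlier" are all hypothetical. The sketch relies on the fact that cutting a single-linkage dendrogram at height h yields the same groups as the connected components of a graph that links every pair of points at distance at most h.

```python
import math


def arc_distance(a, b):
    """Shortest arc between two angles in radians, in [0, pi] (assumed form)."""
    diff = abs(a - b) % (2 * math.pi)
    return math.pi - abs(math.pi - diff)


def cosine_distance(a, b):
    """1 - cos of the angular difference, in [0, 2] (assumed alternative form)."""
    return 1.0 - math.cos(a - b)


def single_linkage_groups(angles, dist, cut):
    """Group labels equivalent to cutting a single-linkage dendrogram at `cut`.

    Implemented as connected components (union-find) of the graph that joins
    every pair of observations whose distance is <= cut.
    """
    parent = list(range(len(angles)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(angles)):
        for j in range(i + 1, len(angles)):
            if dist(angles[i], angles[j]) <= cut:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(angles))]


def flag_outliers(angles, dist, cut):
    """Indices of observations outside the largest group (hypothetical rule)."""
    labels = single_linkage_groups(angles, dist, cut)
    sizes = {}
    for label in labels:
        sizes[label] = sizes.get(label, 0) + 1
    largest = max(sizes.values())
    return [i for i, label in enumerate(labels) if sizes[label] < largest]


# Made-up sample: five directions near 20 degrees and one near 200 degrees.
angles = [math.radians(d) for d in (10, 15, 20, 25, 30, 200)]
print(flag_outliers(angles, arc_distance, math.radians(20)))
print(flag_outliers(angles, cosine_distance, 1 - math.cos(math.radians(20))))
```

Swapping the `dist` argument is the only change needed to compare distance measures at the same cutting point, which mirrors the kind of comparison the abstract describes without reproducing the paper's actual cut-point rule.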

Item Type: Article
Uncontrolled Keywords: Similarity measure; circular distance; circular data; outliers; clustering algorithm
Subjects: H Social Sciences > HA Statistics
Q Science > QA Mathematics
Faculty/Division: Institute of Postgraduate Studies
Center for Mathematical Science
Depositing User: Ms. Ratna Wilis Haryati Mustapa
Date Deposited: 17 Oct 2022 05:00
Last Modified: 17 Oct 2022 05:00
URI: http://umpir.ump.edu.my/id/eprint/35453