
COVID-19 Symptoms Detection Using Iterative Dichotomiser (ID3) Algorithm: A Greedy Method

 




I. INTRODUCTION 

According to Gottingen [1], the greedy algorithm has been widely used to quickly find approximate solutions to combinatorial optimization problems. It makes the locally optimal choice at each step, which may eventually lead to a globally optimal solution. Common examples of greedy algorithms are Dijkstra’s algorithm, the Iterative Dichotomiser 3 (ID3) algorithm, the A* algorithm, and Kruskal’s algorithm. COVID-19 is a global pandemic that has claimed many lives around the world from late 2019 to date. Like other diseases, it has particular symptoms, but ultimate confirmation of infection is obtained from laboratory test results. Worldwide, conducting tests has proven challenging because of the large populations of some countries and limited access to state-of-the-art molecular laboratories. In third world and developing countries in particular, laboratory testing is highly limited by the technological and financial constraints these countries face [2].

To overcome this challenge, researchers have been working around the clock on solutions that would ease testing procedures. Testing every citizen of a country within a short time is clearly not possible; hence, deciding who qualifies to be tested adds to the existing challenges. The process of prioritizing test candidates can easily be compromised by human relations and influence, especially in countries with poor governance and a high rate of bribery and corruption.

This study proposes the use of ID3 to classify potentially infected people based on a set of exhibited symptoms, after which they can proceed to the laboratory test. The ID3 algorithm adopts a greedy, non-backtracking method to construct a decision tree in a top-down recursive manner [3]. The use of the ID3 algorithm in this paper was motivated by the previous studies of Gottingen et al. [1] and Ma et al. [4], in which the greedy approach was used to analyze and classify medical diseases [1, 3-9]. The ID3 algorithm, like other algorithms used for machine learning classification tasks, can be evaluated using the holdout or the cross-validation approach. In this study, both approaches are employed, and their performance is compared in terms of accuracy and time complexity.
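The difference between the two evaluation approaches can be illustrated with a minimal sketch. The function names `holdout_split` and `k_fold_splits`, the 30% test fraction, and the choice of five folds are assumptions made for illustration only; they are not taken from the study.

```python
import random


def holdout_split(n_samples, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) for one random holdout split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical data set of 10 records.
train_idx, test_idx = holdout_split(10, test_fraction=0.3)
folds = list(k_fold_splits(10, k=5))
```

Holdout trains and tests the classifier once, so it is cheaper, whereas k-fold cross-validation repeats training k times but uses every record for testing, which is the source of the accuracy versus running-time trade-off.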


II. REVIEW OF RELATED LITERATURE 

CORONAVIRUS DISEASE 2019 (COVID-19)

COVID-19 is the latest global threat. It was discovered in December 2019 in Wuhan, in the Hubei province of China, and has spread rapidly throughout the world to become a full-blown pandemic. After the virus was identified and isolated, the pathogen of this new disease was initially referred to as 2019 novel coronavirus (2019-nCoV), but it was later officially renamed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by the World Health Organization (WHO). COVID-19 is a respiratory disease caused by SARS-CoV-2, which belongs to a group of viruses known as coronaviruses, members of the family Coronaviridae. Coronaviruses were also responsible for the severe acute respiratory syndrome (SARS) outbreak and the Middle East respiratory syndrome (MERS). Compared with the SARS-CoV that caused the SARS outbreak in 2003, SARS-CoV-2 has a stronger transmission capacity [5]. The COVID-19 outbreak has posed critical challenges for the public health, research, and medical communities, as the disease is highly communicable through the air. Its easy and rapid spread makes prevention and control a high-priority issue. Several countries have already instituted temporary travel restrictions in order to slow the spread of the disease. The wearing of masks and social distancing of at least 1 meter have also been encouraged and enforced in some countries [6]. Although the clinical symptoms of COVID-19 are predominantly respiratory, some patients suffer severe cardiovascular damage. Patients with underlying cardiovascular diseases are at higher risk of death. Therefore, understanding the symptoms of this disease is important for early diagnosis, which could give patients a fighting chance and help control the spread of the disease.

USE OF COMPUTED TOMOGRAPHY (CT) FOR SYMPTOMATIC DIAGNOSIS OF COVID-19

There is great interest in developing symptom-based screening to prioritize who should be tested for COVID-19. The reliability of such symptoms has been a subject of debate, but prioritizing screening remains highly important in the face of limited resources. When the coronavirus first broke out, details about the disease were sketchy, especially among nonhospitalized patients. To better understand patient profiles, most nations used questionnaires. In the United States, for example, the CDC used an optional questionnaire to collect detailed information on confirmed COVID-19 patients [8]. The data were analyzed by age group, sex, hospitalization status, and symptom. Among 164 confirmed symptomatic patients, 96% reported fever, cough, or shortness of breath, and 68% of 57 hospitalized adult patients reported all three symptoms [8]. COVID-19 has been found to cause a wide range of symptoms. There are over 22 multivariable models for diagnosing COVID-19 that target suspected cases. Most of these models report a C index between 0.65 and 0.99. The most frequently reported diagnostic predictors were flu-like symptoms, age, body temperature, lymphocyte count, and neutrophil count [9]. Most studies used computed tomography images or chest radiographs; others used spectrograms of cough sounds and lung ultrasound to diagnose patients with symptoms before a proper test [10]. Because COVID-19 is highly infectious, medical practitioners need a faster way to diagnose patients, as proper testing could take days to return results. With a CT scan, results could be obtained much faster. In a study of 36 patients with COVID-19 who underwent a CT scan, the scan identified 29 of the 36 cases [11].


ID3 ALGORITHM 

The ID3 algorithm is an important method for inductive learning from a data set. It is among the most widely used decision tree algorithms. The ID3 algorithm was initially introduced by J. Ross Quinlan in 1975 as a Concept Learning System (CLS) algorithm. Over the years, the CLS algorithm has developed into the ID3 algorithm, which searches through the attributes of the training instances and extracts the attribute that best classifies the training set [12].

ID3 generates decision rules from a set of training examples. It derives its classes from a fixed set of training instances and is therefore a non-incremental algorithm. Once the classes are created, they are used to predict all future instances. Using information theory to choose features, ID3 selects at each step the feature that gives the greatest information gain [12]. To obtain the decision rules that best classify the training examples, a test is carried out by selecting a feature and then dividing the examples into subgroups according to the values of that feature. After grouping, the entropy is calculated to determine how important the feature is [13]. The ID3 algorithm is specifically suited to certain data sets and expected results. It is often used when Naive Bayes does not suit a problem: when a group of features are dependent on each other, the ID3 algorithm is better suited than Naive Bayes. The ID3 algorithm is also useful for categorical data, i.e. data with distinct attribute values, for example “hot” or “cold”. Another very useful application of ID3 is when the target has discrete output values, for example “yes” or “no”.
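The entropy and information gain calculations described above, together with the greedy top-down construction of the tree, can be sketched as follows. This is a minimal illustration and not the implementation used in the study; the function names, the two attributes (`fever`, `cough`), and the seven toy records are hypothetical and serve only to make the example runnable.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def id3(rows, labels, attributes):
    """Greedy, non-backtracking, top-down construction of a decision tree."""
    if len(set(labels)) == 1:      # pure node: return the class as a leaf
        return labels[0]
    if not attributes:             # no attribute left: return the majority class
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: keep the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*pairs)
        tree[best][value] = id3(list(sub_rows), list(sub_labels), remaining)
    return tree


# Hypothetical toy records (not the study's data): symptoms -> "test advised?"
rows = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "yes", "cough": "yes"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
    {"fever": "no", "cough": "no"},
    {"fever": "no", "cough": "yes"},
]
labels = ["yes", "yes", "yes", "yes", "no", "no", "yes"]

tree = id3(rows, labels, ["fever", "cough"])
```

On these toy records, `cough` yields the larger information gain and therefore becomes the root; the `cough = no` branch is then split on `fever`. Once an attribute has been chosen it is never revisited, which is what makes the method greedy and non-backtracking.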
