Symptom clusters in Covid19: A potential clinical prediction tool from the COVID Symptom study app

Our take

This study, available as a preprint and thus not yet peer reviewed, found that among users of a symptom tracking app for COVID-19 who regularly reported their symptoms until either hospitalization or symptom improvement, those reporting more symptoms within five days after onset were generally more likely to require respiratory support. Six clusters of symptoms were observed that had some prognostic utility for the requirement of respiratory support beyond baseline health characteristics alone. However, the experience of participants in the study may differ from that of people who were not included, and the generalizability of results may be limited by the way symptoms were defined and recorded on the app.

Study design


Study population and setting

This study used data from the COVID Symptom Study smartphone app, which collects self-reported information from millions of participants (primarily in the UK) on symptoms, health care visits, SARS-CoV-2 test results, and outcomes. The aim of the study was to identify symptom clusters associated with severe COVID-19 disease, defined as a self-reported requirement of respiratory support. To be included, participants had to report symptoms at least three times over four or more days between symptom onset and outcome, and had to either report a hospital visit or show signs of symptom decline. Participants who showed signs of recovery were included only if they self-reported a positive test result; those hospitalized were included if they had a positive test result, an imputed positive test result, or two or more days of fever and cough. The training dataset comprised 1,653 users, 107 of whom required respiratory support; an independent replication dataset comprised 1,047 users (88% from the UK, 8% from the US, and 5% from Sweden), 59 of whom required respiratory support. The authors used unsupervised time series clustering analysis to distinguish groupings of symptoms, then used symptom clusters to predict the need for respiratory support.
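
The inclusion rules described above can be written as a small predicate. This is an illustrative sketch only, not the authors' code: the `AppUser` fields and the `is_eligible` function are hypothetical names, and reading the reporting requirement as a span of at least four calendar days between the first and last report is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AppUser:
    """Hypothetical record; field names are illustrative, not the app's schema."""
    report_days: List[int]       # days since symptom onset on which a report was logged
    hospital_visit: bool         # self-reported hospital visit
    symptom_decline: bool        # signs of recovery
    positive_test: bool          # self-reported positive SARS-CoV-2 test
    imputed_positive: bool       # imputed positive test result
    fever_and_cough_days: int    # days with both fever and cough


def is_eligible(u: AppUser) -> bool:
    # At least three reports between onset and outcome.
    if len(u.report_days) < 3:
        return False
    # Assumption: the reports must span four or more calendar days.
    if max(u.report_days) - min(u.report_days) + 1 < 4:
        return False
    # Hospitalized users: positive test, imputed positive, or 2+ days of fever and cough.
    if u.hospital_visit:
        return u.positive_test or u.imputed_positive or u.fever_and_cough_days >= 2
    # Recovering users: a self-reported positive test is required.
    if u.symptom_decline:
        return u.positive_test
    # No outcome recorded: excluded.
    return False
```

Under these assumptions, a user who stopped logging before either outcome was recorded returns `False`, which is the group the limitations of the study are concerned with.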

Summary of Main Findings

Six symptom clusters were identified (ordered by rate of hospital visits, with % requiring respiratory support in parentheses): 1) generally mild, with muscle pain, cough, anosmia, headache, and sore throat (1.5%); 2) similar to cluster 1 but with increased fever, more skipped meals, and less muscle pain (4.4%); 3) similar to cluster 1 but with diarrhea and skipped meals (3.7%); 4) more severe with fatigue, fever, chest pain, hoarse voice, and persistent cough (8.6%); 5) similar to cluster 4 but marked by confusion and more pronounced other symptoms including headache, muscle pain, and sore throat (9.9%); and 6) severe respiratory symptoms including shortness of breath and chest pain, along with abdominal pain and high prevalence of other symptoms (19.8%). To predict the need for respiratory support, the authors fit a model to the first five days of reporting: independent variables included demographic and baseline health data, the sum of symptoms over the first five days, and the symptom cluster in which participants were predicted to fall. On the replication dataset, the model achieved an area under the ROC curve of 78.8%, compared to 69.5% for demographic characteristics alone.
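
For readers less familiar with the metric, the area under the ROC curve can be read as the probability that a randomly chosen user who needed respiratory support receives a higher predicted risk than a randomly chosen user who did not. The sketch below computes that pairwise definition directly; the `roc_auc` helper and the toy labels and scores are made up for illustration and are not study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (not study data): 1 = required respiratory support.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
print(round(roc_auc(labels, scores), 3))  # 0.833
```

On this scale 50% corresponds to chance and 100% to perfect separation, so the reported gain from 69.5% to 78.8% is a moderate improvement in discrimination over the comparison model.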

Study Strengths

Data from the symptom tracking app allow symptoms to be followed in individuals over time; this can give a detailed picture of the clinical course of COVID-19 and may allow early identification of symptoms associated with severe disease.

Limitations

All data are from self-report, which was not clinically verified and is subject to misclassification. Moreover, the self-reported nature of the data, together with the inclusion criteria, leaves room for considerable selection bias. Only patients who regularly recorded symptoms were included: patients reporting less frequently may have different symptoms and/or outcomes relative to those included. Study participants required an outcome, defined as either symptom decline or a self-reported hospital visit. However, the endpoint assessed was the requirement for respiratory support; some included participants may have recorded a hospital visit but stopped recording due to severe illness before requiring respiratory support. Similarly, the study may have excluded those who stopped logging symptoms due to severe illness before meeting inclusion criteria. If the clustering of symptoms and their relationship with outcomes were different in these missing participants, results could be biased. Although the first five days of symptoms are used in the predictive model, it is unclear how many of the included participants had five full days of symptom reports. Data on the time between symptom reports and the requirement for respiratory support are not provided. Finally, the general clinical utility of this study is limited by the specific way symptoms are defined, self-reported, and combined into clusters.

Value added

This study provides a unique longitudinal analysis of self-reported symptoms in SARS-CoV-2 infection, but its clinical utility as a prognostic tool appears limited.

This review was posted on: 1 August 2020