Characterizing variation in menstrual cycles using self-tracked data

Citizen Endo
Mar 4, 2021

A member of our lab, Kathy Li, has published a paper in npj Digital Medicine, a Nature Partner Journal! The paper is titled “Characterizing physiological and symptomatic variation in menstrual cycles using self-tracked mobile-health data.” Below is a summary of the paper, but you can also read it here.

The menstrual cycle is an important indicator of overall health and wellness in women, but efforts to characterize menstrual cycles have been limited by the lack of large-scale, reliable datasets. With the rise of menstrual tracking mobile apps, rich data on menstrual health experiences and behaviors are now accessible.

We explored a database of user-tracked observations from Clue, a menstrual tracking app, covering over 378,000 users and 4.9 million natural cycles. We 1) propose a definition of menstrual variability based on cycle length consistency; 2) develop a procedure to exclude cycles lacking user engagement, which lets us better distinguish true menstrual patterns and mitigate self-tracking artifacts; and 3) show that menstruators at opposite ends of the menstrual variability spectrum exhibit statistically significant differences in their cycle characteristics and symptom tracking patterns.

Our definition of regularity is based on cycle length differences (CLDs): the absolute difference between consecutive cycle lengths, in days. We consider users whose median CLD is greater than 9 days “consistently highly variable” in their cycle lengths and those whose median CLD is 9 days or less “consistently not highly variable.”
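To make the definition concrete, here is a minimal Python sketch of the CLD calculation and the median-CLD split described above. The function names and the example cycle lengths are hypothetical, chosen for illustration; this is not the code used in the paper.

```python
from statistics import median

# Threshold from the text: a median CLD above 9 days marks a user as
# "consistently highly variable".
HIGH_VARIABILITY_THRESHOLD_DAYS = 9


def cycle_length_differences(cycle_lengths):
    """Absolute differences between consecutive cycle lengths, in days."""
    return [abs(b - a) for a, b in zip(cycle_lengths, cycle_lengths[1:])]


def is_consistently_highly_variable(cycle_lengths):
    """True if the user's median CLD exceeds the 9-day threshold."""
    clds = cycle_length_differences(cycle_lengths)
    if not clds:
        raise ValueError("need at least two cycles to compute a CLD")
    return median(clds) > HIGH_VARIABILITY_THRESHOLD_DAYS


# Made-up cycle lengths (days) for two illustrative users.
regular_user = [28, 29, 27, 30, 28]
variable_user = [25, 40, 28, 45, 30]

print(cycle_length_differences(regular_user))          # CLDs for the first user
print(is_consistently_highly_variable(regular_user))
print(is_consistently_highly_variable(variable_user))
```

Using the median rather than the mean keeps a single unusually long or short cycle from moving a user into the highly variable group.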

Figure 1. Histogram of maximum CLD before and after removing artifacts

Figure 1 is a histogram of maximum cycle length differences, and it shows the impact of removing self-tracking artifacts. The exclusion applies where the difference between the median CLD and the maximum CLD exceeds 10 days. The blue line shows the data before we remove such artifacts and the red line shows the data after removal. The peaks of the blue line around 30 and 60 days may correspond to users forgetting to track one or two of their periods, respectively. The right-hand tail of the red line indicates that we removed only anomalies and preserved natural variation in the data.
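As a rough illustration of the artifact criterion, the sketch below flags a cycle history when the maximum CLD exceeds the median CLD by more than 10 days. It is one simple reading of the rule as summarized here, with a hypothetical function name and made-up data; the full procedure in the paper may differ in its details.

```python
from statistics import median

# Gap from the text: max CLD minus median CLD greater than 10 days.
ARTIFACT_GAP_DAYS = 10


def has_suspected_tracking_artifact(cycle_lengths):
    """Flag a history whose largest CLD is far above the user's typical CLD."""
    clds = [abs(b - a) for a, b in zip(cycle_lengths, cycle_lengths[1:])]
    if not clds:
        return False  # fewer than two cycles: nothing to compare
    return max(clds) - median(clds) > ARTIFACT_GAP_DAYS


# Made-up histories (cycle lengths in days).
# A 57-day "cycle" among ~28-day cycles looks like one missed period.
missed_period = [28, 29, 28, 57, 28, 27, 29]
# Large but consistent swings look like natural variability.
naturally_variable = [25, 40, 28, 45, 30]

print(has_suspected_tracking_artifact(missed_period))
print(has_suspected_tracking_artifact(naturally_variable))
```

Because the check compares the maximum CLD to the same user's median CLD, a user whose cycles always vary widely is not flagged, which is consistent with the preserved right-hand tail in Figure 1.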

Figure 2. Time series embedding (a) and probability distribution (b) for cycle length

Figure 2 shows cycle length variability for consistently highly variable (orange) and consistently not highly variable (teal) users. The left panel, Figure 2a, plots the lengths of three consecutive, randomly sampled cycles from users on the x, y, and z axes, respectively. The teal cluster of users occupies the region around the x=y=z line (where all three cycle lengths are equal), with the orange cluster fanning outward. The right panel, Figure 2b, shows the probability distributions of cycle length for both variability groups. The orange group’s distribution has a much wider spread and is less peaked than the teal group’s, confirming that the consistently highly variable group represents those with more fluctuation in cycle length.
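The embedding in Figure 2a can be sketched as follows: pick three consecutive cycle lengths at random and treat them as a point (x, y, z). The distance of that point from the x=y=z line is an illustrative summary added here, not a statistic from the paper, and the function names and data are hypothetical.

```python
import math
import random


def embed_three_consecutive(cycle_lengths, rng=random):
    """Sample three consecutive cycle lengths as an (x, y, z) point."""
    if len(cycle_lengths) < 3:
        raise ValueError("need at least three cycles")
    start = rng.randrange(len(cycle_lengths) - 2)
    return tuple(cycle_lengths[start:start + 3])


def distance_from_diagonal(point):
    """Perpendicular distance from a point to the line x = y = z.

    Projecting onto the diagonal gives the mean of the coordinates, so the
    distance is the root of the summed squared deviations from that mean.
    """
    mean = sum(point) / len(point)
    return math.sqrt(sum((v - mean) ** 2 for v in point))


# Made-up cycle lengths (days).
rng = random.Random(0)  # seeded only to make the example repeatable
point = embed_three_consecutive([28, 29, 27, 30, 28], rng)
print(point, distance_from_diagonal(point))

# Equal cycle lengths sit exactly on the diagonal.
print(distance_from_diagonal((28, 28, 28)))
```

Points from users with similar consecutive cycle lengths land near the diagonal, while points from highly variable users land farther out, matching the fanning pattern described for the orange cluster.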

Figure 3. Time series embedding (a) and probability distribution (b) for period length

Figure 3 is analogous to the previous one, showcasing period lengths instead of cycle lengths. The orange and teal distributions for period length are largely overlapping, with the same median of 4 days. This indicates that period lengths are distributed very similarly for the two groups, and therefore shows that the variability in cycle length is not due to period length differences between the groups.

We find that users who are “consistently highly variable” and those who are “consistently not highly variable” self-track symptoms differently, including symptoms related to period, pain, and emotion. For instance, users in the consistently highly variable group are much more likely to track headaches and tender breasts in at least 95% of their cycles.
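The "at least 95% of their cycles" measure can be illustrated with a small sketch that computes the share of a user's cycles in which a given symptom was tracked. The data layout (one set of symptom names per cycle), the symptom labels, and the function name are assumptions made for this example, not the app's actual schema.

```python
def tracks_symptom_consistently(cycles, symptom, min_fraction=0.95):
    """True if the symptom was tracked in at least min_fraction of cycles.

    `cycles` is assumed to be a list with one set of tracked symptom
    names per cycle (a hypothetical layout chosen for illustration).
    """
    if not cycles:
        return False
    tracked = sum(1 for symptoms in cycles if symptom in symptoms)
    return tracked / len(cycles) >= min_fraction


# Made-up tracking histories of 20 cycles each.
always_headache = [{"headache", "cramps"}] * 20
sometimes_headache = [{"headache"}] * 10 + [{"cramps"}] * 10

print(tracks_symptom_consistently(always_headache, "headache"))
print(tracks_symptom_consistently(sometimes_headache, "headache"))
```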

We also find that cycle and period length statistics are stationary over the app usage timeline across the variability spectrum. The symptoms that we identify as showing statistically significant association with timing data (as measured by median CLD) can be useful to clinicians and users for predicting cycle variability from symptoms or as potential health indicators for conditions like endometriosis. Our findings showcase the potential of longitudinal, high-resolution self-tracked data to improve understanding of menstruation and women’s health as a whole.

To contribute to endometriosis research, you can download the Phendo app and track your symptoms. Tracking in Phendo may also help you understand your experience of the disease and communicate with your provider.

Download Phendo for iOS here.

Download Phendo for Android here.

For more information about our research and Phendo visit

Have any questions? Email us at



Patients and data science for an endometriosis cure: We bridge the gap between patient experience and clinical characterization of endometriosis.