Feed Polling for Unread Cloud

John Brayton

August 10, 2024

In order to retrieve new articles from feeds, Unread Cloud needs to check those feeds periodically. In general, Unread Cloud polling intervals range from every 5 minutes to every 60 minutes, depending on a variety of factors:

When the most recent article from that feed was published. (If the most recent article from a feed was published several years ago, it is less likely that a new one will be published within the next few minutes.)
The number of active accounts subscribed to the feed.
Whether Unread Cloud’s last attempt to check that feed was successful, versus the website being unreachable or returning an error.

If a server returns a 429 (Too Many Requests) response, Unread Cloud automatically slows its polling for that feed down to once every four hours. I do not want to be a burden on websites or fight the policies of those trying to publish a feed.

If a server returns a 403 (Forbidden) response, Unread Cloud will not check that feed for another four hours. That change is temporary: if the next retrieval is successful, the feed returns to its usual polling interval.
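The rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Unread Cloud's actual code: the function name `next_poll_delay`, the constants, and the idea of passing in a precomputed "usual" interval are all hypothetical, and the sketch does not model how long a 429 slowdown lasts.

```python
from datetime import timedelta

# Bounds taken from the summary above; the constant names are hypothetical.
MIN_INTERVAL = timedelta(minutes=5)
MAX_INTERVAL = timedelta(minutes=60)
THROTTLED_INTERVAL = timedelta(hours=4)


def next_poll_delay(status_code, usual_interval):
    """Return how long to wait before checking a feed again.

    usual_interval stands in for whatever interval the other factors
    (last article date, subscriber count, recent failures) would yield.
    """
    # 429 (Too Many Requests) and 403 (Forbidden) both push the next
    # check out to four hours.
    if status_code in (429, 403):
        return THROTTLED_INTERVAL
    # Otherwise keep the usual interval within the 5 to 60 minute range.
    return min(max(usual_interval, MIN_INTERVAL), MAX_INTERVAL)
```

For example, `next_poll_delay(429, timedelta(minutes=15))` gives four hours, while `next_poll_delay(200, timedelta(minutes=15))` gives 15 minutes.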

Unread Cloud uses “If-Modified-Since” and “If-None-Match” headers to avoid retrieving an entire feed when it is unchanged. Unread Cloud retrieves feeds in a compressed form from servers that support HTTP/HTTPS compression.
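For readers unfamiliar with conditional requests, here is a minimal sketch of the idea using Python's standard library. It is not Unread Cloud's implementation; the function names `conditional_headers` and `fetch_if_changed` are hypothetical, and the sketch assumes the caller has stored the `ETag` and `Last-Modified` values from the previous successful response.

```python
import gzip
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers that let the server answer 304 Not Modified."""
    headers = {"Accept-Encoding": "gzip"}  # ask for a compressed response
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read()
            # urllib does not decompress automatically.
            if response.headers.get("Content-Encoding") == "gzip":
                body = gzip.decompress(body)
            return (
                body,
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        # urllib reports 304 as an HTTPError; it means the feed is unchanged.
        if error.code == 304:
            return None, etag, last_modified
        raise
```

When the feed has not changed, the server can reply with a small 304 response instead of the full feed, which saves bandwidth on both sides.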

There is more nuance to the timing than this, and it is subject to change over time. This is just a rough summary.

Feed Reader Score Project

Rachel at rachelbythebay.com recently started working on a feed reader score project. The project is a laudable effort to encourage RSS reader developers to behave respectfully towards websites hosting RSS feeds: to not poll more frequently than needed, and to use HTTP/HTTPS caching. This is important work. If RSS readers are a burden to websites, website owners are dissuaded from providing feeds.

I learned of this project about a week ago, when a follower on Mastodon kindly sent me a link to an article describing feed reader polling behavior. That article describes Unread Cloud’s behavior as follows:

Unread RSS Reader. Godawful poll timing. 6103 requests in 52 days is about one poll every 736 seconds _on average_, but they’re hugely spread out. WTF? Put it this way: the list of unique intervals (nn seconds, nn minutes, ...) is *four pages tall* on my web browser.

One poll every 736 seconds sounds great, but the description indicates that something is amiss. I got in touch with Rachel, who kindly shared with me the data used to arrive at these conclusions. I did some analysis on this data.
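The 736-second figure in the quote follows directly from the two numbers given there. A quick check (the function name is just for illustration):

```python
def average_interval_seconds(request_count, days):
    """Average time between requests if they were spread evenly."""
    return days * 24 * 60 * 60 / request_count


# 6103 requests over 52 days, as quoted above.
print(round(average_interval_seconds(6103, 52)))  # 736
```

That is a little over 12 minutes between polls on average, which says nothing about how evenly the polls were actually spaced.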

Chart showing minimum and maximum time intervals for each day. For the most part the minimum interval is a constant 13 minutes through 2024-06-28. It becomes very low and variable from 2024-06-29 through 2024-07-08. Then it goes to about 15 minutes. The maximum interval varies between 15 minutes and 30 minutes, but there is a 124-minute interval on 2024-07-11.

Chart showing the number of requests for each day. The first and last days have lower request counts (because they are partial days). The request counts through 2024-06-29 hover around 110 per day. The request counts for 2024-06-30 through 2024-07-08 go up to about 190 per day. Then they come back down to about 95 per day.
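Per-day figures like these can be derived from a list of request timestamps. The sketch below shows one way to do it; it is not the script I used, the function name `daily_interval_stats` is hypothetical, and attributing each interval to the day of the later request is an assumption.

```python
from collections import defaultdict
from datetime import datetime


def daily_interval_stats(timestamps):
    """Return per-day request counts and min/max intervals in minutes."""
    ordered = sorted(timestamps)
    counts = defaultdict(int)
    gaps = defaultdict(list)

    for moment in ordered:
        counts[moment.date()] += 1

    # Assumption: each interval belongs to the day of the later request.
    for previous, current in zip(ordered, ordered[1:]):
        minutes = (current - previous).total_seconds() / 60
        gaps[current.date()].append(minutes)

    stats = {}
    for day, count in counts.items():
        day_gaps = gaps.get(day, [])
        stats[day] = {
            "requests": count,
            "min_minutes": min(day_gaps) if day_gaps else None,
            "max_minutes": max(day_gaps) if day_gaps else None,
        }
    return stats


# Made-up timestamps, purely to show the shape of the output.
sample = [
    datetime(2024, 7, 1, 0, 0),
    datetime(2024, 7, 1, 0, 13),
    datetime(2024, 7, 1, 0, 43),
]
print(daily_interval_stats(sample))
```

With the made-up sample, 2024-07-01 has three requests, a minimum interval of 13 minutes, and a maximum of 30 minutes.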

What happened from 2024-06-29 through 2024-07-08?

The request count for those days is roughly double that of the other days, and the minimum interval is absurdly low.

This coincides with my configuring a test instance, essentially a temporary clone of Unread Cloud, that started with a copy of the production database. I did this in part to do a test restore from the offline backup. The Unread Cloud production database has a warm backup that stays up-to-date via replication. The offline backup is there in case both database servers fail. Every few months I do a test restore from the offline backup to ensure that the backups are functioning as expected – and are not subtly broken in a way that my monitoring does not catch. I also did some testing of code changes against this database. I do not often test against a full copy of the production database, but in this particular case I was concerned that the changes I was making could introduce a performance issue.

Since there were two different Unread Cloud instances checking this feed, there were roughly twice as many requests. Each instance was determining the timing for its own polling independently.

It is fair to point out that an Unread customer being subscribed to a feed resulted in twice the number of requests to that feed during this time! I am seriously considering that impact.

What happened on 2024-07-11?

The maximum interval for that day is 124 minutes. Only one request had an interval that long; the others were within the expected range. I did have some server issues that day, so this does not surprise me.

Why is the maximum interval so variable?

The minimum interval is fairly consistent, but the maximum interval varies between the minimum and almost 30 minutes (ignoring the specific date ranges above).

When a new customer signs up and imports a large number of feeds that Unread Cloud is not already checking, Unread Cloud will check those feeds right away and defer checking other feeds. Checking feeds also gets deferred when I reboot servers to deploy a security update that requires a reboot. Similarly, when I deploy an update to the Unread Cloud server software, feed checking pauses for a period of time.

Conclusions

I feel good knowing there is a good explanation for the anomalous polling intervals from 2024-06-29 through 2024-07-08. Even during that period, the highest daily request count is 192. On average that is one request every 7.5 minutes. The remaining intervals, except for the 2-hour polling interval on 2024-07-11, are as expected.

I intend to continue the practice of occasionally creating a clone of Unread Cloud, starting with a backup of the production database. This does result in additional polling of feeds. But it is necessary to test the performance impact of changes under some circumstances, and it is necessary to ensure that my backups are functioning as expected.

I now have my own test feed polling on Rachel’s feed reader score project, so I can get an up-to-date external view of Unread’s polling and ensure that it aligns with my intentions. I have to pay the Unread Cloud server costs, so I am motivated to ensure that I am not polling feeds more often than necessary or using unnecessary bandwidth. I thank Rachel for this project.