Server-sent events for asynchronous API calls in a Roda app
Wiki Stumble, part 3
September 4, 2023 · Felipe Vogel
- The paths not taken
- Server-sent events
- An on-page buffer of next articles
- What's actually happening at this point
- A demo
- Smoothing out the rough edges
- Conclusion: outside-the-box lessons
In my last post, I rewrote a little Rails app with Roda and Turbo Streams. In this post I'll show how I solved the app's last and biggest problem: slow API calls. They were so slow that the user had to wait several seconds between pressing the "Next article" button and actually seeing a new article.
The app is called Wiki Stumble. Here's the live site and the GitHub repo. The app shows summaries of Wikipedia articles personalized to the user's likes and dislikes.
Because the Wikipedia APIs don't have that kind of personalization built in, the app has to make multiple API calls for each article, fetching new articles over and over until a suitable one is found.
So the question I set out to answer was, "How can I move the API calls to happen outside the request that shows new content, while still sending the final results from the API calls back to the page?"
The paths not taken
If I were building Wiki Stumble like the large app at my work, I'd call those APIs in a background job and then store suitable articles in a database table. But my little app doesn't use background jobs or even a database, and I don't want to add them unless it's really necessary.
Another option would be to use WebSockets (in Rails, see Action Cable). There are WebSockets setups for Roda out there (e.g. 1, 2), but they seem more complex than I need. After all, WebSockets enable two-way async communication between the client and server, but I only need to send async messages one-way from the server to the client.
My search for lightweight, async server-to-client communication led me to...
Server-sent events
This article by Julia Evans is a great introduction to server-sent events, a.k.a. EventSource. It's like half of WebSockets, but over HTTP! 🤯
Adding server-sent events to Wiki Stumble was pretty straightforward, thanks to a Turbo Streams integration. (It doesn't seem to be documented apart from this PR. Thanks to Ayush Newatia, author of The Rails and Hotwire Codex, for pointing me to that PR over at The Spicy Web Discord.)
All I had to do was add <turbo-stream-source> elements to the page, like this:
<turbo-stream-source src="/next"></turbo-stream-source>
Then I added the GET /next route, which you can see here in its first iteration. (Later I rewrote some of that code for performance reasons; see below on Rack hijacking.)
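To give a feel for what such a route has to send back, here is a minimal sketch of formatting a server-sent event in plain Ruby. It is not the app's actual code: the method name sse_message and the example target ID are made up for illustration, and I'm assuming the payload is Turbo Stream HTML delivered as an ordinary "message" event.

```ruby
# Sketch only: sse_message is a hypothetical helper, not part of Roda or Turbo.
# Formats a payload as one server-sent event. Each line of the payload gets
# its own "data:" field, and a blank line terminates the event.
def sse_message(payload, event: nil)
  lines = []
  lines << "event: #{event}" if event
  payload.each_line(chomp: true) { |line| lines << "data: #{line}" }
  lines.join("\n") + "\n\n"
end

# Hypothetical Turbo Stream payload; "next-article-1" is an invented target ID.
turbo_stream = <<~HTML
  <turbo-stream action="replace" target="next-article-1">
    <template><input type="hidden" name="title" value="Example"></template>
  </turbo-stream>
HTML

puts sse_message(turbo_stream)
```

The response carrying these events would be served with the Content-Type text/event-stream, which is what the browser's EventSource expects.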
One gotcha is that connections need to be manually closed on the client side, or else the GET endpoint will keep getting hit even after the <turbo-stream-source> is no longer on the page. So I wrote a Stimulus controller that closes each connection when its <turbo-stream-source> element is disconnected.
If you're not familiar with Stimulus, see the handy demo on the Stimulus home page.
An on-page buffer of next articles
Server-sent events allowed Wiki Stumble to return a response before the next article was done being fetched. Great!
But on its own, this isn't enough. At best, in place of the old article, the user would see a placeholder with a loading spinner until the next article was loaded 🤮 And the wait time wouldn't actually be any shorter.
I needed a buffer of next articles, preferably several articles just in case the user is button-happy and advances to the next article several times in quick succession. The tradeoff is that changes in the user's category preferences won't immediately be reflected in the next articles, but I think the performance gain is worth it.
Since I'm not using a database, this buffer has to be stored on the page itself. I decided to use hidden inputs in a form. (I also thought about putting the buffer in the session cookie, which already stores the current article and the user's category preferences. But the session cookie isn't big enough to fit all that.)
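As an illustration of the hidden-inputs idea, here is a small sketch that turns one buffered article into hidden form fields. The method name hidden_inputs, the next_articles[...] field naming, and the article fields are all assumptions for the example, not what the app necessarily uses.

```ruby
require "erb"

# Sketch only: hidden_inputs and the "next_articles[i][field]" naming scheme
# are hypothetical. Renders each field of an article as a hidden input,
# escaping values so that article text can't break out of the attribute.
def hidden_inputs(article, index)
  article.map do |field, value|
    name = ERB::Util.html_escape("next_articles[#{index}][#{field}]")
    %(<input type="hidden" name="#{name}" value="#{ERB::Util.html_escape(value)}">)
  end.join("\n")
end

puts hidden_inputs({ title: "Tom & Jerry", url: "https://example.org/wiki/Tom_and_Jerry" }, 0)
```

With bracketed names like these, a typical Rack-based framework would parse the submitted form back into nested params, so the server gets the whole buffer on every request.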
What's actually happening at this point
Here's an outline of what happens when the user advances to the next article:
- The buffer of next articles is submitted as part of the form.
- On the server side, the first next article is taken from the buffer and immediately sent back in a response, using Turbo Streams to replace the old article with the new one.
- In the same response, Turbo also removes the first next article from the buffer (the article that the user is about to see), and adds a <turbo-stream-source> to the end of the buffer.
- The user sees a new article and is happy.
- Meanwhile, the newly-added <turbo-stream-source> sends a GET request that fetches a new article. It takes a few seconds, but that's not a problem thanks to the buffer of next articles.
- After the new article is fetched, a Turbo Streams response replaces the <turbo-stream-source> with hidden inputs containing the fields of the new article.
- My Stimulus controller sees that the <turbo-stream-source> has been replaced, and proceeds to close that particular connection with the server.
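The steps above can be modeled in a few lines of plain Ruby. This is a simplified sketch, not the app's code: the method names advance and fill are invented, and the :pending symbol stands in for a <turbo-stream-source> placeholder on the page.

```ruby
# Sketch only: advance, fill, and :pending are hypothetical stand-ins.

# The user advances: take the first buffered article to show, and append a
# placeholder that will trigger a fetch for a replacement.
def advance(buffer)
  article, *rest = buffer
  [article, rest + [:pending]]
end

# A fetch finishes: swap a placeholder for the fetched article. (Simplified:
# this fills the first placeholder, whereas on the page each
# <turbo-stream-source> is replaced by its own result.)
def fill(buffer, article)
  index = buffer.index(:pending)
  return buffer if index.nil?

  buffer.dup.tap { |copy| copy[index] = article }
end

buffer = [{ title: "A" }, { title: "B" }]
shown, buffer = advance(buffer)      # the user sees "A" right away
buffer = fill(buffer, { title: "C" }) # a few seconds later, "C" arrives
p shown, buffer
```

The sketch ignores the case where the first buffer entry is still a placeholder, which is exactly the stall that a larger buffer is meant to make unlikely.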
A demo
You can see all this in action by observing the invisible article buffer in the browser's DOM inspector. Here's a view of when I refresh the page and then rapidly advance through articles:
Smoothing out the rough edges
The app wasn't as smooth as in the demo above until I made a few improvements:
- The first page load was still slow, because the buffer was starting out empty.
- The next articles buffer was being filled sequentially rather than concurrently, so the user could easily advance through new articles faster than the buffer was being filled. And when the buffer was emptied, the app would stall and feel slow again.
- Each connection was occupying a server thread, and since connections were open for several seconds at a time waiting for next articles, the app would slow down if more than a handful of people (or tabs) were using the app simultaneously.
Here's how I solved each of these in turn.
#1. Show a pre-selected few articles at the beginning. OK, I'll admit I cheated a little bit here. Now when a user loads the app for the first time (i.e. when there is no session cookie), an article is shown and two others are buffered, all three of which are hard-coded into the app. While the user is busy with those first three articles, the app has time to fetch more and add them to the buffer. (If you still notice a slow first load time, it's because I'm on Render's free plan and it takes a moment for the instance to spin up.)
#2 and #3. Increase the thread count. Initially I mitigated these problems by raising Puma's thread count and increasing the size of the article buffer. To raise the thread count, I simply created a file config/puma.rb containing threads 0, 16. (The first number is the minimum and the second is the maximum number of threads. Puma defaults to 0-5 threads for MRI Ruby.) However, more threads made these problems only slightly less bad, so I looked for another solution, and found it.
#2 and #3. Use the Rack hijacking API. I followed this guide to use Rack hijacking to offload the long API calls from the server threads. This way, all the next articles are fetched simultaneously, and most of the buffer is filled up in the same time it takes to fetch one article, even if I set a maximum of one server thread!
Here and here is what my streaming code looks like now.
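If you haven't seen the technique before, here is a stripped-down sketch of full hijacking, separate from the app's real code linked above. The method name stream_via_hijack and the block that stands in for the slow API calls are hypothetical; the only Rack piece used is the env["rack.hijack"] callable, and I'm assuming a single-line payload to keep the event formatting trivial.

```ruby
# Sketch only: stream_via_hijack and its block are hypothetical names.
def stream_via_hijack(env, &fetch)
  # Full hijack: take the raw socket away from the server, so the server
  # thread is free again as soon as the app returns.
  io = env.fetch("rack.hijack").call

  Thread.new do
    # After a full hijack, the app writes the whole HTTP response itself.
    io.write("HTTP/1.1 200 OK\r\n" \
             "Content-Type: text/event-stream\r\n" \
             "Cache-Control: no-cache\r\n" \
             "Connection: close\r\n\r\n")
    # The slow work runs here, off the server thread.
    io.write("data: #{fetch.call}\n\n")
  ensure
    io.close
  end
end
```

In a route, this would be called with the request's env and a block that does the fetching; the route would still return an ordinary response afterwards, which as far as I can tell the server disregards once the socket has been hijacked.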
Side note: Rack full hijacking feels pretty hacky, but I didn't have much luck with the cleaner approaches that I tried: partial hijacking, returning a streaming body, and swapping out Puma for the highly concurrent Falcon web server. Still, I want to try Falcon again sometime because it's part of the async Ruby ecosystem, which looks great. Here's an introduction to async Ruby. But for now, I've contented myself with Puma and Rack full hijacking because Wiki Stumble performs well enough that way.
Conclusion: outside-the-box lessons
The "before and after" of Wiki Stumble's performance is like night and day. Previously, as a user I had to wait up to several seconds to see a new article. Now I can merrily flick through articles, with instantaneous loading times.
But the real reason I'm happy with my work on Wiki Stumble is that in order to reach that performance improvement without adding more layers to my stack, I had to think outside the box, and I learned a lot as a result.
I hope my little adventure has helped you learn something, too. Or if not, you could head over to Wiki Stumble and see if a bit of Wikipedia-surfing will change that.