Sometimes it looks impossible to avoid an accumulating array, a pattern that feels unnatural in Ruby. Recently, I’ve needed to chase down and unroll pagination links over a JSON/REST API. I don’t know how many pages there will be, and it’s probable (but not guaranteed) that I need to retrieve and use all of the content. Since each page depends on results from the previous page, there is no obvious Enumerable parallel. Here, I’ll demonstrate a quick refactoring that produces a clean, lazy enumerable object.
This being a HATEOAS API, the next page link is embedded in the response JSON, tucked under a `["links"]["next"]` key. To fetch all of the data, I end up with code that looks something like:
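```ruby
require "json"

# `connection` is assumed to be an HTTP client object defined elsewhere.
def retrieve_all_pages(url)
  widgets = []                        # accumulator for every page's results
  while url
    response = connection.get url
    json = JSON.parse response.body
    url = json["links"]["next"]       # nil on the last page ends the loop
    widgets.concat json["widgets"]
  end
  widgets
end
```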
As Ruby goes, this is pretty ugly. When there’s a distinct collection to enumerate over, the recommended approach is to replace the accumulating `widgets` array with a `#map` call or some other `Enumerable` method. Unfortunately, there isn’t a clear parallel for this case.
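For contrast, here is the kind of case where `#map` applies directly. This is a minimal sketch using a hypothetical in-memory list of `ids`: because the collection is already known up front, there is no need for a hand-built accumulator.

```ruby
# Hypothetical data: a collection we already have in hand.
ids = [1, 2, 3]

# Accumulator style: build up the result array by hand.
labels = []
ids.each do |id|
  labels << "widget-#{id}"
end

# Idiomatic style: #map builds the array for us.
mapped = ids.map { |id| "widget-#{id}" }

p labels == mapped # => true
```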
Another issue is that this loop fetches every single page before returning control, regardless of how many results I actually end up using.
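Fortunately, there is `Enumerator`, Ruby’s answer to generators. The `Enumerator` class produces an enumerable object backed by arbitrary generation logic: its block receives a “yielder”, and every value pushed into it with `<<` becomes the next element of the enumeration.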
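As a minimal illustration of the mechanics (a toy example, not part of the pagination code), the following enumerator generates an infinite sequence. Because the block only runs as far as the consumer asks, methods such as `first` and `lazy` can safely take a finite slice of it.

```ruby
# A generator for the natural numbers. The infinite loop is safe because
# the block is suspended as soon as the consumer stops asking for values.
naturals = Enumerator.new do |yielder|
  n = 0
  loop do
    yielder << n
    n += 1
  end
end

p naturals.first(5)                      # => [0, 1, 2, 3, 4]
p naturals.lazy.select(&:even?).first(3) # => [0, 2, 4]
```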
I can refactor the while loop to look more like this:
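```ruby
def retrieve_all_pages(url)
  Enumerator.new do |yielder|
    next_url = url                    # local copy, so each enumeration restarts at the first page
    while next_url
      response = connection.get next_url
      json = JSON.parse response.body
      next_url = json["links"]["next"]
      # Hand widgets to the consumer one at a time; the next page is only
      # requested once every widget on this page has been consumed.
      Array(json["widgets"]).each do |widget|
        yielder << widget
      end
    end
  end
end
```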
We’ve gotten rid of the accumulator array and instead have something much closer to idiomatic Ruby. Additionally, nothing is fetched until the enumerator is iterated: it starts yielding widgets as soon as the first page arrives and only requests further pages when the consumer asks for more, without client code needing to understand the mechanics of the underlying pagination.
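To see the laziness in action, here is a self-contained sketch that runs without a real API. It assumes a hypothetical `FakeConnection` class and `PAGES` fixture standing in for the real HTTP client and server, and, to stay self-contained, passes the connection into `retrieve_all_pages` as an extra argument instead of calling a `connection` method. The request counter shows when pages are actually fetched.

```ruby
require "json"

# Hypothetical stand-in for an HTTP client: serves canned JSON bodies
# by URL and counts how many requests were made.
Response = Struct.new(:body)

class FakeConnection
  attr_reader :requests

  def initialize(pages)
    @pages = pages
    @requests = 0
  end

  def get(url)
    @requests += 1
    Response.new(@pages.fetch(url))
  end
end

# Hypothetical three-page fixture; the last page has no "next" link.
PAGES = {
  "/widgets?page=1" => { "widgets" => %w[a b], "links" => { "next" => "/widgets?page=2" } },
  "/widgets?page=2" => { "widgets" => %w[c d], "links" => { "next" => "/widgets?page=3" } },
  "/widgets?page=3" => { "widgets" => %w[e], "links" => { "next" => nil } }
}.transform_values { |page| JSON.generate(page) }

def retrieve_all_pages(connection, url)
  Enumerator.new do |yielder|
    next_url = url
    while next_url
      json = JSON.parse(connection.get(next_url).body)
      next_url = json["links"]["next"]
      Array(json["widgets"]).each { |widget| yielder << widget }
    end
  end
end

connection = FakeConnection.new(PAGES)
widgets = retrieve_all_pages(connection, "/widgets?page=1")
p connection.requests # => 0 (nothing fetched yet)
p widgets.first(2)    # => ["a", "b"]
p connection.requests # => 1 (only the first page was needed)
p widgets.to_a        # => ["a", "b", "c", "d", "e"]
p connection.requests # => 4 (to_a restarted from page 1 and fetched all 3 pages)
```

Note that `to_a` restarts the generator from the first page, because `next_url` is a fresh local variable each time the block runs.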
It is important to note that `retrieve_all_pages` now returns an `Enumerator` (which includes `Enumerable`) rather than an `Array`, but it generally quacks the same. It’s rather unlikely that any client of the original implementation relied on Array-specific behavior such as indexing; if one did, a simple `to_a` converts the enumerator into a normal `Array` (fetching every page in the process).
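The following sketch illustrates where an `Enumerator` quacks like an `Array` and where it doesn’t. It uses a small in-memory enumerator as a hypothetical stand-in for the paginated one, so no network access is involved.

```ruby
# A small in-memory enumerator standing in for retrieve_all_pages.
widgets = Enumerator.new do |yielder|
  %w[a b c].each { |widget| yielder << widget }
end

p widgets.map(&:upcase)    # => ["A", "B", "C"] (Enumerable methods work)
p widgets.include?("b")    # => true
p widgets.respond_to?(:[]) # => false (no index access)
p widgets.size             # => nil (length is unknown up front)

array = widgets.to_a       # materialize when Array semantics are needed
p array[1]                 # => "b"
p array.size               # => 3
```

Be aware that `count`, unlike `size`, walks the entire enumeration; on the paginated enumerator that means fetching every page, just like `to_a`.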
Overall, though, I find the resulting enumerator far easier to work with in idiomatic Ruby; it provides a more flexible and versatile encapsulation of the remote interface.