Concurrency in Scala has come a long way from the humble beginnings of `scala.concurrent.Future`. What started as a minimal abstraction over callbacks - one that allowed easy sequencing of operations thanks to monadic composition - has since evolved into a rich ecosystem of effect systems, each trying to solve real-world problems around type and resource safety, composability, and performance.
In this series of posts, I take a look at six notable, actively maintained or developed solutions across Scala's evolutionary timeline: the standard library's Future, Cats Effect, ZIO, Kyo, Gears and Ox. The goal, however, isn't to find the best tool - there is no such thing, as tools are only as good as their fit for the purpose, which is defined by the problem domain. Neither is it to benchmark throughput on some synthetic scenario - in fact, conversations about the performance of I/O-bound code are largely pointless unless the scale of the deployment is truly massive.
The goal is to trace the evolution of ideas - why these tools exist, what problems they were aiming to solve at the time of their conception, what new issues were discovered in real-world use and, finally, how they hold up when facing a realistic problem with realistic constraints.
The problem
The AI hype has led to a true renaissance of open source web crawlers that can compile websites down to markdown documents, which can then be fed to LLMs. Web scraping is actually a really interesting problem - a website can be thought of as a directed graph with a high probability of cycles. Crawling is also mostly I/O-bound on network calls, so sequential traversal of websites is very inefficient. This makes it a perfect fit for us, as we want to parallelize things.

The website graph is not known in advance - each page can contain an unknown number of links to other pages on the same site - and its structure only becomes known during the traversal. Because of this we know up front that our backlog of paths to visit has to be unbounded, as even the index page can contain an overwhelming number of candidate links. The time-optimal solution to this problem is to traverse new edges as soon as they are discovered, and any remaining work for a given node (e.g. storing a markdown document to a file) should be executed in parallel too. We should, however, be polite and not flood remote hosts with requests, so some sort of rate limiting of our concurrent work is required.

To summarise: the task is to scrape a given website in parallel, preferably in breadth-first order, extract content from scraped pages, convert it to markdown and then store it in a file named after the page's URL and path. To make things a bit more spicy, we shall specify some additional restrictions:
- Cycles have to be correctly handled and any given path should only be visited once.
- The maximum concurrency is to be limited to X simultaneous tasks, because we're politely borrowing the content, not performing a DoS attack.
- Should any of the tasks fail, all the other concurrent tasks should be cancelled and the original error should be propagated upwards and returned by the scraper.
- Depth of the crawl shall be limited to a known number Y. All pages deeper than Y should be skipped.
- The tasks of storing the markdown content and of appending new scraping targets should be executed concurrently so that we don't spend time waiting for I/O unnecessarily.
The setup
To make our code comparable between different technology stacks, we will extract the shared bits of our effectful code and hide them behind a set of common interfaces:
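The original interfaces aren't reproduced here, but a minimal sketch of such a shared, effect-polymorphic layer might look like the following. The trait names and method shapes (`HttpClient`, `FileStore`, `Sleep`) are assumptions for illustration, not the article's actual code:

```scala
// A sketch of shared interfaces abstracted over an effect type F[_].
// Names and signatures are hypothetical stand-ins.
trait HttpClient[F[_]] {
  def get(url: String): F[String] // fetch a page's raw HTML
}

trait FileStore[F[_]] {
  def write(path: String, markdown: String): F[Unit] // persist a document
}

trait Sleep[F[_]] {
  def sleep(millis: Long): F[Unit] // hook for backoff / rate limiting
}
```

Each concrete implementation then picks its own `F`: `Future` for the standard library, `IO` for Cats Effect, and so on, while tests can plug in a trivial identity effect.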
If this looks like a tagless final encoding to you, you're right - it is: we need a way to encode the same interface for different effects. We also have one more bit of logic which doesn't need this genericity, as the task of processing HTML and extracting hyperlinks is purely CPU-bound and has no side effects:
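To illustrate why this piece needs no effect abstraction, here is a deliberately naive, pure link extractor. A real implementation would use a proper HTML parser (e.g. jsoup) rather than a regex; the `Html` object and its shape are assumptions for illustration:

```scala
// Pure, CPU-bound logic: plain input, plain output, no F[_] required.
// A regex is used only to keep the sketch dependency-free.
object Html {
  private val href = """href="([^"]+)"""".r

  def extractLinks(html: String): List[String] =
    href.findAllMatchIn(html).map(_.group(1)).toList
}
```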
With these common pieces we can easily make our code testable, as we can inject mocks while still using library-specific idioms like, for example, sleeping and error handling. For all implementations we are going to use the same blocking, synchronous libraries to fetch websites and store files to disk - namely sttp with its default java.net.http.HttpClient backend and os-lib from Li Haoyi's ecosystem - to keep things arguably fair and comparable. Implementation details of the coordination and backlog management layers will use idiomatic constructions provided by the effect and concurrency libraries.
The solutions
There are multiple ways to approach this problem, but for now we will only concern ourselves with the simplest one - parallel worker loops. This is a common architecture used with effect systems and their structured concurrency APIs. In the future we'll compare it with the lower-level approach of thread-per-task creation - often seen in networking code on stacks that support lightweight threads (e.g. Go's net/http or Erlang/Elixir's Cowboy HTTP server) - and with even higher-level approaches enabled by the composable nature of effect systems.
The central task - scraping, extracting links, converting content to markdown and extending the backlog, in parallel with dumping the markdown data to the filesystem - is fairly simple and can be visualized like this:

This flowchart already mentions one concept related to coordination of the work, namely the backlog of pages that are still left to visit. We can quickly extend this chart and get something that would, roughly, do what we need sequentially:

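The sequential loop described above can also be sketched in code. The names here are hypothetical, and `fetchLinks` stands in for the whole scrape-extract-store step:

```scala
import scala.collection.mutable

final case class Page(url: String, depth: Int)

// Sequential traversal sketch: an unbounded FIFO backlog plus a visited
// set is enough to walk the whole site graph, cycles included.
def crawlSequentially(
    start: String,
    maxDepth: Int,
    fetchLinks: String => List[String] // hypothetical scrape + extract step
): Set[String] = {
  val backlog = mutable.Queue(Page(start, 0)) // the unbounded backlog
  var visited = Set.empty[String]
  while (backlog.nonEmpty) {
    val Page(url, depth) = backlog.dequeue()
    if (depth <= maxDepth && !visited(url)) {
      visited += url
      fetchLinks(url).foreach(link => backlog.enqueue(Page(link, depth + 1)))
    }
  }
  visited
}
```

Because the backlog is a FIFO queue, this is exactly breadth-first order; the depth check and visited set implement the depth limit and cycle handling from our requirements.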
We only have to assume that the queue is unbounded and this simple loop will traverse the whole website graph correctly. We need this, however, to work in parallel. It turns out that with a little bit of coordination we can simply do this:

To make this work for us we only need to provide an unbounded, blocking queue and some way to coordinate exit conditions between the separate, parallel worker loops. This, however, is where the algorithmic part of the story - at least for this first, high-level solution - ends and where the rubber meets the road. Let's see Future[A].
Future[A] is <pending>, old man!
In the beginning we had chaos. Out of the chaos arose SIP-14, which introduced Futures and Promises into the standard library as the common building blocks of concurrent programming in Scala. The SIP was a response to the fact that several libraries - Scala Actors, Akka, Twitter Util and Scalaz - had started to implement a datatype meant to model asynchronous values. The design that was delivered allows the user to easily submit a computation to a thread pool of choice (a ForkJoinPool being the default global one), passed as an implicit parameter to the default constructor for Futures - the Future.apply function.

Future is implemented as an AtomicReference-based state machine with a very simple graph of state transitions: a Future[A] is either pending or fulfilled with either a Success holding a value of type `A` or a Failure containing a Throwable. Futures don't keep any reference to the thread on which the computation is being evaluated, and this fact is going to have an impact on our ability to fulfill all the requirements of the task - there's simply no way to cancel a Future without external coordination.

There's also the problem of thread blocking with Futures - it's not exactly prohibited, as in the node.js world, but it's strongly discouraged, as Futures execute directly on system threads and blocking them wastes those precious resources. Since the design of Future does not incorporate a scheduler, it's also impossible to express things like non-blocking sleep or semantic blocking. This means that to express our algorithm we are forced to block on dequeues from our blocking queue.
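The basics of this design fit in a few lines - eager submission to an implicit ExecutionContext, and a completed result that is either a Success or a Failure:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global // the default global pool
import scala.concurrent.duration._
import scala.util.{Failure, Success}

// Future.apply eagerly submits its body to the implicit ExecutionContext.
val f: Future[Int] = Future { 21 * 2 }

// Once fulfilled, the result is a Success with the value or a Failure
// with the Throwable - there is no cancelled state.
f.onComplete {
  case Success(v)  => println(s"completed with $v")
  case Failure(ex) => println(s"failed with $ex")
}
```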
Let’s see the code:
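What follows is a hedged reconstruction of the implementation discussed below, not the article's exact code: names like `Scrape`, `Done` and `fetchLinks` are stand-ins, the real version talks to the shared interfaces for fetching and storing, and it biases a PriorityBlockingQueue towards poison pills, whereas a plain LinkedBlockingQueue keeps this sketch simple:

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}
import scala.annotation.tailrec
import scala.concurrent.{Await, ExecutionContext, Future, blocking}
import scala.concurrent.duration.Duration

sealed trait Task
final case class Scrape(url: String, depth: Int) extends Task
case object Done extends Task // the poison pill

final class Crawler(
    parallelism: Int,
    maxDepth: Int,
    fetchLinks: String => List[String] // hypothetical scrape + extract step
)(implicit ec: ExecutionContext) {

  private val queue    = new LinkedBlockingQueue[Task]()
  private val inFlight = new AtomicInteger(0)
  private val visited  = new AtomicReference(Set.empty[String])

  // Atomically add the url to the visited set; true if it was new.
  private def markVisited(url: String): Boolean =
    !visited.getAndUpdate(_ + url).contains(url)

  // Decrement pending work; on reaching zero, poison every worker.
  private def finishTask(): Unit =
    if (inFlight.decrementAndGet() == 0)
      (1 to parallelism).foreach(_ => queue.put(Done))

  @tailrec
  private def workerLoop(): Unit =
    blocking(queue.take()) match {
      case Done => () // poison pill - exit this worker's loop
      case Scrape(url, depth) =>
        if (depth <= maxDepth && markVisited(url))
          fetchLinks(url).foreach { link =>
            inFlight.incrementAndGet() // account for new work *before* enqueueing
            queue.put(Scrape(link, depth + 1))
          }
        finishTask()
        workerLoop()
    }

  def run(start: String): Unit = {
    inFlight.incrementAndGet()
    queue.put(Scrape(start, 0))
    val workers = Vector.fill(parallelism)(Future(workerLoop()))
    Await.result(Future.sequence(workers), Duration.Inf)
  }
}
```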
Three elements of our implementation will show up everywhere: a blocking queue, an atomic integer inFlight used to track pending work and to stop the worker loops once we're done, and an atomic reference to an immutable set of visited urls. In the case of Scala's standard library we don't really have anything built-in and idiomatic, so we're left with Java's standard library constructs.
The design of the worker loop is quite simple - we always start with a blocking take from the queue. If we receive the poison pill, here named Done, we terminate the loop. If we receive a Scrape task containing a url and depth, we verify that we're not out of depth and that we haven't visited this url before; if everything is fine, we start scraping it. The important part of the shutdown protocol is that after we finish scraping, we have to decrement the inFlight counter and check whether we have reached zero. If we have, there are no other pending tasks and it's time to send poison pills to everyone to break out of all the worker loops. The complement of this is that we increment the inFlight counter before pushing a link to the queue - this way we have a guarantee that the counter accounts for the new url before it can be taken off the queue by a parallel worker, and that once the counter reaches zero, no worker can be mid-processing and about to push more work to the queue.
To run our workers in parallel we are using the Future.sequence combinator, which allows us to convert a sequence of futures into a future of a sequence. In this case the elements of our sequence are going to be our worker loops which return a Future[Unit]. Finally, we await the resulting Future[Vector[Unit]] until it’s completed.
The biggest limitation of this implementation is that, short of shutting down the Executor underlying the ExecutionContext used by all of our components, we have no way to interrupt or cancel tasks waiting on I/O, and therefore one of our requirements cannot be fulfilled reliably. The best we can do is use a PriorityBlockingQueue as our main queue implementation and bias it towards immediate propagation of poison pills, making other workers stop as soon as they finish their pending, uncancellable tasks. There's no way to avoid poison pill messages in the general case either, as the built-in blocking queues do not provide open/closed semantics, so we have to implement completion signalling manually. Finally - blocking system threads is bad, but we know that we block a limited number of them (the parallelism parameter), and we can at least signal that fact to the underlying ExecutionContext by wrapping blocking calls in a blocking { } expression and hope for the best.
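Biasing the queue towards the pills only takes an Ordering that sorts Done ahead of any pending Scrape. A small sketch, reusing the hypothetical Task shapes from earlier:

```scala
import java.util.concurrent.PriorityBlockingQueue

sealed trait Task
final case class Scrape(url: String, depth: Int) extends Task
case object Done extends Task

// Done sorts before any Scrape, so pills overtake the remaining backlog.
// Note: equal-priority Scrape tasks come out in an unspecified order,
// so this trades strict FIFO/breadth-first ordering for faster shutdown.
val pillsFirst: Ordering[Task] = Ordering.by[Task, Int] {
  case Done      => 0
  case _: Scrape => 1
}

val queue = new PriorityBlockingQueue[Task](16, pillsFirst)
queue.put(Scrape("/a", 0))
queue.put(Done)
queue.put(Scrape("/b", 1))
// The pill is dequeued first even though it was enqueued second.
assert(queue.take() == Done)
```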
The ergonomics are mostly fine given the limitations of standard library Futures. Our implementation doesn’t trigger any of the common footguns of Future: we are fine with the eager fire-and-forget model that Future.apply provides, we don’t need any resource management and we’ve already conceded on the cancellation front.
This concludes the first part of the series. The next article is Comparing effect systems in Scala: Cats Effect and ZIO.




