Why do we need an event loop?

“What is an event loop?” is a common question. But have you ever wondered why we need one?

If you’re a software developer at The Magic Of Coding, your task for this sprint might look like this:

Find the best way to process concurrent I/O
operations in a single thread

Let’s explore why an event loop is crucial for this task by examining a concrete example. Suppose we have two files, file A and file B.

For each file we need to:
1. Read data from the file using a system call
2. Call a function processData(data) with the data that was read

What are our options?


What is an I/O operation?

Any operation that transfers data to or from a computer program is an I/O operation.

For instance,
1. Reading a file transfers data from the hard disk (or RAM) to the program.
2. A server sending back an HTTP response with a JSON payload transfers data from the server to the client.
3. Reading data from a socket transfers data from the socket to the program.

I/O is the slowest fundamental operation that a computer performs. Hence, it adds a time delay between the moment the request is sent and the moment the operation completes.

The solutions below differ in what the thread does during this delay.

Option 1 – Use blocking I/O

When a program makes a blocking function call, the call does not return until the operation has completed. For instance, consider a blocking call like file.read() below:

data = file.read() // blocks the thread until the data is available
print(data)

In this case, print(data) won’t execute until the data has been read, which might take several seconds. If it takes 5 seconds for the file to be read, print(data) only runs after those 5 seconds.

Using blocking function calls, our solution for reading and processing data from the two files would look like this:

dataFromFileA = fileA.read() // executes at time T0 seconds
processData(dataFromFileA)

dataFromFileB = fileB.read() // executes at T0 + 5s, after fileA's read completes
processData(dataFromFileB)

Here, the read on fileB has to wait 5 seconds until fileA returns its data. It is easy to see that a single thread cannot handle simultaneous I/O operations using blocking calls: each call has to wait for the preceding one to finish.

What we need is a solution where fileB.read() also executes at time T0.

Option 2 – Use non-blocking I/O

Operating systems support another mechanism to access resources called non-blocking I/O. In non-blocking mode, function calls always return immediately, without waiting for data to be read or written.

If no results are available, the function will simply return a predefined constant, indicating that there is no data available to return at that moment.

The fcntl() system call

In Unix operating systems, the fcntl() system call is used to manipulate an existing file descriptor and change its operating mode to non-blocking (with the O_NONBLOCK flag).

Once the resource is in non-blocking mode, any read operation on it will fail with the error code EAGAIN if the resource doesn’t have any data ready to be read.
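
As a rough sketch of what this looks like at the system-call level (assuming fd refers to a pipe or socket; on most Unix systems, regular files ignore O_NONBLOCK for reads):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Switch an existing file descriptor to non-blocking mode.
int make_non_blocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);              // read the current mode flags
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);  // add the O_NONBLOCK flag
}

void try_read(int fd) {
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));         // returns immediately
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        printf("no data available yet\n");          // the call did not block
    else if (n == 0)
        printf("resource closed\n");
    else if (n > 0)
        printf("read %zd bytes\n", n);
}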

The most basic pattern for accessing this kind of non-blocking I/O is to actively poll the resource until some data is returned; this is called busy-waiting.

Our solution using non-blocking I/O would look like this:

resources = [fileA, fileB];

while (!resources.isEmpty()) {
    for (i = 0;  i < resources.length; i++) {
        data = resources[i].read();
        if (data === NO_DATA_AVAILABLE)
            continue; // there is no data to read at the moment
        if (data === RESOURCE_CLOSED)
            resources.remove(resources[i]); // remove from the list if the resource was closed
        else
            processData(data);
    }
}

With this, it is possible to handle multiple resources requiring I/O operations in a single thread, but it is highly inefficient.

The loop will consume precious CPU cycles iterating over resources that are unavailable most of the time. Polling algorithms like this usually result in a huge amount of wasted CPU time.

If it takes 5 seconds for data to become available on the files, the loop will spin for those 5 seconds without doing anything useful.

Option 3 – Use event demultiplexing

Operating systems provide another native mechanism to handle concurrent, non-blocking resources efficiently. This mechanism is called the synchronous event demultiplexer, or event notification interface.

This component collects and queues I/O events that come from a set of watched resources, and blocks until new events are available to process.

Applications register the resources and the events they are interested in with the event notifier. When an event occurs, the event notifier notifies the application.

A solution using an event demultiplexer would look like this:

watchedList.add(fileA, FOR_READ); // [1]
watchedList.add(fileB, FOR_READ);

while (events = demultiplexer.watch(watchedList)) { //[2]
    // event loop
    foreach (event in events) { // [3]
        data = event.resource.read() // This read will never block and will always return data
        if (data === RESOURCE_CLOSED)
            demultiplexer.unwatch(event.resource) // remove from watched list if resource was closed
        else
            processData(data)
    }
}
  1. Each resource, along with the event we are interested in for it, is added to a data structure (the watched list).
  2. The event notifier is told to watch these resources for the registered events. This call is synchronous and blocking: it only returns when one or more of the registered events occur. Only then does the foreach loop over the returned events execute.
  3. Each event is processed. At this point, the resource associated with the event is guaranteed to be ready to read and will not block during the operation. When all the events have been processed, the flow blocks again on demultiplexer.watch() until new events are available. This cycle is called the event loop.

This solves the issues with the previous options:

  1. At time T0, events are registered for both files. At some point in the future, read events are returned for both files, and a non-blocking read is performed that is guaranteed to return data.
  2. Since the call to the notifier blocks, the loop body only executes when events are available, so CPU cycles are not wasted trying to read resources that are not yet ready.

We can see that with this approach, a single thread can handle multiple I/O operations concurrently.
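
On Linux, the demultiplexer.watch() step maps to the epoll family of system calls. Below is a minimal sketch of the same pattern, assuming fdA and fdB are non-blocking descriptors for pipes or sockets (epoll does not accept regular files):

#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

void watch_and_process(int fdA, int fdB) {
    int epfd = epoll_create1(0);                    // create the demultiplexer

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;                            // interested in "ready to read"
    ev.data.fd = fdA;
    epoll_ctl(epfd, EPOLL_CTL_ADD, fdA, &ev);       // watchedList.add(fileA, FOR_READ)
    ev.data.fd = fdB;
    epoll_ctl(epfd, EPOLL_CTL_ADD, fdB, &ev);       // watchedList.add(fileB, FOR_READ)

    struct epoll_event events[2];
    char buf[4096];
    for (;;) {
        // demultiplexer.watch(): blocks until at least one resource is readable
        int n = epoll_wait(epfd, events, 2, -1);
        if (n == -1) break;
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            ssize_t count = read(fd, buf, sizeof(buf));   // will not block here
            if (count == 0)
                epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL); // resource closed: unwatch it
            else if (count > 0)
                printf("processData: %zd bytes from fd %d\n", count, fd);
        }
    }
    close(epfd);
}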

The non-blocking I/O engine of Node.js – libuv

Each OS provides its own interface for communicating with the event demultiplexer: epoll on Linux, kqueue on macOS, and IOCP on Windows.
libuv provides a layer of abstraction over these inconsistent interfaces and offers a uniform, platform independent interface to Node.js.
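
For a taste of that uniform interface, here is a small sketch that reads a file asynchronously with libuv's C API (the file name data.txt and the buffer size are illustrative assumptions):

#include <fcntl.h>
#include <stdio.h>
#include <uv.h>

static uv_fs_t open_req;
static uv_fs_t read_req;
static char buffer[1024];
static uv_buf_t iov;

static void on_read(uv_fs_t *req) {
    if (req->result > 0)
        printf("processData: read %zd bytes\n", (ssize_t)req->result);
    uv_fs_req_cleanup(req);
}

static void on_open(uv_fs_t *req) {
    if (req->result >= 0) {                       // result holds the file descriptor
        iov = uv_buf_init(buffer, sizeof(buffer));
        // queue an asynchronous read; on_read runs on a later loop iteration
        uv_fs_read(uv_default_loop(), &read_req, (uv_file)req->result, &iov, 1, -1, on_read);
    }
    uv_fs_req_cleanup(req);
}

int main(void) {
    // register the work, then hand control to the event loop
    uv_fs_open(uv_default_loop(), &open_req, "data.txt", O_RDONLY, 0, on_open);
    return uv_run(uv_default_loop(), UV_RUN_DEFAULT); // runs until no work is left
}

uv_run() drives the kind of loop described in the previous section, dispatching each callback as its watched operation completes.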

The spread of computers and the Internet will put jobs in two categories: people who tell computers what to do, and people who are told by computers what to do.

Marc Andreessen

If you have any questions, please don’t hesitate to leave a comment below. We’ll get back to you as soon as possible.

