Oct 24, 2022 3 min read

Thinking in events

How to start grokking event sourcing using your traditional relational database models

This is one of those articles that I wrote to solidify my understanding of something, it’s not meant to be educational (except for myself, hehe), but I find it helps a lot to write things down, and it’s a big plus if someone else finds it useful.

Like most people, you probably use a relational database to store data for your applications. And, if you’re like most people, you probably use that data in a way that is not event-sourced. In other words, you probably don’t think of every update to your data as an event that is timestamped and stored in a log.

But what if you did? What if you only allowed inserts into your database and never updated or deleted any data? This would force you to think of every incoming update as an event with a timestamp. This way, you would have a clear lineage of all the events, allowing you to easily track changes, not just the latest state of the records.

There are a few benefits to this approach. First, it would increase performance because you would never have to do any expensive updates or deletes. Second, it would make tracking bugs easier, because you would always know exactly what happened and when. And third, it would make it easier to add new features because you would never have to worry about breaking existing code.

Of course, there are also a few drawbacks. First, it would be more challenging to query your data because you would have to reconstruct it from the events. Second, changing your data model would be more challenging because you would have to migrate all your data from the old model to the new one. And third, it would be more difficult to share data with other applications because they would need to understand your event-sourced data model.

For example, consider a table that is used to store users.| id | name | email |
|----|-----------|-------------------
| 1 | John Doe | john@example.com |
| 2 | Jane Doe | jane@example.com |

Normally, the application logic that uses this table would do the following things when some operation happens:

Insert user

Add a new record using an INSERT statement

Update user

Find user by id
Update the record with an UPDATE statement

Delete user

Find user by id
Delete the record with a DELETE statement

The update and delete operations are expensive because they require finding the record first, and then doing an update or delete. You also have to take isolation levels into consideration to ensure that transactions are working properly.

In an event-sourced system, there would be no need for these operations. Instead, the application logic would just emit events, which would be timestamped and stored in a log. For example, when a user is created, an event would be emitted with the user’s data. When a user is updated, another event would be emitted with the updated data. And when a user is deleted, yet another event would be emitted.

A simple event-sourced version of our users table could look something like this:| id | timestamp | name | email | operation
| — | — - | — — — — -| — — — — — - - - -| — — —
| 1 | 100 | John Doe | john@example.com | insert
| 2 | 200 | Jane Doe | jane@example.com | insert
| 3 | 300 | John Doe | john@example.com | update
| 4 | 400 | John Doe | john@example.com | delete

In this example, the id is just a sequence number. The timestamp is the time at which the event was emitted. The name and email are the data that was emitted with the event. And the operation is the type of event that was emitted.

Now, let’s say you are interested in the change history of a certain user through time.With the traditional RDBMS model you would not be able to do this as everything but the current state is deleted as soon as we update a record.

With the event-sourced model, you can easily reconstruct a user's history by querying the events. For example, the following query would give you all the events for John Doe:

``` SELECT * FROM events WHERE name = ‘John Doe’ ```

As you would expect, this would give you the following results:| id | timestamp | name | email | operation
| — | - - | - — — — | — — — — — — — — -| — — -
| 1 | 100 | John Doe | john@example.com | insert
| 3 | 300 | John Doe | john2@example.com | update
| 4 | 400 | John Doe | john@example.com | delete

You can then use these events to reconstruct the history of the user. For example, you can see that the user’s email address was updated at timestamp 300. And at timestamp 400, his record was deleted.

One downside of this approach is that it can be difficult to change your data model. For example, let’s say you want to add a new field to the users table, such as a phone number.

With the traditional RDBMS model, you would just add the new field to the table and update the application logic to use it. With the event-sourced model, you would have to migrate all your data to the new data model or use a more unstructured type to hold your data, such as a json field. This can be a difficult and time-consuming process.

If you discovered the famous change data capture (CDC) pattern in this you are entirely correct in your observation! CDC is a great intro to event sourcing using the outbox pattern, which extends its base premise by allowing replayability, a key feature in event sourcing.

Conclusion

If you ever wondered how event-sourced systems are built, try out this exercise, even if only mentally — switching to an event-sourced mindset can be really helpful to understand how these systems work.

Insert user

Update user

Delete user

Conclusion

You might also like...

The Developer Experience formula

Implementing Data Contracts

YubiKey OTP generation from the command line

An introduction to Redpanda: Creating a chat application in Python in less than 100 lines of code.

High-performance open source Data Lakehouse at home