A few years ago I worked on a project where we needed to add email functionality to a web site.
You know the drill: a customer fills in their details on the site, clicks “Contact Us” and an email wings its way to someone in the office.
Read on to find out how not to do it before we look at a better solution using Azure Service Bus.
A naive approach
A quick and dirty way to get this working would have been to directly hook up the “Contact Us” button to hit our mail server and send the email.
The potential issues here were obvious. The customer would have to wait for the email to send before the user interface updated and, more critically, any downtime on the email server would have meant a broken page on the site.
So we discounted that option and considered the alternatives.
The answer’s a database, what’s the question?
We gave it some thought and came to this conclusion.
The customer doesn’t want (or need) to wait for the email to be sent and isn’t concerned whether it takes a few seconds (or even minutes) for their email to arrive at its destination.
With this in mind we switched our focus to storing the send email requests and processing them later.
The obvious solution here was a database. Most of our existing code used SQL so we should too. We created a table in SQL to store the email subject, body, sender etc.
Separately we created a poller, a loop which regularly checked this table and sent any previously unsent messages. We used Topshelf and deployed our super email sender as a Windows Service.
To make this work we used a flag on the table to indicate if the message was sent.
Problem solved. The user hits the button, we store the email request in a database then display a message to the user.
Separately the super email sender would kick in and check our email request table for unsent entries and do the job of actually sending the email.
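To make the shape of this concrete, here is a minimal sketch of the table and one pass of the poller. It is an illustration only: the real service was a .NET Windows Service running against SQL, whereas this uses Python with an in-memory SQLite database, and the table name, column names and function names (`email_requests`, `store_request`, `poll_once`) are all made up for the example. Error handling is deliberately left out to keep it short.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table: one row per "send email" request plus a sent flag.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS email_requests (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               sender TEXT NOT NULL,
               subject TEXT NOT NULL,
               body TEXT NOT NULL,
               sent INTEGER NOT NULL DEFAULT 0
           )"""
    )


def store_request(conn, sender, subject, body):
    # Called by the web site when the user clicks "Contact Us".
    conn.execute(
        "INSERT INTO email_requests (sender, subject, body) VALUES (?, ?, ?)",
        (sender, subject, body),
    )
    conn.commit()


def poll_once(conn, send_email):
    # One pass of the poller: load every unsent request, send it, flag it.
    rows = conn.execute(
        "SELECT id, sender, subject, body FROM email_requests "
        "WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for request_id, sender, subject, body in rows:
        send_email(sender, subject, body)
        conn.execute(
            "UPDATE email_requests SET sent = 1 WHERE id = ?", (request_id,)
        )
        conn.commit()
    return len(rows)
```

In a real service `poll_once` would be called on a timer, and `send_email` would talk to the mail server. Here it is just a callable passed in, so the sketch can run without one.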
This worked fine until one day the boss stormed in and demanded to know why customers were reporting that we’d ignored their emails. Even worse, given the initial success of the super email sender, we had rolled this out to other areas of the business and it transpired that automated quotes had gone unsent.
We investigated and found that an unhandled exception in the super email sender was halting execution at the first sign of failure, so the rest of the emails in that run were never attempted.
Furthermore, we were getting it in the ear from various quarters about the size of the table. That size, combined with a poor choice of indexes, meant the polling query took a long time to complete each time it ran, effectively doubling our polling interval.
We fixed these problems. We tightened up our exception handling and also wrote code to automatically delete older messages. At the same time we added an alert so we would know if any messages got stuck for longer than a few hours.
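The exception handling fix boils down to catching failures per message rather than letting one bad send abort the whole run. A rough sketch of the idea, in Python rather than the .NET we actually used, with hypothetical names throughout (`send_pending`, `send_email` and `mark_sent` are placeholders, not our real code):

```python
import logging

logger = logging.getLogger("email_sender")


def send_pending(requests, send_email, mark_sent):
    """Try every request; a failure on one must not stop the rest."""
    failed_ids = []
    for request in requests:
        try:
            send_email(request)
            mark_sent(request["id"])
        except Exception:
            # Log with the stack trace and carry on with the next message.
            # The request stays unsent, so a later poll can retry it.
            logger.exception("Failed to send email request %s", request["id"])
            failed_ids.append(request["id"])
    return failed_ids
```

Returning the failed ids gives the caller something to count or alert on, which is the sort of signal you need to spot messages that are stuck.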
Why have one email when you can have two?
A few months later the boss stormed back into the office and demanded to know why we were emailing customers multiple times with the same email.
Another look at the code identified the issue. In our wisdom we had threaded off the poller as a background task. As a result, the poller would kick off a task to start processing unsent messages, then the polling interval would come round again and the poller would kick off another task.
Because we were calling the database and retrieving all unsent messages then looping through them in memory, any kind of backlog in unsent messages meant that the poller could swing back around before the first instance had finished. Effectively this meant the second task could also pick up any requests which the first one hadn’t flagged yet.
Time for a rethink
We fixed this by ensuring the poller could only run once at any given time. The heat was off, but with fresh eyes we took another look at our solution and identified more potential issues.
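One common way to stop runs overlapping within a single process is a lock that the poller tries to take without waiting. If a previous run still holds it, the new run simply skips its turn. The sketch below shows that pattern in Python. It is an assumption about how such a guard might look, not a copy of our fix, and `run_poll_if_idle` is a made-up name.

```python
import threading

# Shared by every timer tick in this process.
_poll_lock = threading.Lock()


def run_poll_if_idle(poll):
    """Run poll() unless a previous run is still in progress.

    Returns True if poll() ran, False if this tick was skipped.
    """
    # blocking=False: do not queue up behind the running poll, just skip.
    if not _poll_lock.acquire(blocking=False):
        return False
    try:
        poll()
        return True
    finally:
        _poll_lock.release()
```

Note that this only protects a single process. If two copies of the service were deployed, the guard would have to live somewhere shared, such as the database itself.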
- Any bulk update operation on the table ran the risk of sending thousands of emails again (accidentally set the “sent” flag to false)
- Any bulk operation could also cause emails not to be sent at all (set the “sent” flag to true)
- There was no accurate history of how often an email send was attempted (we were only logging the request, not send attempts) so we had no idea how reliable our mail server was, or how many times a message had to be retried
- Should we want to take some other action when the user sends an email, we would potentially need to add another flag to the table
- The same table was now being used for reporting, meaning our simple transactional table was serving two purposes
- The “schema” for our send email request was tied to the database table. Should we wish to extend or alter that contract it would require a database migration.
Of course, all of these problems were surmountable, either with appropriate safeguards or more logging/tables.
However, then it hit us. We were basically re-implementing messaging but using a database table instead of a queue. Around the same time, we had started to use Azure Service Bus so we decided to try an alternative approach.