Update on Recent Outages
Posted: Wed Nov 06, 2019 11:17 am
First off, let me apologize to everyone for the issues with the board as of late. I hadn't really had much time to look into it, and the few quick fixes I tried seemed to help at first but never really resolved anything. Yesterday things finally hit a breaking point, and the board was completely offline starting somewhere around 3-4 o'clock in the afternoon. Thanks to @mslacat for alerting me on Twitter, as I was at work and then had plans last night and wouldn't have seen it until much later.
So, on to the root cause of the issue. As hard as it is for me to understand why, it appears that we were hit by a sort of HTTP flood denial-of-service attack from a ton of different IPs in China. This hit an extreme yesterday afternoon. If you go to the bottom of the home page and look at our "Most ever users online," you will notice that it is now showing as over 2,200, set last night. As I was trying to diagnose and resolve the issues, I was taking the server offline or blocking all web traffic, and anytime I opened it back up, within a minute I had over 2,000 anonymous users simultaneously hitting any publicly available pages on the board. Unfortunately, that was just too much for the site to handle, which is why it would become unavailable. The board becoming unavailable is a safety measure for when the server gets overloaded. The large mass of traffic would hit, overload the server, and shut down the board; the server would slowly recover until it could re-enable the board, and then it would start all over again. This is why people could occasionally get on or get notifications and then be blocked out again soon after.
After some time and research, and once I had identified that this was indeed what was happening, I found that this is a common attack and that there are lists of IPs you can block at the network level to stop it. After implementing those blocks, I saw our anonymous traffic count drop drastically, and the board returned to a normal operating level. I see that there are still some hitting the site, but the number is in the tens now rather than the thousands, and I can monitor those to find additional address ranges to add to our block.
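For anyone curious what "monitor those to find additional address ranges" can look like in practice, here is a rough sketch in Python. It is not what runs on our server. The blocklist entries and the sample addresses are reserved documentation ranges used as stand-ins, the function names `is_blocked` and `candidate_ranges` are made up for illustration, and it assumes IPv4 addresses grouped by /24.

```python
import ipaddress
from collections import Counter

# Hypothetical blocklist. These are reserved documentation ranges,
# used here as placeholders, not the ranges actually blocked.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/24")
]


def is_blocked(addr, ranges=BLOCKED_RANGES):
    """Return True if addr falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in ranges)


def candidate_ranges(addrs, prefix=24, min_hits=3, ranges=BLOCKED_RANGES):
    """Group unblocked addresses by network and return the busy ones.

    Returns (network, hit_count) pairs for networks with at least
    min_hits requests, busiest first. Assumes IPv4 input.
    """
    counts = Counter()
    for addr in addrs:
        if is_blocked(addr, ranges):
            continue
        # strict=False lets us build the enclosing network from a host address.
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        counts[net] += 1
    return [(str(net), n) for net, n in counts.most_common() if n >= min_hits]
```

Feeding it the client addresses pulled from an access log would give a short list of ranges worth a closer look before adding them to the block. The threshold of three hits is arbitrary and would need tuning against real traffic.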
Finally, on to some better news. Since I was up late and the board was non-functioning anyway while I worked through all of this, I took the time to go ahead and do some updates. All board software and servers have now been updated, which is great to have and not something I would typically be able to do during the busy part of the year. The biggest change you may notice is that our site now uses SSL for a secure connection. This should be transparent to all of you, as all old links that used http:// will automatically forward to https:// links, and when logging in you won't see any more warnings from your browser about the site being insecure.
So again, apologies to everyone for the issues, and especially for being down during last night's great showing by the basketball team. As I have said on numerous occasions, if you are seeing issues with the board, please do not hesitate to ping me here with a mention or DM, or for severe issues like last night's, send me an email or hit me up on Twitter. I'm not always actively on the board to see issues firsthand right away.