Critical maintenance
Incident Report for Hummingbot Miner
On March 17, we took down the Hummingbot Miner app and the backend Rewards Engine for critical unscheduled maintenance. While the maintenance period was expected to last 6 hours, it took 36 hours to complete the fix.

We are taking down the Hummingbot Miner app and the backend Rewards Engine for maintenance. Because of the challenges involved in collecting and aggregating real-time order book data from a large number of bots, our backend system has slowed down considerably. To fix this, we need to stop the system, make some changes to our database, test whether those changes work, and then restart the system.

Unfortunately, this fix was much more problematic than we anticipated. We were overly aggressive and made too many changes at once, which made the resulting issues impossible to troubleshoot once we pushed the new version online.

We plan to roll back to the production backup and apply the fixes incrementally. Unfortunately, this means the issue will take longer to resolve, but we hope to get the Miner app back online today.

While the initial projection of a 6-hour outage turned into a 36-hour marathon, we have finally fixed the database issues that caused the outage and slowdowns. This fix lets our backend system handle more users and more bots. All credit goes to our hard-working engineers 👷👷‍♀️ who have worked continually over the past two days!

During the 36-hour outage, any orders placed by your bots were still captured and rewarded. Our data collector kept running throughout, so afterwards the reward aggregator replayed the data to properly allocate rewards to bots that were running during that period. When you sign in to Hummingbot Miner, these rewards should be reflected in the Activity view.
Posted Mar 18, 2020 - 21:00 PDT