Update on eClass Outages - Dec. 30th 2021
Posted by Chris Goetz on 30 December 2021 03:24 PM
Good afternoon everyone,
Since the end of Fall term, the IST infrastructure and eClass teams, along with Amazon Web Services and external consultants, have been working diligently to diagnose the issues experienced during the final exam period. However, a definitive root cause has not been identified. This suggests that multiple issues combined to cause the outages. Over the last several days, work has concentrated on reviewing all infrastructure configuration in detail to check for errors, and on enhancing performance wherever possible.
We are now focusing on the data and database that make up eClass. That database consists of over 500 tables (each table is like an Excel spreadsheet), with the 70 largest tables ranging in size from one million to 500 million rows. Yesterday the teams ran a process to clean and re-index all the tables in the database. This removed stale data and reduced the size of the database by several hundred gigabytes. It also ensured that any indexes that may have been corrupt have now been rebuilt.
Over the next two to three days, we will be conducting load tests in our performance testing environment (not the public version of eClass) to simulate several thousand students accessing eClass quizzes. Our goal is to see whether this simulation can cause a failure in the test copy of eClass, and capture the detailed events leading to failure. In parallel, the teams are investigating further avenues to mitigate load and cascading interactions on the database for Winter term.
We know that instructors are preparing for the Winter term, creating courses and adding content. We understand that this week’s maintenance windows are disruptive to that preparation. However, these windows are necessary so that the teams can update the public version of eClass with our tested changes. Because the root cause has not yet been identified, we will need to schedule more maintenance windows leading up to the start of the term. As much as possible, these windows will be scheduled in the late evening and early morning hours to minimize disruptions.
Based on the work the teams complete and its findings, we will provide you with further updates over the next few days before the start of the term.
IST eClass Support