Outages and Maintenance
-
The Hathi cluster has been taken down for emergency maintenance to address a security issue. Engineers are investigating now, and we will update this notice once we can determine when the cluster will return to service.
-
Conte Cluster Emergency Maintenance
The Conte Cluster has been returned to production. If you have any questions or concerns, please send an email to rcac-help@purdue.edu. This is a gentle reminder that the Conte cluster will be offline tomorrow (Wed) beginning at 8:00am for an emerge...
-
The Fortress archive will be unavailable beginning Thursday, August 17th, 2017 at 8:00am for regular maintenance, and will return to full production by 5:00pm that day. During this maintenance, Fortress will have some availability features and fir...
-
HalsteadGPU Cluster Maintenance
HalsteadGPU is back in full production and job scheduling has been resumed. Please let us know if you see any issues at rcac-help@purdue.edu. Please note, the next planned maintenance will be Wednesday, September 13. Original Message Wednesday, Augu...
-
A failure has occurred in the systems which serve Data Depot to the various research clusters. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused on all systems while this issue is being add...
-
Unscheduled outages on portions of clusters
Conte, Halstead, HalsteadGPU, and Hammer are back in full production. Job scheduling has been resumed on all clusters. Please let us know if you see any lingering issues at rcac-help@purdue.edu. UPDATE July 20, 2017 2:54pm Power has been restored to...
-
The issues with Globus have been resolved, and the Fortress archive is fully restored to normal operations. This concludes this maintenance. Update: July 19, 2017 9:36pm: The work on Fortress has been completed and it is in normal production for al...
-
The Hammer cluster has been successfully returned to full production. This concludes this maintenance. Update: July 18, 2017 5:01pm: The Hammer cluster has most of the reconfiguration complete, but work continues on a good portion of the nodes whic...
-
Email notifications from Research Computing website broken
Email notifications are up and running again as usual. Original Message As of 5pm Thursday evening, email notifications from the Research Computing website are not working. Some people are receiving no email and others are receiving damaged emails. T...
-
Email to "rcac-help@purdue.edu" not Working
As of 3:45pm Friday, the rcac-help@purdue.edu address is working normally again. Original Message Beginning 5:00pm Thursday, the rcac-help@purdue.edu email address stopped accepting email. Anything sent since then has not been received. We are workin...
-
Hathi, Radon, and Specialized Cluster Maintenance
Update message: After performing necessary repairs, Radon has been returned to service. -- Previous message: After consulting with vendor support, we have determined that Radon has experienced a failure in its network hardware. Parts and a vendo...
-
Extension of monthly GitHub maintenance
Engineers and GitHub support have resolved the issues encountered earlier this afternoon and GitHub is back online and running normally again. We apologize for the disruption this may have caused. If you encounter any issues please let us know at rca...
-
The Fortress Archive will be unavailable beginning Wednesday, May 31st, 2017 at 8:00am EDT for scheduled maintenance. Fortress will return to full production by Wednesday, May 31st, 2017 at 5:00pm EDT. During this time, Fortress will have the Hig...
-
Nodes have continued to gradually reboot into the new image as jobs complete. At this point, more than 80% of Halstead has completed this process, and we have seen no issues along the way. This outage is closed. Update: May 25, 2017 5:00pm...
-
Engineers have restored the failed core servers to a functional state. Data Depot is up and running as normal and job scheduling has resumed. Should you encounter any lingering issues, please let us know at rcac-help@purdue.edu. Original Message Some core...
-
As of 8:48pm, the issue has been resolved. Original message The Research Data Depot is experiencing a system-wide slowdown. Engineers have isolated the systems which are at the core of this phenomenon and are taking steps to restore normal service....
-
Scratch system failure on Rice, Snyder, Hammer
*** Update *** As of 7:00 pm, the problem on the scratch system has been corrected, and scheduling has resumed on all three affected clusters - Rice, Snyder, and Hammer. Update Storage engineers are working with the system vendor to evaluate a propos...
-
As of 2:35 pm, the Conte cluster has been returned to service. Scheduling has resumed in all queues. Update The source of the problem has been identified and the fix is underway. We anticipate returning Conte to service by 3pm today. Original message The Conte...
-
Emergency Maintenance on Rice, Snyder, Hammer
As of 7:15pm, all queues on these clusters have resumed scheduling. Nodes will continue to be upgraded as they finish current jobs and become available. In the interim, the clusters will run in a degraded state, but will continue to start new jobs...
-
The Data Depot file system was only sporadically available for 2 hours today. Some jobs running on the Community Clusters paused during the instability but have since resumed. We do not expect any jobs to have been lost. This issue is now resolved.