CMU Computer Club has FLOODED ☹
503 Service Unavailable 200 OK
Late in the afternoon on Friday 25 November 2016, a water service piping component failed in Warner Hall,
flooding much of Warner and Cyert Halls,
including the CMU Computer Club's server room.
All of our services —
including Contrib web pages, club mail, club AFS, club shell servers, mirrors, hosted VMs and websites
— will be down until things dry out.
Updates will be posted here as we're able, as well as to
— Carnegie Mellon University Computer Club
Tue 13 Dec, 16:18 EST.
All services have been restored to normal operation.
Sun 11 Dec, 18:14 EST.
The majority of servers in the Drycas/VMS and VAX clusters are now online.
Wed 7 Dec, 16:33 EST.
The two collocated VM servers which had minor water incursion
were brought offline for a brief period between 14:06 and 14:37 EST today
for an unscheduled inspection and assessment by an insurance consultant.
Tue 6 Dec, 23:43 EST.
The final collocated VM server was booted this evening without issue.
At this time, all core services should be fully operational.
The Drycas/VMS and VAX clusters remain partially operational.
Mon 5 Dec, 23:14 EST.
With extra precautions taken, one of the two remaining affected
collocated VM servers has been booted without issue.
The other should be booted in the coming days.
One machine in the VAX cluster is now online.
Sun 4 Dec, 20:24 EST.
With the exception of the two machines hosting collocated VMs which got wet,
all servers running core services are now online
or will be brought online within the next few hours.
Most notably, we're now serving our many websites as normal, including Contrib.
Additionally, all shells and mirrors have been restored to normal service.
Mail delivery resumed last night and most of the queue was run through
during the early hours of Sunday.
Portions of the Drycas/VMS cluster have come online;
the VAX cluster remains offline for now.
Recovery updates from the old "failwhale" 503 page
have been archived to a historical page.
Sat 3 Dec, 23:10 EST.
Nameservers, mail servers, and KDCs have been brought back to full strength
(all three of each are now running,
as opposed to one of each which had been providing offsite backup),
and club AFS has been restored.
Room "weather" monitoring and machine status monitoring are now active,
though certain alerts to club members are presently disabled.
oyster is up;
unix.club.cc.cmu.edu temporarily points only to
Other shells up are the 32-bit
snail and an additional administrative shell.
Next priorities will be
the internal club wiki and Contrib web pages.
We are done for the evening, and plan to reconvene Sunday evening
to continue our work.
Sat 3 Dec, 18:29 EST.
We have been cleared by our contact with Insurance Services
to begin powering up any unaffected machines.
Our priorities tonight are mail, monitoring, AFS, and minimal shells.
Services such as collocated VMs, Contrib web pages, and mirrors
may have to wait a couple of days longer.
Thu 1 Dec, 23:08 EST.
Mains power was restored to the room on Thursday,
though a small amount of underfloor work remains for Friday.
Club inspection of hardware is near completion.
Unfortunately, water has been found pooling in or on several items:
two of our collocated VM machines, one server in our Drycas cluster, a disused network switch,
and two Macintosh Plus
computers from the club's retro computing collection.
We believe that any such water damage will be of minimal effect,
but we will be exercising extra caution when powering up these machines.
To the extent necessary, we will notify any organizations disproportionately affected
once we begin restoring other services, which may take a few days.
We still have no details from Insurance Services about their assessment of the room,
which must happen before we can power on any equipment.
Wed 30 Nov, 17:46 EST.
Facilities Management Services were working on evaluating and protecting
exposed wiring in the underfloor today before they were called away to a higher-priority area.
This work is expected to be completed by midday Thursday, at which point
mains power to the room will be restored.
The new smoke detectors and sprinklers are in place.
Meanwhile, the tarps will be coming down and debris will be cleared as we continue to inspect hardware.
There is still no timetable on Insurance Services' inspections.
Tue 29 Nov, 20:23 EST.
New ceiling tiles are in place, with holes cut for the new smoke detectors
which should go in on Wednesday;
we're leaving the tarps in place over our servers to protect against dust
until that work is complete.
Club inspection of various hardware has begun and will continue through Wednesday evening.
We are waiting on Insurance Services' inspections of our equipment
and the underfloor wiring before powering on any machines.
Tue 29 Nov, 13:19 EST.
Remediation will continue in the affected buildings through the rest of the week.
As for our spaces, new ceiling tiles have arrived and are slated to be installed later today.
Once the (literal) dust from that work settles, we can begin removing the protective tarps
that cover our server racks, and begin moving other equipment back into the room.
Although we have been cleared to power up by Facilities Management Services,
both the club and the university's internal Insurance Services want to assess
the state of our equipment before any of it is powered on.
Our smoke detectors will also need to be replaced before we widely restore service,
but there is currently no timetable for that.
These efforts should begin this evening, but may take a few days.
Sun 27 Nov, 20:33 EST.
The light fixtures have been cleared of water, and damaged ceiling tiles have been removed.
Sun 27 Nov, 12:37 EST.
After remediation efforts, an initial inspection of our server room has been made.
There is still some residual water in the light fixtures,
and the damaged ceiling tiles are expected to be replaced Tuesday 29 November.
Much of the stored equipment that was moved out of the room to nearby areas during the emergency
will need to be sorted and returned to the room,
which will be our immediate priority before restoring any services.
Fri 25 Nov, 19:32 EST.
Shutting everything down until further notice. ☹
This includes just about any club service you possibly care about,
though a very small contingent of offsite machines is keeping us from going completely dark,
including serving this page.