Darhost

2026-05-17 13:15:19

GitHub's April 2026 Service Incidents: A Detailed Breakdown

GitHub faced two major incidents on April 1, 2026: a prolonged code search outage and a brief audit log disruption. This article details causes, recovery, and improvements.

In April 2026, GitHub experienced ten incidents that degraded performance across its services. While most were minor, two significant disruptions on April 1—a prolonged code search outage and a brief audit log service disruption—drew particular attention. At the end of the month, GitHub published a blog post covering these and the major incidents of April 23 and 27, and enhanced its status page with richer details. This article provides a comprehensive look at the events, their root causes, recovery steps, and the measures introduced to prevent future occurrences.

Code Search Service Outage (April 1)

On April 1, 2026, GitHub’s code search service was fully unavailable for about two hours and twenty minutes, followed by a period of stale results that lasted a further six hours and forty-five minutes. The incident began at 14:40 UTC and ended with full recovery at 23:45 UTC.

Source: github.blog

What Happened?

Between 14:40 and 17:00 UTC, 100% of code search queries failed. After initial restoration at 17:00 UTC, search results returned but did not reflect any repository changes made after approximately 07:00 UTC that day, so at that point roughly ten hours of updates were missing from results. Indexing had fully caught up by 23:45 UTC, restoring current data.
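As a quick check on the timeline, the durations implied by these timestamps can be worked out directly. The sketch below uses only the times quoted above; the variable names are illustrative, and 07:00 UTC is approximate in the source.

```python
from datetime import datetime, timedelta

# All times are UTC on April 1, 2026, as quoted in the article.
DAY = "2026-04-01"

def at(hhmm: str) -> datetime:
    """Parse an HH:MM clock time on the incident day."""
    return datetime.strptime(f"{DAY} {hhmm}", "%Y-%m-%d %H:%M")

def fmt(delta: timedelta) -> str:
    """Render a duration as 'Xh YYm'."""
    minutes = int(delta.total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"

index_frozen = at("07:00")      # approximate: last changes visible in results
outage_start = at("14:40")      # all queries begin to fail
queries_restored = at("17:00")  # queries succeed again, results still stale
fully_current = at("23:45")     # indexing has caught up

full_outage = fmt(queries_restored - outage_start)
stale_after_restore = fmt(fully_current - queries_restored)
staleness_at_restore = fmt(queries_restored - index_frozen)
total_incident = fmt(fully_current - outage_start)

print("full outage:          ", full_outage)           # 2h 20m
print("stale after restore:  ", stale_after_restore)   # 6h 45m
print("staleness at restore: ", staleness_at_restore)  # 10h 00m
print("total incident:       ", total_incident)        # 9h 05m
```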

Root Cause

The disruption originated during a routine infrastructure upgrade to the messaging system that coordinates code search indexing. An automated change was applied too aggressively, causing a coordination failure between internal services. This halted search indexing, and results gradually became stale. While the engineering team worked to recover the messaging infrastructure, an unintended service deployment cleared internal routing state, escalating the staleness into a complete outage.

Recovery and Resolution

Engineers restored the messaging infrastructure through a controlled restart, reestablishing coordination between services. They then reset the search index to a point in time before the disruption. No repository data was lost: the search index is a secondary, derived index, and Git repositories were completely unaffected. Once re-indexing completed, all results reflected the current state of repositories.
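To illustrate why a stale or reset index does not imply data loss, here is a toy sketch of a derived index. It is not GitHub's implementation: the `build_index` function, the in-memory `repos` dictionary, and the repository name are all hypothetical stand-ins. The point is only that the index can always be rebuilt from the primary data.

```python
from collections import defaultdict

def build_index(repos: dict[str, dict[str, str]]) -> dict[str, set[tuple[str, str]]]:
    """Build a toy inverted index mapping each token to (repo, path) pairs."""
    index = defaultdict(set)
    for repo, files in repos.items():
        for path, text in files.items():
            for token in text.split():
                index[token].add((repo, path))
    return dict(index)

# Hypothetical source of truth, standing in for Git repositories.
repos = {"octo/demo": {"main.py": "print hello", "util.py": "def hello"}}

# Point-in-time index taken before indexing stops.
index = build_index(repos)

# A change lands in the repository while indexing is halted.
repos["octo/demo"]["new.py"] = "hello again"

# Searches against the old index miss the new file: stale, but nothing is lost.
stale_hits = index["hello"]

# Re-indexing from the primary data brings results back up to date.
index = build_index(repos)
fresh_hits = index["hello"]

print(len(stale_hits), len(fresh_hits))  # 2 3
```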

Preventative Measures

GitHub is implementing several improvements:

  • Gradual upgrades with better health checks to catch problems before they cascade.
  • Deployment safeguards to prevent unintended changes during active incidents.
  • Faster recovery tooling to reduce time to restore service.
  • Better traffic isolation to prevent cascading impact from unexpected traffic spikes during outages.
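The first two measures can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GitHub's tooling: `staged_rollout`, `RolloutHalted`, the `broker-N` node names, and the batch sizes are all hypothetical. The change is applied in growing batches, each batch is health-checked before the next begins, and nothing is deployed while an incident is flagged as active.

```python
from typing import Callable, Sequence

class RolloutHalted(Exception):
    """Raised when a rollout stops before reaching every node."""

def staged_rollout(
    nodes: Sequence[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    incident_active: Callable[[], bool],
    batch_sizes: Sequence[int] = (1, 2, 4),
) -> list[str]:
    """Apply a change in growing batches; return the nodes that were updated."""
    updated: list[str] = []
    remaining = list(nodes)
    step = 0
    while remaining:
        # Deployment safeguard: refuse to change anything during an incident.
        if incident_active():
            raise RolloutHalted("deployment frozen: incident in progress")
        # Later batches reuse the last (largest) configured size.
        size = batch_sizes[min(step, len(batch_sizes) - 1)]
        batch, remaining = remaining[:size], remaining[size:]
        for node in batch:
            apply_change(node)
            updated.append(node)
        # Health check gates the next batch, limiting how far a bad change spreads.
        unhealthy = [n for n in batch if not is_healthy(n)]
        if unhealthy:
            raise RolloutHalted(
                f"health check failed on {unhealthy}; {len(remaining)} nodes untouched"
            )
        step += 1
    return updated

nodes = [f"broker-{i}" for i in range(7)]

# Case 1: broker-2 fails its health check, so the rollout stops in batch two.
applied: list[str] = []
try:
    staged_rollout(nodes, applied.append, lambda n: n != "broker-2", lambda: False)
except RolloutHalted as exc:
    print("halted:", exc)

# Case 2: an incident is active, so no node is touched at all.
frozen_applied: list[str] = []
try:
    staged_rollout(nodes, frozen_applied.append, lambda n: True, lambda: True)
except RolloutHalted as exc:
    print("halted:", exc)
```

In the first case only three of the seven nodes receive the change before the rollout stops, which is the blast-radius limit that gradual upgrades are meant to provide.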

Audit Log Service Disruption (April 1)

Also on April 1, while the code search outage was still in progress, the audit log service experienced a 28-minute loss of connectivity to its backing data store due to a failed credential rotation. The incident ran from 15:34 UTC to 16:02 UTC.

Impact

During this window, audit log history was unavailable via both the API and web UI. This resulted in 5xx errors for 4,297 API actors and 127 github.com users. Additionally, events created during the disruption were delayed by up to 29 minutes in the interface and event streaming. However, no audit log events were lost; all were ultimately written and streamed successfully. Customers using GitHub Enterprise Cloud with data residency were not impacted.

Recovery

GitHub was alerted to the infrastructure failure at 15:40 UTC—six minutes after the credential rotation failed. The team quickly restored connectivity and cleared the backlog of delayed events. By 16:02 UTC, the service was fully operational.
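The same timestamps give the usual detection and restoration figures for this incident. The sketch below is plain arithmetic on the times quoted above; the metric names are illustrative rather than GitHub's own.

```python
from datetime import datetime

def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two HH:MM clock times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

# Times in UTC on April 1, 2026, as quoted in the article.
rotation_failed = "15:34"
alert_fired = "15:40"
service_restored = "16:02"

time_to_detect = minutes_between(rotation_failed, alert_fired)
time_to_restore = minutes_between(alert_fired, service_restored)
total_disruption = minutes_between(rotation_failed, service_restored)

print(time_to_detect, time_to_restore, total_disruption)  # 6 22 28
```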

Ongoing Transparency Efforts

Beyond the April 1 incidents, GitHub also faced disruptions on April 23 and 27, which were detailed in a dedicated blog post. To improve transparency, the company has added more granular information to its status page, including real‑time metrics and post‑incident reports. These updates aim to give users clearer insight into service health and incident resolution timelines.

Conclusion

GitHub acknowledges that any downtime affects developer workflows and expresses thanks for users’ patience. The company continues to invest in both short‑term fixes and long‑term architectural improvements to enhance reliability. While April 2026 was a challenging month, the steps outlined here should reduce the frequency and severity of future incidents.