Resolved -
Summary of Event: Pega experienced a service-impacting event caused by issues with its
service provider, AWS. This event, while continuous in nature, had two distinct areas of
impact on Pega Cloud, which we detail independently. The first service-impacting event
lasted from 12:11 AM PDT to 3:35 AM PDT, and the second lasted from 6:42 AM PDT to
12:15 PM PDT. Pega worked diligently to mitigate as much direct client impact to core
services as possible. However, during such a significant provider issue, some impact
could not be mitigated. The issues that our clients may have experienced, the features or
capabilities impacted, and the Pega Cloud Deployment Regions affected are described
below:
First Service Impacting Event:
• Potential production runtime issues – those that could impact the ability to do work in
production
o Adding attachments from Infinity, such as within a case, for clients deployed in
the us-east-1 Deployment Region.
o SFTP - pushing documents and cloud file storage synchronization for clients
deployed in the us-east-1 Deployment Region.
o Background processing degradation for clients deployed in the us-east-1
Deployment Region.
o Search degradation for clients deployed in the us-east-1 Deployment Region.
o Pega GenAI for clients using AWS Bedrock models and deployed in the
us-east-1 Deployment Region.
• Service delivery: The following were disabled due to the service-impacting event; they
did not cause degradation or downtime.
o New environment provisioning & upgrade tasks for existing systems that are
orchestrated by Pega were failing and were subsequently put on hold.
o Self-service system restarts initiated from My Pega Cloud.
o Pega hibernation wake-up events. This would have resulted in some
environments, such as those used for development or other testing, being
unavailable for use.
o PDC event monitoring across all US instances, excluding the EU instance, which
remained operational.
Second Service Impacting Event:
• Service delivery: The following were disabled due to the service-impacting event; they
did not cause degradation or downtime.
o Pega Cloud clients were not able to initiate self-service environment
restarts in My Pega Cloud.
o Some My Pega Cloud self-service capabilities, such as allow-list management,
would have been unavailable for clients deployed in us-east-1.
• Potential production runtime issues – those that could impact the ability to do work in
production
o Clients could have experienced performance degradation if they exceeded the
system's point-in-time capacity, due to Pega's inability to auto-scale resources
dynamically. This might have surfaced as slowness for users or for functionality like
background processing. This impacted clients in the us-east-1 Deployment Region.
o Clients would have experienced search degradation in the us-east-1 Deployment
Region.
o In exceedingly rare scenarios (1 identified occurrence), because of Pega's inability to
automatically replace failed underlying systems with healthy ones, service
availability could have been impacted.
Pega takes all service-impacting events seriously, irrespective of size or scope. We strive to
ensure our communication is transparent and provides our clients with the details they
need, both during and after an event, of which this summary is a critical part. Overall, the
Pega Cloud service was highly effective in mitigating the impact our clients faced through a
combination of architecture, powerful orchestration, and the expertise of our cloud
operations personnel. A formal RCA will be produced and distributed as soon as we have
received and processed the RCA from AWS. The RCA will provide additional information on
the event and the steps we can take to further mitigate any impact from future
service-impacting events.
Oct 20, 23:38 UTC
Monitoring -
PEGA continues to see progress in the recovery efforts and will provide an update once AWS has confirmed the event has been fully mitigated.
Oct 20, 21:46 UTC
Update -
PEGA continues to see progress in the recovery efforts and will provide an update once AWS has confirmed the event has been mitigated.
Oct 20, 21:02 UTC
Identified -
PEGA is seeing an improvement in overall network performance and continues to monitor the restoration progress of AWS. Once all services have recovered and PEGA has received confirmation from AWS that the restoration process is successful, the capability to restart environments will be re-enabled.
Oct 20, 19:25 UTC
Investigating -
Due to the ongoing AWS outage, PEGA has temporarily disabled the restart feature (self-service and PEGA-assisted) as a precautionary measure to maintain service stability. PEGA will provide updates based on AWS updates. If you require assistance or need to initiate a restart, please contact our support team via the phone lines.
PEGA continues to work closely with AWS following the ongoing disruption affecting cloud services globally and is actively monitoring the situation. If you experience any service disruptions or anomalies, please create a support ticket so we can investigate further. PEGA appreciates your patience and understanding as we work through this issue.
Oct 20, 16:40 UTC
Update -
Due to the ongoing AWS outage, PEGA has temporarily disabled the restart feature (self-service and PEGA-assisted) as a precautionary measure to maintain service stability. PEGA will provide updates based on AWS updates. If you require assistance or need to initiate a restart, please contact our support team via the phone lines.
PEGA continues to work closely with AWS following the ongoing disruption affecting cloud services globally and is actively monitoring the situation. If you experience any service disruptions or anomalies, please create a support ticket so we can investigate further. PEGA appreciates your patience and understanding as we work through this issue.
Oct 20, 16:38 UTC
Update -
PEGA continues to work closely with AWS following the earlier disruption affecting cloud services globally. Many impacted services have recovered; however, PEGA is also aware that Amazon is still reporting ongoing network issues. If you encounter any service disruptions or anomalies, please don’t hesitate to create a support ticket so we can investigate further. Thank you for your patience and understanding as we monitor the situation.
Oct 20, 15:12 UTC
Update -
The majority of services have recovered. Full restoration is still ongoing, and minor degradation to services may still occur.
Oct 20, 13:49 UTC
Update -
The majority of services have recovered, though minor degradation may still occur as full restoration progresses.
Oct 20, 12:33 UTC
Monitoring -
A disruption at AWS is impacting some cloud services globally. Our teams are actively monitoring the situation and working with AWS to ensure service restoration.
This issue is causing systems on PDC to appear incorrectly as "Offline" and is impacting the availability of monitoring data.
In the interim, logs can be downloaded from MyPegaCloud.
Oct 20, 11:08 UTC
Update -
We are continuing to investigate this issue.
Oct 20, 09:58 UTC
Investigating -
A disruption at AWS is impacting some cloud services globally. Our teams are actively monitoring the situation and working with AWS to ensure service restoration.
Oct 20, 09:50 UTC