Resolved -
Final Status: Incident Resolved
All systems are operating normally, and the earlier performance degradation has been fully resolved.
Summary:
The elevated API response times were traced to parameter group changes introduced during recent RDS maintenance on the DairyQueen stack. Working with AWS, we implemented dynamic parameter updates under CR DEVOPS-15717, which immediately improved database efficiency.
• CPU utilization and IOPS returned to normal baseline levels
• API latency and error rates have stabilized across all services
Resolution:
• Database performance has normalized
• All Punchh APIs and services are fully operational
• Continuous monitoring confirms sustained system health
The incident is now closed. Thank you for your patience and collaboration throughout the investigation and remediation process. We will provide an RCA within the SLA.
Oct 22, 13:05 PDT
Monitoring -
A fix has been implemented and we are monitoring the results.
Oct 22, 12:53 PDT
Identified -
The issue has been identified and a fix is being implemented.
Oct 22, 12:53 PDT
Monitoring -
We have successfully mitigated the elevated API response time issue impacting the Punchh platform.
Summary of Actions:
In collaboration with AWS, we identified that recent DB maintenance introduced significant parameter group changes that were contributing to high CPU and IOPS utilization on the DairyQueen RDS instance.
Current Observations:
- CPU utilization has dropped from ~100% to ~30%
- IOPS has decreased significantly
- API performance and response times have improved across impacted services
Next Steps:
- Continue to monitor RDS performance and API latency
- Apply static parameter updates during the upcoming maintenance window
We’ll continue to observe the system over the next few hours and provide a final update once stability is confirmed.
Oct 22, 12:43 PDT
Identified -
Status Update: Slower API Performance
We’re currently seeing slower response times across the Punchh Platform APIs, which may affect how some features work for customers.
Impact:
- Some API calls may be delayed or fail intermittently.
- Mobile apps might show errors or take longer to load pages.
- Loyalty functions like check-ins, redemptions, or signups may be slower.
- POS systems might experience short delays in loyalty transactions.
Cause:
The issue is due to a database optimization process running on the Dairy Queen environment, which is using more system resources than usual and slowing down API responses.
What We’re Doing:
- Actively monitoring database performance and API response times.
- Working with our infrastructure team to speed up completion of the optimization.
- Exploring mitigation steps to restore normal performance faster.
We’ll share another update once the optimization completes or if we take further actions to improve performance.
Thank you for your patience while we work to resolve this quickly.
Oct 22, 10:59 PDT
Investigating -
We are currently experiencing elevated response times in our APIs, resulting in degraded performance that may impact your experience. Our team is actively working to identify and resolve the issue as quickly as possible.
Customer Impact:
• Possible failed API requests across Punchh APIs
• Timeouts or errors displayed in the mobile app when navigating to different pages
• Timeouts or errors displayed when managing loyalty profile or rewards online
• Timeouts or errors for loyalty functionality at the POS
• Timeouts or errors from Punchh Platform Functions API
• Failed redemptions
• Failed check-ins
• Failed guest lookups
• Failed signups
We apologize for any inconvenience this may cause and appreciate your patience. We will provide regular updates as we make progress. Thank you for your understanding and continued support.
Oct 22, 10:05 PDT