Punchh Platform Elevated Response Time in APIs

Incident Report for Punchh DQ

Resolved

Final Status: Incident Resolved

All systems are operating normally, and the earlier performance degradation has been fully resolved.

Summary:
The elevated API response times were traced to parameter group changes introduced during recent RDS maintenance on the DairyQueen stack. Working with AWS, we implemented dynamic parameter updates under CR DEVOPS-15717, which immediately improved database efficiency.
• CPU utilization and IOPS returned to normal baseline levels
• API latency and error rates have stabilized across all services

Resolution:
• Database performance has normalized
• All Punchh APIs and services are fully operational
• Continuous monitoring confirms sustained system health

The incident is now closed. Thank you for your patience and collaboration throughout the investigation and remediation process. We will provide an RCA within SLA.
Posted Oct 22, 2025 - 13:05 PDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Oct 22, 2025 - 12:53 PDT

Identified

The issue has been identified and a fix is being implemented.
Posted Oct 22, 2025 - 12:53 PDT

Monitoring

We have successfully mitigated the elevated API response time issue impacting the Punchh platform.

Summary of Actions:
In collaboration with AWS, we identified that recent DB maintenance introduced significant parameter group changes that were contributing to high CPU and IOPS utilization on the DairyQueen RDS instance.

Current Observations:
- CPU utilization has dropped from ~100% to ~30%
- IOPS has decreased significantly
- API performance and response times have improved across impacted services

Next Steps:
- Continue to monitor RDS performance and API latency
- Apply static parameter updates during the upcoming maintenance window
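
For readers unfamiliar with the difference between the dynamic updates already applied and the static updates still pending, here is a minimal sketch. It is not the change made during this incident. It assumes the boto3 RDS `modify_db_parameter_group` call, where dynamic parameters can use `ApplyMethod` "immediate" and static parameters need "pending-reboot". The parameter names, values, and the group name in the comment are hypothetical placeholders.

```python
from typing import Dict, List, Set


def build_parameter_updates(changes: Dict[str, str],
                            dynamic_names: Set[str]) -> List[dict]:
    """Build the Parameters list for an RDS parameter group update.

    Dynamic parameters are marked "immediate" (no reboot needed);
    everything else is treated as static and marked "pending-reboot",
    so it takes effect at the next reboot or maintenance window.
    """
    updates = []
    for name, value in sorted(changes.items()):
        apply_method = "immediate" if name in dynamic_names else "pending-reboot"
        updates.append({
            "ParameterName": name,
            "ParameterValue": value,
            "ApplyMethod": apply_method,
        })
    return updates


# Hypothetical parameter names and values, for illustration only.
changes = {"example_dynamic_param": "2", "example_static_param": "1"}
updates = build_parameter_updates(changes, {"example_dynamic_param"})

# With AWS credentials configured, the list could be passed to boto3
# (group name is a placeholder):
#   import boto3
#   boto3.client("rds").modify_db_parameter_group(
#       DBParameterGroupName="example-parameter-group",
#       Parameters=updates,
#   )
```

Splitting the changes this way lets the dynamic subset take effect right away, while the static subset waits for a planned restart.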

We’ll continue to observe the system over the next few hours and provide a final update once stability is confirmed.
Posted Oct 22, 2025 - 12:43 PDT

Identified

Status Update: Slower API Performance

We’re currently seeing slower response times across the Punchh Platform APIs, which may affect how some features work for customers.

Impact:
- Some API calls may be delayed or fail intermittently.
- Mobile apps might show errors or take longer to load pages.
- Loyalty functions like check-ins, redemptions, or signups may be slower.
- POS systems might experience short delays in loyalty transactions.

Cause:
The issue is due to a database optimization process running on the Dairy Queen environment, which is using more system resources than usual and slowing down API responses.

What We’re Doing:
- Actively monitoring database performance and API response times.
- Working with our infrastructure team to speed up completion of the optimization.
- Exploring mitigation steps to restore normal performance faster.

We’ll share another update once the optimization completes or if we take further actions to improve performance.

Thank you for your patience while we work to resolve this quickly.
Posted Oct 22, 2025 - 10:59 PDT

Investigating

We are currently experiencing elevated response times in our APIs, resulting in degraded performance that may impact your experience. Our team is actively working to identify and resolve the issue as quickly as possible.

Customer Impact:

• Possible failed API requests across Punchh APIs
• Timeouts or errors displayed in the mobile app when navigating to different pages
• Timeouts or errors displayed when managing loyalty profile or rewards online
• Timeouts or errors for loyalty functionality at the POS
• Timeouts or errors from Punchh Platform Functions API
• Failed redemptions
• Failed check-ins
• Failed guest lookups
• Failed signups

We apologize for any inconvenience this may cause and appreciate your patience. We will provide regular updates as we make progress. Thank you for your understanding and continued support.
Posted Oct 22, 2025 - 10:05 PDT
This incident affected: Web Components (Punchh API, POS API).