System and Solution Design for AI Streaming Responses with Polling

System Overview

This system involves a backend streaming API that uses Azure OpenAI to generate responses for a chat application. These responses are written to a distributed, horizontally scalable database, Azure Hyperscale (Citus), by Hangfire background jobs. The frontend polls the backend at fixed intervals to fetch the latest responses and display them in the user interface.
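At its core, the flow described above reduces to an append-and-poll cycle: a background job appends streamed chunks to a store keyed by sequence number, and the frontend polls for anything it has not yet seen. The class and field names below are illustrative, with an in-memory store standing in for the Citus table written by Hangfire jobs:

```typescript
// Minimal in-memory sketch of the stream → store → poll flow.
// All names are illustrative; the real system persists to Citus.

interface MessageChunk {
  seq: number; // monotonically increasing sequence number per conversation
  text: string;
}

class ResponseStore {
  private chunks: MessageChunk[] = [];

  // Backend side: a background job appends each streamed chunk.
  append(text: string): void {
    this.chunks.push({ seq: this.chunks.length, text });
  }

  // Frontend side: poll for anything newer than the last seen sequence number.
  fetchSince(lastSeq: number): MessageChunk[] {
    return this.chunks.filter((c) => c.seq > lastSeq);
  }
}

const store = new ResponseStore();
store.append("Hello");
store.append(", world");

// First poll sees everything; a repeat poll with the same cursor sees only the delta.
const firstPoll = store.fetchSince(-1);
const secondPoll = store.fetchSince(firstPoll[firstPoll.length - 1].seq);
```

Because each chunk carries a sequence number, polling is idempotent: a client that repeats a poll with the same cursor receives no duplicates.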

Components and Design

Backend Streaming API

The backend API streams responses from Azure OpenAI. These responses are generated in real time as users interact with the chat interface.

Design Considerations:

  • Asynchronous Processing: The backend leverages Hangfire for handling background jobs. This allows for efficient processing and writing of responses to the database, without blocking other operations.
  • Scalability: The backend is designed to be horizontally scalable, allowing for more instances to be spun up as demand increases. This is crucial for maintaining performance under high loads.
  • Error Handling and Retries: Robust error handling with retry logic (for example, exponential backoff) protects the pipeline against transient failures or interruptions in the Azure OpenAI service.
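The retry point above can be sketched as a generic backoff wrapper around the upstream call. This is a minimal illustration, not Hangfire's built-in retry mechanism (which the real system may rely on instead); the function name and backoff parameters are assumptions:

```typescript
// Retry wrapper with exponential backoff for transient upstream failures.
// maxAttempts and baseDelayMs are illustrative defaults, not tuned values.

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Exponential backoff: 200ms, 400ms, 800ms, ...
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Wrapping the Azure OpenAI call in such a helper keeps transient network errors from failing the whole background job.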


Frontend Polling Mechanism

The frontend polls the backend API at regular intervals to fetch the latest responses and update the chat interface. The frontend design should consider the following:

  • User Experience: Though the polling mechanism might introduce a slight delay in response display, optimizing the polling frequency can help achieve a balance between user experience and server load.
  • Smart Polling: Implementing “smart” or adaptive polling can enhance efficiency. The polling frequency can be adjusted based on factors like user activity and conversation length.
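One way to sketch the adaptive polling idea: reset to the fastest interval whenever a poll returns new data, and back off exponentially while the conversation is idle. The bounds and multiplier below are illustrative assumptions, not tuned values:

```typescript
// Adaptive polling: poll fast while new data is arriving, back off when idle.
// minMs/maxMs bound the interval; the multiplier of 2 is an arbitrary choice.

function nextPollInterval(
  currentMs: number,
  gotNewData: boolean,
  minMs = 500,
  maxMs = 8000,
): number {
  if (gotNewData) return minMs; // activity: reset to the fastest rate
  return Math.min(currentMs * 2, maxMs); // idle: double the interval, capped
}
```

A client loop would call this after every poll, so an active conversation is refreshed twice a second while an idle tab settles at one request every eight seconds.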


Database Layer

The system employs Azure Hyperscale (Citus) as its database, which shards and distributes data automatically across multiple nodes. The database design should address the following:

  • Concurrency: Azure Hyperscale (Citus) can handle high concurrency, making it suitable for managing frequent write operations from multiple backend instances.
  • Reduced Latency: By distributing data across several nodes, the database can significantly reduce query latency, making it suitable for a polling-based system.
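A common way to keep the polling queries cheap is a cursor: the client sends the highest sequence number it has seen, and the server selects only newer rows. The table and column names below are hypothetical; in Citus, such a table would typically be distributed by conversation_id so each poll hits a single shard:

```typescript
// Build a cursor-based delta query for the poll endpoint.
// chat_responses, conversation_id, seq, and content are hypothetical names.

function buildDeltaQuery(conversationId: string, afterSeq: number) {
  return {
    text:
      "SELECT seq, content FROM chat_responses " +
      "WHERE conversation_id = $1 AND seq > $2 ORDER BY seq",
    values: [conversationId, afterSeq],
  };
}
```

Filtering on the distribution column plus a sequence cursor means each poll reads only the new rows of one conversation, rather than rescanning the whole message history.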

Potential Challenges and Solutions

  • Polling Overhead: Frequent polling could increase server load and costs. However, with smart polling and load balancing across a horizontally scalable backend, this challenge can be mitigated.
  • Real-Time Experience: While polling can result in slight delays in updating responses, it’s a trade-off for the scalability benefits the system offers. Moreover, optimizing polling frequency can minimize this delay.
  • Database Performance: While Azure Hyperscale (Citus) is designed for high performance, the application design still needs to ensure efficient database querying to prevent potential bottlenecks.


Conclusion

The Particlesy system design combines Azure OpenAI, Hangfire, and Azure Hyperscale (Citus) to build a scalable chat application with a polling mechanism. Despite the challenges associated with polling, the system's horizontal scalability and efficient database management provide a robust foundation for a near-real-time chat experience.

Stay Ahead with Particlesy

Sign up now to receive the latest updates, exclusive insights, and tips on how to maximize your business potential with AI.