Skip to content

Fix Dapr client startup deadlock in distributed deployment#713

Open
Copilot wants to merge 3 commits intomasterfrom
copilot/fix-dapr-client-start-issue
Open

Fix Dapr client startup deadlock in distributed deployment#713
Copilot wants to merge 3 commits intomasterfrom
copilot/fix-dapr-client-start-issue

Conversation

Copy link
Contributor

Copilot AI commented Mar 20, 2026

  • Analyzed the circular dependency: services wait for DB (via Dapr secrets) before starting HTTP server, but Dapr sidecar needs HTTP server to be running first
  • Added WaitForDatabaseInitializationAsync extension on IServiceProvider in Extensions.cs (with cancellation token support via WaitAsync)
  • Modified ConnectServerHostedServiceWrapper to implement IHostedLifecycleService, accept IServiceProvider instead of ConnectServer, defer server start to StartedAsync
  • Modified GameServerHostedServiceWrapper to implement IHostedLifecycleService, accept IServiceProvider, moved plugin config loading + adapter initialization + server start to StartedAsync
  • Modified ManagableServerStatePublisher to implement IHostedLifecycleService, accept IServiceProvider instead of IManageableServer, with separate InitializeServerAsync to avoid duplicate event subscriptions
  • Created ChatServerHostedServiceWrapper as new class in ChatServer.Host, replacing the factory-based hosted service registration
  • Updated ConnectServer.Host/Program.cs: replaced blocking WaitForUpdatedDatabaseAsync with non-blocking WaitForDatabaseConnectionInitializationAsync
  • Updated GameServer.Host/Program.cs: same, removed additional initialization calls (moved to GameServerHostedServiceWrapper.StartedAsync)
  • Updated ChatServer.Host/Program.cs: replaced factory AddHostedService with AddHostedService<ChatServerHostedServiceWrapper>, removed blocking DB wait
  • Updated FriendServer.Host/Program.cs: replaced blocking WaitForUpdatedDatabaseAsync with non-blocking WaitForDatabaseConnectionInitializationAsync
  • Updated GuildServer.Host/Program.cs: same as FriendServer
  • Fixed Gemini review: unsubscribe before subscribing PropertyChanged in InitializeServerAsync to prevent duplicate event subscriptions on retry
  • Fixed Gemini review: replaced silent-fail is List<PlugInConfiguration> checks in ChatServerHostedServiceWrapper and GameServerHostedServiceWrapper with fail-fast pattern (throw InvalidOperationException if wrong type)
Original prompt

This section details on the original issue you should resolve

<issue_title>Dapr client fail to start at first chance</issue_title>
<issue_description>Describe the bug
When starting up all services of the distributed deployment, the dapr clients fail to load at first chance. When the corresponding service host was fast enough to retrieve secrets, it might start up on a second (manual) attempt.

To Reproduce
Steps to reproduce the behavior:

  1. Set up the distributed docker compose, or soon the new Aspire project.
  2. Start the project

Expected behavior
All components should be running without errors or warnings (all displayed green) after a short initialization time.

Additional context
The reason for the fail is simple:
The dapr clients expect the app http api to work when they start.
However, the service hosts first wait for a database connection and therefore try to retrieve connection strings as secrets from the dapr client before they do fully start up. So we have a little circular reference, which doesn't work out well.

One solution could be, that the API of the service host is immediately available after start. The functionality (e.g. the connect server, the login server etc.) is made available later.
</issue_description>

Comments on the Issue (you are @copilot in this section)


⌨️ Start Copilot coding agent tasks without leaving your editor — available in VS Code, Visual Studio, JetBrains IDEs and Eclipse.

… to StartedAsync

Co-authored-by: sven-n <5238610+sven-n@users.noreply.github.com>
Agent-Logs-Url: https://github.com/MUnique/OpenMU/sessions/b9f6bd43-e155-4f43-ad0d-a5b304a95c1b
Copilot AI changed the title [WIP] Fix Dapr client fail to start at first chance Fix Dapr client startup deadlock in distributed deployment Mar 20, 2026
Copilot AI requested a review from sven-n March 20, 2026 23:01
@sven-n sven-n marked this pull request as ready for review March 21, 2026 15:13
@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical startup deadlock issue in distributed deployments of the MUnique OpenMU project. The changes ensure that Dapr clients initialize correctly by decoupling the availability of the HTTP API from the readiness of game servers. This is achieved through the use of IHostedLifecycleService and non-blocking database initialization methods, improving the overall reliability and startup time of the application.

Highlights

  • Startup Deadlock Fix: Addresses a startup deadlock issue in distributed deployments where Dapr clients failed to initialize due to a circular dependency between service hosts and the Dapr sidecar.
  • Lifecycle Service Implementation: Leverages IHostedLifecycleService.StartedAsync in .NET 8+ to decouple HTTP API availability from game server readiness, ensuring the Dapr sidecar initializes correctly after the HTTP server is listening.
  • Refactored Hosted Services: Converted ConnectServerHostedServiceWrapper, GameServerHostedServiceWrapper, and ManagableServerStatePublisher to IHostedLifecycleService to defer initialization until after the HTTP server starts.
  • New Chat Server Hosted Service: Introduced ChatServerHostedServiceWrapper to replace the anonymous factory lambda, ensuring the ChatServerDefinition is resolved and initialized during StartedAsync.
  • Non-Blocking Database Initialization: Replaced the blocking WaitForUpdatedDatabaseAsync() call with the non-blocking WaitForDatabaseConnectionInitializationAsync() in Program.cs for various servers to prevent startup delays.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively resolves a startup deadlock in distributed deployments by refactoring hosted services to use IHostedLifecycleService. This defers database-dependent initializations until after the web server has started, breaking the circular dependency with the Dapr sidecar. The changes are consistent across multiple services and follow best practices for handling service startup dependencies in ASP.NET Core. I've identified a couple of areas for improvement regarding the robustness of plugin loading and a potential memory leak in an event subscription, for which I've provided suggestions.

Comment on lines +125 to +128
var server = this._serviceProvider.GetRequiredService<IManageableServer>();
this._data = new ServerStateData(server);
server.PropertyChanged += this.OnPropertyChanged;
this._server = server;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a potential issue here with multiple subscriptions to the PropertyChanged event. If an exception occurs after GetRequiredService but before this._server is set (e.g., in the ServerStateData constructor), the catch block will cause a retry. In the next iteration, GetRequiredService will return the same singleton IManageableServer instance, and another event handler will be subscribed. This leads to a memory leak and the handler being called multiple times for each event. It's safer to unsubscribe before subscribing to prevent this.

                var server = this._serviceProvider.GetRequiredService<IManageableServer>();
                server.PropertyChanged -= this.OnPropertyChanged; // Ensure single subscription
                server.PropertyChanged += this.OnPropertyChanged;
                this._data = new ServerStateData(server);
                this._server = server;

Comment on lines +49 to +52
if (this._serviceProvider.GetService<ICollection<PlugInConfiguration>>() is List<PlugInConfiguration> plugInConfigurations)
{
await this._serviceProvider.TryLoadPlugInConfigurationsAsync(plugInConfigurations).ConfigureAwait(false);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This conditional check on the type of the ICollection<PlugInConfiguration> could lead to silent failures. If the dependency injection is ever configured to provide a different type of collection, or if the service is not registered, this block will be skipped, and plugins won't be loaded, which could be hard to debug. A more robust approach would be to enforce the dependency on List<PlugInConfiguration> and fail fast if the assumption is not met.

        var plugInCollection = this._serviceProvider.GetRequiredService<ICollection<PlugInConfiguration>>();
        if (plugInCollection is not List<PlugInConfiguration> plugInConfigurations)
        {
            throw new InvalidOperationException($"The registered {nameof(ICollection<PlugInConfiguration>)} must be a {nameof(List<PlugInConfiguration>)} to be able to load plugin configurations.");
        }

        await this._serviceProvider.TryLoadPlugInConfigurationsAsync(plugInConfigurations).ConfigureAwait(false);

Comment on lines 49 to 52
if (this._serviceProvider.GetService<ICollection<PlugInConfiguration>>() is List<PlugInConfiguration> plugInConfigurations)
{
await this._initializer.InitializeAsync().ConfigureAwait(false);
this._isInitialized = true;
await this._serviceProvider.TryLoadPlugInConfigurationsAsync(plugInConfigurations).ConfigureAwait(false);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This conditional check on the type of the ICollection<PlugInConfiguration> could lead to silent failures. If the dependency injection is ever configured to provide a different type of collection, or if the service is not registered, this block will be skipped, and plugins won't be loaded, which could be hard to debug. A more robust approach would be to enforce the dependency on List<PlugInConfiguration> and fail fast if the assumption is not met.

        var plugInCollection = this._serviceProvider.GetRequiredService<ICollection<PlugInConfiguration>>();
        if (plugInCollection is not List<PlugInConfiguration> plugInConfigurations)
        {
            throw new InvalidOperationException($"The registered {nameof(ICollection<PlugInConfiguration>)} must be a {nameof(List<PlugInConfiguration>)} to be able to load plugin configurations.");
        }

        await this._serviceProvider.TryLoadPlugInConfigurationsAsync(plugInConfigurations).ConfigureAwait(false);

@sven-n
Copy link
Member

sven-n commented Mar 21, 2026

@copilot fix the issues described by gemini.

…plugin config type

Co-authored-by: sven-n <5238610+sven-n@users.noreply.github.com>
Agent-Logs-Url: https://github.com/MUnique/OpenMU/sessions/9f61bfbb-6724-44c3-a7bb-c52f3b6ddf2e
Copy link
Contributor Author

Copilot AI commented Mar 21, 2026

@copilot fix the issues described by gemini.

Fixed in commit 1778433:

  1. ManagableServerStatePublisher — Added server.PropertyChanged -= this.OnPropertyChanged before the subscribe in InitializeServerAsync to prevent duplicate event handlers if ServerStateData construction throws and the loop retries.

  2. ChatServerHostedServiceWrapper and GameServerHostedServiceWrapper — Replaced the silent-skip is List<PlugInConfiguration> pattern with a fail-fast check: if a ICollection<PlugInConfiguration> is registered but isn't a List<PlugInConfiguration>, an InvalidOperationException is thrown to surface the misconfiguration immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dapr client fail to start at first chance

2 participants