feat(translation-worker): add translation worker with argostranslate#8

Open
winsomeglint wants to merge 5 commits into main from temporal-translation
winsomeglint commented Mar 5, 2026

Adds a translation worker under translation-worker powered by ctranslate2 and argostranslate.

winsomeglint force-pushed the temporal-translation branch 2 times, most recently from 00907da to 3468b67 (March 5, 2026 15:21)
winsomeglint changed the title from "initial commit" to "translation worker" (Mar 10, 2026)
winsomeglint force-pushed the temporal-translation branch 4 times, most recently from c12bc74 to efaa435 (March 12, 2026 11:19)
winsomeglint force-pushed the temporal-translation branch from efaa435 to 271c4c1 (March 12, 2026 11:19)
winsomeglint changed the title from "translation worker" to "(chore:translation-worker): add translation worker with argostranslate" (Mar 12, 2026)
winsomeglint requested a review from ClemDoum (March 12, 2026 11:23)


# Temporal utils
async def async_batches(

I feel like these could go in here: https://github.com/ICIJ/icij-python/tree/main/icij-common/icij_common
They are not datashare / temporal specific; they are asyncio / itertools-style utils that would be useful in all ICIJ Python projects.
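
To illustrate why such a helper is generic, here is a minimal sketch of an async batching utility with no datashare or Temporal dependency. The signature and body are assumptions for illustration only: the diff excerpt above shows just the name `async_batches`, so the real implementation in the PR may differ.

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator
from typing import TypeVar

T = TypeVar("T")


async def async_batches(
    items: AsyncIterable[T], batch_size: int
) -> AsyncIterator[list[T]]:
    """Group items from an async iterable into lists of at most batch_size.

    Hypothetical signature: the PR's actual parameters are not visible here.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: list[T] = []
    async for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


async def _numbers(n: int) -> AsyncIterator[int]:
    # Toy async source used only to exercise the helper.
    for i in range(n):
        yield i


async def collect_batches(n: int, batch_size: int) -> list[list[int]]:
    # Drain the batched stream into a list so the result is easy to inspect.
    return [b async for b in async_batches(_numbers(n), batch_size)]
```

Nothing in this sketch imports from the worker or from Temporal, which is the argument for moving it to a shared package.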


task: str = Field(default=TRANSLATION_TASK_NAME, frozen=True)
device: str = Field(default=CPU, frozen=True)
batch_size: int = 16

I think we should split between the TranslationConfig and TranslationWorkerConfig.
The TranslationConfig should hold all parameters which can be tweaked between calls and should be filled in by the caller/dev, while the TranslationWorkerConfig should be created and set by the person who deploys the worker, who can tune parameters according to the actual worker resources.

For some parameters it's not trivial to decide, but for others it's more straightforward.
I'd say:

  • device should be in TranslationWorkerConfig (the worker knows if it's a CPU or GPU worker)
  • batch_size / max_parallel_batches: it's tempting to put them in the TranslationConfig, but they are probably more appropriate in the TranslationWorkerConfig. This way we can deploy the same code and just change the deployment settings if workers are hitting OOM or are not running at full speed
  • beam_size / num_hypotheses should be in TranslationConfig since they impact the translation output
  • inter_threads / intra_threads / compute_type seem to be more worker options than runtime options
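
The split proposed above could look roughly like the sketch below. It uses stdlib frozen dataclasses only to stay dependency-free, whereas the PR's config uses pydantic `Field`s. All default values except `batch_size = 16` are placeholders, and the value of `CPU` is assumed, since the diff only shows the constant's name.

```python
from dataclasses import dataclass

CPU = "cpu"  # assumed value; the PR only shows the constant name


@dataclass(frozen=True)
class TranslationConfig:
    """Per-call options, filled in by the caller: they affect the output."""

    beam_size: int = 4  # placeholder default, not taken from the PR
    num_hypotheses: int = 1  # placeholder default


@dataclass(frozen=True)
class TranslationWorkerConfig:
    """Deployment options, set by whoever deploys the worker."""

    device: str = CPU  # the worker knows whether it runs on CPU or GPU
    batch_size: int = 16  # default shown in the PR diff
    max_parallel_batches: int = 1  # placeholder default
    inter_threads: int = 1  # placeholder default
    intra_threads: int = 0  # placeholder default
    compute_type: str = "default"  # placeholder default


# Same code, different deployments: only the worker config changes.
gpu_worker = TranslationWorkerConfig(device="cuda", batch_size=64)
small_cpu_worker = TranslationWorkerConfig(batch_size=4)
request_options = TranslationConfig(beam_size=2)
```

With this layout, lowering `batch_size` on a worker that hits OOM is a deployment change and does not touch what callers send.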

winsomeglint changed the title from "(chore:translation-worker): add translation worker with argostranslate" to "feat(translation-worker): add translation worker with argostranslate" (Mar 27, 2026)