feat(translation-worker): add translation worker with argostranslate#8
feat(translation-worker): add translation worker with argostranslate#8winsomeglint wants to merge 5 commits intomainfrom
Conversation
00907da to
3468b67
Compare
c12bc74 to
efaa435
Compare
efaa435 to
271c4c1
Compare
|
|
||
|
|
||
| # Temporal utils | ||
| async def async_batches( |
There was a problem hiding this comment.
I feel like these ones could go in here: https://github.com/ICIJ/icij-python/tree/main/icij-common/icij_common
Since there are not datashare / temporal specific but more asyncio / iteratools related utils useful in all icij python projects
|
|
||
| task: str = Field(default=TRANSLATION_TASK_NAME, frozen=True) | ||
| device: str = Field(default=CPU, frozen=True) | ||
| batch_size: int = 16 |
There was a problem hiding this comment.
I think we should split between the TranslationConfig and TranslationWorkerConfig.
The TranslationConfig should hold all parameters which can be tweaked between calls, it should be filled by the caller/dev while the TranslationWorkerConfig should be created and set bet the person who deploys the worker and can tune parameters according to the actual worker resources.
For some parameters its not trivial to decide but for others it's more straightforward.
I'd say:
deviceshould be inTranslationWorkerConfig(the worker knows it it's a CPU or GPU worker)batch_size/max_parallel_batchesit's tempting to put it in theTranslationConfig, but probably more appropriate in theTranslationWorkerConfig. This way we can deploy the same code, but just change the deployments if workers are hitting OOM or are not running at full speedbeam_size/num_hypothesesshould beTranslationConfigsince it impact the translation outputinter_threads/intra_threads/compute_typesince to be more worker options than runtime options
Adds a translation worker under
translation-workerpowered byctranslate2andargostranslate.