It has to distribute the workload across multiple GPUs using tensor parallelism, pipeline parallelism, data parallelism, expert parallelism, all kinds of parallelism, and process it as fast as possible.
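As a rough illustration of what combining these kinds of parallelism means, the sketch below lays a pool of GPUs out on a three-axis grid, one axis each for data, pipeline, and tensor parallelism. The function names and the 2 x 2 x 2 sizes are hypothetical, chosen only for this example; expert parallelism is left out to keep it short, and real frameworks choose their own rank ordering.

```python
from itertools import product


def build_rank_grid(dp, pp, tp):
    """Map each global GPU rank to (data, pipeline, tensor) coordinates.

    Assumption for this sketch: the tensor-parallel index varies fastest,
    so GPUs in the same tensor-parallel group have adjacent ranks.
    """
    coords = product(range(dp), range(pp), range(tp))
    return {rank: coord for rank, coord in enumerate(coords)}


def tensor_group(grid, rank):
    """Return the ranks that share this rank's data and pipeline
    coordinates, i.e. the GPUs that split one layer's tensors."""
    d, p, _ = grid[rank]
    return sorted(r for r, (d2, p2, _) in grid.items() if (d2, p2) == (d, p))


# 8 GPUs: 2 data-parallel replicas x 2 pipeline stages x 2 tensor shards.
grid = build_rank_grid(dp=2, pp=2, tp=2)
for rank, coord in grid.items():
    print(rank, coord)
```

The total GPU count is the product of the three degrees, so each added kind of parallelism multiplies the number of devices the workload is spread over.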