Use text-generation-inference (TGI) to set up the server.
Run the following command:
docker run --gpus all --shm-size 1g -p 3000:80 -v /data:/data ghcr.io/huggingface/text-generation-inference:1.3.3 \
--model-id mistralai/Mixtral-8x7B-Instruct-v0.1 \
--num-shard 4 \
--max-batch-total-tokens 1024000 \
--max-total-tokens 32000
To access the server running in Docker from your local machine, build an SSH tunnel with this command:
ssh -f -N -L 3000:localhost:3000 ludaze@10.96.15.227
Then run:
curl 127.0.0.1:3000/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
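The same request can also be issued from Python. Below is a minimal sketch using only the standard library; the `build_payload` and `generate` helper names are my own, but the endpoint, payload shape, and headers mirror the curl call above:

```python
import json
import urllib.request

def build_payload(prompt: str, max_new_tokens: int = 20) -> dict:
    """Build the JSON body expected by TGI's /generate endpoint."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def generate(prompt: str, max_new_tokens: int = 20,
             url: str = "http://127.0.0.1:3000/generate") -> str:
    """POST a generation request through the SSH tunnel and return the text."""
    data = json.dumps(build_payload(prompt, max_new_tokens)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]

if __name__ == "__main__":
    # Requires the server and SSH tunnel from the steps above to be running.
    print(generate("What is Deep Learning?"))
```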