Use text-generation-inference (TGI) to set up the server.
Run the following command:
docker run --gpus all --shm-size 1g -p 3000:80 -v /data:/data ghcr.io/huggingface/text-generation-inference:1.3.3 \
--model-id mistralai/Mixtral-8x7B-Instruct-v0.1 \
--num-shard 4 \
--max-batch-total-tokens 1024000 \
--max-total-tokens 32000
To access the server running in Docker from your local machine, build an SSH tunnel with this command:
ssh -f -N -L 3000:localhost:3000 ludaze@10.96.15.227
Then run:
curl 127.0.0.1:3000/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
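The same request can also be issued from Python. Below is a minimal sketch using only the standard library; the `build_payload` and `generate` helper names are my own, but the endpoint, payload shape, and headers mirror the curl call above:

```python
import json
import urllib.request

def build_payload(prompt: str, max_new_tokens: int = 20) -> dict:
    """Build the JSON body expected by TGI's /generate endpoint."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def generate(prompt: str, max_new_tokens: int = 20,
             url: str = "http://127.0.0.1:3000/generate") -> str:
    """POST a generation request through the SSH tunnel and return the text."""
    data = json.dumps(build_payload(prompt, max_new_tokens)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]

if __name__ == "__main__":
    # Requires the server and SSH tunnel from the steps above to be running.
    print(generate("What is Deep Learning?"))
```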