HuggingFace
To use a model hosted on HuggingFace, specify the huggingface.co path in the from field and, when needed, the files to include.
Example: Load an ML model to predict taxi trip outcomes
models:
  - from: huggingface:huggingface.co/spiceai/darts:latest
    name: hf_model
    files:
      - path: model.onnx
    datasets:
      - taxi_trips
Example: Load an LLM to generate text
models:
  - from: huggingface:huggingface.co/microsoft/Phi-3.5-mini-instruct
    name: phi
Example: Load a private model
models:
  - name: llama_3.2_1B
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    params:
      hf_token: ${ secrets:HF_TOKEN }
For more details on authentication, see the Access Tokens section below.
Example: Load a GGUF model
models:
  - from: huggingface:huggingface.co/lmstudio-community/Qwen2.5-Coder-3B-Instruct-GGUF
    name: sloth-gguf
    files:
      - path: Qwen2.5-Coder-3B-Instruct-Q3_K_L.gguf
Note
Only GGUF models require an explicit file path; the files for other formats (e.g. .safetensors) are inferred.
from Format
The from key must match the following regular expression:
\A(huggingface:)(huggingface\.co\/)?(?<org>[\w\-]+)\/(?<model>[\w\-]+)(:(?<revision>[\w\d\-\.]+))?\z
Examples
- huggingface:username/modelname: Implies the latest version of modelname hosted by username.
- huggingface:huggingface.co/username/modelname:revision: Specifies a particular revision of modelname by username, including the optional domain.
Specification
- Prefix: The value must start with huggingface:.
- Domain (Optional): Optionally includes huggingface.co/ immediately after the prefix. Currently no other HuggingFace-compatible services are supported.
- Organization/User: The HuggingFace organization (org).
- Model Name: After a /, the model name (model).
- Revision (Optional): A colon (:) followed by the git-like revision identifier (revision).
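The specification above can be sketched as a small parser. This is only an illustration, not the runtime's actual implementation: the pattern is a Python translation of the documented regex (Python spells named groups as (?P<name>...), and re.fullmatch stands in for the \A and \z anchors), and the names ModelRef and parse_from are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Python translation of the documented pattern. Assumption: only the
# named-group syntax and the anchors differ from the original regex.
FROM_PATTERN = re.compile(
    r"(huggingface:)(huggingface\.co/)?"
    r"(?P<org>[\w\-]+)/(?P<model>[\w\-]+)"
    r"(:(?P<revision>[\w\d\-.]+))?"
)


class ModelRef(NamedTuple):
    """Hypothetical container for the parsed parts of a `from` value."""
    org: str
    model: str
    revision: Optional[str]  # None when no revision is given


def parse_from(value: str) -> ModelRef:
    """Split a `from` value into org, model, and optional revision."""
    match = FROM_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid huggingface `from` value: {value!r}")
    return ModelRef(match["org"], match["model"], match["revision"])
```

For example, parse_from("huggingface:huggingface.co/spiceai/darts:latest") yields org spiceai, model darts, and revision latest, while parse_from("huggingface:username/modelname") leaves the revision as None. Taken literally, the model group does not admit a ".", so this sketch would reject a name such as Phi-3.5-mini-instruct; the real parser may be more permissive than the pattern as written.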
Access Tokens
Access tokens can be provided for HuggingFace models in two ways:
- In the HuggingFace token cache (i.e. ~/.cache/huggingface/token). Default.
- Via model params (see below).
models:
  - name: llama_3.2_1B
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    params:
      hf_token: ${ secrets:HF_TOKEN }
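When relying on the default token cache, it can help to confirm that a token is actually present before loading a private model. The following is a minimal sketch under the assumption that the cache file is plain text holding only the token; read_cached_token is a hypothetical helper, not part of any Spice or HuggingFace API.

```python
from pathlib import Path
from typing import Optional

# Default cache location named in the text above.
DEFAULT_TOKEN_PATH = Path.home() / ".cache" / "huggingface" / "token"


def read_cached_token(path: Path = DEFAULT_TOKEN_PATH) -> Optional[str]:
    """Return the cached token, or None if the file is missing or empty.

    Assumption: the file contains just the token, possibly followed by
    a trailing newline.
    """
    try:
        token = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    return token or None
```

If this returns None, supplying the token through params.hf_token, as in the example above, is the documented alternative.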
Limitations
- ML models currently support only the ONNX file format.
- LLMs do not support tool use when stream=true.
- The throughput, concurrency, and latency of a locally hosted model vary with the underlying hardware and the model size.
