HuggingFace

To use a model hosted on HuggingFace, specify the huggingface.co path in the from field and, when needed, the files to include.

Example: Load an ML model to predict taxi trip outcomes

models:
  - from: huggingface:huggingface.co/spiceai/darts:latest
    name: hf_model
    files:
      - path: model.onnx
    datasets:
      - taxi_trips

Example: Load an LLM to generate text

models:
  - from: huggingface:huggingface.co/microsoft/Phi-3.5-mini-instruct
    name: phi

Example: Load a private model

models:
  - name: llama_3.2_1B
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    params:
      hf_token: ${ secrets:HF_TOKEN }

For more details on authentication, see Access Tokens below.

Example: Load a GGUF model

models:
  - from: huggingface:huggingface.co/lmstudio-community/Qwen2.5-Coder-3B-Instruct-GGUF
    name: sloth-gguf
    files:
      - path: Qwen2.5-Coder-3B-Instruct-Q3_K_L.gguf

note

Only GGUF models require a specific file path. For other formats (e.g. .safetensors), the files are inferred.

`from` Format

The from key must match the following regular expression:

\A(huggingface:)(huggingface\.co\/)?(?<org>[\w\-]+)\/(?<model>[\w\-]+)(:(?<revision>[\w\d\-\.]+))?\z

Examples

huggingface:username/modelname: Implies the latest version of modelname hosted by username.
huggingface:huggingface.co/username/modelname:revision: Specifies a particular revision of modelname by username, including the optional domain.

Specification

Prefix: The value must start with huggingface:.
Domain (Optional): huggingface.co/ may follow immediately after the prefix. No other HuggingFace-compatible services are currently supported.
Organization/User: The HuggingFace organization or user name (org).
Model Name: After a /, the model name (model).
Revision (Optional): A colon (:) followed by the git-like revision identifier (revision).
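
As a minimal sketch of the forms this specification allows, the entries below reuse the placeholder repository username/modelname and the placeholder revision from the examples above. The name values are hypothetical and chosen only for illustration. That the first two entries point at the same repository is an assumption based on the domain being optional.

```yaml
models:
  # Short form: prefix, then org/model. No revision implies the latest version.
  - from: huggingface:username/modelname
    name: short_form # hypothetical name

  # Same repository with the optional huggingface.co/ domain spelled out.
  - from: huggingface:huggingface.co/username/modelname
    name: with_domain # hypothetical name

  # A colon followed by a revision identifier pins a particular revision.
  - from: huggingface:huggingface.co/username/modelname:revision
    name: pinned_revision # hypothetical name
```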

Access Tokens

Access tokens can be provided for HuggingFace models in two ways:

In the HuggingFace token cache (i.e. ~/.cache/huggingface/token). This is the default.
Via the hf_token model parameter, as in the following example.

models:
  - name: llama_3.2_1B
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    params:
      hf_token: ${ secrets:HF_TOKEN }

Limitations

ML models currently support only the ONNX file format.
LLMs do not support tool use when 'stream=true'.
The throughput, concurrency, and latency of a locally hosted model vary with the underlying hardware and the model size.
