Train & Finetune
Train any model with client.train() or finetune LLMs with client.finetune() - one call handles upload, container run, and polling.
AfriLink
SDK guides for API-key authentication, A100 finetuning, custom containers, model downloads, and pricing.
AfriLink gives you one-line access to a dedicated NVIDIA A100 80 GB hosted by OpenToken for training (any framework — YOLOv8, custom PyTorch, etc.) and finetuning (LoRA/QLoRA for LLMs). The SDK handles authentication, dataset upload, GPU-aware docker run, status polling, model download, and per-job billing. Install with pip install 'afrilink-sdk[build]'.
One notebook secret, instant auth, no email/password prompts and no 12-hour certificates to refresh. Reads AFRILINK_API_KEY from Colab Secrets, Kaggle Secrets, or your env.
Jobs run on a dedicated NVIDIA A100 80 GB hosted by OpenToken as ephemeral Docker containers. No SLURM, no queue, no shared multi-tenant noise.
Bring your own base image + pip / apt deps + model source. The SDK builds on Cloud Build, pushes to a private registry, runs on the A100, and caches by spec hash so re-runs are instant.
pip install 'afrilink-sdk[build]'

    from afrilink import AfriLinkClient

    client = AfriLinkClient()
    client.authenticate()  # reads AFRILINK_API_KEY from notebook secrets / env

    job = client.finetune(
        model="qwen2.5-0.5b",
        training_mode="low",
        data=your_dataframe,
        gpus=1,
        time_limit="01:00:00",
    )
    result = job.run(wait=True)

    if result["status"] == "completed":
        client.download_model(result["job_id"], "./my-model")

        from transformers import AutoModelForCausalLM, AutoTokenizer
        from peft import PeftModel

        base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
        model = PeftModel.from_pretrained(base, "./my-model")
        tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
        out = model.generate(**tokenizer("Hello!", return_tensors="pt"), max_new_tokens=64)
        print(tokenizer.decode(out[0], skip_special_tokens=True))

pip install 'afrilink-sdk[build]'

The [build] extras pull cryptography + requests, needed for the custom-container path. Without them only the curated client.train() and client.finetune() paths work.
The core package has zero required dependencies — heavy libraries are only loaded at the point you use them and are pre-installed in most notebook environments.
As of v0.8.x the SDK uses stateless API-key auth — no email/password prompts, no 12-hour certificate refreshes.
Get an API key
Mint a key on the dashboard. The afk_live_… value is shown only once, so copy it straight away and store it under the name AFRILINK_API_KEY.
Where to set the key
| Environment | How to set |
|---|---|
| Google Colab | 🔑 sidebar → Add secret → name AFRILINK_API_KEY, paste, enable for notebook |
| Kaggle | Add-ons → Secrets → name AFRILINK_API_KEY, paste, attach to notebook |
| Local Jupyter / VS Code | os.environ["AFRILINK_API_KEY"] = "afk_live_…" before client.authenticate() |
| Anywhere | Pass directly: client.authenticate(api_key="afk_live_…") |
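The lookup described in the table above can be sketched as a small fallback chain. This is only an illustration: resolve_api_key is a hypothetical helper, not part of the SDK, and the order shown (explicit argument, then environment variable, then Colab or Kaggle secrets) is an assumption about how the options combine. The Colab and Kaggle branches run only when those runtimes are present.

```python
import os


def resolve_api_key(api_key=None, name="AFRILINK_API_KEY"):
    """Hypothetical helper: return the first API key found.

    Assumed order: explicit argument, environment variable,
    Google Colab secrets, Kaggle secrets.
    """
    if api_key:
        return api_key
    if os.environ.get(name):
        return os.environ[name]
    try:  # Google Colab: secret must be enabled for this notebook
        from google.colab import userdata
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        pass  # not running in Colab, or secret missing / not shared
    try:  # Kaggle: secret must be attached to this notebook
        from kaggle_secrets import UserSecretsClient
        value = UserSecretsClient().get_secret(name)
        if value:
            return value
    except Exception:
        pass  # not running on Kaggle, or secret missing
    raise RuntimeError(f"{name} is not set in any known location")
```

The result could then be passed as client.authenticate(api_key=resolve_api_key()), although authenticate() already performs its own lookup when called without arguments.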
What happens at auth time
| Phase | What runs |
|---|---|
| 1. DataSpires session | The SDK exchanges your API key at api.dataspires.com for a short-lived Supabase JWT used for billing writes |
| 2. A100 reachability | Silent SSH probe to the OpenToken A100 to confirm your slot is live |
Both phases together take ~1–2 seconds. The session keeps the JWT in memory for the kernel lifetime — no on-disk state. To rotate the key, revoke it on the dashboard and mint a new one.
The dedicated GPU node the SDK runs on, hosted by opentoken.global:
| Component | Specification |
|---|---|
| GPU | 1× NVIDIA A100 PCIe |
| GPU memory | 80 GB HBM2e |
| FP64 performance | 9.7 TFLOPS |
| FP32 performance | 19.5 TFLOPS |
| TensorFloat-32 | 156 TFLOPS |
| BF16 / FP16 (tensor cores) | 312 TFLOPS |
| CPU cores | 12 |
| System RAM | 82 GB |
| Job runtime | Containerised (Docker, CUDA-aware via --gpus all) |
Per-job memory guide for 1× A100 80 GB:
| Model size | Training mode | Fits on 1 GPU? |
|---|---|---|
| 0.5B – 1B | low (QLoRA 4-bit) | yes |
| 3B – 7B | low / medium | yes |
| 7B – 13B | low (QLoRA) | yes |
| 13B | high (bf16) | tight — checkpoint-heavy |
| 30B+ | low (QLoRA) | marginal |
Billing: $2.00 / GPU-hour, charged per completed GPU-minute (minimum 1 minute). Credits deducted automatically from your DataSpires balance — invoice appears on the Billing Dashboard in real time.
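As a rough worked example of the rate above, the arithmetic can be sketched as follows. The function name is hypothetical, and the rounding rule (whole completed minutes, never fewer than one) is an assumption read from "per completed GPU-minute (minimum 1 minute)"; the actual invoice may round differently.

```python
RATE_USD_PER_GPU_HOUR = 2.00  # rate quoted in the billing note above


def estimate_charge_usd(runtime_seconds):
    """Estimate the charge for one job on the single-GPU backend.

    Assumption: only completed minutes are billed, with a 1-minute minimum.
    """
    completed_minutes = int(runtime_seconds // 60)
    billed_minutes = max(1, completed_minutes)
    return round(billed_minutes * RATE_USD_PER_GPU_HOUR / 60, 2)


one_hour = estimate_charge_usd(60 * 60)      # 60 billed minutes
short_job = estimate_charge_usd(30)          # below a minute, billed as 1 minute
forty_five = estimate_charge_usd(45 * 60)    # 45 billed minutes
```

Under these assumptions a one-hour job costs $2.00, a 45-minute job $1.50, and anything shorter than a minute about $0.03.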
Use client.list_available_models() to browse, or filter with size="tiny". For anything outside this registry, use client.build_image() with a custom model_source.
| Model | Type | Params | Min VRAM |
|---|---|---|---|
| Qwen 2.5 0.5B | Text | 0.50B | 4 GB |
| Gemma 3 270M | Text | 0.27B | 2 GB |
| Llama 3.2 1B | Text | 1.00B | 4 GB |
| DeepSeek R1 1.5B | Text | 1.50B | 6 GB |
| Ministral 3B | Text | 3.30B | 8 GB |
| SmolVLM 256M | Vision | 0.26B | 2 GB |
| InternVL2 1B | Vision | 1.00B | 4 GB |
| Moondream 2 | Vision | 1.90B | 8 GB |
| Florence 2 Base | Vision | 0.23B | 4 GB |
| LLaVA 1.5 7B | Vision | 7.00B | 16 GB |
The A100 backend has a single GPU; any gpus value above 1 is clamped to 1, and the SDK prints a note to the console when it does so.
| Mode | Strategy | Quantization | Best For |
|---|---|---|---|
| low | QLoRA (rank 8) | 4-bit | Quick experiments, small datasets |
| medium | LoRA (rank 16) | 8-bit / none | Balanced quality and cost |
| high | Full LoRA (rank 64) | None | Production-grade training runs |
| Type | How it's handled |
|---|---|
| pandas.DataFrame | Serialised to JSONL, uploaded over SCP to /mnt/data/sdk-jobs/<job_id>/input/ |
| datasets.Dataset | Saved to disk and uploaded as a directory |
| File path (local) | JSONL or CSV file uploaded |
| Archive (.tar.gz / .zip) | Uploaded and extracted inside the container |
DataFrame should have a text column with the full prompt + response (Alpaca-style or chat template). Inside the container the dataset is at /workspace/job/input/.
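To make the expected shape concrete, here is a minimal sketch that writes a JSONL file with a single text field per row, using only the standard library. The prompt template and the helper name write_text_jsonl are illustrative assumptions; any Alpaca-style or chat-template layout that puts prompt and response in one text string should serve the same purpose.

```python
import json
from pathlib import Path

# Alpaca-style layout; the exact wording is an illustrative assumption.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def write_text_jsonl(examples, path):
    """Write one JSON object per line, each holding a single `text` field."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as fh:
        for example in examples:
            record = {"text": TEMPLATE.format(**example)}
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


examples = [
    {"instruction": "Translate 'good morning' to Swahili.", "response": "Habari ya asubuhi."},
    {"instruction": "What is 2 + 2?", "response": "4"},
]
dataset_path = write_text_jsonl(examples, "train.jsonl")
```

Since local JSONL paths are among the accepted input types, the resulting file could be passed as data="train.jsonl"; loading it with pandas.read_json("train.jsonl", lines=True) gives the equivalent DataFrame.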
Use client.train() to run any training script in a curated container. For LoRA/QLoRA LLM fine-tuning, use client.finetune() instead.

    job = client.train(
        script="train_yolo.py",       # your training script
        container="afrilink-yolo",    # curated container (Ultralytics + PyTorch)
        data="./dataset/",            # uploaded automatically over SSH
        data_config="dataset.yaml",   # YOLO config file
        gpus=1,                       # A100 backend has 1 GPU; clamps automatically
        time_limit="02:00:00",
    )
    result = job.run(wait=True)
    print(job.get_logs(tail=50))
    client.download_model(result["job_id"], "./yolo-out")

Available curated containers:
| Name | Frameworks | Use case |
|---|---|---|
| afrilink-yolo | Ultralytics, PyTorch, torchvision | Object detection, segmentation, pose |
| afrilink-finetune | PyTorch, Transformers, PEFT, bitsandbytes | LLM fine-tuning (used by client.finetune()) |
Need a different stack? Use client.build_image() / client.build_and_train() below.
If the curated containers don't have the framework, version, or model you need, define it yourself. Cloud Build builds the image, Artifact Registry hosts it, the OpenToken A100 runs it ephemerally.

    # Define exactly the environment your training needs.
    spec = dict(
        base_image="pytorch",  # preset
        pip_packages=["transformers>=4.45", "accelerate>=0.34"],
        apt_packages=["git"],
        model_source={
            "kind": "huggingface",
            "id": "Qwen/Qwen2.5-0.5B-Instruct",
        },
    )

    # Check the cache first - skip the build if a matching image already exists.
    hit = client.find_existing_image(**spec)
    if hit:
        print("cache hit:", hit["image"])

    # Build (or skip if cached) + run + cleanup, in one call.
    result = client.build_and_train(
        **spec,
        script="my_train.py",
        gpus=1,
        time_limit_hours=1.0,
        reuse_existing_image=True,  # default; False forces a fresh build
    )
    client.download_model(result["run"]["job_id"], "./output")

Presets for base_image:
| Preset | Resolves to |
|---|---|
| pytorch | pytorch/pytorch:2.5.0-cuda12.4-cudnn9-runtime |
| pytorch-2.4 | pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime |
| pytorch-cpu | pytorch/pytorch:2.5.0-cpu-runtime (CPU-only) |
| cuda-12.4 | nvidia/cuda:12.4.0-runtime-ubuntu22.04 |
| ultralytics | ultralytics/ultralytics:latest |
You can also pass any full image:tag string.
Model sources for model_source=:
| Kind | Required fields | Example |
|---|---|---|
| huggingface | id, optional revision/subfolder | {"kind":"huggingface","id":"meta-llama/Llama-3.2-1B"} |
| url | url | {"kind":"url","url":"https://…/weights.tar.gz"} |
| git | url, optional revision | {"kind":"git","url":"https://github.com/openai/whisper.git"} |
| gs | uri | {"kind":"gs","uri":"gs://bucket/checkpoints/"} |
| s3 | uri | {"kind":"s3","uri":"s3://bucket/checkpoints/"} |
Models are fetched at container runtime, not baked at build time — keeps your images thin (~2 GB instead of 7+ GB) and means you can iterate on deps without re-shipping weights. The downloaded model lands at /workspace/models/<sanitised_id>/ and the path is exposed via the MODELS_DIR env var to your script.
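Inside a training script, the fetched weights can be located from MODELS_DIR. The text above leaves it open whether the variable points at the model directory itself or at its parent, so the sketch below handles both cases. find_model_dir is a hypothetical helper, and the check for config.json assumes a Hugging Face style checkpoint.

```python
import os
from pathlib import Path


def find_model_dir(default="/workspace/models"):
    """Return the directory that holds the fetched model.

    Assumptions: MODELS_DIR is either the model directory (contains
    config.json) or a parent with exactly one model subdirectory.
    """
    root = Path(os.environ.get("MODELS_DIR", default))
    if (root / "config.json").exists():
        return root
    subdirs = sorted(p for p in root.iterdir() if p.is_dir()) if root.is_dir() else []
    if len(subdirs) == 1:
        return subdirs[0]
    raise FileNotFoundError(f"could not locate a single model under {root}")
```

A script would then load from str(find_model_dir()) instead of hard-coding the sanitised model ID.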
For gated HF models (Llama, Gemma, etc.): add HUGGINGFACE_TOKEN as a notebook secret and the SDK forwards it to the container automatically.
Cache: the SDK hashes your spec (base_image + pip/apt packages + model source) and bakes the hash into the image as a Docker label. Re-running the same spec hits the cache instantly — first build takes ~5 min, subsequent runs go straight to the A100.
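The idea behind spec-hash caching can be illustrated with a short sketch. This is not the SDK's actual algorithm: the canonical form, the choice of SHA-256, the 12-character truncation, and the decision to sort package lists are all assumptions made for the example.

```python
import hashlib
import json


def spec_hash(base_image, pip_packages=(), apt_packages=(), model_source=None):
    """Illustrative content hash of a build spec (not the SDK's real scheme)."""
    canonical = {
        "base_image": base_image,
        "pip_packages": sorted(pip_packages),  # assumed order-insensitive
        "apt_packages": sorted(apt_packages),
        "model_source": model_source or {},
    }
    # sort_keys plus fixed separators gives a stable byte string for equal specs.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


source = {"kind": "huggingface", "id": "Qwen/Qwen2.5-0.5B-Instruct"}
a = spec_hash("pytorch", ["transformers>=4.45", "accelerate>=0.34"], ["git"], source)
b = spec_hash("pytorch", ["accelerate>=0.34", "transformers>=4.45"], ["git"], source)
c = spec_hash("pytorch", ["transformers>=4.46", "accelerate>=0.34"], ["git"], source)
```

Here a and b are equal because only the package order differs, while c differs because a version pin changed. The practical consequence is the same as described above: any change to the spec produces a new hash and therefore a fresh build.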
AfriLinkClient
| Method | Description |
|---|---|
| authenticate(api_key=None) | Resolves the API key (arg, env, Colab/Kaggle Secrets), exchanges at api.dataspires.com, probes the A100. ~1–2 s. |
| finetune(model, training_mode, data, gpus, ...) | Create a FinetuneJob in the curated afrilink-finetune container. Call .run() to submit. |
| train(script, container, data, gpus, ...) | Create a TrainJob in a curated container. Call .run() to submit. |
| find_existing_image(base_image, pip_packages, ...) | Returns {image, source, spec_hash} if a matching image is on the A100 or in Artifact Registry. None otherwise. |
| build_image(base_image, pip_packages, script, model_source, ...) | Build a custom container on Cloud Build, push to Artifact Registry. Doesn't run it yet, which is useful if you want to inspect the image first. |
| build_and_train(..., reuse_existing_image=True) | One-shot: cache-check → build (or skip) → run on the A100 → ephemeral cleanup. |
| download_model(job_id, local_dir) | Pull the output/ directory from the A100. Adapter files land in <local_dir>/output/. |
| list_available_models(size=None) | Browse the curated model registry. |
| list_containers() | List available curated training containers. |
| cancel_job(job_id) | Stop + remove a running container on the A100. |
| run_command(cmd) | Execute a shell command on the A100 over SSH. |
TrainJob / FinetuneJob (returned by client.train() / client.finetune())
| Method / property | Description |
|---|---|
| run(wait=True) | Submit to the A100. wait=True polls docker inspect until done. |
| cancel() | Stop + remove the container with docker stop && docker rm. |
| get_logs(tail=100) | Fetch recent log lines via docker logs --tail. |
| estimated_cost_usd() | Estimate max cost based on the time limit. |
| status | Current state: one of queued, running, completed, failed, cancelled. |
| job_id | AfriLink job ID (8-char UUID prefix). |
| container_id | Docker container ID on the A100 (set after run()). |
run() returns a dict with job_id, container_id, status, output_dir, and a billing sub-dict with the GPU-minutes and cost charged. Always check result["status"] before downloading.
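If you prefer to poll yourself rather than block in run(wait=True), the waiting logic can be sketched as a generic loop over the five states listed above. wait_for_terminal is a hypothetical helper, not an SDK function. Using it with a real job assumes that run(wait=False) returns immediately and that job.status reflects the current state each time it is read, neither of which is confirmed here.

```python
import time

# States after which a job will not change again (from the status list above).
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_terminal(get_status, poll_seconds=5.0, timeout_seconds=3600.0,
                      sleep=time.sleep):
    """Call get_status() until it returns a terminal state or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} s")
        sleep(poll_seconds)


# Simulated job that moves through the documented states.
fake_states = iter(["queued", "running", "running", "completed"])
final = wait_for_terminal(lambda: next(fake_states), sleep=lambda _: None)
```

With a real job this would look like wait_for_terminal(lambda: job.status), followed by the download only when the returned state is "completed".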
Pay-as-you-go at $2.00 per GPU-hour, billed by wall-clock with a 1-minute minimum. Build-time on Cloud Build is absorbed by the platform — you only pay GPU-time. Add credits via card payment or redeem voucher codes on your Billing Dashboard.
Query the inline reference manual from any notebook cell — no internet required:

    import afrilink

    afrilink/help         # index of all topics
    afrilink/quickstart   # getting started
    afrilink/auth         # authentication
    afrilink/finetune     # finetune parameters & modes
    afrilink/training     # general training jobs
    afrilink/specs        # A100 hardware spec sheet
    afrilink/datasets     # dataset formats
    afrilink/billing      # rates, credits, invoices

Once downloaded, adapter weights work directly with standard HuggingFace tooling.
Export to GGUF & run with Ollama

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Merge adapter into base model
    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
    merged = PeftModel.from_pretrained(base, "./my-model").merge_and_unload()
    merged.save_pretrained("./my-model-merged")
    AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B").save_pretrained("./my-model-merged")

    # python convert_hf_to_gguf.py ./my-model-merged --outfile my-model.gguf
    # ./llama-quantize my-model.gguf my-model-q4.gguf Q4_K_M
    # ollama create my-model -f Modelfile && ollama run my-model

Publish to HuggingFace Hub

    from huggingface_hub import HfApi

    api = HfApi(token="hf_...")
    repo_id = "your-username/my-finetuned-model"
    api.create_repo(repo_id, exist_ok=True)

    api.upload_folder(folder_path="./my-model", repo_id=repo_id)          # adapter only
    api.upload_folder(folder_path="./my-model-merged", repo_id=repo_id)   # full merged model
    api.upload_file(
        path_or_fileobj="./my-model-q4.gguf",
        path_in_repo="my-model-q4.gguf",
        repo_id=repo_id,
    )  # GGUF