Train & Finetune
Train any model with client.train() or finetune LLMs with client.finetune() - one call handles upload, submission, and polling.
AfriLink
SDK guides for authentication, finetuning, model downloads, pricing, and session recovery.
AfriLink gives you one-line access to A100 GPUs for training (any framework - YOLOv8, custom PyTorch, etc.), finetuning (LoRA/QLoRA for LLMs), and inference (HuggingFace endpoints). The SDK handles authentication, dataset upload, job submission, status polling, model download, and per-job billing. Install with pip install afrilink-sdk.
Two-phase auth: DataSpires credentials for billing, plus automated SSH certificate provisioning for HPC access.
Jobs run on NVIDIA A100 nodes via SLURM. Support for additional backends is coming soon.
Works from Google Colab, Notebooks, local Jupyter, or any Python environment.

    # Install first: pip install afrilink-sdk
    from afrilink import AfriLinkClient

    client = AfriLinkClient()
    client.authenticate()  # DataSpires + HPC credentials

    job = client.finetune(
        model="qwen2.5-0.5b",
        training_mode="low",
        data=your_dataframe,
        gpus=1,
        time_limit="01:00:00",
        backend="cineca",
    )
    result = job.run(wait=True)

    if result["status"] == "completed":
        client.download_model(result["job_id"], "./my-model")

        from transformers import AutoModelForCausalLM, AutoTokenizer
        from peft import PeftModel

        base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
        model = PeftModel.from_pretrained(base, "./my-model")
        tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
        out = model.generate(**tokenizer("Hello!", return_tensors="pt"), max_new_tokens=64)
        print(tokenizer.decode(out[0], skip_special_tokens=True))

Zero required dependencies - heavy libraries are only needed at the point you use them and are pre-installed in most notebook environments.
Both phases happen inside a single client.authenticate() call:
| Phase | What Happens | User Action |
|---|---|---|
| 1. DataSpires | Validates your account for billing and telemetry | Enter email + password when prompted |
| 2. HPC | Automated SSH certificate provisioning for cluster access | Fully automatic |
Pass credentials explicitly: client.authenticate(dataspires_email="...", dataspires_password="...")
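In scripts or scheduled runs where an interactive prompt is not practical, one option is to read the credentials from environment variables and pass them through as keyword arguments. A minimal sketch, assuming the hypothetical variable names DATASPIRES_EMAIL and DATASPIRES_PASSWORD (the SDK does not define them; only the two keyword arguments come from the line above):

```python
import os


def credentials_from_env(env=os.environ):
    """Build keyword arguments for client.authenticate() from the environment.

    DATASPIRES_EMAIL and DATASPIRES_PASSWORD are illustrative names chosen for
    this sketch, not variables the SDK reads by itself.
    """
    email = env.get("DATASPIRES_EMAIL")
    password = env.get("DATASPIRES_PASSWORD")
    if not email or not password:
        return {}  # nothing to pass: authenticate() will prompt instead
    return {"dataspires_email": email, "dataspires_password": password}


# Usage (requires afrilink-sdk):
# from afrilink import AfriLinkClient
# client = AfriLinkClient()
# client.authenticate(**credentials_from_env())
```

Returning an empty dict when either value is missing keeps the interactive prompt as the fallback.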
After auth you get an SSH certificate (~12h), SLURM job manager, SCP transfer manager, and telemetry tracker. The SDK warns before expiry - see Session Recovery.
| Backend | Provider | Region | Status |
|---|---|---|---|
| cineca | CINECA Leonardo (EuroHPC) | Bologna, Italy | Available (default) |
| eversetech | EverseTech | - | Coming soon |
| agh | AGH | - | Coming soon |
| acf | ACF | - | Coming soon |
Each GPU node on the Booster partition (where AfriLink jobs run):
| Component | Specification |
|---|---|
| GPU per node | 4x NVIDIA A100 (custom) |
| GPU memory | 64 GB HBM2e per GPU (256 GB per node) |
| FP64 performance | 11.2 TFLOPS per GPU |
| FP32 performance | 22.4 TFLOPS per GPU |
| CPU cores per node | 32 |
| System RAM per node | 512 GB DDR4 |
| RAM per GPU (effective) | ~128 GB (shared, not partitioned) |
| Node interconnect | 200 Gb/s HDR InfiniBand |
Per-GPU memory guide for model loading:
| Model Size | Training Mode | Min GPUs |
|---|---|---|
| 0.5B - 1B | low (QLoRA 4-bit) | 1 |
| 3B - 7B | low | 1 |
| 3B - 7B | high (bf16) | 2-4 |
| 13B | low | 2 |
| 13B | high | 4 |
| 30B+ | low or high | 4 |
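
The sizing table above can be turned into a small lookup for choosing the gpus argument. This is only a sketch of the table, not SDK behaviour: the function name is hypothetical, and the handling of sizes that fall between rows is an assumption spelled out in the docstring.

```python
def min_gpus(params_billion: float, training_mode: str) -> int:
    """Return the smallest GPU count the sizing table suggests.

    Assumptions not stated in the table: sizes between rows are rounded up to
    the next row, "2-4" is read as a minimum of 2, and "high" on models below
    3B is treated like the 3B - 7B row.
    """
    if training_mode not in ("low", "high"):
        raise ValueError("the sizing table only lists 'low' and 'high'")
    if params_billion <= 7:
        return 1 if training_mode == "low" else 2
    if params_billion <= 13:
        return 2 if training_mode == "low" else 4
    return 4  # larger than 13B: treated as the 30B+ row


print(min_gpus(0.5, "low"))   # 1
print(min_gpus(13, "high"))   # 4
```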
Billing: $2.00 / GPU-hour, charged per completed GPU-minute (minimum 1 minute). Credits deducted automatically from your DataSpires balance.
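
To make the billing rule concrete, the arithmetic can be sketched as below. The helper names are hypothetical, and the sketch assumes that "per completed GPU-minute" means partial minutes are dropped and that the one-minute minimum applies to each GPU; the actual charge is whatever DataSpires records.

```python
RATE_USD_PER_GPU_HOUR = 2.00  # from the pricing line above


def parse_time_limit(time_limit: str) -> int:
    """Convert an "HH:MM:SS" time limit into seconds."""
    hours, minutes, seconds = (int(part) for part in time_limit.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def billed_cost_usd(gpus: int, elapsed_seconds: int) -> float:
    """Cost of a run, assuming partial minutes are dropped and at least one minute is billed."""
    minutes = max(1, elapsed_seconds // 60)
    return round(gpus * minutes * RATE_USD_PER_GPU_HOUR / 60, 2)


def max_cost_usd(gpus: int, time_limit: str) -> float:
    """Upper bound on cost if the job runs for its full time limit."""
    return billed_cost_usd(gpus, parse_time_limit(time_limit))


print(max_cost_usd(1, "01:00:00"))  # 2.0
print(billed_cost_usd(2, 45))       # 0.07
```

A 1-GPU job with a one-hour limit therefore cannot cost more than $2.00 under these assumptions, and a job that ends early costs proportionally less.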
Use client.list_available_models() to browse, or filter with size="tiny".
| Model | Type | Params | Min VRAM |
|---|---|---|---|
| Qwen 2.5 0.5B | Text | 0.50B | 4 GB |
| Gemma 3 270M | Text | 0.27B | 2 GB |
| Llama 3.2 1B | Text | 1.00B | 4 GB |
| DeepSeek R1 1.5B | Text | 1.50B | 6 GB |
| Ministral 3B | Text | 3.30B | 8 GB |
| SmolVLM 256M | Vision | 0.26B | 2 GB |
| InternVL2 1B | Vision | 1.00B | 4 GB |
| Moondream 2 | Vision | 1.90B | 8 GB |
| Florence 2 Base | Vision | 0.23B | 4 GB |
| LLaVA 1.5 7B | Vision | 7.00B | 16 GB |
| Mode | Strategy | Quantization | GPUs | Best For |
|---|---|---|---|---|
| low | QLoRA (rank 8) | 4-bit | 1 | Quick experiments, small datasets |
| medium | LoRA (rank 16) | 8-bit / none | 1-2 | Balanced quality and cost |
| high | LoRA (rank 64) + DDP/FSDP | None | 2-4+ | Production-grade training runs |
| Type | How It's Handled |
|---|---|
| pandas.DataFrame | Serialised to JSONL and uploaded |
| datasets.Dataset | Saved to disk and uploaded |
| File path (local) | JSONL or CSV file uploaded |
| File path (remote, starts with $) | Treated as existing HPC path - no upload needed |
A DataFrame should have a text column that contains the full prompt plus response for each example (Alpaca-style or chat template).
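
As an illustration of the expected shape, the sketch below builds records with a single text field and writes them to a JSONL file using only the standard library. The template is a simplified Alpaca-style layout chosen for this example, and the helper names are hypothetical; the SDK does not prescribe either.

```python
import json
import tempfile
from pathlib import Path

# Simplified Alpaca-style layout (an assumption, not an SDK requirement)
ALPACA_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def to_text_records(pairs):
    """Turn (instruction, response) pairs into rows with a single `text` field."""
    return [
        {"text": ALPACA_TEMPLATE.format(instruction=instruction, response=response)}
        for instruction, response in pairs
    ]


def write_jsonl(records, path):
    """Write one JSON object per line and return the file path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


pairs = [
    ("Translate 'good morning' to French.", "Bonjour."),
    ("What is 2 + 2?", "4"),
]
records = to_text_records(pairs)
dataset_path = write_jsonl(records, Path(tempfile.mkdtemp()) / "train.jsonl")
print(dataset_path)
```

The resulting file path can be passed as data, or the same records can be loaded into a DataFrame with pandas.DataFrame(records) if pandas is available.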
Use client.train() to run any training script on HPC. Works with any framework that runs inside a Singularity container (YOLOv8, custom PyTorch, etc.). For LoRA/QLoRA LLM fine-tuning, use client.finetune() instead.

    job = client.train(
        script="train_yolo.py",        # your training script
        container="afrilink-yolo",     # pre-built container
        data="./dataset/",             # uploaded automatically
        data_config="dataset.yaml",    # config file
        gpus=1,
        time_limit="02:00:00",
    )
    result = job.run(wait=True)
    print(job.get_logs(tail=50))

Available Containers:
| Name | Frameworks | Use Case |
|---|---|---|
| afrilink-yolo | Ultralytics, PyTorch, torchvision | Object detection, segmentation, pose |
| afrilink-finetune | PyTorch, Transformers, PEFT | LLM fine-tuning |
Data handling: Pass a local directory, .tar.gz archive, single file, pandas DataFrame, or remote HPC path (starting with $ or /). Archives are automatically extracted on the cluster.
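
For large image datasets it can be convenient to pack the directory into a .tar.gz yourself before passing it as data. A minimal sketch using the standard tarfile module; the helper name is hypothetical, the dataset.yaml content is a placeholder, and whether the cluster-side extraction expects a top-level folder inside the archive is an assumption to verify against your training script.

```python
import tarfile
import tempfile
from pathlib import Path


def make_archive(dataset_dir, archive_path):
    """Pack dataset_dir into a gzip-compressed tarball, keeping its top-level folder name."""
    dataset_dir = Path(dataset_dir)
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(dataset_dir, arcname=dataset_dir.name)
    return archive_path


# Demo with a throwaway directory standing in for a real dataset
work = Path(tempfile.mkdtemp())
(work / "dataset" / "images").mkdir(parents=True)
(work / "dataset" / "dataset.yaml").write_text("path: .\n", encoding="utf-8")  # placeholder config

archive = make_archive(work / "dataset", work / "dataset.tar.gz")
with tarfile.open(archive) as tar:
    print(sorted(tar.getnames()))
```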
TrainJob has the same interface as FinetuneJob: .run(), .cancel(), .get_logs(), .status, .job_id, .estimated_cost_usd().
AfriLinkClient
| Method | Description |
|---|---|
| authenticate() | Full auth flow. Optionally pass dataspires_email and dataspires_password. |
| train(script, container, data, gpus, ...) | Create a TrainJob for general-purpose training. Call .run() to submit. |
| finetune(model, data, training_mode, gpus, ...) | Create a FinetuneJob for LLM fine-tuning. Call .run() to submit. |
| download_model(job_id, local_dir) | Download trained LoRA adapter weights. Ready for PeftModel.from_pretrained(). |
| upload_dataset(local_path, dataset_name) | Upload a local dataset file to HPC storage. |
| list_containers() | List available training containers on HPC. |
| list_available_models(size=None) | Browse the model registry. Filter by size. |
| list_jobs() | View submitted SLURM jobs and their statuses. |
| recover_session(download_dir=None) | Re-authenticate and check/download tracked jobs. |
| cancel_job(job_id) | Cancel a running SLURM job. |
| run_command(cmd) | Execute a shell command on the HPC login node. |
| cert_minutes_remaining | Minutes until your SSH certificate expires (float). |
TrainJob / FinetuneJob (returned by client.train() and client.finetune())
| Method / Property | Description |
|---|---|
| run(wait=True) | Submit to SLURM. wait=True polls until done; wait=False returns after submission. |
| cancel() | Cancel the SLURM job. |
| get_logs(tail=100) | Fetch recent log lines from a running or completed job. |
| estimated_cost_usd() | Estimate max cost based on GPUs and time limit. |
| status | Current job status string. |
| job_id | AfriLink job ID (8-character UUID prefix). |
run() returns a dict with job_id, slurm_job_id, status, and output_dir. Always check result["status"] before downloading.
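
When a job is submitted with wait=False, the status property can be polled manually. The helper below is a sketch of that pattern, not part of the SDK: its name is hypothetical, and of the terminal status strings only "completed" appears in this guide, so "failed" and "cancelled" are assumptions to adjust to the values your jobs actually report.

```python
import time

# Only "completed" is confirmed by this guide; the other two values are assumptions.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(job, interval_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Poll job.status until it reaches a terminal value or the timeout elapses."""
    waited = 0.0
    while True:
        status = job.status
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job {job.job_id} still {status!r} after {timeout_s} s")
        sleep(interval_s)
        waited += interval_s


# Usage (requires an authenticated client and a job from client.finetune()):
# result = job.run(wait=False)
# if wait_for_job(job) == "completed":
#     client.download_model(result["job_id"], "./my-model")
```

The sleep function is injectable so the loop can be exercised without waiting; in a notebook the default time.sleep is fine.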
Pay-as-you-go at $2.00 per GPU-hour. Add credits via card payment or redeem voucher codes on your Billing Dashboard.
SSH certificates expire after ~12 hours. The SDK warns at 60, 30, 15, and 5 minutes before expiry. Call recover_session() to pick up where you left off:

    # Re-authenticate and download completed models
    recovery = client.recover_session("./recovered-models")
    print(recovery.re_authenticated)  # True if fresh cert obtained
    print(recovery.jobs)              # status of each tracked job
    print(recovery.files_retrieved)   # downloaded model directories

    # Or just re-authenticate without downloading
    client.recover_session()

What it does: re-authenticates to obtain a fresh certificate, checks the status of each tracked job, and, when download_dir is given, downloads the models of completed jobs into that directory.

Jobs keep running on the cluster after cert expiry - you just need fresh credentials to check on them.
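
In a long-running notebook, the expiry check can be made explicit before each batch of cluster calls. A minimal sketch built on the cert_minutes_remaining property and recover_session() listed in the API reference; the helper name and the 15-minute threshold are choices made for this example.

```python
def ensure_session(client, min_minutes=15.0, download_dir=None):
    """Refresh the session when the SSH certificate is close to expiry.

    Returns the recover_session() result, or None if no refresh was needed.
    """
    if client.cert_minutes_remaining >= min_minutes:
        return None
    return client.recover_session(download_dir)


# Usage (requires an authenticated AfriLinkClient):
# ensure_session(client)                      # refresh only if under 15 minutes remain
# ensure_session(client, download_dir="./m")  # also fetch completed models on refresh
```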
Query the inline reference manual from any notebook cell - no internet required:

    import afrilink

    afrilink/help         # index of all topics
    afrilink/quickstart   # getting started
    afrilink/auth         # authentication & sessions
    afrilink/finetune     # finetune parameters & modes
    afrilink/specs        # models and GPU requirements
    afrilink/datasets     # dataset formats and upload
    afrilink/transfer     # file upload & download
    afrilink/jobs         # SLURM job management

Once downloaded, adapter weights work directly with standard HuggingFace tooling.
Export to GGUF & run with Ollama

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Merge adapter into base model
    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
    merged = PeftModel.from_pretrained(base, "./my-model").merge_and_unload()
    merged.save_pretrained("./my-model-merged")
    AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B").save_pretrained("./my-model-merged")

    # python convert_hf_to_gguf.py ./my-model-merged --outfile my-model.gguf
    # ./llama-quantize my-model.gguf my-model-q4.gguf Q4_K_M
    # ollama create my-model -f Modelfile && ollama run my-model

Publish to HuggingFace Hub

    from huggingface_hub import HfApi

    api = HfApi(token="hf_...")
    repo_id = "your-username/my-finetuned-model"
    api.create_repo(repo_id, exist_ok=True)

    api.upload_folder(folder_path="./my-model", repo_id=repo_id)         # adapter only
    api.upload_folder(folder_path="./my-model-merged", repo_id=repo_id)  # full merged model
    api.upload_file(
        path_or_fileobj="./my-model-q4.gguf",
        path_in_repo="my-model-q4.gguf",
        repo_id=repo_id,
    )  # GGUF

Hugging Face provides a Serverless Inference API as a way for users to quickly test and evaluate publicly accessible machine learning models for free. You can use the InferenceClient from the huggingface_hub Python library.
First, make sure to install the required packages: pip install -U huggingface_hub transformers and authenticate to the Hub with your User Access Token.

    from huggingface_hub import InferenceClient

    client = InferenceClient()

    # Generate text with an open LLM
    response = client.text_generation(
        prompt="A HTTP POST request is used to ",
        model="codellama/CodeLlama-7b-hf",
        temperature=0.8,
        max_new_tokens=50,
        seed=42,
        return_full_text=True,
    )
    print(response)

    # Chat completion with instruct models
    messages = [
        {"role": "system", "content": "You are an expert prompt engineer with artistic flair."},
        {"role": "user", "content": "Write a concise prompt for a fun image... Only return the prompt."},
    ]
    for token in client.chat_completion(
        messages,
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        max_tokens=250,
        stream=True,
        seed=42,
    ):
        print(token.choices[0].delta.content, end="")