AfriLink Documentation

SDK guides for authentication, finetuning, model downloads, pricing, and session recovery.

Overview

AfriLink gives you one-line access to A100 GPUs for training (any framework - YOLOv8, custom PyTorch, etc.), finetuning (LoRA/QLoRA for LLMs), and inference (HuggingFace endpoints). The SDK handles authentication, dataset upload, job submission, status polling, model download, and per-job billing. Install with pip install afrilink-sdk.

Train & Finetune

Train any model with client.train() or finetune LLMs with client.finetune() - one call handles upload, submission, and polling.

Secure Authentication

Two-phase auth: DataSpires credentials for billing, plus automated SSH certificate provisioning for HPC access.

A100 GPU Access

Jobs run on NVIDIA A100 nodes via SLURM. Support for additional backends is coming soon.

Run From Anywhere

Works from Google Colab, Notebooks, local Jupyter, or any Python environment.

Quick Start

python
# Install first: pip install afrilink-sdk  (in a notebook: %pip install afrilink-sdk)
from afrilink import AfriLinkClient

client = AfriLinkClient()
client.authenticate()  # DataSpires + HPC credentials

job = client.finetune(
    model="qwen2.5-0.5b",
    training_mode="low",
    data=your_dataframe,
    gpus=1,
    time_limit="01:00:00",
    backend="cineca",
)
result = job.run(wait=True)

if result["status"] == "completed":
    client.download_model(result["job_id"], "./my-model")

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Load the base model, then attach the downloaded LoRA adapter
    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
    model = PeftModel.from_pretrained(base, "./my-model")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

    out = model.generate(**tokenizer("Hello!", return_tensors="pt"), max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

Installation

bash
pip install afrilink-sdk

Zero required dependencies - heavy libraries are only needed at the point you use them and are pre-installed in most notebook environments.

Authentication

Both phases happen inside a single client.authenticate() call:

Phase | What Happens | User Action
1. DataSpires | Validates your account for billing and telemetry | Enter email + password when prompted
2. HPC | Automated SSH certificate provisioning for cluster access | Fully automatic

Pass credentials explicitly: client.authenticate(dataspires_email="...", dataspires_password="...")
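
For unattended runs it can help to keep credentials out of the notebook itself. The sketch below builds the keyword arguments for client.authenticate() from environment variables. The variable names DATASPIRES_EMAIL and DATASPIRES_PASSWORD and the helper credentials_from_env are hypothetical, chosen for this example; only the dataspires_email and dataspires_password parameters come from the SDK.

```python
import os


def credentials_from_env(env=None):
    """Build keyword arguments for client.authenticate() from environment variables.

    DATASPIRES_EMAIL and DATASPIRES_PASSWORD are hypothetical variable names
    chosen for this sketch; the SDK does not define them.
    """
    env = os.environ if env is None else env
    email = env.get("DATASPIRES_EMAIL")
    password = env.get("DATASPIRES_PASSWORD")
    if not email or not password:
        # No kwargs: authenticate() is then called without explicit credentials
        # and prompts for them as described above.
        return {}
    return {"dataspires_email": email, "dataspires_password": password}


# Usage (requires afrilink-sdk and a DataSpires account):
# from afrilink import AfriLinkClient
# client = AfriLinkClient()
# client.authenticate(**credentials_from_env())
```

If either variable is missing, the helper returns an empty dict, so the call degrades to the prompted flow rather than sending half a credential pair.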

After authentication the client holds an SSH certificate (valid for about 12 hours), a SLURM job manager, an SCP transfer manager, and a telemetry tracker. The SDK warns before the certificate expires - see Session Recovery.

HPC Backends

Backend | Provider | Region | Status
cineca | CINECA Leonardo (EuroHPC) | Bologna, Italy | Available (default)
eversetech | EverseTech | - | Coming soon
agh | AGH | - | Coming soon
acf | ACF | - | Coming soon

Hardware Specs - CINECA Leonardo

Each GPU node on the Booster partition (where AfriLink jobs run):

Component | Specification
GPU per node | 4x NVIDIA A100 (custom)
GPU memory | 64 GB HBM2e per GPU (256 GB per node)
FP64 performance | 11.2 TFLOPS per GPU
FP32 performance | 22.4 TFLOPS per GPU
CPU cores per node | 32
System RAM per node | 512 GB DDR4
RAM per GPU (effective) | ~128 GB (shared, not partitioned)
Node interconnect | 200 Gb/s HDR InfiniBand

Per-GPU memory guide for model loading:

Model Size | Training Mode | Min GPUs
0.5B - 1B | low (QLoRA 4-bit) | 1
3B - 7B | low | 1
3B - 7B | high (bf16) | 2-4
13B | low | 2
13B | high | 4
30B+ | low or high | 4

Billing: $2.00 / GPU-hour, charged per completed GPU-minute (minimum 1 minute). Credits deducted automatically from your DataSpires balance.
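
The billing rule above can be worked through with a little arithmetic. The sketch below is an illustration, not the SDK's billing code: the helper names are hypothetical, and it assumes that "per completed GPU-minute" means partial minutes are dropped, with a floor of one billed minute.

```python
RATE_USD_PER_GPU_HOUR = 2.00  # rate quoted in the billing note above


def parse_hms(time_limit):
    """Convert an "HH:MM:SS" string (the time_limit format used in this guide) to seconds."""
    hours, minutes, seconds = (int(part) for part in time_limit.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def job_cost_usd(gpus, runtime_seconds):
    """Cost of a job, ASSUMING whole minutes are billed and partial minutes are dropped."""
    billed_minutes = max(1, runtime_seconds // 60)  # minimum charge: 1 minute
    gpu_minutes = gpus * billed_minutes
    return round(gpu_minutes * RATE_USD_PER_GPU_HOUR / 60, 2)


print(job_cost_usd(gpus=1, runtime_seconds=parse_hms("01:00:00")))  # 2.0
print(job_cost_usd(gpus=2, runtime_seconds=parse_hms("00:30:00")))  # 2.0
print(job_cost_usd(gpus=1, runtime_seconds=20))                     # 0.03
```

Passing the job's full time_limit as the runtime gives an upper bound on what a job can cost, which is the same quantity estimated_cost_usd() is described as reporting.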

Available Models

Use client.list_available_models() to browse, or filter with size="tiny".

Model | Type | Params | Min VRAM
Qwen 2.5 0.5B | Text | 0.50B | 4 GB
Gemma 3 270M | Text | 0.27B | 2 GB
Llama 3.2 1B | Text | 1.00B | 4 GB
DeepSeek R1 1.5B | Text | 1.50B | 6 GB
Ministral 3B | Text | 3.30B | 8 GB
SmolVLM 256M | Vision | 0.26B | 2 GB
InternVL2 1B | Vision | 1.00B | 4 GB
Moondream 2 | Vision | 1.90B | 8 GB
Florence 2 Base | Vision | 0.23B | 4 GB
LLaVA 1.5 7B | Vision | 7.00B | 16 GB

Training Modes

Mode | Strategy | Quantization | GPUs | Best For
low | QLoRA (rank 8) | 4-bit | 1 | Quick experiments, small datasets
medium | LoRA (rank 16) | 8-bit / none | 1-2 | Balanced quality and cost
high | LoRA (rank 64) + DDP/FSDP | None | 2-4+ | Production-grade training runs

Dataset Formats

Type | How It's Handled
pandas.DataFrame | Serialised to JSONL and uploaded
datasets.Dataset | Saved to disk and uploaded
File path (local) | JSONL or CSV file uploaded
File path (remote, starts with $) | Treated as existing HPC path - no upload needed

A DataFrame should have a text column in which each row holds the full prompt plus response, already formatted (Alpaca-style or with the model's chat template).
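
As a rough illustration of the text column, the sketch below formats instruction/response pairs into single strings and serialises them as JSONL. The template wording and the helper names are assumptions made for this example; use whatever prompt format matches your base model.

```python
import json

# Hypothetical Alpaca-style template; the exact wording is an assumption.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def to_text_rows(examples):
    """Turn (instruction, response) pairs into rows with a single "text" field."""
    return [
        {"text": TEMPLATE.format(instruction=instruction, response=response)}
        for instruction, response in examples
    ]


def to_jsonl(rows):
    """Serialise rows as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


rows = to_text_rows([
    ("Translate 'good morning' to Swahili.", "Habari ya asubuhi."),
    ("What is 2 + 2?", "4"),
])
print(to_jsonl(rows))

# With pandas installed, the same rows become the DataFrame passed as data=...:
# import pandas as pd
# your_dataframe = pd.DataFrame(rows)
```

Each row carries the whole training example in one string, so the same rows can be written to a local JSONL file or wrapped in a DataFrame.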

Training (General-Purpose)

Use client.train() to run any training script on HPC. Works with any framework that runs inside a Singularity container (YOLOv8, custom PyTorch, etc.). For LoRA/QLoRA LLM fine-tuning, use client.finetune() instead.

python
job = client.train(
    script="train_yolo.py",        # your training script
    container="afrilink-yolo",      # pre-built container
    data="./dataset/",              # uploaded automatically
    data_config="dataset.yaml",     # config file
    gpus=1,
    time_limit="02:00:00",
)
result = job.run(wait=True)
print(job.get_logs(tail=50))

Available Containers:

Name | Frameworks | Use Case
afrilink-yolo | Ultralytics, PyTorch, torchvision | Object detection, segmentation, pose
afrilink-finetune | PyTorch, Transformers, PEFT | LLM fine-tuning

Data handling: Pass a local directory, .tar.gz archive, single file, pandas DataFrame, or remote HPC path (starting with $ or /). Archives are automatically extracted on the cluster.
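
If you prefer to upload one archive rather than a directory, the standard library can build the .tar.gz. This is a minimal sketch: pack_dataset is a hypothetical helper, and whether the cluster-side extraction expects a top-level folder inside the archive is an assumption you should check against your training script.

```python
import tarfile
from pathlib import Path


def pack_dataset(dataset_dir, archive_path):
    """Pack dataset_dir into a .tar.gz archive and return the archive path."""
    dataset_dir = Path(dataset_dir)
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores paths relative to the dataset folder, not absolute paths
        archive.add(dataset_dir, arcname=dataset_dir.name)
    return archive_path


# Usage:
# archive = pack_dataset("./dataset", "./dataset.tar.gz")
# job = client.train(script="train_yolo.py", container="afrilink-yolo",
#                    data=str(archive), data_config="dataset.yaml", gpus=1,
#                    time_limit="02:00:00")
```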

TrainJob has the same interface as FinetuneJob: .run(), .cancel(), .get_logs(), .status, .job_id, .estimated_cost_usd().

SDK Reference

AfriLinkClient

Method | Description
authenticate() | Full auth flow. Optionally pass dataspires_email and dataspires_password.
train(script, container, data, gpus, ...) | Create a TrainJob for general-purpose training. Call .run() to submit.
finetune(model, data, training_mode, gpus, ...) | Create a FinetuneJob for LLM fine-tuning. Call .run() to submit.
download_model(job_id, local_dir) | Download trained LoRA adapter weights. Ready for PeftModel.from_pretrained().
upload_dataset(local_path, dataset_name) | Upload a local dataset file to HPC storage.
list_containers() | List available training containers on HPC.
list_available_models(size=None) | Browse the model registry. Filter by size.
list_jobs() | View submitted SLURM jobs and their statuses.
recover_session(download_dir=None) | Re-authenticate and check/download tracked jobs.
cancel_job(job_id) | Cancel a running SLURM job.
run_command(cmd) | Execute a shell command on the HPC login node.
cert_minutes_remaining | Minutes until your SSH certificate expires (float).

TrainJob / FinetuneJob (returned by client.train() and client.finetune())

Method / Property | Description
run(wait=True) | Submit to SLURM. wait=True polls until done; wait=False returns after submission.
cancel() | Cancel the SLURM job.
get_logs(tail=100) | Fetch recent log lines from a running or completed job.
estimated_cost_usd() | Estimate max cost based on GPUs and time limit.
status | Current job status string.
job_id | AfriLink job ID (8-character UUID prefix).

run() returns a dict with job_id, slurm_job_id, status, and output_dir. Always check result["status"] before downloading.
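
One way to make the status check hard to forget is to wrap the download in a small guard. The helper below is a sketch, not part of the SDK: download_if_completed is a hypothetical name, and it relies only on the job_id and status keys of the result dict and on download_model(job_id, local_dir) as listed in the reference table.

```python
def download_if_completed(client, result, local_dir):
    """Download adapter weights only when the job reports status "completed".

    `client` is any object exposing download_model(job_id, local_dir), as
    AfriLinkClient does. Returns True when a download was attempted.
    """
    status = result.get("status")
    if status != "completed":
        print(f"Job {result.get('job_id')} has status {status!r}; nothing downloaded.")
        return False
    client.download_model(result["job_id"], local_dir)
    return True


# Usage with the real SDK:
# result = job.run(wait=True)
# if not download_if_completed(client, result, "./my-model"):
#     print(job.get_logs(tail=50))  # inspect the end of the job log
```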

Pricing

Pay-as-you-go at $2.00 per GPU-hour. Add credits via card payment or redeem voucher codes on your Billing Dashboard.

Session Recovery

SSH certificates expire after ~12 hours. The SDK warns at 60, 30, 15, and 5 minutes before expiry. Call recover_session() to pick up where you left off:

python
# Re-authenticate and download completed models
recovery = client.recover_session("./recovered-models")
print(recovery.re_authenticated)   # True if fresh cert obtained
print(recovery.jobs)                # status of each tracked job
print(recovery.files_retrieved)     # downloaded model directories

# Or just re-authenticate without downloading
client.recover_session()

What it does:

  1. Re-authenticates - fresh SSH certificate without re-entering credentials
  2. Checks all tracked SLURM jobs and reports their current status
  3. Downloads completed models automatically if you pass a download_dir
  4. Registers email notification for jobs still running

Jobs keep running on the cluster after cert expiry - you just need fresh credentials to check on them.
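
In a long-running notebook you may want to refresh the certificate before submitting or checking a job, rather than waiting for a warning. The sketch below assumes cert_minutes_remaining can be read as a float attribute on the client and that recover_session() can be called with no arguments; ensure_session and its 15-minute default threshold are hypothetical choices for this example.

```python
def ensure_session(client, min_minutes=15.0):
    """Refresh the session when the SSH certificate is close to expiry.

    Assumes client.cert_minutes_remaining is a readable float and that
    client.recover_session() takes no required arguments.
    Returns True when a recovery was triggered.
    """
    if client.cert_minutes_remaining < min_minutes:
        client.recover_session()
        return True
    return False


# Usage with the real SDK, before a long submission:
# ensure_session(client, min_minutes=30)
# result = job.run(wait=True)
```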

Built-in User Guide

Query the inline reference manual from any notebook cell - no internet required:

python
import afrilink

afrilink/help          # index of all topics
afrilink/quickstart    # getting started
afrilink/auth          # authentication & sessions
afrilink/finetune      # finetune parameters & modes
afrilink/specs         # models and GPU requirements
afrilink/datasets      # dataset formats and upload
afrilink/transfer      # file upload & download
afrilink/jobs          # SLURM job management

Working With Your Model

Once downloaded, adapter weights work directly with standard HuggingFace tooling.

Export to GGUF & run with Ollama

python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Merge adapter into base model
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
merged = PeftModel.from_pretrained(base, "./my-model").merge_and_unload()
merged.save_pretrained("./my-model-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B").save_pretrained("./my-model-merged")

# Then convert and quantize with llama.cpp, and load into Ollama (shell commands):
# python convert_hf_to_gguf.py ./my-model-merged --outfile my-model.gguf
# ./llama-quantize my-model.gguf my-model-q4.gguf Q4_K_M
# ollama create my-model -f Modelfile  &&  ollama run my-model

Publish to HuggingFace Hub

python
from huggingface_hub import HfApi

api = HfApi(token="hf_...")
repo_id = "your-username/my-finetuned-model"
api.create_repo(repo_id, exist_ok=True)

api.upload_folder(folder_path="./my-model", repo_id=repo_id)          # adapter only
api.upload_folder(folder_path="./my-model-merged", repo_id=repo_id)   # full merged model
api.upload_file(path_or_fileobj="./my-model-q4.gguf",
                path_in_repo="my-model-q4.gguf", repo_id=repo_id)       # GGUF

Inference

Hugging Face provides a Serverless Inference API as a way for users to quickly test and evaluate publicly accessible machine learning models for free. You can use the InferenceClient from the huggingface_hub Python library.

First, make sure to install the required packages: pip install -U huggingface_hub transformers and authenticate to the Hub with your User Access Token.

python
from huggingface_hub import InferenceClient

client = InferenceClient()

# Generate text with an open LLM
response = client.text_generation(
    prompt="A HTTP POST request is used to ",
    model="codellama/CodeLlama-7b-hf",
    temperature=0.8,
    max_new_tokens=50,
    seed=42,
    return_full_text=True,
)
print(response)

# Chat completion with instruct models
messages = [
    {"role": "system", "content": "You are an expert prompt engineer with artistic flair."},
    {"role": "user", "content": "Write a concise prompt for a fun image... Only return the prompt."},
]
for token in client.chat_completion(
    messages, model="meta-llama/Meta-Llama-3-8B-Instruct", max_tokens=250, stream=True, seed=42
):
    # delta.content can be None on some chunks, so fall back to an empty string
    print(token.choices[0].delta.content or "", end="")