init commit

This commit is contained in:
2026-02-25 13:25:56 +08:00
commit d6aa5f568a
6 changed files with 1739 additions and 0 deletions

2
.gitignore vendored Normal file

@@ -0,0 +1,2 @@
data/*
Qwen3-VL-2B-Instruct/*

342
README.md Executable file

@@ -0,0 +1,342 @@
# AICAS 2026 - Vision-Language Model Optimization Competition
## Table of Contents
- [Overview](#overview)
- [Code Structure](#code-structure)
- [Core Files](#core-files)
- [Quick Start](#quick-start)
- [Evaluation Metrics](#evaluation-metrics)
- [Competition Rules](#competition-rules)
- [Important Notes](#important-notes)
- [Submission Guidelines](#submission-guidelines)
## Overview
This competition focuses on optimizing Vision-Language Models (VLM) for inference performance. Participants are required to modify the `VLMModel` class in `evaluation_wrapper.py` to achieve better Time-To-First-Token (TTFT) and Throughput while maintaining accuracy.
## Code Structure
```
AICASGC/
├── benchmark.py              # Benchmark script (not recommended to modify)
├── evaluation_wrapper.py     # Model wrapper (participants implement optimizations here)
├── requirements.txt          # Python dependencies
├── data/                     # Validation dataset
│   ├── data-*.arrow          # Dataset files
│   ├── dataset_info.json     # Dataset metadata
│   └── state.json            # Dataset state
├── Qwen3-VL-2B-Instruct/     # Model weights directory (participants need to download)
└── README.md / README_CN.md  # Documentation
```
## Core Files
- **`benchmark.py`** - Self-testing benchmark script (⚠️ **Not recommended to modify**)
- **`evaluation_wrapper.py`** - Model wrapper where participants implement optimizations
- **`Qwen3-VL-2B-Instruct/`** - Competition model weights (participants need to download, see "Quick Start" section)
- **`data/`** - Validation dataset
- **`requirements.txt`** - Python dependencies
## Quick Start
### 0. Download Model (First Time)
The model files are large and need to be downloaded separately. Please create the model directory first, then download the model:
```bash
# Create model directory
mkdir -p Qwen3-VL-2B-Instruct
# Install huggingface_hub (if not installed)
pip install -U huggingface_hub
# Set mirror endpoint (recommended for users in China, faster download)
export HF_ENDPOINT=https://hf-mirror.com
# Download model to specified directory
huggingface-cli download \
  --resume-download \
  Qwen/Qwen3-VL-2B-Instruct \
  --local-dir ./Qwen3-VL-2B-Instruct \
  --local-dir-use-symlinks False
```
**Note:**
- Model size is approximately 4-5GB, download may take some time
- If download is interrupted, you can rerun the command and it will resume automatically (`--resume-download`)
- After download completes, the `Qwen3-VL-2B-Instruct/` folder will contain all model files
- Ensure you have sufficient disk space (at least 5GB)
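If you want to script the disk-space check mentioned above, the standard library is enough. This is a minimal sketch; `has_space_for_model` is an illustrative helper name (not part of the competition code), and the 5 GB threshold comes from the note above:

```python
import shutil

def has_space_for_model(path: str = ".", required_gb: float = 5.0) -> bool:
    """Return True if the filesystem containing `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * (1024 ** 3)

if not has_space_for_model("."):
    raise SystemExit("Need at least 5 GB free before downloading Qwen3-VL-2B-Instruct")
```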
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Run Test
```bash
python benchmark.py \
  --model-path ./Qwen3-VL-2B-Instruct \
  --dataset-path ./data \
  --output result.json \
  --num-samples 100
```
### 3. Implement Your Optimizations
Edit the `VLMModel` class in `evaluation_wrapper.py`. The optimization architecture uses **modular design**, where each optimization direction corresponds to an independent method.
#### 3.1 Explore Model Structure (Optional)
Before starting optimizations, you can explore the model structure to understand optimization targets:
```python
class VLMModel:
    def __init__(self, model_path: str, device: str = "cuda:0"):
        # ... load model ...

        # Optional: Explore model structure
        self._explore_model_structure()  # Will print model structure information
```
#### 3.2 Enable Optimization Methods
In the `__init__` method, enable/disable different optimizations by commenting/uncommenting:
```python
class VLMModel:
    def __init__(self, model_path: str, device: str = "cuda:0"):
        # ... load model ...

        # ================================================================
        # Participant Optimization Area - Enable/disable optimization methods
        # ================================================================

        # 1. Vision Encoder Acceleration (optimize high-resolution image processing)
        # self._optimize_vision_encoder()

        # 2. KV Cache Management (optimize memory fragmentation during generation)
        # self._optimize_kv_cache()

        # 3. Cross-modal Connector Optimization (optimize the cross-modal fusion layer)
        # self._optimize_cross_modal_connector()

        # 4. Flash Attention Optimization
        # self._enable_flash_attention()

        # 5. Quantization Optimization
        # self._apply_quantization()
```
#### 3.3 Implement Optimization Code
Implement your optimization logic in each optimization method. For example, optimizing Vision Encoder:
```python
def _optimize_vision_encoder(self):
    """Find this method in evaluation_wrapper.py and implement your optimization"""
    # Example: Replace the attention operator
    # from your_optimization import optimized_attention
    # if hasattr(self._model, 'vision_model'):
    #     for layer in self._model.vision_model.encoder.layers:
    #         layer.self_attn.forward = optimized_attention
    # TODO: Implement your Vision Encoder optimization
    pass
```
**Important Notes:**
- Benchmark directly calls `self.model.generate()` for performance testing
- Your optimizations should modify `self.model` or its operators via monkey patching inside the optimization methods
- All optimization methods are called in `__init__`, and optimizations take effect automatically
- The `generate()` method is optional and mainly for debugging
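The monkey-patching pattern described above can be illustrated on a toy class. This is a framework-agnostic sketch; `ToyLayer` and `optimized_forward` are placeholders, and on the real model you would rebind `forward` on the actual Qwen3-VL submodules you discover via `_explore_model_structure()`:

```python
class ToyLayer:
    """Stand-in for a model layer whose forward you want to replace."""
    def forward(self, x):
        return [v * 2 for v in x]

def optimized_forward(self, x):
    # Hypothetical faster implementation; must produce the same results.
    self.was_patched = True
    return [v + v for v in x]

layer = ToyLayer()
# The monkey patch: rebind forward on this instance only.
layer.forward = optimized_forward.__get__(layer, ToyLayer)
print(layer.forward([1, 2, 3]))  # [2, 4, 6]
```

The same instance-level rebinding generally works for `torch.nn.Module` layers, since calling `module(x)` dispatches through `self.forward`.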
### 4. Test Your Optimized Model
```bash
python benchmark.py \
  --model-path ./Qwen3-VL-2B-Instruct \
  --dataset-path ./data \
  --output result_optimized.json \
  --num-samples 100
```
### 5. Generate Full Results for Submission
```bash
python benchmark.py \
  --model-path ./Qwen3-VL-2B-Instruct \
  --dataset-path ./data \
  --output result.json \
  --num-samples 5000
```
## Evaluation Metrics
The final score is calculated as:
```
Final Score = 0.4 × Accuracy + 0.3 × TTFT_Improvement + 0.3 × Throughput_Improvement
```
### Metrics Explained
- **TTFT (Time To First Token)**: Time from input preparation to first token generation (in milliseconds)
  - Includes: image encoding, text encoding, cross-modal interaction, prefill stage, first token generation
  - Baseline: ~80ms
  - Improvement = (Baseline - Your_TTFT) / Baseline
- **Throughput**: End-to-end token generation rate (tokens per second)
  - Baseline: ~55 tokens/sec
  - Improvement = (Your_Throughput - Baseline) / Baseline
- **Accuracy**: VQA accuracy on validation set (5000 samples)
  - Soft matching with multiple ground truth answers
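Putting the score formula and the two improvement definitions together, the scoring arithmetic looks like this (a sketch using the approximate baselines quoted above; the official scorer may differ in rounding or clipping):

```python
def final_score(accuracy: float, ttft_ms: float, throughput: float,
                baseline_ttft_ms: float = 80.0,
                baseline_throughput: float = 55.0) -> float:
    """Final Score = 0.4 x Accuracy + 0.3 x TTFT_Improvement + 0.3 x Throughput_Improvement."""
    ttft_improvement = (baseline_ttft_ms - ttft_ms) / baseline_ttft_ms
    throughput_improvement = (throughput - baseline_throughput) / baseline_throughput
    return 0.4 * accuracy + 0.3 * ttft_improvement + 0.3 * throughput_improvement

# A model that exactly matches both baselines scores 0.4 * accuracy:
print(final_score(accuracy=0.8, ttft_ms=80.0, throughput=55.0))  # ~0.32
```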
## Competition Rules
### Critical Rules
1. **Do not modify `benchmark.py`**
   - This benchmark script is for self-testing only
   - Final evaluation will use a separate official benchmark system
   - Modifying this file may lead to inconsistencies between your local results and final evaluation results
2. **Only modify `evaluation_wrapper.py`**
3. **Maintain required properties**
   - The `VLMModel` class must expose `processor`, `model`, and `device` properties
   - Benchmark uses these properties to access the model and processor
   - The `generate()` method is optional and mainly for debugging
4. **Prohibited behaviors**
   - Do not hardcode answers
   - Do not modify the dataset
   - Do not use external APIs or services
   - All optimizations must be local and self-contained
### Optimization Directions
- Encouraged: Operator replacement and kernel optimization - Rewrite or replace standard operator implementations (such as Attention, LayerNorm, Conv2d, etc.) using Triton, CUDA C++, etc.
- Encouraged: Memory and cache optimization - Optimize KV Cache memory layout, reduce memory fragmentation, optimize GPU memory access patterns
- Encouraged: Compilation and graph optimization - Use torch.compile for computation graph optimization, custom kernel scheduling
- Encouraged: Attention mechanism optimization - Implement Flash Attention, memory-efficient attention, sparse attention
- Encouraged: Generation process optimization - Optimize decoding strategies, cache management, generation configuration parameters
**Not Permitted:**
- Using external services: Prohibited from calling external APIs, cloud services, or any functionality requiring network connection
- Data and answer cheating: Prohibited from training on test data, pre-computing answers, hardcoding outputs
- Model replacement and tampering: Participants should focus on operator-level optimization; do not train the model on additional datasets, change the model architecture, or directly modify weight values
- Overfitting optimization: Prohibited from using conditional branches or special processing for specific evaluation samples
- Black-box tool use: Merely modifying configuration files without substantive code contributions will not be recognized
- Environment manipulation: Prohibited from interfering with fair evaluation by modifying system environment, GPU frequency locking, etc.
## Important Notes
### Sample Selection
- The provided `benchmark.py` uses **fixed order** (first N samples from index 0)
- When you run `--num-samples 100`, it evaluates samples 0-99
- This ensures reproducibility for local self-testing
- **Note**: The official evaluation system used by the competition committee may employ different sampling strategies (including random sampling) for final verification
### Hardware Information
The benchmark automatically records detailed hardware information:
- Python version, PyTorch version, CUDA version
- GPU name, memory, compute capability
- CPU model, cores, frequency
- System information (OS, kernel, architecture)
- PPU information (if available)
This information is saved in `result.json` under `system_info` for statistical analysis.
### Performance Measurement
- **Warmup**: 10 samples are used for GPU warmup before actual measurement
- **TTFT Measurement**: Measures time from input preparation to first token (includes all preprocessing)
- **Throughput Measurement**: Measures end-to-end generation time for 128 tokens
- **State Isolation**: GPU cache is cleared between measurements to ensure fairness
### Random Seed
- The `--random-seed` parameter only affects PyTorch's random number generator
- It does **NOT** affect sample selection order (which is always fixed)
- Use it for reproducibility of model inference randomness
### Output Format
The `result.json` file contains:
```json
{
  "system_info": {
    "timestamp": "...",
    "python_version": "...",
    "torch_version": "...",
    "cuda_version": "...",
    "gpu_name": "...",
    ...
  },
  "performance": {
    "avg_ttft_ms": 90.55,
    "avg_throughput_tokens_per_sec": 57.77
  },
  "answers": [
    {
      "question_id": 34602,
      "prediction": "your answer text here"
    },
    ...
  ]
}
```
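Before uploading, it can be worth sanity-checking the generated file against this schema. A minimal sketch follows; `check_result` is an illustrative helper, not part of the official tooling:

```python
import json

def check_result(path: str) -> None:
    """Raise ValueError if result.json is missing the fields shown above."""
    with open(path, "r", encoding="utf-8") as f:
        result = json.load(f)
    for key in ("system_info", "performance", "answers"):
        if key not in result:
            raise ValueError(f"missing top-level key: {key}")
    perf = result["performance"]
    for key in ("avg_ttft_ms", "avg_throughput_tokens_per_sec"):
        if key not in perf:
            raise ValueError(f"missing performance metric: {key}")
    for answer in result["answers"]:
        if "question_id" not in answer or "prediction" not in answer:
            raise ValueError(f"malformed answer entry: {answer}")
    print(f"OK: {len(result['answers'])} answers, TTFT={perf['avg_ttft_ms']} ms")
```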
## Submission Guidelines
### Required Files for Preliminary Submission
1. **`result.json`** - Generated by running `benchmark.py`
   - Contains predictions for all samples
   - Must include valid `performance` metrics
   - **Important**: The `result.json` uploaded to the Tianchi platform is for reference only. Final scores will be evaluated by the competition committee using standardized hardware and the official evaluation system.
2. **Your optimized code** - `evaluation_wrapper.py` containing your optimized `VLMModel` class
3. **Docker image** - Container with your optimized environment
### Evaluation Process
1. **Self-Testing**: Use the provided `benchmark.py` to test your optimizations locally
2. **Submission**: Upload your `result.json` to the Tianchi platform (for reference only)
3. **Official Evaluation**: The competition committee will evaluate your code using:
   - Your submitted Docker image
   - A standardized hardware environment
   - The official evaluation code
   - The full validation set, with random sampling for verification
4. **Final Ranking**: Based on the final score calculated by the official evaluation system
## Good Luck!
We hope you will focus on operator-level optimization, kernel replacement, and efficient memory management. Remember: accuracy and speed are equally important! Good luck!

348
README_CN.md Executable file

@@ -0,0 +1,348 @@
# AICAS 2026 - Efficient VLM Inference and Optimization Track for AI Chips
## Table of Contents
- [Overview](#overview)
- [Code Structure](#code-structure)
- [Core Files](#core-files)
- [Quick Start](#quick-start)
- [Evaluation Metrics](#evaluation-metrics)
- [Competition Rules](#competition-rules)
- [Important Notes](#important-notes)
- [Submission Guidelines](#submission-guidelines)
## Overview
This competition focuses on optimizing the inference performance of Vision-Language Models (VLM). Participants must modify the `VLMModel` class in `evaluation_wrapper.py` to improve Time-To-First-Token (TTFT) and throughput while maintaining accuracy.
## Code Structure
```
AICASGC/
├── benchmark.py              # Benchmark script
├── evaluation_wrapper.py     # Model wrapper (participants implement optimizations here)
├── requirements.txt          # Python dependencies
├── data/                     # Validation dataset
│   ├── data-*.arrow          # Dataset files
│   ├── dataset_info.json     # Dataset metadata
│   └── state.json            # Dataset state
├── Qwen3-VL-2B-Instruct/     # Model weights directory (participants need to download)
└── README.md / README_CN.md  # Documentation
```
## Core Files
- **`benchmark.py`** - Self-testing benchmark script (⚠️ **Not recommended to modify**)
- **`evaluation_wrapper.py`** - Model wrapper where participants implement optimizations
- **`Qwen3-VL-2B-Instruct/`** - Competition model weights (participants need to download, see "Quick Start" section)
- **`data/`** - Validation dataset
- **`requirements.txt`** - Python dependencies
## Quick Start
### 0. Download Model (First Time)
The model files are large and need to be downloaded separately. Create the model directory first, then download the model:
```bash
# Create model directory
mkdir -p Qwen3-VL-2B-Instruct
# Install huggingface_hub (if not installed)
pip install -U huggingface_hub
# Set mirror endpoint (recommended for users in China, faster download)
export HF_ENDPOINT=https://hf-mirror.com
# Download model to specified directory
huggingface-cli download \
  --resume-download \
  Qwen/Qwen3-VL-2B-Instruct \
  --local-dir ./Qwen3-VL-2B-Instruct \
  --local-dir-use-symlinks False
```
**Note:**
- The model is approximately 4-5 GB; downloading may take some time
- If the download is interrupted, rerun the command and it will resume automatically (`--resume-download`)
- After the download completes, the `Qwen3-VL-2B-Instruct/` folder will contain all model files
- Make sure you have sufficient disk space (at least 5 GB)
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Run Test
```bash
python benchmark.py \
  --model-path ./Qwen3-VL-2B-Instruct \
  --dataset-path ./data \
  --output result.json \
  --num-samples 100
```
### 3. Implement Your Optimizations
Edit the `VLMModel` class in `evaluation_wrapper.py`. The optimizations follow a **modular design**: each optimization direction corresponds to an independent method.
#### 3.1 Explore Model Structure (Optional)
Before starting to optimize, you can explore the model structure to understand the optimization targets:
```python
class VLMModel:
    def __init__(self, model_path: str, device: str = "cuda:0"):
        # ... load model ...

        # Optional: explore the model structure
        self._explore_model_structure()  # Prints model structure information
```
#### 3.2 Enable Optimization Methods
In the `__init__` method, enable/disable different optimizations by commenting/uncommenting:
```python
class VLMModel:
    def __init__(self, model_path: str, device: str = "cuda:0"):
        # ... load model ...

        # ================================================================
        # Participant Optimization Area - Enable/disable optimization methods
        # ================================================================

        # 1. Vision Encoder acceleration (optimize high-resolution image processing)
        # self._optimize_vision_encoder()

        # 2. KV Cache management (optimize memory fragmentation during generation)
        # self._optimize_kv_cache()

        # 3. Cross-modal fusion layer optimization (optimize the Cross-modal Connector)
        # self._optimize_cross_modal_connector()

        # 4. Flash Attention optimization
        # self._enable_flash_attention()

        # 5. Quantization optimization
        # self._apply_quantization()
```
#### 3.3 Implement Optimization Code
Implement your optimization logic in each optimization method. For example, optimizing the Vision Encoder:
```python
def _optimize_vision_encoder(self):
    """Find this method in evaluation_wrapper.py and implement your optimization"""
    # Example: replace the attention operator
    # from your_optimization import optimized_attention
    # if hasattr(self._model, 'vision_model'):
    #     for layer in self._model.vision_model.encoder.layers:
    #         layer.self_attn.forward = optimized_attention
    # TODO: implement your Vision Encoder optimization
    pass
```
### 4. Test Your Optimized Model
```bash
python benchmark.py \
  --model-path ./Qwen3-VL-2B-Instruct \
  --dataset-path ./data \
  --output result_optimized.json \
  --num-samples 100
```
### 5. Generate Full Results for Submission
```bash
python benchmark.py \
  --model-path ./Qwen3-VL-2B-Instruct \
  --dataset-path ./data \
  --output result.json \
  --num-samples 5000
```
## Evaluation Metrics
The final score is calculated as:
```
Final Score = 0.4 × Accuracy + 0.3 × TTFT_Improvement + 0.3 × Throughput_Improvement
```
### Metrics Explained
- **TTFT (Time To First Token)**: Time from input preparation to generation of the first token (in milliseconds)
  - Includes image encoding, text encoding, cross-modal interaction, the prefill stage, and first-token generation
  - Baseline: ~80ms
  - Improvement = (Baseline - Your_TTFT) / Baseline
- **Throughput**: End-to-end token generation rate (tokens per second)
  - Baseline: ~55 tokens/sec
  - Improvement = (Your_Throughput - Baseline) / Baseline
- **Accuracy**: VQA accuracy on the validation set (5000 samples)
  - Soft matching against multiple ground-truth answers
## Competition Rules
### Critical Rules
1. **Do not modify `benchmark.py`**
   - This benchmark script is for self-testing only
   - The final evaluation will use a separate official benchmark system
   - Modifying this file may lead to inconsistencies between your local results and the final evaluation results
2. **Only modify `evaluation_wrapper.py`**
3. **Maintain the required properties**
   - The `VLMModel` class must expose the `processor`, `model`, and `device` properties
   - The benchmark uses these properties to access the model and processor
   - The `generate()` method is optional and mainly for debugging
4. **Prohibited behaviors**
   - Do not hardcode answers
   - Do not modify the dataset
   - Do not use external APIs or services
   - All optimizations must be local and self-contained
### Optimization Directions
- Encouraged: Operator replacement and kernel optimization - rewrite or replace standard operator implementations (e.g., Attention, LayerNorm, Conv2d) using Triton, CUDA C++, etc.
- Encouraged: Memory and cache optimization - optimize the KV Cache memory layout, reduce memory fragmentation, optimize GPU memory access patterns
- Encouraged: Compilation and graph optimization - use torch.compile for computation graph optimization and custom kernel scheduling
- Encouraged: Attention mechanism optimization - implement Flash Attention, memory-efficient attention, sparse attention
- Encouraged: Generation process optimization - optimize decoding strategies, cache management, and generation configuration parameters
**Not Permitted:**
- Using external services: calling external APIs, cloud services, or any functionality requiring a network connection is prohibited
- Data and answer cheating: training on test data, pre-computing answers, and hardcoding outputs are prohibited
- Model replacement and tampering: participants should focus on operator-level optimization; do not train the model on additional datasets, change the model architecture, or directly modify weight values
- Overfitting optimization: conditional branches or special handling for specific evaluation samples are prohibited
- Black-box tool use: merely modifying configuration files without substantive code contributions will not be recognized
- Environment manipulation: interfering with fair evaluation by modifying the system environment, locking GPU frequencies, etc. is prohibited
## Important Notes
### Sample Selection
- The provided `benchmark.py` uses a **fixed order** (the first N samples, starting from index 0)
- When you run `--num-samples 100`, it evaluates samples 0-99
- This ensures reproducibility for local self-testing
- **Note**: The official evaluation system used by the competition committee may employ different sampling strategies (including random sampling) for final verification
### Hardware Information
The benchmark automatically records detailed hardware information:
- Python version, PyTorch version, CUDA version
- GPU name, memory, compute capability
- CPU model, core count, frequency
- System information (OS, kernel, architecture)
- PPU information (if available)
This information is saved in `result.json` under the `system_info` field for statistical analysis.
### Performance Measurement
- **Warmup**: 10 samples are used for GPU warmup before the actual measurement
- **TTFT measurement**: measures the time from input preparation to the first token (includes all preprocessing)
- **Throughput measurement**: measures the end-to-end generation time for 128 tokens
- **State isolation**: the GPU cache is cleared between measurements to ensure fairness
### Random Seed
- The `--random-seed` parameter only affects PyTorch's random number generator
- It does **NOT** affect the sample selection order (which is always fixed)
- Use it for reproducibility of model inference randomness
### Output Format
The `result.json` file contains:
```json
{
  "system_info": {
    "timestamp": "...",
    "python_version": "...",
    "torch_version": "...",
    "cuda_version": "...",
    "gpu_name": "...",
    ...
  },
  "performance": {
    "avg_ttft_ms": 90.55,
    "avg_throughput_tokens_per_sec": 57.77
  },
  "answers": [
    {
      "question_id": 34602,
      "prediction": "your answer text here"
    },
    ...
  ]
}
```
## Submission Guidelines
### Required Files for Preliminary Submission
1. **`result.json`** - generated by running `benchmark.py`
   - Contains predictions for all samples
   - Must include valid `performance` metrics
   - **Important**: The `result.json` uploaded to the Tianchi platform is for reference only. Final scores will be evaluated by the competition committee using standardized hardware and the official evaluation system.
2. **Your optimized code** - `evaluation_wrapper.py` containing your optimized `VLMModel` class
3. **Docker image** - container with your optimized environment
### Evaluation Process
1. **Self-testing**: use the provided `benchmark.py` to test your optimizations locally
2. **Submission**: upload your `result.json` to the Tianchi platform (for reference only)
3. **Official evaluation**: the competition committee will evaluate your code using:
   - Your submitted Docker image
   - A standardized hardware environment
   - The official evaluation code
   - The full validation set, with random sampling for verification
4. **Final ranking**: based on the final score calculated by the official evaluation system
## Good Luck!
We hope you will focus on operator-level optimization, kernel replacement, and efficient memory management. Remember: accuracy and speed are equally important! Good luck!

613
benchmark.py Executable file

@@ -0,0 +1,613 @@
#!/usr/bin/env python3
"""
AICAS 2026 - Self-Testing Benchmark Tool
Measures TTFT and Throughput, generates result.json for self-testing.
Note: It is recommended not to modify this file. This benchmark is intended for
self-testing purposes only. The final evaluation will be conducted using a
separate official benchmark system on standardized hardware by the competition
committee.
"""
import sys
import json
import time
import argparse
import platform
import subprocess
from datetime import datetime
from pathlib import Path
import torch
from PIL import Image
from datasets import load_from_disk
from tqdm import tqdm
try:
import psutil
HAS_PSUTIL = True
except ImportError:
HAS_PSUTIL = False
from evaluation_wrapper import VLMModel
# Fixed parameters - Not recommended to modify
MAX_NEW_TOKENS = 128 # Token length for performance testing
ACCURACY_MAX_TOKENS = 1024 # Token length for accuracy testing
WARMUP_SAMPLES = 10 # Warmup samples for GPU stabilization
PERFORMANCE_SAMPLES = None # Performance test samples (None = all samples)
VAL_SAMPLES = 5000 # Total validation samples
def get_system_info() -> dict:
"""Collect system information (hardware and software environment)"""
info = {
"timestamp": datetime.now().isoformat(),
}
# Python environment
info["python_version"] = sys.version.split()[0]
info["python_full_version"] = sys.version
# PyTorch information
info["torch_version"] = torch.__version__
# CUDA information
if torch.cuda.is_available():
info["cuda_available"] = True
info["cuda_version"] = torch.version.cuda if hasattr(torch.version, 'cuda') else "N/A"
try:
if torch.backends.cudnn.is_available():
info["cudnn_version"] = str(torch.backends.cudnn.version())
else:
info["cudnn_version"] = "N/A"
except:
info["cudnn_version"] = "N/A"
# GPU information
info["gpu_count"] = torch.cuda.device_count()
info["gpu_name"] = torch.cuda.get_device_name(0)
# GPU memory
try:
gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3) # GB
info["gpu_memory_gb"] = round(gpu_memory, 2)
except:
info["gpu_memory_gb"] = "N/A"
# GPU compute capability
try:
compute_capability = torch.cuda.get_device_properties(0).major, torch.cuda.get_device_properties(0).minor
info["gpu_compute_capability"] = f"{compute_capability[0]}.{compute_capability[1]}"
except:
info["gpu_compute_capability"] = "N/A"
else:
info["cuda_available"] = False
info["cuda_version"] = "N/A"
info["gpu_count"] = 0
info["gpu_name"] = "N/A"
# CPU information
info["cpu_processor"] = platform.processor() or "N/A"
if HAS_PSUTIL:
try:
info["cpu_count_physical"] = psutil.cpu_count(logical=False)
info["cpu_count_logical"] = psutil.cpu_count(logical=True)
cpu_freq = psutil.cpu_freq()
if cpu_freq:
info["cpu_freq_mhz"] = round(cpu_freq.current, 2) if cpu_freq.current else "N/A"
else:
info["cpu_freq_mhz"] = "N/A"
except:
info["cpu_count_physical"] = "N/A"
info["cpu_count_logical"] = "N/A"
info["cpu_freq_mhz"] = "N/A"
else:
info["cpu_count_physical"] = "N/A"
info["cpu_count_logical"] = "N/A"
info["cpu_freq_mhz"] = "N/A"
# Try to get CPU model from /proc/cpuinfo (Linux)
try:
if platform.system() == "Linux":
with open("/proc/cpuinfo", "r") as f:
for line in f:
if "model name" in line.lower():
info["cpu_model"] = line.split(":")[1].strip()
break
elif "Processor" in line and ":" in line:
info["cpu_model"] = line.split(":")[1].strip()
break
except:
pass
if "cpu_model" not in info:
info["cpu_model"] = platform.processor() or "N/A"
# System information
info["platform_system"] = platform.system()
info["platform_release"] = platform.release()
info["platform_version"] = platform.version()
info["platform_machine"] = platform.machine()
info["platform_architecture"] = platform.architecture()[0]
# PPU information (if available)
info["ppu_available"] = False
info["ppu_info"] = {}
# Check for PPU-related devices
try:
if torch.cuda.is_available():
gpu_name = torch.cuda.get_device_name(0).lower()
if "ppu" in gpu_name or "pu" in gpu_name:
info["ppu_available"] = True
info["ppu_info"] = {
"name": torch.cuda.get_device_name(0),
"type": "detected_from_gpu_name"
}
except:
pass
# Try to get detailed GPU info via nvidia-smi (if available)
if torch.cuda.is_available() and platform.system() == "Linux":
try:
result = subprocess.run(
["nvidia-smi", "--query-gpu=name,driver_version,memory.total", "--format=csv,noheader"],
capture_output=True,
text=True,
timeout=5
)
if result.returncode == 0:
lines = result.stdout.strip().split("\n")
if lines:
parts = lines[0].split(",")
if len(parts) >= 3:
info["gpu_driver_version"] = parts[1].strip() if len(parts) > 1 else "N/A"
info["gpu_memory_total"] = parts[2].strip() if len(parts) > 2 else "N/A"
except:
pass
# Memory information
if HAS_PSUTIL:
try:
mem = psutil.virtual_memory()
info["memory_total_gb"] = round(mem.total / (1024**3), 2)
info["memory_available_gb"] = round(mem.available / (1024**3), 2)
except:
pass
return info
def measure_performance(model: VLMModel, image: Image.Image, question: str) -> tuple:
"""
Measure performance metrics (TTFT and Throughput)
TTFT measurement: Full model call time (generating 1 token)
Includes: image encoding, text encoding, cross-modal interaction, prefill, first token generation
Args:
model: VLMModel instance (must expose processor and model attributes)
image: PIL Image
question: Question text
Returns:
tuple: (ttft, throughput, token_count)
"""
if not hasattr(model, 'processor') or not hasattr(model, 'model'):
raise AttributeError("Model must expose 'processor' and 'model' attributes")
processor = model.processor
device = model.device
model_obj = model.model
# Clear GPU state
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
# Prepare inputs
messages = [{
"role": "user",
"content": [
{"type": "image", "image": image},
{"type": "text", "text": question}
]
}]
inputs = processor.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_dict=True,
return_tensors="pt"
).to(device)
input_len = inputs.input_ids.shape[1]
# Step 1: Measure TTFT (generate 1 token, includes all preprocessing)
try:
torch.cuda.synchronize()
start_ttft = time.perf_counter()
# Direct call to underlying model
with torch.no_grad():
output_ids_ttft = model_obj.generate(
**inputs,
max_new_tokens=1,
do_sample=False,
temperature=0.0,
use_cache=True
)
torch.cuda.synchronize()
ttft = time.perf_counter() - start_ttft
except torch.cuda.OutOfMemoryError as e:
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
print(f"[Error] OOM during TTFT measurement: {e}")
return float('inf'), 0.0, 0
except Exception as e:
print(f"[Error] Error during TTFT measurement: {e}")
import traceback
traceback.print_exc()
return float('inf'), 0.0, 0
# Clear state
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
time.sleep(0.005) # Ensure state reset
# Step 2: Measure full generation (for Throughput)
try:
torch.cuda.synchronize()
start_full = time.perf_counter()
# Direct call to underlying model
with torch.no_grad():
output_ids = model_obj.generate(
**inputs,
max_new_tokens=MAX_NEW_TOKENS,
do_sample=False,
temperature=0.0,
use_cache=True
)
torch.cuda.synchronize()
total_time = time.perf_counter() - start_full
# Extract generated tokens
generated_ids = output_ids[0][input_len:]
token_count = len(generated_ids)
except torch.cuda.OutOfMemoryError as e:
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
print(f"[Error] OOM during full generation: {e}")
return ttft, 0.0, 0
except Exception as e:
print(f"[Error] Error during full generation: {e}")
import traceback
traceback.print_exc()
return ttft, 0.0, 0
# Calculate throughput
if total_time > 0.001 and token_count > 0:
throughput = token_count / total_time
else:
throughput = 0.0
return ttft, throughput, token_count
def generate_answer(model: VLMModel, image: Image.Image, question: str, max_new_tokens: int = ACCURACY_MAX_TOKENS) -> dict:
"""
Generate full answer (for accuracy evaluation)
Args:
model: VLMModel instance
image: PIL Image
question: Question text
max_new_tokens: Maximum tokens to generate
Returns:
dict: {"text": str, "token_count": int}
"""
if not hasattr(model, 'processor') or not hasattr(model, 'model'):
# Fallback: use generate method
return model.generate(image, question, max_new_tokens=max_new_tokens)
processor = model.processor
device = model.device
model_obj = model.model
# Prepare inputs
messages = [{
"role": "user",
"content": [
{"type": "image", "image": image},
{"type": "text", "text": question}
]
}]
inputs = processor.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_dict=True,
return_tensors="pt"
).to(device)
input_len = inputs.input_ids.shape[1]
# Generate answer using underlying model
with torch.no_grad():
output_ids = model_obj.generate(
**inputs,
max_new_tokens=max_new_tokens,
do_sample=False,
temperature=0.0,
use_cache=True
)
# Extract generated tokens
generated_ids = output_ids[0][input_len:]
text = processor.tokenizer.decode(
generated_ids,
skip_special_tokens=True,
clean_up_tokenization_spaces=False
)
return {
"text": text,
"token_count": len(generated_ids)
}
def run_benchmark(
model_class,
model_path: str,
dataset_path: str,
output_path: str,
num_samples: int = None,
random_seed: int = None
):
"""
Run benchmark evaluation
Process:
1. Load participant model
2. Measure TTFT and Throughput
3. Generate answers
4. Calculate statistics
5. Save results
Args:
random_seed: Random seed for reproducibility
"""
# Set random seed (if provided)
if random_seed is not None:
import random
import numpy as np
random.seed(random_seed)
np.random.seed(random_seed)
torch.manual_seed(random_seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(random_seed)
# Clear GPU cache
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
# Load dataset
print("=" * 60)
print("AICAS 2026 Benchmark Tool")
print("=" * 60)
print(f"\nLoading dataset from: {dataset_path}")
dataset = load_from_disk(dataset_path)
total_samples = num_samples or min(VAL_SAMPLES, len(dataset))
# Performance test samples
if PERFORMANCE_SAMPLES is None:
perf_samples = total_samples # Test all samples
else:
perf_samples = min(PERFORMANCE_SAMPLES, total_samples)
print(f"Total samples: {total_samples}")
print(f"Performance test samples: {perf_samples}")
# Prepare samples (fixed order: first N samples)
samples = []
for i in range(total_samples):
item = dataset[i]
samples.append({
"question_id": item.get("question_id", i),
"image": item["image"],
"question": item["question"],
})
results = {
"system_info": get_system_info(),
"performance": {},
"answers": []
}
# Load and test participant model
print("\n" + "=" * 60)
print("Running Model Benchmark")
print("=" * 60)
model = model_class(model_path)
# Warmup
print(f"\nWarming up ({WARMUP_SAMPLES} samples)...")
for i in range(min(WARMUP_SAMPLES, len(samples))):
try:
generate_answer(model, samples[i]["image"], samples[i]["question"], max_new_tokens=10)
except Exception as e:
print(f"[Warning] Warmup sample {i} failed: {e}")
# Clear state after warmup
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
# Performance testing + answer generation
ttfts = []
throughputs = []
predictions = []
print(f"\nMeasuring performance & generating answers...")
# Performance test samples: measure performance + generate full answers
for sample in tqdm(samples[:perf_samples], desc="Performance"):
# Clear state before each measurement for fairness
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
try:
# Step 1: Measure performance
ttft, throughput, token_count = measure_performance(
model, sample["image"], sample["question"]
)
# Check for failures
if ttft == float('inf') or throughput == 0.0:
print(f"[Warning] Sample {sample['question_id']} failed (TTFT={ttft}, Throughput={throughput})")
else:
ttfts.append(ttft)
throughputs.append(throughput)
# Clear state again before generating full answer
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
# Step 2: Generate full answer (for accuracy evaluation)
try:
result_full = generate_answer(
model,
sample["image"],
sample["question"],
max_new_tokens=ACCURACY_MAX_TOKENS
)
predictions.append({
"question_id": sample["question_id"],
"prediction": result_full["text"]
})
except Exception as e:
print(f"[Error] Error generating full answer for sample {sample['question_id']}: {e}")
predictions.append({
"question_id": sample["question_id"],
"prediction": ""
})
except Exception as e:
print(f"[Error] Sample {sample['question_id']} failed: {e}")
predictions.append({
"question_id": sample["question_id"],
"prediction": ""
})
continue
# If there are remaining samples, only generate answers
if total_samples > perf_samples:
for sample in tqdm(samples[perf_samples:], desc="Accuracy"):
try:
result = generate_answer(
model,
sample["image"],
sample["question"],
max_new_tokens=ACCURACY_MAX_TOKENS
)
predictions.append({
"question_id": sample["question_id"],
"prediction": result["text"]
})
except Exception as e:
print(f"[Error] Error generating answer for sample {sample['question_id']}: {e}")
predictions.append({
"question_id": sample["question_id"],
"prediction": ""
})
# Calculate statistics
if len(ttfts) > 0:
avg_ttft = sum(ttfts) / len(ttfts) * 1000 # Convert to ms
avg_throughput = sum(throughputs) / len(throughputs)
else:
avg_ttft = float('inf')
avg_throughput = 0.0
# Build performance results
performance = {
"avg_ttft_ms": round(avg_ttft, 2) if avg_ttft != float('inf') else None,
"avg_throughput_tokens_per_sec": round(avg_throughput, 2),
}
results["performance"] = performance
results["answers"] = predictions
# Print summary
if len(ttfts) > 0:
print(f"\n✓ TTFT: {avg_ttft:.2f} ms")
print(f"✓ Throughput: {avg_throughput:.2f} tokens/sec")
else:
print(f"\n✗ All samples failed!")
# Save results
with open(output_path, "w", encoding="utf-8") as f:
json.dump(results, f, indent=2, ensure_ascii=False)
print("\n" + "=" * 60)
print("Benchmark Complete!")
print("=" * 60)
print(f"\n📊 Results Summary:")
if len(ttfts) > 0:
print(f" TTFT: {avg_ttft:.2f} ms")
print(f" Throughput: {avg_throughput:.2f} tokens/sec")
else:
print(f" ⚠ All samples failed!")
print(f" Samples evaluated: {total_samples}")
print(f"\n💾 Results saved to: {output_path}")
return results
def main():
parser = argparse.ArgumentParser(description="AICAS 2026 Benchmark Tool")
parser.add_argument("--model-path", type=str, default="./Qwen3-VL-2B-Instruct", help="Path to model weights")
parser.add_argument("--dataset-path", type=str, default="./data", help="Path to validation dataset")
parser.add_argument("--output", type=str, default="result.json", help="Output JSON file path")
parser.add_argument("--num-samples", type=int, default=None, help="Number of samples to evaluate (default: all)")
parser.add_argument("--random-seed", type=int, default=None, help="Random seed for reproducibility")
args = parser.parse_args()
# Use VLMModel (participants modify this class in evaluation_wrapper.py)
print("=" * 60)
print("Using VLMModel (modify evaluation_wrapper.py to add optimizations)")
print("=" * 60)
# Run benchmark
run_benchmark(
model_class=VLMModel,
model_path=args.model_path,
dataset_path=args.dataset_path,
output_path=args.output,
num_samples=args.num_samples,
random_seed=args.random_seed
)
if __name__ == "__main__":
main()
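The statistics step above (average TTFT converted to milliseconds, average throughput in tokens/sec, `None`/`0.0` fallbacks when every sample failed) can be factored into a small pure function for testing. A minimal sketch; the helper name `summarize_performance` is hypothetical and not part of the benchmark API:

```python
from typing import Dict, List, Optional


def summarize_performance(ttfts: List[float],
                          throughputs: List[float]) -> Dict[str, Optional[float]]:
    """Mirror of the benchmark's statistics block: per-sample TTFTs are in
    seconds and averaged to milliseconds; throughputs are averaged as-is.
    Matches the fallback values written to result.json when all samples fail."""
    if ttfts:
        avg_ttft_ms = sum(ttfts) / len(ttfts) * 1000  # seconds -> milliseconds
        avg_throughput = sum(throughputs) / len(throughputs)
        return {
            "avg_ttft_ms": round(avg_ttft_ms, 2),
            "avg_throughput_tokens_per_sec": round(avg_throughput, 2),
        }
    return {"avg_ttft_ms": None, "avg_throughput_tokens_per_sec": 0.0}
```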

evaluation_wrapper.py Executable file

"""
AICAS 2026 - Participant Core Modification File
Participants should modify the VLMModel class to implement optimizations.
Note:
- Benchmark directly calls self.model.generate() for performance testing.
- Your optimizations should modify self.model or its operators in __init__ via Monkey Patch.
- The generate() method is optional and mainly for debugging.
"""
from typing import Dict
try:
    from PIL import Image
except ImportError:
    # Stub for testing without PIL; the nested class keeps the
    # Image.Image type hint in generate() resolvable
    class Image:
        class Image:
            pass
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
class VLMModel:
"""
Participant optimization class - modify this to implement optimizations.
Optimization Architecture:
- Split optimizations into separate methods for isolation and testing
- Enable/disable each optimization independently in __init__
- Each optimization method can be tested individually
Important Notes:
1. Benchmark directly calls self.model.generate() for performance testing.
2. Your optimizations should modify self.model or its operators via Monkey Patch.
3. All optimizations are applied in __init__ by calling optimization methods.
"""
def __init__(self, model_path: str, device: str = "cuda:0"):
"""
Initialize model and apply optimizations.
Args:
model_path: Qwen3-VL-2B-Instruct model path
device: CUDA device, e.g., "cuda:0"
"""
self._device = device
self.model_path = model_path
# Load processor
print(f"[VLMModel] Loading processor from {model_path}...")
self._processor = AutoProcessor.from_pretrained(model_path)
# Load model
print(f"[VLMModel] Loading model with FP16...")
self._model = AutoModelForImageTextToText.from_pretrained(
model_path,
torch_dtype=torch.float16,
device_map=device
)
self._model.eval()
# Track applied optimizations
self._optimizations_applied = []
# ================================================================
# Participant Optimization Area - Enable/disable optimizations here
# Uncomment the optimization methods you want to apply
# ================================================================
# 1. Vision Encoder Acceleration
# self._optimize_vision_encoder()
# 2. KV Cache Management
# self._optimize_kv_cache()
# 3. Cross-modal Connector Optimization
# self._optimize_cross_modal_connector()
# 4. Flash Attention Optimization
# self._enable_flash_attention()
# 5. Quantization
# self._apply_quantization()
# Optional: Explore model structure before optimization
# self._explore_model_structure()
# ================================================================
print(f"[VLMModel] Model loaded successfully on {device}")
if self._optimizations_applied:
print(f"[VLMModel] Applied optimizations: {', '.join(self._optimizations_applied)}")
# ================================================================
# Optimization Methods - Implement your optimizations here
# ================================================================
def _explore_model_structure(self):
"""
Helper method to explore model structure.
Use this to understand the model architecture before implementing optimizations.
This helps identify where to apply monkey patches.
"""
print("=" * 60)
print("Model Structure Exploration")
print("=" * 60)
        # Explore vision model structure
        # (Qwen-VL-family checkpoints typically expose the vision tower as
        # `visual` with `blocks`; other VLMs use `vision_model` with
        # `encoder.layers`)
        vision = getattr(self._model, 'vision_model', None) or getattr(self._model, 'visual', None)
        if vision is not None:
            print(f"Vision Model: {type(vision)}")
            encoder = getattr(vision, 'encoder', vision)
            layers = getattr(encoder, 'layers', None) or getattr(encoder, 'blocks', None)
            if layers is not None and len(layers) > 0:
                print(f"  Vision Encoder Layers: {len(layers)}")
                # Show first layer structure
                print(f"  First Layer Type: {type(layers[0])}")
        else:
            print("Vision Model: Not found (model structure may differ)")
        # Explore language model structure
        # (newer transformers releases nest the decoder as
        # model.language_model instead of exposing layers directly)
        if hasattr(self._model, 'model'):
            lm = getattr(self._model.model, 'language_model', self._model.model)
            print(f"Language Model: {type(lm)}")
            if hasattr(lm, 'layers'):
                print(f"  Language Model Layers: {len(lm.layers)}")
        else:
            print("Language Model: Not found (model structure may differ)")
# Explore cross-modal components
cross_modal_attrs = ['connector', 'cross_attn', 'cross_attention', 'proj', 'projector']
found_components = []
for attr in cross_modal_attrs:
if hasattr(self._model, attr):
found_components.append(attr)
if found_components:
print(f"Cross-modal Components: {', '.join(found_components)}")
else:
print("Cross-modal Components: Explore manually (structure may vary)")
print("=" * 60)
print("Tip: Use print(self._model) to see full model structure")
print("=" * 60)
def _optimize_vision_encoder(self):
"""
Optimize Vision Encoder for high-resolution image inputs.
Optimization Directions:
1. Patch embedding convolution optimization
2. Vision Transformer attention mechanism optimization
3. Layer normalization optimization
4. Memory-efficient image processing
Implementation Steps:
1. Inspect model structure: call self._explore_model_structure()
2. Identify bottlenecks using profiling tools (PyTorch Profiler, nsys, etc.)
3. Implement optimized operators (Triton/CUDA kernels)
4. Replace original operators via monkey patch
Target Components:
- self._model.vision_model (if exists)
- Vision encoder layers and attention mechanisms
- Convolution operations in patch embedding
"""
# TODO: Implement your Vision Encoder optimization here
#
# Example workflow:
# 1. from your_optimization import optimized_attention, optimized_conv
# 2. Inspect: print(self._model.vision_model) to find target layers
# 3. Replace: layer.self_attn.forward = optimized_attention
# 4. Test: Run benchmark to verify improvement
if 'vision_encoder' not in self._optimizations_applied:
self._optimizations_applied.append('vision_encoder')
def _optimize_kv_cache(self):
"""
Optimize KV Cache management to reduce memory fragmentation.
Optimization Directions:
1. Memory layout optimization (contiguous memory allocation)
2. Fragmentation-free allocation strategies
3. Efficient cache reuse patterns
4. Dynamic cache sizing
Implementation Steps:
1. Understand current KV cache implementation in model layers
2. Design memory-efficient cache allocation strategy
3. Implement custom KV cache allocator if needed
4. Apply optimizations via monkey patch or config modification
Target Components:
- self._model.config (cache configuration)
- Attention layers (KV cache allocation)
- Generation loop (cache management)
"""
# Enable KV Cache first
self._model.config.use_cache = True
if hasattr(self._model.config, 'pad_token_id'):
if self._model.config.pad_token_id is None:
self._model.config.pad_token_id = self._model.config.eos_token_id
# TODO: Implement advanced KV Cache optimizations here
#
# Example workflow:
# 1. from your_optimization import FragmentationFreeKVCache
# 2. for layer in self._model.model.layers:
# 3. layer.attention.custom_kv_cache = FragmentationFreeKVCache()
# 4. Test: Monitor memory usage and generation speed
if 'kv_cache' not in self._optimizations_applied:
self._optimizations_applied.append('kv_cache')
def _optimize_cross_modal_connector(self):
"""
Optimize Cross-modal Connector computation efficiency.
Optimization Directions:
1. Cross-attention mechanism optimization
2. Vision-to-language projection optimization
3. Multi-modal fusion layer efficiency
4. Feature alignment and transformation optimization
Implementation Steps:
1. Identify cross-modal components using self._explore_model_structure()
2. Profile cross-modal operations to find bottlenecks
3. Implement optimized cross-attention or projection kernels
4. Replace original operations via monkey patch
Note: Qwen3-VL's cross-modal structure may vary.
Use model exploration to identify actual component names and locations.
"""
# TODO: Implement your Cross-modal Connector optimization here
#
# Example workflow:
# 1. Explore: self._explore_model_structure() to find connector components
# 2. from your_optimization import optimized_cross_attention
# 3. Identify: Inspect model to find cross-attention layers
# 4. Replace: connector.cross_attention.forward = optimized_cross_attention
# 5. Test: Verify accuracy and performance improvements
if 'cross_modal' not in self._optimizations_applied:
self._optimizations_applied.append('cross_modal')
def _enable_flash_attention(self):
"""
Enable or implement Flash Attention optimization.
Implementation Approaches:
Approach 1: Enable PyTorch's Built-in Flash Attention (Simple)
- Uses torch.backends.cuda.enable_flash_sdp(True)
- Easy to enable but limited customization
- May not work for all attention patterns in Qwen3-VL
Approach 2: Implement Custom Flash Attention (Advanced, Recommended)
- Write custom Triton/CUDA kernels for attention computation
- Replace torch.nn.functional.scaled_dot_product_attention
- Full control over attention computation and memory layout
- Better performance potential but requires more implementation effort
Recommended: Implement Approach 2 for better performance gains.
Use profiling to identify which attention operations benefit most from optimization.
"""
# TODO: Choose and implement your Flash Attention approach
# Approach 1: Simple (enable PyTorch built-in)
# torch.backends.cuda.enable_flash_sdp(True)
# Approach 2: Advanced (custom implementation - recommended)
# from your_optimization import custom_flash_attention
# torch.nn.functional.scaled_dot_product_attention = custom_flash_attention
#
# Or replace at layer level:
# for layer in self._model.model.layers:
# layer.self_attn.forward = custom_attention_with_flash
if 'flash_attention' not in self._optimizations_applied:
self._optimizations_applied.append('flash_attention')
def _apply_quantization(self):
"""
Apply quantization to reduce model size and speed up inference.
Optimization Directions:
1. INT8 quantization (8-bit integer)
2. FP8 quantization (8-bit floating point)
3. Mixed precision quantization
4. Dynamic vs static quantization
Implementation Steps:
1. Choose quantization strategy based on accuracy/performance trade-off
2. Use quantization libraries (BitsAndBytes, TensorRT, etc.)
3. Calibrate quantized model on validation data
4. Verify accuracy preservation
Note: Quantization may require reloading the model with quantization config.
Consider applying quantization before other optimizations if model reload is needed.
"""
# TODO: Implement your quantization here
#
# Example workflow:
# 1. from transformers import BitsAndBytesConfig
# 2. quantization_config = BitsAndBytesConfig(load_in_8bit=True)
# 3. Note: May need to reload model with quantization config
# 4. Test: Verify accuracy and performance improvements
if 'quantization' not in self._optimizations_applied:
self._optimizations_applied.append('quantization')
# Required properties for benchmark
@property
def processor(self):
"""
Required by benchmark for input processing.
Benchmark uses this to prepare inputs with unified tokenizer.
"""
return self._processor
@property
def model(self):
"""
Required by benchmark for direct model.generate() calls.
Benchmark directly calls self.model.generate() for performance testing.
Your optimizations should modify this model object or its operators.
"""
return self._model
@property
def device(self):
"""
Required by benchmark for device information.
"""
return self._device
def generate(
self,
image: Image.Image,
question: str,
max_new_tokens: int = 128
) -> Dict:
"""
Generate answer (optional method, mainly for debugging).
Note: Benchmark uses self.model.generate() directly for performance testing.
This method is provided for convenience and debugging purposes.
Args:
image: PIL Image object
question: Question text
max_new_tokens: Maximum tokens to generate
Returns:
Dict: {
"text": str, # Generated text answer
"token_count": int # Generated token count
}
"""
# Build Qwen3-VL message format
messages = [{
"role": "user",
"content": [
{"type": "image", "image": image},
{"type": "text", "text": question}
]
}]
# Process inputs
inputs = self._processor.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_dict=True,
return_tensors="pt"
).to(self._device)
        # Generate greedily; sampling parameters such as temperature/top_p
        # are ignored when do_sample=False, so they are omitted here to
        # avoid transformers warnings
        with torch.no_grad():
            output_ids = self._model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                do_sample=False,
                use_cache=True
            )
# Extract generated tokens (remove input part)
input_len = inputs.input_ids.shape[1]
generated_ids = output_ids[0][input_len:]
# Decode
text = self._processor.tokenizer.decode(
generated_ids,
skip_special_tokens=True,
clean_up_tokenization_spaces=False
)
return {
"text": text,
"token_count": len(generated_ids)
}

requirements.txt Executable file

# AICAS 2026 - Environment dependencies
# ============================================
# Core frameworks
torch>=2.0.0
transformers>=4.40.0
accelerate>=0.25.0
# Data processing
datasets>=2.14.0
Pillow>=9.0.0
# Progress bar
tqdm>=4.65.0
# System info (optional, for detailed hardware information)
psutil>=5.9.0
# Optional: Triton kernel development
triton>=2.1.0
# Optional: Flash Attention (requires CUDA compilation)
# flash-attn>=2.0.0
# Optional: quantization tools
# bitsandbytes>=0.41.0
# auto-gptq>=0.5.0
# Optional: profiling
# tensorboard>=2.14.0
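When planning the KV-cache work sketched in `_optimize_kv_cache`, it helps to estimate how much memory the cache actually needs before designing an allocation strategy. A back-of-envelope calculator using the standard formula (2 tensors, K and V, per layer × KV heads × head dim × sequence length × bytes per element); the example configuration below is illustrative only, not the real Qwen3-VL-2B-Instruct numbers:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Total bytes of K+V cache across all layers for one sequence.
    bytes_per_elem=2 corresponds to FP16, the dtype the wrapper loads with."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Illustrative config (NOT the real model's): 28 layers, 8 KV heads,
# head_dim 128, a 4096-token context, FP16.
total = kv_cache_bytes(num_layers=28, num_kv_heads=8, head_dim=128, seq_len=4096)
print(f"KV cache: {total / 1024**2:.0f} MiB")  # -> KV cache: 448 MiB
```

Read the true values from `self._model.config` (`num_hidden_layers`, `num_key_value_heads`, `head_dim` or `hidden_size / num_attention_heads`) before sizing any custom allocator.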