evo

Evolvable programming language -- evo


EVO Reference

| Item       | Type          | Lang       | Company   | Platform   | Targets                                           | Main Func                              | Main Opt                        |
| ---------- | ------------- | ---------- | --------- | ---------- | ------------------------------------------------- | -------------------------------------- | ------------------------------- |
| TinyMaix   | Infer         | C          | Sipeed    | MCU        | SSE, NEON, CSKYV2, RV32P                           |                                        | Inline Asm                      |
| CMSIS-NN   |               |            |           |            |                                                    |                                        |                                 |
| TinyEngine | Infer         | C          |           | MCU        |                                                    |                                        |                                 |
| ORT        | Infer         | C++        | Microsoft | PC, Server | SSE, AVX, NEON, CUDA                               |                                        | Inline Asm                      |
| microTVM   | Infer         | C++        | Apache    | Paper      |                                                    |                                        |                                 |
| TFLM       | Infer         | C++        | Google    | MCU        | Arm(Cortex-M), Hexagon, RISC-V, Xtensa             | ???                                    | ???                             |
| NCNN       | Infer         | C/C++      | Tencent   | Phone      |                                                    |                                        |                                 |
| CoreML     | Train & Infer | Swift      | Apple     | iPhone     | Arm(Cortex-M), Metal                               | ???                                    | ???                             |
| MNN        | Train & Infer | C++        | Alibaba   | Phone      | SSE, AVX, NEON, Metal, HIAI, OpenCL, Vulkan, CUDA  | Convert, Compress, Express, Train, CV  | Inline Asm, Winograd Conv, FP16 |
| MindSpore  | Train & Infer | C++/Python | Huawei?   | All?       |                                                    |                                        |                                 |

Performance:

(figure: Perf)

1 Mainstream Engine Architectures

Full-stack architecture of large models:

(figure: ALL)

Inference engine architecture: (figure: engine)

2 Inference Engines for TinyML

Edge-side deployment: (figure: edge)

Breakdown of AI deployment platforms:

| Platform Level      | AI-Box      | AI-Camera     | AIoT          | TinyML        |
| ------------------- | ----------- | ------------- | ------------- | ------------- |
| Storage Media       | eMMC/SSD    | eMMC/Nand/Nor | Nor/SIP Flash | Nor/SIP Flash |
| Storage Size        | >=8GB       | 16MB~8GB      | 1~16MB        | 16KB~16MB     |
| Memory Media        | DDR4        | mostly DDR3   | SRAM/PSRAM    | SRAM/PSRAM    |
| Memory Size         | >=2GB       | 64MB~2GB      | 0.5~64MB      | 2KB~8MB       |
| CPU Freq            | >=1.5GHz    | 0.5~1.5GHz    | 100~500MHz    | 16~500MHz     |
| Computing Power     | >=2TOPS     | 0.2~1TOPS     | 50~200GOPS    | <1GOPS        |
| Deploy Language     | Python/C++  | Python/C++    | MicroPython/C | mostly C      |
| Typical Device      | Jetson Nano | RV1109 IPC    | BL808/K210    | ESP32/BL618   |
| Typical Board Price | >$100       | $20~$100      | $5~$20        | <$5           |
| Typical Chip Price  | >$10        | $4~$10        | $2~$5         | $1~$3         |

Balancing compatibility against performance:

  1. Support loading .onnx models to improve framework compatibility, then compress the loaded model into a custom runtime model (serialized with FlatBuffers) to cut runtime memory overhead;
  2. Dynamic/static graph optimization: the main performance bottlenecks are runtime memory and data I/O, so TinyML scenarios need dedicated quantization and scheduling schemes;
  3. Heterogeneous execution and inline assembly: pick hot operators for inline-assembly optimization, supporting hardware-specific instructions to speed up inference (a sketch follows this list);
  4. Compute loading and offloading: build a model database and, per model, select the inference network type (standalone edge inference, edge-cluster inference, or cloud-edge collaborative inference); within the chosen network, minimize inference latency and memory footprint.
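
As a concrete illustration of item 3, the sketch below shows a hot operator (an fp32 dot product) accelerated with Arm NEON intrinsics plus a portable fallback. The kernel name dot_f32 and the preprocessor-based dispatch are illustrative assumptions, not code from any engine above; real engines typically go further and hand-write such kernels in inline assembly per target.

    // Illustrative hot-operator kernel: fp32 dot product.
    // NEON path when available, portable scalar path otherwise.
    #include <cstddef>
    #if defined(__ARM_NEON)
    #include <arm_neon.h>
    #endif

    float dot_f32(const float* a, const float* b, std::size_t n) {
    #if defined(__ARM_NEON)
        float32x4_t acc = vdupq_n_f32(0.0f);
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4)                    // 4 lanes per step
            acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
        float sum = vgetq_lane_f32(acc, 0) + vgetq_lane_f32(acc, 1)
                  + vgetq_lane_f32(acc, 2) + vgetq_lane_f32(acc, 3);
        for (; i < n; ++i) sum += a[i] * b[i];        // scalar tail
        return sum;
    #else
        float sum = 0.0f;
        for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
        return sum;
    #endif
    }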

3 Deployment Experiments

3.1 Experimental Environment

|     | Pynq-Z2       | PC |
| --- | ------------- | -- |
| CPU | Arm Cortex-A9 |    |

3.2 TFLM

TFLM (TensorFlow Lite for Microcontrollers) claims its runtime fits in 16 KB on a Cortex-M3 and runs directly on bare metal, with no operating system required.
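
As context for that claim, here is a minimal sketch of bare-metal TFLM usage, assuming the current tflite-micro C++ API (older releases also required an ErrorReporter): all working memory comes from one statically allocated tensor arena, so no heap or OS services are involved. The model array g_model_data, the registered op list, and the arena size are placeholders for a concrete deployment.

    // Minimal bare-metal TFLM inference sketch (placeholders: g_model_data,
    // kArenaSize, the registered ops). No heap, no OS calls.
    #include <cstdint>
    #include "tensorflow/lite/micro/micro_interpreter.h"
    #include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
    #include "tensorflow/lite/schema/schema_generated.h"

    extern const unsigned char g_model_data[];   // flatbuffer model in flash

    constexpr int kArenaSize = 16 * 1024;        // all working memory
    static uint8_t tensor_arena[kArenaSize];

    float run_once(float x) {
        const tflite::Model* model = tflite::GetModel(g_model_data);
        tflite::MicroMutableOpResolver<2> resolver;  // register only used ops
        resolver.AddFullyConnected();
        resolver.AddSoftmax();
        tflite::MicroInterpreter interpreter(model, resolver,
                                             tensor_arena, kArenaSize);
        interpreter.AllocateTensors();           // plan buffers inside the arena
        interpreter.input(0)->data.f[0] = x;     // fill the input tensor
        interpreter.Invoke();
        return interpreter.output(0)->data.f[0];
    }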

References:

3.2.1 Build

TFLM is built with Bazel (Google's build tool; it runs on the JVM and is fairly hard to use).

  1. Pick and download a suitable release from the Bazel releases page; here, x86_64:
    # Download the executable
    wget https://github.com/bazelbuild/bazel/releases/download/7.2.1/bazel-7.2.1-linux-x86_64
    # Make it executable and install it into /usr/local/bin
    chmod +x ./bazel-7.2.1-linux-x86_64
    sudo cp ./bazel-7.2.1-linux-x86_64 /usr/local/bin/bazel
    # Reload the shell environment
    source ~/.bashrc
    # Check the version
    bazel --version
    

Bazel configuration files:

  1. WORKSPACE: the directory containing this file is treated as the workspace root
  2. BUILD: a directory containing this build-rules file is treated as one module (package) of the project
  2. Build TFLM:
    git clone https://github.com/tensorflow/tflite-micro.git
    cd tflite-micro
    # Inspect the build rules
    cat BUILD
    # Building the micro toolchain with Bazel alone does not work:
    bazel build
    # This does work:
    make -f tensorflow/lite/micro/tools/make/Makefile TARGET=linux TARGET_ARCH=x86_64 microlite

3.2.2 Example

Run a simulated test under QEMU:

./tensorflow/lite/micro/tools/ci_build/test_cortex_m_qemu.sh

3.2.x Overall Evaluation

3.3 TinyEngine


3.3.1 Build
git clone --recursive https://github.com/mit-han-lab/tinyengine.git
conda create -n tinyengine python=3.6 pip
conda activate tinyengine
cd tinyengine
pip install -r requirements.txt

3.3.2 Example

3.4 TinyMaix

3.4.1 Build
git clone https://github.com/sipeed/TinyMaix.git
cd TinyMaix
mkdir build
cd build
cmake ..
make

3.4.2 Example
cd examples/cifar10
mkdir build
cd build
cmake ..
make
./cifar10
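
For reference, the rough inference flow that the cifar10 demo wraps, following the tm_load / tm_preprocess / tm_run sequence shown in the TinyMaix README. Exact signatures and the tm_mat_t field layout should be checked against include/tinymaix.h; cifar10_model, the input shape, and passing NULL for the layer callback are assumptions here.

    // Hypothetical standalone TinyMaix inference (verify against tinymaix.h).
    #include "tinymaix.h"

    extern const uint8_t cifar10_model[];        // model binary in flash

    int classify(const uint8_t* img) {           // assumed 32x32 RGB input
        tm_mdl_t mdl;
        tm_mat_t in_u8 = {3, 32, 32, 3, {(mtype_t*)img}};
        tm_mat_t in    = {3, 32, 32, 3, {NULL}};
        tm_mat_t outs[1];
        // First NULL: use the engine's static buffer; second: no layer callback
        if (tm_load(&mdl, cifar10_model, NULL, NULL, &in) != TM_OK) return -1;
        tm_preprocess(&mdl, TMPP_UINT2FP01, &in_u8, &in);  // uint8 -> fp [0,1]
        tm_run(&mdl, &in, outs);                 // outs[0] holds class scores
        tm_unload(&mdl);
        return 0;
    }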

3.4.x Overall Evaluation

| model  | cifar10 | total |
| ------ | ------- | ----- |
| Mem    |         |       |
| Time   |         |       |
| Deploy | easy    |       |

3.5 CMSIS-NN

References: