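SIMD (single instruction, multiple data) support is tightly coupled to the hardware. In rust-lang/rust, src/librustc_target provides the corresponding support for the different hardware platforms and operating systems. To make the Rust compiler use a specific instruction set on a given platform, use the RUSTFLAGS environment variable so that the compiler generates code for that platform's instruction set.
Set RUSTFLAGS="-C target-cpu=xxx" to specify the CPU, or RUSTFLAGS="-C target-feature=+xxx" to specify the instruction set. Note that the flag is spelled target-feature (singular).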
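One way to confirm what a given RUSTFLAGS setting enabled is to query the features at compile time with the cfg! macro. The following is a minimal sketch and not part of the original write-up; the helper name enabled_features and the four features it checks are illustrative assumptions.

```rust
/// Returns the names of a few x86 features that were enabled at compile time.
/// `enabled_features` is a hypothetical helper used only for illustration.
fn enabled_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    // cfg!(target_feature = "...") is resolved by the compiler, so it reflects
    // -C target-cpu / -C target-feature, not the CPU the program runs on.
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "avx") {
        features.push("avx");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    if cfg!(target_feature = "fma") {
        features.push("fma");
    }
    features
}

fn main() {
    println!("compile-time features: {:?}", enabled_features());
}
```

Building this with RUSTFLAGS="-C target-feature=+avx2" should make avx2 appear in the printed list, while a default build for x86_64-unknown-linux-gnu is expected to show only baseline features such as sse2.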
The targets printed by rustc --print target-list correspond to the *.rs files in rust-lang/rust/src/librustc_target/spec:
aarch64-fuchsia
aarch64-linux-android
aarch64-pc-windows-msvc
aarch64-unknown-cloudabi
aarch64-unknown-freebsd
aarch64-unknown-hermit
aarch64-unknown-linux-gnu
aarch64-unknown-linux-musl
aarch64-unknown-netbsd
aarch64-unknown-none
aarch64-unknown-openbsd
arm-linux-androideabi
arm-unknown-linux-gnueabi
arm-unknown-linux-gnueabihf
arm-unknown-linux-musleabi
arm-unknown-linux-musleabihf
armebv7r-none-eabi
armebv7r-none-eabihf
armv4t-unknown-linux-gnueabi
armv5te-unknown-linux-gnueabi
armv5te-unknown-linux-musleabi
armv6-unknown-freebsd
armv6-unknown-netbsd-eabihf
armv7-linux-androideabi
armv7-unknown-cloudabi-eabihf
armv7-unknown-freebsd
armv7-unknown-linux-gnueabihf
armv7-unknown-linux-musleabihf
armv7-unknown-netbsd-eabihf
armv7r-none-eabi
armv7r-none-eabihf
asmjs-unknown-emscripten
i586-pc-windows-msvc
i586-unknown-linux-gnu
i586-unknown-linux-musl
i686-apple-darwin
i686-linux-android
i686-pc-windows-gnu
i686-pc-windows-msvc
i686-unknown-cloudabi
i686-unknown-dragonfly
i686-unknown-freebsd
i686-unknown-haiku
i686-unknown-linux-gnu
i686-unknown-linux-musl
i686-unknown-netbsd
i686-unknown-openbsd
mips-unknown-linux-gnu
mips-unknown-linux-musl
mips-unknown-linux-uclibc
mips64-unknown-linux-gnuabi64
mips64el-unknown-linux-gnuabi64
mipsel-unknown-linux-gnu
mipsel-unknown-linux-musl
mipsel-unknown-linux-uclibc
mipsisa32r6-unknown-linux-gnu
mipsisa32r6el-unknown-linux-gnu
mipsisa64r6-unknown-linux-gnuabi64
mipsisa64r6el-unknown-linux-gnuabi64
msp430-none-elf
nvptx64-nvidia-cuda
powerpc-unknown-linux-gnu
powerpc-unknown-linux-gnuspe
powerpc-unknown-linux-musl
powerpc-unknown-netbsd
powerpc64-unknown-freebsd
powerpc64-unknown-linux-gnu
powerpc64-unknown-linux-musl
powerpc64le-unknown-linux-gnu
powerpc64le-unknown-linux-musl
riscv32imac-unknown-none-elf
riscv32imc-unknown-none-elf
riscv64gc-unknown-none-elf
riscv64imac-unknown-none-elf
s390x-unknown-linux-gnu
sparc-unknown-linux-gnu
sparc64-unknown-linux-gnu
sparc64-unknown-netbsd
sparcv9-sun-solaris
thumbv6m-none-eabi
thumbv7a-pc-windows-msvc
thumbv7em-none-eabi
thumbv7em-none-eabihf
thumbv7m-none-eabi
thumbv7neon-linux-androideabi
thumbv7neon-unknown-linux-gnueabihf
thumbv8m.base-none-eabi
thumbv8m.main-none-eabi
thumbv8m.main-none-eabihf
wasm32-experimental-emscripten
wasm32-unknown-emscripten
wasm32-unknown-unknown
wasm32-unknown-wasi
x86_64-apple-darwin
x86_64-fortanix-unknown-sgx
x86_64-fuchsia
x86_64-linux-android
x86_64-pc-windows-gnu
x86_64-pc-windows-msvc
x86_64-rumprun-netbsd
x86_64-sun-solaris
x86_64-unknown-bitrig
x86_64-unknown-cloudabi
x86_64-unknown-dragonfly
x86_64-unknown-freebsd
x86_64-unknown-haiku
x86_64-unknown-hermit
x86_64-unknown-l4re-uclibc
x86_64-unknown-linux-gnu
x86_64-unknown-linux-gnux32
x86_64-unknown-linux-musl
x86_64-unknown-netbsd
x86_64-unknown-openbsd
x86_64-unknown-redox
x86_64-unknown-uefi
2.2 Viewing the features (instruction sets) supported for a Rust target platform
# uname -a // show the current system platform
Linux zyd-VirtualBox 4.15.0-58-generic #64~16.04.1-Ubuntu SMP Wed Aug 7 14:10:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# rustc --target=x86_64-unknown-linux-gnu --print target-features
Available features for this target:
16bit-mode - 16-bit mode (i8086).
32bit-mode - 32-bit mode (80386).
3dnow - Enable 3DNow! instructions.
3dnowa - Enable 3DNow! Athlon instructions.
64bit - Support 64-bit instructions.
64bit-mode - 64-bit mode (x86_64).
adx - Support ADX instructions.
aes - Enable AES instructions.
atom - Intel Atom processors.
avx - Enable AVX instructions.
avx2 - Enable AVX2 instructions.
avx512bitalg - Enable AVX-512 Bit Algorithms.
avx512bw - Enable AVX-512 Byte and Word Instructions.
avx512cd - Enable AVX-512 Conflict Detection Instructions.
avx512dq - Enable AVX-512 Doubleword and Quadword Instructions.
avx512er - Enable AVX-512 Exponential and Reciprocal Instructions.
avx512f - Enable AVX-512 instructions.
avx512ifma - Enable AVX-512 Integer Fused Multiple-Add.
avx512pf - Enable AVX-512 PreFetch Instructions.
avx512vbmi - Enable AVX-512 Vector Byte Manipulation Instructions.
avx512vbmi2 - Enable AVX-512 further Vector Byte Manipulation Instructions.
avx512vl - Enable AVX-512 Vector Length eXtensions.
avx512vnni - Enable AVX-512 Vector Neural Network Instructions.
avx512vpopcntdq - Enable AVX-512 Population Count Instructions.
bmi - Support BMI instructions.
bmi2 - Support BMI2 instructions.
cldemote - Enable Cache Demote.
clflushopt - Flush A Cache Line Optimized.
clwb - Cache Line Write Back.
clzero - Enable Cache Line Zero.
cmov - Enable conditional move instructions.
cx16 - 64-bit with cmpxchg16b.
ermsb - REP MOVS/STOS are fast.
f16c - Support 16-bit floating point conversion instructions.
false-deps-lzcnt-tzcnt - LZCNT/TZCNT have a false dependency on dest register.
false-deps-popcnt - POPCNT has a false dependency on dest register.
fast-11bytenop - Target can quickly decode up to 11 byte NOPs.
fast-15bytenop - Target can quickly decode up to 15 byte NOPs.
fast-bextr - Indicates that the BEXTR instruction is implemented as a single uop with good throughput..
fast-gather - Indicates if gather is reasonably fast..
fast-hops - Prefer horizontal vector math instructions (haddp, phsub, etc.) over normal vector instructions with shuffles.
fast-lzcnt - LZCNT instructions are as fast as most simple integer ops.
fast-partial-ymm-or-zmm-write - Partial writes to YMM/ZMM registers are fast.
fast-scalar-fsqrt - Scalar SQRT is fast (disable Newton-Raphson).
fast-shld-rotate - SHLD can be used as a faster rotate.
fast-variable-shuffle - Shuffles with variable masks are fast.
fast-vector-fsqrt - Vector SQRT is fast (disable Newton-Raphson).
fma - Enable three-operand fused multiple-add.
fma4 - Enable four-operand fused multiple-add.
fsgsbase - Support FS/GS Base instructions.
fxsr - Support fxsave/fxrestore instructions.
gfni - Enable Galois Field Arithmetic Instructions.
glm - Intel Goldmont processors.
glp - Intel Goldmont Plus processors.
idivl-to-divb - Use 8-bit divide for positive values less than 256.
idivq-to-divl - Use 32-bit divide for positive values less than 2^32.
invpcid - Invalidate Process-Context Identifier.
lea-sp - Use LEA for adjusting the stack pointer.
lea-uses-ag - LEA instruction needs inputs at AG stage.
lwp - Enable LWP instructions.
lzcnt - Support LZCNT instruction.
macrofusion - Various instructions can be fused with conditional branches.
merge-to-threeway-branch - Merge branches to a three-way conditional branch.
mmx - Enable MMX instructions.
movbe - Support MOVBE instruction.
movdir64b - Support movdir64b instruction.
movdiri - Support movdiri instruction.
mpx - Support MPX instructions.
mwaitx - Enable MONITORX/MWAITX timer functionality.
nopl - Enable NOPL instruction.
pad-short-functions - Pad short functions.
pclmul - Enable packed carry-less multiplication instructions.
pconfig - platform configuration instruction.
pku - Enable protection keys.
popcnt - Support POPCNT instruction.
prefer-256-bit - Prefer 256-bit AVX instructions.
prefetchwt1 - Prefetch with Intent to Write and T1 Hint.
prfchw - Support PRFCHW instructions.
ptwrite - Support ptwrite instruction.
rdpid - Support RDPID instructions.
rdrnd - Support RDRAND instruction.
rdseed - Support RDSEED instruction.
retpoline - Remove speculation of indirect branches from the generated code, either by avoiding them entirely or lowering them with a speculation blocking construct..
retpoline-external-thunk - When lowering an indirect call or branch using a `retpoline`, rely on the specified user provided thunk rather than emitting one ourselves. Only has effect when combined with some other retpoline feature..
retpoline-indirect-branches - Remove speculation of indirect branches from the generated code..
retpoline-indirect-calls - Remove speculation of indirect calls from the generated code..
rtm - Support RTM instructions.
sahf - Support LAHF and SAHF instructions.
sgx - Enable Software Guard Extensions.
sha - Enable SHA instructions.
shstk - Support CET Shadow-Stack instructions.
slm - Intel Silvermont processors.
slow-3ops-lea - LEA instruction with 3 ops or certain registers is slow.
slow-incdec - INC and DEC instructions are slower than ADD and SUB.
slow-lea - LEA instruction with certain arguments is slow.
slow-pmaddwd - PMADDWD is slower than PMULLD.
slow-pmulld - PMULLD instruction is slow.
slow-shld - SHLD instruction is slow.
slow-two-mem-ops - Two memory operand instructions are slow.
slow-unaligned-mem-16 - Slow unaligned 16-byte memory access.
slow-unaligned-mem-32 - Slow unaligned 32-byte memory access.
soft-float - Use software floating point features..
sse - Enable SSE instructions.
sse-unaligned-mem - Allow unaligned memory operands with SSE instructions.
sse2 - Enable SSE2 instructions.
sse3 - Enable SSE3 instructions.
sse4.1 - Enable SSE 4.1 instructions.
sse4.2 - Enable SSE 4.2 instructions.
sse4a - Support SSE 4a instructions.
ssse3 - Enable SSSE3 instructions.
tbm - Enable TBM instructions.
tremont - Intel Tremont processors.
vaes - Promote selected AES instructions to AVX512/AVX registers.
vpclmulqdq - Enable vpclmulqdq instructions.
waitpkg - Wait and pause enhancements.
wbnoinvd - Write Back No Invalidate.
x87 - Enable X87 float instructions.
xop - Enable XOP instructions.
xsave - Support xsave instructions.
xsavec - Support xsavec instructions.
xsaveopt - Support xsaveopt instructions.
xsaves - Support xsaves instructions.
Use +feature to enable a feature, or -feature to disable it.
For example, rustc -C -target-cpu=mycpu -C target-feature=+feature1,-feature2
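Different CPU platforms support different instruction sets (see the "CPU instruction sets" reference). In Rust, an instruction set is selected with a flag such as -C target-feature=+avx2, which enables AVX2. Note that although CPUs that support AVX2 generally also support FMA, using both AVX2 and FMA requires enabling both explicitly, e.g. -C target-feature=+avx2,+fma. Likewise, if the instruction sets you want to enable depend on other instruction sets, enable all of those dependencies as well.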
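To illustrate why FMA is worth enabling explicitly, consider a fused multiply-add in Rust. The sketch below is an addition rather than something from the original text, and the function name axpy is hypothetical. f32::mul_add computes a * x + y with a single rounding; whether it compiles to a hardware FMA instruction is expected to depend on whether the fma feature was enabled, with the compiler typically falling back to a library routine otherwise.

```rust
/// Computes a * x[i] + y[i] for each element using fused multiply-add.
/// `axpy` is a hypothetical name chosen for this illustration.
fn axpy(a: f32, x: &[f32], y: &[f32]) -> Vec<f32> {
    x.iter()
        .zip(y.iter())
        .map(|(&xi, &yi)| a.mul_add(xi, yi)) // a * xi + yi, rounded once
        .collect()
}

fn main() {
    let out = axpy(2.0, &[1.0, 2.0, 3.0], &[0.5, 0.5, 0.5]);
    // cfg! reports whether `fma` was enabled when this crate was compiled.
    println!(
        "{:?} (fma enabled at compile time: {})",
        out,
        cfg!(target_feature = "fma")
    );
}
```

The numeric result is the same either way; only the generated machine code should differ between a build with RUSTFLAGS="-C target-feature=+avx2,+fma" and a default build.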
# rustc --target=x86_64-unknown-linux-gnu --print target-cpus
Available CPUs for this target:
native - Select the CPU of the current host (currently skylake).
amdfam10 - Select the amdfam10 processor.
athlon - Select the athlon processor.
athlon-4 - Select the athlon-4 processor.
athlon-fx - Select the athlon-fx processor.
athlon-mp - Select the athlon-mp processor.
athlon-tbird - Select the athlon-tbird processor.
athlon-xp - Select the athlon-xp processor.
athlon64 - Select the athlon64 processor.
athlon64-sse3 - Select the athlon64-sse3 processor.
atom - Select the atom processor.
barcelona - Select the barcelona processor.
bdver1 - Select the bdver1 processor.
bdver2 - Select the bdver2 processor.
bdver3 - Select the bdver3 processor.
bdver4 - Select the bdver4 processor.
bonnell - Select the bonnell processor.
broadwell - Select the broadwell processor.
btver1 - Select the btver1 processor.
btver2 - Select the btver2 processor.
c3 - Select the c3 processor.
c3-2 - Select the c3-2 processor.
cannonlake - Select the cannonlake processor.
cascadelake - Select the cascadelake processor.
core-avx-i - Select the core-avx-i processor.
core-avx2 - Select the core-avx2 processor.
core2 - Select the core2 processor.
corei7 - Select the corei7 processor.
corei7-avx - Select the corei7-avx processor.
generic - Select the generic processor.
geode - Select the geode processor.
goldmont - Select the goldmont processor.
goldmont-plus - Select the goldmont-plus processor.
haswell - Select the haswell processor.
i386 - Select the i386 processor.
i486 - Select the i486 processor.
i586 - Select the i586 processor.
i686 - Select the i686 processor.
icelake-client - Select the icelake-client processor.
icelake-server - Select the icelake-server processor.
ivybridge - Select the ivybridge processor.
k6 - Select the k6 processor.
k6-2 - Select the k6-2 processor.
k6-3 - Select the k6-3 processor.
k8 - Select the k8 processor.
k8-sse3 - Select the k8-sse3 processor.
knl - Select the knl processor.
knm - Select the knm processor.
lakemont - Select the lakemont processor.
nehalem - Select the nehalem processor.
nocona - Select the nocona processor.
opteron - Select the opteron processor.
opteron-sse3 - Select the opteron-sse3 processor.
penryn - Select the penryn processor.
pentium - Select the pentium processor.
pentium-m - Select the pentium-m processor.
pentium-mmx - Select the pentium-mmx processor.
pentium2 - Select the pentium2 processor.
pentium3 - Select the pentium3 processor.
pentium3m - Select the pentium3m processor.
pentium4 - Select the pentium4 processor.
pentium4m - Select the pentium4m processor.
pentiumpro - Select the pentiumpro processor.
prescott - Select the prescott processor.
sandybridge - Select the sandybridge processor.
silvermont - Select the silvermont processor.
skx - Select the skx processor.
skylake - Select the skylake processor.
skylake-avx512 - Select the skylake-avx512 processor.
slm - Select the slm processor.
tremont - Select the tremont processor.
westmere - Select the westmere processor.
winchip-c6 - Select the winchip-c6 processor.
winchip2 - Select the winchip2 processor.
x86-64 - Select the x86-64 processor.
yonah - Select the yonah processor.
znver1 - Select the znver1 processor.
When cross-compiling, specify the target CPU explicitly. If the program will only run on the local machine, you can simply export RUSTFLAGS="-C target-cpu=native".
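A binary built with target-cpu=native may use instructions that other machines lack, so it can be useful to check at runtime what the host CPU supports. The following is a minimal sketch under that assumption, not the original author's code; the function name host_supports_avx2_fma is hypothetical. It uses the standard library's is_x86_feature_detected! macro on x86 targets and reports false on other architectures.

```rust
/// Reports whether the CPU running this program supports AVX2 and FMA.
/// `host_supports_avx2_fma` is a hypothetical helper for illustration.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn host_supports_avx2_fma() -> (bool, bool) {
    // Runtime detection: asks the CPU itself, independent of RUSTFLAGS.
    (
        is_x86_feature_detected!("avx2"),
        is_x86_feature_detected!("fma"),
    )
}

/// Fallback for non-x86 targets, where these features do not exist.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn host_supports_avx2_fma() -> (bool, bool) {
    (false, false)
}

fn main() {
    let (avx2, fma) = host_supports_avx2_fma();
    println!("host supports avx2: {}, fma: {}", avx2, fma);
    println!(
        "compiled with avx2: {}, fma: {}",
        cfg!(target_feature = "avx2"),
        cfg!(target_feature = "fma")
    );
}
```

Comparing the two printed lines shows the difference between what the host offers and what the compiler was told to use.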
References:
[1] https://rust-lang-nursery.github.io/packed_simd/perf-guide/target-feature/features.html
[2] https://github.com/rust-lang/rust/tree/master/src/librustc_target
[3] CPU instruction sets