Rust SIMD support: RUSTFLAGS

mutourend, published 2019-08-27 11:49:12

1. SIMD and RUSTFLAGS

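SIMD (single instruction, multiple data) support is tightly coupled to the hardware. rust-lang/rust/src/librustc_target provides the support for the different hardware platforms and operating systems. To make the Rust compiler use a specific instruction set on a given platform, set the RUSTFLAGS environment variable so that the compiler generates code for that platform's instruction set.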

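Set RUSTFLAGS="-C target-cpu=xxx" to select the CPU, and RUSTFLAGS="-C target-feature=+xxx" to select the instruction set. Note that the option is spelled target-feature (singular); both options can be combined in one RUSTFLAGS value, separated by a space.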
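The flag syntax can be sketched with a small helper. This is only an illustration: the function name `rustflags` is hypothetical, and the CPU and feature names passed in are examples, not recommendations.

```rust
/// Hypothetical helper (not part of any Rust tooling): assemble a RUSTFLAGS
/// value from an optional CPU name and a list of features to enable.
fn rustflags(cpu: Option<&str>, features: &[&str]) -> String {
    let mut parts: Vec<String> = Vec::new();
    if let Some(cpu) = cpu {
        // One `-C target-cpu=...` option selects the CPU model.
        parts.push(format!("-C target-cpu={}", cpu));
    }
    if !features.is_empty() {
        // Features are comma-separated, each prefixed with `+` to enable it.
        let list: Vec<String> = features.iter().map(|f| format!("+{}", f)).collect();
        parts.push(format!("-C target-feature={}", list.join(",")));
    }
    // Separate `-C` options are joined with a space.
    parts.join(" ")
}

fn main() {
    println!("{}", rustflags(Some("native"), &[]));
    println!("{}", rustflags(Some("haswell"), &["avx2", "fma"]));
}
```

The resulting string is what would be exported as RUSTFLAGS before running cargo or rustc.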

2. Checking the SIMD support currently available in Rust

2.1 Listing the platforms supported by Rust
rustc --print target-list

The list corresponds to the *.rs files in rust-lang/rust/src/librustc_target/spec.

aarch64-fuchsia
aarch64-linux-android
aarch64-pc-windows-msvc
aarch64-unknown-cloudabi
aarch64-unknown-freebsd
aarch64-unknown-hermit
aarch64-unknown-linux-gnu
aarch64-unknown-linux-musl
aarch64-unknown-netbsd
aarch64-unknown-none
aarch64-unknown-openbsd
arm-linux-androideabi
arm-unknown-linux-gnueabi
arm-unknown-linux-gnueabihf
arm-unknown-linux-musleabi
arm-unknown-linux-musleabihf
armebv7r-none-eabi
armebv7r-none-eabihf
armv4t-unknown-linux-gnueabi
armv5te-unknown-linux-gnueabi
armv5te-unknown-linux-musleabi
armv6-unknown-freebsd
armv6-unknown-netbsd-eabihf
armv7-linux-androideabi
armv7-unknown-cloudabi-eabihf
armv7-unknown-freebsd
armv7-unknown-linux-gnueabihf
armv7-unknown-linux-musleabihf
armv7-unknown-netbsd-eabihf
armv7r-none-eabi
armv7r-none-eabihf
asmjs-unknown-emscripten
i586-pc-windows-msvc
i586-unknown-linux-gnu
i586-unknown-linux-musl
i686-apple-darwin
i686-linux-android
i686-pc-windows-gnu
i686-pc-windows-msvc
i686-unknown-cloudabi
i686-unknown-dragonfly
i686-unknown-freebsd
i686-unknown-haiku
i686-unknown-linux-gnu
i686-unknown-linux-musl
i686-unknown-netbsd
i686-unknown-openbsd
mips-unknown-linux-gnu
mips-unknown-linux-musl
mips-unknown-linux-uclibc
mips64-unknown-linux-gnuabi64
mips64el-unknown-linux-gnuabi64
mipsel-unknown-linux-gnu
mipsel-unknown-linux-musl
mipsel-unknown-linux-uclibc
mipsisa32r6-unknown-linux-gnu
mipsisa32r6el-unknown-linux-gnu
mipsisa64r6-unknown-linux-gnuabi64
mipsisa64r6el-unknown-linux-gnuabi64
msp430-none-elf
nvptx64-nvidia-cuda
powerpc-unknown-linux-gnu
powerpc-unknown-linux-gnuspe
powerpc-unknown-linux-musl
powerpc-unknown-netbsd
powerpc64-unknown-freebsd
powerpc64-unknown-linux-gnu
powerpc64-unknown-linux-musl
powerpc64le-unknown-linux-gnu
powerpc64le-unknown-linux-musl
riscv32imac-unknown-none-elf
riscv32imc-unknown-none-elf
riscv64gc-unknown-none-elf
riscv64imac-unknown-none-elf
s390x-unknown-linux-gnu
sparc-unknown-linux-gnu
sparc64-unknown-linux-gnu
sparc64-unknown-netbsd
sparcv9-sun-solaris
thumbv6m-none-eabi
thumbv7a-pc-windows-msvc
thumbv7em-none-eabi
thumbv7em-none-eabihf
thumbv7m-none-eabi
thumbv7neon-linux-androideabi
thumbv7neon-unknown-linux-gnueabihf
thumbv8m.base-none-eabi
thumbv8m.main-none-eabi
thumbv8m.main-none-eabihf
wasm32-experimental-emscripten
wasm32-unknown-emscripten
wasm32-unknown-unknown
wasm32-unknown-wasi
x86_64-apple-darwin
x86_64-fortanix-unknown-sgx
x86_64-fuchsia
x86_64-linux-android
x86_64-pc-windows-gnu
x86_64-pc-windows-msvc
x86_64-rumprun-netbsd
x86_64-sun-solaris
x86_64-unknown-bitrig
x86_64-unknown-cloudabi
x86_64-unknown-dragonfly
x86_64-unknown-freebsd
x86_64-unknown-haiku
x86_64-unknown-hermit
x86_64-unknown-l4re-uclibc
x86_64-unknown-linux-gnu
x86_64-unknown-linux-gnux32
x86_64-unknown-linux-musl
x86_64-unknown-netbsd
x86_64-unknown-openbsd
x86_64-unknown-redox
x86_64-unknown-uefi
2.2 Listing the features (instruction sets) supported for a platform
# uname -a   # show the current system platform
Linux zyd-VirtualBox 4.15.0-58-generic #64~16.04.1-Ubuntu SMP Wed Aug 7 14:10:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# rustc --target=x86_64-unknown-linux-gnu --print target-features
Available features for this target:
    16bit-mode                    - 16-bit mode (i8086).
    32bit-mode                    - 32-bit mode (80386).
    3dnow                         - Enable 3DNow! instructions.
    3dnowa                        - Enable 3DNow! Athlon instructions.
    64bit                         - Support 64-bit instructions.
    64bit-mode                    - 64-bit mode (x86_64).
    adx                           - Support ADX instructions.
    aes                           - Enable AES instructions.
    atom                          - Intel Atom processors.
    avx                           - Enable AVX instructions.
    avx2                          - Enable AVX2 instructions.
    avx512bitalg                  - Enable AVX-512 Bit Algorithms.
    avx512bw                      - Enable AVX-512 Byte and Word Instructions.
    avx512cd                      - Enable AVX-512 Conflict Detection Instructions.
    avx512dq                      - Enable AVX-512 Doubleword and Quadword Instructions.
    avx512er                      - Enable AVX-512 Exponential and Reciprocal Instructions.
    avx512f                       - Enable AVX-512 instructions.
    avx512ifma                    - Enable AVX-512 Integer Fused Multiple-Add.
    avx512pf                      - Enable AVX-512 PreFetch Instructions.
    avx512vbmi                    - Enable AVX-512 Vector Byte Manipulation Instructions.
    avx512vbmi2                   - Enable AVX-512 further Vector Byte Manipulation Instructions.
    avx512vl                      - Enable AVX-512 Vector Length eXtensions.
    avx512vnni                    - Enable AVX-512 Vector Neural Network Instructions.
    avx512vpopcntdq               - Enable AVX-512 Population Count Instructions.
    bmi                           - Support BMI instructions.
    bmi2                          - Support BMI2 instructions.
    cldemote                      - Enable Cache Demote.
    clflushopt                    - Flush A Cache Line Optimized.
    clwb                          - Cache Line Write Back.
    clzero                        - Enable Cache Line Zero.
    cmov                          - Enable conditional move instructions.
    cx16                          - 64-bit with cmpxchg16b.
    ermsb                         - REP MOVS/STOS are fast.
    f16c                          - Support 16-bit floating point conversion instructions.
    false-deps-lzcnt-tzcnt        - LZCNT/TZCNT have a false dependency on destregister.
    false-deps-popcnt             - POPCNT has a false dependency on dest register.
    fast-11bytenop                - Target can quickly decode up to 11 byte NOPs.
    fast-15bytenop                - Target can quickly decode up to 15 byte NOPs.
    fast-bextr                    - Indicates that the BEXTR instruction is implemented as a single uop with good throughput..
    fast-gather                   - Indicates if gather is reasonably fast..
    fast-hops                     - Prefer horizontal vector math instructions (haddp, phsub, etc.) over normal vector instructions with shuffles.
    fast-lzcnt                    - LZCNT instructions are as fast as most simple integer ops.
    fast-partial-ymm-or-zmm-write - Partial writes to YMM/ZMM registers are fast.
    fast-scalar-fsqrt             - Scalar SQRT is fast (disable Newton-Raphson).
    fast-shld-rotate              - SHLD can be used as a faster rotate.
    fast-variable-shuffle         - Shuffles with variable masks are fast.
    fast-vector-fsqrt             - Vector SQRT is fast (disable Newton-Raphson).
    fma                           - Enable three-operand fused multiple-add.
    fma4                          - Enable four-operand fused multiple-add.
    fsgsbase                      - Support FS/GS Base instructions.
    fxsr                          - Support fxsave/fxrestore instructions.
    gfni                          - Enable Galois Field Arithmetic Instructions.
    glm                           - Intel Goldmont processors.
    glp                           - Intel Goldmont Plus processors.
    idivl-to-divb                 - Use 8-bit divide for positive values less than 256.
    idivq-to-divl                 - Use 32-bit divide for positive values less than 2^32.
    invpcid                       - Invalidate Process-Context Identifier.
    lea-sp                        - Use LEA for adjusting the stack pointer.
    lea-uses-ag                   - LEA instruction needs inputs at AG stage.
    lwp                           - Enable LWP instructions.
    lzcnt                         - Support LZCNT instruction.
    macrofusion                   - Various instructions can be fused with conditional branches.
    merge-to-threeway-branch      - Merge branches to a three-way conditional branch.
    mmx                           - Enable MMX instructions.
    movbe                         - Support MOVBE instruction.
    movdir64b                     - Support movdir64b instruction.
    movdiri                       - Support movdiri instruction.
    mpx                           - Support MPX instructions.
    mwaitx                        - Enable MONITORX/MWAITX timer functionality.
    nopl                          - Enable NOPL instruction.
    pad-short-functions           - Pad short functions.
    pclmul                        - Enable packed carry-less multiplication instructions.
    pconfig                       - platform configuration instruction.
    pku                           - Enable protection keys.
    popcnt                        - Support POPCNT instruction.
    prefer-256-bit                - Prefer 256-bit AVX instructions.
    prefetchwt1                   - Prefetch with Intent to Write and T1 Hint.
    prfchw                        - Support PRFCHW instructions.
    ptwrite                       - Support ptwrite instruction.
    rdpid                         - Support RDPID instructions.
    rdrnd                         - Support RDRAND instruction.
    rdseed                        - Support RDSEED instruction.
    retpoline                     - Remove speculation of indirect branches from the generated code, either by avoiding them entirely or lowering them with a speculation blocking construct..
    retpoline-external-thunk      - When lowering an indirect call or branch using a `retpoline`, rely on the specified user provided thunk rather than emitting one ourselves. Only has effect when combined with some other retpoline feature..
    retpoline-indirect-branches   - Remove speculation of indirect branches from the generated code..
    retpoline-indirect-calls      - Remove speculation of indirect calls from the generated code..
    rtm                           - Support RTM instructions.
    sahf                          - Support LAHF and SAHF instructions.
    sgx                           - Enable Software Guard Extensions.
    sha                           - Enable SHA instructions.
    shstk                         - Support CET Shadow-Stack instructions.
    slm                           - Intel Silvermont processors.
    slow-3ops-lea                 - LEA instruction with 3 ops or certain registers is slow.
    slow-incdec                   - INC and DEC instructions are slower than ADD and SUB.
    slow-lea                      - LEA instruction with certain arguments is slow.
    slow-pmaddwd                  - PMADDWD is slower than PMULLD.
    slow-pmulld                   - PMULLD instruction is slow.
    slow-shld                     - SHLD instruction is slow.
    slow-two-mem-ops              - Two memory operand instructions are slow.
    slow-unaligned-mem-16         - Slow unaligned 16-byte memory access.
    slow-unaligned-mem-32         - Slow unaligned 32-byte memory access.
    soft-float                    - Use software floating point features..
    sse                           - Enable SSE instructions.
    sse-unaligned-mem             - Allow unaligned memory operands with SSE instructions.
    sse2                          - Enable SSE2 instructions.
    sse3                          - Enable SSE3 instructions.
    sse4.1                        - Enable SSE 4.1 instructions.
    sse4.2                        - Enable SSE 4.2 instructions.
    sse4a                         - Support SSE 4a instructions.
    ssse3                         - Enable SSSE3 instructions.
    tbm                           - Enable TBM instructions.
    tremont                       - Intel Tremont processors.
    vaes                          - Promote selected AES instructions to AVX512/AVX registers.
    vpclmulqdq                    - Enable vpclmulqdq instructions.
    waitpkg                       - Wait and pause enhancements.
    wbnoinvd                      - Write Back No Invalidate.
    x87                           - Enable X87 float instructions.
    xop                           - Enable XOP instructions.
    xsave                         - Support xsave instructions.
    xsavec                        - Support xsavec instructions.
    xsaveopt                      - Support xsaveopt instructions.
    xsaves                        - Support xsaves instructions.

Use +feature to enable a feature, or -feature to disable it.
For example, rustc -C -target-cpu=mycpu -C target-feature=+feature1,-feature2

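Different CPU platforms support different instruction sets (see CPU instruction sets [3]). Rust selects instruction sets through the target-feature option; for example, -C target-feature=+avx2 enables the AVX2 instruction set. Note that although CPUs that support AVX2 generally also support FMA, using both requires enabling both explicitly, e.g. -C target-feature=+avx2,+fma. Likewise, if the instruction sets to be enabled depend on other instruction sets, every dependency must be enabled as well.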
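A minimal sketch of how code can depend on these flags, assuming an x86_64 target: the SIMD version below is compiled only when both avx2 and fma were enabled at build time (for example through RUSTFLAGS); otherwise a scalar fallback is used. The function name `mul_add8` is made up for this illustration.

```rust
// SIMD path: only compiled when the build enables both AVX2 and FMA.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2", target_feature = "fma"))]
fn mul_add8(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    use std::arch::x86_64::{_mm256_fmadd_ps, _mm256_loadu_ps, _mm256_storeu_ps};
    let mut out = [0.0f32; 8];
    // SAFETY: the cfg above guarantees the required features are enabled for
    // the whole build, and every pointer refers to an 8-element f32 array.
    unsafe {
        let r = _mm256_fmadd_ps(
            _mm256_loadu_ps(a.as_ptr()),
            _mm256_loadu_ps(b.as_ptr()),
            _mm256_loadu_ps(c.as_ptr()),
        );
        _mm256_storeu_ps(out.as_mut_ptr(), r);
    }
    out
}

// Scalar fallback for every other build configuration.
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2", target_feature = "fma")))]
fn mul_add8(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        // Computes a[i] * b[i] + c[i] with a single rounding step.
        out[i] = a[i].mul_add(b[i], c[i]);
    }
    out
}

fn main() {
    let r = mul_add8([1.0; 8], [2.0; 8], [3.0; 8]);
    println!("{:?}", r);
}
```

Building with only +avx2 leaves the fma condition false, so the scalar version is selected; this mirrors the point that each required feature has to be enabled explicitly.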

2.3 Listing the CPUs supported for a platform
rustc --target=x86_64-unknown-linux-gnu --print target-cpus
Available CPUs for this target:
    native         - Select the CPU of the current host (currently skylake).
    amdfam10       - Select the amdfam10 processor.
    athlon         - Select the athlon processor.
    athlon-4       - Select the athlon-4 processor.
    athlon-fx      - Select the athlon-fx processor.
    athlon-mp      - Select the athlon-mp processor.
    athlon-tbird   - Select the athlon-tbird processor.
    athlon-xp      - Select the athlon-xp processor.
    athlon64       - Select the athlon64 processor.
    athlon64-sse3  - Select the athlon64-sse3 processor.
    atom           - Select the atom processor.
    barcelona      - Select the barcelona processor.
    bdver1         - Select the bdver1 processor.
    bdver2         - Select the bdver2 processor.
    bdver3         - Select the bdver3 processor.
    bdver4         - Select the bdver4 processor.
    bonnell        - Select the bonnell processor.
    broadwell      - Select the broadwell processor.
    btver1         - Select the btver1 processor.
    btver2         - Select the btver2 processor.
    c3             - Select the c3 processor.
    c3-2           - Select the c3-2 processor.
    cannonlake     - Select the cannonlake processor.
    cascadelake    - Select the cascadelake processor.
    core-avx-i     - Select the core-avx-i processor.
    core-avx2      - Select the core-avx2 processor.
    core2          - Select the core2 processor.
    corei7         - Select the corei7 processor.
    corei7-avx     - Select the corei7-avx processor.
    generic        - Select the generic processor.
    geode          - Select the geode processor.
    goldmont       - Select the goldmont processor.
    goldmont-plus  - Select the goldmont-plus processor.
    haswell        - Select the haswell processor.
    i386           - Select the i386 processor.
    i486           - Select the i486 processor.
    i586           - Select the i586 processor.
    i686           - Select the i686 processor.
    icelake-client - Select the icelake-client processor.
    icelake-server - Select the icelake-server processor.
    ivybridge      - Select the ivybridge processor.
    k6             - Select the k6 processor.
    k6-2           - Select the k6-2 processor.
    k6-3           - Select the k6-3 processor.
    k8             - Select the k8 processor.
    k8-sse3        - Select the k8-sse3 processor.
    knl            - Select the knl processor.
    knm            - Select the knm processor.
    lakemont       - Select the lakemont processor.
    nehalem        - Select the nehalem processor.
    nocona         - Select the nocona processor.
    opteron        - Select the opteron processor.
    opteron-sse3   - Select the opteron-sse3 processor.
    penryn         - Select the penryn processor.
    pentium        - Select the pentium processor.
    pentium-m      - Select the pentium-m processor.
    pentium-mmx    - Select the pentium-mmx processor.
    pentium2       - Select the pentium2 processor.
    pentium3       - Select the pentium3 processor.
    pentium3m      - Select the pentium3m processor.
    pentium4       - Select the pentium4 processor.
    pentium4m      - Select the pentium4m processor.
    pentiumpro     - Select the pentiumpro processor.
    prescott       - Select the prescott processor.
    sandybridge    - Select the sandybridge processor.
    silvermont     - Select the silvermont processor.
    skx            - Select the skx processor.
    skylake        - Select the skylake processor.
    skylake-avx512 - Select the skylake-avx512 processor.
    slm            - Select the slm processor.
    tremont        - Select the tremont processor.
    westmere       - Select the westmere processor.
    winchip-c6     - Select the winchip-c6 processor.
    winchip2       - Select the winchip2 processor.
    x86-64         - Select the x86-64 processor.
    yonah          - Select the yonah processor.
    znver1         - Select the znver1 processor.

For cross-compilation, the target CPU must be specified explicitly. If the binary will only run on the local machine, simply export RUSTFLAGS="-C target-cpu=native".
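To see which features a given RUSTFLAGS setting actually turned on, a small program can report them at compile time. This is a sketch: the four features listed are an arbitrary selection, and `compiled_features` is a name chosen for this example.

```rust
/// Report whether a few x86 SIMD features were enabled when this program
/// was compiled. `cfg!` is evaluated at compile time, so the result reflects
/// the target CPU and features passed to rustc, not the machine it runs on.
fn compiled_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse2", cfg!(target_feature = "sse2")),
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")),
        ("fma", cfg!(target_feature = "fma")),
    ]
}

fn main() {
    for (name, enabled) in compiled_features() {
        let state = if enabled { "enabled" } else { "disabled" };
        println!("{:<5} {}", name, state);
    }
}
```

Compiling it once with default flags and once with RUSTFLAGS="-C target-cpu=native" shows the difference on the local machine.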

References:
[1] https://rust-lang-nursery.github.io/packed_simd/perf-guide/target-feature/features.html
[2] https://github.com/rust-lang/rust/tree/master/src/librustc_target
[3] CPU instruction sets
