x64/x86 SIMD: AVX2 support
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
firefox101: fixed
People
(Reporter: lth, Assigned: yury)
References
(Blocks 2 open bugs)
Details
Attachments
(7 files)
We should experiment with AVX2 support for SIMD on x86 and x64. Almost 60% of our Windows users have AVX2 (https://firefoxgraphics.github.io/telemetry/#view=system), and this can only increase. With AVX2 we would get lower register pressure from the 3-address ops, and better code generation in some cases.
My understanding is that plain AVX is probably not worth the bother, but I'll try to back that up with links to discussions.
We used to have AVX support for SIMD.js but we turned it off for two reasons:
- the AVX unit is frequently powered down and powering it up is expensive. It is only code that uses Serious SIMD that benefits from getting AVX codegen.
- the YMM registers caused some weird stalls at task switch time on MacOS.
The parameters may have changed here: there may have been hypothetical fixes to macOS, the Mac is moving to ARM64, and the cost of enabling/disabling AVX may be lower in current chips or OSes. In addition, we may be able to stick to SSE4.1 in the baseline compiler to avoid stalls during startup. There must be other ideas.
I will try to dig up old bugs that pertain to the known problems so that we can investigate how to avoid them.
One thing I don't know yet is whether there's a penalty for mixing AVX code (instructions have a special prefix) with non-AVX code, so that it would be necessary to do AVX encodings for everything before we could assess performance.
Reporter
Comment 1 • 4 years ago
https://github.com/WebAssembly/simd/issues/342#issuecomment-834805766 suggests (this has to be checked) that AVX-encoded instructions no longer require memory operands to be aligned. At the moment, for example, we can't emit PADDD offs(basereg), destreg because we don't normally know whether the effective address is aligned. Being able to do so without worrying about alignment would save us a load, and save us from dedicating a register to the loaded value.
Comment 2 • 4 years ago
The alignment requirements are documented in the Intel Manual, Vol. I, Chapter 14, PROGRAMMING WITH AVX, FMA AND AVX2.
In short, only the explicitly aligned AVX instructions trigger a #GP fault; "regular" AVX instructions won't.
14.9 MEMORY ALIGNMENT
[...]
With the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions, most VEX-encoded,
arithmetic and data processing instructions operate in a flexible environment regarding memory address
alignment, i.e. VEX-encoded instruction with 32-byte or 16-byte load semantics will support unaligned load
operation by default. Memory arguments for most instructions with VEX prefix operate normally without
causing #GP(0) on any byte-granularity alignment (unlike Legacy SSE instructions). The instructions that
require explicit memory alignment requirements are listed in Table 14-22
[...]
Assignee
Comment 3 • 3 years ago
Assignee
Comment 4 • 3 years ago
Finding interesting behavior for (func (param v128 v128) (result v128) local.get 0 local.get 1 i32x4.add):
with AVX:
00000024 66 0f 6f d8 movdqa %xmm0, %xmm3
00000028 66 0f 6f d3 movdqa %xmm3, %xmm2
0000002C c5 e9 fe c1 vpaddd %xmm1, %xmm2, %xmm0
without AVX:
00000024 66 0f fe c1 paddd %xmm1, %xmm0
Is it a regalloc issue?
Reporter
Comment 5 • 3 years ago
That looks like a common regalloc problem, yes. In this case, the regalloc will choose to move xmm0 to xmm2 since xmm0 is used both for the result and for one operand that has to be live throughout ("useRegister"); this is a conflict.
The extra move via xmm3 is a regalloc bug that I see quite often, though it does not look like there's a bug on file for it. I think this is a missing optimization: multiple move groups are not sensibly merged when they are separated by an LParameter node, but this is just a guess.
If you introduce a dummy parameter 0 and then use parameters 1 and 2 as the inputs to the add, you'll likely see better code. You'll see this hack used in some of the whitebox tests for code generation, to avoid this specific problem.
In principle, if both the lhs and rhs are marked as useRegisterAtStart instead of useRegister then the regalloc should be able to generate good code here since xmm0 can be reused. But you have to be careful about how you do this, see lowerForFPU for some sample code.
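A rough pseudocode sketch of that lowering idea (useRegisterAtStart, defineReuseInput, and lowerForFPU are real SpiderMonkey names; the rest is illustrative, not the actual implementation):

```cpp
// Pseudocode sketch, not the real SpiderMonkey code. With both inputs
// marked useRegisterAtStart, the allocator knows the input registers
// are only needed at the start of the instruction and may be reused
// for the output, so xmm0 can serve as both operand and result with
// no extra moves.
void lowerSimdBinary(MSimdBinary* ins) {
    LAllocation lhs = useRegisterAtStart(ins->lhs());
    LAllocation rhs = useRegisterAtStart(ins->rhs());
    // Output reuses the lhs register (operand 0), as lowerForFPU
    // does for two-address SSE instructions.
    defineReuseInput(new LSimdBinary(lhs, rhs), ins, 0);
}
```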
Assignee
Comment 6 • 3 years ago
If you introduce a dummy parameter 0 and then use parameters 1 and 2 for the inputs to the add you'll likely see better code. You see this hack used in some of the whitebox tests for code generation sometimes, to avoid this specific problem.
Correct, avoiding parameter 0 bypasses the problem. The problem appears only for simple functions where the result register is one of the used parameter registers (e.g. it can be param 1 if param 0 has type 'i32').
Updated • 3 years ago
Assignee
Comment 7 • 3 years ago
Locally, if I add test-also=--enable-avx to the directives.txt, all wasm/simd/ tests pass. What would be the right way to subject AVX to all SIMD tests?
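For reference, the kind of directives-file change being described might look like the fragment below; the |jit-test| line format is an assumption about the harness, and only the test-also=--enable-avx part is taken from the comment.

```
|jit-test| test-also=--enable-avx
```

With test-also, the jit-test harness runs each test in the directory both with and without the extra flag.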
Reporter
Comment 8 • 3 years ago
(In reply to Yury Delendik (:yury) from comment #7)
Locally, if I add test-also=--enable-avx to the directives.txt, all wasm/simd/ tests pass. What would be the right way to subject AVX to all SIMD tests?
What happens if you flip the default value of avxEnabled to true and run all jit-tests and jstests? I think that's the only test that really matters. (Maybe we should do a phone call about this.)
Assignee
Updated • 3 years ago
Assignee
Comment 10 • 3 years ago
I forced enabling AVX on the try server: https://treeherder.mozilla.org/jobs?repo=try&revision=8ae9a3914f6303d1d511a7784ba1e08de56b80e3&selectedTaskRun=S52NoQcxTBGINHjARFoFMQ.0
Locally and on the try server, the following jit-tests (the codegen tests) fail:
- wasm/simd/bitselect-x64-ion-codegen.js
- wasm/simd/cvt-x64-ion-codegen.js
- wasm/simd/ion-analysis.js
- wasm/simd/ion-bug1688713.js
- wasm/simd/shuffle-x86-ion-codegen.js
- wasm/simd/splat-x64-ion-codegen.js
Comment 11 • 3 years ago
bugherder
Assignee
Comment 12 • 3 years ago
Comment 13 • 3 years ago
Comment 14 • 3 years ago
bugherder
Assignee
Updated • 3 years ago
Reporter
Comment 15 • 3 years ago
This seems relevant: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/x64.20SIMD.20alignment; see especially the later comments with optimization advice regarding vzeroupper and the discussion about mixing SSE and AVX encodings. I think the conclusion is "thou shalt benchmark carefully", but it is worth digging into the Intel docs or searching for related information.
Assignee
Comment 16 • 3 years ago
Agreed. vzeroupper comes into play when 256-bit / full-YMM registers are used; if nothing executes a 256-bit instruction, then we are okay.
Though it looks like we have to inspect the entire Firefox codebase and somehow guarantee a "Clean Upper State" during execution of SpiderMonkey JIT code regardless of this bug -- nowadays the dependencies might use AVX2, e.g. codecs or (bergamot) intrinsics. From what I have seen so far, we don't use YMM in the JIT itself.
(Per https://cdrdv2.intel.com/v1/dl/getContent/671488?explicitVersion=true&wapkw=intel%2064%20and%20ia-32%20architectures%20optimization%20reference%20manual , Section 15.3, "Mixing AVX Code with SSE Code")
Assignee
Comment 17 • 3 years ago
Assignee
Comment 18 • 3 years ago
Allows AVX SIMD instructions on x86/x64, mostly as an experiment for benchmarking -- if successful, it will be enabled whenever available.
Depends on D135560
Comment 19 • 3 years ago
Comment 20 • 3 years ago
bugherder
Comment 21 • 3 years ago
Comment 22 • 3 years ago
bugherder
Assignee
Comment 23 • 3 years ago
Adds CPUID detection that sets avxPresent.
The isAvxPresent function interface is modified to allow checking whether AVX2 is active.
Comment 24 • 3 years ago
Comment 25 • 3 years ago
bugherder
Assignee
Updated • 3 years ago
Assignee
Comment 26 • 3 years ago
Comment 27 • 3 years ago
Comment 28 • 3 years ago
bugherder
Assignee
Comment 29 • 3 years ago
The x64 architectures optimization manual, Section 15.3, "Mixing AVX Code with SSE Code", describes runtime penalties when 256-bit and 128-bit AVX operations are mixed. After checking our code, inserting a ymm-upper-parts-are-dirty check, and running the test suites, it looks like we either don't use 256-bit registers or we properly return to the Clean (VZEROUPPER) state.
Compilers (GCC/Clang) normally insert VZEROUPPER instructions on the boundaries between YMM and XMM usage unless explicitly instructed otherwise, so libraries such as bergamot/intgemm are not in danger of breaking the clean-state invariant.
Assignee
Comment 30 • 3 years ago
Assignee
Comment 31 • 3 years ago
Support for AVX is implemented, and the majority of masm operations were optimized to support VEX encoding. Special-case AVX2 variants were also added where available.
Comment 32 • 3 years ago
Comment 33 • 3 years ago
bugherder