norden.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
Moin! This is the Mastodon instance for northern lights, chatterboxes, and everything in between. Follow the lighthouse.


#simd


New blog series: @folkertdev shows how we use SIMD in the zlib-rs project.

SIMD is crucial to good performance, but learning how to use it can be daunting. In this series we'll show concrete examples of using SIMD in a real-world project.

Part 1 explains how the compiler already uses SIMD for us, how to evaluate whether it's doing a good job, and how to switch to a better-optimized version when the current CPU supports it.

tweedegolf.nl/en/blog/153/simd

@trifectatech

tweedegolf.nl: SIMD in zlib-rs (part 1): Autovectorization and target features - Blog - Tweede golf: "I'm fascinated by the creative use of SIMD instructions. When you first learn about SIMD, it is clear that doing more multiplications in a single instruction is useful for speeding up matrix multi ..."

While implementing complex numbers for #simd I tripped over failures wrt. negative zero. After multiple re-readings of C23 Annex G and considering the meaning of infinite infinities on a 2D plane (with zeros simply being their inverse) I believe #C and #CPlusPlus should ignore the sign of zeros and infinities in their x+iy representations of complex numbers. compiler-explorer.com/z/YavE4M provides some motivation.
Am I missing something?

compiler-explorer.com: Compiler Explorer (C++): `int main() { using C = std::complex<double>; std::cout << C() * -C() << '\n'; std::cout << 0. * -C() << '\n'; }`

Ah, the classic tale of a coder thinking #SIMD would make their code fly 🚀, only to discover it trips over its own feet 👟. Our hero's memory seems as patchy as their #benchmarks, but fear not, the valuable lesson here is clear: #optimization is just a synonym for #headache. 🤦‍♂️
genna.win/blog/convolution-sim #coding #woes #lessons #HackerNews #ngated

genna.win: Performance optimization, and how to do it wrong | Just wing it: "Optimization is hard. And sometimes, the compiler makes it even harder."

Hey friends!
For folks interested in #RISCV, and especially #RVV, here's some information on the #tenstorrent in-house-designed CPU!

At a high level: the vector unit is 2x256-bit, with full RVV 1.0 support as well as a fair few of the optional extensions on top of RVV 1.0!

Phoronix article here: phoronix.com/news/LLVM-20-Tens

LLVM patches here: github.com/llvm/llvm-project/p

One Pager: cdn.sanity.io/files/jpb4ed5r/p

www.phoronix.com: LLVM Merges Support For The Tenstorrent TT-Ascalon-D8 RISC-V CPU

#simd

there's this trick i randomly found a few years ago and i've been wondering if there's a name for it or if other people have done this before

```
for enforcing floating point determinism with realigned buffers

if we have
x x x 0 1 2 3 4 5 6 7 x x x

where x is the identity for my operation, and our operation is commutative (not necessarily associative)

then adding x padding doesn't affect the result as long as we do a tree reduction at the end

e.g.

accumulate in register: v = 0+4 1+5 2+6 3+7

tree reduction step 0: (0+4)+(2+6) (1+5)+(3+7)
tree reduction step 1: ((0+4)+(2+6)) + ((1+5)+(3+7))

if we add padding (e.g., by realigning the buffer and using a masked load)

accumulate in register: v = x+1+5 x+2+6 x+3+7 0+4+x

tree reduction step 0: (1+5)+(3+7) (0+4)+(2+6)
tree reduction step 1: ((1+5)+(3+7)) + ((0+4)+(2+6))

commuting the elements shows us that this is the exact same result as the previous one, so the bit pattern of the final result is unaffected (modulo signed zero, nan, etc)
```

Yesterday, one year ago... (Still wondering how many people actually have read or tried out any of these)

mastodon.thi.ng/@toxi/11134859

Mastodon Glitch Edition: Karsten Schmidt (@toxi@mastodon.thi.ng):

#HowToThing #Epilogue #LongRead: After 66 days of addressing 30 wildly varied use cases and building ~20 new example projects of varying complexity to illustrate how #ThingUmbrella libraries can be used & combined, I'm taking a break to concentrate on other important thi.ngs...

With this overall selection I tried shining a light on common architectural patterns, but also some underexposed, yet interesting niche topics. Since there were many different techniques involved, it's natural not everything resonated with everyone. That's fine! Though, my hope always is that readers take an interest in a wide range of topics, and so many of these new examples were purposefully multi-faceted and hopefully provided insights for at least some parts, plus (in)directly communicated a core essence of the larger project: only individual packages (or small clusters) are designed & optimized for a set of particular use cases. At large, though, thi.ng explicitly does NOT offer any such guidance or even opinion. All I can offer are possibilities, nudges and cross-references, how these constructs & techniques can be (and have been) useful and/or the theory underpinning them.

For some topics, thi.ng libs provide multiple approaches to achieve certain goals. This again is by design (not lack of it!) and stems from hard-learned experience, showing that many (esp. larger) projects highly benefit from more nuanced (sometimes conflicting) approaches compared to popular de facto "catch-all" framework solutions. To avid users (incl. myself) this approach has become a somewhat unique offering and advantage, yet in itself seems to be the hardest and most confusing aspect of the entire project to communicate to newcomers.

So seeing this list of new projects together, to me really is a celebration (and confirmation/testament) of the overall #BottomUpDesign #ThingUmbrella approach (which I've been building on since ~2006): from the wide spectrum/flexibility of use cases, the expressiveness, concision, the data-first approach, the undogmatic mix of complementary paradigms, the separation of concerns, no hidden magic state, only minimal build tooling requirements (a bundler is optional, but recommended for tree shaking, no more) — these are all aspects I think are key to building better (incl. more maintainable & reason-able) software. IMO they are worth embracing & exposing more people to and this is what I've partially attempted to do with this series of posts...

ICYMI, here's a summary of the 10 most recent posts (full list in the https://thi.ng/umbrella readme). Many of those examples have more comments than code...

021: Iterative animated polygon subdivision & heat map viz https://mastodon.thi.ng/@toxi/111221943333023306
022: Quasi-random voronoi lattice generator https://mastodon.thi.ng/@toxi/111244412425832657
023: Tag-based Jaccard similarity ranking using bitfields https://mastodon.thi.ng/@toxi/111256960928934577
024: 2.5D hidden line visualization of DEM files https://mastodon.thi.ng/@toxi/111269505611983570
025: Transforming & plotting 10k data points using SIMD https://mastodon.thi.ng/@toxi/111283262419126958
026: Shader meta-programming to generate 16 animated function plots https://mastodon.thi.ng/@toxi/111295842650216136
027: Flocking sim w/ neighborhood queries to visualize proximity https://mastodon.thi.ng/@toxi/111308439597090930
028: Randomized, space-filling, nested 2D grid layout generator https://mastodon.thi.ng/@toxi/111324566926701431
029: Forth-like DSL & livecoding playground for 2D geometry https://mastodon.thi.ng/@toxi/111335025037332972
030: Procedural text generation via custom DSL & parse grammar https://mastodon.thi.ng/@toxi/111347074558293056
#ThingUmbrella #OpenSource #TypeScript #JavaScript #Tutorial
Replied in thread

@Methylzero I had an idea last year around adding an extension to use the #FP16 FPUs as 10 bit int pipelines to save a cycle on IFMAs and I16ADD over the int16 MAC/add instructions, but they were seen as too niche (even for x86)

There was already precedent on this sort of thing (avx512 IFMA did this for the FP64 pipes)

Idea was saving a cycle (3.5 instead of 4.5) and saving some power (but not dealing with the extra 6 bits of a normal int16)

I feel like I’m missing something with how the overlap of RV extensions is done, namely the implied support of types in #RVV.

Ex: Given a chip that supports rv64gfdv, does that imply that both the scalar and vector units *must* support FP32 and FP64?

Or would an implementation that has FP64 on the scalar units only, but FP32 on both the scalar and the vector units, be valid?

Namely the passage I’m uncertain of is github.com/riscv/riscv-v-spec/

Clang/LLVM friends, trying to understand *why* Clang (18) doesn't see through what seems to me like an obvious optimization.

#compiler_explorer link here, explanation of what I don't understand follows:
godbolt.org/z/j8WqsMjb6

Going through Hacker's Delight and doing some of the dirt-simple exercises, I dumped the assembly for Chapter 1 exercise 2, a "loop that goes from 1 to 0xFFFFFFFF". (changed to not fault in CE)

(continues in next post, but putting hashtags here)

godbolt.org: Compiler Explorer is an interactive online compiler which shows the assembly output of compiled C++, Rust, Go (and many more) code.