Implement multi-head attention by splitting Q, K, V into multiple heads.
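Given Q, K, V of shape (n, d) and a number of heads h, split each matrix along the feature dimension into h heads of width d/h, compute attention independently for each head, and concatenate the head outputs back into shape (n, d).
Assume d is divisible by h. No linear projections are needed: just split and concatenate.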
Input: Integers n, d, h, followed by the n × d matrices Q, K, and V.
Output: Multi-head attention output (n, d), values rounded to 4 decimal places.
Example input:
2 4 2
1 0 0 1
0 1 1 0
1 0 0 1
0 1 1 0
1 2 3 4
5 6 7 8
Example output:
[1.7311 2.7311 3.2689 4.2689]
[4.2689 5.2689 6.7311 7.7311]
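The split-attend-concatenate procedure described above can be sketched in plain Python. This is a minimal sketch, not a reference solution: the function names (`softmax`, `attention`, `multi_head_attention`) are hypothetical, and it assumes each head uses the standard scaled dot-product form softmax(Q_i K_i^T / sqrt(d/h)) V_i. The statement does not spell out the per-head formula or the scaling, so the output of this sketch may differ from the sample values.

```python
import math


def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    # Attention for a single head: softmax(q k^T / sqrt(d_k)) v.
    # The 1/sqrt(d_k) scaling is an assumption, not stated in the problem.
    d_k = len(q[0])
    scale = math.sqrt(d_k)
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k]
        weights = softmax(scores)
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out


def multi_head_attention(q, k, v, h):
    n, d = len(q), len(q[0])
    if d % h != 0:
        raise ValueError("d must be divisible by h")
    d_k = d // h
    head_outputs = []
    for i in range(h):
        lo, hi = i * d_k, (i + 1) * d_k
        # Head i is columns lo..hi-1 of every row of Q, K, and V.
        q_i = [row[lo:hi] for row in q]
        k_i = [row[lo:hi] for row in k]
        v_i = [row[lo:hi] for row in v]
        head_outputs.append(attention(q_i, k_i, v_i))
    # Concatenate the heads along the feature axis and round to 4 decimals.
    return [
        [round(x, 4) for head in head_outputs for x in head[r]]
        for r in range(n)
    ]
```

Calling `multi_head_attention(Q, K, V, h)` with Q, K, V given as lists of n rows of d numbers returns a list of n rows of d rounded values. Because each head only sees its own slice of columns, the heads can be computed in any order before concatenation.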