Ticket #113 (new enhancement)
Optimization of ffmpeg h264 decoder
| Reported by: | astrange | Owned by: | astrange |
|---|---|---|---|
| Priority: | normal | Milestone: | Sometime after 1.0 |
| Component: | ffmpeg | Version: | |
| Severity: | normal | Keywords: | |
| Cc: | | | |
Description (last modified by astrange)
The H.264 decoder is impressively fast, but not fast enough.
Possible improvements, based on profiling I did on x86:
Cache misses happen too often:
* rewrite very large functions (fill_caches, decode_mb_cabac) to be smaller; interlacing/other oddities can be slightly penalized if necessary as in hl_decode_mb.
* improve cache hints; hl_motion and hl_decode_mb use some, but they use the same ones, which might be wrong in both cases. x86 has no hints for store, but maybe non-temporal could be used.
* the decoding context is over 200KB!
* lots of really complex struct accesses, e.g.:
"h->non_zero_count_cache[3+8*1 + 2*8*i]= h->non_zero_count[left_xy[i]][left_block[0+2*i]];"
Branch misprediction:
* happens often in the CABAC/arithmetic decoder, which is naturally unpredictable. Rewrite more of it to be branchless?
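To illustrate what "branchless" could mean here, the following is a toy sketch of one binary arithmetic decoding step written with and without a branch. It is not the decoder's actual CABAC code: `ToyDecoder`, the function names, and the parameters `rlps` (width of the least-probable-symbol sub-interval) and `mps` (most probable bit, 0 or 1) are assumptions for illustration, and renormalization and probability-state updates are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy state; not the decoder's real CABAC context. */
typedef struct ToyDecoder {
    uint32_t range;  /* current interval width */
    uint32_t value;  /* offset of the coded value inside the interval */
} ToyDecoder;

/* Branchy version: one hard-to-predict branch per decoded bit. */
int decode_step_branchy(ToyDecoder *d, uint32_t rlps, int mps)
{
    d->range -= rlps;              /* width of the MPS sub-interval */
    if (d->value < d->range)
        return mps;                /* most probable symbol */
    d->value -= d->range;          /* least probable symbol: move into upper part */
    d->range  = rlps;
    return !mps;
}

/* Branchless version: the comparison becomes an all-ones or all-zeros mask. */
int decode_step_branchless(ToyDecoder *d, uint32_t rlps, int mps)
{
    uint32_t mps_range = d->range - rlps;
    uint32_t lps_mask  = -(uint32_t)(d->value >= mps_range); /* ~0 if LPS, else 0 */

    d->value -= mps_range & lps_mask;                         /* subtract only on LPS */
    d->range  = mps_range + ((rlps - mps_range) & lps_mask);  /* rlps on LPS, else mps_range */
    return mps ^ (int)(lps_mask & 1);                         /* flip the bit on LPS */
}

/* Checks that both versions agree for every offset in one interval. */
void demo_branchless_check(void)
{
    for (uint32_t value = 0; value < 300; value++) {
        ToyDecoder a = { 300, value }, b = a;
        int bit_a = decode_step_branchy(&a, 50, 1);
        int bit_b = decode_step_branchless(&b, 50, 1);
        assert(bit_a == bit_b && a.range == b.range && a.value == b.value);
    }
}
```

The masked form trades a possible misprediction for a few extra arithmetic operations on every bit, so it only pays off where the branch really is unpredictable; that would need to be measured.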
PPC is missing some assembly optimizations that x86 has (cabac, altivec), but I have no idea if they're needed; I don't have a recent profile from a PPC machine.

