NVIDIA, ARM, and Intel try to make a good FP8 format
http://arxiv.org/abs/2209.05433
- INT8 never became the de facto standard: it requires some fiddling, not all modules are supported, etc.
- Of course, this is not yet supported by frameworks and hardware
- The paper does not report any real throughput / latency metrics
If this lands, it will be very cool, though in my experience FP16 helps with batch size / memory, not speed.
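
To put rough numbers on the memory side, here is a back-of-the-envelope sketch. The tensor shape and the `tensor_mib` helper are made up for illustration; the only assumption is that FP32, FP16, and FP8 take 4, 2, and 1 bytes per element. It says nothing about speed.

```python
import math

# Bytes per element for each storage format (FP8 is one byte by definition).
BYTES_PER_ELEMENT = {"FP32": 4, "FP16": 2, "FP8": 1}


def tensor_mib(shape, fmt):
    """Storage size in MiB of a dense tensor with the given shape and format."""
    return math.prod(shape) * BYTES_PER_ELEMENT[fmt] / 2**20


# Hypothetical activation tensor: batch of 32, 512 tokens, 1024 features.
shape = (32, 512, 1024)
sizes = {fmt: tensor_mib(shape, fmt) for fmt in BYTES_PER_ELEMENT}

for fmt, mib in sizes.items():
    print(f"{fmt}: {mib:.0f} MiB")
```

Each halving of the element width halves the footprint, so the same memory budget fits roughly twice the batch. Whether that turns into throughput depends on kernel and hardware support.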