
Support for floating point types #1307

Merged: JanFSchulte merged 8 commits into fastmachinelearning:main from vloncar:float_type on Aug 6, 2025
@vloncar vloncar commented Jun 8, 2025

Contributor

Description

hls4ml currently supports only arbitrary-precision integer and fixed-point types. This PR adds support for floating-point types. These are defined in ap_float.h and ac_float.h of the respective libraries and cover two distinct cases: the IEEE floating-point standard (essentially the C/C++ types) and a general floating-point implementation (any combination of mantissa and exponent widths). The AC types library is broader and more flexible than the AP one. The PR covers both by introducing FloatPrecisionType for the general case (covered by ac_float) and StandardFloatPrecisionType for the standard case (ap_float<W,E> and ac_std_float<W,E>). In principle, everything could be crammed into a single type, but that would make it hard to track the intended use, especially because of the 1-bit difference between the AC and AP types.

To use the new types, the user can specify float, double, half or bfloat16 as the type. This results in StandardFloatPrecisionType objects being used and the corresponding C++ types being emitted in the generated code. Note that half and bfloat16 aren't supported out of the box and require the user to tweak the code to make it compile, because how these types are exposed depends on the compiler. If the user specifies std_float<W,E>, then ap_float<W,E> and ac_std_float<W,E> will be generated from the same object. Finally, for full control of the AC type, the user can specify ac_float<W,I,E,Q>, which uses the FloatPrecisionType class and emits the corresponding type in the code.
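For orientation, the four named types correspond to well-known bit layouts in the <W,E> convention (total width, exponent width). The following is a minimal sketch, not code from the PR: `std_float_layout` and the alias names are hypothetical and only illustrate how the widths relate.

```cpp
#include <limits>

// Hypothetical descriptor for a standard float layout with total width W and
// exponent width E. It is an illustration, not a type from hls4ml.
template <int W, int E> struct std_float_layout {
    static constexpr int width = W;
    static constexpr int exponent_bits = E;
    // Stored fraction bits: total width minus exponent bits minus the sign bit.
    static constexpr int mantissa_bits = W - E - 1;
};

using half_layout = std_float_layout<16, 5>;
using bfloat16_layout = std_float_layout<16, 8>;
using float_layout = std_float_layout<32, 8>;
using double_layout = std_float_layout<64, 11>;

// std::numeric_limits<T>::digits counts the implicit leading bit, hence the +1.
// These checks assume the host uses IEEE 754 float and double.
static_assert(float_layout::mantissa_bits + 1 == std::numeric_limits<float>::digits,
              "float is expected to have a 24-bit significand");
static_assert(double_layout::mantissa_bits + 1 == std::numeric_limits<double>::digits,
              "double is expected to have a 53-bit significand");
```

Under these assumptions, half and bfloat16 share the same total width and differ only in how the 16 bits are split between exponent and fraction.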

The PR is somewhat incomplete, as full support for the general case and for half/bfloat16 involves numerous nuances, but it is a good starting point and is self-contained. A problem remains with the AP types, which don't have a public version of ap_float.h (we'll ask AMD about open-sourcing it), so if the user picks a general floating-point type, local compilation with compile() won't work. In the future we can tackle the include issue (also for half and bfloat16 on host compilers) and look into optimizing the algorithms (for example, using the accumulator type for the CMVM). The intention right now is not to make this a first-class supported feature but rather an exotic option for users who know what they want and are aware of the caveats and rough edges. More crucially, it allows us to avoid silly test failures due to bitwidth issues. This could be advertised as an experimental feature, but I see we're completely lacking any documentation on type setting, so that may come as a separate PR.

Type of change

  • New feature (non-breaking change which adds functionality)

Tests

Tests for parsing the new types have been added to test_types.py.

Checklist

Yeah, yeah, I did all the things in the checklist.

@vloncar vloncar requested review from calad0i and jmitrevs June 8, 2025 21:24
@vloncar vloncar added the please test Trigger testing by creating local PR branch label Jun 9, 2025
vloncar commented Jun 27, 2025

Contributor Author

I'll add ap_float.h and mapping to C++23 types. Converting to draft until I make the change

@vloncar vloncar marked this pull request as draft June 27, 2025 15:08
@vloncar vloncar marked this pull request as ready for review July 28, 2025 18:50
#include "nnet_utils/nnet_types.h"
#include <cstddef>
#include <cstdio>
#if __cplusplus > 202002L

Contributor Author

This check should be #if __cplusplus >= 202302L, but since GCC doesn't fully support C++23, the constant is not yet set to that value; hence we check for "newer than C++20". Alternatively, we could check __STDCPP_BFLOAT16_T__ and __STDCPP_FLOAT16_T__, but these may also not be available even if the implementation is there...

Contributor

I am running GCC version 11.5, which apparently supports C++23 but does not support <stdfloat>, so this guard does not protect from compilation failures in my tests. Checking __STDCPP_BFLOAT16_T__ works, but then I run into further problems with ap_float not naming a type.

Contributor

Seems that we need an #include "nnet_utils/nnet_types.h" in this file. With that, compilation works.

Contributor Author

I don't understand, nnet_types.h is already included? As for the other issues, I am thinking of including <stdfloat> only if the type is used, and not otherwise. Same for ap_float.h, because that one is not available in Vivado 2020.1. Such a mess with these types...

Contributor

Ah sorry, copied the wrong line. I had to add #include "ap_types/ap_float.h" in the defines.h for it to work.
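The guard discussed in this thread could be made more defensive by testing for the header and the feature macros rather than the `__cplusplus` value alone. The following is a minimal sketch under that assumption, not the code that was merged: `SKETCH_HAVE_STDFLOAT`, `half_t`, `bfloat16_t` and `has_std_small_floats` are hypothetical names, and the fallback to `float` is only a placeholder.

```cpp
// Include <stdfloat> only when the toolchain actually ships the header.
#if defined(__has_include)
#if __has_include(<stdfloat>)
#include <stdfloat>
#define SKETCH_HAVE_STDFLOAT 1 // hypothetical helper macro for this sketch
#endif
#endif

// Use the C++23 extended types only when the header was found and the
// implementation advertises both types through the standard macros.
#if defined(SKETCH_HAVE_STDFLOAT) && defined(__STDCPP_FLOAT16_T__) && defined(__STDCPP_BFLOAT16_T__)
using half_t = std::float16_t;
using bfloat16_t = std::bfloat16_t;
constexpr bool has_std_small_floats = true;
#else
// Placeholder fallback (an assumption of this sketch): wider than a real
// half or bfloat16, so numerical results would differ from the real types.
using half_t = float;
using bfloat16_t = float;
constexpr bool has_std_small_floats = false;
#endif
```

As noted in the thread, the macros may be missing even when an implementation exists, so this route trades false positives for possible false negatives.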

@JanFSchulte

Contributor

I've been playing with this a bit and see that when running synthesis in Vitis I get this error

../../../../firmware/nnet_utils/nnet_helpers.h:287:13: error: use of overloaded operator '<<' is ambiguous (with operand types 'std::ostream' (aka 'basic_ostream<char>') and 'ap_float<32, 8, 0>')

Looks like we need an implementation of the << operator in the ap_types.h
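One way such an ambiguity can arise is when a type converts implicitly to several built-in types, so that no single built-in stream inserter is preferred. The sketch below illustrates the idea with a stand-in type; `mock_float` and `to_text` are hypothetical and do not reproduce the real ap_float interface or the operator added in the PR.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a float-like HLS type; NOT the real ap_float.
// Two implicit conversions make a plain `os << x` ambiguous without help.
struct mock_float {
    float value;
    operator float() const { return value; }
    operator double() const { return value; }
};

// Dedicated inserter: an exact match on the second argument, so overload
// resolution prefers it over the built-in float/double/int inserters.
inline std::ostream &operator<<(std::ostream &os, const mock_float &x) {
    return os << static_cast<float>(x);
}

// Small helper used to exercise the inserter.
inline std::string to_text(const mock_float &x) {
    std::ostringstream ss;
    ss << x;
    return ss.str();
}
```

With the dedicated overload in scope, generic printing helpers that stream `data_T` values compile for the stand-in type just as they do for built-in floats.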

@JanFSchulte

Contributor

I can now synthesize a model in Vitis 2024.1 when I use float as the data type. For an ap_float, there are still some issues. This is the full error stack I'm getting:

/home/tools/Xilinx/Vitis_HLS/2024.1/include/ap_float.h:24:3: warning: use of this statement in a constexpr function is a C++14 extension [-Wc++14-extensions]
  if (val <= 0)
  ^
/home/tools/Xilinx/Vitis_HLS/2024.1/include/ap_float.h:29:3: warning: multiple return statements in constexpr function is a C++14 extension [-Wc++14-extensions]
  return log2_ceil((val + 1) / 2) + 1;
  ^
/home/tools/Xilinx/Vitis_HLS/2024.1/include/ap_float.h:25:5: note: previous return statement is here
    return std::numeric_limits<int>::lowest();
    ^
/home/tools/Xilinx/Vitis_HLS/2024.1/include/ap_float.h:27:5: note: previous return statement is here
    return 0;
    ^
In file included from ../../../../myproject_test.cpp:1:
In file included from /home/tools/Xilinx/Vitis_HLS/2024.1/tps/lnx64/gcc-8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/algorithm:61:
/home/tools/Xilinx/Vitis_HLS/2024.1/tps/lnx64/gcc-8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/bits/stl_algobase.h:324:18: error: no viable overloaded '='
              *__result = *__first;
              ~~~~~~~~~ ^ ~~~~~~~~
/home/tools/Xilinx/Vitis_HLS/2024.1/tps/lnx64/gcc-8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/bits/stl_algobase.h:386:22: note: in instantiation of function template specialization 'std::__copy_move<false, false, std::random_access_iterator_tag>::__copy_m<const float *, ap_float<32, 8> *>' requested here
                              _Category>::__copy_m(__first, __last, __result);
                                          ^
/home/tools/Xilinx/Vitis_HLS/2024.1/tps/lnx64/gcc-8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/bits/stl_algobase.h:422:23: note: in instantiation of function template specialization 'std::__copy_move_a<false, const float *, ap_float<32, 8> *>' requested here
      return _OI(std::__copy_move_a<_IsMove>(std::__niter_base(__first),
                      ^
/home/tools/Xilinx/Vitis_HLS/2024.1/tps/lnx64/gcc-8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/bits/stl_algobase.h:454:20: note: in instantiation of function template specialization 'std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<const float *, std::vector<float>>, ap_float<32, 8> *>' requested here
      return (std::__copy_move_a2<__is_move_iterator<_II>::__value>
                   ^
../../../../firmware/nnet_utils/nnet_helpers.h:255:10: note: in instantiation of function template specialization 'std::copy<__gnu_cxx::__normal_iterator<const float *, std::vector<float>>, ap_float<32, 8> *>' requested here
    std::copy(in_begin, in_end, dst);
         ^
../../../../myproject_test.cpp:63:13: note: in instantiation of function template specialization 'nnet::copy_data<float, ap_float<32, 8>, 0UL, 300UL>' requested here
      nnet::copy_data<float, input_t, 0, N_INPUT_1_1*N_INPUT_2_1*N_INPUT_3_1>(in, x);
            ^
/home/tools/Xilinx/Vitis_HLS/2024.1/include/ap_float.h:162:13: note: candidate function not viable: no known conversion from 'const float' to 'ap_float<32, 8>' for 1st argument
  ap_float& operator=(ap_float &&) = default;
            ^
/home/tools/Xilinx/Vitis_HLS/2024.1/include/ap_float.h:163:13: note: candidate function not viable: no known conversion from 'const float' to 'const ap_float<32, 8>' for 1st argument
  ap_float& operator=(const ap_float &) = default;
            ^
In file included from ../../../../myproject_test.cpp:1:
In file included from /home/tools/Xilinx/Vitis_HLS/2024.1/tps/lnx64/gcc-8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/algorithm:61:
/home/tools/Xilinx/Vitis_HLS/2024.1/tps/lnx64/gcc-8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/bits/stl_algobase.h:754:11: error: no viable overloaded '='
        *__first = __tmp;
        ~~~~~~~~ ^ ~~~~~
/home/tools/Xilinx/Vitis_HLS/2024.1/tps/lnx64/gcc-8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/bits/stl_algobase.h:789:23: note: in instantiation of function template specialization 'std::__fill_n_a<ap_float<32, 8> *, unsigned long, double>' requested here
      return _OI(std::__fill_n_a(std::__niter_base(__first), __n, __value));
                      ^
../../../../firmware/nnet_utils/nnet_helpers.h:304:79: note: in instantiation of function template specialization 'std::fill_n<ap_float<32, 8> *, unsigned long, double>' requested here
template <class data_T, size_t SIZE> void fill_zero(data_T data[SIZE]) { std::fill_n(data, SIZE, 0.); }
                                                                              ^
../../../../myproject_test.cpp:95:19: note: in instantiation of function template specialization 'nnet::fill_zero<ap_float<32, 8>, 300UL>' requested here
            nnet::fill_zero<input_t, N_INPUT_1_1*N_INPUT_2_1*N_INPUT_3_1>(x);
                  ^
/home/tools/Xilinx/Vitis_HLS/2024.1/include/ap_float.h:162:13: note: candidate function not viable: no known conversion from 'const double' to 'ap_float<32, 8>' for 1st argument
  ap_float& operator=(ap_float &&) = default;
            ^
/home/tools/Xilinx/Vitis_HLS/2024.1/include/ap_float.h:163:13: note: candidate function not viable: no known conversion from 'const double' to 'const ap_float<32, 8>' for 1st argument
  ap_float& operator=(const ap_float &) = default;
            ^
2 warnings and 2 errors generated.
make: *** [csim.mk:87: obj/myproject_test.o] Error 1
ERROR: [SIM 211-100] 'csim_design' failed: compilation error(s).

vloncar commented Aug 1, 2025

Contributor Author

Well, it turns out ap_float has many, many limitations on how it can be used: assignment of literals doesn't work (as in array initializations), nor do comparison operators with literals (like > 0 in relu), etc. No idea how to solve this. I'm thinking we either can it for now, or leave it in but not advertise it. After all, most of this was developed so that I can write tests with float as the type and not worry about the f***ing mismatches.
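To make the literal-assignment limitation concrete, the sketch below uses a stand-in type that, like the candidates listed in the error log, offers only copy and move assignment. It assumes the type can at least be constructed explicitly from a float, which is not confirmed for the real ap_float. `mock_float`, `fill_zero_explicit` and `copy_explicit` are hypothetical names and are not part of hls4ml.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in: only explicit construction from float and only
// copy/move assignment, so `x = 0.;` does not compile for this type.
// The real ap_float interface may differ.
struct mock_float {
    float value;
    explicit mock_float(float v) : value(v) {}
};

// Variant of a fill-with-zero helper that constructs the element type
// explicitly instead of handing a double literal to std::fill_n.
template <class data_T, std::size_t SIZE> void fill_zero_explicit(data_T data[SIZE]) {
    std::fill_n(data, SIZE, data_T(0.0f));
}

// Variant of an element-wise copy that converts each value explicitly
// rather than relying on an implicit converting assignment.
template <class src_T, class dst_T, std::size_t SIZE> void copy_explicit(const src_T *src, dst_T dst[SIZE]) {
    std::transform(src, src + SIZE, dst, [](src_T v) { return dst_T(v); });
}
```

This only addresses assignment in helper code; comparisons against literals, such as the `> 0` in relu mentioned above, would need separate treatment.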

@JanFSchulte

Contributor

Dang, that's frustrating. I think we should merge it as is and not advertise it. It all works fine from the hls4ml side, and it would be nice to have it in there so that, if ap_float ever gets developed into a fully working data type, we will already have support for it.

Comment thread test/pytest/test_types.py Outdated
@JanFSchulte JanFSchulte merged commit 544a16d into fastmachinelearning:main Aug 6, 2025
5 checks passed
morunner pushed a commit to morunner/hls4ml that referenced this pull request Nov 6, 2025
* Support for floating point types

* Use C++23 types for bfloat16 and half

* Implement << op

* Use floating-point headers only if the types require them

* Print a warning if new C++ types (half and bloat16) are used

* Fix typo