public final class Float16
extends java.lang.Object
The Float16 class is a wrapper and a utility class to manipulate half-precision 16-bit IEEE 754 floating point values (also called fp16 or binary16). A half-precision float can be created from or converted to a single-precision float, and is stored in a short data type.
The IEEE 754 standard specifies an fp16 as having the following format:
- Sign bit: 1 bit
- Exponent width: 5 bits
- Significand: 10 bits
The format is laid out as follows:

    1   11111   1111111111
    ^   --^--   -----^----
    sign  |          |_______ significand
          |
          -- exponent

Half-precision floating points can be useful to save memory and/or bandwidth at the expense of range and precision when compared to single-precision floating points (fp32).
To help you decide whether fp16 is the right storage type for your needs, refer to the table below, which shows the available precision throughout the range of possible values. The precision column indicates the step size between two consecutive numbers in a specific part of the range.
Range start | Precision |
---|---|
0 | 1 ⁄ 16,777,216 |
1 ⁄ 16,384 | 1 ⁄ 16,777,216 |
1 ⁄ 8,192 | 1 ⁄ 8,388,608 |
1 ⁄ 4,096 | 1 ⁄ 4,194,304 |
1 ⁄ 2,048 | 1 ⁄ 2,097,152 |
1 ⁄ 1,024 | 1 ⁄ 1,048,576 |
1 ⁄ 512 | 1 ⁄ 524,288 |
1 ⁄ 256 | 1 ⁄ 262,144 |
1 ⁄ 128 | 1 ⁄ 131,072 |
1 ⁄ 64 | 1 ⁄ 65,536 |
1 ⁄ 32 | 1 ⁄ 32,768 |
1 ⁄ 16 | 1 ⁄ 16,384 |
1 ⁄ 8 | 1 ⁄ 8,192 |
1 ⁄ 4 | 1 ⁄ 4,096 |
1 ⁄ 2 | 1 ⁄ 2,048 |
1 | 1 ⁄ 1,024 |
2 | 1 ⁄ 512 |
4 | 1 ⁄ 256 |
8 | 1 ⁄ 128 |
16 | 1 ⁄ 64 |
32 | 1 ⁄ 32 |
64 | 1 ⁄ 16 |
128 | 1 ⁄ 8 |
256 | 1 ⁄ 4 |
512 | 1 ⁄ 2 |
1,024 | 1 |
2,048 | 2 |
4,096 | 4 |
8,192 | 8 |
16,384 | 16 |
32,768 | 32 |
This table shows that numbers higher than 1024 lose all fractional precision.
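The rows above follow directly from the 10-bit significand: within a power-of-two interval starting at 2^e the step is 2^(e-10), and everything below 2^-14 shares the subnormal step. A minimal sketch of that arithmetic follows; `HalfPrecisionStep` and `stepAt` are hypothetical names that are not part of this class, and the input is assumed to be finite and within the fp16 range.

```java
final class HalfPrecisionStep {
    // Number of explicit significand bits in an fp16 (see the format above).
    static final int SIGNIFICAND_BITS = 10;
    // Exponent of the smallest normalized fp16 value, 2^-14 = 1/16,384.
    static final int MIN_NORMAL_EXPONENT = -14;

    /**
     * Returns the gap between two consecutive fp16 values near |v|.
     * Assumes v is finite and within the fp16 range.
     */
    static double stepAt(double v) {
        double magnitude = Math.abs(v);
        // Zero and subnormal magnitudes use the exponent of the smallest normal.
        int exponent = Math.max(Math.getExponent(magnitude), MIN_NORMAL_EXPONENT);
        return Math.scalb(1.0, exponent - SIGNIFICAND_BITS);
    }
}
```

For example, `stepAt(1.0)` evaluates to 1/1,024 and `stepAt(1024.0)` to 1, matching the corresponding table rows.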
Modifier and Type | Field and Description |
---|---|
static short | EPSILON: The difference between 1.0 and the next value representable by a half-precision float. |
static int | EXPONENT_BIAS: The offset of the exponent from the actual value. |
static int | EXPONENT_SHIFT: The offset to shift by to obtain the exponent bits. |
static int | EXPONENT_SIGNIFICAND_MASK: The bitmask to AND a number with to obtain exponent and significand bits. |
static short | LOWEST_VALUE: Smallest (most negative) value a half-precision float may have. |
static int | MAX_EXPONENT: Maximum exponent a finite half-precision float may have. |
static short | MAX_VALUE: Maximum positive finite value a half-precision float may have. |
static int | MIN_EXPONENT: Minimum exponent a normalized half-precision float may have. |
static short | MIN_NORMAL: Smallest positive normal value a half-precision float may have. |
static short | MIN_VALUE: Smallest positive non-zero value a half-precision float may have. |
static short | NaN: A Not-a-Number representation of a half-precision float. |
static short | NEGATIVE_INFINITY: Negative infinity of type half-precision float. |
static short | NEGATIVE_ZERO: Negative 0 of type half-precision float. |
static short | POSITIVE_INFINITY: Positive infinity of type half-precision float. |
static short | POSITIVE_ZERO: Positive 0 of type half-precision float. |
static int | SHIFTED_EXPONENT_MASK: The bitmask to AND with a number shifted right by EXPONENT_SHIFT to obtain the exponent bits. |
static int | SIGN_MASK: The bitmask to AND a number with to obtain the sign bit. |
static int | SIGN_SHIFT: The offset to shift by to obtain the sign bit. |
static int | SIGNIFICAND_MASK: The bitmask to AND a number with to obtain significand bits. |
static int | SIZE: The number of bits used to represent a half-precision float value. |
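To illustrate how the shift and mask fields combine, here is a minimal sketch that splits a half-precision bit pattern into its three parts. The numeric values are assumptions derived from the 1/5/10 bit layout described earlier; this page does not list the actual constant values, and `HalfBits` with its methods is a hypothetical helper, not part of this class.

```java
final class HalfBits {
    // Assumed values, inferred from the 1-bit sign, 5-bit exponent,
    // 10-bit significand layout. The real constants may differ.
    static final int SIGN_SHIFT = 15;
    static final int EXPONENT_SHIFT = 10;
    static final int SIGN_MASK = 0x8000;
    static final int SHIFTED_EXPONENT_MASK = 0x1f;
    static final int SIGNIFICAND_MASK = 0x3ff;
    static final int EXPONENT_SIGNIFICAND_MASK = 0x7fff;
    static final int EXPONENT_BIAS = 15;

    /** Returns 1 for a negative value (including -0), 0 otherwise. */
    static int sign(short h) {
        return (h & SIGN_MASK) >>> SIGN_SHIFT;
    }

    /** Returns the exponent bits as stored, in the range 0 to 31. */
    static int biasedExponent(short h) {
        return (h >>> EXPONENT_SHIFT) & SHIFTED_EXPONENT_MASK;
    }

    /** Returns the stored exponent minus the bias. */
    static int unbiasedExponent(short h) {
        return biasedExponent(h) - EXPONENT_BIAS;
    }

    /** Returns the 10 explicit significand bits. */
    static int significand(short h) {
        return h & SIGNIFICAND_MASK;
    }

    /** Returns the bit pattern with the sign bit cleared. */
    static int magnitudeBits(short h) {
        return h & EXPONENT_SIGNIFICAND_MASK;
    }
}
```

With these assumed values, the pattern 0xC100 (which encodes -2.5) splits into sign 1, unbiased exponent 1, and significand 0x100.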
Modifier and Type | Method and Description |
---|---|
static short | ceil(short h): Returns the smallest half-precision float value toward negative infinity greater than or equal to the specified half-precision float value. |
static int | compare(short x, short y): Compares the two specified half-precision float values. |
static boolean | equals(short x, short y): Returns true if the two half-precision float values are equal. |
static short | floor(short h): Returns the largest half-precision float value toward positive infinity less than or equal to the specified half-precision float value. |
static boolean | greater(short x, short y): Returns true if the first half-precision float value is greater (larger toward positive infinity) than the second half-precision float value. |
static boolean | greaterEquals(short x, short y): Returns true if the first half-precision float value is greater (larger toward positive infinity) than or equal to the second half-precision float value. |
static boolean | isInfinite(short h): Returns true if the specified half-precision float value represents infinity, false otherwise. |
static boolean | isNaN(short h): Returns true if the specified half-precision float value represents a Not-a-Number, false otherwise. |
static boolean | isNormalized(short h): Returns true if the specified half-precision float value is normalized (does not have a subnormal representation). |
static boolean | less(short x, short y): Returns true if the first half-precision float value is less (smaller toward negative infinity) than the second half-precision float value. |
static boolean | lessEquals(short x, short y): Returns true if the first half-precision float value is less (smaller toward negative infinity) than or equal to the second half-precision float value. |
static short | max(short x, short y): Returns the larger of two half-precision float values (the value closest to positive infinity). |
static short | min(short x, short y): Returns the smaller of two half-precision float values (the value closest to negative infinity). |
static short | rint(short h): Returns the closest integral half-precision float value to the specified half-precision float value. |
static float | toFloat(short h): Converts the specified half-precision float value into a single-precision float value. |
static short | toHalf(float f): Converts the specified single-precision float value into a half-precision float value. |
static java.lang.String | toHexString(short h): Returns a hexadecimal string representation of the specified half-precision float value. |
static short | trunc(short h): Returns the truncated half-precision float value of the specified half-precision float value. |
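As an illustration of the narrowing direction listed above, toHalf(float), the following is a minimal sketch and not the class's actual code. `FloatToHalfSketch` is a hypothetical name. It assumes round-to-nearest with ties to even, which this page does not specify, and it flushes magnitudes below 2^-24 to a signed zero, following the special cases given in the toHalf method detail.

```java
final class FloatToHalfSketch {
    /** Narrows a single-precision float to an fp16 bit pattern. */
    static short toHalf(float f) {
        int bits = Float.floatToRawIntBits(f);
        int sign = (bits >>> 16) & 0x8000;   // sign moved down to bit 15
        int exponent = (bits >>> 23) & 0xff; // float exponent, biased by 127
        int significand = bits & 0x7fffff;   // 23 explicit float bits

        if (exponent == 0xff) {
            // Infinity has an empty significand; NaN gets a non-zero one.
            return (short) (sign | 0x7c00 | (significand != 0 ? 0x200 : 0));
        }
        int e = exponent - 127 + 15;         // re-bias for fp16
        if (e >= 0x1f) {
            return (short) (sign | 0x7c00);  // too large: signed infinity
        }
        if (e <= -10) {
            return (short) sign;             // below 2^-24: signed zero
        }
        int shift;
        int half;
        if (e <= 0) {
            // Subnormal result: make the implicit leading 1 explicit and
            // shift so that the unit is 2^-24.
            significand |= 0x800000;
            shift = 14 - e;
            half = significand >>> shift;
        } else {
            // Normal result: keep the top 10 significand bits.
            shift = 13;
            half = (e << 10) | (significand >>> shift);
        }
        // Round to nearest, ties to even (assumed). A carry out of the
        // significand correctly bumps the exponent, up to infinity.
        int remainder = significand & ((1 << shift) - 1);
        int halfway = 1 << (shift - 1);
        if (remainder > halfway || (remainder == halfway && (half & 1) != 0)) {
            half++;
        }
        return (short) (sign | half);
    }
}
```

Under these assumptions 1.0f maps to 0x3C00 and 65504.0f to 0x7BFF, while 65520.0f, halfway to the next power of two, rounds up to the infinity pattern 0x7C00.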
public static final int SIZE
public static final short EPSILON
public static final int MAX_EXPONENT
public static final int MIN_EXPONENT
public static final short LOWEST_VALUE
public static final short MAX_VALUE
public static final short MIN_NORMAL
public static final short MIN_VALUE
public static final short NaN
public static final short NEGATIVE_INFINITY
public static final short NEGATIVE_ZERO
public static final short POSITIVE_INFINITY
public static final short POSITIVE_ZERO
public static final int SIGN_SHIFT
public static final int EXPONENT_SHIFT
public static final int SIGN_MASK
public static final int SHIFTED_EXPONENT_MASK
The bitmask to AND with a number shifted right by EXPONENT_SHIFT to obtain the exponent bits.
public static final int SIGNIFICAND_MASK
public static final int EXPONENT_SIGNIFICAND_MASK
public static final int EXPONENT_BIAS
public static int compare(short x, short y)
Compares the two specified half-precision float values. The following conditions apply during the comparison:
- NaN is considered by this method to be equal to itself and greater than all other half-precision float values (including POSITIVE_INFINITY).
- POSITIVE_ZERO is considered by this method to be greater than NEGATIVE_ZERO.
Parameters:
x - The first half-precision float value to compare.
y - The second half-precision float value to compare.
Returns:
0 if x is numerically equal to y, a value less than 0 if x is numerically less than y, and a value greater than 0 if x is numerically greater than y
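
The ordering rules above can be sketched directly on the raw bit patterns. This is one possible approach under the assumed IEEE 754 binary16 encoding, not necessarily how this class implements compare; `HalfCompareSketch` and its helpers are hypothetical names.

```java
final class HalfCompareSketch {
    /** True when the exponent bits are all ones and the significand is non-zero. */
    static boolean isNaN(short h) {
        return (h & 0x7fff) > 0x7c00;
    }

    /**
     * Maps a non-NaN bit pattern to an int whose natural order matches the
     * numeric order of the values, with -0 placed just below +0.
     */
    static int orderKey(short h) {
        int bits = h & 0xffff;
        if ((bits & 0x8000) != 0) {
            // Negative values: a larger magnitude means a smaller key.
            return -(bits & 0x7fff) - 1;
        }
        return bits;
    }

    static int compare(short x, short y) {
        boolean xNaN = isNaN(x);
        boolean yNaN = isNaN(y);
        if (xNaN || yNaN) {
            // NaN equals itself and is greater than everything else.
            if (xNaN && yNaN) return 0;
            return xNaN ? 1 : -1;
        }
        return Integer.compare(orderKey(x), orderKey(y));
    }
}
```

With this sketch, comparing a NaN pattern such as 0x7E00 against positive infinity (0x7C00) yields a positive result, and comparing +0 (0x0000) against -0 (0x8000) also yields a positive result.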
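
public static short rint(short h)
Returns the closest integral half-precision float value to the specified half-precision float value.
Parameters:
h - A half-precision float value

public static short ceil(short h)
Returns the smallest half-precision float value toward negative infinity greater than or equal to the specified half-precision float value.
Parameters:
h - A half-precision float value

public static short floor(short h)
Returns the largest half-precision float value toward positive infinity less than or equal to the specified half-precision float value.
Parameters:
h - A half-precision float value

public static short trunc(short h)
Returns the truncated half-precision float value of the specified half-precision float value.
Parameters:
h - A half-precision float value

public static short min(short x, short y)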
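Returns the smaller of two half-precision float values (the value closest to negative infinity).
- NEGATIVE_ZERO is smaller than POSITIVE_ZERO
Parameters:
x - The first half-precision value
y - The second half-precision value

public static short max(short x, short y)
Returns the larger of two half-precision float values (the value closest to positive infinity).
- POSITIVE_ZERO is greater than NEGATIVE_ZERO
Parameters:
x - The first half-precision value
y - The second half-precision value

public static boolean less(short x, short y)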
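Returns true if the first half-precision float value is less (smaller toward negative infinity) than the second half-precision float value.
Parameters:
x - The first half-precision value
y - The second half-precision value

public static boolean lessEquals(short x, short y)
Returns true if the first half-precision float value is less (smaller toward negative infinity) than or equal to the second half-precision float value.
Parameters:
x - The first half-precision value
y - The second half-precision value

public static boolean greater(short x, short y)
Returns true if the first half-precision float value is greater (larger toward positive infinity) than the second half-precision float value.
Parameters:
x - The first half-precision value
y - The second half-precision value

public static boolean greaterEquals(short x, short y)
Returns true if the first half-precision float value is greater (larger toward positive infinity) than or equal to the second half-precision float value.
Parameters:
x - The first half-precision value
y - The second half-precision value

public static boolean equals(short x, short y)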
Returns true if the two half-precision float values are equal.
- POSITIVE_ZERO and NEGATIVE_ZERO are considered equal.
Parameters:
x - The first half-precision value
y - The second half-precision value

public static boolean isInfinite(short h)
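Returns true if the specified half-precision float value represents infinity, false otherwise.
Parameters:
h - A half-precision float value

public static boolean isNaN(short h)
Returns true if the specified half-precision float value represents a Not-a-Number, false otherwise.
Parameters:
h - A half-precision float value

public static boolean isNormalized(short h)
Returns true if the specified half-precision float value is normalized (does not have a subnormal representation). If the specified value is POSITIVE_INFINITY, NEGATIVE_INFINITY, POSITIVE_ZERO, NEGATIVE_ZERO, NaN or any subnormal number, this method returns false.
Parameters:
h - A half-precision float value

public static float toFloat(short h)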
Converts the specified half-precision float value into a single-precision float value. The following special cases are handled:
- If the input is NaN, the returned value is Float.NaN
- If the input is POSITIVE_INFINITY or NEGATIVE_INFINITY, the returned value is respectively Float.POSITIVE_INFINITY or Float.NEGATIVE_INFINITY
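
The widening direction can be sketched from the bit layout alone. The code below is an illustrative decoder under the assumed IEEE 754 binary16 encoding (exponent bias 15, 10 significand bits), not this class's implementation; `HalfToFloatSketch` is a hypothetical name.

```java
final class HalfToFloatSketch {
    /** Widens an fp16 bit pattern to a single-precision float. */
    static float toFloat(short h) {
        int bits = h & 0xffff;
        int sign = bits >>> 15;
        int exponent = (bits >>> 10) & 0x1f;
        int significand = bits & 0x3ff;

        float magnitude;
        if (exponent == 0x1f) {
            // All exponent bits set: infinity, or NaN when the significand is non-zero.
            magnitude = significand != 0 ? Float.NaN : Float.POSITIVE_INFINITY;
        } else if (exponent == 0) {
            // Zero or subnormal: no implicit leading 1, fixed scale of 2^-24.
            magnitude = significand * 0x1.0p-24f;
        } else {
            // Normal: (1 + significand / 1024) * 2^(exponent - 15).
            magnitude = Math.scalb(1.0f + significand / 1024.0f, exponent - 15);
        }
        return sign != 0 ? -magnitude : magnitude;
    }
}
```

Every fp16 value is exactly representable as a float, so this direction needs no rounding. For instance, the pattern 0x7BFF decodes to 65504.0f and 0xC100 to -2.5f.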
Parameters:
h - The half-precision float value to convert to single-precision

public static short toHalf(float f)
Converts the specified single-precision float value into a half-precision float value. The following special cases are handled:
- If the input is NaN (see Float.isNaN(float)), the returned value is NaN
- If the input is Float.POSITIVE_INFINITY or Float.NEGATIVE_INFINITY, the returned value is respectively POSITIVE_INFINITY or NEGATIVE_INFINITY
- If the input is 0 (positive or negative), the returned value is POSITIVE_ZERO or NEGATIVE_ZERO
- If the magnitude of the input is less than MIN_VALUE, the returned value is flushed to POSITIVE_ZERO or NEGATIVE_ZERO
- If the magnitude of the input is less than MIN_NORMAL, the returned value is a subnormal (denormalized) half-precision float
Parameters:
f - The single-precision float value to convert to half-precision

public static java.lang.String toHexString(short h)
Returns a hexadecimal string representation of the specified half-precision float value. If the value is a NaN, the result is "NaN", otherwise the result follows this format:
- If the sign is negative, the first character is '-'
- If the value is infinity, the string is "Infinity"
- If the value is 0, the string is "0x0.0p0"
- If the value has a normalized representation, the significand is written as "0x1." followed by its lowercase hexadecimal representation. Trailing zeroes are removed unless all digits are 0, then a single zero is used. The significand representation is followed by the exponent, represented by "p", itself followed by a decimal string of the unbiased exponent
- If the value has a subnormal representation, the significand is written as "0x0." followed by its lowercase hexadecimal representation. Trailing zeroes are removed unless all digits are 0, then a single zero is used. The significand representation is followed by the exponent, represented by "p-14"
Parameters:
h - A half-precision float value
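
The formatting rules above can be sketched as follows. This is one reading of the description, not the class's actual code; `HalfHexSketch` is a hypothetical name. One assumption to note: the sketch left-aligns the 10 significand bits into three hexadecimal digits so that the result reads as a conventional hexadecimal fraction (as produced by Float.toHexString); the real method may render the significand digits differently.

```java
final class HalfHexSketch {
    /** Formats the 10 significand bits as hex fraction digits. */
    static String fractionDigits(int significand) {
        // Assumption: pad 10 bits to 12 so they fill three hex digits.
        String digits = String.format("%03x", significand << 2);
        int end = digits.length();
        // Drop trailing zeroes but always keep at least one digit.
        while (end > 1 && digits.charAt(end - 1) == '0') {
            end--;
        }
        return digits.substring(0, end);
    }

    static String toHexString(short h) {
        int bits = h & 0xffff;
        boolean negative = (bits & 0x8000) != 0;
        int exponent = (bits >>> 10) & 0x1f;
        int significand = bits & 0x3ff;

        if (exponent == 0x1f && significand != 0) {
            return "NaN";
        }
        StringBuilder out = new StringBuilder();
        if (negative) {
            out.append('-');
        }
        if (exponent == 0x1f) {
            out.append("Infinity");
        } else if (exponent == 0 && significand == 0) {
            out.append("0x0.0p0");
        } else if (exponent == 0) {
            // Subnormal: no implicit 1, fixed exponent of -14.
            out.append("0x0.").append(fractionDigits(significand)).append("p-14");
        } else {
            // Normal: implicit 1, followed by the unbiased exponent.
            out.append("0x1.").append(fractionDigits(significand))
               .append('p').append(exponent - 15);
        }
        return out.toString();
    }
}
```

Under that assumption the pattern 0x3C00 (1.0) formats as "0x1.0p0", 0x7BFF (65504) as "0x1.ffcp15", and 0x8000 as "-0x0.0p0".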