RNN Dense Prototype and Function List

Description

This kernel implements a single basic fully connected (or dense) calculation typically used in the majority of RNN architectures:

\[y_{i} = b_{i} + \sum_{j}{xa}_{j} \cdot {Wa}_{i,j} + \sum_{j}{xb}_{j} \cdot {Wb}_{i,j} + \ldots + \sum_{j}{xn}_{j} \cdot {Wn}_{i,j}\]

Where:

\({xa}_{j}\), \({xb}_{j}\), \({xn}_{j}\) - \(j_{\text{th}}\) value in one of the input tensors. These input tensors might be the current input, the previous output, the cell state, or any other tensor, depending on the RNN cell architecture.

\({Wa}_{i,j}\), \({Wb}_{i,j}\), \({Wn}_{i,j}\) - weight of the \(j_{\text{th}}\) input element for the \(i_{\text{th}}\) neuron in one of the input weights tensors. These weights tensors might be input-to-a-gate weights, output-to-a-gate weights, or any other tensor, depending on the RNN cell architecture.

\(y_{i}\) - output of the \(i_{\text{th}}\) neuron (\(i_{\text{th}}\) value in the output tensor).

\(b_{i}\) - bias for the \(i_{\text{th}}\) neuron.

This is a MAC-based kernel, which implies accumulation. See Quantization: Influence of Accumulator Bit Depth for more information on related quantization aspects. The number of accumulations in the series for each output value is equal to the total number of values in all inputs.
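The calculation above can be sketched as a plain floating-point reference. This is only an illustration of the formula, not the library implementation: `rnn_dense_ref` and its parameters are hypothetical names, and the sketch assumes each weights buffer is stored row-major with shape (N, M), so the weight linking input element `j` to neuron `i` sits at index `j * m + i`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper, not part of the MLI API.
 * x[k]  : k-th input, flattened to n[k] values
 * w[k]  : k-th weights buffer, ASSUMED row-major with shape (n[k], m)
 * b, y  : bias and output, m values each */
void rnn_dense_ref(const float *const *x, const float *const *w,
                   const size_t *n, size_t inputs_num,
                   const float *b, size_t m, float *y)
{
    assert(x != NULL && w != NULL && n != NULL && b != NULL && y != NULL);

    for (size_t i = 0; i < m; ++i) {
        float acc = b[i];                          /* start from the bias */
        for (size_t k = 0; k < inputs_num; ++k) {  /* one sum per input tensor */
            for (size_t j = 0; j < n[k]; ++j)
                acc += x[k][j] * w[k][j * m + i];
        }
        y[i] = acc;
    }
}
```

Each output value accumulates one product per input element across all input tensors, which is why the accumulator bit depth matters for the quantized variants.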

Functions

Kernels that implement RNN Dense functionality have the following prototype:

mli_status mli_krn_rnn_dense_<data_format>(
   const mli_tensor **inputs,
   const mli_tensor **weights,
   const mli_tensor *bias,
   const mli_rnn_dense_cfg *cfg,
   mli_tensor *out);

where data_format is one of the data formats listed in Table MLI Data Formats and the function parameters are shown in the following table:

RNN Dense Function Parameters
Parameter	Type	Description
`inputs`	`const mli_tensor **`	[IN] Pointer to the array of pointers to constant input tensors
`weights`	`const mli_tensor **`	[IN] Pointer to the array of pointers to constant weights tensors
`bias`	`const mli_tensor *`	[IN] Pointer to constant bias tensor
`cfg`	`const mli_rnn_dense_cfg *`	[IN] Pointer to RNN dense parameters structure
`out`	`mli_tensor *`	[IN | OUT] Pointer to output tensor. Result is stored here.

mli_rnn_dense_cfg is defined as:

typedef struct {
    uint8_t inputs_num;
} mli_rnn_dense_cfg;

mli_rnn_dense_cfg Structure Field Description
Field Name	Type	Description
`inputs_num`	`uint8_t`	Number of input tensors (number of pointers in the inputs array). This is also the number of weights tensors (number of pointers in the weights array), because each input has its own weights tensor. The maximum number of tensors in each array is specified by the MLI_RNN_MAX_INPUTS define.
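As a minimal sketch of how the configuration might be filled in, the helper below sets `inputs_num` only when the count fits the limit. It is written to compile on its own, so it repeats the structure definition and uses a placeholder value for `MLI_RNN_MAX_INPUTS`; the real value comes from the MLI headers. The helper name `rnn_dense_cfg_init` is hypothetical, and rejecting a count of zero is an assumption of this sketch rather than a documented rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: placeholder value for illustration only; use the value
 * provided by the MLI headers in real code. */
#ifndef MLI_RNN_MAX_INPUTS
#define MLI_RNN_MAX_INPUTS 4
#endif

/* Repeated from the documentation so the sketch is self-contained. */
typedef struct {
    uint8_t inputs_num;
} mli_rnn_dense_cfg;

/* Hypothetical helper: stores the number of input/weights pairs in cfg.
 * Returns false (leaving cfg untouched) when the count is zero or
 * exceeds MLI_RNN_MAX_INPUTS. */
bool rnn_dense_cfg_init(mli_rnn_dense_cfg *cfg, size_t count)
{
    assert(cfg != NULL);
    if (count == 0 || count > MLI_RNN_MAX_INPUTS)
        return false;
    cfg->inputs_num = (uint8_t)count;
    return true;
}
```

The same count then sizes both pointer arrays passed to the kernel, one weights tensor per input tensor.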

Here is a list of all available RNN Dense functions:

List of Available RNN Dense Functions
Function Name	Details
`mli_krn_rnn_dense_sa8_sa8_sa32`	In/out/weights data format: sa8 Bias data format: sa32
`mli_krn_rnn_dense_fx16`	All tensors data format: fx16
`mli_krn_rnn_dense_fx16_fx8_fx8`	In/out data format: fx16 Weights/Bias data format: fx8

Conditions

Ensure that you satisfy the following general conditions before calling the listed functions:

bias, out, all tensors in inputs array and all tensors in weights array must be valid (see mli_tensor Structure Field Descriptions).

The number of tensors in inputs and weights arrays must be the same and must not exceed MLI_RNN_MAX_INPUTS value.

Shapes of bias, out, all tensors in inputs array and all tensors in weights array must be compatible, which implies the following requirements:

Each tensor in the inputs array can be of any shape and rank. Only the total number of elements is considered.

The \(i_{th}\) tensor in weights array corresponds to the \(i_{th}\) tensor in inputs array, which means that weights[i] must be a two-dimensional tensor (rank==2) of shape \((N_i, M)\), where \(N_i\) is the total number of elements in the inputs[i] tensor and \(M\) is the total number of neurons and is equal to output length.

bias must be a one-dimensional tensor (rank==1). Its length must be equal to \(M\), the number of neurons (the second dimension of every weights tensor).

out must be a one-dimensional tensor (rank==1). Its length must be equal to \(M\), the number of neurons (the second dimension of every weights tensor).

The out tensor must not point to a memory region that overlaps the memory of any tensor in the inputs array.

mem_stride must satisfy the following statements:

For the out tensor and all tensors in the inputs array, mem_stride must reflect the shape, that is, the memory of these tensors must be contiguous.

For the bias tensor and all tensors in the weights array, mem_stride of the innermost dimension must be equal to 1.
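The shape and mem_stride conditions above can be sketched as a pre-call check. This is an illustration only: `tensor_desc` is a hypothetical stand-in holding just the rank, shape, and stride fields, `SKETCH_MAX_RANK` is an arbitrary limit, and strides are assumed to be given explicitly in elements. The memory-overlap condition is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_RANK 4 /* ASSUMPTION: illustrative limit, not an MLI constant */

/* Hypothetical stand-in for the shape-related fields of a tensor. */
typedef struct {
    uint32_t rank;
    uint32_t shape[SKETCH_MAX_RANK];
    int32_t mem_stride[SKETCH_MAX_RANK]; /* ASSUMPTION: explicit, in elements */
} tensor_desc;

/* Total number of elements described by the shape. */
uint32_t count_elements(const tensor_desc *t)
{
    uint32_t total = 1;
    for (uint32_t d = 0; d < t->rank; ++d)
        total *= t->shape[d];
    return total;
}

/* True when the strides describe densely packed, row-major memory. */
bool is_contiguous(const tensor_desc *t)
{
    int32_t expected = 1;
    for (uint32_t d = t->rank; d-- > 0;) {
        if (t->mem_stride[d] != expected)
            return false;
        expected *= (int32_t)t->shape[d];
    }
    return true;
}

bool rnn_dense_shapes_ok(const tensor_desc *const *inputs,
                         const tensor_desc *const *weights,
                         uint8_t inputs_num,
                         const tensor_desc *bias,
                         const tensor_desc *out)
{
    assert(inputs != NULL && weights != NULL && bias != NULL && out != NULL);

    /* out: rank 1 and contiguous; its length defines M. */
    if (out->rank != 1 || !is_contiguous(out))
        return false;
    const uint32_t m = out->shape[0];

    /* bias: rank 1, length M, innermost stride 1. */
    if (bias->rank != 1 || bias->shape[0] != m || bias->mem_stride[0] != 1)
        return false;

    for (uint8_t k = 0; k < inputs_num; ++k) {
        const tensor_desc *in = inputs[k];
        const tensor_desc *w = weights[k];

        /* inputs[k]: any shape and rank, but contiguous. */
        if (in->rank > SKETCH_MAX_RANK || !is_contiguous(in))
            return false;

        /* weights[k]: shape (N_k, M) with innermost stride 1. */
        if (w->rank != 2 || w->shape[0] != count_elements(in) ||
            w->shape[1] != m || w->mem_stride[1] != 1)
            return false;
    }
    return true;
}
```

Running a check like this in debug builds can catch a mismatched weights tensor before the kernel is called.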

For the fx16 and fx16_fx8_fx8 versions of the kernel, in addition to the general conditions, ensure that you satisfy the following quantization conditions before calling the function:

The number of frac_bits in the bias tensor must not exceed the sum of frac_bits in the inputs[0] and weights[0] tensors.

The number of frac_bits in the out tensor must not exceed the sum of frac_bits in any pair of related tensors in the inputs and weights arrays.
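A minimal sketch of these two checks follows. The function name and the flat arrays of fractional-bit counts are hypothetical; in real code the values would be read from each tensor's fx parameters. The sketch reads "any pair" as "every pair", which is an interpretation rather than something the text states explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the MLI API.
 * in_frac[k] and w_frac[k] hold frac_bits of inputs[k] and weights[k]. */
bool rnn_dense_fx_frac_bits_ok(const int8_t *in_frac, const int8_t *w_frac,
                               uint8_t inputs_num,
                               int8_t bias_frac, int8_t out_frac)
{
    assert(in_frac != NULL && w_frac != NULL);
    if (inputs_num == 0)
        return false;

    /* bias is compared against the first input/weights pair only. */
    if (bias_frac > in_frac[0] + w_frac[0])
        return false;

    /* out is compared against every input/weights pair (ASSUMED reading). */
    for (uint8_t k = 0; k < inputs_num; ++k) {
        if (out_frac > in_frac[k] + w_frac[k])
            return false;
    }
    return true;
}
```

For example, with pairs summing to 14 and 15 fractional bits, an out tensor with 14 fractional bits passes, while one with 15 fails because of the first pair.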

For the sa8_sa8_sa32 version of the kernel, in addition to the general conditions, ensure that you satisfy the following quantization conditions before calling the function:

bias, out, all the tensors in inputs array, and all tensors in weights array must be quantized on the tensor level. This implies that each tensor contains a single scale factor and a single zero offset.

The zero offset of the out tensor and of each tensor in the inputs array must be within the [-128, 127] range.

bias and all tensors in the weights array must be symmetric. This implies that each of these tensors contains a single zero offset equal to 0.

The scale factor of the bias tensor must be equal to the product of the scale factors of the first input tensor and the first weights tensor in the corresponding arrays (that is, \(bias.scale = inputs[0].scale * weights[0].scale\)). See the example for the similar condition in the Convolution 2D Prototype and Function List.
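The bias scale condition can be illustrated with a small sketch that quantizes floating-point bias values to symmetric 32-bit integers. `quantize_bias_sa32` is a hypothetical helper, scale factors are represented here as plain floats for simplicity (an assumption; the library stores quantization parameters in its own format), and saturation to the `int32_t` range is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the MLI API.
 * Quantizes m bias values with zero offset 0 and
 * scale = in0_scale * w0_scale, rounding half away from zero. */
void quantize_bias_sa32(const float *bias, size_t m,
                        float in0_scale, float w0_scale, int32_t *q)
{
    assert(bias != NULL && q != NULL);

    const float bias_scale = in0_scale * w0_scale;
    assert(bias_scale > 0.0f);

    for (size_t i = 0; i < m; ++i) {
        const float v = bias[i] / bias_scale;
        /* Manual rounding; no saturation is applied in this sketch. */
        q[i] = (int32_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}
```

With this choice of scale, each quantized bias value is expressed in the same units as the products of the first input and its weights, so it can be added to those products in the accumulator without rescaling.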

Ensure that you satisfy the platform-specific conditions in addition to those listed above (see the Platform Specific Details chapter).

Result

These functions modify only the memory pointed to by the out.data.mem field. It is assumed that all the other fields of the out tensor are properly populated to be used in calculations; they are not modified by the kernel.

Depending on the debug level (see section Error Codes), these functions perform a parameter check and return the result as an mli_status code as described in section Kernel Specific Configuration Structures.