Permute Prototype and Function List

Description

The kernel permutes the dimensions of the input tensor according to the provided order. In other words, it transposes the input tensor.

Functions

The functions which implement Permute have the following prototype:

mli_status mli_krn_permute_<data_format>(
   const mli_tensor *in,
   const mli_permute_cfg *cfg,
   mli_tensor *out);

where data_format is one of the data formats listed in Table MLI Data Formats and the function parameters are shown in the following table:

Permute Function Parameters
Parameter	Type	Description
`in`	`const mli_tensor *`	[IN] Pointer to constant input tensor
`cfg`	`const mli_permute_cfg *`	[IN] Pointer to Permute parameters structure
`out`	`mli_tensor *`	[OUT] Pointer to output tensor. Result is stored here

The mli_permute_cfg structure is defined as:

typedef struct {
   uint8_t perm_dim[MLI_MAX_RANK];
}  mli_permute_cfg;

mli_permute_cfg Structure Field Description
Field name	Type	Description
`perm_dim`	`uint8_t[]`	Permutation array: the order of the input dimensions in the output tensor.

The new order of dimensions is given by the perm_dim array of the kernel configuration structure. Dimension idx of the out tensor corresponds to dimension perm_dim[idx] of the in tensor, that is, out.shape[idx] equals in.shape[perm_dim[idx]]. The tensor data is reordered to match the new shape.

For example, if the input tensor has the shape (2, 4, 8) and perm_dim is (2, 0, 1), then the output tensor has the shape (8, 2, 4). This transpose corresponds to changing the feature map layout from HWC to CHW.
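The mapping described above can be sketched in plain C. This is a minimal reference sketch, not the MLI implementation: it assumes a contiguous, row-major int8 buffer (no memory strides), and the names MAX_RANK, permuted_shape and permute_ref are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANK 4 /* assumption: illustrative stand-in for MLI_MAX_RANK */

/* out_shape[idx] = in_shape[perm_dim[idx]] */
static void permuted_shape(const uint32_t *in_shape, const uint8_t *perm_dim,
                           uint32_t rank, uint32_t *out_shape)
{
    for (uint32_t idx = 0; idx < rank; ++idx)
        out_shape[idx] = in_shape[perm_dim[idx]];
}

/* Reference permute of a contiguous, row-major int8 buffer (hypothetical helper). */
static void permute_ref(const int8_t *in, const uint32_t *in_shape,
                        const uint8_t *perm_dim, uint32_t rank, int8_t *out)
{
    assert(rank >= 1 && rank <= MAX_RANK);

    uint32_t out_shape[MAX_RANK];
    size_t in_stride[MAX_RANK];
    uint32_t pos[MAX_RANK] = {0}; /* current output coordinate */
    size_t total = 1;

    permuted_shape(in_shape, perm_dim, rank, out_shape);

    /* Row-major element strides of the input tensor. */
    for (uint32_t d = rank; d-- > 0;) {
        in_stride[d] = total;
        total *= in_shape[d];
    }

    /* Walk the output in memory order and fetch the matching input element. */
    for (size_t o = 0; o < total; ++o) {
        size_t src = 0;
        for (uint32_t d = 0; d < rank; ++d)
            src += (size_t)pos[d] * in_stride[perm_dim[d]];
        out[o] = in[src];

        /* Advance the output coordinate, last dimension fastest. */
        for (uint32_t d = rank; d-- > 0;) {
            if (++pos[d] < out_shape[d])
                break;
            pos[d] = 0;
        }
    }
}
```

With in_shape (2, 4, 8) and perm_dim (2, 0, 1), permuted_shape yields (8, 2, 4), and element (c, h, w) of the output equals element (h, w, c) of the input, which is the HWC to CHW conversion from the example.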

Here is a list of all available permute functions:

List of Available Permute Functions
Function Name	Details
`mli_krn_permute_sa8`	All tensors data format: sa8
`mli_krn_permute_fx8`	All tensors data format: fx8
`mli_krn_permute_fx16`	All tensors data format: fx16

Conditions

Ensure that you satisfy the following general conditions before calling the function:

in and out tensors must be valid (see mli_tensor Structure Field Descriptions) and satisfy the data requirements of the selected version of the kernel.

The shape of the out tensor must already reflect the permuted shape of the in tensor.

in and out tensors must not point to overlapping memory regions.

Only the first N values of the permutation order array (N equal to the rank of the in tensor) are considered by the kernel. All of them must be unique, non-negative and less than the rank of the in tensor.

Ensure that you satisfy the platform-specific conditions in addition to those listed above (see the Platform Specific Details chapter).
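The permutation-array condition above can be checked before calling the kernel. The following is a minimal sketch under stated assumptions: perm_dim_is_valid is a hypothetical helper, not part of the MLI API, and it assumes the rank never exceeds 32 so that a single bit mask can track the dimensions already used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the first `rank` entries of perm_dim are unique and
 * each is less than `rank`. Entries beyond `rank` are ignored.
 * Hypothetical helper for illustration only. */
static bool perm_dim_is_valid(const uint8_t *perm_dim, uint32_t rank)
{
    uint32_t seen = 0; /* bit d is set once dimension d has been used */

    assert(rank <= 32); /* assumption: rank fits the bit mask */

    for (uint32_t idx = 0; idx < rank; ++idx) {
        uint8_t d = perm_dim[idx];
        if (d >= rank)
            return false; /* out of range for this tensor */
        if (seen & (1u << d))
            return false; /* dimension listed twice */
        seen |= 1u << d;
    }
    return true;
}
```

Because perm_dim holds uint8_t values, the non-negative requirement is satisfied by the type itself, so the sketch only tests range and uniqueness.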

Result

These functions modify:

Memory pointed to by the out.data.mem field.

The el_params field of the out tensor.

It is assumed that all the other fields and structures are properly populated to be used in calculations and are not modified by the kernel.

For fx8 and fx16, the el_params field of the in tensor is copied to the out tensor. The same applies to the sa8 versions of the kernel in case of per-tensor quantization (in.el_params.sa.dim < 0).

For the sa8 versions of the kernel with per-axis quantization, the el_params field of the out tensor is filled by the kernel using the quantization parameters of the in tensor. The following fields are affected:

out.el_params.sa.zero_point.mem.pi16 and the related capacity field

out.el_params.sa.scale.mem.pi16 and the related capacity field

out.el_params.sa.scale_frac_bits.mem.pi8 and the related capacity field

Depending on the state of the preceding pointer fields, ensure that you choose only one of the following options to initialize all the fields in a consistent way:

If you initialize the pointers with nullptr, the corresponding fields from the in tensor are copied to the out tensor. The quantization parameters themselves are not copied.

If you initialize the pointers with the corresponding fields from the in tensor, no action is taken.

If you initialize the pointers and capacity fields with pre-allocated memory and its capacity, the quantization parameters themselves are copied. The allocated memory must be large enough to hold the related data from the input tensor.
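The three initialization options can be modeled with a small stand-in type. This sketch does not use the real mli_tensor layout: qparam_buf, qparam_result and resolve_qparam are hypothetical names, and returning QP_TOO_SMALL for an undersized buffer is an assumption, since the behavior of the kernel in that case is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one per-axis parameter array and its capacity. */
typedef struct {
    int16_t *mem;
    uint32_t capacity; /* in bytes */
} qparam_buf;

typedef enum { QP_SHARED, QP_UNCHANGED, QP_COPIED, QP_TOO_SMALL } qparam_result;

/* Models how one output parameter field is resolved from the input field.
 * used_bytes is the size of the parameter data held by the input. */
static qparam_result resolve_qparam(const qparam_buf *in, qparam_buf *out,
                                    uint32_t used_bytes)
{
    if (out->mem == NULL) {
        /* Option 1: pointer and capacity are taken from the input;
         * the parameter data itself is not copied. */
        *out = *in;
        return QP_SHARED;
    }
    if (out->mem == in->mem) {
        /* Option 2: output already refers to the input parameters. */
        return QP_UNCHANGED;
    }
    if (out->capacity < used_bytes) {
        /* Assumption: report the problem instead of writing out of bounds. */
        return QP_TOO_SMALL;
    }
    /* Option 3: pre-allocated memory receives a copy of the parameter data. */
    memcpy(out->mem, in->mem, used_bytes);
    return QP_COPIED;
}
```

In this model, mixing the options across the zero point, scale and fractional-bits fields would leave the out tensor partly sharing and partly owning its parameters, which is why all fields should be initialized the same way.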

Depending on the debug level (see section Error Codes), this function performs a parameter check and returns the result as an mli_status code as described in section Kernel Specific Configuration Structures.