Data Layouts
------------

Data Layouts define how multi-dimensional tensor data is physically arranged in memory.
This is relevant in the context of kernels related to convolution and pooling groups, or
other vision-specific layers. These kernels deal with multi-dimensional tensors which might
be considered as images or sets of images. In this case, the specific layout (the order of
dimensions and their meaning) is important, as it dictates the way calculations are
performed. Other kernels are layout-agnostic, or imply a non-vision meaning of dimensions.

The table :ref:`t_layout_letter_desc` describes the letters used for the layout description.

.. tabularcolumns:: |\Y{0.2}|\Y{0.4}|

.. _t_layout_letter_desc:
.. table:: Layout Letter Description
   :align: center
   :widths: 30, 130

   +------------+----------------------------+
   | **Letter** | **Description**            |
   +============+============================+
   | H          | Height                     |
   +------------+----------------------------+
   | W          | Width                      |
   +------------+----------------------------+
   | C          | Number of (input) channels |
   +------------+----------------------------+
   | N          | Number of filters, or      |
   |            | number of output channels  |
   +------------+----------------------------+
..

.. note::
   In this context, the letter N is used for the number of output channels, and not for
   the number of batches in batch processing.
..

MLI Data Layout (HWCN)
^^^^^^^^^^^^^^^^^^^^^^

Layout-dependent MLI kernels use the HWC layout for tensors and the HWCN layout for
weights. The Height/Width/Channel layout is also referred to as "interleaved" or
"channel last". The smallest stride between dimension elements in memory is for C
(channel or depth), followed by the width and then the height. The height is the least
frequently changing index.

.. admonition:: Example
   :class: "admonition tip"

   :math:`in(32,16,3)` is a feature map with 32 rows (height), 16 columns (width),
   and 3 channels.
..

In the HWCN layout, the smallest stride between dimension elements in memory is for N
(filters or output channels), followed by (input) channels, width, and finally height,
with the latter being the least frequently changing index.

.. admonition:: Example
   :class: "admonition tip"

   :math:`weights(4,3,2,1)` in this case is 1 filter of 4 rows, 3 columns, and
   2 (input) channels.
..

Refer to Figure :ref:`f_hwcn_conv2d` and Table :ref:`t_hwcn_spec` for details.
A transpose function can also be used to convert one layout into another.

.. _f_hwcn_conv2d:
.. figure:: ../images/app_HWCN_conv2d.png
   :align: center
   :alt: Applicability of HWCN Layout for 2D Convolution Calculation

   Applicability of HWCN Layout for 2D Convolution Calculation

Description
"""""""""""

.. tabularcolumns:: |\Y{0.2}|\Y{0.4}|

.. _t_hwcn_spec:
.. table:: The HWCN Layout
   :align: center

   +--------------------+---------------------------------------------+
   | **Aspect**         | **Comment**                                 |
   +====================+=============================================+
   | Input data layout  | HWC (Height; Width; Channel)                |
   +--------------------+---------------------------------------------+
   | Weights layout     | HWCN (Height; Width; Channel; Filter)       |
   +--------------------+---------------------------------------------+
   | Output data layout | HWC (Height; Width; Channel), where C is    |
   |                    | the number of output channels               |
   +--------------------+---------------------------------------------+
   | Vectorization      | Vectorization across the depth dimension of |
   |                    | the output is beneficial. The depth of the  |
   |                    | output should be bigger than the vector     |
   |                    | size, and ideally a multiple of it.         |
   +--------------------+---------------------------------------------+
..
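
As an illustration of the stride order described above, the following sketch shows how
element coordinates map to flat memory offsets for HWC data and HWCN weights. The helper
functions are illustrative only and are not part of the MLI API.

.. code-block:: c

   /* Hypothetical helpers (not part of the MLI API): they only encode
      the stride order described above for contiguous tensors. */

   /* HWC feature map: C changes fastest, then W, then H. */
   static inline int hwc_offset(int h, int w, int c, int W, int C)
   {
       return (h * W + w) * C + c;
   }

   /* HWCN weights: N changes fastest, then C, then W, then H. */
   static inline int hwcn_offset(int h, int w, int c, int n,
                                 int W, int C, int N)
   {
       return ((h * W + w) * C + c) * N + n;
   }
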
Benefits of the HWCN layout:

- Typically, at the beginning of a graph, NN data is wider than it is deep. However,
  after just a few layers it becomes deeper while the height and width dimensions
  become small. In this case, vectorization across depth becomes more beneficial.
  As statistics of implemented graphs show, most implemented layers fit the
  "vectorization across depth" strategy better.

- This layout is less sensitive to convolution configuration parameters such as stride,
  padding, and dilation rate, as these parameters typically do not touch the depth
  dimension.

- Slicing of variable tensors is more DMA-friendly, as it implies longer linear DMA
  transfers and fewer address jumps.

Changing the Data Layout of Three-Dimensional Tensors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Inputs and outputs of vision layers, such as convolution or pooling, are typically
three-dimensional tensors (also referred to as feature maps) that reflect the value of
various features (channels) across an image-like input (height and width). The two most
frequently used layouts are HWC and CHW, depicted in :ref:`f_var_tnsr_data`. The MLI data
layout is HWC, but a conversion (transpose) function is provided to change one data
layout into another.

.. _f_var_tnsr_data:
.. figure:: ../images/var_tnsr_data_layouts.png
   :align: center
   :alt: Variable Tensor Data Layouts: Visualization of Placing in Memory

   Variable Tensor Data Layouts: Visualization of Placing in Memory
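
As a reference for how such a conversion can be expressed, the sketch below performs a
naive element-by-element CHW to HWC copy in plain C. It is not the MLI transpose kernel
itself; the function name and element type are assumptions made for illustration.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical example (not an MLI kernel): convert a CHW tensor
      to HWC by copying each element to its new position. */
   void chw_to_hwc(const int8_t *src, int8_t *dst, int H, int W, int C)
   {
       for (int c = 0; c < C; ++c) {
           for (int h = 0; h < H; ++h) {
               for (int w = 0; w < W; ++w) {
                   /* CHW offset: W fastest, then H, then C.
                      HWC offset: C fastest, then W, then H. */
                   dst[(h * W + w) * C + c] = src[(c * H + h) * W + w];
               }
           }
       }
   }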