.. _mli_kernels:

MLI Kernels (Operators)
=======================

This chapter introduces the general concept of the ML algorithms (kernels) included in MLI 2.0. Each kernel is implemented as a separate function, and its inputs and outputs are represented by the tensor structure defined in section :ref:`mli_tens_data_struct`. A kernel can have multiple specializations, each implemented as a separate function. As a result, the neural network graph implementation is represented as a series of function calls, which can be constructed manually by the user or generated by an automated front-end tool.

The following groups of NN kernels are defined:

- :ref:`chap_conv`
- :ref:`chap_rec_full`
- :ref:`chap_pool`
- :ref:`chap_diverse_kern`
- :ref:`chap_transform`
- :ref:`chap_element_wise`

.. _func_names_special:

Function Names and Specializations
----------------------------------

All function names of the kernels are constructed in a similar way. To make it easier to navigate through the list of API functions, all function names are built using the following syntax:

.. code:: c

   mli_<group>_<function>_[layout_]<dataformats>[_specializations]([input_tensors], [cfg], [output_tensors]);

..

An explanation of each of the elements that make up a name is provided in Table :ref:`t_func_name_conv_fields`. The function arguments start with zero or more input tensors, followed by a kernel-specific configuration struct, followed by the output tensors. All other kernel parameters are grouped inside the config structure.

.. tabularcolumns:: |\Y{0.3}|\Y{0.3}|\Y{0.4}|

.. _t_func_name_conv_fields:
.. table:: Function Naming Convention Fields
   :align: center
   :class: longtable

   +------------------+-----------------+--------------------------------------+
   | **Field Name**   | **Examples**    | **Description**                      |
   +==================+=================+======================================+
   | Group            | krn             | krn for compute kernels              |
   |                  |                 |                                      |
   |                  | hlp             | hlp for helper functions             |
   |                  |                 |                                      |
   |                  | mov             | mov for data move kernels            |
   |                  |                 |                                      |
   |                  | usr             | usr for user-defined kernels         |
   +------------------+-----------------+--------------------------------------+
   | Functions        | conv2d          | Describes the basic functionality.   |
   |                  |                 |                                      |
   |                  | fully_connected | The full list of supported function  |
   |                  | ...             | names is described later.            |
   +------------------+-----------------+--------------------------------------+
   | layout           | chw             | Optional description of the layout   |
   |                  |                 |                                      |
   |                  | hwcn            | of the input; only relevant for some |
   |                  |                 |                                      |
   |                  | nhwc            | functions.                           |
   +------------------+-----------------+--------------------------------------+
   | dataformats      | fx16            | Specifies the tensor data formats.   |
   |                  |                 |                                      |
   |                  | sa8_sa8_sa32    | In case of multiple input tensors    |
   |                  |                 |                                      |
   |                  | sa8             | with different data formats, the     |
   |                  |                 | format of each tensor is specified   |
   |                  |                 | in the same order as the function    |
   |                  |                 | arguments (for details see           |
   |                  |                 | :ref:`t_mli_el_p_union`).            |
   +------------------+-----------------+--------------------------------------+

..
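To see how these fields combine into an actual name, the commented sketch below decomposes one possible kernel name built according to this convention. The particular combination shown (a ``conv2d`` compute kernel with ``hwcn`` weights layout, ``sa8_sa8_sa32`` data formats, and a ``k3x3`` specialization suffix) is only an illustration of the pattern; the concrete functions and specializations that are actually available are listed in the kernel chapters referenced above.

.. code:: c

   /* Illustrative decomposition of a kernel name that follows the naming
    * convention above (the specific name is an example of the pattern, not
    * a statement that this exact specialization exists):
    *
    *   mli_krn_conv2d_hwcn_sa8_sa8_sa32_k3x3(...)
    *       |   |      |    |            |
    *       |   |      |    |            +-- specialization (here: 3x3 weights)
    *       |   |      |    +-- dataformats, in argument order: sa8 input,
    *       |   |      |        sa8 weights, sa32 bias
    *       |   |      +-- layout of the weights tensor (HWCN)
    *       |   +-- function: 2D convolution
    *       +-- group: krn (compute kernel)
    */

..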
The naming convention for the data formats is as follows, with the fields described in :ref:`t_data_fmt_fields`:

.. code:: c

   <typename><containersize>

..

.. tabularcolumns:: |\Y{0.2}|\Y{0.2}|\Y{0.5}|

.. _t_data_fmt_fields:
.. table:: Data Format Naming Convention Fields
   :align: center

   +------------------+------------------+----------------------------------------------+
   | **Field Name**   | **Examples**     | **Description**                              |
   +==================+==================+==============================================+
   | typename         | **fx**           | Specifies which quantization schema is used: |
   |                  |                  |                                              |
   |                  | **sa**           | - fx for Fixed point                         |
   |                  |                  |                                              |
   |                  | **fp**           | - sa for Signed Asymmetric                   |
   |                  |                  |                                              |
   |                  |                  | - fp for Floating Point                      |
   +------------------+------------------+----------------------------------------------+
   | containersize    | 1, 4, 8, 16, 32  | Container size in bits of each individual    |
   |                  |                  | element.                                     |
   +------------------+------------------+----------------------------------------------+

..

The following convention is applied to the layout field:

- If an MLI kernel uses only three-dimensional variable tensors as input/output, the function name reflects the layout of the input and output tensors. The layouts of the input and output must be the same.

- If an MLI kernel uses a four-dimensional weights tensor in addition to three-dimensional input/output tensors, the function name reflects the layout of the weights tensor.

A Note on Slicing
-----------------

The kernels described in the following sections of this document have no notion of slicing (also referred to as tiling). Instead, the responsibility for slicing is left to the caller, which can be application code, a higher-level API, or graph-compiler generated code.

For cores with any sort of closely coupled memory (CCM), the kernels assume that all the input tensors are in CCM. The output tensor can be either in system memory or in CCM. Slicing is required when there is not enough space in CCM to fit the complete input tensors. In this case, the caller can copy a 'slice' of the input tensor into CCM, and the kernel produces a slice of the output tensor. These output slices can then be combined into a full tensor.

Because the tensor data does not have to be contiguous in memory, it is possible to create a (slice) tensor that points to a subset of the data of a larger tensor. For instance, if the output tensor is in system memory, each invocation of the kernel that computes a slice can write its output directly into the correct slice of the output tensor. This eliminates the need for an extra concatenation copy pass. The slicing concept is illustrated in Figure :ref:`f_slicing_concept`.

.. _f_slicing_concept:
.. figure:: ../images/slicing_concept.png
   :align: center

   Slicing Concept
..

If the tensors do not fit into CCM and there is no data cache, the data move functions can be used to copy full tensors or slices of tensors (see Chapter :ref:`data_mvmt`).

Slicing with some kernels requires updating the kernel parameters for each slice that is passed. The number of elements to step to the next position in each dimension is stored in the ``mem_stride`` field (as described in :ref:`mli_tnsr_struc`). In case there is no slicing or concatenation, the ``mem_stride`` field is populated in alignment with the shape values to reflect the offsets in the data buffers for each dimension.
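To make the relationship between slicing and ``mem_stride`` concrete, the sketch below shows one way a slice tensor can be set up as a view into a region of a larger HWC tensor without copying data. The struct and function shown (``slice_view_t``, ``make_h_slice``) are simplified, hypothetical stand-ins used only for illustration; the authoritative field definitions are those of the tensor structure described in :ref:`mli_tnsr_struc`.

.. code:: c

   #include <stdint.h>

   /* Simplified, hypothetical stand-in for the tensor structure: only the
    * fields needed to illustrate slicing are shown. */
   typedef struct {
       int8_t  *data;          /* first element of the (slice) tensor           */
       uint32_t shape[3];      /* H, W, C extents of the (slice) tensor         */
       int32_t  mem_stride[3]; /* elements to step to the next H, W, C position */
   } slice_view_t;

   /* Create a view of 'rows' consecutive H-lines of a full HWC tensor,
    * starting at 'first_row'. No data is copied. */
   static void make_h_slice(slice_view_t *slice, const slice_view_t *full,
                            uint32_t first_row, uint32_t rows)
   {
       /* Only the sliced dimension shrinks. */
       slice->shape[0] = rows;
       slice->shape[1] = full->shape[1];
       slice->shape[2] = full->shape[2];

       /* The parent's memory layout does not change, so its strides remain
        * valid for the slice. For a contiguous HWC tensor without slicing,
        * the strides simply follow the shape: {W * C, C, 1}. */
       slice->mem_stride[0] = full->mem_stride[0];
       slice->mem_stride[1] = full->mem_stride[1];
       slice->mem_stride[2] = full->mem_stride[2];

       /* Point at the first element of the selected rows. */
       slice->data = full->data + (int32_t)first_row * full->mem_stride[0];
   }

..

Because such a view keeps the parent's strides, a kernel that writes its output through it lands directly in the correct region of the full output tensor, which is the concatenation-free pattern described above.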