You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
151 lines
6.2 KiB
151 lines
6.2 KiB
7 months ago
|
# Tensor Operator Set Architecture (TOSA) Dialect
|
||
|
|
||
|
[TOC]
|
||
|
|
||
|
## Rationale
|
||
|
|
||
|
The MLIR TOSA dialect implements the [TOSA
|
||
|
specification](https://developer.mlplatform.org/w/tosa/). This document
|
||
|
describes the decision process for how TOSA expresses operators in
|
||
|
high level dialects.
|
||
|
|
||
|
TOSA was developed after parallel efforts to rationalize the top-down picture
|
||
|
from multiple high-level frameworks, as well as a bottom-up view of different
|
||
|
hardware target concerns (CPU, GPU and NPU), and reflects a set of choices
|
||
|
that attempt to manage both sets of requirements.
|
||
|
|
||
|
## TOSA and Tensor Level Expressiveness
|
||
|
|
||
|
TOSA endeavors to provide an operator set that tries to fulfil the following
|
||
|
expressivenes goals at the *tensor level of abstraction* :
|
||
|
|
||
|
### Complete
|
||
|
|
||
|
This is driven by the top-down perspective, needing to express as much of
|
||
|
multiple high level frameworks fully in TOSA, as possible. This was originally
|
||
|
done from an operator frequency analysis done upon dozens of high level
|
||
|
networks in different frameworks, to select the most frequently occuring ones
|
||
|
and establish a common set of tensor-level operators that could express them.
|
||
|
|
||
|
TOSA categorizes its operator set into classes and attempts to address major
|
||
|
functional operations at the tensor level, including compute, reduction,
|
||
|
elementwise transformations, comparison and control flow.
|
||
|
|
||
|
### Minimal
|
||
|
|
||
|
This takes the bottom-up approach - keep the TOSA operator set minimal in
|
||
|
order to bound the design of hardware, operator kernels, code generation
|
||
|
strategies and associated considerations that effect the executability of TOSA
|
||
|
content.
|
||
|
|
||
|
In this regard TOSA seeks to avoid creating compound operators, instead
|
||
|
leaving it to compiler backend to fuse multiple TOSA ops if required. This
|
||
|
choice also benefits the numerical precision goal, since it is easier to fuse the
|
||
|
numerical functionality of successive operators, than to split the numerical
|
||
|
functionality of a compound operator.
|
||
|
|
||
|
### Numerical Precision
|
||
|
|
||
|
TOSA began as a means to address operator-level numerical precision for
|
||
|
code generation and hardware development. It therefore incorporates precision
|
||
|
detail into the operator set.
|
||
|
|
||
|
In this regard, TOSA operators are best understood as a combination of the visible
|
||
|
quantization information embedded within an operation, together with the
|
||
|
functional information about how that information is used, as described in the
|
||
|
specification of the operation.
|
||
|
|
||
|
## TOSA Operator Rationale
|
||
|
|
||
|
The general basis of selection of the operator set that constitutes TOSA is
|
||
|
described in the TOSA specification document under Section 1.3 Operator
|
||
|
Selection. Explanation of the thinking behind some operators is listed here:
|
||
|
|
||
|
### IDENTITYN
|
||
|
|
||
|
tosa.IDENTITYN is used to form a list of Operator results during
|
||
|
lowering of operations such as tf.Split from a sequence of tosa.SLICE
|
||
|
ops. If there are alternate ways to express this lowering without the
|
||
|
tosa.IDENTITYN op, the tosa.IDENTITYN op could be removed from TOSA.
|
||
|
|
||
|
```
|
||
|
Value lower_split_op(Value %value, size_t axis, size_t
|
||
|
num_split) { Value %output[]
|
||
|
|
||
|
size_t slice_size = %value.shape[axis] / num_split
|
||
|
|
||
|
for (int i = 0; i < num_split; i++) {
|
||
|
vector <size_t> begin_vals, size_vals
|
||
|
|
||
|
for (int j = 0; j < %value.rank; j++) {
|
||
|
if (j == axis) {
|
||
|
begin_vals.push_back(slice_size * i)
|
||
|
size_vals.push_back(slice_size)
|
||
|
} else {
|
||
|
begin_vals.push_back(0)
|
||
|
size_vals.push_bac(%value.shape[j])
|
||
|
}
|
||
|
|
||
|
%output[i] = tosa.SLICE(%value) {start=begin_vals, size=size_vals} (tensor<%value.type>) -> tensor<size_vals, %value.dtype>
|
||
|
}
|
||
|
|
||
|
}
|
||
|
|
||
|
%output_list = tosa.IDENTITYN(%output) (tensor<%output:*.type>) -> tensor<%output_list:*.type>
|
||
|
return %output_list
|
||
|
}
|
||
|
```
|
||
|
|
||
|
### COND\_IF and WHILE\_LOOP
|
||
|
|
||
|
Several neural networks express conditional control flow at the tensor level.
|
||
|
A survey of multiple high level frameworks indicated that conditional if and
|
||
|
a loop construct are common in all major frameworks, with some variation.
|
||
|
Since TOSA endeavors to be complete in expressing tensor level functionality
|
||
|
including control flow, it implements these constructs.
|
||
|
|
||
|
The COND\_IF and WHILE\_LOOP operators implement such structured control
|
||
|
flow forms and should be lowerable to corresponding ops in the scf dialect.
|
||
|
Since the dialect seeks to remain isomorphic with an external, serialized form,
|
||
|
the decision was to keep these ops in the dialect (as opposed to deferring
|
||
|
completely to scf), and this may be re-evaluated if this turns out to not yield
|
||
|
the expected value.
|
||
|
|
||
|
## Using TOSA In A Compiler
|
||
|
|
||
|
The TOSA specification describes each operator in functional detail. It is
|
||
|
expected that compilers that use TOSA will use its builders to construct the
|
||
|
operators so that the quantization information for the operator is correctly
|
||
|
generated.
|
||
|
|
||
|
The functional steps described in the pseudocode of the specification enables
|
||
|
the construction of code generation for that operation, or decisions on the
|
||
|
design of underlying hardware. The functional pseudocode also describes
|
||
|
how the quantization parameters are utilized within the operation.
|
||
|
|
||
|
### Quantization Parameters in Ops vs Tensors
|
||
|
|
||
|
TOSA uses the quantization parameters embedded in the input and output
|
||
|
tensors to construct the quantization attributes that sit within the operator.
|
||
|
Once these attributes are constructed, the quantization information within
|
||
|
the tensors are no longer necessary for code generation.
|
||
|
|
||
|
This enables the tensors to be subsequently interpreted simply as contiguous
|
||
|
buffers containing raw data, with no 'meta information' in the form of the
|
||
|
quantization_type. Precision related manipulation of the input or output are
|
||
|
instead described by the operator itself which describes, for example, when
|
||
|
the zero point is applied, or when the scale multiplication is done.
|
||
|
|
||
|
However, TOSA does *not* eliminate the existing MLIR QuantOps quantization
|
||
|
type information within the tensors; this leaves the choice of how to handle
|
||
|
quantization information, to later backend code generation steps.
|
||
|
|
||
|
Maintaining the ability to overlap these different representations of
|
||
|
quantization parameters (i.e. tensor-carried vs op-carried) is an important
|
||
|
capability when considering progressive lowering between uses that expect one
|
||
|
scheme vs the other.
|
||
|
|
||
|
## Operation definitions
|
||
|
|
||
|
[include "Dialects/TosaOps.md"]
|