MIOpen is AMD’s counter-part of NVIDIA cuDNN. In this article, we’ll go through the implementation details of MIOpen for some typical APIs.

## Solution finding for a given problem

Solution finding is the inevitable step to run the MIOpen routine.
The diagram above demonstrates the underlying process when user calls `miopenFindSolution`

.
We’ll show MIOpen basic conceptions and procedures based on this.

MIOpen users will call `miopenFindSolutions`

to query the solutions sorted by either kernel duration or workspace size.
Inside MIOpen, how does MIOpen describe a problem, and how does it fetch and determine which solutions are “applicable” for a given problem?

### Problem description

There are different problem types (called primitive) supported in MIOpen.
Currently, MIOpen only support convolution to find the solution (other operators have their dedicated APIs, e.g., pooling calls the kernel directly with `miopenPoolingForward`

).

MIOpen uses the abstract type `miopen::Problem`

to store problem description, which is convertible to `miopen::conv::ProblemDescription`

.
There contains the convolution parameters (direction, strides, paddings, dilations) used to determine the solutions (and the kernels) which are capable of solving this problem.

### Solver and its source

The solution finding routine finally locates at one of the member functions in `miopen::ConvolutionDescriptor`

, based of the problem type and direction:

`FindConvFwdAlgorithm`

for the activation, if forward and not transposed, or backward and transposed`FindConvBwdDataAlgorithm`

for the backward gradient, if backward not transposed, or forward and transposed`FindConvBwdWeightsAlgorithm`

for the weight updating gradient, if backward and the problem is for weight updating

In MIOpen, there are two approaches to evaluate which solution is *most suitable* (e.g., fastest, or requires minimal workspace size) for a given problem:

- from
*FindDB*, which records all possible kernels being able to solve a specific problem, and their duration - list all possible kernels, and run on the real hardware to measure their performance (i.e., pre-run)

In the aforementioned three `FindConv*`

functions, MIOpen will first try to load the optimal solution from the FindDB, based on the queried records.
Then the library is attempting to “compile” the solution, if it’s not in the maintained **invoker cache**.
The compile procedure comprises of 3 steps:

- load the “program” (load ELF
`.so`

shared library via`hipModuleLoad`

), which is the most time-consuming - parse the function from the program (get the kernel object from HIP module via
`hipModuleGetFunction`

) - register the invoker, i.e., put it into the invoker cache maintained by MIOpen, which is indexed by pair
`(network_config, solver_id)`

or`(network_config, algorithm)`

, where the network config is the collection of operator parameters (strides, paddings, etc.)

If no record can be found in FindDB, MIOpen falls back to **registered** solutions which are linked against to the `libMIOpen.so`

dynamic library during building.
MIOpen kernels are implemented based on some abstract design **algorithms**, such as the convolution operator can be materialized via:

`Direct`

`Winograd`

`ImplicitGemm`

`FFT`

A solver is a series of pre-defined kernels to implement one of these algorithms to cope with the given problem. Once MIOpen collects all applicable solvers for a problem, it will firstly compile the kernel codes (if any, haven’t been compiled during build time). And then run them on the target GPU to evaluate which one is optimal.

`Invoker`

and `InvokerFactory`

Each `src/solver/*.cpp`

defines a `Solver`

struct which should return a `ConvSolution`

in its member function `GetSolution`

.
It specifies the kernel source file (located in `src/kernels/*`

), macros and JIT compile options determined by the problem, as well as other options like tunable settings.

`Invoker`

is an invokable functor to substantially start the kernel, and `InvokerFactory`

is the prepare step to specify their calling order, make sure the parameter format, etc.
Each solver constructs its own `InvokerFactory`

.

`miopenRunSolution`

After finding the optimal solution, MIOpen users will call `miopenRunSolution`

to invoke the corresponding kernels.
If the invoker has been cached with handle previously (during the finding procedure), MIOpen can directly call the kernel.
Otherwise it has to repeat the aforementioned steps (load and then register).

## More than convolution: other operations

As a DL primitive library, MIOpen also offers operators beyond convolution: pooling, activation, layer normalization, etc. Nevertheless, these types of operations do not require MIOpen to invoke the solution finding routine to determine the “optimal” solution in advance. Next we’ll try to demystify this doubt.

Todo

TODO…