Usage Examples
==================================================================================

.. _obtain_sample:

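Obtaining the Samples
------------------------------------------------------------------------------------

Lyngor ships sample code with the product in the archive *lyngor_sample.zip*.

.. note:: The archive is stored at the same directory level as the installation package. If you cannot obtain it, contact after-sales support: :doc:`../../_common/9_fae_support`.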

.. _sample_code:

Sample Code Overview
------------------------------------------------------------------------------------

The procedures in this chapter refer to the following sample code:

- Sample A1: lyngor_sample/basic/add_argmax.py
- Sample B: lyngor_sample/tensorflow/inference/simplenet/inference_simplenet.py
- Sample C: lyngor_sample/quant/inference_resnet50_quant.py
- Sample D: lyngor_sample/pytorch/resnet/inference_resnet50.py
- Sample E: lyngor_sample/train/finetune_ResNet50_cats_dogs.py

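Basic Operator Example
------------------------------------------------------------------------------------

This example shows the full workflow: building a computation graph with the Lyngor API,
compiling the graph into an inference engine, and running the engine to obtain results.
In later use, only the first step (graph construction) differs slightly; the remaining
steps are essentially the same.

1. Build the operator graph manually with built-in libraries such as lyn.nn and lyn.math.
2. Compile the operator graph with lyn.Builder to obtain an inference engine.
3. Feed input data to the inference engine, run it, and read the output data.

Example: for the complete code, see [Sample A1] in :ref:`sample_code`.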

::

    import numpy as np
    import lyngor as lyn

    # 1. Create the computation graph to accelerate (or convert a trained model into one)
    shape = (1,1,10,2)
    x1 = lyn.var("in_x1", shape=shape, dtype="float16")
    x2 = lyn.var("in_x2", shape=shape, dtype="float16")

    # The +, -, * and / operators are supported
    # Multiple inputs and multiple outputs are supported
    # Equivalent form: y = lyn.math.multiply(lyn.math.subtract(x1,x2), lyn.math.add(x1,x2))
    y = (x1-x2)*(x1+x2)
    z = lyn.math.reduce_max(x1*x2, axis=2)
    graph = lyn.gen_graph([x1,x2], [y,z])

    # 2. Create a Builder to compile the graph and save the inference engine
    builder = lyn.Builder(target='apu')
    out_path = builder.build(graph, out_path='./tmp_net/')

    # 3. Load the saved inference engine
    r_engine = lyn.load(path=out_path + '/Net_0/', device=0)

    # 4. Feed input data to the engine, run it, and read the outputs
    vx1 = np.random.random(shape).astype(np.float16)
    vx2 = np.random.random(shape).astype(np.float16)

    # The keyword names match the names given to x1 and x2 ("in_x1", "in_x2")
    r_engine.run(in_x1=vx1, in_x2=vx2)
    result = r_engine.get_output()

    # Outputs are returned in the order passed to gen_graph: [y, z]
    print(result[0])
    print(result[1])
    print(result[0].shape)
    print(result[1].shape)
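
To sanity-check the engine outputs, the same computation can be reproduced on the host
with NumPy. The sketch below is only a reference computation and is not part of the
Lyngor sample. It assumes that lyn.math.reduce_max drops the reduced axis, as np.max
does by default; the source does not state this.

```python
import numpy as np

shape = (1, 1, 10, 2)
rng = np.random.default_rng(0)
vx1 = rng.random(shape).astype(np.float16)
vx2 = rng.random(shape).astype(np.float16)

# Reference for y = (x1 - x2) * (x1 + x2), computed element-wise
ref_y = (vx1 - vx2) * (vx1 + vx2)

# Reference for z = reduce_max(x1 * x2, axis=2); axis 2 (length 10) is reduced
# (assumption: the reduced axis is dropped, matching NumPy's default)
ref_z = np.max(vx1 * vx2, axis=2)

print(ref_y.shape)  # (1, 1, 10, 2)
print(ref_z.shape)  # (1, 1, 2)
```

With the same inputs fed to the engine, the outputs could then be compared with, for
example, ``np.allclose(result[0], ref_y, atol=1e-2)``; the tolerance is an arbitrary
choice for float16 data.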

.. _network_model_inference_example:

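Network Model Inference Example
------------------------------------------------------------------------------------

Using TensorFlow as an example, this section walks through importing a model from a
third-party framework, generating an inference engine, and obtaining results. Inference
samples and model files for other third-party frameworks are also provided.

1. Use lyn.DLModel to convert models from different frameworks directly into an operator graph.
2. Use the add_preprocess and add_postprocess methods to add pre-processing and post-processing to the original model.
3. Compile the operator graph with lyn.Builder to obtain an inference engine.

    .. note:: For a single-input, single-output model, DLModel().load() does not require inputs_dict or outputs; for models with multiple inputs or outputs, they must be specified.

4. Feed input data to the inference engine and run it to obtain the output data.

Example: for the complete code, see [Sample B] in :ref:`sample_code`.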

::

    import numpy as np
    import tensorflow as tf
    import cv2

    # Import the Lyngor library
    import lyngor as lyn

    # Define constants
    CLASS_NUM = 10
    CLASS = {0: 'airplane', 1: 'automobile', 2: 'bird', 3: 'cat', 4: 'deer', 5: 'dog', 6: 'frog', 7: 'horse', 8: 'ship', 9: 'truck'}
    pb_file = './simplenet.pb'
    target = 'apu'
    dtype = 'float32'

    # 1. Create the computation graph to accelerate (or convert a trained model into one)
    model = lyn.DLModel()
    model.load(pb_file, 'Tensorflow', {'input_1':(1,32,32,3)}, ['dense_1/Softmax'])

    # 2. Create a Builder to compile the graph, then save and load the runtime engine
    offline_builder = lyn.Builder(target=target)
    r_engine = lyn.load(offline_builder.build(model.graph, model.param) + '/Net_0/')

    # 3. Feed input data to the engine, run it, and read the outputs
    for filename in ["airplane.jpg", "bird.jpg", "boat.jpg"]:
        data = cv2.imread(filename)
        data4d = data[np.newaxis, :]
        r_engine.run(data_format='numpy', **{"input_1":data4d.astype(dtype)/255.})
        result = r_engine.get_output(data_format='numpy')
        print(CLASS[np.argmax(result[0])])
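
The example prints only the most likely class. When inspecting results it can help to
list the few highest-scoring classes instead. The helper below is a hypothetical
addition written in plain Python, not part of the Lyngor sample, and the probability
values are made up for illustration.

```python
def top_k(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(labels[index], prob) for index, prob in ranked[:k]]


CLASS = {0: 'airplane', 1: 'automobile', 2: 'bird', 3: 'cat', 4: 'deer',
         5: 'dog', 6: 'frog', 7: 'horse', 8: 'ship', 9: 'truck'}

# Made-up softmax output for a single image (10 classes)
probs = [0.02, 0.01, 0.70, 0.05, 0.03, 0.04, 0.06, 0.015, 0.065, 0.01]

best = top_k(probs, CLASS, k=3)
print(best)  # [('bird', 0.7), ('ship', 0.065), ('frog', 0.06)]
```

With a real engine output, the scores would first need to be flattened into a plain
list, for example ``result[0].reshape(-1).tolist()``.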

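Quantization Calibration Example
------------------------------------------------------------------------------------

This example shows how to build a calibration dataset, call the Lyngor quantization
interface to generate a quantization configuration, and pass that configuration on when
building the inference engine. Quantization is typically used when there are requirements
on model size and inference speed: it aims to speed up execution and reduce model size
without compromising accuracy.

The quantization calibration workflow is largely the same as in
:ref:`network_model_inference_example`. The only difference is that, before the build in
step 2, a calibration dataset must be prepared and used to generate a quantization
configuration in advance. To keep the quantization-specific operations clear, this example
covers only the quantization step; it does not repeat graph generation or running the
inference engine to obtain results.

Example code: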

::

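    import os

    # 2. Create a Builder to compile the computation graph and generate the runtime engine
    # (model and dtype are defined as in the network model inference example)
    offline_builder = lyn.Builder(target='apu')

    # Prepare the calibration dataset for quantization (a list of dicts)
    dataset = []
    for home, dirs, files in os.walk('./qimages/'):
        for filename in files:
            # Join with the walked directory so files in subdirectories are found too
            data = cv2.imread(os.path.join(home, filename))
            data4d = data[np.newaxis, :]
            dataset.append({'input_1':data4d.astype(dtype)/255.})

    # Generate the quantization configuration and pass it to build
    qconf = lyn.qconf(dataset, wscale='max')
    sim_r_engine = offline_builder.build(model.graph, model.param, qconf=qconf)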
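
The source does not describe what ``wscale='max'`` does internally. As general background
only, a common calibration scheme derives a symmetric int8 scale from the largest absolute
value seen in the calibration data. The sketch below illustrates that generic technique in
plain Python; the function names are hypothetical and this is not Lyngor's implementation.

```python
def max_abs_scale(samples, num_bits=8):
    """Symmetric scale: the largest |value| maps to the largest integer level."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for sample in samples for v in sample)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer level and clamp to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(qvalues, scale):
    """Map integer levels back to approximate real values."""
    return [q * scale for q in qvalues]


# Made-up calibration data: two samples of three values each
calibration = [[0.1, -0.5, 0.25], [1.27, -0.3, 0.0]]
scale = max_abs_scale(calibration)      # about 0.01, because max |value| is 1.27

q = quantize([1.27, -0.64, 0.0], scale)
print(q)                                # [127, -64, 0]
print(dequantize(q, scale))             # values close to the originals
```

The calibration dataset matters because the scale is fixed from the value range it
exposes: values outside that range are clamped at inference time.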

.. _pytorch_sample:

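PyTorch Model Compilation and Inference Example
------------------------------------------------------------------------------------

This example walks through importing a PyTorch model, generating an inference engine
with the Lyngor API, and obtaining results. A PyTorch inference sample and model file
are also provided.

1. Use lyn.DLModel to convert models from different frameworks directly into an operator graph.
2. Compile the operator graph with lyn.Builder to obtain an inference engine.

    .. note:: For a single-input, single-output model, DLModel().load() does not require inputs_dict or outputs; for models with multiple inputs or outputs, they must be specified.

3. Load the inference engine with lyn.load.
4. Feed input data to the inference engine and run it to obtain the output data.

Example: for the complete code, see [Sample D] in :ref:`sample_code`.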

::

    from PIL import Image

    # 1. Define constants
    # (load_classname and transf are helpers that are not shown in this excerpt;
    # see the full sample)
    data_path = '../../data/resnet/'
    CLASS = load_classname(filename = data_path + "imagenet_class.txt")
    model_file = '../../models/pytorch/resnet50_v2.pth'
    target = 'apu'
    dtype = 'float32'
    inputs_dict={'data':(1,3,224,224)}
    path='tmp_net'

    # 2. Load the model
    model = lyn.DLModel()
    model.load(model_file, model_type='Pytorch', inputs_dict=inputs_dict)

    # 3. Create a Builder to compile the computation graph and generate the runtime engine
    offline_builder = lyn.Builder(target=target, is_map=True)
    offline_builder.build(model.graph, model.params, out_path=path, build_mode='auto')

    # 4. Load the engine (a path separator is needed between the output directory and Net_0)
    r_engine = lyn.load(path=path + '/Net_0/', device=0)

    # 5. Feed input data to the engine, run it, and read the outputs
    for filename in ["test.png", "003.JPEG", "006.JPEG", "009.JPEG"]:
        data = cv2.imread(data_path + filename)
        PIL_image = Image.fromarray(data)
        image = transf(PIL_image)
        image_extend = np.expand_dims(image, axis=0)
        r_engine.run(data=image_extend.astype(dtype))
        result = r_engine.get_output()
        print(CLASS[int(np.argmax(result[0]))])
        print(np.sort(result[0])[0,-5:])
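
The excerpt calls load_classname, which is not shown here. As an illustration only, a
class-name loader could look like the sketch below. The one-name-per-line file format is
an assumption about imagenet_class.txt, not something the source specifies, and the
temporary file exists only to make the sketch self-contained.

```python
import os
import tempfile


def load_classname(filename):
    """Map class index -> class name, assuming one class name per line."""
    with open(filename, encoding="utf-8") as f:
        names = [line.strip() for line in f if line.strip()]
    return dict(enumerate(names))


# Write a tiny stand-in class file and load it back
with tempfile.TemporaryDirectory() as tmp:
    class_file = os.path.join(tmp, "classes.txt")
    with open(class_file, "w", encoding="utf-8") as f:
        f.write("tench\ngoldfish\ngreat white shark\n")
    CLASS = load_classname(class_file)

print(CLASS[1])  # goldfish
```

Returning a dict keyed by integer index matches how the example looks up
``CLASS[int(np.argmax(result[0]))]``.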

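Model Training Example
------------------------------------------------------------------------------------

Using the fine-tuning of ResNet50 on a cats-and-dogs dataset as an example, this section
walks through importing a third-party framework model, generating a training engine,
and running training.

1. Load the pretrained model and change the number of output classes.
2. Convert the model into an operator graph with lynbwd_graph().
3. Set the learning rates with graph.set_lr_vec().
4. Set the loss function with sigmoid_CE_head().
5. Build the training engine with graph.build().
6. Feed data to the training engine and run training.
7. Save the model after training completes.

Example: for the complete code, see [Sample E] in :ref:`sample_code`.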

::
        
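    import time
    import numpy as np
    import torchvision

    # lynbwd_graph, sigmoid_CE_head, and the dataset/model helpers used below
    # are not shown in this excerpt; see the full sample.

    # Build the training engine
    def lyn_build(model, input_shape, output_shape, lr, out_path):
        inputs_dict = {"data" : input_shape}

        # Convert the model
        graph = lynbwd_graph()
        graph.load_pth(model, inputs_dict)

        # Set learning rates: only the listed weights are trained; all others get 0
        w_names = graph.get_w_names()
        trained = [
            "fc.weight",
            "fc.bias",
        ]
        graph.set_lr_vec([lr if name in trained else 0 for name in w_names])

        # Set the loss function
        loss_head = sigmoid_CE_head("gt", output_shape)

        # Build the training engine
        engine = graph.build(loss_head, out_path=out_path)
        print("###[lynbuild] model build end! output_path is:", out_path)
        return engine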

    if __name__ == "__main__":
        model_file = "./ResNet50_2classes.pth"
        num_classes = 2
        batch_size = 4
        input_shape = (batch_size, 3, 224, 224)
        output_shape = (batch_size, num_classes)
        lr = 0.001
        out_path = "./ResNet50_2classes_finetune/"

        # Load the pretrained model and change the number of output classes
        model = torchvision.models.resnet50(pretrained=True)
        change_pth_fc_out_channels(model, model_file, num_classes)

        # Build the training engine
        engine = lyn_build(model_file, input_shape, output_shape, lr, out_path)

        train_images, train_labels = load_cats_dogs_dataset("./datasets/")
        train_images = preprocess_images(train_images)
        index = np.arange(len(train_images))

        epoch_num = 1
        grid_size = 64
        grid_num = len(train_images) // (grid_size * batch_size)

        # Train the model
        for epoch in range(epoch_num):
            np.random.shuffle(index)
            correct_cnt = 0
            image_cnt = 0

            for grid in range(grid_num):
                t0 = time.time()
                for i in range(grid_size):
                    # Assemble one batch of images and one-hot labels
                    offset = (grid * grid_size + i) * batch_size
                    images = [train_images[index[offset + j]] for j in range(batch_size)]
                    labels = [train_labels[index[offset + j]] for j in range(batch_size)]
                    x = np.concatenate(images, axis=0)
                    x = x.reshape(*input_shape)
                    y = np.zeros(output_shape, dtype="float16")
                    for j in range(batch_size):
                        y[j, labels[j]] = 1

                    # Run one training step and count correct predictions
                    in_dict = {"data" : x, "gt" : y}
                    pred = engine.run(in_dict)[0]
                    for j in range(batch_size):
                        vec = pred[j].reshape(-1)
                        if vec.argmax() == labels[j]:
                            correct_cnt += 1
                t1 = time.time()
                # Average time per image in milliseconds for this grid
                dt = (t1 - t0) * 1000.0 / (grid_size * batch_size)
                image_cnt += grid_size * batch_size
                train_acc = correct_cnt / image_cnt
                print("epoch: {}, grid: {}, acc: {:.4f}, dt: {:.2f} ms".format(epoch, grid, train_acc, dt))

            # Save the updated weights at the end of each epoch
            engine.update_pth_weights(model_file, "./ResNet50_cats_dogs.pth")
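
Two small pieces of the training script are worth isolating: the per-weight learning-rate
vector, which freezes every weight except the final fully connected layer, and the one-hot
label matrix fed as ``gt``. The sketch below reproduces both in plain Python. The weight
names in ``w_names`` are hypothetical stand-ins for whatever graph.get_w_names() returns,
and ``one_hot`` is an illustrative helper rather than part of the sample.

```python
lr = 0.001

# Hypothetical weight names; the real list comes from graph.get_w_names()
w_names = ["conv1.weight", "layer1.0.conv1.weight", "fc.weight", "fc.bias"]
trained = ["fc.weight", "fc.bias"]

# A learning rate of 0 freezes a weight, so only the fc layer is fine-tuned
lr_vec = [lr if name in trained else 0 for name in w_names]
print(lr_vec)  # [0, 0, 0.001, 0.001]


def one_hot(labels, num_classes):
    """Build one row per label with a 1.0 in the label's column."""
    rows = [[0.0] * num_classes for _ in labels]
    for row, label in zip(rows, labels):
        row[label] = 1.0
    return rows


# One batch of four labels with two classes
targets = one_hot([1, 0, 0, 1], num_classes=2)
print(targets)  # [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
```

The training script builds the same matrix with ``np.zeros(output_shape)`` followed by
``y[j, labels[j]] = 1``.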