What does this repo signal mean?

Baidu (ERNIE) published PaddlePaddle/tape (C++). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo PaddlePaddle/tape · language C++. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Baidu (ERNIE) Repo: PaddlePaddle/tape

Captured source

source ↗

GitHub/github.com/PaddlePaddle/tape

PaddlePaddle/tape repository metadata

published Jun 18, 2018 · seen 5d · captured 8h · http 200 · method plain

PaddlePaddle/tape

Language: C++

Stars: 14

Forks: 9

Open issues: 6

Created: 2018-06-18T18:53:15Z

Pushed: 2020-01-14T09:42:14Z

Default branch: develop

Fork: no

Archived: yes

README:

Dynamic Graph on Fluid

PaddlePaddle Fluid is targeting autodiff without a tape, which, however, is very challenging, and we are still a long way from there. DyNet and PyTorch provide a good design idea, the *tape*, that significantly eases the challenge. Also, DyNet provides a C++ API that is as convenient as Python but more efficient, and that integrates conveniently with industrial/production systems. This package, tape, combines the best of

1. tape from PyTorch and DyNet
2. C++ API and core from DyNet
3. rich set of operators from PaddlePaddle

Overview

We can implement a DyNet-like Tape (see this survey) by wrapping Paddle Fluid's Operator and Variable.

The user API is straightforward since

1. it is imperative and uses the host language's control flow logic.
2. it avoids extra concepts such as Scope and Executor.

All of these benefits come at the cost of adding just one line, reset_global_tape, at every iteration.
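
Why the tape has to be reset on every iteration can be illustrated with a toy global tape. This is only a sketch: the accessor names get_global_tape and reset_global_tape follow the README, but their bodies, the NameTape type, and the helper functions are assumptions made for illustration and do not use any Paddle types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the tape: it only records op type names (no Paddle types).
struct NameTape {
  std::vector<std::string> ops;
};

// Hypothetical accessors: the names follow the README, the bodies are assumptions.
inline NameTape& get_global_tape() {
  static NameTape tape;
  return tape;
}
inline void reset_global_tape() { get_global_tape().ops.clear(); }

// An imperative "model": ordinary C++ control flow decides what gets recorded.
inline void run_model(int num_layers) {
  assert(num_layers >= 0);
  for (int l = 0; l < num_layers; ++l) {
    get_global_tape().ops.push_back("mul");
    get_global_tape().ops.push_back("relu");
  }
}

// Runs `iters` iterations and returns the tape length afterwards.
inline std::size_t tape_length_after(int iters, bool reset_each_iter) {
  reset_global_tape();  // start from a clean tape
  for (int i = 0; i < iters; ++i) {
    if (reset_each_iter) reset_global_tape();
    run_model(/*num_layers=*/2);
  }
  return get_global_tape().ops.size();
}
```

With the reset, the tape holds only the ops of the current iteration; without it, every iteration appends to the previous recording, so the tape keeps growing.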

Code Structure

In short, the Tape contains a vector of OpHandles. An OpHandle contains its type, pointers to its Variables, and the necessary attributes.

class Variable {
 public:
  VariableHandle Grad();  // returns its gradient variable
 private:
  framework::VarDesc desc_;  // compile-time InferShape, necessary for lazy execution
  framework::Variable var_;  // run-time variable, holds data memory
};

using VariableHandle = shared_ptr<Variable>;

struct OpHandle {
  string type_;
  map<string, vector<VariableHandle>> inputs_;
  map<string, vector<VariableHandle>> outputs_;
  AttributeMap attrs_;
};

class Tape {
 public:
  void AddOp(OpHandle);  // add op
  void Forward();        // execute the tape_
  void Backward();       // execute the backward of the tape_
 private:
  vector<OpHandle> tape_;
};
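
To make the recording and replay idea concrete, here is a minimal self-contained sketch of a tape over scalars. It mirrors the shape of OpHandle (a type plus named input and output slots) but is not the package's implementation: Scalar, ScalarOp, ScalarTape, and GradOfAffineWrtW are hypothetical names, only two op types are supported, and the gradient rules are written inline instead of dispatching to Paddle operators.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy scalar variable; the README's Variable wraps Paddle types instead.
struct Scalar {
  double value = 0.0;
  double grad = 0.0;
};
using ScalarHandle = std::shared_ptr<Scalar>;
using SlotMap = std::map<std::string, std::vector<ScalarHandle>>;

// Same shape as the README's OpHandle, minus attributes.
struct ScalarOp {
  std::string type;
  SlotMap inputs;
  SlotMap outputs;
};

class ScalarTape {
 public:
  void AddOp(ScalarOp op) { ops_.push_back(std::move(op)); }

  // Run every recorded op in recording order.
  void Forward() {
    for (const ScalarOp& op : ops_) {
      const Scalar& x = *op.inputs.at("X").at(0);
      const Scalar& y = *op.inputs.at("Y").at(0);
      Scalar& out = *op.outputs.at("Out").at(0);
      assert(op.type == "mul" || op.type == "elementwise_add");
      out.value = (op.type == "mul") ? x.value * y.value : x.value + y.value;
    }
  }

  // Seed d(loss)/d(loss) = 1, then walk the tape in reverse order.
  void Backward(const ScalarHandle& loss) {
    loss->grad = 1.0;
    for (auto it = ops_.rbegin(); it != ops_.rend(); ++it) {
      Scalar& x = *it->inputs.at("X").at(0);
      Scalar& y = *it->inputs.at("Y").at(0);
      const Scalar& out = *it->outputs.at("Out").at(0);
      if (it->type == "mul") {
        x.grad += out.grad * y.value;
        y.grad += out.grad * x.value;
      } else {  // elementwise_add
        x.grad += out.grad;
        y.grad += out.grad;
      }
    }
  }

 private:
  std::vector<ScalarOp> ops_;
};

// Records loss = w * x + b, runs it, and returns d(loss)/d(w).
inline double GradOfAffineWrtW(double w_val, double x_val, double b_val) {
  auto make = [](double v) -> ScalarHandle {
    auto s = std::make_shared<Scalar>();
    s->value = v;
    return s;
  };
  ScalarHandle w = make(w_val), x = make(x_val), b = make(b_val);
  ScalarHandle wx = make(0.0), loss = make(0.0);
  ScalarTape tape;
  tape.AddOp({"mul", {{"X", {x}}, {"Y", {w}}}, {{"Out", {wx}}}});
  tape.AddOp({"elementwise_add", {{"X", {wx}}, {"Y", {b}}}, {{"Out", {loss}}}});
  tape.Forward();
  tape.Backward(loss);
  return w->grad;
}
```

For loss = w * x + b the gradient with respect to w is x, which is what the reverse walk over the two recorded ops produces.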

We use Function to indicate layers. A Function takes care of parameter initialization and calls AddOp on the Tape when it is called.

class Linear {
 public:
  Linear(int in_dim, int out_dim, const std::string &act)
      : w_(new Variable("LinearWeight")),
        b_(new Variable("LinearBias")),
        act_(act) {
    Tape init_tape;

    // Initialize the weight: a constant-filled {in_dim, out_dim} tensor.
    std::string initializer = "fill_constant";
    framework::AttributeMap attrs;
    attrs["dtype"] = paddle::framework::proto::VarType::Type::VarType_Type_FP32;
    attrs["shape"] = std::vector<int>{in_dim, out_dim};
    attrs["value"] = 1.0f;
    init_tape.AddOp(initializer, {}, {{"Out", {w_}}}, attrs);

    // Initialize the bias: a constant-filled {out_dim} tensor.
    attrs["dtype"] = paddle::framework::proto::VarType::Type::VarType_Type_FP32;
    attrs["shape"] = std::vector<int>{out_dim};
    attrs["value"] = 1.0f;
    init_tape.AddOp(initializer, {}, {{"Out", {b_}}}, attrs);

    init_tape.Forward();
  }

  // Records mul, elementwise_add, and the activation on the global tape.
  VariableHandle operator()(VariableHandle input) {
    VariableHandle pre_bias(new Variable("linear"));
    get_global_tape().AddOp("mul",
                            {{"X", {input}}, {"Y", {w_}}},
                            {{"Out", {pre_bias}}},
                            {{"x_num_col_dims", 1}, {"y_num_col_dims", 1}});
    VariableHandle pre_act(new Variable("linear"));
    get_global_tape().AddOp("elementwise_add",
                            {{"X", {pre_bias}}, {"Y", {b_}}},
                            {{"Out", {pre_act}}},
                            {{"axis", 1}});
    VariableHandle post_act(new Variable("linear"));
    get_global_tape().AddOp(act_,
                            {{"X", {pre_act}}},
                            {{"Out", {post_act}}},
                            {});
    return post_act;
  }

  std::vector<VariableHandle> Params() { return {w_, b_}; }

 private:
  VariableHandle w_;
  VariableHandle b_;
  std::string act_;
};
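
The split between construction time (parameters are created once) and call time (ops are recorded on every call) can be sketched without any Paddle dependency. Everything below is hypothetical: NamedVar, RecordedOp, SketchLinear, and RecordedTypes are illustration-only names, the tape is a plain vector passed in explicitly instead of the global tape used above, and no numerical work is done.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy handle: a named placeholder with no data, enough to show what is recorded.
struct NamedVar {
  std::string name;
};
using NamedVarHandle = std::shared_ptr<NamedVar>;

// One recorded op: its type and the names of its inputs and output.
struct RecordedOp {
  std::string type;
  std::vector<std::string> inputs;
  std::string output;
};

// Hypothetical layer in the style of Linear: parameters are created once in
// the constructor, ops are appended to the tape on every call.
class SketchLinear {
 public:
  SketchLinear(std::vector<RecordedOp>* tape, std::string act)
      : tape_(tape),
        w_(std::make_shared<NamedVar>(NamedVar{"LinearWeight"})),
        b_(std::make_shared<NamedVar>(NamedVar{"LinearBias"})),
        act_(std::move(act)) {
    assert(tape_ != nullptr);
  }

  NamedVarHandle operator()(const NamedVarHandle& input) {
    NamedVarHandle pre_bias =
        Record("mul", {input->name, w_->name}, "pre_bias");
    NamedVarHandle pre_act =
        Record("elementwise_add", {pre_bias->name, b_->name}, "pre_act");
    return Record(act_, {pre_act->name}, "post_act");
  }

  std::vector<NamedVarHandle> Params() const { return {w_, b_}; }

 private:
  // Appends one op to the tape and returns a handle for its output.
  NamedVarHandle Record(const std::string& type,
                        std::vector<std::string> inputs,
                        const std::string& out_name) {
    tape_->push_back(RecordedOp{type, std::move(inputs), out_name});
    return std::make_shared<NamedVar>(NamedVar{out_name});
  }

  std::vector<RecordedOp>* tape_;
  NamedVarHandle w_;
  NamedVarHandle b_;
  std::string act_;
};

// Calls the layer `calls` times on a fresh tape; returns the recorded op types.
inline std::vector<std::string> RecordedTypes(int calls) {
  std::vector<RecordedOp> tape;
  SketchLinear linear(&tape, "relu");
  NamedVarHandle x = std::make_shared<NamedVar>(NamedVar{"input"});
  for (int i = 0; i < calls; ++i) x = linear(x);
  std::vector<std::string> types;
  for (const RecordedOp& op : tape) types.push_back(op.type);
  return types;
}
```

Each call appends three ops (mul, elementwise_add, then the activation), while the two parameters returned by Params() stay the same objects across calls.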

User API

// Model function
paddle::tape::Linear linear1(3, 3, "relu");  // init weight and bias
paddle::tape::Linear linear2(3, 3, "relu");  // init weight and bias
paddle::tape::Mean mean;

// Optimizer
paddle::tape::SGD sgd(0.001);

// Data Feeder
paddle::tape::Fill data_feeder(...);
VariableHandle input(new paddle::tape::Variable("input"));
VariableHandle label(new paddle::tape::Variable("label"));

for (int i = 0; i < ...) {
  // [Lost in extraction: the rest of the loop header and the first part of the
  // loop body, which defines `loss`. The missing span ends with "Grad()".]
  get_global_tape().Backward(loss);

  // Update w
  sgd(linear1.Params());
  sgd(linear2.Params());
}
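
The update step at the end of the loop can be sketched as a small functor applied to a list of parameters. This assumes plain stochastic gradient descent (value minus learning rate times gradient); whether paddle::tape::SGD does anything beyond that is not shown in this excerpt. Param, SketchSGD, and StepOnce are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy parameter: a flat value buffer and a gradient buffer of the same size.
struct Param {
  std::vector<float> value;
  std::vector<float> grad;
};
using ParamHandle = std::shared_ptr<Param>;

// Hypothetical optimizer functor, called like sgd(linear1.Params()).
class SketchSGD {
 public:
  explicit SketchSGD(float learning_rate) : lr_(learning_rate) {}

  // In-place update: value <- value - lr * grad, for every parameter.
  void operator()(const std::vector<ParamHandle>& params) const {
    for (const ParamHandle& p : params) {
      assert(p->value.size() == p->grad.size());
      for (std::size_t i = 0; i < p->value.size(); ++i) {
        p->value[i] -= lr_ * p->grad[i];
      }
    }
  }

 private:
  float lr_;
};

// Applies one SGD step to a single scalar parameter and returns the new value.
inline float StepOnce(float value, float grad, float lr) {
  auto p = std::make_shared<Param>();
  p->value = {value};
  p->grad = {grad};
  SketchSGD sgd(lr);
  sgd({p});
  return p->value[0];
}
```

Because the parameters are shared handles, the update is visible to the layer that owns them without copying anything back.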

digraph G {

subgraph cluster_0 { node [shape=record,style=filled]; style=filled; color=lightgrey; linear1 [label="{type: mul | {input | {X: before_mul1 | Y: weight1}} | {output | Out: before_bias1}}"]; elementwise_add1 [label="{type: elementwise_add | {input | {X: before_bias1 | Y: bias1}} | {output | Out: before_act1}}"]; relu1 [label="{type: relu | {input | {X: before_act1 }} | {output | Out: after_act1}}"];

linear1 -> elementwise_add1->relu1; label = "forward tape"; }

linear1:before_mul1->before_mul1 linear1:weight1->weight1 linear1:before_bias1->before_bias1

elementwise_add1:bias1->bias1 elementwise_add1:before_bias1->before_bias1 elementwise_add1:before_act1->before_act1

relu1:before_act1->before_act1 relu1:after_act1->after_act1

subgraph cluster_1 { node [shape=record,style=filled]; style=filled; color=lightgrey; linear1_grad [label="{type: mul_grad | {input | {X: before_mul1 | Y: weight1| Out_grad: before_bias1_grad}} | {output |{X_grad: before_mul1_grad | Y_grad: weight1_grad}}}"];

elementwise_add1_grad [label="{type: elementwise_add_grad | {input | Out_grad: before_act1_grad} | {output |{X_grad: before_bias1_grad | Y_grad: bias1_grad}}}"];

relu1_grad [label="{type: relu_grad | {input | Out_grad: after_act1_grad} | {output | {X_grad: before_act1_grad }}}"];

linear1_grad -> elementwise_add1_grad ->relu1_grad [dir=back]; label = "backward tape"; }

relu1_grad:after_act1_grad->after_act1_grad relu1_grad:before_act1_grad->before_act1_grad

elementwise_add1_grad:before_act1_grad->before_act1_grad elementwise_add1_grad:before_bias1_grad->before_bias1_grad elementwise_add1_grad:bias1_grad->bias1_grad

linear1_grad:before_mul1->before_mul1 linear1_grad:weight1->weight1 linear1_grad:before_bias1_grad->before_bias1_grad linear1_grad:before_mul1_grad->before_mul1_grad linear1_grad:weight1_grad->weight1_grad

subgraph cluster_2 { node [shape=record]; label = "Linear1"; weight1 bias1 }

weight1 -> weight1_grad [ label="Grad()", style="dashed" ]; bias1 -> bias1_grad [ label="Grad()", style="dashed"];

}
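
The relationship between the two tapes in the graph above can be sketched in a few lines: the backward tape visits the forward ops in reverse order, each op type gains a _grad suffix, and each gradient variable is named after its forward variable. This only reflects the naming and ordering visible in the diagram; how the package actually constructs the backward tape is not shown in this excerpt, and BackwardTypes and GradName are hypothetical helpers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Derives backward op types from forward op types: reverse the order and
// append "_grad" (mul -> mul_grad, relu -> relu_grad, ...).
inline std::vector<std::string> BackwardTypes(
    const std::vector<std::string>& forward) {
  std::vector<std::string> backward;
  backward.reserve(forward.size());
  for (auto it = forward.rbegin(); it != forward.rend(); ++it) {
    assert(!it->empty());
    backward.push_back(*it + "_grad");
  }
  return backward;
}

// Gradient variables carry the same suffix (weight1 -> weight1_grad).
inline std::string GradName(const std::string& var_name) {
  return var_name + "_grad";
}
```

For the forward tape mul, elementwise_add, relu this yields relu_grad, elementwise_add_grad, mul_grad, which is the execution order implied by the reversed edges in the backward cluster.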

Code Reuse

We want to stay as close to Paddle Fluid as possible.

Reuse All Operators

As all Ops are registered at OpInfoMap, the effort of adding a new Function is about 10 lines of code, similar to exposing an operator to Python.
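
Why a registry keeps each new Function small can be shown with a toy version: once kernels are looked up by type name, a Function is little more than that lookup plus a call operator. The registry below is an assumption made for illustration, not Paddle's OpInfoMap, which holds richer per-op metadata; SketchOpRegistry, UnaryKernel, and SketchUnaryFunction are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

using UnaryKernel = std::function<double(double)>;

// Toy registry keyed by op type name, populated once on first use.
inline std::map<std::string, UnaryKernel>& SketchOpRegistry() {
  static std::map<std::string, UnaryKernel> registry = {
      {"relu", [](double x) { return std::max(x, 0.0); }},
      {"tanh", [](double x) { return std::tanh(x); }},
  };
  return registry;
}

// A "Function" is then a thin wrapper that finds its kernel by type name.
class SketchUnaryFunction {
 public:
  explicit SketchUnaryFunction(const std::string& op_type) {
    auto it = SketchOpRegistry().find(op_type);
    if (it == SketchOpRegistry().end()) {
      throw std::invalid_argument("unregistered op type: " + op_type);
    }
    kernel_ = it->second;
  }

  double operator()(double x) const {
    assert(kernel_ != nullptr);
    return kernel_(x);
  }

 private:
  UnaryKernel kernel_;
};
```

Adding another op to this sketch means adding one registry entry; the wrapper class itself does not change.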

Reuse Compile Time InferShape and InferVarType

Note that all the symbolic information is stored at tape::Variable::desc_, instead of…

Excerpt shown — open the source for the full document.