A Simple Tensor Implementation in C++, Explained

Updated: 2022-08-11 10:09:36  Author: AI_潛行者

This article shows how to implement a simple Tensor in C++. It walks through the sample code in detail and should be a useful reference for study or work.

Background knowledge

Default arguments
Exception handling
Experience with template metaprogramming is a plus
std::memset, std::fill, std::fill_n, std::memcpy

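std::memset always fills memory one byte (char) at a time, so it cannot set a double array to an arbitrary value; for non-char types it is only suitable for zeroing.

std::fill and std::fill_n fill memory with values of the given element type, so they are more general.

std::memcpy copies data that is already laid out in memory, so different positions can receive different values.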

double dp[505];
std::memset(dp, -1.0, 505 * sizeof(double)); // Wrong ★★★: memset works in bytes (char); what we need here is fill

double dp[505];
std::fill(dp, dp + 505, -1.0);
std::fill(std::begin(dp), std::end(dp), -1.0);
std::fill_n(dp, 505, -1.0);

double dp[505];
double data[5] = {11,22,33,44,55};
std::memcpy(dp, data, 5 * sizeof(double));
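
To make the difference concrete, here is a small sketch. The two helper names are made up for this illustration, and the NaN remark in the comments assumes IEEE 754 doubles, which is what mainstream platforms use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>

// Hypothetical helper: "fills" n doubles with -1 through std::memset and
// reports whether the first element really holds -1.0 afterwards.
// memset writes the byte 0xFF everywhere; with IEEE 754 doubles that bit
// pattern is a NaN, not -1.0.
inline bool memset_sets_minus_one(double* buf, std::size_t n) {
    std::memset(buf, -1, n * sizeof(double));
    return buf[0] == -1.0;
}

// Hypothetical helper: fills n doubles with -1.0 through std::fill_n and
// checks that every element holds -1.0.
inline bool fill_sets_minus_one(double* buf, std::size_t n) {
    std::fill_n(buf, n, -1.0);
    return std::all_of(buf, buf + n, [](double v) { return v == -1.0; });
}
```

Calling memset_sets_minus_one on a double array is expected to return false, while fill_sets_minus_one returns true. Zeroing with std::memset(buf, 0, ...) still works, because all-zero bytes are the representation of 0.0.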

Memory management: allocate

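C++11 introduced smart pointers, which is a great addition, but one thing was apparently overlooked: how to create a smart pointer that manages a dynamically allocated array. C++11 offers no direct function for this; in other words, make_shared, the function that creates a smart pointer, does not support arrays. So how do you create a smart-pointer-managed array in C++11? You have to wrap it yourself or work around it. Later standards filled the gap piece by piece (std::make_unique<T[]> in C++14, std::shared_ptr<T[]> in C++17, array support in std::make_shared in C++20), which still does not feel like a consistent evolution of the specification.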
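
As a rough idea of what "wrapping it yourself" can look like in plain C++11, consider the sketch below. The helper names make_shared_array and make_unique_array are hypothetical and are not part of the standard library or of the code in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical C++11 helper: a shared_ptr<T> that owns a new[] array.
// The array deleter is attached here, so callers never have to name it.
template <typename T>
std::shared_ptr<T> make_shared_array(std::size_t n) {
    return std::shared_ptr<T>(new T[n](), std::default_delete<T[]>());
}

// Hypothetical C++11 helper: unique_ptr<T[]> already calls delete[] itself,
// so only the allocation needs wrapping.
template <typename T>
std::unique_ptr<T[]> make_unique_array(std::size_t n) {
    return std::unique_ptr<T[]>(new T[n]());
}
```

Both helpers value-initialize the elements, so an int array starts out zeroed. Elements of the shared version are reached through get(), for example p.get()[i], because shared_ptr<T> has no operator[] before C++17.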

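shared_ptr and unique_ptr may not be a complete solution on their own, because for anything other than a single object created with new the developer has to specify the delete handler by hand. A thin wrapper makes them smarter by generating the delete handler automatically. There is also no need to pass new (or any other raw pointer) as the constructor argument; instead, two forms, allocate and construct, give you what you want in the most abstract, simple and direct way.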

std::shared_ptr<T> pt0(new T()); // released automatically with the default deleter (plain delete)
std::shared_ptr<int> p1 = std::make_shared<int>();
// Specify default_delete as the release rule
std::shared_ptr<int> p6(new int[10], std::default_delete<int[]>());
// Custom release rule
void deleteInt(int* p) { delete[] p; }
std::shared_ptr<int> p3(new int[10], deleteInt);

What we would like to write instead is the following: no need to think about the release rule, and two forms, allocate and construct.

auto uptr = Alloc::unique_allocate<Foo>(sizeof(Foo));
auto sptr = Alloc::shared_allocate<Foo>(sizeof(Foo));
auto uptr = Alloc::unique_construct<Foo>();
auto sptr = Alloc::shared_construct<Foo>('6', '7');

allocator.h

#ifndef UTILS_ALLOCATOR_H
#define UTILS_ALLOCATOR_H
#include <cstdlib>
#include <map>
#include <memory>
#include <utility>
#include "base_config.h"
namespace st {
// Utility class (singleton)
class Alloc {
public:
    // deleter for the allocate form
    class trivial_delete_handler {
    public:
        trivial_delete_handler(index_t size_) : size(size_) {}
        void operator()(void* ptr) { deallocate(ptr, size); }
    private:
        index_t size;
    };
    // deleter for the construct form
    template<typename T>
    class nontrivial_delete_handler {
    public:
        void operator()(void* ptr) {
            static_cast<T*>(ptr)->~T();
            deallocate(ptr, sizeof(T));
        }
    };
    // unique_ptr for the allocate form
    template<typename T> 
    using TrivialUniquePtr = std::unique_ptr<T, trivial_delete_handler>;
    // unique_ptr for the construct form
    template<typename T>
    using NontrivialUniquePtr = std::unique_ptr<T, nontrivial_delete_handler<T>>;
    // I know it's weird here. The type has been already passed in as T, but the
    // function parameter still need the number of bytes, instead of objects.
    // And their relationship is 
    //          nbytes = nobjects * sizeof(T).
    // Check what I do in "tensor/storage.cpp", and you'll understand.
    // Or maybe changing the parameter here and doing some extra work in 
    // "tensor/storage.cpp" is better.
    // shared pointer, allocate form
    // purpose: generate the delete handler automatically
    template<typename T> 
    static std::shared_ptr<T> shared_allocate(index_t nbytes) {
        void* raw_ptr = allocate(nbytes);
        return std::shared_ptr<T>(
            static_cast<T*>(raw_ptr),
            trivial_delete_handler(nbytes)
        );
    }
    // unique pointer, allocate form
    // purpose: generate the delete handler automatically
    template<typename T>
    static TrivialUniquePtr<T> unique_allocate(index_t nbytes) {
        // allocate the memory
        void* raw_ptr = allocate(nbytes);
        // return a unique_ptr (the deleter is generated automatically)
        return TrivialUniquePtr<T>(
            static_cast<T*>(raw_ptr),
            trivial_delete_handler(nbytes)
        );
    }
    // shared pointer, construct form
    // purpose: generate the delete handler automatically
    template<typename T, typename... Args>
    static std::shared_ptr<T> shared_construct(Args&&... args) {
        void* raw_ptr = allocate(sizeof(T));
        new(raw_ptr) T(std::forward<Args>(args)...); 
        return std::shared_ptr<T>(
            static_cast<T*>(raw_ptr),
            nontrivial_delete_handler<T>()
        );
    }
    // unique pointer, construct form
    // purpose: generate the delete handler automatically
    template<typename T, typename... Args>
    static NontrivialUniquePtr<T> unique_construct(Args&&... args) {
        void* raw_ptr = allocate(sizeof(T));
        new(raw_ptr) T(std::forward<Args>(args)...);
        return NontrivialUniquePtr<T>(
            static_cast<T*>(raw_ptr),
            nontrivial_delete_handler<T>()
        );
    }
    static bool all_clear(void);
private:
    Alloc() = default;
    ~Alloc(){ 
        /* release unique ptr, the map will not do destruction!!! */
        for (auto iter = cache_.begin(); iter != cache_.end(); ++iter) {  iter->second.release(); }
    }
    static Alloc& self(); // singleton accessor
    static void* allocate(index_t size);
    static void deallocate(void* ptr, index_t size);
    static index_t allocate_memory_size;
    static index_t deallocate_memory_size;
    struct free_deletor {
        void operator()(void* ptr) { std::free(ptr); }
    };
    // multimap allows duplicate keys
    // Keeps heap blocks that were allocated and later released so they can be reused (which skips the search for available heap memory)
    std::multimap<index_t, std::unique_ptr<void, free_deletor>> cache_;
};
} // namespace st
#endif

allocator.cpp

#include "allocator.h"
#include "exception.h"
#include <iostream>
namespace st {
index_t Alloc::allocate_memory_size = 0;
index_t Alloc::deallocate_memory_size = 0;
Alloc& Alloc::self() {
    static Alloc alloc;
    return alloc;
}
void* Alloc::allocate(index_t size) {
    auto iter = self().cache_.find(size);
    void* res;
    if(iter != self().cache_.end()) {
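        // A block of this size is cached: hand it out and drop it from the cache, since it is now in use
        res = iter->second.release(); // give up ownership without freeing the memory
        self().cache_.erase(iter); // remove the cache entry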
    } else {
        res = std::malloc(size); 
        CHECK_NOT_NULL(res, "failed to allocate %d memory.", size);
    }
    allocate_memory_size += size;
    return res;
}
void Alloc::deallocate(void* ptr, index_t size) {
    deallocate_memory_size += size;
    // Keep the heap block so that the next request of this size can reuse it instead of allocating again
    self().cache_.emplace(size, ptr); // insert into the cache
}
bool Alloc::all_clear() {
    return allocate_memory_size == deallocate_memory_size;
}
} // namespace st
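
The caching strategy is easier to see in isolation. The following is a minimal sketch of the same idea, not the author's class: MiniCache and cached_blocks are hypothetical names, and unlike Alloc this sketch frees whatever is still cached in its destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>

// Hypothetical, simplified stand-in for st::Alloc: released blocks are kept
// in a multimap keyed by size and handed out again on the next request of
// exactly that size.
class MiniCache {
public:
    MiniCache() = default;
    MiniCache(const MiniCache&) = delete;
    MiniCache& operator=(const MiniCache&) = delete;
    ~MiniCache() {
        // Design choice of this sketch: free everything still in the cache.
        for (auto& entry : cache_) std::free(entry.second);
    }

    void* allocate(std::size_t size) {
        auto it = cache_.find(size);
        if (it != cache_.end()) {
            void* block = it->second;  // reuse a cached block
            cache_.erase(it);          // it is in use again, so it leaves the cache
            return block;
        }
        return std::malloc(size);      // nothing cached for this size
    }

    // "Releasing" a block only parks it in the cache for later reuse.
    void deallocate(void* ptr, std::size_t size) { cache_.emplace(size, ptr); }

    std::size_t cached_blocks() const { return cache_.size(); }

private:
    std::multimap<std::size_t, void*> cache_;
};
```

Allocating 64 bytes, releasing them and allocating 64 bytes again returns the same address, which is the behaviour the test code in this article relies on when it compares pointers.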

Usage: the point of wrapping these as unique_allocate, unique_construct, shared_allocate and shared_construct is that every shared_ptr or unique_ptr they produce automatically gets the matching delete handler.

struct Foo {
    static int ctr_call_counter;
    static int dectr_call_counter;
    char x_;
    char y_;
    Foo() { ++ctr_call_counter; }
    Foo(char x, char y) : x_(x), y_(y) { ++ctr_call_counter; }
    ~Foo() { ++dectr_call_counter; }
};
int Foo::ctr_call_counter = 0;
int Foo::dectr_call_counter = 0;
void test_Alloc() {
    using namespace st;
    // allocate: reserves space only
    // construct: reserves space and constructs the object
    void* ptr;
    {
        auto uptr = Alloc::unique_allocate<Foo>(sizeof(Foo));
        CHECK_EQUAL(Foo::ctr_call_counter, 0, "check 1");
        ptr = uptr.get();
    }
    CHECK_EQUAL(Foo::dectr_call_counter, 0, "check 1");
    {
        auto sptr = Alloc::shared_allocate<Foo>(sizeof(Foo));
        // The strategy of allocator.
        CHECK_EQUAL(ptr, static_cast<void*>(sptr.get()), "check 2");
    }
    {
        auto uptr = Alloc::unique_construct<Foo>();
        CHECK_EQUAL(Foo::ctr_call_counter, 1, "check 3");
        CHECK_EQUAL(ptr, static_cast<void*>(uptr.get()), "check 3");
    }
    CHECK_EQUAL(Foo::dectr_call_counter, 1, "check 3");
    {
        auto sptr = Alloc::shared_construct<Foo>('6', '7');
        CHECK_EQUAL(Foo::ctr_call_counter, 2, "check 4");
        CHECK_TRUE(sptr->x_ == '6' && sptr->y_ == '7', "check 4");
        CHECK_EQUAL(ptr, static_cast<void*>(sptr.get()), "check 4");
    }
    CHECK_EQUAL(Foo::dectr_call_counter, 2, "check 4");
}
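
The construct path boils down to placement new plus a deleter that runs the destructor before giving the memory back. Here is a self-contained sketch of that pattern on top of std::malloc and std::free. The names DestroyAndFree, construct_unique and Counted are hypothetical, and the sketch assumes T has no alignment requirement beyond what std::malloc guarantees.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

// Hypothetical deleter mirroring the construct path: destroy, then free.
template <typename T>
struct DestroyAndFree {
    void operator()(T* ptr) const {
        ptr->~T();
        std::free(ptr);
    }
};

// Hypothetical factory: raw memory from malloc, object built with placement new.
template <typename T, typename... Args>
std::unique_ptr<T, DestroyAndFree<T>> construct_unique(Args&&... args) {
    void* raw = std::malloc(sizeof(T));
    if (raw == nullptr) throw std::bad_alloc();
    try {
        T* obj = new (raw) T(std::forward<Args>(args)...);
        return std::unique_ptr<T, DestroyAndFree<T>>(obj);
    } catch (...) {
        std::free(raw);  // the constructor threw, so return the raw memory
        throw;
    }
}

// Small type that counts live instances, used to observe the destructor call.
struct Counted {
    static int alive;
    int value;
    explicit Counted(int v) : value(v) { ++alive; }
    ~Counted() { --alive; }
};
int Counted::alive = 0;
```

While a pointer returned by construct_unique<Counted>(7) is in scope, Counted::alive is 1; once it goes out of scope the deleter runs the destructor and the counter drops back to 0.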

Implementing Tensor requires shape and storage

shape manages the shape. Each Tensor has its own shape (the data is managed by a unique_ptr); see array.h and shape.h.

storage manages the data. Different Tensors may refer to the same underlying data (managed by a shared_ptr); see storage.h.

array.h

#ifndef UTILS_ARRAY_H
#define UTILS_ARRAY_H
#include <algorithm>
#include <initializer_list>
#include <memory>
#include <cstring>
#include <iostream>
// utils
#include "base_config.h"
#include "allocator.h"
namespace st {
    // Used for a tensor's shape; a shape has a single owner, hence unique_ptr.
    // Note: not very complete yet; in its current form it hardly lives up to the word "Dynamic".
    template<typename Dtype>
    class DynamicArray {
    public:
        explicit DynamicArray(index_t size)
                : size_(size),
                  dptr_(Alloc::unique_allocate<Dtype>(size_ * sizeof(Dtype))) {
        }
        DynamicArray(std::initializer_list<Dtype> data)
                : DynamicArray(data.size()) {
            auto ptr = dptr_.get();
            for(auto d: data) {
                *ptr = d;
                ++ptr;
            }
        }
        DynamicArray(const DynamicArray<Dtype>& other)
                : DynamicArray(other.size()) {
            std::memcpy(dptr_.get(), other.dptr_.get(), size_ * sizeof(Dtype));
        }
        DynamicArray(const Dtype* data, index_t size)
                : DynamicArray(size) {
            std::memcpy(dptr_.get(), data, size_ * sizeof(Dtype));
        }
        explicit DynamicArray(DynamicArray<Dtype>&& other) = default;
        ~DynamicArray() = default;
        Dtype& operator[](index_t idx) { return dptr_.get()[idx]; }
        Dtype operator[](index_t idx) const { return dptr_.get()[idx]; }
        index_t size() const { return size_; }
        // std::memset works in bytes (char); for non-char types use it only to zero memory,
        // otherwise the result is wrong.
        void memset(int value) const { std::memset(dptr_.get(), value, size_ * sizeof(Dtype)); } // original
        // std::fill_n assigns whole elements, so it works for any value; the name says what it does.
        void fill(const Dtype& value) const { std::fill_n(dptr_.get(), size_, value); } // revised
    private:
        index_t size_;
        Alloc::TrivialUniquePtr<Dtype> dptr_;
    };
} // namespace st
#endif

storage.h

#ifndef TENSOR_STORAGE_H
#define TENSOR_STORAGE_H
#include <memory>
#include "base_config.h"
#include "allocator.h"
namespace st {
    namespace nn {
        class InitializerBase;
        class OptimizerBase;
    }
    class Storage {
    public:
        explicit Storage(index_t size);
        Storage(const Storage& other, index_t offset); // Note: what is offset for? bptr_ still refers to the same data and only dptr_ points elsewhere; this is how PyTorch-style clip, slicing and similar operations are designed
        Storage(index_t size, data_t value);
        Storage(const data_t* data, index_t size);
        explicit Storage(const Storage& other) = default; // copy constructor (all members are pointer-like, so the default is enough)
        explicit Storage(Storage&& other) = default; // move constructor (all members are pointer-like, so the default is enough)
        ~Storage() = default;
        Storage& operator=(const Storage& other) = delete;
        // inline function
        data_t operator[](index_t idx) const { return dptr_[idx]; }
        data_t& operator[](index_t idx) { return dptr_[idx]; }
        index_t offset(void) const { return dptr_ - bptr_->data_; }
        index_t version(void) const { return bptr_->version_; }
        void increment_version(void) const { ++bptr_->version_; }//???
        // friend function
        friend class nn::InitializerBase;
        friend class nn::OptimizerBase;
    public:
        index_t size_;
    private:
        struct Vdata {
            index_t version_; //???
            data_t data_[1]; // always marks the start of the data
        };
        std::shared_ptr<Vdata> bptr_;  // base pointer; a shared_ptr because different tensors may point to the same storage data
        data_t* dptr_;  // data pointer; points into data_ of Vdata and can be moved (a cursor)
    };
}  // namespace st
#endif

storage.cpp

#include <iostream>
#include <cstring>
#include <algorithm>
#include "storage.h"
namespace st {
    Storage::Storage(index_t size)
        : bptr_(Alloc::shared_allocate<Vdata>(size * sizeof(data_t) + sizeof(index_t))),
        dptr_(bptr_->data_)
    {
        bptr_->version_ = 0;
        this->size_ = size;
    }
    Storage::Storage(const Storage& other, index_t offset)
        : bptr_(other.bptr_),
        dptr_(other.dptr_ + offset)
    {
        this->size_ = other.size_;
    }
    Storage::Storage(index_t size, data_t value)
        : Storage(size) {
        //std::memset(dptr_, value, size * sizeof(data_t)); // temporary
        std::fill_n(dptr_, size, value);
    }
    Storage::Storage(const data_t* data, index_t size)
        : Storage(size) {
        std::memcpy(dptr_, data, size * sizeof(data_t));
    }
}  // namespace st
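
The offset constructor is what lets two Storage objects look at the same buffer from different starting points. The sketch below shows the idea in a stripped-down form. MiniStorage and owners are hypothetical names, float stands in for data_t, and the version counter is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical simplification of st::Storage: a shared base buffer plus a
// movable data pointer (cursor) into it.
class MiniStorage {
public:
    explicit MiniStorage(std::size_t size)
        : base_(new float[size](), std::default_delete<float[]>()),
          data_(base_.get()) {}

    // View constructor: shares the buffer and only shifts the cursor.
    MiniStorage(const MiniStorage& other, std::size_t offset)
        : base_(other.base_), data_(other.data_ + offset) {}

    float& operator[](std::size_t idx) { return data_[idx]; }
    float operator[](std::size_t idx) const { return data_[idx]; }

    // Distance of the cursor from the start of the shared buffer.
    std::size_t offset() const {
        return static_cast<std::size_t>(data_ - base_.get());
    }
    // Number of storages currently sharing the buffer.
    long owners() const { return base_.use_count(); }

private:
    std::shared_ptr<float> base_;  // shared ownership of the whole buffer
    float* data_;                  // where this storage's element 0 lives
};
```

Writing through a view created with offset 2 changes element 2 of the original storage, and the buffer is released only when the last storage sharing it is destroyed.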

shape.h

#ifndef TENSOR_SHAPE_H
#define TENSOR_SHAPE_H
#include <initializer_list>
#include <ostream>
#include "base_config.h"
#include "allocator.h"
#include "array.h"
namespace st {
    class Shape {
    public:
        // constructor
        Shape(std::initializer_list<index_t> dims);
        Shape(const Shape& other, index_t skip);
        Shape(index_t* dims, index_t dim);
        Shape(IndexArray&& shape);
        Shape(const Shape& other) = default;
        Shape(Shape&& other) = default;
        ~Shape() = default;
        // method
        index_t dsize() const;
        index_t subsize(index_t start_dim, index_t end_dim) const;
        index_t subsize(index_t start_dim) const;
        bool operator==(const Shape& other) const;
        // inline function
        index_t ndim(void) const { return dims_.size(); }
        index_t operator[](index_t idx) const { return dims_[idx]; }
        index_t& operator[](index_t idx) { return dims_[idx]; }
        operator const IndexArray() const { return dims_; }
        // friend function
        friend std::ostream& operator<<(std::ostream& out, const Shape& s);
    private:
        IndexArray dims_; // IndexArray is a DynamicArray
    };
}  // namespace st
#endif

shape.cpp

#include "shape.h"
namespace st {
    Shape::Shape(std::initializer_list<index_t> dims) : dims_(dims) {}
    Shape::Shape(const Shape& other, index_t skip) : dims_(other.ndim() - 1) {
        int i = 0;
        for (; i < skip; ++i)
            dims_[i] = other.dims_[i];
        for (; i < dims_.size(); ++i)
            dims_[i] = other.dims_[i + 1];
    }
    Shape::Shape(index_t* dims, index_t dim_) : dims_(dims, dim_) {}
    Shape::Shape(IndexArray&& shape) : dims_(std::move(shape)) {}
    index_t Shape::dsize() const {
        int res = 1;
        for (int i = 0; i < dims_.size(); ++i)
            res *= dims_[i];
        return res;
    }
    index_t Shape::subsize(index_t start_dim, index_t end_dim) const {
        int res = 1;
        for (; start_dim < end_dim; ++start_dim)
            res *= dims_[start_dim];
        return res;
    }
    index_t Shape::subsize(index_t start_dim) const {
        return subsize(start_dim, dims_.size());
    }
    bool Shape::operator==(const Shape& other) const {
        if (this->ndim() != other.ndim()) return false;
        index_t i = 0;
        for (; i < dims_.size() && dims_[i] == other.dims_[i]; ++i)
            ;
        return i == dims_.size();
    }
    std::ostream& operator<<(std::ostream& out, const Shape& s) {
        out << '(' << s[0];
        for (int i = 1; i < s.ndim(); ++i)
            out << ", " << s[i];
        out << ")";
        return out;
    }
}  // namespace st
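
For a quick feel of what subsize and the skip constructor compute, here is a sketch that uses std::vector in place of IndexArray. Dims and drop_dim are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

using Dims = std::vector<std::size_t>;  // hypothetical stand-in for IndexArray

// Product of the dimensions in the half-open range [start, end).
inline std::size_t subsize(const Dims& dims, std::size_t start, std::size_t end) {
    assert(start <= end && end <= dims.size());
    return std::accumulate(dims.begin() + start, dims.begin() + end,
                           std::size_t{1}, std::multiplies<std::size_t>());
}

// Same effect as Shape(other, skip): copy every dimension except `skip`.
inline Dims drop_dim(const Dims& dims, std::size_t skip) {
    assert(skip < dims.size());
    Dims out(dims);
    out.erase(out.begin() + skip);
    return out;
}
```

For the shape (2, 3, 4), subsize over all three dimensions gives 24 (the same as dsize), subsize from dimension 1 onwards gives 12, and dropping dimension 1 leaves (2, 4).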

Tensor design (basics)

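Prerequisites: inheritance, pointer classes, the curiously recurring template pattern (static polymorphism), expression templates, the Impl design pattern (separating declaration from implementation), friend classes and template specialization.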

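The tensor design uses the Impl approach (declaration separated from implementation) together with the curiously recurring template pattern (static polymorphism). Tensor itself handles tensor operations, while Exp handles reference counting and gradient counting (needed for backpropagation and gradient updates).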
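
For readers new to the curiously recurring template pattern and expression templates, here is a minimal sketch of how the two fit together. It is not the article's Tensor or Exp code: Exp, AddExp and Vec below are simplified, hypothetical types that work on a flat float vector.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CRTP base: every expression type derives from Exp<itself>, so self()
// recovers the concrete type at compile time (static polymorphism).
template <typename Derived>
struct Exp {
    const Derived& self() const { return static_cast<const Derived&>(*this); }
};

// Expression template node: remembers its operands and adds them lazily.
template <typename L, typename R>
struct AddExp : Exp<AddExp<L, R>> {
    const L& lhs;
    const R& rhs;
    AddExp(const L& l, const R& r) : lhs(l), rhs(r) {}
    float eval(std::size_t i) const { return lhs.eval(i) + rhs.eval(i); }
    std::size_t size() const { return lhs.size(); }
};

// Leaf node that owns data.
struct Vec : Exp<Vec> {
    std::vector<float> data;
    explicit Vec(std::vector<float> d) : data(std::move(d)) {}
    float eval(std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from any expression evaluates the whole tree in one loop,
    // without building intermediate vectors.
    template <typename E>
    Vec& operator=(const Exp<E>& e) {
        const E& expr = e.self();
        data.resize(expr.size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr.eval(i);
        return *this;
    }
};

// operator+ only builds a node; no arithmetic happens here.
template <typename L, typename R>
AddExp<L, R> operator+(const Exp<L>& l, const Exp<R>& r) {
    return AddExp<L, R>(l.self(), r.self());
}
```

With Vec objects a, b and c of equal length, c = a + b + a builds a small expression tree and evaluates it element by element during the assignment. The nodes hold references, so an expression should be consumed within the statement that creates it.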

There are five classes in total: Tensor, TensorImpl, Exp, ExpImpl and ExpImplPtr. The figure below shows how they relate.

First, the diagram:

Code:

// There is quite a lot of code, so it is not reproduced here; read the source alongside its comments.

Tensor design (going further)

Tensor data memory layout management

A Tensor's data exists in exactly one copy. So do operations such as transpose, permute and slice each have to produce a new tensor with its own new data? Of course not: never use two copies where one will do! A tensor's data is described by size (total number of elements), offset (this tensor's offset from the original base data), ndim (number of dimensions), shape (number of elements along each dimension) and stride (the index step for each dimension). stride and shape correspond one to one, and with the stride-based index formula a single copy of the data can be presented as many different tensors. The figure that follows illustrates this.
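
The index formula in question is position = offset + sum over i of idx[i] * stride[i]. The sketch below applies it to a flat buffer and shows that transpose and slice only touch metadata. View, contiguous_stride, transpose and slice0 are hypothetical names written for this example, and row-major layout is assumed for the initial strides.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal view: it does not own `data`, only its metadata.
struct View {
    const float* data;
    std::size_t offset;
    std::vector<std::size_t> shape;
    std::vector<std::size_t> stride;

    // position = offset + sum_i idx[i] * stride[i]
    float at(const std::vector<std::size_t>& idx) const {
        assert(idx.size() == shape.size());
        std::size_t pos = offset;
        for (std::size_t i = 0; i < idx.size(); ++i) {
            assert(idx[i] < shape[i]);
            pos += idx[i] * stride[i];
        }
        return data[pos];
    }
};

// Row-major strides: the last dimension has step 1, and each earlier
// dimension steps over everything to its right.
inline std::vector<std::size_t> contiguous_stride(const std::vector<std::size_t>& shape) {
    std::vector<std::size_t> s(shape.size(), 1);
    for (std::size_t i = shape.size(); i-- > 1;) s[i - 1] = s[i] * shape[i];
    return s;
}

// Transpose: swap two dimensions together with their strides; no data moves.
inline View transpose(View v, std::size_t a, std::size_t b) {
    std::swap(v.shape[a], v.shape[b]);
    std::swap(v.stride[a], v.stride[b]);
    return v;
}

// Slice along dimension 0: advance the offset and shrink the shape.
inline View slice0(View v, std::size_t start, std::size_t len) {
    assert(start + len <= v.shape[0]);
    v.offset += start * v.stride[0];
    v.shape[0] = len;
    return v;
}
```

For a 2 x 3 buffer holding 0 to 5, the strides are (3, 1); the transposed view has shape (3, 2) and strides (1, 3), and both views read the same six floats.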