CMU15445 2023fall——PROJECT #1 - BUFFER POOL

发布时间 2023-11-02 17:52:52作者: vLiion

PROJECT #1 - BUFFER POOL


ASSIGNMENT 翻译

点击查看 Task #2 - Disk Scheduler 翻译

Task #2 - Disk Scheduler (磁盘调度程序)

该组件负责调度 DiskManager 上的读写操作。实现 disk_scheduler.h 文件和 disk_scheduler.cpp 文件。

This component is responsible for scheduling read and write operations on the DiskManager. You will implement a new class called DiskScheduler in src/include/storage/disk/disk_scheduler.h and its corresponding implementation file in src/storage/disk/disk_scheduler.cpp.


disk scheduler 能够被其它组件(比如 BufferPoolPanager)使用,来进行 disk requests 的队列操作(disk requests 是由 DiskRequest 结构体实现)。disk scheduler 会维护一个后台工作线程,负责处理调度的 requests

The disk scheduler can be used by other components (in this case, your BufferPoolManager in Task #3) to queue disk requests, represented by a DiskRequest struct (already defined in src/include/storage/disk/disk_scheduler.h). The disk scheduler will maintain a background worker thread which is responsible for processing scheduled requests.


disk scheduler 利用 共享队列 来调度和处理 disk requests。一个线程将往队列中添加一个请求,disk scheduler 的后台工作线程就会处理队列中的这些请求。我们提供了一个 Channel 类来实现线程间安全的数据共享,也可以自己实现。

The disk scheduler will utilize a shared queue to schedule and process the DiskRequests. One thread will add a request to the queue, and the disk scheduler's background worker will process the queued requests. We have provided a Channel class in src/include/common/channel.h to facilitate the safe sharing of data between threads, but feel free to use your own implementation if you find it necessary.


DiskScheduler 的构造函数和析构函数已经实现,以用来负责创建和加入后台工作线程。你只要实现以下方法:

Schedule(DiskRequest r)

调度 DiskManager 上的一个请求去执行。DiskRequest 结构体中指明了请求是读还是写,数据应当写到哪儿或者从哪儿读取,和需要操作的 page IDDisRequest 结构体还包含了一个 std::promise,一旦处理请求,将其值设置为 true

StartWorkerThread()

启动用于处理调度请求的工作线程。工作线程应当在 DiskScheduler 构造函数中创建,调用这个方法。这个方法负责获取队列请求并且分派到 DiskManager
请记住在 DiskRequest 的回调中设置值,以向请求发出者发出请求已完成的信号。
在调用 DiskScheduler 的析构函数之前,该值不应返回。

查看 disk_scheduler_test.cpp 中回调的示例并且确保线程安全。


The DiskScheduler constructor and destructor are already implemented and are responsible creating and joining the background worker thread. You will only need to implement the following methods as defined in the header file (src/include/storage/disk/disk_scheduler.h) and in the source file (src/storage/disk/disk_scheduler.cpp):

Schedule(DiskRequest r) : Schedules a request for the DiskManager to execute. The DiskRequest struct specifies whether the request is for a read/write, where the data should be written into/from, and the page ID for the operation. The DiskRequest also includes a std::promise whose value should be set to true once the request is processed.
StartWorkerThread() : Start method for the background worker thread which processes the scheduled requests. The worker thread is created in the DiskScheduler constructor and calls this method. This method is responsible for getting queued requests and dispatching them to the DiskManager. Remember to set the value on the DiskRequest's callback to signal to the request issuer that the request has been completed. This should not return until the DiskScheduler's destructor is called.
Lastly, one of the fields of a DiskRequest is a std::promise. If you are unfamiliar with C++ promises and futures, you can check out their documentation. For the purposes of this project, they essentially provide a callback mechanism for a thread to know when their scheduled request is completed. To see an example of how they might be used, check out disk_scheduler_test.cpp.

Again, the implementation details are up to you, but you must make sure that your implementation is thread-safe.


点击查看 Task #3 - Buffer Pool Manager 翻译

Task #3 - Buffer Pool Manager (缓冲池管理器)


接下来,实现 the buffer pool manager.
BufferPoolManager 负责使用 DiskScheduler 从磁盘获取数据库页面并将其存储在内存中。BufferPoolManager 可以在需要通过逐出页面为新页面腾出空间时,或者被人为指示时,安排将脏页面写入磁盘。

Next, implement the buffer pool manager (BufferPoolManager). The BufferPoolManager is responsible for fetching database pages from disk with the DiskScheduler and storing them in memory. The BufferPoolManager can also schedule writes of dirty pages out to disk when it is either explicitly instructed to do so or when it needs to evict a page to make space for a new page.


为了确保你的实现能够在系统中正常工作,已经提供了一些已经写好的函数。你也不需要实现读取数据并将数据写入磁盘的代码(已经由 DiskManager 实现)。你需要实现的就是实现 DiskScheduler 来将磁盘请求分派给 DiskManager(这就是任务二做的东西)

To make sure that your implementation works correctly with the rest of the system, we will provide you with some functions already filled in. You will also not need to implement the code that actually reads and writes data to disk (this is called the DiskManager in our implementation). We will provide that functionality. You do, however, need to implement the DiskScheduler to process disk requests and dispatch them to the DiskManager (this is Task #2).


系统中的所有内存页面都由 Page 对象表示。BufferPoolManager 不需要了解 page 的内容。但是你需要知道Page对象只是缓冲池中内存的容器。也就是说,每个 Page 对象都包含一个内存块,DiskManager 将使用该内存块来复制从磁盘读取的物理页的内容。当数据在磁盘上来回移动时,BufferPoolManager 将重用相同的 Page 对象来存储数据。 这意味着在系统的整个生命周期中,同一个 Page 对象可能包含不同的物理页。 Page 对象的标识符 (page_id) 跟踪它包含的物理页; 如果 Page 对象不包含物理页,则其 page_id 必须设置为 INVALID_PAGE_ID

All in-memory pages in the system are represented by Page objects. The BufferPoolManager does not need to understand the contents of these pages. But it is important for you as the system developer to understand that Page objects are just containers for memory in the buffer pool and thus are not specific to a unique page. That is, each Page object contains a block of memory that the DiskManager will use as a location to copy the contents of a physical page that it reads from disk. The BufferPoolManager will reuse the same Page object to store data as it moves back and forth to disk. This means that the same Page object may contain a different physical page throughout the life of the system. The Page object's identifer (page_id) keeps track of what physical page it contains; if a Page object does not contain a physical page, then its page_id must be set to INVALID_PAGE_ID.


每个 Page 对象还维护一个计数器,用于记录“固定pinned”该页面的线程数(大概意思是 Page 对象可能会被多个线程使用,维护一个变量用于记录关联的线程数,只要计数器大于0,也就是说 Page 会被某个线程使用,这时候就不允许释放页面)。BufferPoolManager 不允许释放 pinnedPage。每个 Page 还跟踪它是否是 dirty page。你的任务是记录页面在准备 unpinned 之前,这个页面有没有被修改(如果没有被修改,就标记为 unpinned)。BufferPoolManager 必须将脏页的内容写回磁盘,然后才能重用该对象。(当一个 Page 在缓冲池中被修改后,它被标记为“脏”的,这表示它的内容与磁盘上的对应物理页面的内容不一致,已经发生了修改。为了提高数据库系统的性能。如果每次修改都立即写回磁盘,将会导致大量的磁盘I/O操作,影响性能。因此,数据库系统采用了延迟写回策略,将修改后的页暂时保存在缓冲池中,并在适当的时候批量写回磁盘,以减少磁盘I/O的开销。)

Each Page object also maintains a counter for the number of threads that have "pinned" that page. Your BufferPoolManager is not allowed to free a Page that is pinned. Each Page object also keeps track of whether it is dirty or not. It is your job to record whether a page was modified before it is unpinned. Your BufferPoolManager must write the contents of a dirty Page back to disk before that object can be reused.


LRUKReplacer 将跟踪记录什么时候 Page 对象有权访问,以便于在必须要腾出空间存放新的从磁盘复制的 Page 时,逐出哪个 Page。当将 page_id 映射为 frame_id 时,要注意 STL 容器不是线程安全的。DiskScheduler 将通过 DiskManager 调度读和写操作。

Your BufferPoolManager implementation will use the LRUKReplacer and DiskScheduler classes that you created in the previous steps of this assignment. The LRUKReplacer will keep track of when Page objects are accessed so that it can decide which one to evict when it must free a frame to make room for copying a new physical page from disk. When mapping page_id to frame_id in the BufferPoolManager, again be warned that STL containers are not thread-safe. The DiskScheduler will schedule writes and reads to disk on the DiskManager.

You will need to implement the following functions defined in the header file (src/include/buffer/buffer_pool_manager.h) and in the source file (src/buffer/buffer_pool_manager.cpp):


FetchPage(page_id_t page_id)

UnpinPage(page_id_t page_id, bool is_dirty)

FlushPage(page_id_t page_id)

NewPage(page_id_t* page_id)

DeletePage(page_id_t page_id)


FlushAllPages()

当 在可利用的列表中没用可以用的 page 时并且其它所有的 page 的状态都是pinned 时,返回 nullptr。无论页面状态是 pinned 或者是 unpinned,都要刷新 page


FlushAllPages()

For FetchPage, you should return nullptr if no page is available in the free list and all other pages are currently pinned. FlushPage should flush a page regardless of its pin status.

对于 UnpinPageis_dirty 参数记录 page 在变为 pinned 时是否已经被修改(是否成为了一个脏页)。

For UnpinPage, the is_dirty parameter keeps track of whether a page was modified while it was pinned.


如果你想用 NewPage() 函数创建一个新 page 时, AllocatePage 私有方法会提供一个唯一的新页面id。同样的,DeallocatePage() 方法没有内容,它模拟释放磁盘上的页面,你应该在 DeletePage() 中调用这个方法。

The AllocatePage private method provides the BufferPoolManager a unique new page id when you want to create a new page in NewPage(). On the other hand, the DeallocatePage() method is a no-op that imitates freeing a page on the disk and you should call this in your DeletePage() implementation.


您不需要使缓冲池管理器超级高效——在每个面向公众的缓冲池管理器函数中从开始到结束持有缓冲池管理器锁就足够了。 但是,您确实需要确保您的缓冲池管理器具有合理的性能,否则在将来的项目中将会出现问题。 您可以将您的基准测试结果(QPS.1 和 QPS.2)与其他学生进行比较,看看您的实施是否太慢。

You do not need to make your buffer pool manager super efficient -- holding the buffer pool manager lock from the start to the end in each public-facing buffer pool manager function should be enough. However, you do need to ensure your buffer pool manager has reasonable performance, otherwise there will be problems in future projects. You can compare your benchmark result (QPS.1 and QPS.2) with other students and see if your implementation is too slow.


具体信息参照头文件(lru_k_replacer.hdisk_scheduler.hbuffer_pool_manager.h)以获取更详细的规范和文档。

Please refer to the header files (lru_k_replacer.h, disk_scheduler.h, buffer_pool_manager.h) for more detailed specs and documentations.



BufferPool 实现逻辑

image


NewPage() 函数实现逻辑

image


FetchPage() 函数实现逻辑

image


buffer_pool_manager.cpp 文件实现代码 (85分,除了排行榜之外均通过)

点击查看 buffer_pool_manager.cpp 代码

//===----------------------------------------------------------------------===//
//
//                         BusTub
//
// buffer_pool_manager.cpp
//
// Identification: src/buffer/buffer_pool_manager.cpp
//
// Copyright (c) 2015-2021, Carnegie Mellon University Database Group
//
//===----------------------------------------------------------------------===//

#include "buffer/buffer_pool_manager.h"

#include "common/exception.h"
#include "common/macros.h"
#include "storage/page/page_guard.h"

namespace bustub {

BufferPoolManager::BufferPoolManager(size_t pool_size, DiskManager *disk_manager, size_t replacer_k,
                                     LogManager *log_manager)
    : pool_size_(pool_size), disk_scheduler_(std::make_unique<DiskScheduler>(disk_manager)), log_manager_(log_manager) {
  // we allocate a consecutive memory space for the buffer pool
  pages_ = new Page[pool_size_];
  replacer_ = std::make_unique<LRUKReplacer>(pool_size, replacer_k);

  // 初始时,所有 page 都在 free list 中
  // Initially, every page is in the free list.
  for (size_t i = 0; i < pool_size_; ++i) {
    free_list_.emplace_back(static_cast<int>(i));
  }
}

BufferPoolManager::~BufferPoolManager() { delete[] pages_; }

auto BufferPoolManager::NewPage(page_id_t *page_id) -> Page * {
  std::unique_lock<std::mutex> lock(latch_);

  page_id_t new_page_id;

  // 获取 frame_id
  frame_id_t frame_id;

  if (!free_list_.empty()) {
    frame_id = free_list_.front();
    free_list_.pop_front();

  } else if (replacer_->Evict(&frame_id)) {
    // 对已经逐出的 page 进行处理
    Page *old_page = &pages_[frame_id];

    if (old_page->IsDirty()) {
      FlushPage(old_page->GetPageId());
    }

    page_table_.erase(old_page->GetPageId());

  } else {
    return nullptr;
  }

  // 准备新 page_id
  new_page_id = this->AllocatePage();

  // 加入 replacer_ 计算
  replacer_->RecordAccess(frame_id);
  replacer_->SetEvictable(frame_id, false);

  // 添加映射关系
  page_table_[new_page_id] = frame_id;

  // 在内存创建实体结构
  Page *page = &pages_[frame_id];
  page->ResetMemory();
  page->pin_count_ = 1;
  page->page_id_ = new_page_id;
  page->is_dirty_ = false;

  *page_id = new_page_id;

  return &pages_[frame_id];
}

auto BufferPoolManager::FetchPage(page_id_t page_id, [[maybe_unused]] AccessType access_type) -> Page * {
  std::unique_lock<std::mutex> lock(latch_);
  frame_id_t frame_id;

  // 尝试从内存中获取数据
  if (page_table_.find(page_id) != page_table_.end()) {
    frame_id = page_table_[page_id];
    pages_[frame_id].pin_count_++;
    pages_[frame_id].is_dirty_ = true;
    replacer_->RecordAccess(frame_id);
    replacer_->SetEvictable(frame_id, false);
    return &pages_[frame_id];
  }

  // 从磁盘中读取数据
  if (!free_list_.empty()) {
    frame_id = free_list_.front();
    free_list_.pop_front();

  } else if (replacer_->Evict(&frame_id)) {
    Page *old_page = &pages_[frame_id];

    if (old_page->IsDirty()) {
      FlushPage(old_page->GetPageId());
    }

    page_table_.erase(old_page->GetPageId());

  } else {
    return nullptr;
  }

  // 加入 replacer_ 处理
  replacer_->RecordAccess(frame_id);
  replacer_->SetEvictable(frame_id, false);

  // 添加映射关系
  page_table_[page_id] = frame_id;

  // 读取磁盘内容
  auto promise = disk_scheduler_->CreatePromise();
  auto future = promise.get_future();
  disk_scheduler_->Schedule({false, pages_[frame_id].data_, page_id, std::move(promise)});
  future.get();

  // 更新内存状态
  pages_[frame_id].page_id_ = page_id;
  pages_[frame_id].pin_count_ = 1;
  pages_[frame_id].is_dirty_ = false;

  return &pages_[frame_id];
}

auto BufferPoolManager::UnpinPage(page_id_t page_id, bool is_dirty, [[maybe_unused]] AccessType access_type) -> bool {
  std::unique_lock<std::mutex> lock(latch_);
  if (page_table_.find(page_id) == page_table_.end() || pages_[page_table_[page_id]].GetPinCount() == 0) {
    return false;
  }

  Page *page = &pages_[page_table_[page_id]];

  page->pin_count_--;

  if (page->pin_count_ == 0) {
    replacer_->SetEvictable(page_table_[page_id], true);
  }

  page->is_dirty_ = is_dirty;

  return true;
}

auto BufferPoolManager::FlushPage(page_id_t page_id) -> bool {
  BUSTUB_ASSERT(page_id != INVALID_PAGE_ID, "Flush Page is not allow INVALID_PAGE_ID page");

  if (page_table_.find(page_id) == page_table_.end()) {
    return false;
  }

  // 获取 frame_id
  frame_id_t frame_id = page_table_[page_id];

  // 获取 page
  Page *page = &pages_[frame_id];

  // 更新内存状态
  page->is_dirty_ = false;

  // 写入磁盘中
  auto promise = disk_scheduler_->CreatePromise();
  auto future = promise.get_future();
  disk_scheduler_->Schedule({true, page->data_, page_id, std::move(promise)});
  future.get();

  return true;
}

void BufferPoolManager::FlushAllPages() {}

auto BufferPoolManager::DeletePage(page_id_t page_id) -> bool {
  std::unique_lock<std::mutex> lock(latch_);
  if (page_table_.find(page_id) == page_table_.end()) {
    return true;
  }

  Page *page = &pages_[page_table_[page_id]];
  if (page->GetPinCount() > 0) {
    return false;
  }

  if (page->is_dirty_) {
    FlushPage(page_id);
  }

  frame_id_t frame_id = page_table_[page_id];

  page->ResetMemory();
  page_table_.erase(page->GetPageId());
  replacer_->Remove(frame_id);
  free_list_.push_back(frame_id);
  DeallocatePage(page_id);

  return true;
}

auto BufferPoolManager::AllocatePage() -> page_id_t { return next_page_id_++; }

auto BufferPoolManager::FetchPageBasic(page_id_t page_id) -> BasicPageGuard { return {this, nullptr}; }

auto BufferPoolManager::FetchPageRead(page_id_t page_id) -> ReadPageGuard { return {this, nullptr}; }

auto BufferPoolManager::FetchPageWrite(page_id_t page_id) -> WritePageGuard { return {this, nullptr}; }

auto BufferPoolManager::NewPageGuarded(page_id_t *page_id) -> BasicPageGuard { return {this, nullptr}; }

}  // namespace bustub