Neko

Building a log-structured key-value store in Rust

Table of Contents #


This blog is WIP

Introduction #

This blog is about a log structured key-value store that I’ve been working on, inspired by Bitcask and written in Rust.

Link to the Github Repo

I have been following the PingCAP Talent Plan guide. It breaks down the project into smaller building blocks and provides 5 sets of incremental tests. Highly recommended for learning about databases and Rust!


What is Bitcask #

Bitcask paper

WIP


Initial Implementation #

WIP

Serialization format #

kvs persists the commands to the disk in a binary format. The MessagePack format was chosen for its impressive serialization and deserialization performance as well as compact binary representations, while remaining easy to use.


Major update: Log Compaction #

The write-ahead log (WAL) appends new data to the log for every command. Older key-value pairs might be invalidated by newer entries or deletions. Log compaction is necessary to:

Use a similar storage format as Bitcask, where multiple files are present in the storage directory. The directory will consist of a single mutable file opened for appending the log entries and several immutable files holding records which can be read.


The KVStore struct #

  1. mem_index - An in-memory index mapping keys to log pointers, currently a HashTable(can be swapped out later with a better data structure such as a SkipList)

  2. folder_path - Path to the folder storing logs

  3. log_writer - A write handle to the active log

  4. log_readers - A set of readers (Vec<BufReader>) to access all logs - This minimises the file open overhead for repeated GET commands.

  5. compactable_bytes - The number of redundant bytes that can be compacted in the current log

Store initialisation #

Determine if the store is a new or existing one #

Obtain directory entries for the given folder using fs::read_dir() and find the max value - use numbers for log filenames such as 1.db and 2.db