Neko

Building a log-structured key-value store in Rust

Table of Contents #


This blog is WIP

Introduction #

This blog is about a log structured key-value store that I’ve been working on, inspired by Bitcask and written in Rust.

Link to the Github Repo

I have been following the PingCAP Talent Plan guide. It breaks down the project into smaller building blocks and provides 5 sets of incremental tests. Highly recommended for learning about databases and Rust!


What is Bitcask #

Bitcask paper

WIP


Initial Implementation #

WIP

Serialization format #

kvs persists the commands to the disk in a binary format. The MessagePack format was chosen for its impressive serialization and deserialization performance as well as compact binary representations, while remaining easy to use.


Major update: Log Compaction #

The write-ahead log (WAL) appends new data to the log for every command. Older key-value pairs might be invalidated by newer entries or deletions. Log compaction is necessary to:

Use a similar storage format as Bitcask - The storage directory will consist of a single mutable append-only log and several immutable read-only logs holding compacted records.


The KVStore struct #

  1. mem_index - An in-memory index mapping keys to a ValueMetadata struct, currently a HashTable
  2. logs_dir - Path to the folder storing logs
  3. log_writer - A write handle to the active log
  4. log_readers - A collection of readers (HashMap<u64, BufReader>) to access all logs
  1. current_log_id - The id of the active mutable log

The ValueMetadata struct #

  1. log_pointer - Offset to the start of the log command such as ‘SET x 5’
  2. file_id - Name of the log containing the command

Initialise the store #

Determine if the store is a new or existing one Obtain directory entries of the log folder using fs::read_dir()

  1. find the maximum value - use integers for log names such as 1.db and 2.db
  2. Initialise all write and read handles
  3. Replay logs to recreate the index
  1. Create 1.db and initialise the write handle

Compaction #

  1. Lock KvStore
  2. Update the log id and writer to a new file and add a reader to the log_readers collection
  3. Replay the current logs and store the latest key-value pairs in a hashmap
  4. Persist the key-value pairs in an updated log such as 5_compacted.db
  1. Lock KVStore