This blog is about a log structured key-value store that I’ve been working on, inspired by Bitcask and written in Rust.
Link to the Github Repo
I have been following the PingCAP Talent Plan guide. It breaks down the project into smaller building blocks and provides 5 sets of incremental tests. Highly recommended for learning about databases and Rust!
I presented a talk on this project at the October 2024 Rust Bangalore Meetup at Couchbase! The slides can be found here.
kvs persists the commands to the disk in a binary format. The MessagePack format was chosen for its impressive serialization and deserialization performance as well as compact binary representations, while remaining easy to use.
The write-ahead log (WAL) appends new data to the log for every command. Older key-value pairs might be invalidated by newer entries or deletions. Log compaction is necessary to:
Use a similar storage format as Bitcask - The storage directory will consist of a single mutable append-only log and several immutable read-only logs holding compacted records.
mem_index - An in-memory index mapping keys to a ValueMetadata struct, currently a HashTable
logs_dir - Path to the folder storing logs
log_writer - A write handle to the active log
log_readers - A collection of readers (HashMap<u64, BufReader<File>>
) to access all logs
current_log_id - The id of the active mutable log
Determine if the store is a new or existing one Obtain directory entries of the log folder using fs::read_dir()
Compaction is kicked off when a log size hits a certain threshold, such as 10kB. All writes are made to a new log while the current log is compacted. (The store was single-threaded during the initial compaction implementation and did not require locks)
Update the log id and writer to a new file and add a reader to the log_readers collection
Replay the current logs and store the latest key-value pairs in a hashmap
Persist the key-value pairs in an updated log such as 5_compacted.db
Update KVStore
A potential problem with this compaction approach is that if a key update or delete command is appended to a newer log, the stale entry in the older log is not compacted.
A possible approach to solve this issue:
The key-value store is a server that listens for commands on the specified address. You may use a tool such as netcat instead of the hobbes client to send commands
echo "<length_of_cmd>\r\nSET\r\n<key>\r\n<val>\r\n" | nc <addr> <port>
echo "10\r\nGET\r\nfoo\r\n" | nc localhost 4000
echo "15\r\nSET\r\nfoo\r\nbar\r\n" | nc localhost 4000
echo "9\r\nRM\r\nfoo\r\n" | nc localhost 4000
The length of the command is prepended before being sent. For instance, GET\r\nfoo\r\n
is 10 bytes long. 10\r\n
is prefixed to the command and sent.
The command and arguments are separated and terminated by a carriage return line feed (CRLF)(\r\n
).
Hobbes offers pluggable storage backends. Currently, there are two choices: