ChubaoFS (Chubao File System) is a distributed file system designed to natively support large-scale container platforms.
High-Level Architecture
ChubaoFS consists of a metadata subsystem, a data subsystem, and a resource manager, and can be accessed by different clients (as a set of application processes) hosted on the containers through different file system instances called volumes.
The metadata subsystem stores the file metadata, and consists of a set of meta nodes. Each meta node consists of a set of meta partitions.
The data subsystem stores the file contents, and consists of a set of data nodes. Each data node consists of a set of data partitions.
The volume is a logical concept in ChubaoFS and consists of one or more meta partitions and one or more data partitions. Each partition can be assigned to only one volume. From a client’s perspective, the volume can be viewed as a file system instance that contains data accessible by the containers. A volume must be created before any file operation can take place, and it can be mounted to multiple containers so that files can be shared among different clients simultaneously. A ChubaoFS cluster deployed at each data center can have hundreds of thousands of volumes, whose data sizes vary from a few gigabytes to several terabytes.
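The relationship between volumes and partitions can be illustrated with a small sketch. This is not ChubaoFS code: the class names Partition and Volume and the methods assign and is_usable are hypothetical, chosen only to show the rule that a partition belongs to at most one volume and that a volume needs at least one meta partition and one data partition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Partition:
    partition_id: int
    kind: str                     # "meta" or "data"
    volume: Optional[str] = None  # name of the owning volume, if any


@dataclass
class Volume:
    name: str
    meta_partitions: List[Partition] = field(default_factory=list)
    data_partitions: List[Partition] = field(default_factory=list)

    def assign(self, partition: Partition) -> None:
        """Attach a partition to this volume (hypothetical helper)."""
        if partition.kind not in ("meta", "data"):
            raise ValueError(f"unknown partition kind: {partition.kind}")
        # A partition can be assigned to only one volume.
        if partition.volume is not None:
            raise ValueError(
                f"partition {partition.partition_id} already belongs "
                f"to volume {partition.volume}"
            )
        partition.volume = self.name
        if partition.kind == "meta":
            self.meta_partitions.append(partition)
        else:
            self.data_partitions.append(partition)

    def is_usable(self) -> bool:
        # A volume needs at least one meta and one data partition.
        return bool(self.meta_partitions) and bool(self.data_partitions)
```

In this sketch, assigning the same Partition object to a second Volume raises ValueError, which mirrors the single-owner rule described above.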
Generally speaking, the resource manager periodically communicates with the metadata subsystem and the data subsystem to manage the meta nodes and data nodes, respectively. Each client periodically communicates with the resource manager to obtain an up-to-date view of the mounted volume. A file operation usually initiates communication directly from the client to the corresponding meta node and data node, without involving the resource manager. The updated view of the mounted volume, as well as the file metadata, is usually cached at the client side to reduce the communication overhead.
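The client-side caching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ChubaoFS client: VolumeViewCache, fetch_view, and refresh_interval are hypothetical names, and the sketch refreshes lazily when the cached view is older than the interval, whereas a real client might refresh from a background task.

```python
import time
from typing import Callable, Dict, Optional


class VolumeViewCache:
    """Caches a volume view obtained from the resource manager."""

    def __init__(
        self,
        fetch_view: Callable[[], Dict],
        refresh_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_view = fetch_view          # hypothetical resource manager call
        self._refresh_interval = refresh_interval
        self._clock = clock                    # injectable so the cache can be tested
        self._view: Optional[Dict] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Dict:
        """Return the cached view, refetching it only when it is stale."""
        now = self._clock()
        stale = (
            self._fetched_at is None
            or now - self._fetched_at >= self._refresh_interval
        )
        if stale:
            self._view = self._fetch_view()
            self._fetched_at = now
        return self._view
```

Between refreshes, file operations can consult the cached view to locate the relevant meta node and data node without contacting the resource manager.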
General-Purpose Storage Engine
To reduce storage cost, many applications and services are served from the same shared storage infrastructure (aka “multi-tenancy”). The workloads of different applications and services are mixed together: file sizes can vary from a few kilobytes to hundreds of gigabytes, and files can be written sequentially or randomly. For example, log files usually need to be written sequentially in the execution order of the code; some data analytics in the machine learning domain are based on the data stored on the underlying file system; and a database engine running on top of the file system can modify the stored data frequently. A dedicated file system needs to be able to serve all these different workloads with excellent performance.
Strong Replication Consistency
E-commerce vendors who move their line-of-business applications to the cloud usually prefer strong consistency. For example, an image processing service may not want to provide the customer with an outdated image that does not match the product description. This can be easily achieved if there is only one copy of the file. But to ensure that a distributed file system continues to operate properly in the event of machine failures, which can be caused by faulty hard drives, bad motherboards, and so on, there are usually multiple replicas of the same file. As a result, in a desired file system, the data read from any of the replicas must be consistent with each other.
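The consistency requirement can be made concrete with a toy model. The sketch below is not the replication protocol used by ChubaoFS; Replica and ReplicaGroup are hypothetical classes that show one simple way to keep replicas identical, namely accepting a write only when every replica can apply it, so that a read from any live replica returns the same content.

```python
import random


class Replica:
    def __init__(self, name):
        self.name = name
        self.files = {}
        self.alive = True

    def apply(self, path, content):
        if not self.alive:
            raise ConnectionError(f"replica {self.name} is down")
        self.files[path] = content


class ReplicaGroup:
    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, path, content):
        # Reject the write unless every replica can take it,
        # so that no replica is left with older content.
        if not all(r.alive for r in self.replicas):
            raise ConnectionError("write rejected: a replica is unavailable")
        for r in self.replicas:
            r.apply(path, content)

    def read(self, path):
        # Any live replica may serve the read.
        live = [r for r in self.replicas if r.alive]
        return random.choice(live).files[path]
```

This toy model ignores failures that occur in the middle of a write. A production system also needs a recovery procedure to bring a lagging replica back in line with the others.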