Friday, March 12, 2021

ZFS Object Store - Why are there 3 APIs?

When I talk to people who are new to object stores, there is always a complicated conversation about why there are different APIs. I will try to go through the history of object stores and explain the reasons why.


First, I want to say up front that I am not going to talk about WebDAV. From what I can find, WebDAV is more of a web page authoring platform.

Next I am going to define a few terms.

OPC - Oracle Public Cloud. This is the Oracle Public Cloud Offering, though there are flavors of the OPC that use the same GUI and interfaces (Cloud@Customer for example). When I refer to OPC, I am talking about anything that uses the standard Oracle Public Cloud BUI interfaces.

OCI - Oracle Cloud Infrastructure. This is one of the most confusing terms used when talking about the ZFS object store. For most people, OCI refers to the Oracle Public Cloud (OPC) offerings in general. On ZFS, it refers to a specific API for the OPC object store. When I talk about OCI, I am talking about the object store interface.

OCI Gen 1/OCI Gen 2. In the history of the Oracle Public Cloud, Generation 1 (Gen 1) was version one. The object store in this first version used the Swift interface (which I will get into later). Generation 2 (Gen 2) followed and uses a different API. When I use these terms, I am referring to the object store APIs available in the OPC.

S3 or S3 API. When you think of S3, you are probably thinking of AWS. The reality is that AWS built the standard for the cloud object store, but many other vendors offer an object store, on-prem or publicly, that follows the AWS standard. This is the most commonly used object store API standard.

Large objects. This is a special term when talking about an object store. Object stores typically have a limit of 5GB on the size of objects. This made sense in the beginning, as object stores were not as widely used for all kinds of objects as they are today. As data grew, the need for object stores to handle "Large Objects" became clear. When I go through the history and features of object stores, "Large Objects" will refer to any object greater than 5GB.

Bucket : There may be other terms used to describe a "bucket", such as container, but a bucket is the high level identifier where objects are stored. You can think of it as a file drawer, or anything else that reminds you that it is a level of separation. This is what I mean when I refer to a Bucket.

Tenancy : In today's cloud, resources are shared, and each user is a "Tenant" in the multi-tenant cloud paradigm. This allows for the sharing of resources while still providing isolation.

Now some history

When the object store world began, there was Swift. Swift was a simple object store with a simple interface. OCI Gen 1 uses Swift, and ZFS offers Swift as an API. Swift was designed more for command line interaction than for a GUI. If you look for tools to access a Swift object store, you will find that "curl" and the "swift" CLI (written in Python) are the most common.

Swift  : Below are the highlights of Swift.

  • Swift V1 requires a 2 step authentication, though V2 removed this restriction. A username and password are passed to Swift and an authentication token is returned. The token is then used for all subsequent calls.
  • Swift multi-tenancy. Because Swift uses a simple username/password authentication (the idea of tenancy was only added later), it does not work well as a shared cloud resource. V1 of Swift had no concept of multi-tenancy, so every bucket name had to be unique. There was no easy way to tie storage utilization to a specific "tenant", especially when multiple users shared a tenancy.
  • Support of large objects was originally an issue for Swift, and there were multiple ways of dealing with it. Swift eventually added Dynamic Large Object (DLO) support, which allowed for the storage of large objects. Some vendors/applications using Swift took advantage of DLO, and some wrote their own. The swift CLI, for example, uses its own method of storing large objects by creating a "shadow" bucket containing the individual pieces (5GB per piece) and then storing a manifest file that tells Swift where the pieces are. Many other vendors (including Oracle) wrote their own large object support that can only be read by them.
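
To make the 2 step authentication more concrete, here is a minimal Python sketch of the flow. It assumes the classic Swift V1 header names (X-Auth-User and X-Auth-Key on the way in, X-Auth-Token and X-Storage-Url on the way back). The host name, account, bucket and object names are made up for illustration, and your object store may expose a different authentication path, so treat this as a sketch rather than the exact ZFS behavior.

```python
import urllib.request


def build_auth_request(auth_url, user, key):
    """Step 1: send the username and password (key) to ask for a token."""
    return urllib.request.Request(
        auth_url,
        headers={"X-Auth-User": user, "X-Auth-Key": key},
        method="GET",
    )


def fetch_token(auth_request):
    """Send the step 1 request; the token and storage URL come back as headers."""
    with urllib.request.urlopen(auth_request) as resp:
        return resp.headers["X-Auth-Token"], resp.headers["X-Storage-Url"]


def build_object_request(storage_url, token, bucket, name, method="GET"):
    """Step 2: every later call carries the token instead of the password."""
    url = f"{storage_url.rstrip('/')}/{bucket}/{name}"
    return urllib.request.Request(url, headers={"X-Auth-Token": token}, method=method)


# Hypothetical endpoint and credentials, shown only to illustrate the call order:
# auth = build_auth_request("https://zfs.example.com/auth/v1.0", "alice", "secret")
# token, storage_url = fetch_token(auth)
# req = build_object_request(storage_url, token, "backups", "db.dmp")
```

Notice that nothing in these requests identifies a tenant. The token only proves that the username and password were valid, which is the root of the multi-tenancy issue described above.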
Issues - As you can see, there were many issues with Swift, and this explains why most vendors have moved away from it. OCI Gen 1 was a Swift V2 interface, and it is still available to access object stores. ZFS uses the Swift V1 interface.
  • Authentication was difficult
  • no multi-tenancy support for users
  • inability to create cost models based on tenancy
  • no true standard for large objects.
S3 - Along came S3 to provide solutions to these issues.

AWS came up with a standard that addressed them. Below are the highlights.

  • Authentication is based on a Key/passcode pair (an access key and a secret key, in AWS terms). The Key/passcode is uniquely generated for each user of a tenancy. When accessing the object store, the Key/passcode allows the object store to identify the tenancy and provide the necessary isolation, and of course billing.
  • Large object support was provided. S3 added the idea of "multi-part uploads". The client first tells the S3 object store that it is starting a multi-part upload. It then breaks the object into multiple parts (of 5GB or less) and uploads each part individually, even in parallel. Once the client reports that all the parts are uploaded, the object store joins them into a single large object. Downloads can be split up in a similar way by requesting byte ranges of the object.
  • Having a standard allows tools like rsync and CloudBerry to synchronize the object store (regardless of vendor) with a file system, upload files through a Windows client, and even mount the object store as a file system through FUSE.
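
The multi-part idea can be sketched in a few lines of Python. This is only an illustration of how a client might split an object and upload the parts in parallel. It does not call any real S3 library: the upload_part callback is a hypothetical stand-in for whatever call your client uses, and the 5GB limit is taken here as 5 * 1024**3 bytes, which is my assumption.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PART_SIZE = 5 * 1024**3  # assumed byte value for the 5GB per-part limit


def plan_parts(object_size, part_size=MAX_PART_SIZE):
    """Return (part_number, offset, length) tuples that cover the whole object."""
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")
    parts = []
    offset = 0
    number = 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def upload_in_parallel(data, part_size, upload_part, workers=4):
    """Hand each part to upload_part concurrently; results come back in part order."""
    parts = plan_parts(len(data), part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(upload_part, number, data[offset:offset + length])
            for number, offset, length in parts
        ]
        return [future.result() for future in futures]
```

With this plan, a 12GB object becomes three parts of 5GB, 5GB and 2GB. The part numbers are what let the object store put the pieces back together in the right order, no matter which upload finishes first.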

Issues - As you can see, many of the issues with Swift were corrected, and this is now the most widely used API for an object store.


OCI Gen 2 - Along with the second version of the Oracle Public Cloud came a new API for the object store.

There was one remaining issue with the AWS API that was solved by this interface: the idea of compartments within a tenancy. Within a tenancy, the bucket name must be unique, but when a new bucket is created, it is created in a compartment. This gives an additional level of organization for objects.
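
Here is a small Python model of that rule. It is purely illustrative and is not how OCI is implemented. The Tenancy class and its method names are made up for this sketch, which only shows that a bucket name is unique across the tenancy while each bucket still belongs to one compartment.

```python
class Tenancy:
    """Toy model: bucket names are unique per tenancy, grouped by compartment."""

    def __init__(self, name):
        self.name = name
        self.compartments = set()
        self._bucket_compartment = {}  # bucket name -> compartment name

    def add_compartment(self, compartment):
        self.compartments.add(compartment)

    def create_bucket(self, bucket, compartment):
        if compartment not in self.compartments:
            raise KeyError(f"unknown compartment: {compartment}")
        if bucket in self._bucket_compartment:
            # The name is already taken somewhere in this tenancy.
            raise ValueError(f"bucket already exists in tenancy: {bucket}")
        self._bucket_compartment[bucket] = compartment

    def buckets_in(self, compartment):
        """List the buckets that were created in one compartment."""
        return sorted(
            bucket
            for bucket, owner in self._bucket_compartment.items()
            if owner == compartment
        )
```

In this model, creating "backups" in a "finance" compartment and then trying to create "backups" again in an "engineering" compartment fails, because the name is checked against the whole tenancy and not just the compartment.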


A few of the highlights of OCI Gen 2 are:

  • Authentication uses the RSA public/private key model, which is more secure than AWS authentication.
  • The idea of compartments is supported.

Issues - As you can see, many of the issues with Swift were corrected, and the concept of compartments was added.

  • The only issue I've encountered is the lack of a GUI for uploading objects.


SUMMARY :


In summary, on ZFS, all 3 object store APIs are available as separate object stores. Pick your object store.

Also note that in the OPC, all 3 object store APIs are compatible and can be used to access the same object. This is not the case with ZFS.

