
OpenIO Sharder - Manual & Usage?


#1

As a (future) Object Storage provider, it’s very important for me to be sure that clients don’t hit “unusual limits”. One of OpenIO’s previous limitations was the number of files that could be uploaded into a single “bucket”. This was fixed by implementing a sharding feature.

Unfortunately, this feature is undocumented and I personally couldn’t find much on the subject (disclosure: I received the necessary information directly from Guillaume).

I think most users will hit the 1+ million files mark at some point, in which case they will already be in trouble.

So here are some questions that might interest prospective OpenIO users:

  • Is the sharding feature ready for production?
  • Can the sharding feature be activated on a running production cluster, or does it need to be set up from the start?
  • In theory, what is the recommended maximum number of objects per bucket in a non-sharded environment?

Thank you


#2

Hi Razva,

Thanks for asking these questions.

Just to give everybody a bit of background: a sharded bucket is a single virtual bucket that is split “behind the scenes” across multiple buckets (containers, in OpenIO terminology). From the application’s perspective there is only one bucket, no matter how many containers are involved in the sharding.
The goal is ultimately to have a bucket with no limit on the number of objects it can store.
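To make the idea concrete, here is a minimal Python sketch of one virtual bucket spread over several containers. It assumes range-based routing, where each container owns a contiguous range of object names. OpenIO’s actual sharding scheme is not described in this thread, so the `ShardedBucket` class, its `bounds` parameter and the `bucket-N` container names are all hypothetical, and an in-memory dict stands in for real containers.

```python
import bisect
from collections import defaultdict


class ShardedBucket:
    """Hypothetical illustration: one virtual bucket backed by several containers.

    `bounds` holds the lower bound of every shard except the first. Each
    shard owns a contiguous, non-overlapping range of object names. This is
    a generic range-sharding sketch, not OpenIO's implementation.
    """

    def __init__(self, bounds, name="bucket"):
        self.bounds = sorted(bounds)
        # Hypothetical container names: bucket-0, bucket-1, ...
        self.containers = [f"{name}-{i}" for i in range(len(self.bounds) + 1)]
        # In-memory stand-in for real containers: container -> {object: data}
        self.store = defaultdict(dict)

    def container_for(self, key):
        # bisect_right counts the bounds <= key, which is the shard index.
        return self.containers[bisect.bisect_right(self.bounds, key)]

    def put(self, key, data):
        self.store[self.container_for(key)][key] = data

    def get(self, key):
        # Raises KeyError if the object does not exist.
        return self.store[self.container_for(key)][key]

    def list_objects(self):
        # Shards cover ordered ranges, so concatenating their sorted listings
        # gives one sorted listing for the whole virtual bucket.
        for container in self.containers:
            yield from sorted(self.store[container])


# The application only ever sees one bucket.
bucket = ShardedBucket(bounds=["g", "p"])
for key in ["zebra", "apple", "mango", "grape"]:
    bucket.put(key, b"...")

print(bucket.container_for("apple"))  # bucket-0
print(bucket.container_for("mango"))  # bucket-1 ("g" <= "mango" < "p")
print(list(bucket.list_objects()))    # ['apple', 'grape', 'mango', 'zebra']
```

Range routing is used here rather than hashing because S3-style listings are expected in lexicographic order: a hash-based scheme would spread objects more evenly, but every listing would then have to merge results from all containers.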

I agree that this feature is not documented yet. I created an issue on GitHub to track the documentation work, so you can follow its status: https://github.com/open-io/oio-docs/issues/99

Regarding production readiness: it’s already running in production on many platforms.

Regarding activation on a running cluster: very good question. For now, it has to be activated during setup.
However, by the end of this month (June 2017) we will release the ability to enable it per container (bucket), so you will be able to mix sharded and non-sharded containers in the same cluster.

Regarding the recommended number of objects: it depends on many factors, but in theory the recommendation is 100 000 objects per container.
You can go beyond that, but performance may suffer. That’s another reason sharding is valuable.
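To put the 100 000 figure in perspective for the 1+ million files mentioned in the first post, here is a back-of-the-envelope Python sketch that computes how many containers keep each one at or under that guideline. It assumes the 100 000 figure is a hard per-container target, which the reply itself says “depends on many factors”; the function name and constant are hypothetical.

```python
# Guideline from the reply, treated as an assumption: the real limit
# depends on many factors (hardware, object sizes, workload).
RECOMMENDED_OBJECTS_PER_CONTAINER = 100_000


def containers_needed(total_objects, per_container=RECOMMENDED_OBJECTS_PER_CONTAINER):
    """Minimum number of containers keeping each at or under `per_container`."""
    if total_objects < 0 or per_container <= 0:
        raise ValueError("total_objects must be >= 0 and per_container > 0")
    # Integer ceiling division; an empty bucket still needs one container.
    return max(1, (total_objects + per_container - 1) // per_container)


print(containers_needed(50_000))     # 1
print(containers_needed(1_000_000))  # 10
print(containers_needed(1_000_001))  # 11
```

In other words, a single bucket holding 1 million objects would already be about ten times over the per-container guideline without sharding.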

Cheers!