Segmenting Huge Blobs - Part II

I implemented the ‘multipart’/‘segmented blob’ upload as described in my previous post. This ‘segmented blob’ API is not yet an official part of the Deltacloud API, as it still needs to be discussed, improved, prodded and so on. Right now I’ve implemented this functionality for the Openstack and EC2 drivers (i.e. Swift and S3 respectively) as a proof of concept, so Azure and Google Storage are still pending.

For the EC2 driver, some changes were required in the aws rubygem and I’m waiting for those to be pulled in. Until that happens, this functionality (for S3 at least) is not available, although you could build the gem with this code included from my fork.

The notes here show the “segmented blob” upload in action using cURL examples:


1. Initiate:

You need to set the ‘X-Deltacloud-BlobType’ header to ‘segmented’:

Request:

[marios@name ~]$ curl -v --user "KEY:PASS" -X PUT -H "X-Deltacloud-BlobType: segmented" \
 http://localhost:3001/api/buckets/mariosbucket/segmentedblob

> PUT /api/buckets/mariosbucket/segmentedblob HTTP/1.1
> Authorization: Basic DdCeFNttJQJNkdfUVDhXdQUlXRM2KZVEE62N2lLc05MeitIMi9Lg==
> X-Deltacloud-BlobType: segmented
> User-Agent: curl/7.24.0
> Host: localhost:3001
> Accept: */*

Response:

Note the ‘segmented blob ID’ returned in the X-Deltacloud-SegmentedBlob header. This will be needed to upload any segments and to complete the segmented blob upload.

< HTTP/1.1 204 No Content
< X-Frame-Options: sameorigin
< X-XSS-Protection: 1; mode=block
< X-Deltacloud-SegmentedBlob: WL7dk8sqbtk3Rg6410mLF81T6fpof2Ha_0Upt
< Server: Apache-Deltacloud/1.0.4
< X-Deltacloud-Driver: ec2
< Date: Tue, 23 Oct 2012 13:16:26 GMT
< Connection: close
<
* Closing connection #0

2. Upload Segment:

As before, clients must specify that this is a segmented blob operation by setting the X-Deltacloud-BlobType header to ‘segmented’. You need to supply the segmented blob ID obtained in step 1 in the X-Deltacloud-SegmentedBlob header, and use the X-Deltacloud-SegmentOrder header to give the position this segment should take when the blob is reassembled.

One final note here: each cloud provider may specify a minimum size for a blob ‘segment’. For example, in AWS S3 this is 5 MB, whereas Openstack doesn’t specify a minimum. Whatever minimum the back-end provider sets, for Deltacloud a blob segment must be at least 112 KBytes or you’ll get a “500 Internal Server Error”. The reason is that the Thin server keeps any body smaller than 112 KBytes in memory rather than moving it to a tempfile, which means it wouldn’t hit our Thin monkeypatch for streaming blobs, and that is where the process of sending the blob data out to the providers begins.
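
Before uploading, it can help to cut the blob into segments and check each one against the minimum. Here is a minimal sketch in shell. The names `split_blob`, `segment_ok` and `MIN_SEGMENT_BYTES` are made up for illustration, and it assumes that 112 KBytes means 112 * 1024 bytes:

```shell
# Assumption: 112 KBytes = 112 * 1024 bytes (the Deltacloud/Thin minimum described above).
MIN_SEGMENT_BYTES=$((112 * 1024))

# split_blob FILE SEGMENT_BYTES PREFIX
# Cuts FILE into pieces of SEGMENT_BYTES bytes named PREFIXaa, PREFIXab, ...
# (the default suffixes of split, which sort in upload order).
split_blob() {
  split -b "$2" "$1" "$3"
}

# segment_ok FILE [MIN_BYTES]
# Succeeds if FILE is at least MIN_BYTES long (default: MIN_SEGMENT_BYTES).
segment_ok() {
  min=${2:-$MIN_SEGMENT_BYTES}
  # The arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$1") ))
  [ "$size" -ge "$min" ]
}
```

For example, `split_blob bigfile 6291456 seg_` would write 6 MB pieces named seg_aa, seg_ab and so on. The last piece is usually smaller than the rest, so it is worth running `segment_ok` on it as well.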

Request:

[marios@name ~]$ curl -v --user "KEY:PASS" -H "X-Deltacloud-BlobType: segmented" \
 -H "X-Deltacloud-SegmentedBlob: WL7dk8sqbtk3Rg6410mLF81T6fpof2Ha_0Upt" \
 -H "X-Deltacloud-SegmentOrder: 2" --upload-file "/home/herp/derp/files/blob2.txt" \
 http://localhost:3001/api/buckets/mariosbucket/segmentedblob

> PUT /api/buckets/mariosbucket/segmentedblob HTTP/1.1
> Authorization: Basic DdCeFNttJQJNkdfUVDhXdQUlXRM2KZVEE62N2lLc05MeitIMi9Lg
> X-Deltacloud-BlobType: segmented
> X-Deltacloud-SegmentedBlob: WL7dk8sqbtk3Rg6410mLF81T6fpof2Ha_0Upt
> X-Deltacloud-SegmentOrder: 2
> User-Agent: curl/7.24.0
> Host: localhost:3001
> Accept: */*
> Content-Length: 7454095
> Expect: 100-continue
>

Response:

The response contains the blob segment ID in the ‘X-Deltacloud-BlobSegmentId’ header. This will be needed in order to complete the segmented blob upload in step 3.

< HTTP/1.1 204 No Content
< X-Frame-Options: sameorigin
< X-XSS-Protection: 1; mode=block
< X-Deltacloud-BlobSegmentId: e7b94a1e959ca066026da3ec63aad321
< Server: Apache-Deltacloud/1.0.4
< X-Deltacloud-Driver: ec2
< Date: Tue, 23 Oct 2012 13:52:57 GMT
< Connection: close
<
* Closing connection #0

This step needs to be repeated for as many segments as you have divided your blob into, each time specifying a different value for the ‘X-Deltacloud-SegmentOrder’ header.
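
Repeating the upload by hand gets tedious, so the step could be scripted. What follows is only a sketch: `extract_header`, `upload_segment` and the `DC_USER` variable are hypothetical names, and it assumes the server returns the segment ID in the header exactly as shown above.

```shell
# extract_header NAME
# Reads raw HTTP response headers on stdin and prints the value of the first
# header called NAME (matched case-insensitively).
extract_header() {
  tr -d '\r' | awk -v name="$1" -F': ' \
    '!found && tolower($1) == tolower(name) { print $2; found = 1 }'
}

# upload_segment URL BLOB_ID ORDER FILE
# PUTs one segment and prints the blob segment ID from the response.
# DC_USER is an assumed environment variable holding "KEY:PASS".
upload_segment() {
  curl -s -o /dev/null -D - --user "$DC_USER" \
    -H "X-Deltacloud-BlobType: segmented" \
    -H "X-Deltacloud-SegmentedBlob: $2" \
    -H "X-Deltacloud-SegmentOrder: $3" \
    --upload-file "$4" "$1" |
    extract_header "X-Deltacloud-BlobSegmentId"
}

# upload_all URL BLOB_ID FILE...
# Uploads the files in the order given, numbering them from 1, and prints one
# segment ID per line in that same order.
upload_all() {
  url=$1
  blob_id=$2
  shift 2
  order=1
  for file in "$@"; do
    upload_segment "$url" "$blob_id" "$order" "$file"
    order=$((order + 1))
  done
}
```

Here `-D -` writes the response headers to standard output and `-o /dev/null` discards the body, so only the headers reach `extract_header`. The IDs printed by `upload_all` are the ones needed for the manifest in step 3.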


3. Complete Upload:

Finally, you need to provide the segmented blob ‘manifest’ in the message body. This specifies the order and ID of each blob segment you have uploaded. For example, if you uploaded a blob segment with “X-Deltacloud-SegmentOrder: 1” and got “X-Deltacloud-BlobSegmentId: FOO” in the response, you would specify this as “1=FOO”, followed by “2=BAR” and so on for each segment, separated by commas. As before, the “X-Deltacloud-BlobType” and “X-Deltacloud-SegmentedBlob” headers must be provided.
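
If the segment IDs were collected in upload order, the manifest string can be put together mechanically. A small sketch, where `build_manifest` is a hypothetical helper and the “, ” separator is taken from the example request in this post:

```shell
# build_manifest ID...
# Prints "1=ID1, 2=ID2, ..." for segment IDs given in segment order.
build_manifest() {
  manifest=""
  order=1
  for id in "$@"; do
    # Prefix with ", " only when something is already in the manifest.
    manifest="${manifest:+$manifest, }${order}=${id}"
    order=$((order + 1))
  done
  printf '%s\n' "$manifest"
}
```

The output can then be passed to curl with `-d "$(build_manifest ID1 ID2)"`, where ID1 and ID2 stand for the IDs returned in step 2.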

Request:

curl -v --user "KEY:PASS" -X PUT -H "X-Deltacloud-BlobType: segmented" \
 -H "X-Deltacloud-SegmentedBlob: WL7dk8sqbtk3Rg6410mLF81T6fpof2Ha_0Upt" \
 -d "1=78f871f6f01673a4aca05b1f8e26df08, 2=e7b94a1e959ca066026da3ec63aad321" \
 "http://localhost:3001/api/buckets/mariosbucket/segmentedblob?format=xml"

> PUT /api/buckets/mariosbucket/segmentedblob?format=xml HTTP/1.1
> Authorization: Basic DdCeFNttJQJNkdfUVDhXdQUlXRM2KZVEE62N2lLc05MeitIMi9Lg
> X-Deltacloud-BlobType: segmented
> X-Deltacloud-SegmentedBlob: WL7dk8sqbtk3Rg6410mLF81T6fpof2Ha_0Upt
> User-Agent: curl/7.24.0
> Host: localhost:3001
> Accept: */*
> Content-Length: 70
> Content-Type: application/x-www-form-urlencoded

Response:

< HTTP/1.1 200 OK
< Content-Type: application/xml
< Server: Apache-Deltacloud/1.0.4
< X-Deltacloud-Driver: ec2
< Content-Length: 430
< ETag: 59f1ee57a61a3ce2c2d5d8923ed77ee9
< Cache-Control: max-age=0, private, must-revalidate
< Date: Thu, 25 Oct 2012 10:13:45 GMT
< Connection: keep-alive
<
<?xml version='1.0' encoding='utf-8' ?>
<blob href='http://localhost:3001/api/buckets/mariosbucket/segmentedblob' id='segmentedblob'>
  <bucket>mariosbucket</bucket>
  <content_length></content_length>
  <content_type></content_type>
  <last_modified></last_modified>
  <user_metadata>
  </user_metadata>
  <content href='http://localhost:3001/api/buckets/mariosbucket/segmentedblob/content' rel='blob_content'></content>
</blob>

