Binary Data (Blobs)
Blobs provide efficient binary data storage with typed layouts and zero-copy NumPy integration.
When to use: Use blobs for binary data such as images, meshes, and raw buffers.
Use BlobArray for typed arrays and BlobPack for structured multi-region data.
Blob vs BlobId
Viper offers two DSM types for binary data:
| Type | Use Case | Storage |
|---|---|---|
| blob | Small data (thumbnails, icons) | Inline in document |
| blob_id | Large data (textures, meshes) | Database blob API |
blob (Inline)
A blob field stores binary data directly in the document. No special API needed.
blob_id (Reference)
A blob_id is a SHA-1 hash that references a blob managed by the database. Creating
and reading such blobs goes through the dedicated blob API:
>>> layout = BlobLayout()
>>> content = ValueBlob(bytes([1, 2, 3, 4, 5]))
>>> blob_id = db.create_blob(layout, content)
>>> blob_id
f07d73c81ed1a91165a75c5cc22253cc7895a8b2
>>> db.blob(blob_id)
blob(5)
>>> blob_id in db.blob_ids()
True
>>> info = db.blob_info(blob_id)
>>> info.size()
5
Content-addressable: The blob_id is computed from layout + content (SHA-1).
Identical content always produces the same blob_id.
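The content-addressing idea can be sketched with the standard library alone. This is an illustration only: the exact bytes Viper feeds into SHA-1 are not described here, so the helper name `sketch_blob_id` and the "layout tag, separator, content" encoding are assumptions, and the digests will not match real blob_id values.

```python
import hashlib


def sketch_blob_id(layout_tag: str, content: bytes) -> str:
    """Hypothetical helper: hash a layout tag plus content with SHA-1.

    The byte encoding below is an assumption for illustration only; it is
    not Viper's actual scheme, so digests will differ from real blob_ids.
    """
    hasher = hashlib.sha1()
    hasher.update(layout_tag.encode("ascii"))
    hasher.update(b"\x00")  # separator between layout and content (assumed)
    hasher.update(content)
    return hasher.hexdigest()


same_a = sketch_blob_id("uchar-1", bytes([1, 2, 3, 4, 5]))
same_b = sketch_blob_id("uchar-1", bytes([1, 2, 3, 4, 5]))
other_layout = sketch_blob_id("float-3", bytes([1, 2, 3, 4, 5]))

print(same_a == same_b)        # identical layout + content give the same id
print(same_a == other_layout)  # a different layout gives a different id
print(len(same_a))             # length of a SHA-1 hex digest
```

Because the id depends only on the inputs, storing the same bytes twice can be detected before any data is written.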
Constraint: A document referencing a blob_id cannot be committed unless the blob
exists in the database. Always call create_blob() before using the
blob_id in a document.
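The ordering rule can be pictured with a small in-memory stand-in. Everything here is hypothetical: `SketchStore`, its methods, and the `KeyError` it raises are illustrative choices, not Viper's API or its actual error type.

```python
import hashlib


class SketchStore:
    """Hypothetical in-memory stand-in for a blob-aware database."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def create_blob(self, content: bytes) -> str:
        """Store the content and return its SHA-1 hex digest as the id."""
        blob_id = hashlib.sha1(content).hexdigest()
        self._blobs[blob_id] = content
        return blob_id

    def commit(self, referenced_ids: list[str]) -> None:
        """Reject a commit that references blob ids the store does not hold."""
        missing = [i for i in referenced_ids if i not in self._blobs]
        if missing:
            raise KeyError(f"unknown blob_id(s): {missing}")


store = SketchStore()
pixels_id = store.create_blob(bytes([10, 20, 30, 40]))
store.commit([pixels_id])  # accepted: the blob was created first

try:
    store.commit(["0" * 40])  # never created, so the commit is refused
    rejected = False
except KeyError:
    rejected = True
print(rejected)
```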
ValueBlob
A ValueBlob holds inline binary data. To pull the bytes back out, use the
bytes() builtin (the buffer protocol) — there is no .bytes() method:
>>> data = bytes([1, 2, 3, 4, 5])
>>> blob = ValueBlob(data)
>>> bytes(blob)
b'\x01\x02\x03\x04\x05'
>>> len(blob)
5
ValueBlobId
A ValueBlobId references external binary data. The id is computed from the layout
and content (deterministic SHA-1):
>>> layout = BlobLayout()
>>> content = ValueBlob(bytes([1, 2, 3, 4]))
>>> blob_id = ValueBlobId(layout, content)
>>> blob_id
6b8f3ca756046be29244d9bdb6b5ca5c00468ad5
>>> ValueBlobId.try_parse("6b8f3ca756046be29244d9bdb6b5ca5c00468ad5")
6b8f3ca756046be29244d9bdb6b5ca5c00468ad5
BlobLayout - Metadata Everywhere
A BlobLayout describes how to interpret blob bytes. This is the Metadata Everywhere
principle applied to binary data: the layout is metadata that gives meaning to raw bytes.
>>> BlobLayout()
'uchar-1'
>>> BlobLayout('float', 3)
'float-3'
>>> BlobLayout('uint', 3)
'uint-3'
>>> BlobLayout('float', 2)
'float-2'
The layout enables:
Type-safe interpretation of binary data
Cross-platform compatibility (endianness handled)
Validation at decode time
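To make the three points above concrete, here is a minimal sketch of layout-driven decoding using only the `struct` module. The helper `decode_elements`, the type-name table, and the choice of little-endian byte order are assumptions made for illustration; they do not describe how BlobLayout is implemented.

```python
import struct

# Assumed mapping from layout type names to struct format codes.
_FORMATS = {"uchar": "B", "uint": "I", "float": "f"}


def decode_elements(type_name: str, components: int, raw: bytes) -> list[tuple]:
    """Hypothetical decoder: split raw bytes into per-element tuples."""
    # "<" pins little-endian byte order and standard sizes on every platform.
    element = struct.Struct("<" + _FORMATS[type_name] * components)
    # Validation at decode time: the byte count must fit whole elements.
    if len(raw) % element.size != 0:
        raise ValueError(
            f"{len(raw)} bytes is not a multiple of the {element.size}-byte element"
        )
    return [element.unpack_from(raw, offset)
            for offset in range(0, len(raw), element.size)]


raw = struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
points = decode_elements("float", 3, raw)
print(points)  # two 3-component elements

try:
    decode_elements("float", 3, raw[:-1])  # truncated input
    truncated_rejected = False
except ValueError:
    truncated_rejected = True
print(truncated_rejected)
```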
BlobView (Read-Only)
A BlobView interprets an existing blob with a given layout (read-only). When
the layout has more than one component per element, indexing returns a tuple:
>>> import struct
>>> _buf = bytearray()
>>> for i in range(100):
... _ = _buf.extend(struct.pack('<fff', float(i), float(i*2), float(i*3)))
>>> raw = ValueBlob(bytes(_buf))
>>> view = BlobView(BlobLayout('float', 3), raw)
>>> view.count()
100
>>> view[0]
(0.0, 0.0, 0.0)
>>> view[99]
(99.0, 198.0, 297.0)
Use BlobView when you need to read blob data without copying.
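The "read without copying" idea can be sketched with Python's built-in `memoryview`, which behaves similarly for plain bytes. This is an analogy under stated assumptions, not BlobView itself: the `element` helper is hypothetical, and the sample uses native byte order so the cast matches the packed data.

```python
import struct

# Six native-order floats: two 3-component elements.
raw = struct.pack("6f", 0.0, 0.0, 0.0, 1.0, 2.0, 3.0)

# A memoryview over bytes is read-only and shares the buffer; cast("f")
# reinterprets the same memory as native floats without copying it.
floats = memoryview(raw).cast("f")


def element(view, components, index):
    """Return one element as a tuple; only these few values are copied."""
    start = index * components
    return tuple(view[start:start + components])


print(len(floats))            # number of float components in the view
print(element(floats, 3, 1))  # second element
print(floats.readonly)        # writes through this view are rejected
```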
BlobArray (Read-Write)
A BlobArray is a typed array backed by a blob. Reads and writes go through the
Python buffer protocol, which NumPy can consume without copying. The array exposes
a flat (N*components,) view, so reshape it to (N, components) to assign per-element tuples:
>>> import numpy as np
>>> layout = BlobLayout('float', 3)
>>> array = BlobArray(layout, 100)
>>> np_view = np.array(array, copy=False).reshape(100, 3)
>>> np_view[0] = [1.0, 2.0, 3.0]
>>> np_view[1] = [4.0, 5.0, 6.0]
>>> view = BlobView(layout, array.blob())
>>> view[0]
(1.0, 2.0, 3.0)
>>> view[1]
(4.0, 5.0, 6.0)
BlobPack - Structured Binary Data
A BlobPack groups multiple named regions with different layouts into a single blob. This
is ideal for complex structures like 3D meshes.
Example: 3D Mesh Storage
A mesh has positions, normals, UVs, and triangle indices — each with a different layout. Define the structure with a descriptor, then create the pack:
>>> descriptor = BlobPackDescriptor()
>>> descriptor.add_region('positions', BlobLayout('float', 3), 4)
>>> descriptor.add_region('normals', BlobLayout('float', 3), 4)
>>> descriptor.add_region('uvs', BlobLayout('float', 2), 4)
>>> descriptor.add_region('indices', BlobLayout('uint', 3), 2)
>>> mesh = BlobPack(descriptor)
>>> len(mesh)
4
Fill the Mesh Data
Each region exposes a flat NumPy view; reshape to write per-vertex tuples:
>>> import numpy as np
>>> pos = np.array(mesh['positions'], copy=False).reshape(4, 3)
>>> pos[:] = [[-1.0, -1.0, 0.0], [1.0, -1.0, 0.0],
... [ 1.0, 1.0, 0.0], [-1.0, 1.0, 0.0]]
>>> normals = np.array(mesh['normals'], copy=False).reshape(4, 3)
>>> normals[:] = [[0.0, 0.0, 1.0]] * 4
>>> uvs = np.array(mesh['uvs'], copy=False).reshape(4, 2)
>>> uvs[:] = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
>>> indices = np.array(mesh['indices'], copy=False).reshape(2, 3)
>>> indices[:] = [[0, 1, 2], [0, 2, 3]]
Serialize and Restore
>>> blob = mesh.blob()
>>> restored = BlobPack.from_blob(blob)
>>> np.array(restored['positions'], copy=False).reshape(4, 3)[0].tolist()
[-1.0, -1.0, 0.0]
>>> np.array(restored['indices'], copy=False).reshape(2, 3)[1].tolist()
[0, 2, 3]
Region Access
>>> 'positions' in mesh
True
>>> 'colors' in mesh
False
>>> mesh['positions'].name()
'positions'
>>> mesh['positions'].count()
4
>>> mesh['positions'].blob_layout()
'float-3'
>>> mesh['positions'].data_count()
12
>>> mesh['positions'].byte_count()
48
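The numbers reported above are related by simple arithmetic: data_count is count times components, and byte_count is data_count times the item size. The sketch below reproduces them with a hypothetical helper, `region_sizes`; the item sizes come from `struct` and assume a 4-byte float (consistent with the float32 dtype NumPy reports) and a 4-byte uint, which is an assumption.

```python
import struct

# Assumed item sizes in bytes, taken from the standard struct codes.
ITEM_SIZE = {
    "uchar": struct.calcsize("<B"),
    "uint": struct.calcsize("<I"),
    "float": struct.calcsize("<f"),
}


def region_sizes(type_name: str, components: int, count: int) -> tuple[int, int]:
    """Return (data_count, byte_count) for a region; hypothetical helper."""
    data_count = count * components
    byte_count = data_count * ITEM_SIZE[type_name]
    return data_count, byte_count


print(region_sizes("float", 3, 4))    # the 'positions' region: 4 elements
print(region_sizes("float", 3, 100))  # a 100-element float-3 array
```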
Why BlobPack?
| Benefit | Description |
|---|---|
| Single blob | All mesh data in one blob_id |
| Typed regions | Each region has its own layout |
| Self-describing | Layout metadata embedded in header |
| Efficient | Direct memory mapping, no parsing |
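The "self-describing" property can be illustrated with a toy container that stores a header describing each region ahead of the raw data. The format below (a length-prefixed JSON header followed by little-endian payloads) and the names `pack_regions` and `unpack_region` are invented for this sketch; BlobPack's real binary layout is not documented here and is presumably more compact, since a JSON header needs parsing.

```python
import json
import struct

_FORMATS = {"uchar": "B", "uint": "I", "float": "f"}  # assumed type codes


def pack_regions(regions):
    """Pack (name, type_name, components, flat_values) tuples into one blob."""
    header, payload = [], bytearray()
    for name, type_name, components, values in regions:
        data = struct.pack(f"<{len(values)}{_FORMATS[type_name]}", *values)
        header.append({"name": name, "type": type_name,
                       "components": components,
                       "offset": len(payload), "size": len(data)})
        payload += data
    header_bytes = json.dumps(header).encode("utf-8")
    # 4-byte little-endian header length, then header, then raw region data.
    return struct.pack("<I", len(header_bytes)) + header_bytes + bytes(payload)


def unpack_region(blob, wanted):
    """Read one named region back using only the embedded header."""
    (header_len,) = struct.unpack_from("<I", blob, 0)
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    base = 4 + header_len
    for entry in header:
        if entry["name"] == wanted:
            code = _FORMATS[entry["type"]]
            count = entry["size"] // struct.calcsize("<" + code)
            return struct.unpack_from(f"<{count}{code}", blob, base + entry["offset"])
    raise KeyError(wanted)


packed = pack_regions([
    ("uvs", "float", 2, [0.0, 0.0, 1.0, 0.0]),
    ("indices", "uint", 3, [0, 1, 2, 0, 2, 3]),
])
print(unpack_region(packed, "indices"))
print(unpack_region(packed, "uvs"))
```

The reader needs no outside schema: names, types, and offsets all travel inside the blob.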
NumPy Integration
BlobArray implements the Python Buffer Protocol, enabling zero-copy interoperability
with NumPy and other array libraries.
Zero-Copy View
The buffer is exposed as a flat 1D array of components — reshape to give it the geometric shape you want:
>>> import numpy as np
>>> layout = BlobLayout('float', 3)
>>> positions = BlobArray(layout, 100)
>>> np_view = np.array(positions, copy=False)
>>> np_view.shape
(300,)
>>> np_view.dtype
dtype('float32')
>>> np_view.reshape(100, 3).shape
(100, 3)
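What the reshape does can be spelled out without NumPy: in row-major order, component c of element i sits at flat index i * components + c. The sketch below uses the standard `array` module as a stand-in for the flat buffer; `set_element` and `get_element` are hypothetical helpers, not part of BlobArray.

```python
from array import array

COMPONENTS = 3
# Stand-in for a flat float buffer holding 4 three-component elements.
flat = array("f", [0.0] * (4 * COMPONENTS))


def set_element(buf, index, values):
    """Write one element into the flat buffer at index * COMPONENTS."""
    start = index * COMPONENTS
    buf[start:start + COMPONENTS] = array("f", values)


def get_element(buf, index):
    """Read one element back as a tuple of COMPONENTS values."""
    start = index * COMPONENTS
    return tuple(buf[start:start + COMPONENTS])


set_element(flat, 1, [4.0, 5.0, 6.0])
print(get_element(flat, 1))  # the element that was just written
print(flat[3])               # same value as component 0 of element 1
```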
Bidirectional Modifications
Changes through NumPy affect the original BlobArray:
>>> reshaped = np.array(positions, copy=False).reshape(100, 3)
>>> reshaped[0] = [10.0, 20.0, 30.0]
>>> view = BlobView(layout, positions.blob())
>>> view[0]
(10.0, 20.0, 30.0)
Direct Memory Access
>>> mv = memoryview(positions)
>>> mv.nbytes
1200
BlobPack Regions
BlobPackRegion also supports the Buffer Protocol — same flat-then-reshape
idiom:
>>> positions_np = np.array(mesh['positions'], copy=False).reshape(4, 3)
>>> normals_np = np.array(mesh['normals'], copy=False).reshape(4, 3)
>>> uvs_np = np.array(mesh['uvs'], copy=False).reshape(4, 2)
>>> positions_np.shape
(4, 3)
>>> uvs_np.shape
(4, 2)
>>> positions_np *= 2.0
>>> positions_np[0].tolist()
[-2.0, -2.0, 0.0]
Why Zero-Copy Matters
| Scenario | Without Zero-Copy | With Zero-Copy |
|---|---|---|
| 1M vertices | Copy 12 MB | Share pointer |
| GPU upload | Python → copy → C++ | Direct access |
| Scientific compute | Data duplication | In-place ops |
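The first row of the table can be checked with the standard library: one million float-3 vertices at 4 bytes per component occupy 12 MB, and a view shares that memory while a copy duplicates it. The sketch uses `array` and `memoryview` as stand-ins for a blob-backed array, so it shows the general buffer-protocol behaviour rather than Viper's own types.

```python
from array import array

VERTEX_COUNT = 1_000_000
COMPONENTS = 3

# Stand-in for a float-3 array: 1M zero-initialised vertices.
vertices = array("f", [0.0]) * (VERTEX_COUNT * COMPONENTS)

shared = memoryview(vertices)  # zero-copy: a view onto the same memory
copied = vertices.tobytes()    # full copy: a second buffer of the same size

shared[0] = 42.0               # write through the view

print(shared.nbytes)           # size of the shared buffer in bytes
print(vertices[0])             # the original sees the write
print(copied[:4] == bytes(4))  # the copy still holds the old zero bytes
```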
Blob in Attachments
Blobs can be used in attachments:
// Small data inline
struct Thumbnail {
    uint16 width;
    uint16 height;
    blob data;        // Inline binary
};
// Large data by reference
struct Texture {
    uint16 width;
    uint16 height;
    blob_id pixels;   // External reference
};
Storing Blobs in Database
Inline Blobs
Inline blobs are stored directly in the document. The Tuto fixture
exposes a Thumbnail struct with a blob field, attached to User as
avatar:
>>> user_key = TUTO_A_USER_AVATAR.create_key()
>>> thumb = TUTO_A_USER_AVATAR.create_document()
>>> thumb.width = 64
>>> thumb.height = 64
>>> thumb.data = ValueBlob(bytes([1, 2, 3, 4, 5]))
>>> thumb
{width=64, height=64, data=blob(5)}
>>> ms = CommitMutableState(db.initial_state())
>>> ms.attachment_mutating().set(TUTO_A_USER_AVATAR, user_key, thumb)
>>> avatar_commit = db.commit_mutations("Add avatar", ms)
>>> db.state(avatar_commit).attachment_getting().get(TUTO_A_USER_AVATAR, user_key)
Optional({width=64, height=64, data=blob(5)})
Referenced Blobs (blob_id)
Use the database blob API to store and retrieve. The Tuto fixture
exposes a Texture struct with a blob_id field, attached to User
as portrait:
>>> texture_layout = BlobLayout()
>>> texture_content = ValueBlob(bytes([10, 20, 30, 40]))
>>> texture_blob_id = db.create_blob(texture_layout, texture_content)
>>> texture = TUTO_A_USER_PORTRAIT.create_document()
>>> texture.width = 1024
>>> texture.height = 1024
>>> texture.pixels = texture_blob_id
>>> ms = CommitMutableState(db.state(avatar_commit))
>>> ms.attachment_mutating().set(TUTO_A_USER_PORTRAIT, user_key, texture)
>>> portrait_commit = db.commit_mutations("Add portrait", ms)
>>> db.state(portrait_commit).attachment_getting().get(TUTO_A_USER_PORTRAIT, user_key)
Optional({width=1024, height=1024, pixels=...})
Retrieving Blobs
>>> stored_id = db.create_blob(BlobLayout(), ValueBlob(bytes([10, 20, 30, 40, 50])))
>>> bytes(db.blob(stored_id))
b'\n\x14\x1e(2'
>>> db.read_blob(stored_id, size=2, offset=0)
blob(2)
BlobStream (Large Blobs)
For very large blobs, use streaming to avoid loading everything into memory.
Required for blobs > 2 GB: The standard create_blob() API has a 2 GB size limit. Use
BlobStream for larger data:
>>> stream = db.blob_stream_create(BlobLayout('uchar', 1), size=10)
>>> db.blob_stream_append(stream, ValueBlob(bytes([1, 2, 3, 4, 5])))
>>> db.blob_stream_append(stream, ValueBlob(bytes([6, 7, 8, 9, 10])))
>>> stream_blob_id = db.blob_stream_close(stream)
>>> bytes(db.blob(stream_blob_id))
b'\x01\x02\x03\x04\x05\x06\x07\x08\t\n'
This is essential for:
3D meshes with millions of vertices
Video/audio data
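The benefit of the append pattern is that only one chunk needs to be in memory at a time. The sketch below shows that chunked style with the standard library, hashing a file-like source incrementally. `stream_digest` is a hypothetical helper; with Viper, each chunk would instead be wrapped in a ValueBlob and passed to db.blob_stream_append as in the example above.

```python
import hashlib
import io


def stream_digest(source, chunk_size=64 * 1024):
    """Hash a file-like object chunk by chunk; return (hex digest, total bytes)."""
    hasher = hashlib.sha1()
    total = 0
    while True:
        chunk = source.read(chunk_size)  # at most chunk_size bytes in memory
        if not chunk:
            break
        hasher.update(chunk)
        total += len(chunk)
    return hasher.hexdigest(), total


payload = bytes(range(256)) * 1024  # 256 KiB of sample data
digest, size = stream_digest(io.BytesIO(payload), chunk_size=4096)

print(size)                                         # total bytes consumed
print(digest == hashlib.sha1(payload).hexdigest())  # same result as one-shot hashing
```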
When to Use Each Type
| Scenario | Recommendation |
|---|---|
| Thumbnails (< 64 KB) | Use blob |
| Textures (> 1 MB) | Use blob_id |
| Mesh geometry | Use blob_id |
| Icons, small images | Use blob |
| Audio/video | Use blob_id |
What’s Next
Serialization - JSON and binary encoding