User Guide#


You can follow this guide by running its commands side by side with the Quickstart example.


dataClay can be installed with pip:

$ python3 -m pip install dataclay

Defining Classes#

The model provider is responsible for designing and implementing class models: the data structure, the methods, and the relationships that applications can use to access and process data.

A minimal dataClay class is defined like this:

from dataclay import DataClayObject, activemethod


class Employee(DataClayObject):
    # Annotated fields are persisted in dataClay
    name: str
    salary: float

    @activemethod
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    @activemethod
    def get_payroll(self, hours_worked):
        # Hours beyond 40 are paid as overtime at salary / 40 per hour
        overtime = 0
        if hours_worked > 40:
            overtime = hours_worked - 40
        return self.salary + (overtime * (self.salary / 40))


class Company(DataClayObject):
    name: str
    employees: list[Employee]

    @activemethod
    def __init__(self, name, *employees):
        self.name = name
        self.employees = list(employees)

All dataClay classes must inherit from DataClayObject.

Fields that should be persisted in dataClay must be annotated. Unannotated fields are ignored by dataClay and are only accessible from the local instance.

Decorate methods with @activemethod to have them executed in dataClay when the object is persistent. Undecorated methods are always executed locally.
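
The overtime rule in get_payroll() can be checked without a running dataClay deployment. The sketch below restates the same arithmetic as a plain function; payroll() is a hypothetical helper introduced only for illustration and is not part of dataClay.

```python
def payroll(salary: float, hours_worked: float) -> float:
    """Same arithmetic as Employee.get_payroll(), as a plain local function."""
    # Only hours beyond 40 count as overtime.
    overtime = max(0, hours_worked - 40)
    # Each overtime hour is paid at salary / 40.
    return salary + overtime * (salary / 40)


print(payroll(1000.0, 40))  # no overtime: 1000.0
print(payroll(1000.0, 50))  # 10 overtime hours: 1250.0
```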

Connect Client#

To connect to dataClay, create a Client instance and provide the host, port, username, password and dataset. You can pass them as arguments or set them as environment variables:

  • DC_HOST: Host of the dataClay instance (i.e. metadata)

  • DC_PORT: Port of the dataClay instance (i.e. metadata)

  • DC_USERNAME: Username to connect to dataClay

  • DC_PASSWORD: Password to connect to dataClay

  • DC_DATASET: Dataset to connect to

from dataclay import Client

client = Client(
    host="", port="16587", username="testuser", password="s3cret", dataset="testdata"
)
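
When the settings come from the environment, the variables listed above have to be set before the client is created. The following is a minimal sketch: the values are placeholders, make_client() is a hypothetical helper, and it assumes that Client() called without arguments reads the DC_* variables as described above.

```python
import os

# Placeholder values for illustration; replace them with your deployment's settings.
settings = {
    "DC_HOST": "localhost",
    "DC_PORT": "16587",
    "DC_USERNAME": "testuser",
    "DC_PASSWORD": "s3cret",
    "DC_DATASET": "testdata",
}
os.environ.update(settings)


def make_client():
    """Create a client configured only through the environment (assumed behavior)."""
    from dataclay import Client  # imported lazily so the snippet loads without dataClay

    return Client()
```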

You can start the connection by calling start() and stop it with stop():

client.start()
# do something
client.stop()

You can also use the client as a context manager:

with client:
    # do something
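
A context manager is convenient because the connection is closed even when the body raises. The sketch below shows the pattern with a hypothetical stand-in class, DemoClient, which only records calls. It assumes that entering the block corresponds to start() and leaving it to stop(), and it is not dataClay's implementation.

```python
class DemoClient:
    """Hypothetical stand-in that records start/stop calls."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stop()
        return False  # do not swallow exceptions


demo = DemoClient()
try:
    with demo:
        raise RuntimeError("failure inside the block")
except RuntimeError:
    pass

print(demo.events)  # ['start', 'stop']
```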

Make Persistent#

To make a dataClay object persistent, call its make_persistent() method:

employee = Employee("John", 1000.0)
employee.make_persistent()

Then all methods decorated with @activemethod will be executed in dataClay:

payroll = employee.get_payroll(50) # One remote call

And all annotated attributes will be accessed and updated in dataClay, potentially reducing the local memory footprint:

employee.salary = 2000.0 # One remote call
print(employee.name, employee.salary) # Two remote calls
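
To see where the call counts in the comments come from, the sketch below uses a hypothetical CountingProxy class that counts every read or write of a stored field. It is a local model of the idea that each attribute access on a persistent object is one round trip, not dataClay code.

```python
class CountingProxy:
    """Hypothetical stand-in for a persistent object: counts field accesses."""

    def __init__(self, **fields):
        # Bypass our own __setattr__ so bookkeeping is not counted.
        object.__setattr__(self, "_store", dict(fields))
        object.__setattr__(self, "calls", 0)

    def __getattr__(self, name):
        # Called only for names not found normally, i.e. the stored fields.
        store = object.__getattribute__(self, "_store")
        if name not in store:
            raise AttributeError(name)
        object.__setattr__(self, "calls", self.calls + 1)
        return store[name]

    def __setattr__(self, name, value):
        # Every field write counts as one call.
        object.__setattr__(self, "calls", self.calls + 1)
        self._store[name] = value


employee = CountingProxy(name="John", salary=1000.0)

employee.salary = 2000.0
print(employee.calls)  # 1

values = (employee.name, employee.salary)
print(employee.calls)  # 3
```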

Assign backend#

Every dataClay object is owned by a backend. When calling make_persistent() we can specify the backend where the object will be registered. If no backend is specified, the object will be registered in a random backend.

You can get a list of backend IDs with get_backends() and register a dataClay object to one of the backends:

backend_ids = list(client.get_backends())
employee = Employee("John", 1000.0)
# Register the employee in the first backend
employee.make_persistent(backend_id=backend_ids[0])
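
When several objects are created, one simple placement policy is to cycle through the backends. This sketch only computes the assignment: the IDs are generated locally with uuid4() as stand-ins for the values returned by get_backends(), and assign_round_robin() is a hypothetical helper.

```python
from itertools import cycle
from uuid import uuid4

# Stand-ins for list(client.get_backends()); real IDs come from dataClay.
backend_ids = [uuid4() for _ in range(3)]


def assign_round_robin(names, backends):
    """Pair each name with a backend ID, cycling through the backends in order."""
    return dict(zip(names, cycle(backends)))


placement = assign_round_robin(["John", "Anna", "Marc", "Eva"], backend_ids)
for name, backend_id in placement.items():
    print(name, "->", backend_id)
```

Each chosen ID could then be passed to make_persistent() for the corresponding object.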


By default, make_persistent() registers the current object and all the dataClay objects referenced by it in a recursive manner:

employee = Employee("John", 1000.0)
company = Company("ABC", employee)
company.make_persistent()

# company and employee are registered
assert company.is_registered
assert employee.is_registered

Automatic persistence#

When you add a new reference of a dataClay object to a persistent object, it is automatically registered:

company = Company("ABC")
company.make_persistent()

# New dataClay object
employee = Employee("John", 1000.0)
# This will register the employee in dataClay
company.employees = [employee]

assert employee.is_registered
assert employee in company.employees

However, if you mutate a persistent attribute, the change will not be reflected in dataClay:

company = Company("ABC")
company.make_persistent()

employee = Employee("John", 1000.0)
# This will NOT register the employee in dataClay
company.employees.append(employee)

assert not employee.is_registered
assert employee not in company.employees

This happens because accessing company.employees creates a local copy of the list, and append() only updates that local copy. To update the list in dataClay, we have to assign the new list to the attribute. For example, this will also register the employee:

company = Company("ABC")
company.make_persistent()

employee = Employee("John", 1000.0)
employees = company.employees
employees.append(employee)
# This will register the employee in dataClay
company.employees = employees

assert employee.is_registered
assert employee in company.employees
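
The copy-on-read behaviour described above can be reproduced locally. In this sketch, StoredCompany is a hypothetical class whose employees property returns a copy on every read and stores the list on assignment. It models the semantics only and does not use dataClay.

```python
class StoredCompany:
    """Hypothetical model: reads return a copy, assignments replace the stored list."""

    def __init__(self):
        self._stored_employees = []

    @property
    def employees(self):
        # Every read hands out a fresh copy of the stored list.
        return list(self._stored_employees)

    @employees.setter
    def employees(self, value):
        # Only an assignment changes what is stored.
        self._stored_employees = list(value)


company = StoredCompany()

# Mutating the copy does not touch the stored list.
company.employees.append("John")
print(company.employees)  # []

# Reading, mutating, and assigning back does.
employees = company.employees
employees.append("John")
company.employees = employees
print(company.employees)  # ['John']
```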


Alias#

Objects with an alias are objects that have been explicitly named (similar to naming files). Not all dataClay objects need an alias. If an object has an alias, we can access it by name. Objects without an alias can only be reached through a reference from another object.


The alias must be unique within the dataset. If we try to create an object with an alias that already exists, an exception will be raised.

To register an object with an alias, we can use the make_persistent() method and pass the alias as the first parameter:

employee = Employee("John", 1000.0)
employee.make_persistent("CEO")

Then, we can retrieve the object by using get_by_alias():

employee = Employee("John", 1000.0)
employee.make_persistent("CEO")

new_employee = Employee.get_by_alias("CEO")
assert new_employee is employee

The alias can be removed by calling the delete_alias() classmethod:

Employee.delete_alias("CEO")

We can add an alias to a registered object by calling add_alias():

employee.add_alias("CFO")

And we can get all the aliases of an object with get_aliases():

aliases = employee.get_aliases()
assert "CFO" in aliases
assert "CEO" in aliases
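
The alias rules above (unique names, several aliases per object, removal by name) can be modelled with a small in-memory registry. AliasRegistry is a hypothetical class written for this illustration, and raising ValueError on a duplicate is an arbitrary choice here; the exception dataClay raises is not specified in this guide.

```python
class AliasRegistry:
    """Hypothetical in-memory model of the aliases within one dataset."""

    def __init__(self):
        self._by_alias = {}

    def add_alias(self, alias, obj):
        # Aliases must be unique within the dataset.
        if alias in self._by_alias:
            raise ValueError(f"alias {alias!r} already exists")
        self._by_alias[alias] = obj

    def get_by_alias(self, alias):
        return self._by_alias[alias]

    def delete_alias(self, alias):
        del self._by_alias[alias]

    def get_aliases(self, obj):
        # One object may be known under several aliases.
        return {a for a, o in self._by_alias.items() if o is obj}


registry = AliasRegistry()
john = object()
registry.add_alias("CEO", john)
registry.add_alias("CFO", john)
print(sorted(registry.get_aliases(john)))  # ['CEO', 'CFO']

try:
    registry.add_alias("CEO", object())
except ValueError as error:
    print(error)  # alias 'CEO' already exists

registry.delete_alias("CEO")
print(sorted(registry.get_aliases(john)))  # ['CFO']
```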

Get, Put & Update#

So far we have used dataClay objects in an object-oriented manner. However, we can also use dataClay like a standard object store, with get, put and update methods.

We can register an object with dc_put(alias). This method always requires an alias:

employee = Employee("John", 1000.0)
employee.dc_put("CEO")

And we can clone a registered object with dc_clone():

new_employee = employee.dc_clone()
assert new_employee.name == employee.name
assert new_employee is not employee

Or with the dc_clone_by_alias(alias) classmethod:

new_employee = Employee.dc_clone_by_alias("CEO")

We can update a registered object from another object of the same class with dc_update(from_object):

new_employee = Employee("Marc", 7000.0)
employee.dc_update(new_employee)
assert employee.name == "Marc"

Or with the dc_update_by_alias(alias, from_object) classmethod:

new_employee = Employee("Marc", 7000.0)
Employee.dc_update_by_alias("CEO", new_employee)
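
As a local analogy for the clone and update operations, the sketch below uses a plain dataclass, copy.deepcopy() and a hypothetical update_from() helper. It illustrates the intended semantics (a clone is equal but distinct, an update overwrites fields in place) and does not call dataClay.

```python
import copy
from dataclasses import dataclass, fields


@dataclass
class LocalEmployee:
    name: str
    salary: float


def update_from(target, source):
    """Copy every field of source onto target, keeping target's identity."""
    for field in fields(source):
        setattr(target, field.name, getattr(source, field.name))


employee = LocalEmployee("John", 1000.0)

# A clone has the same field values but is a different object.
clone = copy.deepcopy(employee)
print(clone == employee, clone is employee)  # True False

# An update changes the existing object; earlier clones are unaffected.
update_from(employee, LocalEmployee("Marc", 7000.0))
print(employee.name, employee.salary)  # Marc 7000.0
print(clone.name)  # John
```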