User Guide

Note

You can also follow this guide by running the commands alongside the Quickstart example.

Installation

dataClay can be installed with pip:

$ python3 -m pip install dataclay

Defining Classes

The model provider is responsible for designing and implementing class models: the data structure, the methods, and the relationships that applications can use to access and process data.

A minimal dataClay class is defined like this:

from dataclay import DataClayObject, activemethod


class Employee(DataClayObject):
    name: str
    salary: float

    @activemethod
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    @activemethod
    def get_payroll(self, hours_worked):
        overtime = 0
        if hours_worked > 40:
            overtime = hours_worked - 40
        return self.salary + (overtime * (self.salary / 40))


class Company(DataClayObject):
    name: str
    employees: list[Employee]

    @activemethod
    def __init__(self, name, *employees):
        self.name = name
        self.employees = list(employees)

All dataClay classes must inherit from DataClayObject.

Fields that should be persisted in dataClay must be declared with a type annotation. Fields without an annotation are ignored by dataClay and are only accessible on the local instance.

Methods decorated with @activemethod are executed in dataClay when the object is persistent. All other methods are always executed locally.
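As a quick check of the arithmetic in get_payroll, the sketch below repeats the same formula as a plain function with no dataClay dependency. The function name payroll and the sample values are chosen only for illustration.

```python
def payroll(salary: float, hours_worked: float) -> float:
    """Same formula as Employee.get_payroll, written as a plain function."""
    # Hours above 40 count as overtime, paid at the hourly rate salary / 40.
    overtime = max(0, hours_worked - 40)
    return salary + overtime * (salary / 40)


# 50 hours means 10 overtime hours at 1000 / 40 = 25 each.
print(payroll(1000.0, 50))  # 1250.0
print(payroll(1000.0, 38))  # 1000.0
```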

Connect Client

To connect to dataClay, create a Client instance and provide the host, port, username, password and dataset. These can be passed as arguments or set through environment variables:

  • DC_HOST: Host of the dataClay instance (i.e. the metadata service)

  • DC_PORT: Port of the dataClay instance (i.e. the metadata service)

  • DC_USERNAME: Username to connect to dataClay

  • DC_PASSWORD: Password to connect to dataClay

  • DC_DATASET: Dataset to connect to

from dataclay import Client

client = Client(
  host="127.0.0.1", port="16587", username="testuser", password="s3cret", dataset="testdata"
)
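The environment variables listed above can be set before the client is created. The sketch below only illustrates how those variables map to the connection settings. settings_from_env is a hypothetical helper, not part of dataClay, and how Client itself reads these variables or which defaults it applies is not shown here.

```python
import os


def settings_from_env(environ=os.environ):
    """Collect the DC_* variables into a dict (hypothetical helper, not a dataClay API)."""
    return {
        "host": environ.get("DC_HOST"),
        "port": environ.get("DC_PORT"),
        "username": environ.get("DC_USERNAME"),
        "password": environ.get("DC_PASSWORD"),
        "dataset": environ.get("DC_DATASET"),
    }


# Example values taken from the Client call above; unset variables come back as None.
example_env = {
    "DC_HOST": "127.0.0.1",
    "DC_PORT": "16587",
    "DC_USERNAME": "testuser",
    "DC_PASSWORD": "s3cret",
    "DC_DATASET": "testdata",
}
settings = settings_from_env(example_env)
print(settings["host"], settings["port"])
```

The keys of the resulting dict match the keyword names used in the Client call above, so the two ways of configuring the connection describe the same five settings.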

You can start the connection by calling start() and stop it with stop():

client.start()
# do something
client.stop()

You can also use the client as a context manager:

with client:
    # do something
    pass
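The context-manager form is generally the safer choice, because the connection is closed even if the body raises. The sketch below shows that behavior with a stand-in class. FakeClient is hypothetical and only mimics the start()/stop() pairing, assuming that entering the with block corresponds to start() and leaving it corresponds to stop().

```python
class FakeClient:
    """Stand-in for illustration only; not the dataClay Client."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and on exceptions alike.
        self.stop()
        return False  # do not swallow the exception


fake_client = FakeClient()
try:
    with fake_client:
        raise RuntimeError("work failed")
except RuntimeError:
    pass

print(fake_client.events)  # ['start', 'stop']
```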

Make Persistent

To make a dataClay object persistent, call its make_persistent() method:

employee = Employee("John", 1000.0)
employee.make_persistent()

Then all methods decorated with @activemethod will be executed in dataClay:

payroll = employee.get_payroll(50) # One remote call

And all annotated attributes will be accessed and updated in dataClay, potentially reducing the local memory footprint:

employee.salary = 2000.0 # One remote call
print(employee.name, employee.salary) # Two remote calls
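Because each attribute access on a persistent object is a separate remote call, reading the same attribute repeatedly in a loop can add up. The toy model below counts reads on a stand-in object to illustrate the pattern of reading a value once into a local variable. CountingProxy is hypothetical and does not reflect how dataClay implements remote access.

```python
class CountingProxy:
    """Toy stand-in that counts attribute reads; not dataClay code."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self.calls = 0

    def __getattr__(self, name):
        # Only reached for names not found on the instance, i.e. the "remote" fields.
        fields = self.__dict__.get("_fields", {})
        if name not in fields:
            raise AttributeError(name)
        self.calls += 1  # each read stands in for one remote call
        return fields[name]


employee = CountingProxy(name="John", salary=1000.0)

# Reading the attribute inside the loop: one simulated call per iteration.
total = sum(employee.salary for _ in range(5))
print(employee.calls)  # 5

# Reading it once and reusing the local value: only one more simulated call.
salary = employee.salary
total = sum(salary for _ in range(5))
print(employee.calls)  # 6
```

The trade-off is that a value held in a local variable does not see later updates made to the object in dataClay.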

Assign backend

Every dataClay object is owned by a backend. When calling make_persistent() we can specify the backend where the object will be registered. If no backend is specified, the object will be registered in a random backend.

You can get a list of backend IDs with get_backends() and register a dataClay object to one of the backends:

backend_ids = list(client.get_backends())
employee = Employee("John", 1000.0)
employee.make_persistent(backend_id=backend_ids[0])

Recursive

By default, make_persistent() registers the current object and all the dataClay objects referenced by it in a recursive manner:

employee = Employee("John", 1000.0)
company = Company("ABC", employee)

# company and employee are registered
company.make_persistent()
assert employee.is_registered
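The recursive behavior can be pictured as a traversal of the object graph. The toy model below is only a mental model, assuming that registration follows the references an object holds. Node and register are hypothetical names, and this is not dataClay's implementation.

```python
class Node:
    """Toy object with a name, references to other nodes, and a registration flag."""

    def __init__(self, name, *refs):
        self.name = name
        self.refs = list(refs)
        self.is_registered = False


def register(node):
    """Mark node and everything reachable from it as registered."""
    if node.is_registered:
        return  # already visited; also guards against reference cycles
    node.is_registered = True
    for ref in node.refs:
        register(ref)


employee = Node("John")
company = Node("ABC", employee)
employee.refs.append(company)  # a cycle, to show the traversal still terminates

register(company)
print(company.is_registered, employee.is_registered)  # True True
```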

Automatic persistence

When you assign a reference to a new dataClay object to an attribute of a persistent object, the new object is automatically registered:

company = Company("ABC")
company.make_persistent()

# New dataClay object
employee = Employee("John", 1000.0)
# This will register the employee in dataClay
company.employees = [employee]

assert employee.is_registered
assert employee in company.employees

However, if you mutate a persistent attribute in place, the change is not reflected in dataClay:

company = Company("ABC")
company.make_persistent()

employee = Employee("John", 1000.0)
# This will NOT register the employee in dataClay
company.employees.append(employee)

assert not employee.is_registered
assert employee not in company.employees

This happens because accessing company.employees returns a local copy of the list, and append() only updates that copy. To update the list in dataClay, assign the modified list back to the attribute. For example, the following also registers the employee:

company = Company("ABC")
company.make_persistent()

employee = Employee("John", 1000.0)
employees = company.employees
employees.append(employee)
# This will register the employee in dataClay
company.employees = employees

assert employee.is_registered
assert employee in company.employees
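The difference between mutating and reassigning can be reproduced with a plain-Python toy model in which reading the attribute returns a copy. RemoteCompany is a hypothetical stand-in used only to illustrate the copy semantics described above; it is not how dataClay is implemented.

```python
class RemoteCompany:
    """Toy stand-in: the stored list is only reachable through a copying property."""

    def __init__(self):
        self._stored = []  # plays the role of the state kept in dataClay

    @property
    def employees(self):
        # Every read hands back a fresh local copy.
        return list(self._stored)

    @employees.setter
    def employees(self, value):
        # Assignment replaces the stored state.
        self._stored = list(value)


company = RemoteCompany()

company.employees.append("John")  # mutates a temporary copy only
print(company.employees)  # []

employees = company.employees
employees.append("John")
company.employees = employees  # assignment writes the list back
print(company.employees)  # ['John']
```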

Alias

Objects with an alias are objects that have been explicitly named (similar to naming files). Not all dataClay objects should have an alias. If an object has an alias, we can access it by using its name. On the other hand, objects without an alias can only be accessed by a reference from another object.

Warning

The alias must be unique within the dataset. If we try to create an object with an alias that already exists, an exception will be raised.

To register an object with an alias, we can use the make_persistent() method and pass the alias as the first parameter:

employee = Employee("John", 1000.0)
employee.make_persistent("CEO")

Then, we can retrieve the object by using get_by_alias():

employee = Employee("John", 1000.0)
employee.make_persistent("CEO")

new_employee = Employee.get_by_alias("CEO")
assert new_employee is employee

The alias can be removed by calling the delete_alias() class method:

Employee.delete_alias("CEO")

We can add an alias to a registered object by calling add_alias():

employee.add_alias("CFO")

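And we can get all the aliases of an object with get_aliases(). The example below assumes the "CEO" alias is still in place (that is, delete_alias("CEO") has not been called):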

aliases = employee.get_aliases()
assert "CFO" in aliases
assert "CEO" in aliases
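The uniqueness rule from the warning above behaves much like a dictionary that refuses to overwrite existing keys. The sketch below models that with a plain dict. AliasRegistry and the KeyError it raises are illustrative assumptions; the actual exception type raised by dataClay is not specified in this guide.

```python
class AliasRegistry:
    """Toy model of alias uniqueness within one dataset; not dataClay code."""

    def __init__(self):
        self._by_alias = {}

    def add(self, alias, obj):
        if alias in self._by_alias:
            raise KeyError(f"alias {alias!r} already exists")
        self._by_alias[alias] = obj

    def get(self, alias):
        return self._by_alias[alias]

    def delete(self, alias):
        del self._by_alias[alias]


registry = AliasRegistry()
john = object()
registry.add("CEO", john)

try:
    registry.add("CEO", object())
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True
print(duplicate_rejected)  # True

# In this toy model the name becomes available again once it is deleted.
registry.delete("CEO")
registry.add("CEO", object())
```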

Get, Put & Update

Previously we have been using dataClay objects in an object-oriented manner. However, we can also use dataClay like a standard object store, with get, put and update methods.

We can register an object with dc_put(alias). This method always requires an alias:

employee = Employee("John", 1000.0)
employee.dc_put("CEO")

And we can clone a registered object with dc_clone():

new_employee = employee.dc_clone()
assert new_employee.name == employee.name
assert new_employee is not employee

Or using the dc_clone_by_alias(alias) class method:

new_employee = Employee.dc_clone_by_alias("CEO")

We can update a registered object from another object of the same class with dc_update(from_object):

new_employee = Employee("Marc", 7000.0)
employee.dc_update(new_employee)
assert employee.name == "Marc"

Or with the dc_update_by_alias(alias, from_object) class method:

new_employee = Employee("Marc", 7000.0)
Employee.dc_update_by_alias("CEO", new_employee)
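The put, clone and update operations can be pictured with a small in-memory store. The toy model below uses copy.deepcopy, under the assumption that a clone is an independent local copy and that an update copies the source object's attributes onto the stored one. ToyStore, its method names, and the plain Employee class are hypothetical and are not dataClay's API.

```python
import copy


class ToyStore:
    """In-memory stand-in for an object store keyed by alias; not dataClay code."""

    def __init__(self):
        self._objects = {}

    def put(self, alias, obj):
        self._objects[alias] = obj

    def clone(self, alias):
        # A clone is a separate object with the same state.
        return copy.deepcopy(self._objects[alias])

    def update(self, alias, from_object):
        # Copy the source object's attributes onto the stored object.
        self._objects[alias].__dict__.update(copy.deepcopy(from_object.__dict__))


class Employee:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary


store = ToyStore()
employee = Employee("John", 1000.0)
store.put("CEO", employee)

clone = store.clone("CEO")
clone.salary = 0.0  # changes only the clone
print(employee.salary)  # 1000.0

store.update("CEO", Employee("Marc", 7000.0))
print(employee.name)  # Marc
print(clone.name)  # John
```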