You can also follow this guide by running all the commands side by side with the Quickstart example.
dataClay can be installed with pip:
$ python3 -m pip install dataclay
The model provider is responsible for designing and implementing class models: the data structure, the methods, and the relationships that applications can use to access and process data.
A minimal dataClay class is defined like this:
from dataclay import DataClayObject, activemethod


class Employee(DataClayObject):
    name: str
    salary: float

    @activemethod
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    @activemethod
    def get_payroll(self, hours_worked):
        overtime = 0
        if hours_worked > 40:
            overtime = hours_worked - 40
        return self.salary + (overtime * (self.salary / 40))


class Company(DataClayObject):
    name: str
    employees: list[Employee]

    @activemethod
    def __init__(self, name, *employees):
        self.name = name
        self.employees = list(employees)
All dataClay classes must inherit from DataClayObject.
Fields that are intended to be persisted in dataClay must be annotated. Unannotated fields are ignored by dataClay and are only accessible on the local instance.
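As a rough illustration of why annotations matter: standard Python exposes annotated class fields through `__annotations__`, while attributes that are only assigned inside `__init__` do not appear there. The sketch below uses a plain class (`PlainEmployee` and `cache` are hypothetical names, and this is not a description of dataClay's internals) to show the distinction a framework could rely on.

```python
class PlainEmployee:
    # Annotated fields: visible in the class-level __annotations__ mapping.
    name: str
    salary: float

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary
        # Not annotated: an ordinary instance attribute only.
        self.cache = {}


annotated = set(PlainEmployee.__annotations__)
employee = PlainEmployee("John", 1000.0)
unannotated = set(vars(employee)) - annotated

print(sorted(annotated))    # ['name', 'salary']
print(sorted(unannotated))  # ['cache']
```

In the terms used above, `name` and `salary` would be candidates for persistence, while `cache` would stay local.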
The methods should be decorated with
@activemethod to specify that they will be
executed in dataClay if the object is persistent. The rest of the methods will always be executed locally.
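The payroll formula used by `get_payroll` in the class above can be checked locally without a dataClay deployment. The following sketch restates it as a plain function (the free function and its `salary` parameter are an adaptation for illustration, not part of the model).

```python
def get_payroll(salary, hours_worked):
    """Same formula as Employee.get_payroll, written as a plain function."""
    overtime = max(0, hours_worked - 40)
    return salary + overtime * (salary / 40)


# 10 overtime hours at 1000 / 40 = 25 per hour add 250 to the base salary.
print(get_payroll(1000.0, 50))  # 1250.0
# No overtime at exactly 40 hours.
print(get_payroll(1000.0, 40))  # 1000.0
```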
To connect to a dataClay instance, create a Client instance and provide the host, username,
password and dataset to connect to. You can provide them as arguments or as environment variables:
DC_HOST: Host of the dataClay instance (i.e. the metadata service)
DC_PORT: Port of the dataClay instance (i.e. the metadata service)
DC_USERNAME: Username to connect to dataClay
DC_PASSWORD: Password to connect to dataClay
DC_DATASET: Dataset to connect to
from dataclay import Client

client = Client(
    host="127.0.0.1",
    port="16587",
    username="testuser",
    password="s3cret",
    dataset="testdata",
)
client.start()
# do something
client.stop()
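The sketch below shows one way the two configuration sources could be combined before constructing a client. The variable names come from the list above; the helper `client_kwargs`, the mapping `ENV_TO_ARG`, and the rule that explicit arguments take precedence are assumptions made for illustration, not documented dataClay behaviour.

```python
import os

# Variable names are taken from the list above; the mapping itself is illustrative.
ENV_TO_ARG = {
    "DC_HOST": "host",
    "DC_PORT": "port",
    "DC_USERNAME": "username",
    "DC_PASSWORD": "password",
    "DC_DATASET": "dataset",
}


def client_kwargs(explicit=None, environ=None):
    """Merge explicit arguments with DC_* variables (assumption: explicit wins)."""
    environ = os.environ if environ is None else environ
    kwargs = {
        arg: environ[var] for var, arg in ENV_TO_ARG.items() if var in environ
    }
    kwargs.update(explicit or {})
    return kwargs


example_env = {"DC_HOST": "127.0.0.1", "DC_DATASET": "testdata"}
merged = client_kwargs({"dataset": "other"}, environ=example_env)
print(merged)  # {'host': '127.0.0.1', 'dataset': 'other'}
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.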
You can also use the client as a context manager:
with client:
    # do something
    ...
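To make the relationship between the two styles concrete, here is a minimal sketch that assumes the context manager simply wraps `start()` and `stop()`. `SketchClient` is a hypothetical stand-in that only records calls; it is not the dataClay `Client`.

```python
class SketchClient:
    """Hypothetical stand-in that only records calls; not the dataClay Client."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs even if the body raised, so the client is always stopped.
        self.stop()
        return False  # do not swallow exceptions


sketch_client = SketchClient()
with sketch_client:
    sketch_client.events.append("work")

print(sketch_client.events)  # ['start', 'work', 'stop']
```

The practical benefit of the `with` form is that the stop step still runs when the body raises an exception.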
To make a dataClay object persistent, call its make_persistent() method:
employee = Employee("John", 1000.0)
employee.make_persistent()
Then all methods decorated with
@activemethod will be executed in dataClay:
payroll = employee.get_payroll(50) # One remote call
And all annotated attributes will be accessed and updated in dataClay, potentially reducing the local memory footprint:
employee.salary = 2000.0  # One remote call
print(employee.name, employee.salary)  # Two remote calls
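The call counts in the comments above can be reproduced with a small local model in which every read or write of a stored field costs one "remote" call. `CountingProxy` is a hypothetical class written for this illustration; it says nothing about how dataClay implements attribute access.

```python
class CountingProxy:
    """Hypothetical proxy that counts field reads and writes as 'remote' calls."""

    def __init__(self, **fields):
        # Bypass __setattr__ so that setup is not counted.
        object.__setattr__(self, "_store", dict(fields))
        object.__setattr__(self, "calls", 0)

    def __getattr__(self, name):
        # Invoked only when normal lookup fails, i.e. for stored fields.
        store = object.__getattribute__(self, "_store")
        if name not in store:
            raise AttributeError(name)
        object.__setattr__(self, "calls", self.calls + 1)
        return store[name]

    def __setattr__(self, name, value):
        self._store[name] = value
        object.__setattr__(self, "calls", self.calls + 1)


proxy = CountingProxy(name="John", salary=1000.0)
proxy.salary = 2000.0                 # one write
summary = (proxy.name, proxy.salary)  # two reads
print(proxy.calls)  # 3
```

Under this model, reading the same attribute in a loop costs one call per iteration, so copying a value into a local variable first can be worthwhile.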
Every dataClay object is owned by a backend. When calling make_persistent(),
we can specify the backend where the object will be registered. If no backend is specified, the object will be
registered in a random backend.
You can get a list of backend IDs with get_backends()
and register a dataClay object to one of the backends:
backend_ids = list(client.get_backends())
employee = Employee("John", 1000.0)
# Pass a single backend ID, not the whole list
employee.make_persistent(backend_id=backend_ids[0])
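The selection rule described above (use the given backend, otherwise pick one at random) can be sketched locally as follows. The function `choose_backend`, its validation of unknown IDs, and the backend names are all hypothetical; dataClay's own placement logic is not shown in this guide.

```python
import random


def choose_backend(backend_ids, backend_id=None, rng=random):
    """Return the requested backend, or a random one if none is given."""
    if not backend_ids:
        raise ValueError("no backends available")
    if backend_id is None:
        return rng.choice(backend_ids)
    if backend_id not in backend_ids:
        raise ValueError(f"unknown backend: {backend_id!r}")
    return backend_id


backends = ["backend-a", "backend-b", "backend-c"]  # made-up IDs
explicit = choose_backend(backends, backend_id="backend-b")
fallback = choose_backend(backends)  # any of the three
print(explicit)
```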
make_persistent() registers the current object
and all the dataClay objects referenced by it in a recursive manner:
employee = Employee("John", 1000.0)
company = Company("ABC", employee)
# company and employee are registered
company.make_persistent()
assert employee.is_registered
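The recursive registration described above amounts to a graph traversal over object references. The sketch below models it with a hypothetical `Node` class and an explicit stack (equivalent to recursion, but safe for deep graphs); the cycle check is an assumption about what any such traversal needs, not a statement about dataClay's code.

```python
class Node:
    """Hypothetical object holding references to other Node objects."""

    def __init__(self, name, *refs):
        self.name = name
        self.refs = list(refs)
        self.is_registered = False


def register_recursively(root):
    """Mark root and everything reachable from it as registered."""
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj.is_registered:
            continue  # already handled; also protects against cycles
        obj.is_registered = True
        stack.extend(obj.refs)


employee_node = Node("John")
company_node = Node("ABC", employee_node)
employee_node.refs.append(company_node)  # a cycle, to show termination
register_recursively(company_node)
print(employee_node.is_registered, company_node.is_registered)  # True True
```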
When you add a new reference of a dataClay object to a persistent object, it is automatically registered:
company = Company("ABC")
company.make_persistent()

# New dataClay object
employee = Employee("John", 1000.0)

# This will register the employee in dataClay
company.employees = [employee]
assert employee.is_registered
assert employee in company.employees
However, if you mutate a persistent attribute, the change will not be reflected in dataClay:
company = Company("ABC")
company.make_persistent()

employee = Employee("John", 1000.0)

# This will NOT register the employee in dataClay
company.employees.append(employee)
assert not employee.is_registered
assert employee not in company.employees
This happens because accessing company.employees creates a local copy of the list, and
append() only updates this local copy. To update the list in dataClay, we have to assign
the new list to the attribute. For example, this will also register the employee:
company = Company("ABC")
company.make_persistent()

employee = Employee("John", 1000.0)

employees = company.employees
employees.append(employee)

# This will register the employee in dataClay
company.employees = employees
assert employee.is_registered
assert employee in company.employees
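The copy-on-read behaviour described above can be reproduced in plain Python with a property that returns a copy and a setter that writes back. `Store` is a hypothetical stand-in for remote storage, written only to show why in-place mutation is lost while assignment is not.

```python
import copy


class Store:
    """Hypothetical stand-in for remote storage; not dataClay's implementation."""

    def __init__(self):
        self._data = {"employees": []}

    @property
    def employees(self):
        # Every read hands back a fresh copy, like a value fetched remotely.
        return copy.copy(self._data["employees"])

    @employees.setter
    def employees(self, value):
        # Only assignment writes back to the store.
        self._data["employees"] = list(value)


store = Store()
store.employees.append("John")  # mutates a throwaway copy
after_append = store.employees

employees = store.employees
employees.append("John")
store.employees = employees     # assignment writes the list back
after_assign = store.employees

print(after_append, after_assign)  # [] ['John']
```

The same read, modify, assign pattern applies to any mutable attribute handled this way, such as dictionaries or sets.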
Objects with an alias are objects that have been explicitly named (similar to naming files). Not all dataClay objects should have an alias. If an object has an alias, we can access it by using its name. On the other hand, objects without an alias can only be accessed by a reference from another object.
The alias must be unique within the dataset. If we try to create an object with an alias that already exists, an exception will be raised.
To register an object with an alias, we can use the make_persistent()
method and pass the alias as the first parameter:
employee = Employee("John", 1000.0)
employee.make_persistent("CEO")
Then, we can retrieve the object by using get_by_alias():
employee = Employee("John", 1000.0)
employee.make_persistent("CEO")

new_employee = Employee.get_by_alias("CEO")
assert new_employee is employee
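The uniqueness rule and lookup by name can be modelled with a dictionary keyed by alias. `AliasRegistry` and its use of `KeyError` are hypothetical choices for this sketch; the guide only states that dataClay raises an exception on duplicates, not which one.

```python
class AliasRegistry:
    """Hypothetical per-dataset alias table; not dataClay's metadata service."""

    def __init__(self):
        self._by_alias = {}

    def register(self, alias, obj):
        if alias in self._by_alias:
            raise KeyError(f"alias already exists: {alias!r}")
        self._by_alias[alias] = obj

    def get(self, alias):
        return self._by_alias[alias]


registry = AliasRegistry()
john = object()
registry.register("CEO", john)

try:
    registry.register("CEO", object())
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True

print(registry.get("CEO") is john, duplicate_rejected)  # True True
```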
The alias can be removed by calling delete_alias().
We can add an alias to a registered object by calling add_alias().
And we can get all the aliases of an object with get_aliases():
aliases = employee.get_aliases()
assert "CFO" in aliases
assert "CEO" in aliases
Get, Put & Update
Previously we have been using dataClay objects in an object-oriented manner.
However, we can also use dataClay like a standard object store with get, put and update methods.
We can register an object with dc_put().
This method always requires an alias:
employee = Employee("John", 1000.0)
employee.dc_put("CEO")
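And we can clone a registered object with dc_clone():
new_employee = employee.dc_clone()
assert new_employee.name == employee.name
assert new_employee is not employee
A registered object can also be cloned by its alias with the dc_clone_by_alias() classmethod:
new_employee = Employee.dc_clone_by_alias("CEO")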
We can update a registered object from another object of the same class with dc_update():
new_employee = Employee("Marc", 7000.0)
employee.dc_update(new_employee)
assert employee.name == "Marc"
Or by alias with the dc_update_by_alias(alias, from_object) classmethod:
new_employee = Employee("Marc", 7000.0)
Employee.dc_update_by_alias("CEO", new_employee)
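The difference between cloning and updating can be illustrated locally: a clone is a new, independent object with the same field values, while an update overwrites the fields of an existing object in place. `Record`, `clone`, and `update` below are hypothetical helpers built on the standard library, not the dataClay methods themselves.

```python
import copy


class Record:
    """Hypothetical plain object used to illustrate clone and update."""

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary


def clone(obj):
    """Return a new object with the same field values."""
    return copy.copy(obj)


def update(target, source):
    """Overwrite target's fields with source's; both must share a class."""
    if type(target) is not type(source):
        raise TypeError("objects must be of the same class")
    vars(target).update(vars(source))


original = Record("John", 1000.0)
duplicate = clone(original)
update(original, Record("Marc", 7000.0))
print(original.name, duplicate.name)  # Marc John
```

Because the clone was taken before the update, it keeps the old values, which mirrors the "is not" assertion in the cloning example above.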