User Guide
Note
You can follow this guide while running all of its commands side by side with the Quickstart example.
Installation
dataClay can be installed with pip:
$ python3 -m pip install dataclay
Defining Classes
The model provider is responsible for designing and implementing class models: the data structure, the methods, and the relationships that applications can use to access and process data.
A minimal dataClay class is defined like this:
from dataclay import DataClayObject, activemethod


class Employee(DataClayObject):
    name: str
    salary: float

    @activemethod
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    @activemethod
    def get_payroll(self, hours_worked):
        overtime = 0
        if hours_worked > 40:
            overtime = hours_worked - 40
        return self.salary + (overtime * (self.salary / 40))


class Company(DataClayObject):
    name: str
    employees: list[Employee]

    @activemethod
    def __init__(self, name, *employees):
        self.name = name
        self.employees = list(employees)
All dataClay classes must inherit from DataClayObject.
Fields that should be persisted in dataClay must be annotated. Fields without an annotation are ignored by dataClay and are only accessible from the local instance.
Methods decorated with @activemethod are executed in dataClay if the object is persistent. All other methods are always executed locally.
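As a quick check of the arithmetic in get_payroll(), the same formula can be written as a plain function with no dataClay dependency. The function name payroll is only for illustration, and the sketch assumes that salary is the pay for a standard 40-hour period, which the class above implies but does not state.

```python
def payroll(salary: float, hours_worked: float) -> float:
    """Same formula as Employee.get_payroll, written as a plain function."""
    # Hours beyond 40 are paid at the base hourly rate (salary / 40).
    overtime = max(0, hours_worked - 40)
    return salary + overtime * (salary / 40)


regular = payroll(1000.0, 40)   # no overtime
extended = payroll(1000.0, 50)  # 10 overtime hours at 25.0 per hour
print(regular, extended)  # prints: 1000.0 1250.0
```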
Connect Client
To connect to dataClay, create a Client instance and provide the host, port, username, password and dataset. You can provide them as arguments or through the following environment variables:
DC_HOST: Host of the dataClay instance (i.e. metadata)
DC_PORT: Port of the dataClay instance (i.e. metadata)
DC_USERNAME: Username to connect to dataClay
DC_PASSWORD: Password to connect to dataClay
DC_DATASET: Dataset to connect to
from dataclay import Client

client = Client(
    host="127.0.0.1", port="16587", username="testuser", password="s3cret", dataset="testdata"
)
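The environment variables listed above can also be collected explicitly and passed as arguments. The sketch below shows one way to do that. The helper name settings_from_env is an assumption made for illustration, not part of dataClay, and it only builds a dictionary, so it runs without a dataClay instance.

```python
import os


def settings_from_env(environ=os.environ):
    """Build Client keyword arguments from the DC_* variables (hypothetical helper)."""
    mapping = {
        "host": "DC_HOST",
        "port": "DC_PORT",
        "username": "DC_USERNAME",
        "password": "DC_PASSWORD",
        "dataset": "DC_DATASET",
    }
    # Only include the settings that are actually defined.
    return {arg: environ[var] for arg, var in mapping.items() if var in environ}


# A plain dict stands in for the process environment in this example.
example_env = {"DC_HOST": "127.0.0.1", "DC_PORT": "16587", "DC_DATASET": "testdata"}
settings = settings_from_env(example_env)
print(settings)
# With dataClay installed, the result could be used as: client = Client(**settings)
```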
You can start the connection by calling start() and stop it with stop():
client.start()
# do something
client.stop()
You can also use the client as a context manager:
with client:
    ...  # do something
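The context-manager form is convenient because the connection is closed even if the body raises. The stand-in class below is not dataClay's Client. It is a minimal sketch that assumes entering the block calls start() and leaving it calls stop().

```python
class StandInClient:
    """Toy stand-in that records calls; not the real dataclay.Client."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.stop()
        return False  # do not swallow exceptions


client = StandInClient()
try:
    with client:
        raise RuntimeError("failure inside the block")
except RuntimeError:
    pass

print(client.calls)  # prints: ['start', 'stop']
```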
Make Persistent
To make a dataClay object persistent, call its make_persistent() method:
employee = Employee("John", 1000.0)
employee.make_persistent()
Then all methods decorated with @activemethod will be executed in dataClay:
payroll = employee.get_payroll(50) # One remote call
And all annotated attributes will be accessed and updated in dataClay, potentially reducing the local memory footprint:
employee.salary = 2000.0 # One remote call
print(employee.name, employee.salary) # Two remote calls
Assign backend
Every dataClay object is owned by a backend. When calling make_persistent() we can specify the backend where the object will be registered. If no backend is specified, the object will be registered in a random backend.
You can get a list of backend IDs with get_backends() and register a dataClay object to one of the backends:
backend_ids = list(client.get_backends())
employee = Employee("John", 1000.0)
employee.make_persistent(backend_id=backend_ids[0])
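When registering many objects, one option is to spread them over the available backends instead of relying on the random default. The sketch below shows a round-robin assignment. The backend IDs are made-up strings standing in for the values returned by get_backends(), and the make_persistent() call is left as a comment so the snippet runs without a dataClay instance.

```python
from itertools import cycle

# Placeholder IDs; in practice use list(client.get_backends()).
backend_ids = ["backend-a", "backend-b", "backend-c"]
names = ["John", "Anna", "Marc", "Lucy"]

placement = {}
for name, backend_id in zip(names, cycle(backend_ids)):
    # With dataClay: Employee(name, 1000.0).make_persistent(backend_id=backend_id)
    placement[name] = backend_id

print(placement)
```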
Recursive
By default, make_persistent() registers the current object and, recursively, all the dataClay objects referenced by it:
employee = Employee("John", 1000.0)
company = Company("ABC", employee)
# company and employee are registered
company.make_persistent()
assert employee.is_registered
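The idea of recursive registration can be pictured with a toy traversal over plain Python objects. This is only a model of the behaviour described above, not dataClay's implementation. The Node class and the register_recursively function are hypothetical names.

```python
class Node:
    """Toy object holding references; stands in for a dataClay object."""

    def __init__(self, name, *refs):
        self.name = name
        self.refs = list(refs)
        self.is_registered = False


def register_recursively(obj):
    """Mark obj and every object reachable from it as registered."""
    if obj.is_registered:
        return  # already visited; this also stops cycles from looping forever
    obj.is_registered = True
    for ref in obj.refs:
        register_recursively(ref)


employee = Node("John")
company = Node("ABC", employee)
employee.refs.append(company)  # a back-reference, forming a cycle

register_recursively(company)
print(company.is_registered, employee.is_registered)  # prints: True True
```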
Automatic persistence
When you assign a reference to a new dataClay object to an attribute of a persistent object, the new object is registered automatically:
company = Company("ABC")
company.make_persistent()
# New dataClay object
employee = Employee("John", 1000.0)
# This will register the employee in dataClay
company.employees = [employee]
assert employee.is_registered
assert employee in company.employees
However, if you mutate the value of a persistent attribute in place, the change will not be reflected in dataClay:
company = Company("ABC")
company.make_persistent()
employee = Employee("John", 1000.0)
# This will NOT register the employee in dataClay
company.employees.append(employee)
assert not employee.is_registered
assert employee not in company.employees
This happens because accessing company.employees creates a local copy of the list, and append() only updates this local copy. To update the list in dataClay, we have to assign the new list to the attribute. For example, this will also register the employee:
company = Company("ABC")
company.make_persistent()
employee = Employee("John", 1000.0)
employees = company.employees
employees.append(employee)
# This will register the employee in dataClay
company.employees = employees
assert employee.is_registered
assert employee in company.employees
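The difference between in-place mutation and reassignment can be reproduced with plain Python. The class below is a toy model, not dataClay code: it assumes that reading the attribute returns a fresh copy of the stored list and that assigning writes the whole list back.

```python
class StoredCompany:
    """Toy model of a persistent object whose attribute reads return copies."""

    def __init__(self):
        self._stored_employees = []  # stands in for the state held in dataClay

    @property
    def employees(self):
        return list(self._stored_employees)  # each read yields a local copy

    @employees.setter
    def employees(self, value):
        self._stored_employees = list(value)  # assignment writes the list back


company = StoredCompany()

company.employees.append("John")  # mutates a temporary copy only
after_append = company.employees

employees = company.employees
employees.append("John")
company.employees = employees     # reassignment updates the stored list
after_assign = company.employees

print(after_append, after_assign)  # prints: [] ['John']
```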
Alias
Objects with an alias are objects that have been explicitly named (similar to naming files). Not all dataClay objects should have an alias. If an object has an alias, we can access it by using its name. On the other hand, objects without an alias can only be accessed by a reference from another object.
Warning
The alias must be unique within the dataset. If we try to create an object with an alias that already exists, an exception will be raised.
To register an object with an alias, we can use the make_persistent() method and pass the alias as the first parameter:
employee = Employee("John", 1000.0)
employee.make_persistent("CEO")
Then, we can retrieve the object by using get_by_alias():
employee = Employee("John", 1000.0)
employee.make_persistent("CEO")
new_employee = Employee.get_by_alias("CEO")
assert new_employee is employee
The alias can be removed by calling the delete_alias() classmethod:
Employee.delete_alias("CEO")
We can add an alias to a registered object by calling add_alias():
employee.add_alias("CFO")
And we can get all the aliases of an object with get_aliases():
aliases = employee.get_aliases()
assert "CFO" in aliases
assert "CEO" in aliases
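The alias rules in this section (unique names, lookup by name, removal) behave much like a dictionary keyed by alias. The sketch below is a toy in-memory model of those rules for a single dataset. The AliasRegistry class and the ValueError it raises are hypothetical and do not reflect dataClay's internals or its actual exception type.

```python
class AliasRegistry:
    """Toy in-memory model of aliases within one dataset (hypothetical)."""

    def __init__(self):
        self._by_alias = {}

    def add_alias(self, alias, obj):
        if alias in self._by_alias:
            raise ValueError(f"alias {alias!r} already exists")
        self._by_alias[alias] = obj

    def get_by_alias(self, alias):
        return self._by_alias[alias]

    def delete_alias(self, alias):
        del self._by_alias[alias]

    def get_aliases(self, obj):
        return {a for a, o in self._by_alias.items() if o is obj}


registry = AliasRegistry()
john = object()
registry.add_alias("CEO", john)
registry.add_alias("CFO", john)

try:
    registry.add_alias("CEO", object())  # same alias, different object
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

registry.delete_alias("CEO")
print(duplicate_rejected, registry.get_aliases(john))  # prints: True {'CFO'}
```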
Get, Put & Update
So far we have been using dataClay objects in an object-oriented manner.
However, we can also use dataClay like a standard object store with get, put and update methods.
We can register an object with dc_put(alias). This method always requires an alias:
employee = Employee("John", 1000.0)
employee.dc_put("CEO")
And we can clone a registered object with dc_clone():
new_employee = employee.dc_clone()
assert new_employee.name == employee.name
assert new_employee is not employee
Or by using the dc_clone_by_alias(alias) classmethod:
new_employee = Employee.dc_clone_by_alias("CEO")
We can update a registered object from another object of the same class with dc_update(from_object):
new_employee = Employee("Marc", 7000.0)
employee.dc_update(new_employee)
assert employee.name == "Marc"
Or with the dc_update_by_alias(alias, from_object) classmethod:
new_employee = Employee("Marc", 7000.0)
Employee.dc_update_by_alias("CEO", new_employee)
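As a rough local analogy, cloning corresponds to copying an object, and updating corresponds to overwriting one object's fields with another's. The sketch uses the standard copy module and a plain class. The PlainEmployee class and the update_from helper are hypothetical, and this is not how dataClay implements dc_clone() or dc_update().

```python
import copy


class PlainEmployee:
    """Plain Python class used only for this analogy."""

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary


def update_from(target, source):
    """Overwrite target's fields with source's (hypothetical helper)."""
    vars(target).update(vars(source))


employee = PlainEmployee("John", 1000.0)

clone = copy.deepcopy(employee)  # analogous to dc_clone()
update_from(employee, PlainEmployee("Marc", 7000.0))  # analogous to dc_update()

print(clone.name, employee.name, employee.salary)  # prints: John Marc 7000.0
```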