Docker-based deployment

The easiest way to deploy dataClay is to use the provided Docker image. You can deploy a minimal dataClay instance with the following docker-compose file:

version: '3.9'
services:

  redis:
    image: redis:latest
    ports:
      - 6379:6379

  metadata-service:
    image: "ghcr.io/bsc-dom/dataclay:edge"
    depends_on:
      - redis
    ports:
      - 16587:16587
    environment:
      - DATACLAY_KV_HOST=redis
      - DATACLAY_PASSWORD=s3cret
      - DATACLAY_USERNAME=testuser
      - DATACLAY_DATASET=testdata
      - DATACLAY_LOGLEVEL=info
    command: python -m dataclay.metadata

  backend:
    image: "ghcr.io/bsc-dom/dataclay:edge"
    depends_on:
      - redis
    environment:
      - DATACLAY_KV_HOST=redis
      - DATACLAY_LOGLEVEL=info
    command: python -m dataclay.backend
    volumes:
      - ./model:/workdir/model:ro

Note

All dataClay classes must be saved in the model folder so that the backends can access them. In more complex deployments, the class models can instead be embedded in the Docker image, installed with pip, or deployed by other means.
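The shell sketch below prepares the ./model folder that the compose file mounts read-only into each backend at /workdir/model. The helper name prepare_model_dir, the file person.py and the Person class are hypothetical examples. The class assumes the DataClayObject/activemethod style of the dataClay Python API, so check it against the dataClay version you deploy.

```shell
#!/bin/sh
# Hypothetical helper: create the model folder mounted by the backends and
# drop an example class into it. The class follows the DataClayObject /
# activemethod style (an assumption; verify against your dataClay version).
prepare_model_dir() {
    model_dir="${1:-./model}"
    mkdir -p "$model_dir"
    # Only write the example if it is not there yet, so edits are preserved.
    if [ ! -f "$model_dir/person.py" ]; then
        cat > "$model_dir/person.py" <<'EOF'
from dataclay import DataClayObject, activemethod


class Person(DataClayObject):
    name: str
    age: int

    def __init__(self, name, age):
        self.name = name
        self.age = age

    @activemethod
    def add_year(self):
        self.age += 1
EOF
    fi
}
```

Run prepare_model_dir before docker compose up -d. Because the folder is mounted with :ro, backends can read the classes but cannot modify them.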

To start the deployment, run the following from the directory that contains the docker-compose file:

docker compose up -d
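Because docker compose up -d returns as soon as the containers are created, you may want to confirm that every service actually reached the running state. The sketch below is a hypothetical helper named check_dataclay_services. It relies on the Docker Compose v2 flags --services and --filter status=running of docker compose ps, and it takes the service names from the compose file above as arguments.

```shell
#!/bin/sh
# Hypothetical helper: report each expected service that is not running.
# Returns 0 if all given services are running, 1 otherwise.
check_dataclay_services() {
    running=$(docker compose ps --services --filter status=running)
    missing=0
    for svc in "$@"; do
        # -x: the whole line must match the service name exactly.
        if ! printf '%s\n' "$running" | grep -qx "$svc"; then
            echo "not running: $svc" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_dataclay_services redis metadata-service backend && echo "dataClay is up". If a service is missing, docker compose logs metadata-service (or the name of the failing service) usually shows the reason.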

The compose file above deploys a dataClay instance with a single backend. To deploy three backends, add two more backend services under "services:" in the previous docker-compose file:

backend_2:
  image: "ghcr.io/bsc-dom/dataclay:edge"
  depends_on:
    - redis
  environment:
    - DATACLAY_KV_HOST=redis
  command: python -m dataclay.backend
  volumes:
    - ./model:/workdir/model:ro

backend_3:
  image: "ghcr.io/bsc-dom/dataclay:edge"
  depends_on:
    - redis
  environment:
    - DATACLAY_KV_HOST=redis
  command: python -m dataclay.backend
  volumes:
    - ./model:/workdir/model:ro
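When you need more than a couple of extra backends, copying the block by hand becomes error-prone. The hypothetical gen_backends helper below prints backend_2 through backend_N in the same shape as the snippet above. Each entry is indented by two spaces so it can be pasted directly under "services:". The image, command and volume are the ones used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print backend_2 ... backend_N service definitions.
# With N <= 1 nothing is printed (backend already exists in the base file).
gen_backends() {
    n="${1:-1}"
    i=2
    while [ "$i" -le "$n" ]; do
        cat <<EOF
  backend_$i:
    image: "ghcr.io/bsc-dom/dataclay:edge"
    depends_on:
      - redis
    environment:
      - DATACLAY_KV_HOST=redis
    command: python -m dataclay.backend
    volumes:
      - ./model:/workdir/model:ro

EOF
        i=$((i + 1))
    done
}
```

For instance, gen_backends 3 prints exactly the backend_2 and backend_3 services shown above.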

The following table lists all environment variables that can be used to configure dataClay, the services that read each one, and its default value.

Environment Variable       Description                           Service                    Default Value
-------------------------  ------------------------------------  -------------------------  -------------------------
DATACLAY_KV_HOST           The key-value store hostname          metadata, backend
DATACLAY_KV_PORT           The key-value store port              metadata, backend          6379
DATACLAY_ID                The dataClay instance ID              metadata                   random
DATACLAY_METADATA_HOST     The metadata hostname                 metadata                   socket hostname
DATACLAY_METADATA_PORT     The metadata port                     metadata                   16587
DATACLAY_PASSWORD          The admin password                    metadata                   admin
DATACLAY_USERNAME          The admin username                    metadata                   admin
DATACLAY_DATASET           The admin dataset                     metadata                   admin
DATACLAY_BACKEND_ID        The backend ID                        backend                    random
DATACLAY_BACKEND_NAME      The backend name                      backend
DATACLAY_BACKEND_HOST      The backend hostname                  backend                    socket hostname
DATACLAY_BACKEND_PORT      The backend port                      backend                    6867
DATACLAY_STORAGE_PATH      The backend storage path              backend                    /data/storage/
DATACLAY_LISTEN_ADDRESS    The listen address                    metadata, backend          0.0.0.0
DATACLAY_LOGLEVEL          The log level                         metadata, backend, client  warning
DATACLAY_METRICS           Enable metrics                        metadata, backend, client  false
DATACLAY_TRACING           Enable tracing                        metadata, backend, client  false
DATACLAY_TRACING_EXPORTER  The tracing exporter (otlp, console)  metadata, backend, client  otlp
DATACLAY_TRACING_HOST      The tracing host                      metadata, backend, client  localhost
DATACLAY_TRACING_PORT      The tracing port                      metadata, backend, client  4317
DATACLAY_SERVICE_NAME      The service name                      metadata, backend, client  metadata, backend, client
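As a quick pre-flight check before starting services or a client, the sketch below prints the value a few settings would take, falling back to the defaults from the table when a variable is unset or empty. It also rejects a tracing exporter other than the two listed (otlp, console). The function name print_dataclay_config is hypothetical. The function only mirrors the table and is not how dataClay itself parses its settings.

```shell
#!/bin/sh
# Hypothetical helper: show effective values (table defaults applied via
# ${VAR:-default}) and fail on an unsupported tracing exporter.
print_dataclay_config() {
    exporter="${DATACLAY_TRACING_EXPORTER:-otlp}"
    case "$exporter" in
        otlp|console) ;;
        *) echo "invalid DATACLAY_TRACING_EXPORTER: $exporter" >&2; return 1 ;;
    esac
    echo "DATACLAY_KV_PORT=${DATACLAY_KV_PORT:-6379}"
    echo "DATACLAY_METADATA_PORT=${DATACLAY_METADATA_PORT:-16587}"
    echo "DATACLAY_BACKEND_PORT=${DATACLAY_BACKEND_PORT:-6867}"
    echo "DATACLAY_STORAGE_PATH=${DATACLAY_STORAGE_PATH:-/data/storage/}"
    echo "DATACLAY_LISTEN_ADDRESS=${DATACLAY_LISTEN_ADDRESS:-0.0.0.0}"
    echo "DATACLAY_LOGLEVEL=${DATACLAY_LOGLEVEL:-warning}"
    echo "DATACLAY_TRACING=${DATACLAY_TRACING:-false}"
    echo "DATACLAY_TRACING_EXPORTER=$exporter"
}
```

In a compose deployment, these same variables are set per service in the environment list, as in the DATACLAY_LOGLEVEL=info entries above.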

Account Management

First, export the DC_HOST environment variable with the address of the host running the metadata service (here, the local machine):

export DC_HOST=127.0.0.1

To create a new account:

dataclayctl new_account john s3cret

To create a new dataset:

dataclayctl new_dataset john s3cret mydataset
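Account and dataset creation are usually done together. The hypothetical helper below wraps the two dataclayctl commands shown above and does not try to create the dataset if account creation fails. It also stops early if DC_HOST has not been set. It adds no dataclayctl options beyond those used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: run the two dataclayctl commands above in sequence.
# Assumes DC_HOST has been exported as described earlier.
create_account_and_dataset() {
    if [ "$#" -ne 3 ]; then
        echo "usage: create_account_and_dataset USER PASSWORD DATASET" >&2
        return 2
    fi
    if [ -z "${DC_HOST:-}" ]; then
        echo "DC_HOST is not set; export it first" >&2
        return 1
    fi
    # Only create the dataset if the account was created successfully.
    dataclayctl new_account "$1" "$2" &&
        dataclayctl new_dataset "$1" "$2" "$3"
}
```

For example, create_account_and_dataset john s3cret mydataset runs the same two commands as above.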