
Using validation mechanisms

Utilizing transpiled schemas

So, you have transpiled your schemas into Pydantic models (found under the PHAISTOS__SCHEMA_PATH path).

Now what? Well, you can use them to validate data against the defined schemas.

Suppose we have a schema defined in a file named person.yaml:

person:
  name:
    type: str
    description: The name of the person
  age:
    type: int
    description: The age of the person
    manager: |
        if age < 18:
            raise ValueError("The age must be at least 18")
  email:
    type: str
    description: The email of the person

This YAML manifest defines a schema named person that has three fields: name, age, and email. The age field has a custom manager that checks if the age is at least 18.
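To make the age rule concrete, the sketch below restates the manager snippet as a plain Python function. It only illustrates the check itself, not how Phaistos transpiles or executes it; the names `check_age` and `collect_errors` are hypothetical and do not come from the library.

```python
def check_age(age: int) -> None:
    """Plain-Python restatement of the `manager` snippet from person.yaml."""
    if age < 18:
        raise ValueError("The age must be at least 18")


def collect_errors(age: int) -> list[str]:
    """Hypothetical helper: run the check and gather messages instead of raising."""
    try:
        check_age(age)
    except ValueError as exc:
        return [str(exc)]
    return []


print(collect_errors(30))  # []
print(collect_errors(17))  # ['The age must be at least 18']
```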

Here is a simple example of how you can use the Manager object to validate data against the schema:

from phaistos import Manager

# Initialize the Manager
manager = Manager.start()

# Data to validate against the schema
data = {
    "name": "John Doe",
    "age": 30,
    "email": "joe@yahoo.com"
}

schema_name = "person"

# Validate the data against the schema
result = manager.validate(
    data=data,
    schema=schema_name
)

As you can see, the Manager object is instantiated through the start method because it is a singleton: only one instance of the Manager object can exist in your application.

phaistos.manager.Manager.start

Source code in phaistos/manager.py
@classmethod
def start(
    cls,
    discover: bool = True,
    schemas_path: str = ''
) -> Manager:
    cls._discover = discover
    if 'PHAISTOS__DISABLE_SCHEMA_DISCOVERY' in os.environ:
        cls._discover = False
    if not cls._started:
        cls._current_schemas_path = schemas_path or os.environ.get('PHAISTOS__SCHEMA_PATH', '')
        cls.__instance = cls()
    return cls.__instance  # type: ignore

Trying to directly instantiate the Manager object will raise an error:

from phaistos import Manager

# This will raise an error
manager = Manager()
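The start-only pattern can be sketched with a small stand-in class. This is a generic illustration of a classmethod-based singleton, not the actual Phaistos implementation: the class name `SingletonManager`, the `_starting` flag, and the choice of `RuntimeError` are all assumptions made for the example.

```python
from __future__ import annotations


class SingletonManager:
    """Hypothetical stand-in for a manager that must be created through start()."""

    _instance: SingletonManager | None = None
    _starting: bool = False

    def __init__(self) -> None:
        # Refuse direct construction; only start() sets the flag checked here.
        if not type(self)._starting:
            raise RuntimeError("Use SingletonManager.start() instead")

    @classmethod
    def start(cls) -> SingletonManager:
        # Create the single instance on the first call, then keep returning it.
        if cls._instance is None:
            cls._starting = True
            try:
                cls._instance = cls()
            finally:
                cls._starting = False
        return cls._instance


first = SingletonManager.start()
second = SingletonManager.start()
print(first is second)  # True

try:
    SingletonManager()
except RuntimeError as exc:
    print(exc)  # Use SingletonManager.start() instead
```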

Below is the signature of the validate method:

phaistos.manager.Manager.validate

Source code in phaistos/manager.py
def validate(self, data: dict, schema: str) -> ValidationResults:
    self.logger.info(f'Validating data against schema: {schema}')
    return self.get_factory(schema).validate(data)

Validation result

After each validation, the validate method returns a ValidationResults object that contains the outcome of the validation.

phaistos.typings.ValidationResults

A dataclass that represents the results of a validation.

Attributes:

valid (bool): A boolean that represents if the data is valid.
schema (dict): The schema of the data.
errors (list[FieldValidationErrorInfo]): A list of field validation errors.
data (dict): The data that was validated.

Source code in phaistos/typings.py
@dataclasses.dataclass(kw_only=True)
class ValidationResults:
    """
    A dataclass that represents the results of a validation.

    Attributes:
        valid (bool): A boolean that represents if the data is valid.
        schema (dict): The schema of the data.
        errors (list[FieldValidationErrorInfo]): A list of field validation errors.
        data (dict): The data that was validated.
    """
    schema: dict
    errors: list[FieldValidationErrorInfo]
    data: dict = dataclasses.field(default_factory=dict)
    valid: bool = dataclasses.field(init=False)

    def __post_init__(self) -> None:
        self.valid = len(self.errors) == 0

    def __str__(self) -> str:
        is_data_valid = 'Yes' if self.valid else 'No'
        errors_printout = '\nReasons:\n' + '\n'.join(
            f'  {error}'
            for error in self.errors
        ) if self.errors else ""
        return f'Is data valid?: {is_data_valid}{errors_printout}'

You can access these attributes to get more information about the validation result:

if result.valid:
    print("Data is valid!")
else:
    print("Data is invalid!")
    print("Errors:")
    for error in result.errors:
        print(error)

That's it! You have successfully validated data against a schema using the Manager object.

Get the data model factory

If you need the model factory for a schema (the object that wraps the transpiled Pydantic model), you can use the get_factory method:

from phaistos import Manager

# Initialize the Manager
manager = Manager.start()

schema_name = "person"

# Get the model factory for the schema
model_factory = manager.get_factory(schema_name)

phaistos.manager.Manager.get_factory

Get a schema factory by name

Parameters:

name (str, required): The name of the schema

Returns:

SchemaInstancesFactory: The schema factory, that can be used to validate data and create instances of the model

Source code in phaistos/manager.py
def get_factory(self, name: str) -> SchemaInstancesFactory:
    """
    Get a schema factory by name

    Args:
        name (str): The name of the schema

    Returns:
        SchemaInstancesFactory: The schema factory, that can be used to validate data and create instances of the model
    """
    if name not in self._schemas:
        raise SchemaLoadingException(
            f'Schema {name} not found'
        )
    return self._schemas[name]

The get_factory method returns a factory object that can be used to create instances of the Pydantic model.
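Since get_factory raises SchemaLoadingException for an unknown name, callers may want to guard the lookup. The sketch below mirrors only the lookup logic with stand-ins so it runs on its own: `RegistrySketch` is hypothetical, and the exception is redefined locally because its import path is not shown here.

```python
class SchemaLoadingException(Exception):
    """Local stand-in for the exception raised by get_factory."""


class RegistrySketch:
    """Hypothetical registry mirroring the lookup-or-raise logic of get_factory."""

    def __init__(self, schemas: dict[str, object]) -> None:
        self._schemas = schemas

    def get_factory(self, name: str) -> object:
        if name not in self._schemas:
            raise SchemaLoadingException(f'Schema {name} not found')
        return self._schemas[name]


registry = RegistrySketch({"person": object()})

try:
    registry.get_factory("persn")  # misspelled schema name
except SchemaLoadingException as exc:
    print(exc)  # Schema persn not found
```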

phaistos.schema.SchemaInstancesFactory

A dataclass that represents a validation schema.

Attributes:

name (str): The name of the schema.
_model (type[TranspiledSchema]): The model of the schema, used for validation.

Source code in phaistos/schema.py
@dataclasses.dataclass(kw_only=True)
class SchemaInstancesFactory:
    """
    A dataclass that represents a validation schema.

    Attributes:
        name (str): The name of the schema.
        _model (type[TranspiledSchema]): The model of the schema, used for validation.
    """
    name: str
    _model: type[TranspiledSchema]
    errors: list[FieldValidationErrorInfo] = dataclasses.field(default_factory=list)

    def validate(self, data: dict) -> ValidationResults:
        """
        Validate the given data against the schema. Do not return
        the validated data, only the validation results.

        Args:
            data (dict): The data to validate.

        Returns:
            ValidationResults: The validation results, including the schema, errors, and data.
        """
        self._model(**data)
        collected_errors = [
            *set(self._model.parent._validation_errors)  # pylint: disable=protected-access
        ]
        self.errors = collected_errors
        return ValidationResults(
            schema=self._model.model_json_schema(),
            errors=collected_errors,
            data=data
        )

    def build(self, data: dict[str, typing.Any]) -> TranspiledSchema | None:
        return self._model(**data) if self.validate(data).valid else None

validate(data)

Validate the given data against the schema. Do not return the validated data, only the validation results.

Parameters:

data (dict, required): The data to validate.

Returns:

ValidationResults: The validation results, including the schema, errors, and data.

Source code in phaistos/schema.py
def validate(self, data: dict) -> ValidationResults:
    """
    Validate the given data against the schema. Do not return
    the validated data, only the validation results.

    Args:
        data (dict): The data to validate.

    Returns:
        ValidationResults: The validation results, including the schema, errors, and data.
    """
    self._model(**data)
    collected_errors = [
        *set(self._model.parent._validation_errors)  # pylint: disable=protected-access
    ]
    self.errors = collected_errors
    return ValidationResults(
        schema=self._model.model_json_schema(),
        errors=collected_errors,
        data=data
    )

This model factory can then be used to create instances of the underlying data classes:

data = {
    "name": "John Doe",
    "age": 30,
    "email": "xxx@yahoo.com"
}

# Create an instance of the Pydantic model using the factory
instance = model_factory.build(data)

This approach is useful when the data is to be used in a more object-oriented way, e.g. when creating database entries. As shown, the build method creates an instance of the Pydantic model.

phaistos.schema.SchemaInstancesFactory.build

Source code in phaistos/schema.py
def build(self, data: dict[str, typing.Any]) -> TranspiledSchema | None:
    return self._model(**data) if self.validate(data).valid else None

If the data is valid, the constructor of the Pydantic model will not raise any exceptions, and the instance will be created. If the data is invalid, the result will be None, and the errors from the last validation can then be accessed via the errors attribute of the model factory:

factory = manager.get_factory(schema_name)

# Create an instance of the Pydantic model
instance = factory.build(data)

if instance is None:
    print(factory.errors)

It is worth noting that the validate method is also available on the model factory:

data = {
    "name": "John Doe",
    "age": 30,
    "email": "xxx@gmail.com"
}

# Validate the data against the schema
result = model_factory.validate(data)

So, why is the validate method also included in the Manager itself? It is just syntactic sugar: it retrieves the model factory and then calls the validate method on it with the given data.