Jobs
Models
Many data models are associated with Jobs, mainly because the nested elements differ between endpoints. Here’s a brief overview of the current models (this is not a complete list):
API Endpoint | Method | Data class |
---|---|---|
Business Policy | POST | JobDetails |
Business Policy | PUT | JobDetails |
Custom Field | POST | JobDetailsCustomFieldPost |
Custom Field Value Async | PUT | JobDetails |
Data Quality Fields | DELETE | JobDetailsDataQuality |
Data Quality Fields | POST | JobDetailsDataQuality |
Data Quality Value | DELETE | JobDetailsDataQuality |
Data Quality Value | POST | JobDetailsDataQuality |
Document | DELETE | JobDetailsDocumentDelete |
Document Hub Folder | DELETE | JobDetailsDocumentHubFolderDelete |
Document Hub Folder | POST | JobDetailsDocumentHubFolderPost |
Document Hub Folder | PUT | JobDetailsDocumentHubFolderPut |
Document | POST | JobDetailsDocumentPost |
Document | PUT | JobDetailsDocumentPut |
RDBMS Column | POST | JobDetailsRdbms |
RDBMS Table | POST | JobDetailsRdbms |
RDBMS Schema | POST | JobDetailsRdbms |
Term | POST | JobDetailsDocumentPost |
Term | PUT | JobDetailsDocumentPut |
User - Remove Duplicated User Accounts | POST | JobDetails |
Virtual Data Source | POST | JobDetailsVirtualDatasourcePost |
Virtual Filesystem | POST | JobDetails |
Methods
init
__init__(access_token: str, session: requests.Session, host: str, job_response: dict)
Creates an instance of the Job object.
Args:
- access_token (str): Alation REST API Access Token.
- session (requests.Session): Python requests common session.
- host (str): Alation URL.
- job_response (dict): Alation REST API Async Job Details.
check_job_status
check_job_status()
Query the Alation background job and log status until the job has completed.
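The polling behaviour described above can be sketched roughly as follows. This is not the SDK's implementation: `poll_until_done`, the injected `fetch_job` callable, the `running` and `failed` status values and the retry limits are all assumptions made for illustration. Only the `successful` status appears in this document.

```python
import time
from typing import Callable

# Assumption: only 'successful' is confirmed by the docs; 'failed' is a placeholder.
TERMINAL_STATUSES = {"successful", "failed"}


def poll_until_done(fetch_job: Callable[[], dict],
                    interval: float = 1.0,
                    max_polls: int = 60) -> dict:
    """Call fetch_job() repeatedly until the job reports a terminal status."""
    for _ in range(max_polls):
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"Job did not finish after {max_polls} polls")


# Usage with a fake fetcher that returns canned responses instead of calling an API.
responses = iter([
    {"status": "running"},  # hypothetical intermediate status
    {"status": "successful", "msg": "done"},
])
final = poll_until_done(lambda: next(responses), interval=0)
```

Injecting the fetch function keeps the loop testable without a live Alation instance.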
_get_job
_get_job() -> JobDetails
Query the Alation Job.
Returns:
- JobDetails: Alation Job.
For Allie-SDK Developers
Design Decisions
Job results are not standardised: nearly every endpoint and method combination uses its own data model. Ideally, we create a specific data class for each API endpoint and method combination.
Previously, the transformation from the returned dict to an object based on a data class was done in a global/core module (job.py). This didn’t provide any context (endpoint name and method).
The solution was to have job.py return just the vanilla job object and to do the transformation/mapping in the specific methods (e.g. post_custom_fields in custom_field.py). This provides the necessary context.
Example of vanilla job details returned by a successful custom field POST request:
[
    {
        'status': 'successful',
        'msg': 'Job finished in 0.025975 seconds at 2024-09-17 12:50:42.653908+00:00',
        'result': [
            '{"msg": "Starting bulk creation of Custom Fields...", "data": {}}',
            '{"msg": "Finished bulk creation of Custom Fields", "data": {"field_ids": [10323]}}'
        ]
    }
]
Note that the result includes several messages, apparently one emitted at the start of the process and one at the end. In the specific method post_custom_fields in custom_field.py we map this to the relevant data class.
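As a rough illustration of that mapping step, the sketch below parses the JSON strings in `result` and collects the created field IDs. The `CustomFieldPostResult` class and the `parse_custom_field_result` function are hypothetical names invented for this example. They are not the SDK's actual data classes.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CustomFieldPostResult:  # hypothetical class, not part of Allie-SDK
    msg: str
    field_ids: list = field(default_factory=list)


def parse_custom_field_result(job: dict) -> list:
    """Decode each JSON string in job['result'] into a small data class."""
    parsed = []
    for raw in job.get("result", []):
        entry = json.loads(raw)          # each result item is a JSON string
        data = entry.get("data") or {}   # the first message carries an empty dict
        parsed.append(
            CustomFieldPostResult(msg=entry["msg"],
                                  field_ids=data.get("field_ids", []))
        )
    return parsed


job = {
    "status": "successful",
    "msg": "Job finished in 0.025975 seconds at 2024-09-17 12:50:42.653908+00:00",
    "result": [
        '{"msg": "Starting bulk creation of Custom Fields...", "data": {}}',
        '{"msg": "Finished bulk creation of Custom Fields", "data": {"field_ids": [10323]}}',
    ],
}
results = parse_custom_field_result(job)
print(results[-1].field_ids)  # prints [10323]
```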
Errors
We could map the error data to JobDetails directly within the _map_request_error_to_job_details and _map_batch_error_to_job_details methods. However, that would mean that at the main function level (e.g. put_custom_field_values) we would need to check whether the returned list contains objects based on a data class (e.g. JobDetails) or dicts. The logic could get complex, and every main function would have to implement it.
Instead, we decided to simply return a dict from the _map_request_error_to_job_details and _map_batch_error_to_job_details methods, so that at the main function level (e.g. put_custom_field_values) we can just call JobDetails.from_api_response(item) for anything that gets returned.
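A minimal sketch of why returning plain dicts keeps the calling code uniform. `SimpleJobDetails` and `map_error_to_job_dict` are simplified, hypothetical stand-ins for JobDetails and the private mapping methods, and the `failed` status value is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SimpleJobDetails:  # hypothetical stand-in for JobDetails
    status: str
    msg: str = ""
    result: Any = None

    @classmethod
    def from_api_response(cls, data: dict) -> "SimpleJobDetails":
        return cls(status=data.get("status", ""),
                   msg=data.get("msg", ""),
                   result=data.get("result"))


def map_error_to_job_dict(error: Any) -> dict:
    """Hypothetical helper: wrap an error payload in a job-shaped dict."""
    # Assumption: 'failed' is used as the status for errors.
    return {"status": "failed", "msg": "", "result": error}


success = {"status": "successful", "msg": "Job finished", "result": []}
error = map_error_to_job_dict(
    [{"options": ['Expected a list of items but got type "str".']}]
)

# Successes and errors are both dicts, so one call handles every item.
jobs = [SimpleJobDetails.from_api_response(item) for item in (success, error)]
```

Because every item is a dict until this point, the main function needs no type checks before mapping.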
This in turn means that variations of JobDetails have to implement some logic to store this error data within their structure, but that logic is managed in only one place, so it’s easier to maintain.
Errors are returned either as a list or as a dictionary:
Example of an error returned as a list (in this case by the custom field PUT API endpoint):
[{'options': ['Expected a list of items but got type "str".']}]
Example of an error returned as a dict (in this case by the Custom Field Value PUT API endpoint):
{'code': '400000', 'detail': 'Please check the API documentation for more details on the spec.', 'errors': [{'non_field_errors': ['No support for updating `description` field for `data`.']}], 'title': 'Invalid Payload'}
In general, we don’t attempt to map these error structures to a specific data structure, since covering all error messages would take a long time. To make sure that the results returned by Allie-SDK functions are always consistent, we nest these error messages/structures within the result property of a JobDetails (or similar) based object.
To make sure we integrate these errors correctly into the JobDetails (or similar) structure, we need to be extra careful with the nested data structure mapping. In the example below, we map result to JobDetailsDocumentPostResult if the properties created_term_count and created_terms are present in the dictionary. In all other cases the error details, whether of type list or dict, are left untouched and stay as is within the existing structure:
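from dataclasses import dataclass

# JobDetails and JobDetailsDocumentPostResult are defined elsewhere in the SDK.
@dataclass(kw_only = True)
class JobDetailsDocumentPost(JobDetails):
    def __post_init__(self):
        # Make sure the nested result gets converted to the proper data class
        if isinstance(self.result, dict):
            # Convert only when both expected keys are present; error payloads stay as is
            if all(key in self.result for key in ("created_term_count", "created_terms")):
                self.result = JobDetailsDocumentPostResult.from_api_response(self.result)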