Dataflows Lineage V2
Models
Dataflow
Python object used to define a dataflow object to be created create_or_replace_dataflows.
Attributes:
| Name | Required | Type | Description |
|---|---|---|---|
| external_id | TRUE | str | The external id of the dataflow object. To access this dataflow object uniquely.. “API/” previx is required. |
| title | FALSE | str | Title of the dataflow. |
| description | FALSE | str | The description of the dataflow. |
| content | FALSE | str | The transformation logic, SQL statement. |
| group_name | FALSE | str | The name of the dataflow source group that the dataflow belongs to. Case sensitive. If an exact match is found then it would be used, otherwise it will create a new dataflow source group. |
DataflowPathObject
Python object used to define a lineage path.
Attributes:
| Name | Required | Type | Description |
|---|---|---|---|
| otype | TRUE | str | Type of the object for this lineage segment. (table, attibute, dataflow, etc) |
| key | TRUE | str | ULM key to identify the lineage path object |
DataflowPayload
Python object used to define dataflow objects and lineage paths to be created create_or_replace_dataflows.
Attributes:
| Name | Required | Type | Description |
|---|---|---|---|
| dataflow_objects | FALSE | list | The external id of the dataflow object. To access this dataflow object uniquely.. “API/” previx is required. |
| paths | FALSE | list | “paths” is an array of “path”s. Each “path” specifies the details of sources (-> dataflows) -> targets lineage by listing elements of each step, or “segment”, of the lineages in order.Each “segment” may contain data objects and/or dataflows, but the first and the last “segment” of a “path” SHOULD NOT contain any dataflows. |
DataflowPatchItem
Python object used to define dataflow items to update/patch patch_dataflows.
Attributes:
| Name | Required | Type | Description |
|---|---|---|---|
| id | TRUE | str | The id of the dataflow object. |
| title | FALSE | str | Title of the dataflow. |
| description | FALSE | str | The description of the dataflow. |
| content | FALSE | str | The transformation logic, SQL statement. |
| group_name | FALSE | str | The name of the dataflow source group that the dataflow belongs to. Case sensitive. If an exact match is found then it would be used, otherwise it will create a new dataflow source group. |
Methods
Methods
get_dataflows
get_dataflows(object_ids: list, query_params: DataflowParams) -> DataflowPayload
Get Dataflow objects with related lineage paths
Args:
- object_ids (list): Optional argument to filter dataflow object by.
- query_params: (DataflowParams) : Filter by param (id, external_id)
Returns:
- DataFlowPayload: Resulting object containing a list of requested DataFlow objects and their lineage paths.
create_or_replace_dataflows
create_or_replace_dataflows(payload: DataflowPayload) -> list[JobDetailsDataflowPost]
Create/Replace Dataflow objects with related lineage paths
Args:
- payload (DataflowPayload): Data Class containing lsts of DataFlow and DataFlowPaths objects create or update.
Returns:
- List of JobDetails: Status report of the executed background jobs.
update_dataflows
update_dataflows(payload: list[DataflowPatchItem] -> list[JobDetailsDataflowPost]
Update Dataflow objects definition. This method cannot be used to update lineage paths.
Args:
- payload (DataflowPatchItem(list)): List of DataFlows (DataFLowPatchItem) to update
Returns:
- List of JobDetails: Status report of the executed background jobs.
delete_dataflows
delete_dataflows(object_ids: list, query_params: DataflowParams) -> list[JobDetailsDataflowPost]
Get Dataflow objects with related lineage paths
Args:
- object_ids (list): Optional argument to filter dataflow object by.
- query_params: (DataflowParams) : Filter by param (id, external_id)
Returns:
- List of JobDetails: Status report of the executed background jobs.
Examples
See /examples/example_dataflow.py.