Publish

This page explains the DataHub publishing flow from both the client and the back-end perspectives.

graph TD
  cli((CLI fa:fa-user))
  auth[Auth Service]
  cli --login--> auth
  cli --store--> raw[Raw Store API<br>+ Storage]
  cli --package-info--> pipeline-store
  raw --data resource--> pipeline-runner
  pipeline-store -.generate.-> pipeline-runner
  pipeline-runner --> package[Package Storage]
  package --api--> frontend[Frontend]
  frontend --> user[User fa:fa-user]
  package -.publish.-> metastore[MetaStore]
  pipeline-store -.publish.-> metastore[MetaStore]
  metastore[MetaStore] --api--> frontend

Diagram for upload process

graph TD
  CLI --jwt--> rawstore[RawStore API]
  rawstore --signed urls--> CLI
  CLI --upload using signed url--> s3[S3 bucket]
  s3 --success message--> CLI
  CLI --metadata--> pipe[Pipe Source]
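
The order of the upload steps in the diagram above can be sketched as a small orchestration function. This is a minimal sketch under stated assumptions, not the DataHub CLI's actual code: the three callables (`request_signed_urls`, `put_file`, `send_metadata`) and the shapes of their arguments are hypothetical stand-ins for the RawStore API call, the signed-URL upload to S3, and the metadata call to the pipe source.

```python
from typing import Callable, Dict, List


def upload_package(
    jwt: str,
    files: Dict[str, bytes],
    metadata: dict,
    request_signed_urls: Callable[[str, List[str]], Dict[str, str]],
    put_file: Callable[[str, bytes], bool],
    send_metadata: Callable[[str, dict], bool],
) -> bool:
    """Run the upload steps in the order shown in the diagram.

    All three callables are hypothetical: they stand in for the RawStore
    API, the signed-URL upload to S3, and the metadata call respectively.
    """
    # 1. CLI --jwt--> RawStore API, which answers with one signed URL per file.
    signed_urls = request_signed_urls(jwt, sorted(files))

    # 2. CLI --upload using signed url--> S3 bucket.
    for name in sorted(files):
        if not put_file(signed_urls[name], files[name]):
            # Stop before sending metadata if any upload failed.
            return False

    # 3. CLI --metadata--> Pipe Source, only after every upload succeeded.
    return send_metadata(jwt, metadata)
```

Passing the network operations in as callables keeps the ordering logic testable without a live service; a real client would supply functions that perform the HTTP requests.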

Identity Pipeline

Context: where this pipeline fits in the system

graph LR
  specstore --shared db--> assembler
  assembler --identity pipeline--> pkgstore
  pkgstore --> frontend

Detailed steps

graph LR
  load[Load from RawStore] --> encoding[Encoding Check<br>Add encoding info]
  encoding --> csvkind[CSV kind check]
  csvkind --> validate[Validate data]
  validate --> dump[Dump S3]
  dump --> pkgstore[Pkg Store fa:fa-database]
  load -.-> dump
  validate --> checkoutput[Validation<br>Reports]
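
The steps in the diagram above can be illustrated with a short sketch. It is only an illustration of the sequence of checks, not the pipeline's real implementation: the function names, the UTF-8-then-Latin-1 fallback, the delimiter candidates, and the "every row has as many fields as the header" rule are all assumptions chosen to keep the example small. As in the diagram, the data itself passes through unchanged, and the checks only add information and a validation report.

```python
import csv
import io
from typing import Dict, List


def detect_encoding(raw: bytes) -> str:
    """Encoding check: try UTF-8 first, fall back to Latin-1 (assumed policy)."""
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"


def detect_delimiter(text: str) -> str:
    """CSV kind check: pick the candidate that occurs most often in line 1."""
    first_line = text.splitlines()[0] if text else ""
    candidates = [",", ";", "\t", "|"]
    # On a tie (including an empty line) max() returns the first candidate, ",".
    return max(candidates, key=first_line.count)


def validate_rows(rows: List[List[str]]) -> List[str]:
    """Validate data: report every row whose width differs from the header."""
    if not rows:
        return ["file is empty"]
    width = len(rows[0])
    return [
        f"row {i}: expected {width} fields, got {len(row)}"
        for i, row in enumerate(rows[1:], start=2)
        if len(row) != width
    ]


def identity_pipeline(raw: bytes) -> Dict[str, object]:
    """Load -> encoding check -> CSV kind check -> validate -> dump."""
    encoding = detect_encoding(raw)
    text = raw.decode(encoding)
    delimiter = detect_delimiter(text)
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    report = validate_rows(rows)
    # "Dump": the original bytes are returned untouched, next to the findings.
    return {"data": raw, "encoding": encoding, "delimiter": delimiter, "report": report}
```

For example, `identity_pipeline(b"a;b\n1;2\n3\n")` detects `;` as the delimiter and reports that row 3 has one field where two were expected.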

Client Perspective

To publish a package, the client goes through the following steps when communicating with the DataHub API:

sequenceDiagram
  Upload Agent CLI->>Upload Agent CLI: Check Data Package valid
  Upload Agent CLI-->>Auth(SSO): login
  Auth(SSO)-->>Upload Agent CLI: JWT token
  Upload Agent CLI->>RawStore API: upload using signed url
  RawStore API->>Auth(SSO): Check key / token
  Auth(SSO)->>RawStore API: OK / Not OK
  RawStore API->>Upload Agent CLI: success message
  Upload Agent CLI->>pipeline store: package info
  pipeline store->>Upload Agent CLI: OK / Not OK
  pipeline store->>pipeline runner: generate
  RawStore API->>pipeline runner: data resource
  pipeline runner->>Package Storage: generated
  Package Storage->>Metadata Storage API: publish
  pipeline store->>Metadata Storage API: publish
  Metadata Storage API->>Upload Agent CLI: OK / Not OK


  • Upload API - see POST /source/upload in the source section of the API.
  • Authentication API - see GET /auth/check in the auth section of the API.
  • Authorization API - see GET /auth/authorize in the auth section of the API.
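
As a rough illustration of how a client might address the three endpoints listed above, the sketch below builds (but does not send) one request per endpoint using Python's standard library. Only the methods and paths come from the list; the base URL `https://api.example.invalid`, the use of an `Authorization: Bearer` header to carry the JWT, and the helper names are assumptions made for the example, so check the API reference for the real parameters.

```python
import urllib.request
from typing import Dict

# Hypothetical base URL; replace with the real API host.
API_BASE = "https://api.example.invalid"

# Methods and paths as listed above.
ENDPOINTS = {
    "upload": ("POST", "/source/upload"),
    "authentication": ("GET", "/auth/check"),
    "authorization": ("GET", "/auth/authorize"),
}


def build_request(name: str, jwt: str) -> urllib.request.Request:
    """Build an unsent request for one of the listed endpoints."""
    method, path = ENDPOINTS[name]
    return urllib.request.Request(
        API_BASE + path,
        method=method,
        # Assumption: the JWT is passed as a bearer token.
        headers={"Authorization": "Bearer " + jwt},
    )


def build_all(jwt: str) -> Dict[str, urllib.request.Request]:
    """Build one unsent request per endpoint, keyed by endpoint name."""
    return {name: build_request(name, jwt) for name in ENDPOINTS}
```

Each request could then be sent with `urllib.request.urlopen`, once the base URL and any required query parameters or body have been filled in from the API documentation.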

See the example code snippet in the DataHub CLI.