Data warehousing design patterns for event-driven semi-structured data
We want to be able to analyze ticket events and trigger internal webhooks in real time from Zendesk. Zendesk provides an AWS EventBridge connector, which we were thinking of leveraging for this use case.
For the analytical use case, I was thinking of dumping events from EventBridge into Redshift. To do this, I'd create the table schemas up front for each Zendesk event type, then transform each event into that schema with an EventBridge listener (a rough sketch follows the list below). A few downsides came to mind:
I have the option to keep only the fields I care about, but if I ever need another field down the line, my only recourse is to re-query Zendesk to backfill the data.
If the Zendesk API changes, my pipeline may break. It feels like an endless chase to stay in sync with Zendesk's schema.
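To make the first option concrete, here's a minimal sketch of such a listener as a Lambda handler writing into a fixed Redshift schema via the Redshift Data API. The Zendesk payload shape (detail["ticket_event"]["ticket"], its field names) and the cluster/table names are assumptions, not confirmed details of the connector's output:

```python
# Sketch only: fixed-schema EventBridge listener (Option 1).
# Payload field names and cluster/table names below are assumptions --
# verify against the events your Zendesk connector actually emits.
import boto3

redshift = boto3.client("redshift-data")

def handler(event, context):
    # EventBridge wraps the partner payload under "detail".
    ticket = event["detail"].get("ticket_event", {}).get("ticket", {})

    # Only the fields the up-front schema defines survive; everything
    # else in the event is dropped here (the downside noted above).
    redshift.execute_statement(
        ClusterIdentifier="analytics-cluster",   # placeholder
        Database="analytics",                    # placeholder
        DbUser="etl_user",                       # placeholder
        Sql="""
            INSERT INTO zendesk_ticket_events
                (event_time, ticket_id, status, priority)
            VALUES (:event_time, :ticket_id, :status, :priority)
        """,
        Parameters=[
            {"name": "event_time", "value": event.get("time", "")},
            {"name": "ticket_id", "value": str(ticket.get("id", ""))},
            {"name": "status", "value": ticket.get("status", "")},
            {"name": "priority", "value": ticket.get("priority", "")},
        ],
    )
```

At any real volume you'd want to batch instead of inserting per event (e.g., buffer through Kinesis Data Firehose and load with COPY), since row-by-row INSERTs are slow on Redshift; the transform-to-fixed-schema step is the same either way.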
Another option I thought of was to dump the EventBridge output into S3, then use Glue to compact and crawl the data, and query it with AWS Athena (instead of Redshift). The upside is that I'd retain every field from the event in case I ever need it, and there'd be no pipeline breakages. However, I see the following downsides (a sketch of the raw-dump write path follows this list):
Analytical scripts / dashboards could still break if schemas change; the breakage just moves downstream from the pipeline to the queries.
In our experience, Glue + Athena have poor performance and are expensive. It's also not real time, since Glue jobs only run periodically.
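For reference, the write path for this option can be as simple as persisting the whole EventBridge event to S3 unmodified, partitioned by date so Athena can prune scans. The bucket name and key layout here are placeholders; in practice an EventBridge-to-Firehose target can do the buffering and partitioning without custom code:

```python
# Sketch only: raw-dump write path (Option 2). Stores the full event,
# so no field is ever lost and schema drift can't break the write side.
import json
import uuid
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Partition by event date (EventBridge "time" is ISO 8601).
    day = event.get("time", "unknown")[:10]  # e.g. "2024-01-15"
    key = f"zendesk-events/dt={day}/{uuid.uuid4()}.json"
    s3.put_object(
        Bucket="my-zendesk-events",          # placeholder
        Key=key,
        Body=json.dumps(event).encode("utf-8"),
    )
```

A Glue crawler over the zendesk-events/ prefix then infers the table, and fields that appear later just show up as new columns on the next crawl rather than breaking ingestion.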
For the webhook triggering, I was thinking of sending the EventBridge events to SQS (for stronger delivery / retry guarantees) and having a listener that processes the events and triggers the webhooks accordingly.
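Assuming the listener is an SQS-triggered Lambda, a minimal sketch of that fan-out looks like the following. The detail-type strings and internal webhook URLs are hypothetical; the point is that raising on failure makes SQS redeliver and eventually dead-letter the message, which is the retry guarantee motivating SQS here:

```python
# Sketch only: SQS-driven webhook fan-out. Routing table and URLs
# are hypothetical examples.
import json
import urllib.request

# Hypothetical routing: EventBridge detail-type -> internal webhook URL.
WEBHOOKS = {
    "Support Ticket: Created": "https://internal.example.com/hooks/ticket-created",
    "Support Ticket: Comment Created": "https://internal.example.com/hooks/comment",
}

def handler(event, context):
    for record in event["Records"]:          # SQS delivers a batch
        body = json.loads(record["body"])    # the original EventBridge event
        url = WEBHOOKS.get(body.get("detail-type"))
        if url is None:
            continue                          # no subscriber for this event type
        req = urllib.request.Request(
            url,
            data=json.dumps(body.get("detail", {})).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        # Any non-2xx response raises, so the message returns to the queue
        # and is retried (or dead-lettered after max receives).
        urllib.request.urlopen(req, timeout=10)
```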
Thoughts here? What other design patterns are common for this kind of use case?
-
Hi Aziz! It sounds like you've thought this through pretty well, so I don't have anything to add here. Maybe other community members will chime in with their thoughts on the topic.