Serverless Stream Data Pipeline in AWS

Written by Andreia Negreira | Aug 13, 2024 2:34:29 PM

In this video, Andreia Negreira discusses the scope, technical details, and her insights from her AWS internship project at Big Industries.

The project uses Amazon MSK for data streaming, with AWS Lambda functions acting as both producers and consumers. It ingests data from the OpenSky API and persists it to Amazon S3 and Amazon RDS, with supporting AWS services handling networking, authentication, and secrets management.

Architecture

Figure 1: Project Architecture

  • Data Ingestion: Data is ingested from the OpenSky API via Amazon API Gateway.
  • Data Processing: An AWS Lambda function acts as the producer, sending data to Amazon MSK (Managed Streaming for Apache Kafka).
  • Data Storage: Data is stored in Amazon S3 and Amazon RDS by the consumer Lambdas.
  • Security: The main VPC keeps communication between the components on a private network, with an Internet Gateway providing the necessary external access. An Amazon EC2 bastion host allows secure access to the MSK cluster.
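
As a rough illustration of the producer step above, the sketch below turns an OpenSky response into keyed Kafka records. It is not the project's actual code. It assumes the payload has the shape returned by OpenSky's state-vector endpoint (a "time" field plus a "states" list of positional rows); the topic name "flights", the function names, and the injected send callable are all hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterator, Tuple

# Names for the first 12 positions of an OpenSky state-vector row.
STATE_FIELDS = (
    "icao24", "callsign", "origin_country", "time_position", "last_contact",
    "longitude", "latitude", "baro_altitude", "on_ground", "velocity",
    "true_track", "vertical_rate",
)


def to_records(payload: Dict[str, Any]) -> Iterator[Tuple[bytes, bytes]]:
    """Yield one (key, value) byte pair per aircraft in the payload."""
    snapshot_time = payload.get("time")
    # "states" may be null when no aircraft match, so fall back to an empty list.
    for row in payload.get("states") or []:
        state = dict(zip(STATE_FIELDS, row))  # extra trailing columns are ignored
        if isinstance(state.get("callsign"), str):
            state["callsign"] = state["callsign"].strip()  # callsigns arrive space-padded
        state["snapshot_time"] = snapshot_time
        # Keying by aircraft address keeps one aircraft's updates on one partition.
        key = str(state["icao24"]).encode("utf-8")
        value = json.dumps(state, separators=(",", ":")).encode("utf-8")
        yield key, value


def publish(
    payload: Dict[str, Any],
    send: Callable[[str, bytes, bytes], Any],
    topic: str = "flights",  # hypothetical topic name
) -> int:
    """Send every record through the injected callable; return the count sent."""
    count = 0
    for key, value in to_records(payload):
        send(topic, key, value)
        count += 1
    return count
```

Inside a Lambda handler, send could wrap a real client. With the kafka-python library, for example, something like `lambda t, k, v: producer.send(t, key=k, value=v)` would fit. Injecting the callable keeps the serialization logic testable without a running cluster.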

Detailed Architecture

Figure 2: Project Architecture Details

  • Cluster Authentication: SASL/SCRAM (Simple Authentication and Security Layer/Salted Challenge Response Authentication Mechanism) is used for authentication, with the credentials stored in AWS Secrets Manager.
  • Secure Communication: TLS certificates and SCRAM profiles are managed within the cluster to ensure secure communication.
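
To make the authentication setup above more concrete, here is a minimal sketch of how a Lambda client might build its Kafka connection settings from a SASL/SCRAM secret. It assumes the secret is the JSON document with "username" and "password" keys that Amazon MSK expects, and it uses the keyword names accepted by the kafka-python client; the project's actual client library is not stated, so treat that choice, the function name, and the broker addresses as hypothetical.

```python
import json
from typing import Any, Dict


def scram_client_config(secret_string: str, bootstrap_servers: str) -> Dict[str, Any]:
    """Build Kafka client settings from an MSK SASL/SCRAM secret.

    secret_string is the raw JSON text of the secret, for example the
    "SecretString" value returned by Secrets Manager.
    bootstrap_servers is a comma-separated list of broker host:port pairs.
    """
    creds = json.loads(secret_string)
    return {
        "bootstrap_servers": bootstrap_servers.split(","),
        "security_protocol": "SASL_SSL",   # SCRAM credentials travel over TLS
        "sasl_mechanism": "SCRAM-SHA-512",  # the SCRAM variant Amazon MSK uses
        "sasl_plain_username": creds["username"],
        "sasl_plain_password": creds["password"],
    }
```

The resulting dictionary could be unpacked into a client constructor, for example `KafkaProducer(**config)` with kafka-python. Fetching the secret at runtime (for instance with boto3's `get_secret_value`) keeps the credentials out of the function code and environment variables.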

Technologies Used

  • Amazon API Gateway: To manage the OpenSky API endpoints.
  • AWS Lambda: For serverless computing, used as both producer and consumer.
  • Amazon MSK: Managed Kafka service for stream processing.
  • Amazon S3: For scalable object storage.
  • Amazon RDS: For relational database storage.
  • Amazon VPC: For creating a secure network environment.
  • Amazon EC2: For the bastion host that provides secure access to the MSK cluster.
  • AWS Secrets Manager: For managing SASL/SCRAM credentials.
  • Terraform: For infrastructure as code, automating the setup of AWS resources.
  • OpenSSL: For generating TLS certificates.
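
For the consumer role that AWS Lambda plays in the list above, the following sketch shows how a function could decode a batch of Kafka messages. It assumes the consumers are triggered through Lambda's MSK event source mapping, which delivers records grouped by topic-partition with base64-encoded values, and that each value is a JSON document; the source does not confirm either point. The function names are hypothetical, and the write to Amazon S3 or Amazon RDS is deliberately left out.

```python
import base64
import json
from typing import Any, Dict, List


def decode_msk_event(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the JSON-decoded values of every record in an MSK event."""
    messages = []
    # Records are grouped under keys such as "flights-0" (topic-partition).
    for batch in event.get("records", {}).values():
        for record in batch:
            raw = base64.b64decode(record["value"])  # values arrive base64-encoded
            messages.append(json.loads(raw))
    return messages


def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    """Hypothetical Lambda entry point; the persistence step is omitted."""
    messages = decode_msk_event(event)
    # A real consumer would write `messages` to Amazon S3 or Amazon RDS here.
    return {"decoded": len(messages)}
```

Keeping the decoding in its own function means it can be exercised locally with a hand-built event before any storage code is attached.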

Contact

For any questions or issues, contact Andreia Negreira at andreia.negreira@bigindustries.be.