AWS Data Engineer Associate

Preamble

A while back I was lucky enough to work on a Data Hub solution for a global logistics firm. I’ll be honest, I knew very little about the subject at the beginning, but I think I picked stuff up fairly well and quickly and delivered some okay stuff.

While working on it I learned about Redshift, Glue, Lake Formation and more in-depth stuff on S3 and DynamoDB. I started to get the hang of the whole data pipeline idea and how things hang together, and I started on the AWS Data Analytics Specialty course (on ACloudGuru). This was initially to gain knowledge on Redshift, but I quickly found that some aspects of the course were a bit 101, and some parts (like Redshift partitioning) were out of scope for me as the infrastructure guy - the Data Engineers and Architects were on that stuff.

I turned instead to a few different AWS certs (SA Pro, Security, DBs), but I was always tempted to return to that course. After all, I’d covered some of the content through lectures, and some I knew from experience. Also (the best reason) I actually found it generally interesting. I had a little dismay when I found the course was being retired - but what’s this? There’s an Assoc Data Engineer course instead? Interesting…

AWS Data Engineer Assoc Courses

This cert is in beta and due to come out in January 2024, but it makes sense to start prepping now. I found an exam prep section on the AWS Skill Builder site and noted I’d taken the sample questions and scored 70% a while back, and this will have been without any pre-reading/watching. A nice surprise, but looking at the actual syllabus I realised there were an awful lot of things I didn’t know at all. I’m not all that interested in just passing a cert to get the badge; I do actually want it to have meaning (like, show I know stuff).

I looked on ACloudGuru (as my work gives access to this) but I didn’t see anything there. Not too displeased, there’s only so much “Hello Cloud Gurus” I can stand.

Instead, I noted a Maarek course on Udemy, on offer at a tenner. Bargain! I really like the Maarek courses; I seem to get on well with his presentation and accent (makes a difference to me). I was a little disappointed to find the course started with some other chap going through Data Engineering Fundamentals, but I really quickly warmed to him, and the AWS-specific topics seem to be majority Maarek. Looking again, it’s not just billed as a Maarek course - the other chap is Frank Kane, and I reckon I’ll try courses with him generally in future.

The course is based on the beta syllabus, but they’ve made the point that it’ll be updated as AWS pushes out the final exams etc. Overall, while I know AWS’s own prep stuff will be good, and I’m sure there will be an ACG course some time soon, I’m very happy to have bought this course.

AWS Data Engineer Syllabus

Looking at the Udemy course contents, the syllabus seems to cover:

  • Data Engineering Fundamentals
  • Storage (S3, EFS, EBS, Backup)
  • DBs (mainly Redshift, a fair bit of DynamoDB, touches on RDS and a few others)
  • Migration (DMS, DataSync, Snow, Transfer Family)
  • Compute (EC2, Lambda, SAM)
  • Containers (ECS, ECR, EKS - but not really heavy on this)
  • Analytics (Glue, Athena, EMR, Kinesis, OpenSearch)
  • Application Integration (SQS/SNS, Step Functions, EventBridge)
  • Security etc (IAM, KMS, Secrets, WAF - all those things)
  • Networking (R53 and CloudFront really)
  • Management & Governance (CloudWatch & CloudTrail, Parameter Store)
  • Machine Learning (SageMaker)
  • Developer Tools (CDK, Cloud9, Code*, CLI)
  • “Everything Else” (Budgets & Cost Explorer, API Gateway)

I’m seeing a lot of these covered in other courses - things like storage, compute, networking, security, management, dev tools etc are all really common topics, and quite honestly bread and butter. I guess as an Assoc course these things need to be covered, not just assumed (as you might be able to with a Specialty cert). I’ll certainly go through all bar the most basic content in case there are nuances I need for this exam - there are certainly some S3 things I’m not sure on (access points) so it’s good for the refresher there anyway!

I’m really looking forward to this course. It’ll help refresh and update my knowledge in an interesting area, and I think this along with the SysOps Assoc I just passed will be a nice additional grounding for the DevOps Pro cert I want to take later in the year. I’ve not really been massively interested in ML before, but I’ll see if the SageMaker aspects of this tickle my fancy - maybe the next Spec cert could be ML, instead of the Networking one I’ve been considering?

It’s all about learning and moving forwards. I’ve done staying still in the past, and it’s a daft sucky path to take.

Resources