Serverless Selenium: Setting Up Python Selenium in AWS Lambda
TL;DR: Follow this repo: https://github.com/LamarrD/headless-chrome
What's Selenium, Anyway?
Selenium is a powerful tool for automating web browsers. Think of it as having a virtual robot that can control your browser to do tasks like visiting websites, filling out forms, and clicking buttons, all without you lifting a finger. It mimics human actions, but it’s completely automated.
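To make that concrete, here is a minimal sketch of those three actions (visit a page, fill in a form field, click a button), written against Selenium's WebDriver methods. The function name `search_site`, the field name `q`, and the button selector are hypothetical placeholders that would have to match the real target page. The function takes an already-created driver, so the same code can run locally or in Lambda.

```python
def search_site(driver, url, query):
    """Visit `url`, type `query` into a field named "q", and click the submit button.

    `driver` is any Selenium WebDriver (e.g. webdriver.Chrome()). The field name
    and button selector are hypothetical and must match the target page.
    """
    driver.get(url)  # visit the page
    # "name" and "css selector" are the string values of By.NAME and By.CSS_SELECTOR.
    search_box = driver.find_element("name", "q")
    search_box.send_keys(query)  # fill out the form field
    driver.find_element("css selector", "button[type=submit]").click()  # click the button
    return driver.current_url  # where the browser ended up after submitting
```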
Once you’ve got a Selenium script running smoothly on your local machine, the next challenge is deploying it somewhere other than your computer. This is where AWS Lambda comes in: it’s easy to provision, cost-effective for intermittent workloads since you pay per invocation, and scales automatically. The tricky part is getting the libraries and packaging right for Lambda, but don’t worry, I’ll break it down and make it simple for you below!
Why Selenium in AWS
Running headless Selenium in AWS Lambda is a great fit for web scraping, automated testing, bots, and other browser automation tasks. However, the setup can be tricky, especially when it comes to correctly packaging Chrome, chromedriver, and Selenium. I’ve found myself repeating this process often, each time with a slightly different approach. In this post, I’ll show you the most streamlined way I’ve found: packaging everything as a Lambda layer!
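For context on what packaging code "as a Lambda layer" means: for Python runtimes, Lambda extracts layer contents to `/opt` and puts `/opt/python` on the import path, so importable modules belong under a top-level `python/` directory in the layer zip. The sketch below builds such a zip with only the standard library. The helper name `build_python_layer` and the module contents are illustrative assumptions; the repo's own build also has to ship the Chrome and chromedriver binaries and may be organized differently.

```python
import io
import zipfile


def build_python_layer(modules):
    """Return the bytes of a Lambda layer zip with each module under python/.

    `modules` maps a file name (e.g. "headless_chrome.py") to its source text.
    Lambda extracts layers to /opt, and /opt/python is on the import path for
    Python runtimes, so these modules become importable from any function
    that uses the layer.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Sorted so the archive layout is deterministic between builds.
        for name, source in sorted(modules.items()):
            zf.writestr(f"python/{name}", source)
    return buffer.getvalue()
```

You could write the result with `Path("layer.zip").write_bytes(...)` and publish it using `aws lambda publish-layer-version --layer-name <your-layer-name> --zip-file fileb://layer.zip`, where the layer name is whatever you choose.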
Setup
Before we dive in, make sure you have the following:
- An AWS account
- AWS CLI installed and configured
- Terraform installed
- Git installed
- Basic knowledge of AWS Lambda and Python
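If you want a quick sanity check before starting, this small sketch looks for the command-line tools the walkthrough calls (`git` to clone, `terraform` to deploy, `aws` to invoke) on your `PATH`. The names `REQUIRED_TOOLS` and `missing_tools` are my own; note that it only checks that the binaries exist, not that your AWS credentials are configured (running `aws sts get-caller-identity` is one way to check that).

```python
import shutil

# Command-line tools this walkthrough calls: git, terraform, and the AWS CLI.
REQUIRED_TOOLS = ("git", "terraform", "aws")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing: " + ", ".join(missing) if missing else "All required tools found.")
```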
Step 1: Clone the Repository
First, clone the repository:
```bash
git clone https://github.com/LamarrD/headless-chrome
cd headless-chrome
```
Step 2: Deploy the AWS Resources
Next, deploy the Lambda layer that includes headless Chrome and the necessary dependencies. The repo uses Terraform to create all the resources you need, including a sample Selenium Lambda function.
```bash
terraform init
terraform apply
```
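Launching the bundled Chrome inside Lambda usually takes a handful of command-line flags. The repo keeps its own configuration inside the layer, and I haven't reproduced it here; as an illustration of the kind of settings involved, this sketch collects flags that are commonly used for headless Chrome on Lambda, where only `/tmp` is writable. The flag set, the `/tmp` subdirectory names, and the helpers `lambda_chrome_flags` and `apply_flags` are assumptions, not taken from the repo.

```python
def lambda_chrome_flags(tmp_dir="/tmp"):
    """Return Chrome command-line flags commonly used for headless Chrome on Lambda.

    This is an illustrative set, not the repo's actual configuration.
    """
    return [
        "--headless=new",           # no visible window (headless mode added in Chrome 109)
        "--no-sandbox",             # Chrome's sandbox is commonly disabled on Lambda
        "--disable-dev-shm-usage",  # use a temp dir instead of /dev/shm for shared memory
        "--disable-gpu",            # there is no GPU available on Lambda
        "--window-size=1280,1696",  # fixed viewport so page layouts are predictable
        f"--user-data-dir={tmp_dir}/chrome-user-data",  # profile must be on writable storage
        f"--disk-cache-dir={tmp_dir}/chrome-cache",     # cache must be on writable storage
    ]


def apply_flags(options, flags):
    """Add each flag to a Selenium ChromeOptions-like object via add_argument()."""
    for flag in flags:
        options.add_argument(flag)
    return options
```

With Selenium installed, you would pass `apply_flags(webdriver.ChromeOptions(), lambda_chrome_flags())` as `options=` to `webdriver.Chrome`, typically also setting `options.binary_location` and a chromedriver `Service(executable_path=...)` to wherever the layer places those binaries under `/opt`.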
Now let's review the sample Lambda code:
```python
from headless_chrome import create_driver
from selenium.webdriver.common.by import By


def lambda_handler(event, context):
    """Sample handler using the create_driver helper from the layer."""
    driver = create_driver()
    try:
        driver.get("https://example.com/")
        heading = driver.find_element(By.TAG_NAME, "h1")
        return heading.text
    finally:
        # Shut down Chrome and chromedriver so they don't linger between invocations
        driver.quit()
```
The sample Lambda simply visits example.com and returns the text of the first h1 element. Notice that it uses the create_driver helper function from the Lambda layer.
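A natural next step is letting the caller choose the page. The sketch below is a hypothetical variant of the sample handler: the URL comes from an optional `url` key in the event (an assumed event shape), and `driver.quit()` runs even when the lookup fails. `make_handler` and `read_first_heading` are names I made up; the driver factory is passed in, so the logic can be exercised without a real browser.

```python
DEFAULT_URL = "https://example.com/"


def read_first_heading(driver, url):
    """Load `url` in an already-created WebDriver and return the first <h1> text."""
    driver.get(url)
    # "tag name" is the string value of Selenium's By.TAG_NAME locator strategy.
    heading = driver.find_element("tag name", "h1")
    return heading.text


def make_handler(create_driver):
    """Build a Lambda handler around any zero-argument driver factory."""

    def lambda_handler(event, context):
        url = (event or {}).get("url", DEFAULT_URL)
        driver = create_driver()
        try:
            return {"url": url, "heading": read_first_heading(driver, url)}
        finally:
            # Closes Chrome and chromedriver so processes don't linger
            # in a reused (warm) execution environment.
            driver.quit()

    return lambda_handler
```

In the Lambda module itself, you would wire it to the layer with `from headless_chrome import create_driver` followed by `lambda_handler = make_handler(create_driver)`.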
Step 3: Test the Lambda Function
Finally, you can test your Lambda function using the AWS Management Console or the AWS CLI. To test using the CLI, you can use the following command:
```bash
aws lambda invoke --function-name selenium-test-lambda output.txt
```
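The CLI writes the function's JSON-encoded return value to `output.txt`, so for the sample handler you should see the heading text in quotes. If you'd rather check the result from a script, here is a minimal sketch using boto3's Lambda client `invoke` call. It assumes boto3 is installed and credentials are configured; `invoke_function` is a hypothetical helper, and the function name is the one from the CLI command above.

```python
import json


def invoke_function(lambda_client, function_name, event=None):
    """Synchronously invoke a Lambda function and return its decoded JSON result.

    `lambda_client` is expected to be a boto3 Lambda client, e.g. boto3.client("lambda").
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the handler's return value
        Payload=json.dumps(event or {}).encode("utf-8"),
    )
    body = response["Payload"].read()
    if "FunctionError" in response:
        # Lambda flags handler exceptions here; the payload holds the error details.
        raise RuntimeError(f"{function_name} failed: {body.decode('utf-8')}")
    return json.loads(body)
```

Usage would look like `invoke_function(boto3.client("lambda"), "selenium-test-lambda")`.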
That’s It!
You now have a Lambda layer that you can use in any of your Python Lambdas to enable Selenium for browser automation. This setup opens up a wide range of uses, from web scraping and automated testing to creating bots and handling repetitive web tasks. With AWS Lambda, you also get automatic scaling and pay-per-use pricing, so your automation scripts can run in the cloud without a server to manage. So go ahead and start integrating Selenium into your projects. If you run into any issues or have questions, feel free to open an issue on my repo or drop me a line from my contact page. Happy browser automating!