Serverless Selenium: Setting Up Python Selenium in AWS Lambda

Lamarr

August 13, 2020


What's Selenium, Anyway?

Selenium is a powerful tool for automating web browsers. Think of it as having a virtual robot that can control your browser to do tasks like visiting websites, filling out forms, and clicking buttons, all without you lifting a finger. It mimics human actions, but it’s completely automated.

Once you’ve got a Selenium script running smoothly on your local machine, the next challenge is figuring out how to run it somewhere other than your computer. AWS Lambda is a good fit here: it’s easy to provision, cost-effective, and scales with demand. The tricky part is getting the libraries and packaging right for Lambda, but don’t worry, I’ll break it down and make it simple for you below!

Why Selenium in AWS

Running headless Selenium in AWS Lambda is a game-changer for web scraping, automated testing, creating bots, and other browser automation tasks. However, the setup can be tricky, especially when it comes to bundling chromedriver and Selenium correctly. I’ve found myself repeating this process often, each time with a slightly different approach. In this post, I’ll show you the most streamlined way I’ve found to set it up as a Lambda Layer!

Setup

Before we dive into the setup, ensure you have the following:

An AWS account
AWS CLI installed and configured
Basic knowledge of AWS Lambda and Python

Step 1: Clone the Repository

The first step is to clone the repository:

git clone https://github.com/LamarrD/headless-chrome
cd headless-chrome

Step 2: Deploy the AWS Resources

Next, we need to deploy the Lambda layer that includes headless Chrome and the necessary dependencies. The repo uses Terraform to create all the resources you need, including a sample Selenium Lambda.

terraform init
terraform apply

Let's review the Lambda code:

from headless_chrome import create_driver
from selenium.webdriver.common.by import By


def lambda_handler(event, context):
    """Sample handler using the imported layer."""
    # create_driver comes from the Lambda layer and returns a headless Chrome driver
    driver = create_driver()
    driver.get("https://example.com/")
    # Grab the first <h1> on the page and return its text
    heading = driver.find_element(By.TAG_NAME, "h1")
    return heading.text

The sample Lambda just goes to example.com and returns the text of the first h1. Notice that it uses the create_driver helper function from the Lambda layer.
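
If you're curious what a helper like create_driver might do internally, here is a minimal sketch. It is not the repo's actual code. It assumes Selenium 4, and it assumes the layer unpacks a Chrome binary and chromedriver to /opt/chrome/chrome and /opt/chromedriver. Those paths, along with the names build_chrome_args and create_driver_sketch, are hypothetical. The flags are a commonly used set for running Chrome in a restricted container.

```python
import tempfile

# Hypothetical locations: Lambda mounts layer contents under /opt, but the
# exact file names depend on how the layer was packaged.
CHROME_BINARY = "/opt/chrome/chrome"
CHROMEDRIVER = "/opt/chromedriver"


def build_chrome_args(user_data_dir):
    """Return Chrome flags commonly used in sandboxed, display-less containers."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-gpu",
        "--disable-dev-shm-usage",
        "--window-size=1280,1696",
        f"--user-data-dir={user_data_dir}",
    ]


def create_driver_sketch():
    """Build a headless Chrome WebDriver (assumes Selenium 4 is importable)."""
    # Imported lazily so the flag helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.binary_location = CHROME_BINARY
    # Lambda's filesystem is read-only apart from /tmp, so the browser
    # profile goes in a fresh temporary directory.
    for arg in build_chrome_args(tempfile.mkdtemp()):
        options.add_argument(arg)

    service = Service(executable_path=CHROMEDRIVER)
    return webdriver.Chrome(service=service, options=options)
```

Keeping the flag list in its own function makes it easy to check or tweak without launching a browser.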

Step 3: Test the Lambda Function

Finally, you can test your Lambda function using the AWS Management Console or the AWS CLI. To test using the CLI, you can use the following command:

aws lambda invoke --function-name selenium-test-lambda output.txt
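
If you'd rather trigger the function from Python, the same invocation can be sketched with boto3. This assumes boto3 is installed and your AWS credentials are configured. The helper name invoke_selenium_lambda and the optional client parameter are my own additions for illustration; only the function name selenium-test-lambda comes from the command above.

```python
import json


def invoke_selenium_lambda(function_name="selenium-test-lambda", client=None):
    """Invoke the function synchronously and return its decoded result."""
    if client is None:
        # Imported lazily; assumes boto3 is installed and credentials are set up.
        import boto3
        client = boto3.client("lambda")

    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the handler to finish
        Payload=json.dumps({}).encode("utf-8"),
    )
    body = response["Payload"].read().decode("utf-8")

    # "FunctionError" is only present when the handler raised or timed out.
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda reported an error: {body}")

    # Lambda serializes the handler's return value as JSON.
    return json.loads(body)
```

Calling invoke_selenium_lambda() against the sample function should give you the heading text that the handler returns. Passing in your own client is handy for testing the helper without touching AWS.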

[Screenshot: successful invocation of the Selenium Lambda]

That’s It!

You now have a Lambda layer that you can use in any of your Python Lambdas to enable Selenium for browser automation. This setup opens up a world of possibilities, from web scraping and automated testing to creating bots and handling repetitive web tasks. With AWS Lambda, you also get scalability and cost-efficiency, so your automation scripts can run in the cloud without a server to manage. So go ahead, start integrating Selenium into your projects, and watch how it transforms your workflow. If you run into any issues or have questions, feel free to open an issue on my repo or drop me a line from my contact page. Happy browser automating!