
In Selenium WebDriver, accessing the HTML of a web page or specific elements is crucial for many automation scenarios, such as validation, data extraction, or scraping. Whether you’re working with Java, Python, or another language, retrieving HTML content is a common task in Selenium-based test automation. This guide explores multiple methods for retrieving HTML in Selenium, along with best practices for working with dynamic web content.
Retrieving HTML in Selenium allows you to analyze and validate web content during test automation. Whether you’re validating the structure of a page, extracting content for further testing, or ensuring the correctness of dynamic updates, knowing how to get HTML is essential. In Selenium, there are various methods to access the HTML source of a page or specific web elements, each serving different needs within the test process.
Common Use Cases in Test Automation
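Accessing HTML is most useful when validating the structure of a page, extracting content for further testing or scraping, and checking that dynamic updates were rendered correctly.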
Using driver.page_source in Selenium
One of the simplest and most widely used methods to retrieve the full HTML source of a page is by using the driver.page_source property. This returns the entire HTML code of the page as a string.
Example in Python:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://example.com")
html_source = driver.page_source
print(html_source)
driver.quit()
This method is helpful when you need to analyze or verify the full structure of a page or simply check the page's content.
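One way to act on the returned string is to parse it and check the page's structure. The following is a minimal sketch that uses only Python's standard-library `html.parser`. The `TagCounter` class and `count_tags` helper are hypothetical names chosen for illustration, and the sample string stands in for the value of `driver.page_source`.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts opening tags seen in an HTML string (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook.
        self.counts[tag] += 1


def count_tags(html_source):
    """Return a Counter mapping tag name -> number of opening tags."""
    parser = TagCounter()
    parser.feed(html_source)
    parser.close()
    return parser.counts


# Stand-in for driver.page_source; in a real test, pass that string instead.
sample_source = "<html><body><h1>Title</h1><p>One</p><p>Two</p></body></html>"
tag_counts = count_tags(sample_source)
```

A test could then assert on the counts, for example that the page contains exactly one `h1`.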
Retrieving HTML via JavaScript Execution
Sometimes, particularly when dealing with JavaScript-heavy websites, the driver.page_source might not capture the updated HTML after the page has been modified by JavaScript. In such cases, executing JavaScript directly via Selenium allows you to fetch the current HTML.
Example in Java:
JavascriptExecutor js = (JavascriptExecutor) driver;
String pageHTML = (String) js.executeScript("return document.documentElement.outerHTML;");
System.out.println(pageHTML);
This method returns the DOM as it exists at the moment the script runs, including dynamically injected content that was not part of the initial page source.
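The same approach can be sketched in Python, where the driver exposes `execute_script`. The helper name `current_dom_html` is an assumption made for illustration, and `driver` is expected to be an already-created WebDriver instance.

```python
def current_dom_html(driver):
    """Return the live DOM serialized as HTML, via JavaScript.

    `driver` is assumed to be a Selenium WebDriver (or any object that
    provides a compatible execute_script method).
    """
    # outerHTML of the root element includes the <html> tag itself,
    # but not the <!DOCTYPE> declaration.
    return driver.execute_script("return document.documentElement.outerHTML;")
```

After `driver.get(...)`, calling `current_dom_html(driver)` would return the HTML string for the page as currently rendered.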
Extracting HTML with XPath Queries
For more targeted extraction, you can use XPath queries to extract specific HTML content from the page. XPath allows you to access elements based on attributes, tag names, or structure, making it a powerful tool when combined with Selenium.
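Example in Python:
from selenium.webdriver.common.by import By

# find_element_by_xpath was removed in Selenium 4.3; use find_element with By.XPATH
html_element = driver.find_element(By.XPATH, "//div[@class='content']")
html_content = html_element.get_attribute("outerHTML")
print(html_content)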
Here, the get_attribute("outerHTML") method retrieves the HTML of the element, including the element itself, while innerHTML would only return the inner content, excluding the element’s tags.
Example in Java:
WebElement element = driver.findElement(By.id("sampleElement"));
String outerHtml = element.getAttribute("outerHTML");
String innerHtml = element.getAttribute("innerHTML");
System.out.println("Outer HTML: " + outerHtml);
System.out.println("Inner HTML: " + innerHtml);
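Python:
from selenium.webdriver.common.by import By

# Locate the heading by XPath, then read its full markup
element = driver.find_element(By.XPATH, "//h1[@class='title']")
outer_html = element.get_attribute("outerHTML")
print("Outer HTML: ", outer_html)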
Java:
WebElement element = driver.findElement(By.xpath("//h1[@class='title']"));
String outerHtml = element.getAttribute("outerHTML");
System.out.println("Outer HTML: " + outerHtml);
These methods allow precise extraction of HTML from specific elements, making them useful for tests that focus on verifying the content and structure of web components.
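When an XPath matches several elements, the same idea extends to `find_elements`. The sketch below is illustrative only: `outer_html_of_all` is a hypothetical helper, and the locator strategy is written as the plain string `"xpath"` (the value of `By.XPATH`) so the snippet does not depend on Selenium at import time.

```python
# Same value as selenium.webdriver.common.by.By.XPATH; defined here so the
# sketch has no import-time dependency on Selenium.
XPATH = "xpath"


def outer_html_of_all(driver, xpath):
    """Return the outerHTML of every element matching `xpath`.

    The list follows the order in which the driver returns the elements;
    it is empty when nothing matches.
    """
    elements = driver.find_elements(XPATH, xpath)
    return [element.get_attribute("outerHTML") for element in elements]
```

For instance, `outer_html_of_all(driver, "//div[@class='content']")` would return one HTML string per matching `div`.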
Modern web applications often rely on JavaScript frameworks to load or manipulate content dynamically. This can make it challenging to retrieve the HTML after JavaScript has updated the page. To address this, you can either wait for elements to load before retrieving HTML or execute JavaScript directly to retrieve the latest HTML.
Use WebDriverWait to ensure that content has been dynamically injected:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Wait up to 10 seconds for the element with id "dynamicContent" to appear in the DOM
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "dynamicContent"))
)
dynamic_html = driver.page_source
print(dynamic_html)
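When no single element reliably signals that rendering has finished, one heuristic is to poll `page_source` until two consecutive reads match. The following is a sketch of that idea, not a built-in Selenium feature: `wait_for_stable_source` and its `timeout` and `interval` parameters are hypothetical, and a page that updates continuously (for example, a live clock) would never stabilize.

```python
import time


def wait_for_stable_source(driver, timeout=10.0, interval=0.5):
    """Poll driver.page_source until two consecutive reads are identical.

    Returns the stable HTML, or raises TimeoutError if the source is still
    changing when `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    previous = driver.page_source
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = driver.page_source
        if current == previous:
            return current
        previous = current
    raise TimeoutError("page source did not stabilize within the timeout")
```

An explicit wait on a known element, as shown above, is usually the more precise choice; this polling approach is a fallback for pages without a clear readiness marker.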
Here are the best practices for HTML Extraction in Selenium:
When automating Selenium tests at scale, it’s important to have a robust cloud-based solution for running tests across various browsers and devices. BrowserStack Automate is a cloud testing platform that allows you to execute Selenium tests on real browsers and devices, enabling seamless cross-browser and cross-device testing.
Benefits of Using BrowserStack for Selenium Testing
To run your Selenium tests on BrowserStack Automate, simply integrate your existing Selenium scripts with the platform’s desired capabilities. This allows you to perform tests on a wide range of devices and browsers with minimal setup.
Example code to run tests on BrowserStack:
DesiredCapabilities caps = new DesiredCapabilities();
caps.setCapability("browserName", "chrome");
caps.setCapability("browserVersion", "latest");
caps.setCapability("os", "Windows");
caps.setCapability("os_version", "10");
caps.setCapability("browserstack.local", "false");
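// BrowserStack requires authentication; the hub URL normally embeds your username and access key (omitted here)
WebDriver driver = new RemoteWebDriver(new URL("https://hub-cloud.browserstack.com/wd/hub"), caps);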
Retrieving HTML in Selenium is a versatile skill that is crucial for validating, scraping, and interacting with web content. Whether you need the full page HTML or the HTML of specific elements, Selenium provides various methods for extraction, each suited to different needs. Handling dynamic content and ensuring efficient extraction across multiple browsers is equally important in modern test automation.
As web applications become more complex, the need for dynamic HTML extraction will only increase. Selenium’s integration with cloud-based testing platforms like BrowserStack Automate allows you to run tests at scale, ensuring consistent results across multiple devices and browsers while maintaining high efficiency.