    How to Get HTML Source in Selenium

    Published on

    April 6, 2026

    In Selenium WebDriver, accessing the HTML of a web page or specific elements is crucial for many automation scenarios, such as validation, data extraction, or scraping. Whether you’re working with Java, Python, or another language, retrieving HTML content is a common task in Selenium-based test automation. This guide explores multiple methods for retrieving HTML in Selenium, along with best practices for working with dynamic web content.

    Understanding the Importance of Accessing HTML

    Retrieving HTML in Selenium allows you to analyze and validate web content during test automation. Whether you’re validating the structure of a page, extracting content for further testing, or ensuring the correctness of dynamic updates, knowing how to get HTML is essential. In Selenium, there are various methods to access the HTML source of a page or specific web elements, each serving different needs within the test process.

    Common Use Cases in Test Automation

    Some of the most common scenarios where accessing HTML is useful include:

    • Page Structure Validation: Ensuring that the correct HTML tags are present and structured properly.
    • Content Validation: Verifying the content of certain tags or elements, such as ensuring specific text exists within a <div> or <span>.
    • Data Extraction: Extracting information from a page for later use, such as getting a list of links or scraping dynamic content for validation.

    Accessing Full Page HTML Source

    Using driver.page_source in Selenium

    One of the simplest and most widely used methods to retrieve the full HTML source of a page is by using the driver.page_source property. This returns the entire HTML code of the page as a string.

    Example in Python:

    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://example.com")
    html_source = driver.page_source
    print(html_source)
    driver.quit()

    This method is helpful when you need to analyze or verify the full structure of a page or simply check the page's content.
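
Once the source is available as a string, it can be checked without further browser calls. The sketch below is one possible approach, assuming Python's standard-library html.parser is good enough for the page in question. The names LinkCollector and extract_links are illustrative and not part of Selenium.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names
        # arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_source):
    """Return all link targets found in an HTML string such as driver.page_source."""
    collector = LinkCollector()
    collector.feed(html_source)
    collector.close()
    return collector.links
```

With a live session, this would typically be called as extract_links(driver.page_source) before driver.quit().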

    Retrieving HTML via JavaScript Execution

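    On JavaScript-heavy websites, much of the HTML is injected after the initial load. driver.page_source normally reflects the browser's current DOM, but Selenium does not guarantee that it includes changes made by JavaScript after loading, and it is easy to read it before scripts have finished running. In such cases, executing JavaScript directly via Selenium lets you serialize the live DOM yourself and fetch the current HTML.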

    Example in Java:

    JavascriptExecutor js = (JavascriptExecutor) driver;
    String pageHTML = (String) js.executeScript("return document.documentElement.outerHTML;");
    System.out.println(pageHTML);

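    This approach returns the DOM as it exists at the moment the script runs, including dynamically injected content that was not part of the initial page source. It does not wait for pending updates, so combine it with a wait when content loads asynchronously.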
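
The same technique carries over to Python through execute_script. The following is a minimal sketch, assuming driver is an already-created Selenium WebDriver instance. The helper name get_rendered_html is illustrative.

```python
def get_rendered_html(driver):
    """Return the current DOM, serialized by the browser, as a string.

    `driver` is assumed to be a Selenium WebDriver instance.
    """
    # execute_script runs the snippet in the page and returns its result.
    return driver.execute_script("return document.documentElement.outerHTML;")
```

Note that document.documentElement.outerHTML starts at the <html> element, so the returned string does not include the doctype declaration.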

    Extracting HTML with XPath Queries

    For more targeted extraction, you can use XPath queries to extract specific HTML content from the page. XPath allows you to access elements based on attributes, tag names, or structure, making it a powerful tool when combined with Selenium.

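    Example in Python:

    from selenium.webdriver.common.by import By

    # find_element_by_xpath was removed in Selenium 4.3; use find_element with By.XPATH
    html_element = driver.find_element(By.XPATH, "//div[@class='content']")
    html_content = html_element.get_attribute("outerHTML")
    print(html_content)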

    Here, the get_attribute("outerHTML") method retrieves the HTML of the element, including the element itself, while innerHTML would only return the inner content, excluding the element’s tags.
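
When an XPath expression can match several elements, the same idea extends to find_elements. The sketch below assumes driver is a Selenium WebDriver instance; the helper name outer_html_of_matches is illustrative. It passes the locator strategy as the plain string "xpath", which is the value behind By.XPATH, so that the snippet has no imports. In a real test you would more commonly import By and pass By.XPATH.

```python
def outer_html_of_matches(driver, xpath):
    """Return the outerHTML of every element matching `xpath`.

    `driver` is assumed to be a Selenium WebDriver instance.
    """
    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH.
    # find_elements returns an empty list when nothing matches.
    elements = driver.find_elements("xpath", xpath)
    return [element.get_attribute("outerHTML") for element in elements]
```

For example, outer_html_of_matches(driver, "//div[@class='content']") would return one HTML string per matching div, or an empty list if the page has none.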

    Extracting HTML of Specific Web Elements

    • Utilizing get_attribute('outerHTML') and get_attribute('innerHTML')
      In Selenium, get_attribute("outerHTML") and get_attribute("innerHTML") are commonly used to extract HTML for specific elements.
      • outerHTML: Returns the HTML of the element, including the element itself.
      • innerHTML: Returns only the content inside the element (excluding the tag itself).

    Example in Java:

    WebElement element = driver.findElement(By.id("sampleElement"));
    String outerHtml = element.getAttribute("outerHTML");
    String innerHtml = element.getAttribute("innerHTML");
    System.out.println("Outer HTML: " + outerHtml);
    System.out.println("Inner HTML: " + innerHtml);

    • Use outerHTML when you need the complete HTML markup of an element and innerHTML when you are interested in just the content.
    • Practical Examples in Python and Java
      Whether you are working in Python or Java, these methods are easily implemented. Here’s a comparison of extracting HTML in both languages:

    Python:

    from selenium.webdriver.common.by import By

    element = driver.find_element(By.XPATH, "//h1[@class='title']")
    outer_html = element.get_attribute("outerHTML")
    print("Outer HTML: ", outer_html)

    Java:

    WebElement element = driver.findElement(By.xpath("//h1[@class='title']"));
    String outerHtml = element.getAttribute("outerHTML");
    System.out.println("Outer HTML: " + outerHtml);

    These methods allow precise extraction of HTML from specific elements, making them useful for tests that focus on verifying the content and structure of web components.

    Handling Dynamic Content in Modern Web Applications

    Modern web applications often rely on JavaScript frameworks to load or manipulate content dynamically. This can make it challenging to retrieve the HTML after JavaScript has updated the page. To address this, you can either wait for elements to load before retrieving HTML or execute JavaScript directly to retrieve the latest HTML.

    Use WebDriverWait to ensure that content has been dynamically injected:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Wait up to 10 seconds for the element to be present in the DOM
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "dynamicContent"))
    )
    dynamic_html = driver.page_source
    print(dynamic_html)

    • Strategies for Dealing with AJAX and Single Page Applications (SPAs)
      For AJAX-based websites and SPAs, elements may not be available as soon as the page loads. An implicit wait makes element lookups retry for a set time until the element is present in the DOM, while an explicit wait (WebDriverWait) polls for a specific condition, such as an element becoming visible or clickable. Wait for the condition your test depends on before extracting the HTML.
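
When there is no single element to wait for, one heuristic is to poll the page source until it stops changing. This is not a Selenium API, only a sketch under stated assumptions: driver is a Selenium WebDriver instance, and the helper name wait_for_stable_source, the timeout, and the poll interval are all illustrative choices. Pages with constantly changing content (clocks, animations, rotating banners) may never stabilise, in which case the helper raises TimeoutError.

```python
import time


def wait_for_stable_source(driver, timeout=10.0, poll_interval=0.5):
    """Poll driver.page_source until two consecutive reads are identical.

    Returns the stable source, or raises TimeoutError if the page keeps
    changing for longer than `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    previous = driver.page_source
    while time.monotonic() < deadline:
        time.sleep(poll_interval)
        current = driver.page_source
        if current == previous:
            return current
        previous = current
    raise TimeoutError(f"page source still changing after {timeout} seconds")
```

Two identical reads only show that nothing changed during one poll interval, so an explicit wait on a known element remains the more reliable option whenever one is available.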

    Best Practices for HTML Extraction in Selenium

    Here are the best practices for HTML Extraction in Selenium:

    • Efficient Methods for Large-Scale Data Extraction: When you need to extract a large amount of data from a page, it’s important to use efficient techniques. One option is to limit the scope of your extraction using XPath to target only the necessary elements, rather than extracting the entire page HTML. Additionally, you can batch the extraction process by retrieving HTML in chunks if the page contains a lot of elements.
    • Managing Dynamic Content and Delays: Always account for potential delays when working with dynamic content. Rely on waits (e.g., WebDriverWait) to ensure that content has finished loading before you extract HTML. This prevents errors and ensures that you capture the most accurate HTML.
    • Ensuring Cross-Browser Compatibility: Test your HTML extraction methods across different browsers, as some browsers may handle JavaScript and dynamic content differently. Use a tool like BrowserStack Automate to run your tests across multiple browsers and devices simultaneously.
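
For the large-scale extraction case above, one way to cut down on round trips is to serialize all matching elements inside the browser and return them in a single call. The sketch below assumes driver is a Selenium WebDriver instance. The helper name outer_html_batch and the use of a CSS selector rather than XPath are illustrative choices.

```python
def outer_html_batch(driver, css_selector):
    """Return the outerHTML of every element matching `css_selector`.

    All matches are serialized inside the browser and sent back in a single
    execute_script call, instead of one get_attribute call per element.
    """
    script = (
        "return Array.from(document.querySelectorAll(arguments[0]))"
        ".map(function (el) { return el.outerHTML; });"
    )
    # Extra positional arguments are exposed to the script as `arguments`.
    return driver.execute_script(script, css_selector)
```

A call such as outer_html_batch(driver, "table.results tr") would return a list of strings, one per matching row. The selector here is only an example.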

    Automating Tests at Scale with BrowserStack

    When automating Selenium tests at scale, it’s important to have a robust cloud-based solution for running tests across various browsers and devices. BrowserStack Automate is a cloud testing platform that allows you to execute Selenium tests on real browsers and devices, enabling seamless cross-browser and cross-device testing.

    Benefits of Using BrowserStack for Selenium Testing

    • Real Devices and Browsers: Run tests on real devices and browsers, ensuring more accurate test results.
    • Parallel Test Execution: Execute multiple tests simultaneously, speeding up feedback and reducing test execution time.
    • Access to Latest Browser Versions: Ensure that your tests are run on the latest versions of browsers, providing accurate and up-to-date testing.

    To run your Selenium tests on BrowserStack Automate, simply integrate your existing Selenium scripts with the platform’s desired capabilities. This allows you to perform tests on a wide range of devices and browsers with minimal setup.

    Example code to run tests on BrowserStack:

    // Credentials are omitted here; supply your BrowserStack username and access key before running.
    DesiredCapabilities caps = new DesiredCapabilities();
    caps.setCapability("browserName", "chrome");
    caps.setCapability("browserVersion", "latest");
    caps.setCapability("os", "Windows");
    caps.setCapability("os_version", "10");
    caps.setCapability("browserstack.local", "false");
    WebDriver driver = new RemoteWebDriver(new URL("https://hub-cloud.browserstack.com/wd/hub"), caps);

    Conclusion

    Retrieving HTML in Selenium is a versatile skill that is crucial for validating, scraping, and interacting with web content. Whether you need the full page HTML or the HTML of specific elements, Selenium provides various methods for extraction, each suited to different needs. Handling dynamic content and ensuring efficient extraction across multiple browsers is equally important in modern test automation.

    As web applications become more complex, the need for dynamic HTML extraction will only increase. Selenium’s integration with cloud-based testing platforms like BrowserStack Automate allows you to run tests at scale, ensuring consistent results across multiple devices and browsers while maintaining high efficiency.
