Fetch all links on page

Hey there,
I’m trying to wrap my head around getting all links from my pages for some experiments, and what I’ve got so far is this:

foreach ($site->index()->limit(1) as $child) {
    $dom = new \DOMDocument;
    @$dom->loadHTML($child->render());
    $links = $dom->getElementsByTagName('a');

    foreach ($links as $link) {
        dump($link->getAttribute('href'));
    }
}

… but instead of storing the page’s HTML representation, it renders parts of the page and doesn’t output any links at all …

Where is $html in your code?

I’m sorry, $html simply referred to $child->render(), which is loaded with @$dom->loadHTML($child->render()) in line 3 of the snippet.

You could try $dom->loadHTML(mb_convert_encoding($child->render(), 'HTML-ENTITIES', 'UTF-8')); …if your HTML page is in UTF-8, that is.

loadHTML doesn’t reliably pick up the encoding from the charset meta tag and otherwise treats the input as ISO-8859-1, which may be the reason behind the odd result you’re seeing. It’s a bit strange, though, that it would render part of the page, but who knows.
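
For what it’s worth, here is a minimal sketch of the encoding-safe approach as a standalone function. The extractLinks() helper and the sample HTML are made up for illustration and are not part of Kirby. It assumes the input is UTF-8 and that the mbstring extension is available. Instead of the 'HTML-ENTITIES' conversion suggested above, which is deprecated as of PHP 8.2, it uses mb_encode_numericentity(), and it swaps the @ operator for libxml_use_internal_errors().

```php
<?php
// Hypothetical helper for illustration: returns the href of every <a> in an HTML string.
function extractLinks(string $html): array
{
    // Collect libxml warnings (e.g. about HTML5 tags) internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);

    // Turn every non-ASCII character into a numeric entity so loadHTML
    // cannot misread UTF-8 bytes as ISO-8859-1. Assumes $html is UTF-8.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    $dom->loadHTML($ascii);

    // Discard the collected warnings and restore the previous error mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        // Skip anchors without an href (e.g. named anchors).
        if ($link->hasAttribute('href')) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

// Made-up sample page with a non-ASCII link text and one anchor without href.
$sample = '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>'
        . '<a href="/ueber-uns">Über uns</a> '
        . '<a href="https://example.com">Example</a> '
        . '<a name="top">no href</a>'
        . '</body></html>';

$links = extractLinks($sample);
```

In the loop from the question, this would presumably be called as extractLinks($child->render()), assuming render() returns the complete HTML as a string.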