Tools for checking a website for broken links, by Daniel Franklin

Introduction

After realising I had content dating back to 2015, I decided to investigate how I could run a link check across my website. The requirements were:

it must check every link on every page of my website.
it must check external links.
it must not follow the links found on external pages.

From my brief research session I found the following solutions:

Wget;
LinkChecker;
the W3C Link Checker; and
my personal favourite, linkcheck.

Link checkers

Wget

Wget is a package available on most Linux distributions. The following command achieved most of my requirements:

$ wget --spider --recursive -o www.danielfranklin.id.au.log https://www.danielfranklin.id.au

The --spider flag instructs Wget to only crawl the website and not download any of the files or assets it comes across.

The --recursive flag defaults to a maximum depth of five levels, which covered the majority of my website. However, the depth count extended to external links too: if a link at the second level was an external link, all the links on that external page counted as the third level, and so on. This meant not all links on my website were covered, and increasing the depth (via the --level flag) would increase the time taken to complete the crawl.

The -o flag instructs Wget to write the crawl log to the file specified. In this case, the log is written to www.danielfranklin.id.au.log.

Finally, my website address is passed as the website that Wget will crawl.
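The flags above can be wrapped into a small helper, together with a way to pull the failures back out of the log. This is only a sketch: spider_site and broken_links are hypothetical names, and the log parsing assumes Wget's usual spider output, where each request starts with a "--timestamp--  URL" line and a failed one is followed by a line ending in "broken link!!!".

```shell
# Crawl a site in spider mode with an explicit recursion depth.
# Usage: spider_site URL LOGFILE [DEPTH]
spider_site() {
  url=$1
  log=$2
  depth=${3:-5}   # 5 matches Wget's default maximum depth
  if [ -z "$url" ] || [ -z "$log" ]; then
    echo "usage: spider_site URL LOGFILE [DEPTH]" >&2
    return 2
  fi
  wget --spider --recursive --level="$depth" -o "$log" "$url"
}

# Print each URL that the log marks as a broken link, once.
# Assumption: request lines look like "--DATE TIME--  URL", so the URL
# is the third whitespace-separated field.
# Usage: broken_links LOGFILE
broken_links() {
  awk '
    /^--/            { url = $3 }   # remember the most recent request
    /broken link!!!/ { print url }  # report it if Wget flagged it
  ' "$1" | sort -u
}
```

For example, spider_site https://www.danielfranklin.id.au www.danielfranklin.id.au.log 10 would crawl ten levels deep, after which broken_links www.danielfranklin.id.au.log lists the failures.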

LinkChecker

LinkChecker is an open source crawler that has been around since 2000. It improves upon Wget by recognising internal and external links and only following the former, which means there is no need for a --recursive flag:

$ linkchecker -F text https://www.danielfranklin.id.au

The -F flag tells LinkChecker to write a log file in the format specified. There are several output formats available in LinkChecker, such as HTML, CSV or XML, but I opted for the text format.
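Switching formats is just a matter of changing the value passed to -F. The sketch below wraps that in a hypothetical check_site function that only accepts the four formats mentioned here. As far as I understand the LinkChecker documentation, the report lands in a file named linkchecker-out.TYPE in the current directory, but treat that filename as an assumption.

```shell
# Run LinkChecker against a site and write a report in the given format.
# Usage: check_site URL [FORMAT]   (FORMAT defaults to text)
check_site() {
  url=$1
  format=${2:-text}
  if [ -z "$url" ]; then
    echo "usage: check_site URL [FORMAT]" >&2
    return 2
  fi
  # Only allow the formats discussed in this post.
  case "$format" in
    text|html|csv|xml) ;;
    *)
      echo "unsupported format: $format" >&2
      return 2
      ;;
  esac
  linkchecker -F "$format" "$url"
}
```

For example, check_site https://www.danielfranklin.id.au html would produce an HTML report instead of the text one.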

As far as my requirements go, LinkChecker met all of them, and I found it quite reliable and easy to set up.

W3C Link Checker

The W3C Link Checker is best described as an online variant of Wget. It does much the same job without the need to install or set up anything locally, but it suffers from the same recursion issue and, as a result, will not automatically check every single link.

linkcheck

linkcheck, not to be confused with LinkChecker, also meets all of my requirements and is very speedy. It also checks anchors and reports any invalid ones, which is immensely useful.

I opted to use Docker to run linkcheck. There are other options, such as downloading an executable or installing it locally (which requires installing the Dart SDK).
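A Docker run might look something like the sketch below. Several things here are assumptions: run_linkcheck is a hypothetical helper, the image name has to be supplied by you (use whichever linkcheck image you built or pulled), the image is assumed to use linkcheck as its entrypoint, and -e is assumed to be the linkcheck flag that enables checking of external links.

```shell
# Run linkcheck from a container.
# Usage: run_linkcheck IMAGE URL
run_linkcheck() {
  image=$1   # hypothetical: whichever linkcheck image you have available
  url=$2
  if [ -z "$image" ] || [ -z "$url" ]; then
    echo "usage: run_linkcheck IMAGE URL" >&2
    return 2
  fi
  # --rm removes the container once the crawl has finished.
  # -e is assumed to make linkcheck check external links as well.
  docker run --rm "$image" -e "$url"
}
```

Running the tool in a throwaway container keeps Dart off the host machine, which was the appeal of this route.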

Between linkcheck and LinkChecker, linkcheck feels that little bit faster and has a very streamlined user experience, from the crawling to the output logs.