There are countless lists on the internet claiming to be the definitive list of must-read HTML books, and it seemed that they all recommended the same books, give or take two or three odd choices.
Finding good resources for learning programming is always tricky. Everyone has their own opinion about which book is the best to learn from, and as we say in French, “tastes and colors should not be argued about”.
However, I thought it would be interesting to trust the wisdom of the crowd and find the books that appeared most often in those “Best HTML Book” lists.
If you want to jump straight to the results, scroll down to the full list. If you want to learn about the methodology, bear with me.
I simply ran a few Google queries like “Best HTML Books” and variations of it. I then scraped all the resulting pages (using ScrapingBee, a web scraping API I’m working on).
I deduplicated the links and ended up with about 306 of them. Using the page titles, I was also able to quickly discard:
I ended up with about 281 HTML files. I then opened each file in my browser, opened the Chrome inspector, and found and wrote down the CSS selector matching the book titles in the article. This took me around 1 hour, almost 30 seconds per page.
This also allowed me to discard even more irrelevant pages, and I discarded a lot. In the end, I compiled around 166 lists into this one.
Book titles were then extracted with a mix of manual extraction and web scraping.
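The link deduplication mentioned earlier can be sketched in a few lines. This is only an illustration under assumptions of mine: the helper names `canonical_url` and `dedupe_links` are hypothetical, and treating two URLs as the same page when they differ only by scheme, a leading `www.`, a trailing slash, or a fragment is one possible rule, not necessarily the one used for this article.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Build a comparison key for a URL (hypothetical rule, see lead-in)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Ignore a trailing slash, but keep "/" for the site root.
    path = parts.path.rstrip("/") or "/"
    # Scheme and fragment are left out of the key on purpose.
    return urlunsplit(("", host, path, parts.query, ""))


def dedupe_links(urls):
    """Keep the first occurrence of each page, preserving input order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Keeping the first URL seen, rather than the canonical key, means the saved link is still one that actually appeared in the search results.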
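To give an idea of what the automated part of that extraction can look like, here is a minimal sketch using only Python's standard library. It assumes the selector for a page is as simple as a tag name plus a class (the equivalent of a CSS selector like `h2.book-title`); the class name, the `TitleExtractor` class and the `extract_titles` helper are all hypothetical, and real pages often need a more capable selector engine.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text of every <tag class="css_class"> element."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.titles = []
        self._depth = 0  # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Track nested elements with the same tag name.
            if tag == self.tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == self.tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace so titles compare cleanly later.
                self.titles.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_titles(html, tag, css_class):
    parser = TitleExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.titles
```

Each saved page would be paired with its own tag and class, which is where the per-page selector work described above comes in.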
I ended up with a huge list of books, not usable without some post-processing.
To find the most quoted HTML books I needed to normalize my results.
I had to handle all the different variations, like “{title} by {author}” or “{title} - {author}”,
or “{title}:{subtitle}” and “{title}”, or even the ones containing an edition number.
All of this was followed by quite a bit of manual cleaning.
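The variations above can be handled with a few regular expressions. The sketch below is an assumption about how such a normalization might look, not the exact code used here: `normalize_title` is a hypothetical helper, and its rules are deliberately naive (a title that itself contains " by " or " - " would be cut short), which is one reason manual cleaning is still needed.

```python
import re

# "2nd Edition", "(3rd edition)", "Second Edition", ...
EDITION = re.compile(
    r"\(?\b(?:\d+(?:st|nd|rd|th)|first|second|third|fourth|fifth)\s+edition\b\)?",
    re.IGNORECASE,
)
# " by Author" or " - Author" up to the end of the string
AUTHOR = re.compile(r"\s+(?:by|-)\s+.*$", re.IGNORECASE)


def normalize_title(raw):
    """Reduce a raw list entry to a lowercase main title."""
    title = EDITION.sub("", raw)
    title = AUTHOR.sub("", title)
    title = title.split(":", 1)[0]  # drop the subtitle
    title = " ".join(title.split())  # collapse whitespace
    return title.strip(" ,;-").lower()
```

With these rules, entries such as "HTML and CSS: Design and Build Websites by Jon Duckett" and "HTML and CSS (2nd Edition)" both end up under the same key.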
My list now looked like this:
From there, it was easy to compute the most recommended books. You can find all the data used to build this list in this repo. Now let’s take a look at the list:
I've also recently added data from several booksellers so that important books are not forgotten and books with outstanding reviews get more weight.
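For the curious, the counting step behind the ranking fits in a few lines once titles are normalized. This is a sketch under one assumption of mine: a book counts once per list, even if a list mentions it several times. The `rank_books` name is hypothetical, and the bookseller weighting is not modeled here.

```python
from collections import Counter


def rank_books(lists):
    """Rank titles by the number of lists that mention them.

    `lists` is an iterable of lists of already-normalized titles.
    """
    counts = Counter()
    for titles in lists:
        # set() so a title repeated within one list counts once.
        counts.update(set(titles))
    return counts.most_common()
```

The result is a list of `(title, number_of_lists)` pairs, most recommended first.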