Data visualization with d3.js/Observable: The population of France

It is New Year’s Day 2021. Since there is nothing better to do, I have decided to make another chart. A chart about Covid-19? No thanks. There are already so many good charts about Covid-19 out there. A chart about something else. Anything to help forget the horrible year that has mercifully come to an end, never to return.

So I decided to browse a few articles on Wikipedia, a website I have lately fallen in love with because it is a true treasure trove for any budding hobby chart maker like myself. In particular, I came across an article about the demographics of France, which includes this table:


At first I thought: oh, how nice! This would look splendid as a chart. I only glanced at the table without looking at the details (the numbers…). I guess I saw the forest and ignored the trees. I was just happy to have found the perfect table to turn into an Observable chart, which became the chart at the very top of this article. If you are on a desktop computer, you can hover over the line and a tooltip will appear showing the year and total population for each data point in the table.

However, I was not entirely happy with that chart. The chart, which I forked from an existing one and adapted to my data (I am not building a chart from scratch on New Year’s Day…), gives equal importance to every year from year 1 to 2016 (I left out the year 50 BC…). Yet all the reliable data comes from the censuses held from 1801 onward. Everything before that is sparse and speculative: the further back in time we go, the less reliable the numbers are and the fewer of them we have. The chart above, however, simply connects those data points with a smooth line, giving the false impression that many more years are accounted for than is really the case. In reality, we have very little data to work with for the period before 1801.
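To put a rough number on "equal importance to each year", here is a minimal sketch of the arithmetic behind a linear time axis. It hand-rolls the proportional mapping that d3.scaleLinear() performs, so it runs without d3, and it assumes a hypothetical 1,000-pixel chart width; the original notebook's scale setup and dimensions may well differ.

```javascript
// Minimal linear scale, mirroring what d3.scaleLinear() does:
// map a value in [d0, d1] proportionally onto [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Hypothetical chart width in pixels (an assumption for illustration).
const width = 1000;

// Domain taken from the post: year 1 to 2016.
const x = linearScale([1, 2016], [0, width]);

// Horizontal space given to everything before 1801.
const pre1801Width = x(1801) - x(1);
const share = pre1801Width / width;

console.log(`Pre-1801 share of the x-axis: ${(share * 100).toFixed(1)}%`);
// → Pre-1801 share of the x-axis: 89.3%
```

About 89% of the horizontal space goes to the 1,800 years before 1801, exactly where the data is thinnest. The share depends only on the domain, not on the chosen width.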

In fact, if you plot the same data with a slightly different chart format, drawing a vertical line for each data point, you get this result:

I was actually quite surprised. Often you look at a table or a chart and never question the sources or the data. To be clear, the table on Wikipedia serves its purpose perfectly well. But see how few data points we have before 1801? With hindsight, I suppose that makes perfect sense. After all, there were no censuses in the Dark Ages…

So what would a more balanced, representative chart look like? Well, I’ve got you covered:

The black line represents the data from 1801 onward and the red line the preceding years. The difference between this chart and the first line chart at the very top of the page is startling. The population still clearly rises, but the incline is not as steep, and more importance is given to the period since 1801. Here is the same chart using vertical bars:
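The red/black split can be sketched as follows. This assumes the data is an array of `{ year, population }` objects (a hypothetical shape; the notebook's actual field names may differ) and that each segment is later drawn as its own path, for example with a d3.line() generator stroked in red or black. The helper name `splitAtYear` is made up for illustration.

```javascript
// Hypothetical helper: split { year, ... } points into two segments at a
// cut-off year so each segment can be drawn as its own line.
// The first point at or after the cut-off is also appended to the earlier
// segment, so the red and black lines meet instead of leaving a gap.
function splitAtYear(points, cutoff) {
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...points].sort((a, b) => a.year - b.year);
  const before = sorted.filter((p) => p.year < cutoff);
  const after = sorted.filter((p) => p.year >= cutoff);
  if (before.length > 0 && after.length > 0) {
    before.push(after[0]); // shared joining point
  }
  return { before, after };
}
```

Sharing the boundary point is a design choice: without it, the chart would show a visible break between the last pre-1801 estimate and the first census figure.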

In the first two charts, the data from 1801 onward is squished to the right-hand side because the x-axis gives every year equal width. The last two charts instead show only the years for which there actually is data, whether estimated or measured, so the previously squished data on the right expands to take up most of the space.
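The difference between the two x-axes comes down to positioning years by value versus by index. Below is a minimal sketch of the index-based approach, similar in spirit to d3.scalePoint() with no padding; it is hand-rolled so it runs without d3, and the years in the usage example are placeholders, not the Wikipedia figures.

```javascript
// Place each year by its position in the list rather than by its value,
// similar to d3.scalePoint() with no padding: n points are spread evenly
// over [r0, r1]. Assumes `years` is sorted and contains no duplicates.
function pointScale(years, [r0, r1]) {
  const step = years.length > 1 ? (r1 - r0) / (years.length - 1) : 0;
  const index = new Map(years.map((year, i) => [year, i]));
  return (year) => {
    const i = index.get(year);
    // Years without a data point have no position on this axis.
    return i === undefined ? undefined : r0 + i * step;
  };
}

// Placeholder years: a 799-year gap and a 1-year gap get the same width.
const years = [1, 800, 1801, 1802];
const x = pointScale(years, [0, 300]);
console.log(years.map((y) => x(y))); // → [ 0, 100, 200, 300 ]
```

The trade-off is that the axis no longer encodes how much time passes between points, so the year labels have to carry that information instead.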

To conclude, this was a fun exercise. From now on, I will be a bit more skeptical of data and tables when I see them. Different charts, using different scales for the x and y axes, can give very different impressions of the same data set. I stumbled on that by accident, but it is good to know, and you always learn something new in the process.

