Sunday 13 November 2011

Read all about it! British Newspaper Archive beta site

Well, the timing could have been better from my point of view, but I was thrilled to get my hands on the beta version of the much-anticipated British Newspaper Archive. So when I should have been sorting out my clothes and packing for the Wholly Genes cruise, I was searching, browsing and generally having a good poke about in the databases and images. My hurried 'packing' was reduced to literally throwing things into the case (ironing, what's that?), which is not my usual style at all. Shame on you, brightsolid, for what you've turned me into! Worse still, for much of the time when the site is available I am travelling, and out of 'radio contact'.

Of course, Chris Paton beat me to it with a detailed description on his own blog, so I won't repeat what he has said; instead I will add a few observations of my own. Overall, my impression of the site is very positive, and it compares well with other newspaper sites - no doubt the smart people at brightsolid have looked at them and learnt. It is, of course, a beta version, so more content and features could be added before the final version goes live, but it looks a lot more finished than some beta sites I have tested.

I have always been particularly interested in newspapers and periodicals as sources for history and genealogy. For many years I was lucky enough to live within easy reach of the Newspaper Library at Colindale, so I have looked at the original versions of quite a number of the titles included here. And some of them really were originals; although a lot of newspapers have been microfilmed, many others have not, so some of the ones in this collection have been copied for the first time. You can easily tell the ones that have been scanned from microfilm because they are in pure black and white, while the first-timers are in colour. In practical terms, of course, 'colour' mostly means black and beige, but the quality of these is particularly good, because they were scanned using the latest equipment, while scans from film will only be as good as the technology at the time when the filming was done.

Indexing of newspapers is done by OCR (optical character recognition) because the sheer volume of printed material makes manual indexing impractical. This of course has its limitations, although it is getting better all the time. I can remember when no-one thought it would ever be possible to use the technique on newspapers at all. It works best on nice clear print or typescript, so the results from older papers with very small print and the occasional archaic long 's' that looks like a lower-case 'f' are variable, to say the least. Maybe OCR will be able to cope with this sometime in the future.

In the meantime, the British Newspaper Archive deals with this in an interesting way. Search results include the first few lines of the raw OCR text, so you can see at a glance if this publication is one of the dodgier ones, and for each article you can view the full OCR text and submit corrections.

There are many useful features on the site, such as the basic and advanced searches that we have come to expect, with filtering options that will be familiar to anyone who has used the Times Digital Archive, although it has a cleaner look and is a little more user-friendly. There are day, month and year options so that you can limit your search to a particular date range, or to a specific date of issue. I particularly like the fact that you can select a range of years without having to also select a day and month from the drop-down menus. It's a minor point, but one that irritates me when I use the (otherwise excellent) London Gazette site.

You can choose to have your search results sorted by relevance, the default setting, or by date, and once you have a set of search results you can filter them using a range of date and place options, or by tags. The site uses tags that have already been assigned, such as 'classified', 'illustrations' and so on, but there is a facility for users to add their own public tags, which could be interesting. You can also bookmark items you have looked at, and create menus for them within your own 'My Research' area. This area contains an edit function for adding notes, which unfortunately does not work on the beta site. Navigation options within the digitised page images are very good, but to get back to your search results you need to use the 'back' button or the breadcrumb trail. A 'back to search results' option would be nice.

The beta site does not allow saving or printing options, so we will have to see how they work out later.

I wish I had more time to explore and comment on the site, but my first impressions are that it will be very good indeed, and I can't wait for the full release.



  1. Wonderful! I hope it is as good as the National Library of Australia's site where you can fix the scanned text for other people to use.

  2. The beta site did include a facility to correct the text, which I didn't mention in my haste to get the post finished before I had to go offline. It does not say exactly how this will work, but on past experience of sites run by brightsolid, there will probably be some kind of moderation before a correction is made, rather than the facility for several 'alternates' that Ancestry tend to use. There is also a field where users can add comments on an item - I imagine this will be useful for adding extra detail about a person or event, or alerting users to factually incorrect items. After all, we know you can't believe everything you read in the papers!