Google Mini Searching – V4s and Foreign Language Titles

Browse the Library

Visit our Home Page

10% more books added to the Ultrapedia Library

As promised in my December posting, the first roll-out of V4s is now available for full-text search and retrieval via our Google Mini search interface. In this first batch we’ve added over 7400 V4s to the library.

Also new to the library are foreign language books: there are 1332 French language books and 431 Spanish language books. This brings the total number of recognised books in our library to 20866, an increase of almost 10% since 1st January 2008, when our site went live.

Here’s a breakdown:

7468 – English Language Recognised Books (V4s)

V4s are derived from V3s: each one is an optimized, slimmed-down version of a V3. During this transition stage the V3s are still in the library, which means there are two copies of the same book. V4s aren’t counted in the total number of books, because only unique titles are included in the numbers.


19103 – English Language Recognised Books (V3s)


431 – Spanish Language Recognised Books (V3s) ***NEW***


1332 – French Language Recognised Books (V3s) ***NEW***

TOTAL UNIQUE BOOKS: 20,866

TOTAL PAGES: APPROX 6 MILLION
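For anyone who wants to check the sums, here is a small Python sketch using only the figures from the breakdown above. It assumes the 19103 English V3 titles are the baseline for the percentage increase, which this post does not state explicitly.

```python
# Unique V3 titles per language, taken from the breakdown above.
# V4s are left out because they duplicate existing V3 titles.
v3_counts = {"English": 19103, "Spanish": 431, "French": 1332}

total_unique = sum(v3_counts.values())
new_foreign = v3_counts["Spanish"] + v3_counts["French"]

# Assumption: the English V3 count is the baseline for the increase.
increase_pct = 100 * new_foreign / v3_counts["English"]

print(total_unique)            # 20866
print(round(increase_pct, 1))  # 9.2
```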

Other foreign language titles we hope to release in February include Danish, Dutch, German, Italian, Norwegian and Swedish.

The browsable library has 21298 English Language books for browsing and downloading. The French and Spanish language titles will be added soon.

Creating V4s

V4 is the version number we give to a recognised book to indicate its recognition stage. V4 is appended to the filename, which is a quick and easy method of keeping track.

V4s begin as V3s. V3s are first page-checked for recognition accuracy. We then extract and remove the Table of Contents, Indexes and Advertisements from the books, as these represent ‘dead-end’ searches. For each extracted section we create a new file with the same filename, prefixed with TOC for Table of Contents, INDEX for Indexes, etc. These new files are saved for rebuilding into the workflow later. Bibliographies, Chronologies and any Plates remain in the book, as they are generally unique content, but they are also extracted as separate files. The V3 file then evolves into a V4.
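The naming steps described above can be sketched in a few lines of Python. This is only an illustration: the underscore separator, the `.pdf` extension, the sample filename and the function names `extracted_names` and `promote_to_v4` are all assumptions made for the example, not our actual workflow scripts.

```python
from pathlib import PurePath

# Prefixes named in this post. The prefix used for advertisements is not
# stated, so it is deliberately left out.
EXTRACT_PREFIXES = ("TOC", "INDEX")


def extracted_names(filename: str, sep: str = "_") -> dict[str, str]:
    """Return hypothetical names for the sections split out of a V3 file."""
    return {prefix: f"{prefix}{sep}{filename}" for prefix in EXTRACT_PREFIXES}


def promote_to_v4(filename: str) -> str:
    """Swap a trailing V3 marker for V4 (assumed layout: '<title>_V3.pdf')."""
    path = PurePath(filename)
    if not path.stem.endswith("V3"):
        raise ValueError(f"not a V3 file: {filename}")
    return str(path.with_name(path.stem[:-2] + "V4" + path.suffix))


print(extracted_names("annals_of_caesar_V3.pdf"))
print(promote_to_v4("annals_of_caesar_V3.pdf"))
```

Keeping the stage marker at the end of the filename means a file's recognition stage can be read straight from a directory listing.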

Here are some examples of V4s recently released:

Anaesthetics, their uses and administration by Dudley Wilmot Buxton

Ancient Armour and Weapons in Europe by John Hewitt

Annals of Caesar by E. G. [Ernest Gottlieb] Sihler

The other 7465 V4 books can be found by searching for them via our Google Mini Search interface. You can also search for French and Spanish language books, and our entire collection of English language books.

Remember that to download books you should Login first or Register. Registering is free and only requires your email address.

I think now would be a good time to recap on the V-numbers we have used so far – so here goes.

 

V0 – The book is not suitable for OCR

V1 – The book is a good candidate for OCR

V2 – Only used in-house

V3 – The book has been OCR’d and published on the website for browse, search and download

V4 – The book has been OCR’d and published on the website for search and download **NEW**

 

Other files that emerge from the V3s are V35s and V53s.

A V35 contains the pages of a V3 that have both text and pictures and/or graphics on the same page. To create a V35 we take a V3 and extract all pages of mixed graphics and text, appending V35 to the filename.

A V53 contains only the pages of Plates from a V3. A typical V53 page consists of two discrete parts: the plate image itself, and the textual ‘Legend’ or description of the plate (the bit of text under the picture).
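As a quick reference, the V-numbers recapped above can be written as a simple lookup table. The sketch below is in Python; the descriptions are paraphrased from this post, while the names `STAGES`, `PUBLISHED_ACTIONS` and `can` are hypothetical and chosen only for the example.

```python
# Recognition stages as described in this post.
STAGES = {
    "V0": "not suitable for OCR",
    "V1": "good candidate for OCR",
    "V2": "only used in-house",
    "V3": "OCR'd and published for browse, search and download",
    "V4": "OCR'd and published for search and download",
    "V35": "pages of mixed graphics and text extracted from a V3",
    "V53": "plate-only pages extracted from a V3",
}

# What a visitor can do with each published stage on the website.
PUBLISHED_ACTIONS = {
    "V3": {"browse", "search", "download"},
    "V4": {"search", "download"},
}


def can(stage: str, action: str) -> bool:
    """Return True if the given stage is published for the given action."""
    return action in PUBLISHED_ACTIONS.get(stage, set())


print(can("V3", "browse"))  # True
print(can("V4", "browse"))  # False
```

The table makes the one practical difference between a V3 and a V4 easy to see: a V4 can be searched and downloaded, but not browsed.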

V35s and V53s will form part of a slideshow collection on our Image Server which we hope to release soon, so watch this blog for future postings.

Highlighting Footnotes

The majority of books in our library are reference and historical works, many of which have footnotes. Some footnotes are so detailed there isn’t enough room on one page, so they span multiple pages. As our recognition process captures the original format and layout of the book, the context of the footnotes is retained, even when footnotes span multiple pages; this wouldn’t be so for plain text OCR.

 

 

 

 

Ultrapedia Library – Browsing Explained – Part 1

Recent library updates now show more detailed information on each book in the library. Let me explain….

The Library is browsable by Book Title, Book Author, Genre, Publisher and ASIN. Besides the Book Title and Author, each library entry has lots of additional information about the book.

You have to expand the Table View when browsing to see additional information. The Expanded View shows a thumbnail of the Title Page, as well as the Publisher, Genre, Release, Volume No, Pages, Format, LCCN, ISBN, Subject and Summary; plus links to Amazon and a Download link – to download the PDF file of the book. Let me tell you more about each feature and how it works.

 

Expanded View – Title Page and Thumbnail

When browsing the library, mouse over an entry in the Table View (see example 1) and click on it to expand the view (see example 2). Now you can see the full information on the book, including Publisher, Genre, Release, Volume No, Pages, Format, LCCN, ISBN, Subject and Summary. Clicking the ‘load’ button loads a thumbnail of the Title Page (see example 3). The ‘open’ button opens a new browser window with an enlarged view of the thumbnail.

Example 1 – Table View

Mouse over an entry to highlight and click on it to expand the view

 

Example 2 – Expanded View

Click the ‘load’ button to load the Thumbnail of the Title Page

 

Example 3 – Expanded View

Loaded Title Page Thumbnail

 

The Amazon Link

Beside the thumbnail are an Amazon link and a Download File link. The Amazon link opens in a new browser window and takes you to Amazon if there is a re-print of the book available. Where there is no link to Amazon, our Showcase page opens instead, just for fun.

Download File link

The Download File link downloads a copy of the recognised version of the book, in PDF format, to your computer. Remember to Login first or Register. Registering is free and only requires your email address.

Author

The Author of the book.

Publisher

The Publisher of the book.

Genre

Genre is similar to a category, class or style. Books can have multiple genres.

Release

Release refers to the release date, or the year the book was published.

Volume No:

Refers to the Volume number of the book. You’ll see in example 4 below that the Volume number is C, which is not a number at all. C means the book is complete in one volume.

Example 4 – Volume No

Here’s a list of other variations on the Volume No:

1 thru… = The numbers 1 onwards refer to the Volume number. Hansard’s Parliamentary Debates, for example, has more than 300 volumes. You’ll find volume 304 in our library.

ALL = All the Volumes of a multi-volume set are available in one volume.

C = Complete in one volume – there is only one volume.

X = No Volume information available – The volume information generally comes from the title page of the book. In some cases the volume number isn’t marked on the title page, or the title page is missing or illegible.

Blank or empty field = No information available yet; it will be updated in the near future.
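If you are working with our catalogue data yourself, the Volume No codes listed above could be interpreted along these lines. This is a minimal Python sketch: the function name `describe_volume` and the exact wording of the returned text are assumptions for illustration, not part of our website code.

```python
def describe_volume(code: str) -> str:
    """Translate a Volume No field into a plain-English description."""
    value = code.strip()
    if not value:
        return "No information available yet"

    special = {
        "ALL": "All volumes of a multi-volume set in one volume",
        "C": "Complete in one volume",
        "X": "No volume information available",
    }
    if value.upper() in special:
        return special[value.upper()]

    # Anything else should be a volume number from 1 onwards.
    if value.isascii() and value.isdigit() and int(value) >= 1:
        return f"Volume {int(value)}"

    raise ValueError(f"unrecognised Volume No: {code!r}")


print(describe_volume("C"))    # Complete in one volume
print(describe_volume("304"))  # Volume 304
```

Raising an error for anything unrecognised, rather than guessing, makes unexpected values in the field easier to spot.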

Language

The language the book is written in.

Pages

The number of pages in the book. It should be noted that there are some inconsistencies. The book Magna Britannia has over 280 pages, while the entry in example 5 below shows ‘v’.

Example 5 – Expanded View

Shows incorrect number of pages in the expanded view

This blog entry forms part of our help and FAQ system and can be found on our website. Part two is coming soon.