Digital Library Consulting Logo Making digital libraries easy
Veridian™
 


Veridian answers

  Veridian™ expects source data to be METS/ALTO. What exactly is METS/ALTO?

The Metadata Encoding and Transmission Standard (METS) has been around for some time, and is a standard with which many library professionals will be familiar. It’s a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, using XML. The METS standard is maintained by the Library of Congress, and is developed as an initiative of the Digital Library Federation.
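As a rough illustration of the structural side of METS, the sketch below walks the logical structure map of a small METS document. The sample XML, its labels, and the function names are made up for this example and do not come from a real Veridian™ collection. Only the METS namespace and the `structMap`/`div` elements with their `TYPE` and `LABEL` attributes are taken from the METS standard.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical, heavily trimmed METS document for one newspaper issue.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="LOGICAL">
    <mets:div TYPE="Newspaper" LABEL="Example Gazette">
      <mets:div TYPE="Issue" LABEL="1 January 1900">
        <mets:div TYPE="Article" LABEL="Harbour news"/>
        <mets:div TYPE="Article" LABEL="Shipping notices"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def outline(div, depth=0):
    """Return (depth, TYPE, LABEL) tuples for a mets:div and its descendants."""
    rows = [(depth, div.get("TYPE"), div.get("LABEL"))]
    for child in div.findall("mets:div", METS_NS):
        rows.extend(outline(child, depth + 1))
    return rows


def logical_outline(mets_xml):
    """Walk the LOGICAL structMap of a METS document, top div first."""
    root = ET.fromstring(mets_xml)
    struct_map = root.find("mets:structMap[@TYPE='LOGICAL']", METS_NS)
    top = struct_map.find("mets:div", METS_NS)
    return outline(top)


# Print the hierarchy as an indented outline.
for depth, div_type, label in logical_outline(SAMPLE_METS):
    print("  " * depth + f"{div_type}: {label}")
```

Real METS files are much larger and also carry descriptive and administrative metadata sections, which this sketch ignores.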

While METS is great at describing the structure of a digital object, it’s missing the ability to describe the content and layout of each piece of the digital object. For that we need an extension to METS called ALTO (Analyzed Layout and Text Object). This combination of METS and ALTO was originally developed by the METAe project, and was later adopted by the Library of Congress for their large-scale National Digital Newspaper Program (NDNP). Since then METS/ALTO has been used in many large (and small) newspaper digitization projects, as well as a number of projects digitizing books and journals.

METS/ALTO produces richly described digital objects, which makes it possible to build feature-rich digital library presentation systems. For example, a typical METS/ALTO object encodes not only the complete logical and physical structure of a document (i.e. chapters, sections, articles, pages, etc., and their associated metadata), but also the full-text content of each section of the document, and even the physical coordinates of every word in the document!
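To make the word-coordinate idea concrete, here is a minimal sketch that reads word positions from an ALTO page. In ALTO, each recognized word is a `String` element with `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, and `HEIGHT` attributes. The sample page and the helper names are hypothetical. The code matches elements by local name because the ALTO namespace differs between schema versions. A presentation system could use boxes like these to highlight search hits on the page image, though this is not a description of how Veridian™ itself does it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed ALTO page; the namespace shown is assumed to be the v2 one.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2400" HEIGHT="3600">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Harbour" HPOS="120" VPOS="200" WIDTH="310" HEIGHT="60"/>
            <SP WIDTH="20"/>
            <String CONTENT="news" HPOS="450" VPOS="200" WIDTH="180" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def word_boxes(alto_xml):
    """Return (word, hpos, vpos, width, height) for every String element."""
    root = ET.fromstring(alto_xml)
    boxes = []
    for el in root.iter():
        if local_name(el.tag) == "String":
            boxes.append((
                el.get("CONTENT"),
                float(el.get("HPOS")),
                float(el.get("VPOS")),
                float(el.get("WIDTH")),
                float(el.get("HEIGHT")),
            ))
    return boxes


def find_word(boxes, term):
    """Case-insensitive lookup of a search term among the word boxes."""
    return [box for box in boxes if box[0].lower() == term.lower()]


boxes = word_boxes(SAMPLE_ALTO)
print(find_word(boxes, "news"))
```

The units of the coordinates depend on the measurement unit declared in the ALTO file, which this sketch does not read.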

  We like the idea of using METS/ALTO data for our project, but how do we produce it?

Digital Library Consulting have partnerships with a number of organizations that specialize in converting physical and microfilmed documents to METS/ALTO. Many of these organizations use the CCS docWORKS software from Content Conversion Specialists GmbH in Germany. docWORKS is a software system which automatically zones and identifies all of the sections on an image of a printed page, while at the same time extracting searchable text with Optical Character Recognition (OCR) technology. Following processing, all of this extracted information is stored in XML as METS/ALTO. Those intending to digitize their documents to METS/ALTO have two main options: outsource the work to a contractor (like one of our partners), or purchase docWORKS and produce METS/ALTO in-house. We can advise on the most suitable and cost-effective approach for your specific project.

It is important to note that digitizing documents to METS/ALTO format is typically not more expensive than using alternative (and less flexible) formats.

  I’m a little confused now! Which part of the process is carried out by CCS docWORKS and which part does Veridian™ do?

CCS docWORKS is software for converting physical or microfilm documents to the METS/ALTO format. Veridian™ is software for organizing that METS/ALTO data and displaying it on the Internet. Put another way, docWORKS is the “conversion software”, while Veridian™ is the “display software”.

  Can Veridian™ be customized to ingest data which is not METS/ALTO?

Veridian™ was developed specifically to take advantage of the very rich information available with collections digitized using METS/ALTO. It does also support collections of scanned images of documents without METS/ALTO (see below). It is also possible to customize Veridian™ to suit alternative data formats, but this is typically only worthwhile in certain specific cases. For example, in the past Digital Library Consulting have developed customized import modules to support proprietary data formats, where the owners of data in those formats had decided to switch to using METS/ALTO, but didn’t want to have to reprocess all their old data. That is, they ended up with a hybrid version of Veridian which could ingest both their old proprietary data and their new METS/ALTO data. We can also convert certain types of data to METS/ALTO for ingest into Veridian.

  Why is it useful for Veridian to ingest digital images, when those images don't have METS/ALTO?

This feature is often used in large projects where data is digitized to METS/ALTO over many months or years. That is, an entire collection of books, newspapers, or other data can be scanned, and all the scanned images can be placed in Veridian right away. This makes it possible to browse and read the new digital data in Veridian, but it is of course not searchable (as it has not yet been through an OCR process). Over time, batches of the scanned images can be processed to METS/ALTO, with the METS/ALTO data added to Veridian as it becomes available. In this way the time and expense of producing METS/ALTO for an entire large collection can be spread out over months or years, with the whole collection available in Veridian from the beginning and gradually upgraded with searchable text.

  I know that METS/ALTO and Veridian™ are used for big newspaper digitization projects, but is it suitable for collections of books, journals, and other printed materials?

Yes, METS/ALTO is a very flexible XML data format, and is regularly used for digitizing books, journals, and other printed materials, as well as newspapers. The Veridian™ software, meanwhile, has been developed to be as adept at delivering digitized books and journals as it is at delivering digitized newspapers.

  How many items can I put in a single Veridian™ collection?

Veridian™ is hugely scalable, and there are no practical limits to the amount of information a collection may contain. As an example, collections have been built containing more than two million large newspaper pages, which equates to nearly 30 million articles. Even larger collections are possible, and Veridian™ can even be distributed across multiple servers to allow for almost unlimited scalability.

  Does Veridian™ support languages other than English?

Yes. Veridian™ uses Unicode throughout, so it supports any language and character set that can be displayed by modern web browsers. This includes languages written in Cyrillic, Chinese, Japanese, Korean, Hebrew, and other non-Latin scripts. Unicode is used both in creating multilingual user interfaces and when displaying (and searching) the content of the collection itself.

  How is new data loaded into Veridian™?

New METS/ALTO data is typically delivered by data conversion contractors in batches, so in the case of a very large digitization project a new batch of data may be delivered every few weeks or few months. For a smaller project all the data may be delivered in just one or two batches. In either case each new batch of data is ingested into Veridian™ with a single batch import process. Detailed documentation is made available to ensure Veridian™ users can easily ingest new data batches into their collection, and every Veridian™ license includes a support and maintenance contract from Digital Library Consulting, allowing us to support you with the addition of new batches of data to your Veridian™ collections.

  What operating systems will Veridian™ run under?

A Veridian™ server runs under Windows Server 2003, Windows Server 2008, Red Hat Enterprise Linux, CentOS Linux, Solaris 10, OpenSolaris, and Mac OS X Server operating systems. Please see our requirements page for more details.

  What web server do I need to run Veridian™?

Veridian™ runs under Microsoft IIS and Apache web servers.

  What are the recommended web browsers for viewing Veridian™ collections?

Veridian collections can be viewed with any of the major modern web browsers. This includes Internet Explorer 6.0+, Firefox 2.0+, Safari 2.0+, and Google Chrome. Adobe Reader or a similar PDF viewing plugin is required to view PDF versions of pages, articles, and sections (though Veridian™ may be configured to not deliver PDF files if required).

  Can a Veridian™ collection be put on a DVD or CD-ROM, in addition to being delivered on the Internet?

No. If one of the requirements of your project is to deliver your digitized material on CD-ROM or DVD please ask us about our customized Greenstone solutions.

  Does Veridian™ support OAI-PMH?

Yes, a Veridian™ collection can be configured to make metadata available to other applications through the OAI-PMH protocol for metadata harvesting.
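For readers unfamiliar with OAI-PMH, the sketch below shows roughly what a harvesting client does: it builds a `ListRecords` request and reads record identifiers and the resumption token from the response. The endpoint URL, the sample response, and the function names are assumptions made for illustration. The actual OAI-PMH address of a Veridian™ collection depends on how it is configured. The `verb`, `metadataPrefix`, `set`, and `from` parameters and the `oai_dc` prefix come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical endpoint, not a real Veridian address.
BASE_URL = "https://veridian.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build a ListRecords request URL with optional set and from-date filters."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical, trimmed response of the kind a repository returns.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:issue-1</identifier>
        <datestamp>2010-01-01</datestamp>
      </header>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return ids, token or None


print(list_records_url(BASE_URL, from_date="2010-01-01"))
print(parse_list_records(SAMPLE_RESPONSE))
```

A full harvester would fetch the URL over HTTP and keep requesting with the returned resumption token until none is returned. That loop is left out here to keep the sketch self-contained.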

  Is there an open API for Veridian™? Can we develop custom interfaces to the Veridian™ server?

Yes, the optional Veridian™ web services module adds the flexibility of building custom clients to interface with the Veridian™ server.

 
© Digital Library Consulting Ltd.