Veridian questions
Veridian answers
– Veridian™ expects source data to be METS/ALTO. What exactly is METS/ALTO?
The Metadata Encoding and Transmission Standard (METS) has been around for some time, and is a standard with
which many library professionals will be familiar. It’s a standard for encoding descriptive, administrative,
and structural metadata regarding objects within a digital library, using XML. The METS standard is maintained by
the Library of Congress, and is developed as an initiative of the Digital Library Federation.
While METS is great at describing the structure of a digital object, it’s missing the ability to describe
the content and layout of each piece of the digital object. For that we need an extension to METS called ALTO
(Analyzed Layout and Text Object). This combination of METS and ALTO was originally developed by the METAe project,
and was later adopted by the Library of Congress for their large-scale National Digital Newspaper Program (NDNP).
Since then METS/ALTO has been used in many large (and small) newspaper digitization projects, as well as a number of
projects digitizing books and journals.
METS/ALTO produces richly described digital objects, which makes it possible to build very capable digital
library presentation systems. For example, a typical METS/ALTO object encodes not only the complete logical and
physical structure of a document (i.e. chapters, sections, articles, pages, etc., and their associated metadata),
but also the full-text content of each section of the document, and even the physical coordinates of every word
in the document!
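To make the idea of word-level coordinates concrete, here is a minimal Python sketch that reads the words and their positions out of an ALTO file. The XML fragment is hand-written and simplified for illustration (it is not real project data and omits parts a complete ALTO file would carry), and the function name extract_words is our own. The String element and its CONTENT, HPOS, VPOS, WIDTH, and HEIGHT attributes come from the ALTO schema. This is not a description of how Veridian itself reads ALTO.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written ALTO fragment (illustrative only).
ALTO_SAMPLE = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2400" HEIGHT="3600">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Digital" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"/>
            <SP/>
            <String CONTENT="library" HPOS="300" VPOS="200" WIDTH="170" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def extract_words(alto_xml):
    """Return (word, hpos, vpos, width, height) for every ALTO String element."""
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        # Compare local names so the code works with any ALTO namespace version.
        if element.tag.rsplit("}", 1)[-1] != "String":
            continue
        words.append((
            element.get("CONTENT"),
            float(element.get("HPOS")),
            float(element.get("VPOS")),
            float(element.get("WIDTH")),
            float(element.get("HEIGHT")),
        ))
    return words


if __name__ == "__main__":
    for word, hpos, vpos, width, height in extract_words(ALTO_SAMPLE):
        print(f"{word!r} at ({hpos}, {vpos}), size {width} x {height}")
```

Coordinates like these are what allow a presentation system to highlight a search term directly on the page image.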
– We like the idea of using METS/ALTO data for our project, but how do we produce it?
Digital Library Consulting have partnerships with a number of organizations who specialize in converting
physical and microfilmed documents to METS/ALTO. Many of these organizations use the
CCS docWORKS
software from Content Conversion Specialists GmbH in Germany. docWORKS is a software system which automatically
zones and identifies all of the sections on an image of a printed page, while at the same time extracting
searchable text with Optical Character Recognition (OCR) technology. Following processing, all this extracted
information is stored in XML, as METS/ALTO. Those intending to digitize their documents to METS/ALTO have two
main options: outsource the work to a contractor (like one of our partners) or purchase docWORKS and
produce METS/ALTO in-house. We can advise on the most suitable and cost-effective approach for your specific
project.
It is important to note that digitizing documents to METS/ALTO format is typically not more expensive than
using alternative (and less flexible) formats.
– I’m a little confused now! Which part of the process is carried out by CCS docWORKS and which part
does Veridian™ do?
CCS docWORKS is software for converting physical or microfilm documents to the METS/ALTO format.
Veridian™ is software for organizing that METS/ALTO data and displaying it on the Internet.
Put another way, docWORKS is the “conversion software”, while Veridian™
is the “display software”.
– Can Veridian™ be customized to ingest data which is not METS/ALTO?
Veridian™ was developed specifically to take advantage of the very rich information available with
collections digitized using METS/ALTO. It does also support collections of scanned images of documents
without METS/ALTO (see below). It is also possible to customize Veridian™ to
suit alternative data formats, but this is typically only worthwhile in certain specific cases. For example,
in the past Digital Library Consulting have developed customized import modules to support proprietary
data formats, where the owners of data in those formats had decided to switch to using METS/ALTO, but
didn’t want to have to reprocess all their old data. That is, they ended up with a hybrid
version of Veridian which could ingest both their old proprietary data and their new METS/ALTO data.
We can also convert certain types of data to METS/ALTO for ingest into Veridian.
– Why is it useful for Veridian to ingest digital images, when those images don't have METS/ALTO?
This feature is often used in large projects where data is digitized to METS/ALTO over many months or years.
That is, an entire collection of books, newspapers, or other data can be scanned, and all the scanned images
can be placed in Veridian right away. This makes it possible to browse and read the new digital data in
Veridian, but it is of course not searchable (as it has not yet been through an OCR process). Over time batches
of the scanned images can be processed to METS/ALTO, with the METS/ALTO data added to Veridian as it becomes
available. The time and expense necessary to produce METS/ALTO for an entire large collection can in this way
be spread out over months or years, with the entire collection available in Veridian from the beginning,
but slowly being upgraded with searchable text over time.
– I know that METS/ALTO and Veridian™ are used for big newspaper digitization projects, but are they suitable
for collections of books, journals, and other printed materials?
Yes, METS/ALTO is a very flexible XML data format, and is regularly used for digitizing books, journals,
and other printed materials, as well as newspapers. The Veridian™ software, meanwhile, has been developed
to be as adept at delivering digitized books and journals as it is at delivering digitized newspapers.
– How many items can I put in a single Veridian™ collection?
Veridian™ is hugely scalable, and there are no practical limits to the amount of information a collection
may contain. As an example, collections have been built containing more than two million large newspaper pages,
which equates to nearly 30 million articles. Even larger collections are possible, and Veridian™ can even
be distributed across multiple servers to allow for almost unlimited scalability.
– Does Veridian™ support languages other than English?
Yes. Veridian™ uses Unicode throughout, so it supports any language and character set that
can be displayed by modern web browsers. This includes languages written in Cyrillic, Chinese, Japanese, Korean,
Hebrew, and other non-Latin scripts. Unicode is used both in creating multilingual user interfaces and when
displaying (and searching) the content of the collection itself.
– How is new data loaded into Veridian™?
New METS/ALTO data is typically delivered by data conversion contractors in batches,
so in the case of a very large digitization project a new batch of data may be delivered
every few weeks or few months. For a smaller project all the data may be delivered in just
one or two batches. In either case each new batch of data is ingested into Veridian™
with a single batch import process. Detailed documentation is made available to ensure
Veridian™ users can easily ingest new data batches into their collection, and every
Veridian™ license includes a support and maintenance contract from Digital Library
Consulting, allowing us to support you with the addition of new batches of data to
your Veridian™ collections.
– What operating systems will Veridian™ run under?
A Veridian™ server runs under Windows Server 2003, Windows Server 2008, Red Hat
Enterprise Linux, CentOS Linux, Solaris 10, OpenSolaris, and Mac OS X Server operating
systems. Please see our requirements page for more details.
– What web server do I need to run Veridian™?
Veridian™ runs under Microsoft IIS and Apache web servers.
– What are the recommended web browsers for viewing Veridian™ collections?
Veridian collections can be viewed with any of the major modern web browsers. This includes Internet
Explorer 6.0+, Firefox 2.0+, Safari 2.0+, and Google Chrome. Adobe Reader or a similar PDF viewing plugin
is required to view PDF versions of pages, articles, and sections (though Veridian™ may be configured
to not deliver PDF files if required).
– Can a Veridian™ collection be put on a DVD or CD-ROM, in addition to being delivered on the Internet?
No. If one of the requirements of your project is to deliver your digitized material on CD-ROM or
DVD please ask us about our customized Greenstone solutions.
– Does Veridian™ support OAI-PMH?
Yes, a Veridian™ collection can be configured to make metadata available to other applications
through the OAI-PMH protocol for metadata harvesting.
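As a rough illustration of what harvesting looks like from the client side, the Python sketch below builds OAI-PMH request URLs. The verbs (Identify, ListRecords) and the oai_dc metadata prefix are defined by the OAI-PMH 2.0 specification. The base URL is a placeholder assumption, not a documented Veridian endpoint, and the helper name build_oai_request is our own.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the OAI-PMH base URL of your own collection.
BASE_URL = "https://example.org/veridian/oai"


def build_oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH GET request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Ask the repository to describe itself.
    print(build_oai_request(BASE_URL, "Identify"))
    # Harvest records as unqualified Dublin Core, which every
    # OAI-PMH repository is required to support.
    print(build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

A harvester would fetch each URL over HTTP and parse the XML response, following any resumptionToken the repository returns to page through large result sets.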
– Is there an open API for Veridian™? Can we develop custom interfaces to the Veridian™ server?
Yes, the optional Veridian™ web services module adds the flexibility
of building custom clients to interface with the Veridian™ server.