Digital Library Consulting: Making digital libraries easy
Frequently asked questions

General answers

  Who is Digital Library Consulting and what do they do?

DL Consulting was founded in 2002, and is based in Hamilton, New Zealand. It was founded by Stefan Boddie, who prior to starting DL Consulting worked for five years as one of the lead designers and developers of the Greenstone Digital Library Software, at the University of Waikato. DL Consulting continues to contribute to the ongoing development of the Greenstone open source software, and is the world's leading provider of commercial support for Greenstone. In addition we develop and support our commercial digital library software platform, Veridian.

DL Consulting focuses on the specialized field of digital libraries and digitization. We provide expertise, support, and software tools to organizations undertaking projects to make organized collections of information available and searchable on the Internet.

  We keep reading on this website about “digital library software” and “content management systems” and “digital content delivery systems”. Are they the same thing?

Yes. Unfortunately there is no clear terminology in use to describe software for organizing digital collections and delivering them on the Internet. CONTENTdm from OCLC uses the phrase “digital collection management software”, while DigiTool from Ex Libris uses “digital asset management software”. We’ve traditionally called Greenstone and Veridian™ “digital library software”. All these systems however, despite their relative strengths and weaknesses, are performing a similar role. That is, they’re systems for organizing digital collections and delivering them on the Internet (or in some cases on CD-ROM or DVD).

  Veridian™ and Greenstone are both described as being “digital library software”. What is the difference between them?

Greenstone is very general and very flexible digital library software, which can be applied to nearly any digital library project. It can ingest digital items in nearly any format, including even audio and video files.

Veridian™ is specialized digital library software which has been developed to ingest just digitized printed materials, and specifically those digitized to the METS and ALTO standards. This type of data is very rich in content, and Veridian™ is built to take advantage of that richness in ways which more general digital library software products cannot.

The first question regarding a new digital library project is of course “What materials will the library contain?” If the answer to that question is printed textual materials (books, journals, newspapers, etc.), and those materials are not currently in digital form (i.e. they’re physical documents or are on microfilm), then we encourage the use of METS/ALTO. At present we believe METS/ALTO is the best and richest format available for new digitization of printed materials. If METS/ALTO is selected, then Veridian™ is the best choice of delivery software. If your new digital library is to include non-textual materials (audio, video, photographs, etc.), or if your materials are already available in digital form (other than as METS/ALTO), then a customized Greenstone solution is probably most appropriate.

  I have a collection of born-digital documents, or documents which have already been converted to digital (but which are not METS/ALTO). Can I still create a digital library with Veridian™ or Greenstone?

Yes, with either Veridian or Greenstone, depending on your data. Certain types of born-digital files (e.g. some PDF documents) can be converted to METS/ALTO. In those cases we recommend using Veridian, to take advantage of the more advanced display options it provides. Veridian also supports display of digital images of printed materials, even when no METS/ALTO data is available. For all other digital data (including non-textual data like photographs, audio, and video) we recommend building a digital library using a customized Greenstone solution. Greenstone is very flexible and has built-in support for ingesting a wide range of digital formats. These include Microsoft Word, HTML, METS, MARC, digital images in most popular formats, a number of database formats, and many others. If your existing digital files are not natively supported by Greenstone, we can customize it to suit.


Veridian answers

  Veridian™ expects source data to be METS/ALTO. What exactly is METS/ALTO?

The Metadata Encoding and Transmission Standard (METS) has been around for some time, and is a standard with which many library professionals will be familiar. It’s a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, using XML. The METS standard is maintained by the Library of Congress, and is developed as an initiative of the Digital Library Federation.
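
As a rough illustration of the structural metadata described above, the following Python sketch walks the structMap of a tiny METS document and prints its hierarchy. The sample XML, its labels, and the outline function are invented for this example; real METS files are far larger and vary between digitization projects, and this is not how Veridian itself reads METS.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical, heavily trimmed METS document, for illustration only.
SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/">
  <structMap TYPE="LOGICAL">
    <div TYPE="Newspaper" LABEL="Example Gazette">
      <div TYPE="Issue" LABEL="1 January 1900">
        <div TYPE="Article" LABEL="Harbour news"/>
        <div TYPE="Article" LABEL="Shipping notices"/>
      </div>
    </div>
  </structMap>
</mets>"""


def outline(mets_xml):
    """Return (depth, TYPE, LABEL) for every div in the first structMap."""
    root = ET.fromstring(mets_xml)
    struct_map = root.find(f"{{{METS_NS}}}structMap")
    rows = []

    def walk(element, depth):
        # Visit direct child divs only, then recurse, so depth is accurate.
        for div in element.findall(f"{{{METS_NS}}}div"):
            rows.append((depth, div.get("TYPE"), div.get("LABEL")))
            walk(div, depth + 1)

    walk(struct_map, 0)
    return rows


for depth, div_type, label in outline(SAMPLE_METS):
    print("  " * depth + f"{div_type}: {label}")
```

The nested div elements are what allow a document to be browsed by issue, section, or article rather than only page by page.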

While METS is great at describing the structure of a digital object, it’s missing the ability to describe the content and layout of each piece of the digital object. For that we need an extension to METS called ALTO (Analyzed Layout and Text Object). This combination of METS and ALTO was originally developed by the METAe project, and was later adopted by the Library of Congress for their large-scale National Digital Newspaper Program (NDNP). Since then METS/ALTO has been used in many large (and small) newspaper digitization projects, as well as a number of projects digitizing books and journals.

METS/ALTO provides rich digital objects, which allows for extremely rich digital library presentation systems to be built. For example, a typical METS/ALTO object encodes not only the complete logical and physical structure of a document (i.e. chapters, sections, articles, pages, etc., and their associated metadata), but also the full-text content of each section of the document, and even the physical coordinates of every word in the document!
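
To make the word-coordinate point concrete, here is a minimal Python sketch that looks up a word in an ALTO page and returns its position. In ALTO each word is a String element carrying CONTENT, HPOS, VPOS, WIDTH, and HEIGHT attributes. The sample page and the find_word helper are invented for this example, and the coordinates are in whatever measurement unit the ALTO file declares. It shows the general idea behind features such as highlighting search hits on a page image, not Veridian's own implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed ALTO page, for illustration only.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Harbour" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"/>
            <SP WIDTH="12"/>
            <String CONTENT="news" HPOS="292" VPOS="200" WIDTH="110" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO schema version matches."""
    return tag.rsplit("}", 1)[-1]


def find_word(alto_xml, query):
    """Return (x, y, width, height) boxes for words equal to query, ignoring case."""
    wanted = query.casefold()
    boxes = []
    for element in ET.fromstring(alto_xml).iter():
        if local_name(element.tag) != "String":
            continue
        if element.get("CONTENT", "").casefold() == wanted:
            boxes.append(
                tuple(float(element.get(key)) for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
            )
    return boxes


print(find_word(SAMPLE_ALTO, "news"))
```

Each returned box could be drawn over the scanned page image to show the reader exactly where the matching word appears.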

  We like the idea of using METS/ALTO data for our project, but how do we produce it?

Digital Library Consulting have partnerships with a number of organizations who specialize in converting physical and microfilmed documents to METS/ALTO. Many of these organizations use the CCS docWORKS software from Content Conversion Specialists GmbH in Germany. docWORKS is a software system which automatically zones and identifies all of the sections on an image of a printed page, while at the same time extracting searchable text with Optical Character Recognition (OCR) technology. Following processing, all this extracted information is stored in XML, as METS/ALTO. Those intending to digitize their documents to METS/ALTO have two main options: outsource the work to a contractor (like one of our partners), or purchase docWORKS and produce METS/ALTO in-house. We can advise on the most suitable and cost-effective approach for your specific project.

It is important to note that digitizing documents to METS/ALTO format is typically not more expensive than using alternative (and less flexible) formats.

  I’m a little confused now! Which part of the process is carried out by CCS docWORKS and which part does Veridian™ do?

CCS docWORKS is software for converting physical or microfilm documents to the METS/ALTO format. Veridian™ is software for organizing that METS/ALTO data and displaying it on the Internet. Put another way, docWORKS is the “conversion software”, while Veridian™ is the “display software”.

  Can Veridian™ be customized to ingest data which is not METS/ALTO?

Veridian™ was developed specifically to take advantage of the very rich information available with collections digitized using METS/ALTO. It does also support collections of scanned images of documents without METS/ALTO (see below). It is also possible to customize Veridian™ to suit alternative data formats, but this is typically only worthwhile in certain specific cases. For example, in the past Digital Library Consulting have developed customized import modules to support proprietary data formats, where the owners of data in those formats had decided to switch to using METS/ALTO, but didn’t want to have to reprocess all their old data. That is, they ended up with a hybrid version of Veridian which could ingest both their old proprietary data and their new METS/ALTO data. We can also convert certain types of data to METS/ALTO for ingest into Veridian.

  Why is it useful for Veridian to ingest digital images, when those images don't have METS/ALTO?

This feature is often used in large projects where data is digitized to METS/ALTO over many months or years. An entire collection of books, newspapers, or other data can be scanned, and all the scanned images can be placed in Veridian right away. This makes it possible to browse and read the new digital data in Veridian, although it is not yet searchable (as it has not been through an OCR process). Over time, batches of the scanned images can be processed to METS/ALTO, with the METS/ALTO data added to Veridian as it becomes available. In this way the time and expense necessary to produce METS/ALTO for a large collection can be spread out over months or years, with the entire collection available in Veridian from the beginning and gradually upgraded with searchable text.

  I know that METS/ALTO and Veridian™ are used for big newspaper digitization projects, but is it suitable for collections of books, journals, and other printed materials?

Yes, METS/ALTO is a very flexible XML data format, and is regularly used for digitizing books, journals, and other printed materials, as well as newspapers. The Veridian™ software meanwhile has been developed to be equally adept at delivering digitized books and journals as it is at delivering digitized newspapers.

  How many items can I put in a single Veridian™ collection?

Veridian™ is hugely scalable, and there are no practical limits to the amount of information a collection may contain. As an example, collections have been built containing more than two million large newspaper pages, which equates to nearly 30 million articles. Even larger collections are possible, and Veridian™ can even be distributed across multiple servers to allow for almost unlimited scalability.

  Does Veridian™ support languages other than English?

Yes. Veridian™ uses Unicode throughout, so it supports any language and character set that can be displayed by modern web browsers. This includes languages using Cyrillic, Chinese, Japanese, Korean, Hebrew, and other non-Latin scripts. Unicode is used both in creating multi-lingual user interfaces and when displaying (and searching) the content of the collection itself.
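
As a general illustration of why Unicode matters for searching, the short Python sketch below normalizes text before comparing it, so that differently encoded forms of the same word still match. The search_key and contains helpers are invented for this example; this is a common technique and is not a description of how Veridian™ implements search.

```python
import unicodedata


def search_key(text):
    """Normalize to NFKC, then casefold, so equivalent spellings compare equal."""
    return unicodedata.normalize("NFKC", text).casefold()


def contains(document, query):
    """Return True if query occurs in document after normalization."""
    return search_key(query) in search_key(document)


# "E" followed by a combining acute accent matches the precomposed "é".
print(contains("Visit the CAFE\u0301 today", "caf\u00e9"))

# Case-insensitive matching also works for Cyrillic text.
print(contains("МОСКВА", "москва"))
```

Normalizing both the indexed text and the query in the same way is what keeps matching consistent across scripts and input methods.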

  How is new data loaded into Veridian™?

New METS/ALTO data is typically delivered by data conversion contractors in batches, so in the case of a very large digitization project a new batch of data may be delivered every few weeks or few months. For a smaller project all the data may be delivered in just one or two batches. In either case each new batch of data is ingested into Veridian™ with a single batch import process. Detailed documentation is made available to ensure Veridian™ users can easily ingest new data batches into their collection, and every Veridian™ license includes a support and maintenance contract from Digital Library Consulting, allowing us to support you with the addition of new batches of data to your Veridian™ collections.

  What operating systems will Veridian™ run under?

A Veridian™ server runs under the Windows Server 2003, Windows Server 2008, Red Hat Enterprise Linux, CentOS, Solaris 10, OpenSolaris, and Mac OS X Server operating systems. Please see our requirements page for more details.

  What web server do I need to run Veridian™?

Veridian™ runs under Microsoft IIS and Apache web servers.

  What are the recommended web browsers for viewing Veridian™ collections?

Veridian collections can be viewed with any of the major modern web browsers. This includes Internet Explorer 6.0+, Firefox 2.0+, Safari 2.0+, and Google Chrome. Adobe Reader or a similar PDF viewing plugin is required to view PDF versions of pages, articles, and sections (though Veridian™ may be configured to not deliver PDF files if required).

  Can a Veridian™ collection be put on a DVD or CD-ROM, in addition to being delivered on the Internet?

No. If one of the requirements of your project is to deliver your digitized material on CD-ROM or DVD please ask us about our customized Greenstone solutions.

  Does Veridian™ support OAI-PMH?

Yes, a Veridian™ collection can be configured to make metadata available to other applications through the OAI-PMH protocol for metadata harvesting.
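
For readers unfamiliar with OAI-PMH, the following Python sketch shows what a harvesting request and response look like at the protocol level. The verb and metadataPrefix parameters and the response namespace come from the OAI-PMH 2.0 specification. The base URL, the sample identifiers, and both helper functions are placeholders invented for this example; the actual endpoint of a Veridian™ collection depends on how it is configured.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def list_identifiers_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListIdentifiers request URL."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical, trimmed response of the kind a harvester would receive.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:issue-1</identifier>
      <datestamp>2010-01-01</datestamp>
    </header>
    <header>
      <identifier>oai:example.org:issue-2</identifier>
      <datestamp>2010-01-02</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def identifiers(response_xml):
    """Extract every record identifier from a ListIdentifiers response."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.iter(f"{{{OAI_NS}}}identifier")]


# The base URL below is a placeholder, not a real Veridian endpoint.
print(list_identifiers_url("https://example.org/oai"))
print(identifiers(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL over HTTP and follow any resumptionToken in the response to page through large result sets.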

  Is there an open API for Veridian™? Can we develop custom interfaces to the Veridian™ server?

Yes, the optional Veridian™ web services module adds the flexibility of building custom clients to interface with the Veridian™ server.


Greenstone answers

  Greenstone is open source software. What does that mean for us?

Greenstone has been under constant development for more than ten years, so is very mature, stable, and well tested. The advantages of using open source software include the ability to modify the software in-house, and a reduced reliance on a single vendor to support and maintain your software. Greenstone can be downloaded for free from www.greenstone.org.

  If Greenstone is free open source software, why do we need Digital Library Consulting?

You don’t! Anyone is free to download, install, and modify Greenstone to suit their needs. Greenstone is complex digital library software, however, and there is a steep learning curve for those wishing to make significant changes to the way it works, or even to the way it looks. Digital Library Consulting have been involved with the development of Greenstone since its inception, so are uniquely well qualified to provide Greenstone-related support to those who require it.

  Does Greenstone support languages other than English?

Yes. Like Veridian™, Greenstone uses Unicode throughout, so it supports any language and character set that can be displayed by modern web browsers. This includes languages using Cyrillic, Chinese, Japanese, Korean, Hebrew, and other non-Latin scripts. Unicode is used both in creating multi-lingual user interfaces and when displaying (and searching) the content of the collection itself.

  What file types does Greenstone support?

Greenstone includes built-in support for ingesting a large number of different digital file formats. These include Microsoft Word, Excel, and PowerPoint; PDF and PostScript; RTF, HTML, and plain text; MARC, Refer, and ProCite; a variety of email formats; many audio, video, and image formats; and many others. It also supports the ingest of data exported from other digital library and database systems, including CONTENTdm and DSpace. If you have digital data in a format which is not already supported by Greenstone, Digital Library Consulting can help.

  Can a Greenstone collection be put on a DVD or CD-ROM, in addition to being delivered on the Internet?

Yes, Greenstone collections can easily be exported to a CD-ROM or DVD. They look the same when run in this way as they do when viewed on the Internet.

 
© Digital Library Consulting Ltd.