General questions
Veridian questions
Greenstone questions
General answers
– Who is Digital Library Consulting and what do they do?
DL Consulting was founded in 2002, and is based in Hamilton, New Zealand. It was founded
by Stefan Boddie, who prior to starting DL Consulting worked for five years as one of the lead designers
and developers of the Greenstone Digital Library Software, at the University of Waikato. DL Consulting
continues to contribute to the ongoing development of the Greenstone open source software, and is the
world's leading provider of commercial support for Greenstone. In addition we develop and support our
commercial digital library software platform, Veridian.
DL Consulting is extremely focused on the very specialized field of digital libraries and digitization.
We provide expertise, support, and software tools, to organizations undertaking projects to make organized
collections of information available and searchable on the Internet.
– We keep reading on this website about “digital library software”
and “content management systems” and “digital content delivery systems”.
Are they the same thing?
Yes. Unfortunately there is no clear terminology in use to describe software for
organizing digital collections and delivering them on the Internet. CONTENTdm from OCLC uses
the phrase “digital collection management software”, while DigiTool from Ex Libris
uses “digital asset management software”. We’ve traditionally called Greenstone
and Veridian™ “digital library software”. All of these systems, however, despite their
relative strengths and weaknesses, perform a similar role. That is, they’re systems for
organizing digital collections and delivering them on the Internet (or in some cases on CD-ROM or
DVD).
– Veridian™ and Greenstone are both described as being “digital library software”.
What is the difference between them?
Greenstone is very general and very flexible digital library software, which can be applied to
nearly any digital library project. It can ingest digital items in nearly any format, including even audio
and video files.
Veridian™ is specialized digital library software which has been developed to ingest just digitized
printed materials, and specifically those digitized to the METS and ALTO standards. This type of data is very
rich in content, and Veridian™ is built to take advantage of that richness in ways which more general
digital library software products cannot.
The first question regarding a new digital library project is of course “What materials will the library
contain?” If the answer to that question is printed textual materials (books, journals, newspapers, etc.),
and those materials are not currently in digital form (i.e. they’re physical documents or are on microfilm)
then we encourage the use of METS/ALTO. That is, at present we believe METS/ALTO is the best and richest format
available for new digitization of printed materials. If METS/ALTO is selected, then Veridian™ is the best
choice of delivery software. If your new digital library is to include non-textual materials (audio, video, photographs,
etc.), or if your materials are already available in digital form (other than as METS/ALTO), then a customized Greenstone solution is
probably most appropriate.
– I have a collection of born-digital documents, or documents which have already been converted to
digital (but which are not METS/ALTO). Can I still create a digital library with
Veridian™ or Greenstone?
Yes, with either Veridian or Greenstone, depending on your data. Certain types of born-digital files (e.g.
some PDF documents) can be converted to METS/ALTO. In those cases we recommend using Veridian, to take advantage
of the more advanced display options it provides. Veridian also supports display of digital images of printed materials,
even when no METS/ALTO data is available. For all other digital data (including non-textual data like photographs,
audio, and video) we recommend building a digital library using a customized Greenstone solution. Greenstone is very
flexible and has built-in support for ingesting a wide range of digital formats. This includes Microsoft Word, HTML, METS,
MARC, digital images in most popular formats, a number of database formats, and many others. If your existing digital
files are not natively supported by Greenstone we can customize it to suit.
Veridian answers
– Veridian™ expects source data to be METS/ALTO. What exactly is METS/ALTO?
The Metadata Encoding and Transmission Standard (METS) has been around for some time, and is a standard with
which many library professionals will be familiar. It’s a standard for encoding descriptive, administrative,
and structural metadata regarding objects within a digital library, using XML. The METS standard is maintained by
the Library of Congress, and is developed as an initiative of the Digital Library Federation.
While METS is great at describing the structure of a digital object, it’s missing the ability to describe
the content and layout of each piece of the digital object. For that we need an extension to METS called ALTO
(Analyzed Layout and Text Object). This combination of METS and ALTO was originally developed by the METAe project,
and was later adopted by the Library of Congress for their large-scale National Digital Newspaper Program (NDNP).
Since then METS/ALTO has been used in many large (and small) newspaper digitization projects, as well as a number of
projects digitizing books and journals.
METS/ALTO provides rich digital objects, which allows for extremely rich digital library presentation systems
to be built. For example, a typical METS/ALTO object encodes not only the complete logical and physical structure
of a document (i.e. chapters, sections, articles, pages, etc., and their associated metadata), but also the
full-text content of each section of the document, and even the physical coordinates of every word in the document!
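As a rough illustration of the word-level detail described above, the following Python sketch reads a small ALTO-style fragment and lists each word with its position on the page. The sample XML, its text, and its coordinates are made up for illustration, and the helper names (local_name, extract_words) are hypothetical. It assumes the usual ALTO convention of String elements carrying CONTENT, HPOS, VPOS, WIDTH, and HEIGHT attributes, and it ignores the XML namespace because that differs between ALTO versions.

```python
import xml.etree.ElementTree as ET

# Illustrative ALTO-style fragment; the text and coordinates are made up.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2400" HEIGHT="3600">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Digital" HPOS="120" VPOS="200" WIDTH="310" HEIGHT="60"/>
            <SP/>
            <String CONTENT="library" HPOS="450" VPOS="200" WIDTH="290" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def extract_words(alto_xml):
    """Return (text, x, y, width, height) for every String element found."""
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        if local_name(element.tag) == "String":
            words.append((
                element.get("CONTENT"),
                float(element.get("HPOS")),
                float(element.get("VPOS")),
                float(element.get("WIDTH")),
                float(element.get("HEIGHT")),
            ))
    return words


words = extract_words(SAMPLE_ALTO)
for text, x, y, width, height in words:
    print(f"{text!r} at ({x}, {y}), size {width} x {height}")
```

Coordinates like these are what make it possible for display software to highlight a search term directly on the page image.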
– We like the idea of using METS/ALTO data for our project, but how do we produce it?
Digital Library Consulting have partnerships with a number of organizations who specialize in converting
physical and microfilmed documents to METS/ALTO. Many of these organizations use the
CCS docWORKS
software from Content Conversion Specialists GmbH in Germany. docWORKS is a software system which automatically
zones and identifies all of the sections on an image of a printed page, while at the same time extracting
searchable text with Optical Character Recognition (OCR) technology. Following processing, all this extracted
information is stored in XML, as METS/ALTO. Those intending to digitize their documents to METS/ALTO have two
main options: outsource the work to a contractor (like one of our partners) or purchase docWORKS and
produce METS/ALTO in-house. We can advise on the most suitable and cost-effective approach for your specific
project.
It is important to note that digitizing documents to METS/ALTO format is typically not more expensive than
using alternative (and less flexible) formats.
– I’m a little confused now! Which part of the process is carried out by CCS docWORKS and which part
does Veridian™ do?
CCS docWORKS is software for converting physical or microfilm documents to the METS/ALTO format.
Veridian™ is software for organizing that METS/ALTO data and displaying it on the Internet.
Put another way, docWORKS is the “conversion software”, while Veridian™
is the “display software”.
– Can Veridian™ be customized to ingest data which is not METS/ALTO?
Veridian™ was developed specifically to take advantage of the very rich information available with
collections digitized using METS/ALTO. It does also support collections of scanned images of documents
without METS/ALTO (see below). It is also possible to customize Veridian™ to
suit alternative data formats, but this is typically only worthwhile in certain specific cases. For example,
in the past Digital Library Consulting have developed customized import modules to support proprietary
data formats, where the owners of data in those formats had decided to switch to using METS/ALTO, but
didn’t want to have to reprocess all their old data. That is, they ended up with a hybrid
version of Veridian which could ingest both their old proprietary data and their new METS/ALTO data.
We can also convert certain types of data to METS/ALTO for ingest into Veridian.
– Why is it useful for Veridian to ingest digital images, when those images don't have METS/ALTO?
This feature is often used in large projects where data is digitized to METS/ALTO over many months or years.
That is, an entire collection of books, newspapers, or other data can be scanned, and all the scanned images
can be placed in Veridian right away. This makes it possible to browse and read the new digital data in
Veridian, but it is of course not searchable (as it has not yet been through an OCR process). Over time batches
of the scanned images can be processed to METS/ALTO, with the METS/ALTO data added to Veridian as it becomes
available. The time and expense necessary to produce METS/ALTO for an entire large collection can in this way
be spread out over months or years, with the entire collection available in Veridian from the beginning,
but slowly being upgraded with searchable text over time.
– I know that METS/ALTO and Veridian™ are used for big newspaper digitization projects, but are they suitable
for collections of books, journals, and other printed materials?
Yes, METS/ALTO is a very flexible XML data format, and is regularly used for digitizing books, journals,
and other printed materials, as well as newspapers. The Veridian™ software meanwhile has been developed
to be equally adept at delivering digitized books and journals as it is at delivering digitized newspapers.
– How many items can I put in a single Veridian™ collection?
Veridian™ is hugely scalable, and there are no practical limits to the amount of information a collection
may contain. As an example, collections have been built containing more than two million large newspaper pages,
which equates to nearly 30 million articles. Even larger collections are possible, and Veridian™ can even
be distributed across multiple servers to allow for almost unlimited scalability.
– Does Veridian™ support languages other than English?
Yes. Veridian™ uses Unicode throughout, so is fully compliant with any language and character set that
can be displayed by modern web browsers. This includes languages using Cyrillic, Chinese, Japanese, Korean, Hebrew,
and other non-Latin alphabets. Unicode is used both in creating multi-lingual user interfaces and when displaying
(and searching) the content of the collection itself.
– How is new data loaded into Veridian™?
New METS/ALTO data is typically delivered by data conversion contractors in batches,
so in the case of a very large digitization project a new batch of data may be delivered
every few weeks or few months. For a smaller project all the data may be delivered in just
one or two batches. In either case each new batch of data is ingested into Veridian™
with a single batch import process. Detailed documentation is made available to ensure
Veridian™ users can easily ingest new data batches into their collection, and every
Veridian™ license includes a support and maintenance contract from Digital Library
Consulting, allowing us to support you with the addition of new batches of data to
your Veridian™ collections.
– What operating systems will Veridian™ run under?
A Veridian™ server runs under Windows Server 2003, Windows Server 2008, Red Hat
Enterprise Linux, CentOS Linux, Solaris 10, OpenSolaris, and Mac OS X Server operating
systems. Please see our requirements page for more details.
– What web server do I need to run Veridian™?
Veridian™ runs under Microsoft IIS and Apache web servers.
– What are the recommended web browsers for viewing Veridian™ collections?
Veridian collections can be viewed with any of the major modern web browsers. This includes Internet
Explorer 6.0+, Firefox 2.0+, Safari 2.0+, and Google Chrome. Adobe Reader or a similar PDF viewing plugin
is required to view PDF versions of pages, articles, and sections (though Veridian™ may be configured
to not deliver PDF files if required).
– Can a Veridian™ collection be put on a DVD or CD-ROM, in addition to being delivered on the Internet?
No. If one of the requirements of your project is to deliver your digitized material on CD-ROM or
DVD please ask us about our customized Greenstone solutions.
– Does Veridian™ support OAI-PMH?
Yes, a Veridian™ collection can be configured to make metadata available to other applications
through the OAI-PMH protocol for metadata harvesting.
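For readers unfamiliar with OAI-PMH, the sketch below shows how a harvesting application might build a standard ListRecords request in Python. The endpoint address is a hypothetical placeholder, since the real URL depends on how a particular collection is configured, and the function name list_records_url is our own. The verb, metadataPrefix, and from parameters, and the oai_dc metadata format, come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real address depends on the collection's configuration.
BASE_URL = "https://example.org/veridian/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix: metadata format to request (oai_dc is unqualified Dublin Core).
    from_date: optional lower bound (e.g. "2010-01-01") for selective harvesting.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


url = list_records_url(BASE_URL, from_date="2010-01-01")
print(url)
```

A harvester would fetch this URL and parse the XML response, following any resumption tokens the server returns for large result sets.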
– Is there an open API for Veridian™? Can we develop custom interfaces to the Veridian™ server?
Yes, the optional Veridian™ web services module adds the flexibility
of building custom clients to interface with the Veridian™ server.
Greenstone answers
– Greenstone is open source software. What does that mean for us?
Greenstone has been under constant development for more than ten years, so is very mature, stable, and well tested.
The advantages of using open source software include the ability to modify the software in-house, and a reduced
reliance on a single vendor to support and maintain your software. Greenstone can be downloaded for free from
www.greenstone.org.
– If Greenstone is free open source software, why do we need Digital Library Consulting?
You don’t! Anyone is free to download, install, and modify Greenstone to suit their needs. Greenstone
is complex digital library software, however, and there is a steep learning curve for those wishing to make
significant changes to the way it works, or even to the way it looks. Digital Library Consulting have been
involved with the development of Greenstone since its inception, so are uniquely well qualified to provide
Greenstone-related support to those who require it.
– Does Greenstone support languages other than English?
Yes. Like Veridian™, Greenstone uses Unicode
throughout, so is fully compliant with any language and character set that can be displayed
by modern web browsers. This includes languages using Cyrillic, Chinese, Japanese, Korean,
Hebrew, and other non-Latin alphabets. Unicode is used both in creating multi-lingual user
interfaces and when displaying (and searching) the content of the collection itself.
– What file types does Greenstone support?
Greenstone includes built-in support for ingesting a large number of different digital
file formats. This includes Microsoft Word, Excel, and PowerPoint, PDF and PostScript, RTF,
HTML, and plain text, MARC, Refer, ProCite, a variety of email formats, lots of audio, video,
and image formats, and many others. It also supports the ingest of data exported from other digital
library and database systems, including CONTENTdm and DSpace. If you have digital data in a format
which is not already supported by Greenstone, Digital Library Consulting can help.
– Can a Greenstone collection be put on a DVD or CD-ROM, in addition to being delivered on the Internet?
Yes, Greenstone collections can easily be exported to a CD-ROM or DVD. They look the
same when run in this way as they do when viewed on the Internet.