Development documentation


We've tried to put together a list of a few things you should know about the Armarius system before you start hacking on it. This documentation does not attempt to describe every detail of the system and may be out of date at times, so please consult the source code if in doubt. Take a look at the roadmap for the features that need your help.

Please make sure that your work conforms to the Coding guides prior to sending in a patch or committing to the svn.


Overview and goals

Our prototype system, called Armarius, was developed for a project that aims to digitize a collection of around 20,000 pages of manuscripts belonging to the Hungarian mathematicians Janos and Farkas Bolyai. These pages are almost 200 years old and are deteriorating fast. A particularity of these manuscripts is that they are written in all directions and in various sizes, languages and writing styles. They also contain mathematical notation and symbols that are not used in today's typography. Due to the quality of the paper and ink, there is high statistical correlation between images of different pages, making automatic content extraction and annotation very difficult. Certain pages are almost undecipherable, even for a human reader. A number of scholars have spent years trying to transcribe the collection, but despite all their effort, the transcription is far from complete.

Our system had to fulfill the following goals for this project:

  • Provide an easy-to-use web interface to these manuscripts. Users should be able to navigate easily between the manuscript pages, avoiding the drawbacks found in the previously presented on-line archives. As there are multiple criteria by which the pages can be sorted, the user should be able to choose among them.
  • Because the two mathematicians are little known worldwide, it should integrate a hypermedia system describing their life and scientific activities. This hypermedia system had to integrate well into the manuscript browser, so that the user could navigate seamlessly between the digitized pages and the hypermedia system providing additional information. Pages inside the hypermedia system were to have no content or size limitations imposed on them (meaning they could contain images, tables and other presentation elements). They had to be easily editable, with a markup language that is simple both for human users to write and for the server-side scripts to parse and format.
  • Because of the sheer volume of the collection, it was necessary to allow our visitors to contribute to the transcription and annotation of these manuscripts. Deciding who has access to what is important and hard to formalize, because a balance is needed between ease of contribution and protection schemes that try to prevent accidental or intentional alteration of content.
  • Support internationalization. The interface should be available in multiple languages. The system should keep track of the language in which content is generated.
  • Build on cost-effective open-source technologies.

Design decisions

To meet the goals presented above, particularly the integration of the hypermedia system, Armarius was designed to be a modular web application that is able to display and handle multiple types of content. Modularity meant that the application had to be composed of smaller components that interact with each other through a well-defined interface. However, in order to maximize customizability and minimize deployment costs, these modules had to be easily interchangeable and removable without affecting the rest of the system; that is, low coupling was desired. This was achieved by creating a central component that provides an extensive but well-defined API to the other parts, so the rest of the modules interact only with it and not with each other.

Armarius also makes use of the Model-View-Controller design pattern to separate application logic from presentation. The files responsible for generating the HTML markup reside in a separate skins directory and contain no application logic code. Similarly, all the layout and styling is done with Cascading Style Sheets, and image files are placed within the skin directory. This makes site customization easy, and also allows designers to create additional skins or themes.

To ease navigation between the page browser and the hypermedia system, Armarius needed an easy-to-read URL scheme. A naming scheme inspired by MediaWiki was designed, based on the concepts of "pages" and "namespaces". In this context, a page is an HTML or other document returned by the web server to the user agent. Namespaces are used to group together pages of similar type, so every page generated by Armarius belongs to a namespace. Internally, every namespace is mapped to a module that is responsible for generating the markup for that namespace. Thus, HTML pages presenting the manuscripts belong to one namespace (the "volume" namespace), pages from the hypermedia system belong to another namespace, and special pages, such as login and registration forms, belong to yet another (the "special" namespace). This provides a consistent and flexible linking system and avoids name clashes. Extending Armarius to support other content types is easy, as only a new module needs to be written, without affecting existing code.

The manuscript viewer plays a central role in the system, but in keeping with the overall modular design principle, this part had to be interchangeable, too. Armarius is designed to have multiple manuscript viewers, so that the application or the user can choose the most suitable among them. Currently, there is an AJAX manuscript viewer that makes extensive use of JavaScript to load images on demand as the user navigates the collection. However, a simple HTML-only viewer could be implemented for users who have disabled JavaScript, or whose browsers do not support it. A DjVu-based interface provided through a Java applet is another possibility for a manuscript browser. DjVu is a very efficient open format for storing digitized material, but unfortunately no browser is able to display it without an external plugin.

Both the AJAX-based and DjVu-based interfaces need to communicate asynchronously with the server. This is handled by a webservice module. Having such a module also makes data syndication to other sites and services possible: search engines or other digital libraries would be able to browse through the collections hosted with Armarius, and even a desktop manuscript viewer application is possible. For compatibility reasons, all communication is done by exchanging XML files.

To meet the internationalization requirements, all language-specific strings were defined as constants in separate language files. The system then decides which language file to load based on configuration settings and user preferences. Every metadata and hypermedia page has a language indicator associated with it, and only content that matches the interface language is presented to the user. UTF-8 encoding is used for all strings in Armarius. Some configuration settings also have an associated language indicator; this makes it possible, for example, to translate the title of the site.

Due to cost considerations, the application was based on open source technologies, such as PHP (for server-side scripting) and MySQL (as the database manager). This choice was in part motivated by the popularity of these tools. Both MySQL and PHP are supported by almost any hosting provider. A lot of web developers are familiar with PHP, making it easier for libraries to hire a webmaster to take care of the system. MySQL is also very robust and fast, and has proved to be successful in projects with large databases, such as Wikipedia.

Due to its modularity and the presence of the hypermedia system, Armarius somewhat resembles other content management systems (CMS). One might then ask: why implement a completely new system from the ground up when an existing content management or wiki application could be modified? The answer is simple: despite its flexibility, Armarius is not a general-purpose CMS; it is a system for hosting ancient manuscripts. It will never have shopping carts, forums, or product listings. Extending an existing application that was not designed with the goals that Armarius has would only lead to an inconsistent design, lots of bugs, useless code, and bloat in general.

System architecture

Image:armarius_modules.png

The entry point to the application is index.php. This script checks a few parameters, initializes the connection to the database, and decides which namespace the user is in. It does this by taking the first GET parameter and extracting the substring before the colon. Hence the first GET parameter should never be a name-value pair but a bare name, with a syntax similar to that of URIs: namespace:content, or simply namespace, since the content part is optional. It may be followed by an arbitrary number of ordinary GET parameters. After decoding the namespace, index.php checks whether a script with the name of the namespace exists inside /app/pagegen/. If it does, that script is called and is responsible for any further processing.
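The parsing rule described above can be sketched as follows. The real logic lives in index.php and is written in PHP; this JavaScript version only illustrates the rule. The function names, the example query string, the ".php" file extension and the restriction on namespace characters are all assumptions made for the illustration, not taken from the Armarius source.

```javascript
// Hypothetical helper: splits a query string such as "volume:12&lang=en"
// into the namespace, the optional content part and the remaining parameters.
function parseArmariusQuery(queryString) {
  const parts = queryString.replace(/^\?/, "").split("&").filter(p => p !== "");
  if (parts.length === 0) {
    return { namespace: null, content: null, params: {} };
  }
  // The first parameter is a bare name ("namespace:content"), not a name-value pair.
  const first = decodeURIComponent(parts[0]);
  const colon = first.indexOf(":");
  const namespace = colon === -1 ? first : first.slice(0, colon);
  const content = colon === -1 ? null : first.slice(colon + 1);
  // Everything after the first parameter is an ordinary name=value pair.
  const params = {};
  for (const pair of parts.slice(1)) {
    const eq = pair.indexOf("=");
    const key = decodeURIComponent(eq === -1 ? pair : pair.slice(0, eq));
    params[key] = eq === -1 ? "" : decodeURIComponent(pair.slice(eq + 1));
  }
  return { namespace, content, params };
}

// Hypothetical lookup of the page generator script for a namespace.
// Accepting only simple names (an assumption) keeps the lookup inside /app/pagegen/.
function pagegenScriptFor(namespace) {
  return /^[a-z0-9_]+$/i.test(namespace) ? "/app/pagegen/" + namespace + ".php" : null;
}
```

With these assumptions, a request for index.php?volume:12&lang=en would be dispatched to the script for the volume namespace, with 12 as the content part and lang as an ordinary parameter.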

For the manuscript viewer, it is possible to choose between multiple frontends. One of them is the AJAX frontend. It displays manuscript pages inside an IFRAME and loads them on demand as the user scrolls between the pages. Annotations are also displayed on the page and can be edited by the user in place. There are plans to incorporate a DjVu-based manuscript viewer (to save bandwidth) and a simple HTML-based manuscript viewer for older browsers that cannot run JavaScript. The scripts of the AJAX frontend communicate with the server by sending requests to scripts inside /app/remote/. The communication is always initiated by the client side, by sending an HTTP request to the server. For an overview of the communication protocol between the two parts, see Client-Server Communication.

We are also developing an Import/Export interface for extending the collections; see Import-Export.

Here you can find the documentation of User Management (login)

On the backend, we use MySQL as the database server. (There are currently no plans to support other DBMSs.)

Propositions

Functionnalities of the last programmed version of Armarius

Propositions concerning collections and views

Propositions for the management of the rights

Specifications for the Wordspotting

Filesystem layout

  • /view/ - contains the files associated with the front-ends
  • /view/frontends/ajax/ - the JavaScript-based page viewer
  • /view/frontends/djvu/ - the Java applet-based page viewer
  • /view/skins/ - themes for the site
  • /view/lang/ - language files
  • /app/admin/ - administration interface
  • /app/include/ - scripts used by everything else inside /app
  • /app/install/ - installation scripts
  • /app/pagegen/ - scripts that generate CMS pages
  • /app/lib - external libraries used by Armarius
  • /app/remote/ - scripts invoked by the Java and JavaScript frontends
  • /data/config/ - contains config.php
  • /data/images/pages - images for the manuscript pages
  • /data/images/misc - other images used in the CMS
  • /data/cache/ - cache for resized images, pre-generated pages

Database structure

For an overview of the structure of the database used by Armarius, see the following image: Image:armarius_database.png
Here "PREFIX" marks the string prepended to the name of each table to prevent name collisions with tables that already exist in the database Armarius will be using.

Localizing

To translate Armarius, you will need to create two files inside /view/lang/: a PHP file and a JavaScript file. Both should define the same constants and objects as en.php and en.js do, with the strings translated to your language, of course. The new files should be named <lang>.php and <lang>.js, where <lang> is the language code according to ISO 639-1. Please note that you will need to save both files with UTF-8 encoding. You will also have to add your language to the array defined in /view/lang/langnames.php. The key should be the language code, and the value should be the name of the language (in your language).
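Since a translation has to define the same names as the English files, a quick completeness check can catch forgotten strings. The sketch below is only an illustration: the object names LANG_EN and LANG_HU, the keys inside them and the helper missingKeys() are hypothetical, and the actual structure of en.js may well differ.

```javascript
// Hypothetical excerpt of en.js; the real object and key names may differ.
const LANG_EN = {
  ADD_METADATA: "Add metadata",
  SAVE: "Save",
  CANCEL: "Cancel"
};

// Hypothetical hu.js (Hungarian), saved as UTF-8 and still missing one string.
const LANG_HU = {
  ADD_METADATA: "Metaadat hozzáadása",
  SAVE: "Mentés"
};

// Returns the keys that the reference language defines but the translation lacks.
function missingKeys(reference, translation) {
  return Object.keys(reference).filter(key => !(key in translation));
}
```

Calling missingKeys(LANG_EN, LANG_HU) on the objects above reports the CANCEL entry, which the Hungarian file has not translated yet.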

Adding a new metadata type

The metadata handling subsystem is easy to extend in Armarius. When adding a new metadata type, there are two places that need modification: the front-end, because it needs to be able to display the new type to the user, and the server part, particularly the remotely accessed scripts, which need to know how to validate and save the new metadata type. In both cases only a few functions need to be created, and they need to be registered with the metadata handling system.

Extending the server side

When creating a response to the client or when receiving data, the server side needs to know how to validate the data it received from the client and how to map between the metadata type, stored in the database as an integer, and the XML tag used to describe it to the client. Both are handled inside /app/include/metadata.php. Each type of metadata should have a validating function: a function that takes a single string parameter and returns the string to be inserted into the database, or NULL if validation fails. This function then has to be added to the $VALIDATING_FUNCTIONS associative array, which maps the XML tags used in the description of the annotation to validating functions. The keys in this array are the names of the tags, and the values are the functions. The $META_TYPES array also has to be expanded with the code to use inside the database for your metadata. Note that there should be a one-to-one mapping (i.e. a bijective map) between metadata tag names and database codes.
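The contract of a validating function and the one-to-one requirement can be illustrated with a small sketch. The server code itself is PHP and lives in /app/include/metadata.php; the JavaScript below only mirrors the idea. The tag names, the database codes, the two validators and the isOneToOne() helper are hypothetical, as is the assumption that $META_TYPES is keyed by tag name.

```javascript
// Hypothetical validators: each takes one string and returns the string
// to store in the database, or null when validation fails.
function validateTranscription(value) {
  const trimmed = value.trim();
  return trimmed.length > 0 ? trimmed : null;
}

function validateDate(value) {
  const trimmed = value.trim();
  return /^\d{4}-\d{2}-\d{2}$/.test(trimmed) ? trimmed : null;
}

// Mirrors of $VALIDATING_FUNCTIONS (tag name -> validator) and
// $META_TYPES (assumed here to map tag name -> integer database code).
const VALIDATING_FUNCTIONS = { transcription: validateTranscription, date: validateDate };
const META_TYPES = { transcription: 1, date: 2 };

// Checks that every tag has exactly one code and that no code is shared.
function isOneToOne(validators, metaTypes) {
  const tags = Object.keys(validators);
  const codes = Object.values(metaTypes);
  const sameTags = tags.length === Object.keys(metaTypes).length &&
    tags.every(tag => Object.prototype.hasOwnProperty.call(metaTypes, tag));
  const distinctCodes = new Set(codes).size === codes.length;
  return sameTags && distinctCodes;
}
```

A check of this kind could be run after registering a new type to confirm that no tag was left without a code and that no database code was reused.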

TODO: metadata comparison functions, and formatters for sending it to the user. (Are these necessary?)

Extending the client side

See AJAX frontend first.

Extending the AJAX frontend is a bit more difficult. In this case, the prototype of the Annotation object has to be extended with additional functions and properties. More specifically, you will need to add a new object to Annotation.prototype.meta_types. This new object should contain the following functions and properties:

  • title - the name of the metadata as it appears in the Add metadata menu on the annotation. This should be a localized string.
  • css_class - the CSS class of the container that will hold the text and other HTML elements that display this type of metadata.
  • priority - currently unused, but it will determine the order in which metadata will be shown inside the annotation popup window.
  • displayer - a function that gets called to display the metadata in the popup window. It can be invoked in two cases: when the description of the annotation is fetched from the server and is about to be displayed, or when the user adds new metadata to the annotation. In both cases it takes 3 arguments:
    • anno - the annotation that the function should act on
    • editing - a boolean value that indicates whether the annotation is being edited. If it is, edit boxes or other modifiable visual elements should be presented so that the user can edit the metadata.
    • data - metadata-specific information. When new metadata is added to the annotation, this function is called with the data parameter set to NULL.

In both cases, this function should create a DIV container element with the class css_class inside the annotation's meta_container and place any other content nodes it uses inside that container (see also the createMetaDataContainer() function). Setting the class of the container to the value of the css_class property is important, since this is how the annotation knows during editing that this specific metadata has already been assigned to it, so that it does not show it in the Add metadata list.

  • desc_parser - a function that is called by the annotation when its description XML is fetched from the server. The role of this function is to check whether its metadata type is present in the description and, if it is, to extract it and call displayer(). It receives 3 parameters:
    • anno - the annotation that the function should operate on
    • descr_xml - a fragment of the XML document describing the annotation, more precisely, the content inside <anno>...</anno>. Note that this is an XMLNode object and not a string.
    • editing - a boolean value that indicates whether the annotation is being edited.
  • desc_generator - a function that gets called when the annotation is about to be saved. It should generate the XML description (as a string) of any metadata of the new type associated with the annotation. It should return an empty string if there is no metadata of this type associated with the annotation. The only parameter to this function is the annotation it will be working on.
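Putting the properties above together, a new metadata type might look roughly like the sketch below. It is an illustration under several assumptions, not the actual Armarius code: the "comment" type, its tag name and CSS class, the comment_text property on the annotation, the escapeXml() helper, and the idea that meta_types is an object keyed by type name are all hypothetical. The Annotation stand-in exists only so that the example runs on its own; the XML fragment is assumed to offer the standard DOM methods.

```javascript
// Stand-in so the sketch is self-contained; the AJAX frontend defines the real Annotation.
function Annotation() {}
Annotation.prototype.meta_types = Annotation.prototype.meta_types || {};

// Hypothetical helper: escapes text before it is embedded in the XML description.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical "comment" metadata type.
const commentMeta = {
  title: "Comment",            // should be a localized string from the language file
  css_class: "meta_comment",   // class of the DIV that holds this metadata
  priority: 10,                // currently unused by Armarius

  // Creates the DIV container inside the annotation's meta_container.
  displayer: function (anno, editing, data) {
    const doc = anno.meta_container.ownerDocument;
    const box = doc.createElement("div");
    box.className = commentMeta.css_class;
    const field = doc.createElement(editing ? "textarea" : "span");
    const text = data === null ? "" : data;   // data is null for newly added metadata
    if (editing) { field.value = text; } else { field.textContent = text; }
    box.appendChild(field);
    anno.meta_container.appendChild(box);
    return box;
  },

  // Looks for a <comment> element in the description and displays it if present.
  desc_parser: function (anno, descr_xml, editing) {
    const nodes = descr_xml.getElementsByTagName("comment");
    if (nodes.length > 0) {
      commentMeta.displayer(anno, editing, nodes[0].textContent);
    }
  },

  // Returns the XML for this metadata, or an empty string if there is none.
  desc_generator: function (anno) {
    const text = anno.comment_text;   // assumed location of the current value
    return text ? "<comment>" + escapeXml(text) + "</comment>" : "";
  }
};

Annotation.prototype.meta_types.comment = commentMeta;
```

The functions refer to commentMeta directly instead of relying on this, because the source does not say how the annotation invokes them.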