Import-Export
Main documentation page: Development documentation
Documentation for the relevant classes is available at http://anayaweb.net/armarius/doc/
Specification of the import-export functionality.
Armarius allows an existing collection to be enlarged by importing new ones in different ways.
- One way is to use the ftp import function.
- Another way is to import XML-based descriptors produced by the word spotting unit.
Specification for the "ftp import module"
New image files representing pages, stored in folders representing collections, can be uploaded to the public ftp folder of the system. Administrators can later add those files to the library through the administration panel.
1. Putting files in the ftp folder
Inside the ftp folder there must be folders representing possible collections. Each of those folders must contain images, or folders with more images, representing the tree structure of collections and subcollections.
During the import, some options are shown to control the operation.
Example: ftp/ contains folders col1/, col2/, col3/
ftp/col1/ contains pages p11.jpg, p12.jpg, p13.jpg
ftp/col1/ contains folders col11/, col12/
ftp/col1/col11/ contains pages p111.jpg, p112.jpg, p113.jpg
The rest of the folders contain some images.
2. Select the folder(s) to import
A new collection named after each selected folder will be created in the system. The administrator can specify the parent of this collection.
Ex. The available options are col1/, col2/ and col3/. Suppose the administrator chooses col1/ and col2/ and decides to attach them to the root of the system. Collections "col1" and "col2" will then be created as siblings whose parent is the root.
3. Should the import be recursive?
In recursive mode, each folder is scanned for subfolders and the system imports all the pages found inside them.
Ex. col1/ and its 3 pages are found, and so are col1/col11/ and its 3 pages. For col2/ there is no difference because the folder does not contain any subfolder.
3.1 If so, should the folder structure be reflected as collection structure?
If the answer is yes, a new subcollection is created for each folder found and attached to its parent.
Ex. Collection "col11" is created with its 3 pages as a subcollection of "col1". "col1" then holds its own 3 pages and the subcollection "col11".
If the answer is no, all the pages found inside the subfolders are assigned to the parent collection.
Ex. "col1" will contain p11.jpg, p12.jpg, p13.jpg, p111.jpg, p112.jpg and p113.jpg as pages.
In non-recursive mode, only first-level images are found.
Ex. col1/col11/ is ignored; its pages will not be added.
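The three import modes described in step 3 can be sketched as follows. This is only an illustration in Python, not the Armarius implementation: the function name plan_import, the helper build_example, the flag names and the list of image extensions are all assumptions made for the example.

```python
import os
import tempfile

# Assumption: the text does not say which extensions count as page images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def is_image(name):
    """Return True when the file name has an (assumed) image extension."""
    return os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS


def plan_import(folder, recursive=False, mirror_structure=False):
    """Return {collection path: [page file names]} for one selected folder.

    recursive=False                        -> only first-level images
    recursive=True, mirror_structure=False -> every page goes to the parent
    recursive=True, mirror_structure=True  -> one subcollection per subfolder
    """
    folder = os.path.normpath(folder)
    top = os.path.basename(folder)
    plan = {top: []}
    for current, subdirs, files in os.walk(folder):
        subdirs.sort()  # visit subfolders in a stable order
        pages = sorted(f for f in files if is_image(f))
        if current == folder:
            plan[top].extend(pages)
            if not recursive:
                break  # non-recursive: subfolders are ignored
        elif mirror_structure:
            rel = os.path.relpath(current, os.path.dirname(folder))
            plan.setdefault(rel.replace(os.sep, "/"), []).extend(pages)
        else:
            plan[top].extend(pages)
    return plan


def build_example(root):
    """Recreate the col1 example from the text under `root` (empty files)."""
    for rel in ("col1/p11.jpg", "col1/p12.jpg", "col1/p13.jpg",
                "col1/col11/p111.jpg", "col1/col11/p112.jpg",
                "col1/col11/p113.jpg"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()
    return os.path.join(root, "col1")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        col1 = build_example(root)
        print(plan_import(col1))
        print(plan_import(col1, recursive=True))
        print(plan_import(col1, recursive=True, mirror_structure=True))
```

The returned mapping is a plan only; creating the collections and attaching them to the parent chosen by the administrator is left out of the sketch.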
4. Finally, all found pages are moved from the ftp folder to the system image folder
If the folder was processed recursively, it is now empty and can be deleted. If it was not, all its subfolders are moved one level up so that they can be found later.
Ex.
* Recursive: "col11" has no images left and "col1" is empty too, so folder ftp/col1/ can be removed. ftp has: "col1" to delete, "col2" to delete and "col3" to keep.
* Non-recursive: "col1" has no images left, but "col11" still has its images, so "col11" must be moved up in order to empty "col1". ftp has: "col1" to delete, "col2" to delete, "col11" to keep and "col3" to keep.
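The clean-up in step 4 could look like the sketch below. It is a Python illustration under several assumptions that the text does not state: the imported folders contain only image files, the system image folder is flat, page file names are unique, and empty folders are deleted immediately. The names finish_import and build_example are hypothetical.

```python
import os
import shutil
import tempfile


def finish_import(ftp_dir, name, image_dir, recursive):
    """Move the pages of ftp_dir/name to image_dir and tidy the ftp folder.

    Returns the sorted list of entries left in ftp_dir.
    """
    folder = os.path.join(ftp_dir, name)
    os.makedirs(image_dir, exist_ok=True)
    if recursive:
        # Walk bottom-up so every folder is empty by the time it is removed.
        for current, _subdirs, files in os.walk(folder, topdown=False):
            for f in files:
                shutil.move(os.path.join(current, f),
                            os.path.join(image_dir, f))
            os.rmdir(current)
    else:
        for entry in sorted(os.listdir(folder)):
            source = os.path.join(folder, entry)
            if os.path.isdir(source):
                # Subfolders were not imported: move them one level up.
                target = os.path.join(ftp_dir, entry)
                if os.path.exists(target):
                    raise FileExistsError(target)
                shutil.move(source, target)
            else:
                shutil.move(source, os.path.join(image_dir, entry))
        os.rmdir(folder)
    return sorted(os.listdir(ftp_dir))


def build_example(ftp_dir):
    """Create col1/ (3 pages), col1/col11/ (3 pages) and col3/ (1 page)."""
    for rel in ("col1/p11.jpg", "col1/p12.jpg", "col1/p13.jpg",
                "col1/col11/p111.jpg", "col1/col11/p112.jpg",
                "col1/col11/p113.jpg", "col3/p31.jpg"):
        path = os.path.join(ftp_dir, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        ftp = os.path.join(root, "ftp")
        build_example(ftp)
        print(finish_import(ftp, "col1", os.path.join(root, "images"), False))
```

In the non-recursive case the sketch refuses to overwrite an existing folder of the same name in the ftp folder; how Armarius handles such a collision is not specified in the text.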
Specification for the "Page import/export module"
Vocabulary
- path to pages: defined in the config.php file as the "CFG_IMG_IMGDIR" constant, with the value "data/images/pages"
- folder : a folder inside this path
Note:
- All these functions will be available only in administration mode, in order to allow the automatic expansion of the collection.
- The values for the operations are obtained by parsing the XML file produced by the WordSpotting application, which contains the results of the searches made on the collection.
See an example of XML result page
See an example of Word Spotting result
Functions Specification
- function listPagesInFolder(folder)
This function recursively analyzes the content of the folder given in the "folder" parameter in order to locate all the images available there.
The function returns an array containing those results.
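A minimal sketch of this behaviour, written in Python for illustration rather than taken from the Armarius code base. It assumes that results are returned as addresses relative to the path to pages (so that they could be handed to addPage) and that images are recognized by extension; neither detail is stated in the specification.

```python
import os
import tempfile

PAGES_PATH = "data/images/pages"  # value of CFG_IMG_IMGDIR given in the text
# Assumption: the text does not list the accepted image extensions.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def list_pages_in_folder(folder, pages_path=PAGES_PATH):
    """Recursively list the images under pages_path/folder.

    Returns addresses relative to pages_path, using "/" as separator.
    """
    base = os.path.join(pages_path, folder)
    found = []
    for current, subdirs, files in os.walk(base):
        subdirs.sort()  # stable traversal order
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                rel = os.path.relpath(os.path.join(current, name), pages_path)
                found.append(rel.replace(os.sep, "/"))
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pages:
        os.makedirs(os.path.join(pages, "col1", "col11"))
        for rel in ("col1/p11.jpg", "col1/col11/p111.jpg"):
            open(os.path.join(pages, *rel.split("/")), "wb").close()
        print(list_pages_in_folder("col1", pages_path=pages))
```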
- function addPage(page)
This function adds the page indicated by the "page" parameter to the namespace handler module on the server side. The "page" parameter must be an address relative to the path.
The page will be stored in the "PREFIX_pages" table, where:
- pg_filename will match the location of the file inside the path, including the image extension
- pg_title will be the file's name without any extension.
If the same page already exists in the system, the warning "Duplicate page" will be shown.
The function returns "true" if the operation succeeded, or "false" if an error occurs.
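The field mapping and the duplicate check can be illustrated with the Python sketch below. It uses an in-memory SQLite table as a stand-in for the Armarius database; the table layout beyond the two named columns, the helper create_pages_table, and the choice to return "false" for a duplicate are assumptions, since the specification only says that a warning is shown.

```python
import posixpath
import sqlite3
import warnings


def create_pages_table(db):
    """Create a stand-in for the PREFIX_pages table (assumed layout)."""
    db.execute(
        "CREATE TABLE PREFIX_pages (pg_filename TEXT PRIMARY KEY, pg_title TEXT)"
    )


def add_page(db, page):
    """Store `page` (address relative to the path to pages).

    Returns True on success, False on a duplicate or a database error.
    """
    # pg_title: file name without its extension, e.g. "col1/p11.jpg" -> "p11"
    title = posixpath.splitext(posixpath.basename(page))[0]
    try:
        exists = db.execute(
            "SELECT 1 FROM PREFIX_pages WHERE pg_filename = ?", (page,)
        ).fetchone()
        if exists:
            warnings.warn("Duplicate page")
            return False  # assumption: a duplicate counts as a failure
        db.execute(
            "INSERT INTO PREFIX_pages (pg_filename, pg_title) VALUES (?, ?)",
            (page, title),
        )
        db.commit()
    except sqlite3.Error:
        return False
    return True


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_pages_table(db)
    print(add_page(db, "col1/p11.jpg"))
    print(db.execute("SELECT * FROM PREFIX_pages").fetchall())
```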
- function addTranscription(text, id_page, left, top, width, height, id_user)
This function does two things:
- First, it stores the rectangle given by the "left", "top", "width" and "height" parameters, which represents the annotation on the page indicated by the "id_page" parameter, in the PREFIX_pages table of the Armarius database.
The function must translate the absolute position of the hit in the page (given by the application) into the relative coordinates used by the Armarius system.
The "id_user" parameter will be used in the future to indicate the user who created the annotation. At the moment this feature is not used.
- Second, it saves the "text" string parameter, which represents a comment or a transcription, in the annotation tables.
Note that a transcription must contain the entire comment, and a comment can be longer than 100 characters, which is the length originally planned in the database structure.
In the future we intend to save comments in rich text format. This feature is not implemented yet.
The function returns "true" if the operation succeeded, or "false" if an error occurs.
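The coordinate translation mentioned above might be sketched as follows. The specification does not define the relative coordinate system used by Armarius, so this Python example assumes that relative coordinates are fractions of the page width and height. The helper to_relative and its page_width and page_height parameters are hypothetical and are not part of the addTranscription signature.

```python
def to_relative(left, top, width, height, page_width, page_height):
    """Convert an absolute pixel rectangle into fractions of the page size.

    Assumption: Armarius relative coordinates are values between 0 and 1.
    Returns (left, top, width, height) as fractions.
    """
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    if min(left, top, width, height) < 0:
        raise ValueError("rectangle values must not be negative")
    if left + width > page_width or top + height > page_height:
        raise ValueError("rectangle does not fit inside the page")
    return (
        left / page_width,
        top / page_height,
        width / page_width,
        height / page_height,
    )


if __name__ == "__main__":
    # A 400x150 pixel hit at (200, 300) on a 2000x3000 pixel page.
    print(to_relative(200, 300, 400, 150, 2000, 3000))
```

Rejecting rectangles that fall outside the page is a design choice of the sketch; clamping them to the page borders would be an equally plausible policy.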


