The Papyrological Navigator (PN) allows both simple and complex string-searching across the entire corpus of documents in the database.
Simply entering characters into the search box at the top of the column and clicking 'Search' will return all documents containing that sequence of characters anywhere in their text.
Many more complex kinds of search are also possible, however.
Kinds of search
- Substring As noted above, this is the default. To narrow search results, the '#' character can be used to indicate a word-boundary. For example, searching for the substring 'και' will return all documents containing that sequence of characters, whether as the word 'καί' or within longer words such as 'καῖσαρ' or 'καιρός'. Searching for '#και#', however, will return only documents containing the word 'καί'.
- Phrase Unlike substring searching, phrase searching operates on complete words. It is indicated using quotation marks, whether single or double. For instance, searching for '"και ουκ"' will return only documents that contain that exact phrase.
- Proximity While phrase searching is useful for finding words immediately adjacent to each other, it is also possible to specify a maximum distance between two words - for instance, to search for all documents that contain the words 'καί' and 'οὐ' within 10 words or characters of each other.
- Regular Expression Regular expressions are a powerful means by which any possible text configuration can be sought: for instance, "the substring και, followed immediately by any string of characters other than σαρ, followed immediately by a word beginning with τ and containing either an ε or an η". Regular expressions are necessarily a complex topic, although an understanding even only of the basics can extend the power of your searches considerably; a good tutorial can be found at http://www.regular-expressions.info/
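The difference between a plain substring search and a '#'-bounded search can be illustrated outside the Navigator with a short Python sketch. This is not the PN's own implementation: the function names and the sample documents are hypothetical, and it assumes that '#' can be modelled by the word-boundary assertion of Python's `re` module.

```python
import re

def to_pattern(query: str) -> re.Pattern:
    """Translate a PN-style substring query into a Python regex.

    Assumption: '#' marks a word boundary and every other
    character is matched literally.
    """
    parts = [re.escape(part) for part in query.split("#")]
    return re.compile(r"\b".join(parts))

def search(query: str, documents: list[str]) -> list[str]:
    """Return the documents that contain a match for the query."""
    pattern = to_pattern(query)
    return [doc for doc in documents if pattern.search(doc)]

# Hypothetical sample texts, written without diacritics.
DOCUMENTS = ["και ουκ", "καισαρ", "καιρος"]

print(search("και", DOCUMENTS))    # plain substring: all three texts
print(search("#και#", DOCUMENTS))  # whole word only: the first text
```

In this sketch the plain query matches every sample text, while the '#'-bounded query matches only the text in which και stands as a complete word.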
The buttons beneath the text search box all relate to different aspects of these kinds of search.
- AND, OR, NOT These are the standard boolean search operators, and behave as you would expect. Note, however, the distinction between NOT and START-NOT/END-NOT, as described below.
- THEN, NEAR These are used for proximity searching. THEN means that the second term must follow the first term within the specified range of words or characters; NEAR, that it may occur either before or after the first term within that range.
- LEX This button is used for lemmatised searching, i.e., searching for all possible declined or conjugated forms of the term entered. For instance, searching for 'LEX στρατηγός' will return documents containing στρατηγοῦ, στρατηγῷ, etc.
- REGEX Used to indicate that the search uses regular expression syntax.
- ABBR Searches for abbreviated forms. For example, 'στρατ ABBR' (which will appear in the search box as 'στρατ°') will find all documents in which the string 'στρατ' appears on its own as a shortened form of στρατηγός, στρατηγῷ, etc.
- START-NOT, END-NOT Where NOT is used to exclude documents that contain the following term anywhere in their contents, START-NOT and END-NOT are used to specify more precisely the kind of string being sought. For example, 'NOT καισαρ' will return all documents that do not contain the substring καισαρ anywhere in their contents. By contrast, 'και START-NOT σαρ END-NOT' (appearing in the search box as 'και[-σαρ]') means 'all documents that contain the string και when it is not followed by the string σαρ, regardless of whether the string καισαρ also appears in the document'.
- CLEAR Clears the search box.
- REMOVE (-) Some searches will involve more than one search box. This button removes search boxes that are no longer needed.
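The distinction between NOT and START-NOT/END-NOT described above can be made concrete with a small Python sketch. It is an illustration only, not the Navigator's own code: the function names and sample texts are hypothetical, and 'not followed by' is modelled here with a negative lookahead from Python's `re` module.

```python
import re

def not_term(term: str, documents: list[str]) -> list[str]:
    """Model of 'NOT term': keep documents that do not contain
    the term anywhere in their text."""
    return [doc for doc in documents if term not in doc]

def term_not_followed_by(term: str, excluded: str,
                         documents: list[str]) -> list[str]:
    """Model of 'term START-NOT excluded END-NOT': keep documents
    with at least one occurrence of the term that is not
    immediately followed by the excluded string."""
    pattern = re.compile(re.escape(term) + "(?!" + re.escape(excluded) + ")")
    return [doc for doc in documents if pattern.search(doc)]

# Hypothetical sample texts, written without diacritics.
DOCUMENTS = [
    "καισαρ",          # και occurs only inside καισαρ
    "καισαρ και ουκ",  # καισαρ plus a free-standing και
    "ουκ εστιν",       # no και at all
]

print(not_term("καισαρ", DOCUMENTS))
print(term_not_followed_by("και", "σαρ", DOCUMENTS))
```

Under these assumptions, the NOT query keeps only the third text, whereas the START-NOT/END-NOT query keeps only the second, even though that text also contains καισαρ.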
The checkboxes underneath the search buttons provide the following options:
- Conversion from betacode as you type Users without a polytonic Greek keyboard or font can check this box to enter text using standard Latin alphabet characters in Betacode. A guide to using Betacode can be found at http://www.tlg.uci.edu/encoding/
- Search ignoring case When this box is checked, searches are case-insensitive.
- Search ignoring diacritics When this box is checked, breathing and accent marks are ignored in the search.
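As a rough illustration of what the case and diacritics checkboxes imply, the following Python sketch folds both the query and the text before comparing them. How the Navigator actually normalises text is not documented here, so the `fold` function and its parameters are assumptions made for the example; it relies on Unicode canonical decomposition (NFD) from the standard `unicodedata` module.

```python
import unicodedata

def fold(text: str, ignore_case: bool = True,
         ignore_diacritics: bool = True) -> str:
    """Return a simplified form of text for loose comparison.

    Hypothetical helper: strips combining marks (accents,
    breathings, iota subscript) and/or lowercases the text.
    """
    if ignore_diacritics:
        # NFD separates each base letter from its combining marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    if ignore_case:
        text = text.lower()
    return text

def loose_contains(query: str, text: str) -> bool:
    """Substring test with both options switched on."""
    return fold(query) in fold(text)

print(fold("Καῖσαρ"))
print(loose_contains("και", "Καῖσαρ"))
```

With both options switched on, a query typed without accents, such as 'και', is found in an accented and capitalised form such as 'Καῖσαρ'.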
The row of radio-buttons beneath the checkboxes allows you to decide what section of each document you wish to search.
- Text Searches the content of the document
- Metadata Searches the metadata associated with the document - for example, the identification number, location, and dating of the document. Because this is a freetext search, note that metadata searches are often better conducted using controls other than text search.
- Translation Searches the available translations (if any) of the document.
Note that it is possible to employ the wildcard characters '?' and '*' when searching. The question-mark character ('?') will match any single character; the asterisk ('*') will match any sequence of any number of characters.
For example, the search-string 'στρατηγο?' will find both 'στρατηγος' and 'στρατηγου', while 'στρατηγ*' will match 'στρατηγος', 'στρατηγου', 'στρατηγῳ', 'στρατηγον', etc.
Note, however, that the exact effect of the asterisk character will depend upon the kind of search you are performing. If you are doing a search for which the basic unit is words (for instance, proximity searches using words as their unit of measure), then the asterisk character will match no further than the end of the word. For example, if you had a document containing the words 'Πτολεμαῖος στρατηγὸς' and searched using the string 'Πτολ*ος', then the search would find and highlight only the word 'Πτολεμαῖος'. If, on the other hand, you were performing a search using characters as the basic unit (most other kinds of search, including character-proximity searches), then the entire phrase 'Πτολεμαῖος στρατηγὸς' would be retrieved and highlighted.
Under most circumstances, this difference in wildcard meaning is negligible. If you see that the Navigator is highlighting extremely lengthy swathes of text in its results, however, the most likely cause is the use of an asterisk wildcard in a character context.
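The two behaviours of the asterisk can be imitated with a short Python sketch. This is only a model under stated assumptions: the function name and the unit labels 'word' and 'char' are hypothetical, the sample text is written without diacritics, and the wildcard is assumed to match as much text as it can (greedy matching), which is consistent with the lengthy highlights described above.

```python
import re

def wildcard_to_regex(query: str, unit: str) -> re.Pattern:
    """Translate '?' and '*' wildcards into a Python regex.

    unit == 'word': wildcards stop at whitespace (assumed model
                    of searches whose basic unit is the word).
    unit == 'char': wildcards may run across word boundaries.
    """
    any_one = r"\S" if unit == "word" else "."
    pieces = []
    for ch in query:
        if ch == "*":
            pieces.append(any_one + "*")
        elif ch == "?":
            pieces.append(any_one)
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces))

TEXT = "Πτολεμαιος στρατηγος"  # sample text without diacritics

word_hit = wildcard_to_regex("Πτολ*ος", "word").search(TEXT)
char_hit = wildcard_to_regex("Πτολ*ος", "char").search(TEXT)

print(word_hit.group())  # stops at the end of the first word
print(char_hit.group())  # runs on to the end of the second word
```

In the word-based model the match ends with the first word, while in the character-based model it extends to the last 'ος' in the text, which is why an asterisk in a character context can produce very long highlights.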
A note on performance
All of the types of query described above should complete quickly. However, when different query types are combined, searches can take a long time to execute, and the server may even time out, erroneously returning no search hits.
In particular, searches that combine lemmatised, proximity, and wildcard searching may take a long while to complete. If you are attempting such a search, it may be useful to bear in mind that word-proximity searches complete much more quickly than character-proximity searches. If, for instance, you are looking for any form of the word λόγιος, followed by any word beginning with the characters 'τοπ', it is much quicker to search for LEX λόγιος THEN τοπ* within 1 word than it is to search for LEX λόγιος THEN τοπ within 1 character.
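The word-proximity search in this example, and the difference between THEN and NEAR, can be sketched in Python as follows. The sketch is illustrative only: the function name and its parameters are hypothetical, lemmatisation is not modelled (terms are matched as simple word prefixes, roughly like 'τοπ*'), and 'within 1 word' is assumed to mean that the two words are adjacent.

```python
def within_words(text: str, first: str, second: str,
                 max_distance: int, ordered: bool) -> bool:
    """Test whether a word starting with `second` lies within
    max_distance words of a word starting with `first`.

    ordered=True  models THEN (second must follow first).
    ordered=False models NEAR (either order is accepted).
    """
    words = text.split()
    first_pos = [i for i, w in enumerate(words) if w.startswith(first)]
    second_pos = [i for i, w in enumerate(words) if w.startswith(second)]
    for i in first_pos:
        for j in second_pos:
            gap = j - i
            if ordered and 1 <= gap <= max_distance:
                return True
            if not ordered and 1 <= abs(gap) <= max_distance:
                return True
    return False

# Hypothetical sample texts, written without diacritics.
print(within_words("λογιος τοπος εστιν", "λογιος", "τοπ", 1, ordered=True))
print(within_words("τοπος λογιος", "λογιος", "τοπ", 1, ordered=True))
print(within_words("τοπος λογιος", "λογιος", "τοπ", 1, ordered=False))
```

Because this kind of test only compares word positions, it needs far fewer comparisons than a test that measures distances character by character, which gives some intuition for why word-proximity searches tend to complete more quickly.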