edu.vt.marian.Document
Class SgmlDocument

java.lang.Object
  |
  +--edu.vt.marian.Document.SgmlDocument

public class SgmlDocument
extends java.lang.Object
implements edu.vt.marian.common.Document

SgmlDocument

class description: this class represents an NLM SGML document in the system.

designer(s): Jianxin Zhao (jxzhao@csgrad.cs.vt.edu)

implementer(s): Jianxin Zhao (jxzhao@csgrad.cs.vt.edu), Robert France

finished time:

known bugs:

JDK version: 1.1.5

side effects:


Field Summary
static int EXTRACT_ERROR
           
static int INVALID_TAG_NAME
           
static int NULL_DOCUMENT_STRING
           
static int NULL_FIELD_NAME
           
static int NULL_SGML_STRING
           
static int NULL_STREAM
           
static int OK
          these constants are the status codes returned by methods of this class
 
Constructor Summary
SgmlDocument(java.io.BufferedReader br, edu.vt.marian.common.Debug debug)
          create an SgmlDocument object from the specified stream.
SgmlDocument(java.lang.String documentString, edu.vt.marian.common.Debug debug)
          create an SgmlDocument object from a document string.
 
Method Summary
 edu.vt.marian.common.DigInfObj copy()
          return a copy of this object; provided as an alternative to declaring a public clone() method.
 boolean equals(SgmlDocument d)
          tell whether this object and the parameter object represent the same document.
 java.lang.String getDocumentString()
          return the SGML string of the document this object represents.
 java.lang.String getFieldData(java.lang.String field_name)
          return the data of this document corresponding to the specified field.
 java.lang.String getFieldDataByIndex(int index)
          return the data of the specified field.
 java.lang.String getFieldNameByIndex(int index)
          return the name of the specified field.
 java.lang.String getFieldSeparator(java.lang.String fieldName)
          return the separator between different text strings in the specified field.
 int getNumberFields()
          return the number of fields in this document.
 boolean isValid()
          tell whether the object is valid (not whether it has been extracted yet).
 java.util.Vector presentAttributes(int markupType)
          return a Vector of metadata attributes for this document.
 java.lang.String presentFull(int markupType)
          return the full description of this document.
 java.lang.String presentShort(int markupType)
          return the short description of this document in one line.
 int setDocumentString(java.lang.String documentString)
          set the SGML string of the document this object represents.
 int toStream(java.io.PrintWriter pw)
          print the contents of this object to the specified stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

OK

public static final int OK
these constants are the status codes returned by methods of this class; OK indicates success.

NULL_STREAM

public static final int NULL_STREAM

NULL_DOCUMENT_STRING

public static final int NULL_DOCUMENT_STRING

NULL_SGML_STRING

public static final int NULL_SGML_STRING

EXTRACT_ERROR

public static final int EXTRACT_ERROR

INVALID_TAG_NAME

public static final int INVALID_TAG_NAME

NULL_FIELD_NAME

public static final int NULL_FIELD_NAME
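Only OK carries a description here, but the method details below pair some of these constants with specific failures (toStream returns NULL_STREAM for a null stream, setDocumentString returns NULL_DOCUMENT_STRING for a null argument). A caller therefore branches on the returned int. The sketch below uses a hypothetical stand-in class, StatusCodes; the numeric values and the message wording are assumptions, since this page does not list them, so real code should compare against SgmlDocument.OK and its siblings rather than literals.

```java
// Hypothetical stand-in for the SgmlDocument status constants.
// The numeric values are ASSUMPTIONS; the real values are not documented here.
final class StatusCodes {
    static final int OK = 0;
    static final int NULL_STREAM = 1;
    static final int NULL_DOCUMENT_STRING = 2;
    static final int NULL_SGML_STRING = 3;
    static final int EXTRACT_ERROR = 4;
    static final int INVALID_TAG_NAME = 5;
    static final int NULL_FIELD_NAME = 6;

    private StatusCodes() {}

    // Map a returned status code to a short, illustrative message for logging.
    static String describe(int code) {
        switch (code) {
            case OK: return "ok";
            case NULL_STREAM: return "the stream argument was null";
            case NULL_DOCUMENT_STRING: return "the document string argument was null";
            case NULL_SGML_STRING: return "null SGML string";
            case EXTRACT_ERROR: return "field extraction failed";
            case INVALID_TAG_NAME: return "invalid tag name";
            case NULL_FIELD_NAME: return "null field name";
            default: return "unknown status " + code;
        }
    }

    public static void main(String[] args) {
        int status = NULL_STREAM; // e.g. the value toStream(null) is documented to return
        if (status != OK) {
            System.out.println("toStream failed: " + describe(status));
        }
    }
}
```

Switching on the constants keeps the caller readable as long as they are compile-time constants, which public static final int fields initialized with literals are.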
Constructor Detail

SgmlDocument

public SgmlDocument(java.io.BufferedReader br,
                    edu.vt.marian.common.Debug debug)
create an SgmlDocument object from the specified stream.
Parameters:
br - the stream from which to read out this document
debug - used for debugging
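Because this constructor takes a java.io.BufferedReader, a document held in memory can be handed over by wrapping it in a StringReader (a file works the same way with a FileReader). The sketch below shows only the standard java.io side; the SgmlDocument call appears in a comment because constructing a Debug object is not documented on this page. The readAll helper is hypothetical and illustrates draining a reader to end of stream; whether SgmlDocument itself reads to end of stream or stops after one record is not stated above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

final class ReaderSetup {
    private ReaderSetup() {}

    // Wrap an in-memory SGML string so it can be passed where a BufferedReader is expected,
    // e.g. new SgmlDocument(ReaderSetup.wrap(sgml), debug) (Debug construction not shown).
    static BufferedReader wrap(String sgml) {
        return new BufferedReader(new StringReader(sgml));
    }

    // Hypothetical helper: read every line, appending '\n' after each one.
    static String readAll(BufferedReader br) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader br = wrap("<DOC>\n<TI>Example</TI>\n</DOC>")) {
            System.out.print(readAll(br));
        }
    }
}
```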

SgmlDocument

public SgmlDocument(java.lang.String documentString,
                    edu.vt.marian.common.Debug debug)
create an SgmlDocument object from a document string.
Parameters:
documentString - a string encoding this document in SGML
debug - used for debugging
Method Detail

isValid

public boolean isValid()
tell whether the object is valid (not whether it has been extracted yet).

equals

public boolean equals(SgmlDocument d)
tell whether this object and the parameter object represent the same document.

NOTE: Equality is currently determined by comparing the raw SGML strings, so two encodings of the same document that differ only in formatting (for example, whitespace) are reported as different.

Parameters:
d - the document used to compare with this object
Returns:
true if this object and d represent the same document; false otherwise
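Since this method's parameter is SgmlDocument rather than Object, it overloads Object.equals(Object) instead of overriding it (the page lists equals as inherited from java.lang.Object). Code that only sees an Object reference, including java.util collections such as ArrayList.contains, calls the inherited method and gets reference identity. The sketch below reproduces that pattern with a hypothetical class, RawDoc, that also compares raw strings as the note above describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class mirroring the documented pattern: equals takes the
// concrete type (an overload), and equality is a raw-string comparison.
final class RawDoc {
    private final String raw;

    RawDoc(String raw) { this.raw = raw; }

    // Overload, NOT an override of Object.equals(Object).
    boolean equals(RawDoc d) {
        return d != null && raw.equals(d.raw);
    }

    public static void main(String[] args) {
        RawDoc a = new RawDoc("<TI>Title</TI>");
        RawDoc b = new RawDoc("<TI>Title</TI>");
        RawDoc c = new RawDoc("<TI> Title </TI>");

        System.out.println(a.equals(b));          // true: overload chosen at compile time
        System.out.println(a.equals((Object) b)); // false: inherited Object.equals (identity)
        System.out.println(a.equals(c));          // false: raw strings differ in whitespace

        List<RawDoc> docs = new ArrayList<>();
        docs.add(a);
        System.out.println(docs.contains(b));     // false: contains() calls equals(Object)
    }
}
```

Overriding equals(Object) (together with hashCode) would make collection lookups consistent; whether that suits this class is a design decision the page does not address.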

toStream

public int toStream(java.io.PrintWriter pw)
print the contents of this object to the specified stream.
Parameters:
pw - the stream to which to write this object
Returns:
OK -- this object has been written to the stream correctly

NULL_STREAM -- the parameter stream is null
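The contract above (return NULL_STREAM for a null stream, otherwise write and return OK) can be exercised by capturing output in a java.io.StringWriter. The class below, ToStreamSketch, is hypothetical; its constant values are assumptions, and writing the raw SGML string is an assumed serialization, since the page does not say what format toStream produces.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch of the documented toStream contract. The constant values
// and the choice to write the raw SGML string are ASSUMPTIONS.
final class ToStreamSketch {
    static final int OK = 0;
    static final int NULL_STREAM = 1;

    private final String documentString;

    ToStreamSketch(String documentString) { this.documentString = documentString; }

    int toStream(PrintWriter pw) {
        if (pw == null) {
            return NULL_STREAM;      // documented failure case
        }
        pw.println(documentString);  // assumed serialization: the raw SGML text
        pw.flush();
        return OK;
    }

    public static void main(String[] args) {
        ToStreamSketch doc = new ToStreamSketch("<TI>Example</TI>");
        StringWriter buffer = new StringWriter();   // capture output in memory
        int status = doc.toStream(new PrintWriter(buffer));
        System.out.println(status == OK ? buffer.toString().trim() : "error " + status);
        System.out.println(doc.toStream(null) == NULL_STREAM);
    }
}
```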


getDocumentString

public java.lang.String getDocumentString()
return the SGML string of the document this object represents.
Returns:
the raw form of this document as a string

setDocumentString

public int setDocumentString(java.lang.String documentString)
set the SGML string of the document this object represents.
Parameters:
documentString - this will become the new raw string for this document object
Returns:
OK -- the new raw document string has been set correctly

NULL_DOCUMENT_STRING -- the parameter is null


getNumberFields

public int getNumberFields()
return the number of fields in this document.

getFieldNameByIndex

public java.lang.String getFieldNameByIndex(int index)
return the name of the specified field.

getFieldDataByIndex

public java.lang.String getFieldDataByIndex(int index)
return the data of the specified field.
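getNumberFields, getFieldNameByIndex and getFieldDataByIndex together allow a caller to walk every field of a document. The sketch below assumes indices run from 0 to getNumberFields() - 1, which this page does not state. IndexedFields is a hypothetical interface holding just those three methods, MapFields is a hypothetical in-memory implementation used only to run the loop, and the field names "TI" and "AU" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface holding the three index-based accessors documented above.
interface IndexedFields {
    int getNumberFields();
    String getFieldNameByIndex(int index);
    String getFieldDataByIndex(int index);
}

// Hypothetical in-memory implementation used only to exercise the loop.
final class MapFields implements IndexedFields {
    private final List<String> names = new ArrayList<>();
    private final List<String> data = new ArrayList<>();

    MapFields(Map<String, String> fields) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            names.add(e.getKey());
            data.add(e.getValue());
        }
    }

    public int getNumberFields() { return names.size(); }
    public String getFieldNameByIndex(int index) { return names.get(index); }
    public String getFieldDataByIndex(int index) { return data.get(index); }
}

final class FieldDump {
    private FieldDump() {}

    // Walk every field, ASSUMING zero-based indices up to getNumberFields() - 1.
    static String dump(IndexedFields doc) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < doc.getNumberFields(); i++) {
            sb.append(doc.getFieldNameByIndex(i))
              .append(": ")
              .append(doc.getFieldDataByIndex(i))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("TI", "An example title");
        m.put("AU", "Doe J");
        System.out.print(dump(new MapFields(m)));
    }
}
```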

getFieldData

public java.lang.String getFieldData(java.lang.String field_name)
return the data of this document corresponding to the specified field.
Parameters:
field_name - the name of the field whose data is returned
Returns:
the field data as a String, or

null -- extraction problem
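Because getFieldData signals an extraction problem by returning null, callers must check the result before using it. The sketch below is a hypothetical, deliberately naive lookup that follows the same contract by returning the text between <NAME> and </NAME>. It is not the class's actual extraction code; real SGML features such as omitted end tags, attributes and nesting are not handled, and the tag names are made up.

```java
// Hypothetical, naive field lookup mirroring the documented contract:
// return the text inside <NAME>...</NAME>, or null on an extraction problem.
// Real SGML (omitted end tags, attributes, nesting) is NOT handled here.
final class NaiveFieldLookup {
    private NaiveFieldLookup() {}

    static String getFieldData(String sgml, String fieldName) {
        if (sgml == null || fieldName == null || fieldName.isEmpty()) {
            return null;
        }
        String open = "<" + fieldName + ">";
        String close = "</" + fieldName + ">";
        int start = sgml.indexOf(open);
        if (start < 0) {
            return null;                 // field not present
        }
        start += open.length();
        int end = sgml.indexOf(close, start);
        if (end < 0) {
            return null;                 // unterminated field
        }
        return sgml.substring(start, end);
    }

    public static void main(String[] args) {
        String doc = "<DOC><TI>An example title</TI><AU>Doe J</AU></DOC>";
        String title = getFieldData(doc, "TI");
        // Check for null, as the documented method can return it.
        System.out.println(title != null ? title : "(no title)");
    }
}
```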


getFieldSeparator

public java.lang.String getFieldSeparator(java.lang.String fieldName)
return the separator between different text strings in the specified field.
Parameters:
fieldName - name of the field to search
Returns:
the separator String, or null if no such field exists.
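A separator returned by getFieldSeparator can be used to break a field's data back into its individual text strings. The helper below, splitField, is hypothetical, and the "; " separator in the example is an assumption since the page does not list actual separators. Pattern.quote makes String.split treat the separator literally rather than as a regular expression, and a null separator (no such field) leaves the data unsplit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class FieldSplitter {
    private FieldSplitter() {}

    // Hypothetical helper: split field data into its component text strings.
    // A null or empty separator yields the data as a single element.
    static List<String> splitField(String data, String separator) {
        if (data == null) {
            return List.of();
        }
        if (separator == null || separator.isEmpty()) {
            return List.of(data);
        }
        // Pattern.quote treats the separator literally, not as a regex.
        return Arrays.asList(data.split(Pattern.quote(separator), -1));
    }

    public static void main(String[] args) {
        System.out.println(splitField("Doe J; Roe R; Poe E", "; "));
    }
}
```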

copy

public edu.vt.marian.common.DigInfObj copy()
return a copy of this object; provided as an alternative to declaring a public clone() method.
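A copy() method like this one is commonly implemented with a copy constructor, which sidesteps Cloneable and the protected Object.clone(). Because copy() is declared to return edu.vt.marian.common.DigInfObj, callers cast the result back to the concrete type. In the sketch below, Copyable is a hypothetical stand-in for DigInfObj (only copy() is modeled) and CopyableDoc is a hypothetical document class; this is not the actual SgmlDocument implementation.

```java
// Hypothetical stand-in for edu.vt.marian.common.DigInfObj; only copy() is modeled.
interface Copyable {
    Copyable copy();
}

// Hypothetical document class implementing copy() through a copy constructor.
final class CopyableDoc implements Copyable {
    private String documentString;

    CopyableDoc(String documentString) { this.documentString = documentString; }

    // Copy constructor; a shallow copy suffices because String is immutable.
    CopyableDoc(CopyableDoc other) { this(other.documentString); }

    @Override
    public Copyable copy() { return new CopyableDoc(this); }

    String getDocumentString() { return documentString; }
    void setDocumentString(String s) { documentString = s; }

    public static void main(String[] args) {
        CopyableDoc original = new CopyableDoc("<TI>A</TI>");
        // copy() returns the interface type, so a cast is needed.
        CopyableDoc duplicate = (CopyableDoc) original.copy();
        duplicate.setDocumentString("<TI>B</TI>");
        System.out.println(original.getDocumentString()); // original is unaffected
    }
}
```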

presentShort

public java.lang.String presentShort(int markupType)
return the short description of this document in one line.
Parameters:
markupType - how to mark up the string returned (e.g., HTML or ASCII).
Returns:
the short description String.
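The markupType parameter selects how the returned text is marked up, and presentShort promises a single line. The sketch below shows one way those two requirements could combine: collapse all whitespace runs (including newlines) to single spaces, then escape HTML-significant characters only in HTML mode. ShortPresenter and its MARKUP_ASCII and MARKUP_HTML constants are hypothetical; the page names HTML and ASCII only as examples and does not give constant names or values.

```java
// Hypothetical sketch of a presentShort-style method. The markup constants and
// their values are ASSUMPTIONS; the page only mentions HTML or ASCII as examples.
final class ShortPresenter {
    static final int MARKUP_ASCII = 0;
    static final int MARKUP_HTML = 1;

    private ShortPresenter() {}

    static String presentShort(String title, int markupType) {
        // Collapse whitespace runs (including newlines) so the result fits on one line.
        String oneLine = title.trim().replaceAll("\\s+", " ");
        return markupType == MARKUP_HTML ? escapeHtml(oneLine) : oneLine;
    }

    // Escape the characters that are significant in HTML text content.
    private static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String title = "Effects of A & B\n  on <C> levels";
        System.out.println(presentShort(title, MARKUP_ASCII));
        System.out.println(presentShort(title, MARKUP_HTML));
    }
}
```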

presentAttributes

public java.util.Vector presentAttributes(int markupType)
return a Vector of metadata attributes for this document.
Parameters:
markupType - how to mark up the attribute values returned (e.g., HTML or ASCII).
Returns:
a Vector of triples [attrName, attrType, attrValue].
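A caller consuming this Vector walks it element by element and unpacks each triple. The page does not give the element type, so the sketch below assumes each triple is a String[3] in the order [attrName, attrType, attrValue]. AttributeTriples and its sample attribute names and types are hypothetical.

```java
import java.util.Vector;

// Hypothetical sketch: representing each [attrName, attrType, attrValue]
// triple as a String[3] is an ASSUMPTION; the element type is not documented.
final class AttributeTriples {
    private AttributeTriples() {}

    // Build a sample Vector shaped like the documented return value.
    static Vector<String[]> sample() {
        Vector<String[]> attrs = new Vector<>();
        attrs.add(new String[] {"title", "text", "An example title"});
        attrs.add(new String[] {"author", "text", "Doe J"});
        return attrs;
    }

    // Render each triple as "name (type) = value", one per line.
    static String render(Vector<String[]> attrs) {
        StringBuilder sb = new StringBuilder();
        for (String[] t : attrs) {
            sb.append(t[0]).append(" (").append(t[1]).append(") = ").append(t[2]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(sample()));
    }
}
```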

presentFull

public java.lang.String presentFull(int markupType)
return the full description of this document.
Parameters:
markupType - how to mark up the string returned (e.g., HTML or ASCII).
Returns:
a (potentially very long) String.