felix.thirdpart
Class XmlInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
          extended by org.apache.hadoop.mapreduce.lib.input.TextInputFormat
              extended by felix.thirdpart.XmlInputFormat

public class XmlInputFormat
extends org.apache.hadoop.mapreduce.lib.input.TextInputFormat

Reads records that are delimited by a specific begin/end tag. Acknowledgment: this third-party class was not written by Felix's authors.
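As an illustration of what "delimited by a specific begin/end tag" means, the following is a minimal sketch of tag-delimited record extraction. It is not the implementation of this class: it works on an in-memory string rather than a Hadoop input split, and the class name TagDelimitedScanSketch and method name extractRecords are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of felix.thirdpart. */
public class TagDelimitedScanSketch {

    /**
     * Returns every substring of input that begins with startTag and ends
     * with the next endTag, in document order. Text outside such blocks
     * is skipped.
     */
    public static List<String> extractRecords(String input, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(startTag, pos);
            if (start < 0) {
                break; // no further start tag
            }
            int end = input.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated block: dropped in this sketch
            }
            int stop = end + endTag.length();
            records.add(input.substring(start, stop));
            pos = stop; // resume scanning after the end tag
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<doc><page>a</page>noise<page>b</page></doc>";
        for (String record : extractRecords(xml, "<page>", "</page>")) {
            System.out.println(record);
        }
        // prints:
        // <page>a</page>
        // <page>b</page>
    }
}
```

In this sketch each record includes its start and end tags, and nested occurrences of the same tag are not handled; whether the real reader behaves the same way should be checked against its source.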


Nested Class Summary
static class XmlInputFormat.XmlRecordReader
          Record reader that scans a given XML document and outputs each XML block delimited by the configured start tag and end tag as a record.
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
 
Field Summary
static java.lang.String END_TAG_KEY
           
static java.lang.String START_TAG_KEY
           
 
Constructor Summary
XmlInputFormat()
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
          Returns an XmlRecordReader for reading the XML document.
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPaths, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

START_TAG_KEY

public static final java.lang.String START_TAG_KEY
See Also:
Constant Field Values

END_TAG_KEY

public static final java.lang.String END_TAG_KEY
See Also:
Constant Field Values

Constructor Detail

XmlInputFormat

public XmlInputFormat()

Method Detail

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
                                                                                                                                org.apache.hadoop.mapreduce.TaskAttemptContext context)
Returns an XmlRecordReader for reading the XML document.

Overrides:
createRecordReader in class org.apache.hadoop.mapreduce.lib.input.TextInputFormat