Class JSONToRelation

object --+
         |
        JSONToRelation

Given a source with JSON structures, derive a schema and construct a relational table. The source can be a local file name, a URL, or a StringIO pseudofile.

JSON structures in the source must be one per line. That is, each line in the source must be a self-contained JSON object. Pretty-printed JSON that spans several lines won't work.
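Input that has been pretty-printed can be brought into the required one-object-per-line form before conversion. The following is a minimal sketch using only the Python 3 standard library; the helper name to_json_lines is hypothetical and is not part of this package.

```python
import json


def to_json_lines(pretty_text):
    """Re-serialize concatenated JSON objects so each occupies one line.

    Hypothetical helper, not part of json_to_relation.
    """
    decoder = json.JSONDecoder()
    pos = 0
    lines = []
    while pos < len(pretty_text):
        # Skip whitespace (including newlines) between objects.
        while pos < len(pretty_text) and pretty_text[pos].isspace():
            pos += 1
        if pos >= len(pretty_text):
            break
        # raw_decode parses one JSON value and reports where it ended.
        obj, pos = decoder.raw_decode(pretty_text, pos)
        # json.dumps without indent never emits a literal newline.
        lines.append(json.dumps(obj))
    return "".join(line + "\n" for line in lines)


pretty = '{\n  "a": 1,\n  "b": {"c": 2}\n}\n{\n  "a": 3\n}\n'
one_per_line = to_json_lines(pretty)
```

The resulting text can then be written to a file or wrapped in a StringIO pseudofile and used as the JSON source.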

Instance Methods
 
__init__(self, jsonSource, destination, outputFormat=0, schemaHints={}, jsonParserInstance=None, loggingLevel=30, logFile=None)
Create a JSON-to-Relation converter.
 
convert(self, prependColHeader=False)
Main user-facing API method.
 
ensureColExistence(self, colName, colDataType)
Given a column name and MySQL datatype name, check whether this column has previously been encountered.
 
processFinishedRow(self, filledNewRow, outFd)
When a row is finished, this method processes the row as per the user's disposition.
(ColumnSpec)
getSchema(self)
Returns an ordered list of ColumnSpec instances.
 
getColHeaders(self)
Returns a list of column header names collected so far.
 
getNextNewColPos(self)
Returns the position of the next new column that may need to be added when a previously unseen JSON label is encountered.
 
bumpNextNewColPos(self)
String
ensureLegalIdentifierChars(self, proposedMySQLName)
Given a proposed MySQL identifier, such as a column name, return a possibly modified name that will be acceptable to a MySQL database.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables
  MAX_SQL_INT = 2147483647.0
  MIN_SQL_INT = -2147483648.0
  LEGAL_MYSQL_ATTRIBUTE_PATTERN = re.compile(r'^[\$\w]+$')
Properties

Inherited from object: __class__

Method Details

__init__(self, jsonSource, destination, outputFormat=0, schemaHints={}, jsonParserInstance=None, loggingLevel=30, logFile=None)
(Constructor)

Create a JSON-to-Relation converter. The JSON source can be a file with JSON objects, a StringIO.StringIO string pseudofile, stdin, or a MongoDB source.

The destination can be a file, to which CSV is written in Excel-readable form, stdout, or a MySQL table specification, in which case the output rows are inserted into that table.

SchemaHints optionally specify the SQL types of particular columns. By default the processJSONObs() method is conservative and declares numeric columns as DOUBLE. All encountered values for a column could be examined and a tighter type chosen, such as INT when only 4-byte integers are ever seen, but future additions to the table might then exceed the INT capacity of that column.

If schemaHints is provided, it is a dict mapping column names to ColDataType values. The column names in schemaHints must match the corresponding (fully nested) key names in the JSON objects. Example:

   schemaHints dict: {'msg.length' : ColDataType.INT,
                      'chunkSize' : ColDataType.INT}
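How a fully nested key name such as 'msg.length' lines up with a JSON object, and how a hint can override the conservative DOUBLE default, might be sketched as follows. This is an illustration under assumptions, not the package's implementation: the ColDataType enum below is a stand-in for the real class, the helpers flatten and column_type are hypothetical, and the TEXT fallback for non-numeric values is an assumption.

```python
from enum import Enum


class ColDataType(Enum):
    """Stand-in for the package's ColDataType (assumed members)."""
    INT = "INT"
    DOUBLE = "DOUBLE"
    TEXT = "TEXT"


def flatten(obj, prefix=""):
    """Map nested JSON keys to dotted names, e.g. {'msg': {'length': 1}}
    becomes {'msg.length': 1}. Hypothetical helper."""
    flat = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def column_type(name, value, schema_hints):
    """Pick a column type: a hint wins, numbers default to DOUBLE."""
    if name in schema_hints:
        return schema_hints[name]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return ColDataType.DOUBLE
    return ColDataType.TEXT  # assumed fallback for everything else


schema_hints = {"msg.length": ColDataType.INT, "chunkSize": ColDataType.INT}
record = {"msg": {"length": 12, "text": "hi"}, "chunkSize": 4096, "ratio": 3}
flat = flatten(record)
types = {name: column_type(name, value, schema_hints)
         for name, value in flat.items()}
```

In this sketch 'ratio' holds an integer but is still typed DOUBLE, because no hint was given for it.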

For unit testing isolated methods in this class, set jsonSource and destination to None.

Parameters:
  • jsonSource ({InPipe | InString | InURI | InMongoDB}) - instance of an InputSource subclass that wraps a source containing JSON structures, or a URL to such a source
  • destination ({OutputPipe | OutputFile | OutputMySQLTable}) - instruction as to where resulting rows are to be directed
  • outputFormat (OutputFormat) - format of output. Can be CSV or SQL INSERT statements
  • schemaHints (Map<String,ColDataType>) - Dict mapping col names to data types (optional)
  • jsonParserInstance ({GenericJSONParser | EdXTrackLogJSONParser | CourseraTrackLogJSONParser}) - a parser that takes one JSON string and returns a CSV row. The parser must also inform this parent object of any generated column names.
  • loggingLevel ({logging.DEBUG | logging.WARN | logging.INFO | logging.ERROR | logging.CRITICAL}) - level at which logging output is shown.
Raises:
  • ValueError - when the value of jsonParserInstance is neither None, nor an instance of GenericJSONParser, nor an instance of one of its subclasses.
  • ValueError - when jsonSource is not an instance of InPipe, InString, InURI, or InMongoDB
Overrides: object.__init__

convert(self, prependColHeader=False)

Main user-facing API method. Read from the JSON source established in the __init__() call. Create a MySQL schema as the JSON is read. Convert each JSON object into the requested output format (e.g. CSV), and deliver it to the destination (e.g. a file).

Parameters:
  • prependColHeader (Boolean) - If true, the final destination, if it is stdout or a file, will have the column names prepended. Because the full set of column names is known only after all input has been read, this option requires that the output first be written to a temp file, which is then merged with the completed column name header row into the final destination specified by the client.
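The temp-file step described for prependColHeader can be illustrated with a short sketch. It is not the package's code: the function convert_with_header is hypothetical, rows are modeled as plain dicts, and Python 3 standard library calls are used throughout.

```python
import csv
import io
import shutil
import tempfile


def convert_with_header(records, dest):
    """Write rows to a temp file, then emit header + rows to dest.

    Hypothetical illustration of the header-prepending strategy.
    """
    columns = []  # column names in order of first appearance
    with tempfile.TemporaryFile(mode="w+", newline="") as tmp:
        writer = csv.writer(tmp)
        for record in records:
            for name in record:
                if name not in columns:
                    columns.append(name)
            # Rows written early may be shorter than the final header.
            writer.writerow([record.get(name, "") for name in columns])
        # Only now is the complete header known; write it first.
        csv.writer(dest).writerow(columns)
        tmp.seek(0)
        shutil.copyfileobj(tmp, dest)
    return columns


dest = io.StringIO(newline="")
cols = convert_with_header([{"a": 1}, {"a": 2, "b": "x"}], dest)
```

The cost of this approach is one extra pass over the output, which is why the option is off by default in the signature above.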

ensureColExistence(self, colName, colDataType)

Given a column name and MySQL datatype name, check whether this column has previously been encountered. If not, a column information object is created, which will eventually be used to create the column header, Django model, or SQL create statements.

Parameters:
  • colName (String) - name of the column to consider
  • colDataType (ColDataType) - datatype of the column.
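The bookkeeping that ensureColExistence, getSchema, getColHeaders, and getNextNewColPos describe could look roughly like the sketch below. The ColumnRegistry class, its method names, and the ColumnSpec fields are assumptions made for illustration; the real ColumnSpec may hold different attributes.

```python
from collections import OrderedDict, namedtuple

# Assumed shape of a column information object.
ColumnSpec = namedtuple("ColumnSpec", ["name", "sql_type", "position"])


class ColumnRegistry:
    """Hypothetical registry of columns seen so far, in discovery order."""

    def __init__(self):
        self._columns = OrderedDict()
        self._next_pos = 0

    def ensure_col_existence(self, col_name, col_data_type):
        # Create the column information object only on first encounter.
        if col_name not in self._columns:
            self._columns[col_name] = ColumnSpec(
                col_name, col_data_type, self._next_pos)
            self._next_pos += 1
        return self._columns[col_name]

    def get_schema(self):
        return list(self._columns.values())

    def get_col_headers(self):
        return list(self._columns)

    def get_next_new_col_pos(self):
        return self._next_pos


registry = ColumnRegistry()
registry.ensure_col_existence("msg.length", "INT")
registry.ensure_col_existence("chunkSize", "INT")
registry.ensure_col_existence("msg.length", "DOUBLE")  # already known
```

In this sketch a repeated column name keeps its original type and position, so the third call has no effect.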

processFinishedRow(self, filledNewRow, outFd)

When a row is finished, this method processes the row as per the user's disposition. Depending on that disposition, the method writes the row to a CSV file, inserts it into a MySQL table, or generates an SQL insert statement for later use.

Parameters:
  • filledNewRow (List<any>) - the list of values for one row, possibly including empty fields
  • outFd (OutputDisposition) - an instance of a class that writes to the destination
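For the CSV case, a row with empty fields can be rendered in Excel-readable form with the standard csv module. The sketch below assumes Python 3 and models empty fields as None; the helper row_to_csv_line is hypothetical and stands in for whatever the OutputDisposition instance does internally.

```python
import csv
import io


def row_to_csv_line(filled_new_row):
    """Render one row as an Excel-dialect CSV line (hypothetical helper)."""
    buf = io.StringIO(newline="")
    # The 'excel' dialect quotes only when needed and doubles embedded quotes.
    csv.writer(buf, dialect="excel").writerow(filled_new_row)
    return buf.getvalue()


line = row_to_csv_line([1, None, 'say "hi", ok'])
```

Here None is written as an empty field, and the third value is quoted because it contains both a comma and quote characters.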

getSchema(self)

Returns an ordered list of ColumnSpec instances. Each such instance holds column name and SQL type.

Returns: (ColumnSpec)
ordered list of column information

ensureLegalIdentifierChars(self, proposedMySQLName)

Given a proposed MySQL identifier, such as a column name, return a possibly modified name that will be acceptable to a MySQL database. In unquoted identifiers MySQL accepts alphanumeric characters, underscores, and dollar signs. Identifiers containing other characters must be quoted. Quote characters embedded within an identifier must be doubled to escape them.

Parameters:
  • proposedMySQLName (String) - input name
Returns: String
the possibly modified, legal MySQL identifier
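The quoting rule can be sketched with the LEGAL_MYSQL_ATTRIBUTE_PATTERN listed under Class Variables. This is an approximation, not the method's source: the function name is hypothetical, and the use of the backtick as the quote character is an assumption, since the description above does not say which quote character the method emits.

```python
import re

# Pattern as listed under Class Variables.
LEGAL_MYSQL_ATTRIBUTE_PATTERN = re.compile(r'^[\$\w]+$')


def ensure_legal_identifier_chars(proposed_mysql_name):
    """Return the name unchanged if legal, else a quoted version.

    Assumes backtick quoting (MySQL's default identifier quote).
    """
    if LEGAL_MYSQL_ATTRIBUTE_PATTERN.match(proposed_mysql_name):
        return proposed_mysql_name
    # Double any embedded quote characters, then wrap in quotes.
    return "`" + proposed_mysql_name.replace("`", "``") + "`"


plain = ensure_legal_identifier_chars("user_id")
dotted = ensure_legal_identifier_chars("msg.length")
tricky = ensure_legal_identifier_chars("odd`name")
```

Under these assumptions a nested key name such as msg.length would need quoting, because the dot is outside the legal character set.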