EnXML

EnXML provides a framework for converting XML documents into a compact representation that is more easily used by applications.

Using EnXML in an application involves the following steps:

    1. Create a description of the document.
    2. Define structures and enums to hold the parsed data.
    3. Prepare for EnXML parsing.
    4. Instantiate an XML reader. This document and the code in the Base Stack use the libxml2 memory reader because the XML documents are memory-resident and available in their entirety.
    5. Call EnXMLParserProcessNode on the root node of the document, passing a description of the attributes and elements the root node may contain. EnXML calls the specified handler for each attribute and element as it is encountered. Handlers may in turn call EnXMLParserProcessNode for child nodes.

Example Usage

The code fragments shown below are from the EnXML example, which can be found in the Base Stack distribution as enatsc3-base/libenatsc3utils/examples/enxml_example.c.

The following XML document is used to illustrate EnXML.

<?xml version = "1.0"?>
<main-node group="animals">
    <child-node group="dogs">Chloe</child-node>
    <child-node group="cats">Leo</child-node> 
    <child-node group="mice">Mini</child-node>
</main-node>

Document Description

The document description is similar to an XML schema in that it describes the attributes and elements a document could include. EnXML defines types to describe attributes, elements (child nodes), and nodes.

Attributes and child nodes have an id member. EnXML ORs together the ids of the attributes and child nodes actually present in a node to form a bitmask. Each id must therefore be a power of 2 and must be unique within a particular node.
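The id definitions themselves are not shown in this document. The following is a minimal sketch of what they might look like and of why each id needs its own bit. The enum names follow the fragments in this document, but the values are assumptions, and IsSingleBit and IdsAreValid are hypothetical helpers that are not part of EnXML.

```c
#include <stdint.h>

/* Hypothetical id definitions: one bit per attribute or child node of
 * main-node. The values are assumptions made for this sketch. */
enum
{
    MainNodeAttribute_group     = 1 << 0,
    MainNodeAttribute_foobar    = 1 << 1,
    MainNodeChildNode_ChildNode = 1 << 2,
};

/* Returns nonzero if id has exactly one bit set (a power of 2). */
int IsSingleBit(uint32_t id)
{
    return id != 0 && (id & (id - 1)) == 0;
}

/* Returns nonzero if every id is a single bit and no two ids share a bit,
 * so that ORing them into one mask loses no information. */
int IdsAreValid(const uint32_t *pIds, int nIds)
{
    uint32_t seen = 0;
    int i;

    for (i = 0; i < nIds; i++)
    {
        if (!IsSingleBit(pIds[i]) || (seen & pIds[i]))
        {
            return 0;
        }
        seen |= pIds[i];
    }
    return 1;
}
```

With ids like these, a mask built for the sample main-node would have the group and child-node bits set and the foobar bit clear, and each can be tested independently with a bitwise AND.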

// Description for child-node
EnXMLParserAttribute vChildNodeAttributes[] = 
{
    {
        .id = ChildNodeAttribute_group, 
        .sName = sizeof("group") - 1, 
        .pName = "group", 
        .handler = ChildNodeAttributeHandler, 
    }, 
}; 
EnXMLParserChildNode vChildNodeChildNodes[] = 
{
}; 
EnXMLParserNode childNodeDescription = 
{ 
    .pName = "child-node", 
    .pAttributes = vChildNodeAttributes, 
    .nAttributes = numof(vChildNodeAttributes), 
    .pChildNodes = vChildNodeChildNodes, 
    .nChildNodes = numof(vChildNodeChildNodes), 
    .textHandler = ChildNodeTextHandler, 
};

// Description for main-node
EnXMLParserAttribute vMainNodeAttributes[] =
{
    {
        .id = MainNodeAttribute_group,
        .sName = sizeof("group") - 1,
        .pName = "group",
        .handler = MainNodeAttributeHandler,
    },
    {
        .id = MainNodeAttribute_foobar, 
        .sName = sizeof("foobar") - 1, 
        .pName = "foobar", 
        .handler = MainNodeAttributeHandler, 
    },
};
EnXMLParserChildNode vMainNodeChildNodes[] =
{
    {
        .id = MainNodeChildNode_ChildNode,
        .sName = sizeof("child-node") - 1,
        .pName = "child-node",
        .handler = ProcessChildNode,
    },
};
EnXMLParserNode mainNodeDescription =
{
    .pName = "main-node",
    .pAttributes = vMainNodeAttributes,
    .nAttributes = numof(vMainNodeAttributes),
    .pChildNodes = vMainNodeChildNodes,
    .nChildNodes = numof(vMainNodeChildNodes),
    .textHandler = NULL,
};

A description for a node contains two arrays: an array of attributes and an array of child nodes. Each entry names an attribute or child node that may appear in the document and the handler to call when it is encountered. The description also contains a text handler, which may be NULL, as it is for main-node.

The child-node description contains an array of attributes consisting of one item: child-node may have a group attribute. The description has an empty array of child nodes, indicating that no child elements are expected.

The main-node description contains an array of two attributes: group and foobar. The group attribute has its own handler function and its own id (they need not differ from those of child-node, as long as the application knows what to do when the handler is called). The sample document doesn’t contain a foobar attribute, but other documents might. The description also indicates that main-node may have child nodes of type child-node.
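To make the role of the id, sName, pName, and handler members more concrete, here is a minimal sketch of how a table of this shape can drive handler dispatch. It is not EnXML's implementation: SketchAttribute, SketchDispatchAttribute, and SketchStoreValue are hypothetical stand-ins with simplified signatures, and the choice to silently ignore undescribed attributes is an assumption of the sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an attribute description entry. */
typedef struct SketchAttribute SketchAttribute;
typedef int32_t (*SketchAttributeHandler)(const SketchAttribute *pAttribute,
    const char *pValue, void *pUserData);

struct SketchAttribute
{
    uint32_t id;                    /* single-bit id, unique within the node */
    int32_t sName;                  /* length of pName without terminator */
    const char *pName;              /* attribute name to match */
    SketchAttributeHandler handler; /* called when the attribute is found */
};

/* Example handler: remembers the value pointer in *pUserData. */
int32_t SketchStoreValue(const SketchAttribute *pAttribute,
    const char *pValue, void *pUserData)
{
    (void)pAttribute;
    *(const char **)pUserData = pValue;
    return 0;
}

/* Looks up pName in the table. On a match the handler is called and, if it
 * succeeds, the entry's id is ORed into *pPresentFlags. Attributes that are
 * not described are ignored (an assumption of this sketch). */
int32_t SketchDispatchAttribute(const SketchAttribute *pTable,
    int32_t nTable, const char *pName, const char *pValue,
    void *pUserData, uint32_t *pPresentFlags)
{
    int32_t sName = (int32_t)strlen(pName);
    int32_t i;

    for (i = 0; i < nTable; i++)
    {
        /* Comparing lengths first avoids most memcmp calls. */
        if (pTable[i].sName == sName &&
            !memcmp(pTable[i].pName, pName, (size_t)sName))
        {
            int32_t result = pTable[i].handler(&pTable[i], pValue, pUserData);

            if (result == 0)
            {
                *pPresentFlags |= pTable[i].id;
            }
            return result;
        }
    }
    return 0;
}
```

Storing the name length next to the name, as the descriptions in this document do with sizeof("group") - 1, lets a matcher reject most entries without comparing any characters.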

Define structures to hold the parsed data and bitmasks indicating attributes present

As EnXML processes a node it indicates which attributes and elements were encountered. To do this it is passed an EnXMLParserNodeBase for each node. EnXMLParserNodeBase contains a presentFlags member which is a mask of the encountered elements and attributes. EnXMLParserNodeBase also includes a nodeType member to indicate the node type. nodeType is not used by EnXML but may be set and used by the application.

In practice it is often useful for applications to extend EnXMLParserNodeBase to include any node-associated data they wish to store, as illustrated below.

typedef struct 
{ 
    EnXMLParserNodeBase base; 
    int32_t groupDataOffset; 
    int32_t nameDataOffset; 
} ParsedChildNode; 

typedef struct 
{ 
    EnXMLParserNodeBase base; 
    EnXMLParserDataOffsetArray childNodes; 
    int32_t groupDataOffset; 
} ParsedMainNode;
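The casts between EnXMLParserNodeBase pointers and the application's own node types rely on the base being the first member of the extended structure, since C guarantees that a pointer to a structure can be converted to a pointer to its first member and back. The sketch below illustrates the pattern with hypothetical stand-in types; the member types chosen for SketchNodeBase are assumptions, not the EnXML definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for EnXMLParserNodeBase. */
typedef struct
{
    uint32_t presentFlags;
    int32_t nodeType;
} SketchNodeBase;

/* Application node type: the base must be the first member so that a
 * pointer to the node is also a valid pointer to the base. */
typedef struct
{
    SketchNodeBase base;
    int32_t groupDataOffset;
    int32_t nameDataOffset;
} SketchChildNode;

/* Generic code sees only the base and records what was encountered. */
void SketchMarkPresent(SketchNodeBase *pBase, uint32_t id)
{
    pBase->presentFlags |= id;
}

/* A handler recovers the full application type from the base pointer. */
SketchChildNode *SketchChildFromBase(SketchNodeBase *pBase)
{
    return (SketchChildNode *)pBase;
}
```

If the base were not the first member, the cast in SketchChildFromBase would point at the wrong address, which is why the structures in this document always declare base first.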

Preparing for EnXML Parsing

Preparing for EnXML parsing generally involves initializing variables and storage that will be used to hold the parsed document. In the sample code below the ParsedData type is used to hold parsed data.

EnXML defines several types that can assist in storage.

EnXMLParserDataBuffer is a variable-sized data buffer that grows as needed. It is useful for holding collections of variable-sized objects such as strings or arrays. The application stores the object in the data buffer and notes the offset into the data buffer where the object was stored. Generally only one EnXMLParserDataBuffer is required per document.

EnXMLParserDataOffsetArray holds an array of data offsets and automatically grows as items are appended. As an EnXMLParserDataOffsetArray itself is of varying size it is often convenient to store it in an EnXMLParserDataBuffer. Care needs to be taken to commit the EnXMLParserDataOffsetArray to the EnXMLParserDataBuffer only after the array is fully populated, because the final size of the EnXMLParserDataOffsetArray is not known until a node is fully parsed. The Base Stack code creates a temporary EnXMLParserDataOffsetArray with an initial size which is unlikely to be exceeded. When it is done processing a node, the EnXMLParserDataOffsetArray is stored to the EnXMLParserDataBuffer.

typedef struct 
{
    ParsedMainNode mainNode;
    EnXMLParserDataBuffer dataBuffer;
} ParsedData;

...

ParsedData parsedData;
memset(&parsedData, 0, sizeof(parsedData));
// Initialize the document-wide EnXMLParserDataBuffer with an initial size of
// 8192 bytes, increased in 4096 byte increments
if (EnXMLParserDataBufferInitialize(&parsedData.dataBuffer, 8192, 4096))
{
    printf("Failed to initialize data buffer\n");
    return -1;
}
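The reason for recording offsets rather than pointers can be seen in a small growable buffer. The following sketch is not the EnXMLParserDataBuffer implementation: SketchBuffer and its functions are hypothetical, and they only mirror the initial size, increment, and return conventions (0 on success) used in the fragment above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; objects are identified by offset. */
typedef struct
{
    uint8_t *pBuffer;
    int32_t sUsed;
    int32_t sAllocated;
    int32_t sIncrement;
} SketchBuffer;

/* Allocates sInitial bytes; later growth happens in sIncrement steps. */
int32_t SketchBufferInitialize(SketchBuffer *pBuf, int32_t sInitial,
    int32_t sIncrement)
{
    if (sInitial <= 0 || sIncrement <= 0)
    {
        return -1;
    }
    pBuf->pBuffer = malloc((size_t)sInitial);
    if (!pBuf->pBuffer)
    {
        return -1;
    }
    pBuf->sUsed = 0;
    pBuf->sAllocated = sInitial;
    pBuf->sIncrement = sIncrement;
    return 0;
}

/* Appends sStr bytes plus a zero terminator and reports where they landed.
 * realloc may move the buffer, so callers keep *pOffset, not a pointer. */
int32_t SketchBufferAddString(SketchBuffer *pBuf, const char *pStr,
    int32_t sStr, int32_t *pOffset)
{
    int32_t sNeeded;

    if (sStr < 0)
    {
        return -1;
    }
    sNeeded = pBuf->sUsed + sStr + 1;
    if (sNeeded > pBuf->sAllocated)
    {
        int32_t sNew = pBuf->sAllocated;
        uint8_t *pNew;

        while (sNew < sNeeded)
        {
            sNew += pBuf->sIncrement;
        }
        pNew = realloc(pBuf->pBuffer, (size_t)sNew);
        if (!pNew)
        {
            return -1;
        }
        pBuf->pBuffer = pNew;
        pBuf->sAllocated = sNew;
    }
    memcpy(pBuf->pBuffer + pBuf->sUsed, pStr, (size_t)sStr);
    pBuf->pBuffer[pBuf->sUsed + sStr] = '\0';
    *pOffset = pBuf->sUsed;
    pBuf->sUsed = sNeeded;
    return 0;
}

/* Resolves an offset to a pointer that is valid until the next append. */
void *SketchBufferGetPointer(const SketchBuffer *pBuf, int32_t offset)
{
    return pBuf->pBuffer + offset;
}
```

An offset stays meaningful after the buffer grows, whereas a pointer obtained before an append may dangle once realloc moves the storage. A further benefit of this layout is that a parsed document held entirely as offsets into one buffer can be released with a single free of pBuffer.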

Instantiating an XML Reader

Here we are using the libxml2 memory reader:

xmlTextReaderPtr reader = NULL;

// pXML is a pointer to the XML document and sXML is the size of 
// the document in bytes
reader = xmlReaderForMemory(pXML, sXML, NULL, NULL, 0);
if (!reader)
{
    printf("Failed to create xml reader\n");
    return -1;
}

Validate the top-level node type (main-node) and process it

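int nodeType;
const xmlChar *pLocalName;

// Advance the reader to the first node. Until the first read the reader
// has no current node to inspect.
if (xmlTextReaderRead(reader) != 1)
{
    printf("Failed to read the top-level node\n");
    return -1;
}

nodeType = xmlTextReaderNodeType(reader);
pLocalName = xmlTextReaderConstLocalName(reader);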

// Process the top node locally to ensure we're in fact dealing
// with the expected document type
if (nodeType == XML_READER_TYPE_ELEMENT &&
    !strcmp((char *)pLocalName, "main-node"))
{
    // Since main-node may have several child-node elements, initialize an 
    // EnXMLParserDataOffsetArray to hold the offsets of the parsed child-node elements. 
    if (EnXMLParserDataOffsetArrayInitialize(&parsedData.mainNode.childNodes,
        &parsedData.dataBuffer, 16, 16))
    {
        printf("Failed to allocate child-node array\n");
        return -1;
    }

    // Process the main node based on its description. The handlers specified
    // in the description will be called as attributes and child nodes are 
    // encountered.
    if (EnXMLParserProcessNode(&mainNodeDescription, reader, &parsedData,
        (EnXMLParserNodeBase *)&parsedData.mainNode))
    {
        printf("Failed to parse main-node\n");
        return -1;
    }
}
else
{
    printf("Top-level object isn't main-node (%d/%s)\n", nodeType, 
        pLocalName);
    return -1;
}

Handle main-node attributes

static int32_t
MainNodeAttributeHandler(EnXMLParserAttribute *pAttribute,
    const xmlChar *pValue, int32_t sValue, xmlTextReaderPtr reader,
    void *pUserData, EnXMLParserNodeBase *pParentNode)
{
    ParsedData *pParsedData = (ParsedData *)pUserData;
    ParsedMainNode *pMainNode = (ParsedMainNode *)pParentNode;

    switch (pAttribute->id)
    {
        case MainNodeAttribute_group:
            // Copy the group into the EnXMLParserDataBuffer and note the
            // offset in the ParsedMainNode structure.
            if (EnXMLParserDataBufferAddStringAndZeroTerminate(
                &pParsedData->dataBuffer, (char *)pValue, sValue,
                &pMainNode->groupDataOffset))
            {
                printf("Failed to store group\n");
                return -1;
            }
            break;
        default:
            printf("Unhandled attribute %d (%s)\n", pAttribute->id,
                pAttribute->pName);
            break;
    }

    return 0;
}

Process a child-node

static int32_t
ProcessChildNode(EnXMLParserChildNode *pChildNode, xmlTextReaderPtr reader,
    void *pUserData, EnXMLParserNodeBase *pParentNode)
{
    ParsedMainNode *pMainNode = (ParsedMainNode *)pParentNode;
    ParsedData *pParsedData = (ParsedData *)pUserData;
    ParsedChildNode childNode;
    int32_t childNodeOffset;

    memset(&childNode, 0, sizeof(childNode));

    // Allocate space for the node from the data buffer and store the
    // offset in the main-node's childNodes data offset array.
    if (EnXMLParserDataBufferAllocateSpace(&pParsedData->dataBuffer,
        sizeof(childNode), &childNodeOffset))
    {
        printf("Failed to allocate space for child-node\n");
        return -1;
    }

    if (EnXMLParserDataOffsetArrayAppendOffset(&pMainNode->childNodes,
        childNodeOffset))
    {
        printf("Failed to store childnode offset\n");
        return -1;
    }

    // Initialize the childNode
    // Because the memory for dataBuffer may move due to being realloc'd we are 
    // using a stack-allocated version of childNode. It will be committed to the
    // dataBuffer after it is parsed.
    childNode.base.nodeType = NodeType_ChildNode;

    if (EnXMLParserProcessNode(&childNodeDescription, reader, pUserData,
        (EnXMLParserNodeBase *)&childNode))
    {
        printf("Failed to parse child-node\n");
        return -1;
    }

    // Now commit child-node to the dataBuffer
    memcpy(pParsedData->dataBuffer.pBuffer + childNodeOffset, &childNode, sizeof(childNode));

    return 0;
}

Handle child-node attributes

static int32_t
ChildNodeAttributeHandler(EnXMLParserAttribute *pAttribute,
    const xmlChar *pValue, int32_t sValue, xmlTextReaderPtr reader,
    void *pUserData, EnXMLParserNodeBase *pParentNode)
{
    ParsedData *pParsedData = (ParsedData*)pUserData;
    ParsedChildNode *pChildNode = (ParsedChildNode *)pParentNode;

    switch (pAttribute->id)
    {
        case ChildNodeAttribute_group:
            // Copy the group into the EnXMLParserDataBuffer and note the
            // offset in the ParsedChildNode structure.
            if (EnXMLParserDataBufferAddStringAndZeroTerminate(
                &pParsedData->dataBuffer, (char *)pValue, sValue,
                &pChildNode->groupDataOffset))
            {
                printf("Failed to store group\n");
                return -1;
            }
            break;
        default:
            printf("Unhandled attribute %d (%s)\n", pAttribute->id,
                pAttribute->pName);
            break;
    }

    return 0;
}

Handle child-node text

static int32_t
ChildNodeTextHandler(EnXMLParserNode *pNodeData, const xmlChar *pValue,
    int32_t sValue, xmlTextReaderPtr reader, void *pUserData,
    EnXMLParserNodeBase *pParentNode)
{
    ParsedData *pParsedData = (ParsedData*)pUserData; 
    ParsedChildNode *pChildNode = (ParsedChildNode *)pParentNode;

    if (EnXMLParserDataBufferAddStringAndZeroTerminate(&pParsedData->dataBuffer,
        (char *)pValue, sValue, &pChildNode->nameDataOffset))
    {
        printf("Failed to store name\n");
        return -1;
    }
    return 0;
}

Using a parsed document

The following code fragment illustrates traversing a parsed document.

// Now that we have the parsed node let's do something with it...
printf("main-node\n");
if (parsedData.mainNode.base.presentFlags & MainNodeAttribute_group)
{
    printf(" group: %s\n", (char *)EnXMLParserDataBufferGetPointer(
        &parsedData.dataBuffer, parsedData.mainNode.groupDataOffset));
}
if (parsedData.mainNode.base.presentFlags & MainNodeChildNode_ChildNode)
{
    int32_t i;

    for (i = 0; i < parsedData.mainNode.childNodes.nOffsetsUsed; i++)
    {
        ParsedChildNode *pChildNode;

        pChildNode =
            (ParsedChildNode *)EnXMLParserDataOffsetArrayGetPointer(
                &parsedData.mainNode.childNodes, i);
        printf(" child-node %d\n", i);
        if (pChildNode->base.presentFlags & ChildNodeAttribute_group)
        {
            printf(" group: %s\n",
                (char *)EnXMLParserDataBufferGetPointer(
                    &parsedData.dataBuffer, pChildNode->groupDataOffset));
            printf(" name: %s\n",
                (char *)EnXMLParserDataBufferGetPointer(
                    &parsedData.dataBuffer, pChildNode->nameDataOffset));
        }
    }
}