The Intricacies of Textblock Tags in Inline XBRL: A Deeper Look
At Data Amplified 2023 in Zürich, our own Janis Steinmann gave a technical talk about the challenges of InlineXBRL textblock tagging. This article reiterates some of the key arguments of his presentation.
In the realm of InlineXBRL, textblock tags pose unique challenges, notably because of their inconsistent rendering across different viewers and the lack of clear requirements. This article aims to explore the origins of these issues and identify viable solutions to enhance and standardize the rendering of textblock tags.
The example below illustrates this challenge very clearly: The same content is rendered in two different ways, producing a very different outcome.
A quick refresher about the challenges of re-rendering contents from PDF-based InlineXBRL reports can be found as part of this article: Isolated Textblock Rendering: AMANA´s Approach to the Textblock Extraction Challenge
What are textblock tags in InlineXBRL?
If you are using InlineXBRL, chances are you've heard about textblock tags. In the past year, all ESEF filers in Europe had to use textblock tags in their annual reports for the first time. While numeric tags are simple and straightforward, textblock tags are more complex. The content of numeric tags is very restricted and thus easy to handle, whereas textblock tags can include basically anything. In the example below, for instance, the highlighted table on the right-hand side is built from <div> and <span> tags with a lot of styling.
How are they used?
Before we ask ourselves how the content of textblocks should be constructed, it is vital to first understand how they are to be used. The honest answer to that question for most of the community currently is “We don’t know”. However, XBRL International is currently crafting a Working Group Note with four possible use cases and their technical requirements:
- Disclosure Checklists: Provides a simple flag indicating whether a concept has been used in a report
- Disclosure Navigation: Allows a user to quickly navigate to a specific disclosure using search or taxonomy browsing
- Disclosure Text Analysis: Allows a processor to access the text content of a specific disclosure enabling automated analysis of the text (requires “readable” text)
- Isolated rendering: Allows a user to render the content of the text block separately from the original source document, reproducing the same layout as in the original document.
Textblock tags also hold potential for AI-assisted processing, e.g. auto-summary creation, "sentiment analysis", comparison of content between filers and a lot more.
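As a minimal sketch of the checklist and text analysis use cases, assuming a well-formed XHTML report that uses the Inline XBRL 1.1 namespace, the following Python snippet collects the plain text of every ix:nonNumeric element and derives checklist flags from it. The sample report and the "ex:" concept names are hypothetical. The sketch ignores ix:continuation and ix:exclude, and it does not check the taxonomy data type, so a real processor would need more than this.

```python
import xml.etree.ElementTree as ET

IX_NS = "http://www.xbrl.org/2013/inlineXBRL"  # Inline XBRL 1.1 namespace

# Hypothetical sample report; the "ex:" concept names are made up.
SAMPLE_REPORT = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
  <body>
    <ix:nonNumeric name="ex:RevenueDisclosureTextBlock" contextRef="c1" escape="true">
      <div><span>Revenue</span> <span>rose by 5%.</span></div>
    </ix:nonNumeric>
  </body>
</html>"""


def tagged_text(xhtml: str) -> dict[str, str]:
    """Map each ix:nonNumeric concept name to its whitespace-normalized text."""
    root = ET.fromstring(xhtml)
    result = {}
    for fact in root.iter(f"{{{IX_NS}}}nonNumeric"):
        # itertext() walks all nested text nodes and drops the markup.
        result[fact.get("name")] = " ".join("".join(fact.itertext()).split())
    return result


def disclosure_checklist(facts: dict[str, str], concepts: list[str]) -> dict[str, bool]:
    """Flag, per concept, whether it was tagged anywhere in the report."""
    return {concept: concept in facts for concept in concepts}


facts = tagged_text(SAMPLE_REPORT)
checklist = disclosure_checklist(
    facts,
    ["ex:RevenueDisclosureTextBlock", "ex:IncomeTaxDisclosureTextBlock"],
)
print(facts)
print(checklist)
```

Note that the checklist only needs the name attribute, while text analysis depends on the markup yielding readable text in document order, which is exactly where fixed-layout reports tend to cause trouble.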
What are the challenges?
Europe recently completed a full reporting season with mandatory textblock tags. Here are some of the main challenges.
HTML's flexibility, which allows similar-looking layouts to be created from very different source code, is one such challenge. The European regulator ESMA gave filers a lot of freedom in designing their reports, which are usually derived from existing documents in formats such as Word or PDF.
In the picture above we can see two tables from two different reports.
The left-hand table stems from a Finnish report based on a PDF. The table consists of <div> and <span> elements with additional styling. When the extracted content is re-rendered as HTML, almost all style information is lost and the table can no longer be displayed in a usable format.
The right-hand table is based on a German Word document and thus carries semantic information. The table can be re-rendered in a table layout; nonetheless, information that the filer deemed important (highlighted rows and columns) is no longer visible.
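The difference between the two tables can be illustrated with a small Python sketch. It assumes two hypothetical fragments with the same visible content: one built from <table> markup and one from positioned <span> elements. A simple row extractor recovers the structure of the first and finds nothing in the second, because there the rows and columns exist only in the style information.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect cell texts row by row from <tr>/<td>/<th> markup."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text parts of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def extract_rows(fragment: str) -> list[list[str]]:
    parser = TableRows()
    parser.feed(fragment)
    parser.close()
    return parser.rows


# Hypothetical fragments with the same visible content.
SEMANTIC = (
    "<table><tr><th>Item</th><th>2023</th></tr>"
    "<tr><td>Revenue</td><td>100</td></tr></table>"
)
POSITIONED = (
    '<div><span style="left:10px;top:5px">Item</span>'
    '<span style="left:90px;top:5px">2023</span>'
    '<span style="left:10px;top:20px">Revenue</span>'
    '<span style="left:90px;top:20px">100</span></div>'
)

semantic_rows = extract_rows(SEMANTIC)
positioned_rows = extract_rows(POSITIONED)
print(semantic_rows)
print(positioned_rows)
```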
Another example can be seen in the image below: Because local styles have been used, the re-rendered result appears very chaotic and is not readable. The picture is blurred because the report was of course not released like this.
What are the requirements?
Navigating these challenges requires well-defined requirements and specifications on textblock content, which are presently lacking. Different regulations from diverse regulating bodies lead to a range of requirements. From a technical side, the following constraints exist:
- Inline XBRL Specification: ix:nonNumeric
  Content: ( any element | any text node ) *
- Data Type: textBlockItemType
  “The unescaped content MUST have mixed content containing a simple string, or a fragment of XHTML or a mixture of both” (http://www.xbrl.org/dtr/dtr.html)
Thus, the content can be almost anything.
Regulators also have different approaches to textblock tagging.
For instance, ESMA's regulations are quite loose and consist only of some minor technical requirements, allowing for a complex report design. The SEC, on the other hand, demands that reports be more form-based, with inline styling as a must-have.
In Europe, especially in Germany, auditors are trying to fill the gap in requirements for textblock tags. Their primary emphasis rests on enhancing machine readability. To ensure this, a shift is called for towards the use of semantically meaningful HTML tags, like <table>, <tr>, <td> over <div> and <span> for tables. While this makes the content more readable, a Working Group Note currently being authored by XBRL International states that resorting to detailed XBRL tagging is a more reliable and easier method for data extraction.
Now you might wonder, what exactly is meant by meaningful HTML? The content should employ meaningful or semantic HTML tags like <p> for a paragraph, <h1> for a heading, <table> for tables, which serve to enhance machine readability, while also enabling better human readability when re-rendering extracted content. Since currently no real-world use cases are known to require such semantic HTML, it is questionable what use this requirement has.
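To make the idea of "meaningful HTML" more concrete, here is a rough Python sketch that measures how much of a fragment's structure is expressed with semantic tags rather than generic <div> and <span> containers. The tag sets and the resulting ratio are assumptions made for illustration; no regulator or auditor is known to define such a metric.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed classification for illustration only.
SEMANTIC_TAGS = {
    "p", "h1", "h2", "h3", "h4", "h5", "h6",
    "table", "tr", "td", "th", "ul", "ol", "li",
}
GENERIC_TAGS = {"div", "span"}


class TagCounter(HTMLParser):
    """Count how often each start tag occurs in a fragment."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def semantic_share(fragment: str) -> float:
    """Share of semantic tags among all semantic and generic tags."""
    counter = TagCounter()
    counter.feed(fragment)
    counter.close()
    semantic = sum(n for tag, n in counter.counts.items() if tag in SEMANTIC_TAGS)
    generic = sum(n for tag, n in counter.counts.items() if tag in GENERIC_TAGS)
    total = semantic + generic
    return semantic / total if total else 0.0


print(semantic_share("<h1>Notes</h1><p>Text</p>"))
print(semantic_share("<div><span>Text</span></div>"))
print(semantic_share("<div><p>Text</p></div>"))
```

A low share does not make a report wrong; it only hints that extracted content will be hard to re-render or analyze without the original styling.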
Furthermore, the sudden imposition of these additional requirements late in the reporting season led to a significant degree of confusion among filers. Consequently, standardized requirements would be greatly appreciated by everyone involved.
How to view the content?
Since there is also no standardized method to view textblock content, different tools take different approaches to visualizing it. Some use XBRL International's InlineViewer to present the report in its original layout, while others use software that shows the content with its HTML retained or shows only the text content.
In the US, the EDGAR renderer takes a more formal approach to viewing content, with formal styling and fewer layout possibilities for firms.
How to enhance the content?
How to improve the content of textblock tags again depends on the usage and the target audience. For instance, if our goal is merely to create a disclosure checklist, there shouldn't be much cause for concern. However, if we aim to show the content in isolation, it would be advisable to use proper semantic tags, ideally with inline styling. If we need reliable machine-readable data, detailed XBRL tagging is the way to go.
Inline and global styles each come with their own trade-offs as well:
- All style information inline
  - Extracted content can be rendered similar to original
  - File size increases due to repetition of styles
- Use global styles
  - Extracted content will have basic styling when semantic tags are used
  - Extracted content will have no styling when fixed layout is used
  - Smaller file size
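The trade-off listed above can be sketched in Python: global class rules are copied into style attributes so that an extracted fragment keeps its look without the report's stylesheet. This is a deliberately naive illustration that assumes a well-formed fragment and a hypothetical mapping from single class names to CSS declarations. It ignores selectors, the cascade and specificity, which a real converter would have to respect.

```python
import xml.etree.ElementTree as ET


def inline_class_styles(fragment: str, rules: dict[str, str]) -> str:
    """Replace class attributes with equivalent inline style attributes."""
    root = ET.fromstring(fragment)
    for element in root.iter():
        classes = element.attrib.pop("class", "").split()
        declarations = [rules[name] for name in classes if name in rules]
        # Existing inline styles go last so they still win over class rules.
        if element.get("style"):
            declarations.append(element.get("style"))
        if declarations:
            cleaned = [d.strip().rstrip(";") for d in declarations]
            element.set("style", ";".join(cleaned))
    return ET.tostring(root, encoding="unicode")


# Hypothetical global rules and fragment.
RULES = {"note": "font-size:9pt", "hl": "font-weight:bold"}
FRAGMENT = '<p class="note">Total <span class="hl" style="color:red">42</span></p>'

inlined = inline_class_styles(FRAGMENT, RULES)
print(inlined)
```

Every element that shared a class now carries its own copy of the declarations, which is where the file size increase of the inline approach comes from.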
How to improve the viewers?
The software displaying the content can also be improved. One way to do this is to offer users options to enhance the output, such as adding highlighting, removing styles or revising table formatting. As an example, the left-hand picture below shows the chaotic rendering of textblock tag content we have seen before. On the right, some of the enhancement options of AMANA's XBRL Auditor have been employed to make the same content more readable.
Considering that the whole report is always available as part of the report package, not merely the extracted content, this information can be utilized to properly render isolated textblock content in the original, intended layout, regardless of the underlying HTML structure.
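One simple way to picture this idea is the following Python sketch, which wraps an extracted fragment in a standalone page and carries over the <style> elements found in the full report. It is an illustration under stated assumptions, not the method used by any particular viewer or plugin: the report and fragment are hypothetical, and fixed-layout reports with absolutely positioned content would need additional handling such as recalculating offsets.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def standalone_page(report_xhtml: str, fragment: str) -> str:
    """Wrap a fragment in a page that reuses the report's global styles."""
    root = ET.fromstring(report_xhtml)
    # Collect the text of every <style> element in the full report.
    css = "\n".join(
        style.text or "" for style in root.iter(f"{{{XHTML_NS}}}style")
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"/>'
        f"<style>{css}</style></head><body>{fragment}</body></html>"
    )


# Hypothetical report and extracted textblock fragment.
REPORT = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><style>.hl{background:#ff0}</style></head>
  <body><p class="hl">Total</p><p>Other content</p></body>
</html>"""
EXTRACTED = '<p class="hl">Total</p>'

page = standalone_page(REPORT, EXTRACTED)
print(page)
```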
To support this point, AMANA has created an open-source plugin for the XII InlineViewer. Consisting of just a few hundred lines of code, this plugin is readily available for use and extension by anyone interested. (Isolated Textblock Rendering: AMANA´s Approach to the Textblock Extraction Challenge) However, it's important to note that depending on the specific conversion approach used, some level of adaptation may be required. With such enhancements, we can not only boost the viewer's functionality but also pave the way for more interactive and intuitive content rendering in InlineXBRL.
Textblock tags pose a new challenge in the context of InlineXBRL. The use cases for their content are not clear but should guide any requirements. Different requirements from multiple stakeholders exist, adding to the complexity for software vendors, auditors, and filers alike. However, there's room for improvement in both enhancing content and expanding viewers.
With ESG reporting on the horizon, taxonomies that contain a lot more narrative tags are on the way. In order to have useful content and a streamlined report creation process, a lot more work needs to be done with regard to textblock tags.