HTML

  • File extensions: .html, .xhtml
  • i18n_type: HTML or HTML_FRAGMENT (for .html), XHTML (for .xhtml)

HTML

Any HTML document or part of one can be uploaded to Transifex.

Since HTML is not a proper i18n file format, translating it offline often causes alignment issues, so HTML documents should be translated in the Transifex Web Editor. You can still translate offline by downloading the file for translation. In that case, Transifex inserts extra information that helps it map the content of the translated document back to the original segments.

If a segment has not been translated, the corresponding source string is used instead, so the resulting document is complete even if it is only partially translated.

Context in HTML

To provide context to your translators and make it available in the Transifex Web Editor, you can use the tx-context attribute:

<div tx-context="homepage">
  <div tx-context="button">Register</div>
</div>

This information will be displayed under the context tab in the editor as follows:

[Screenshot: the tx-context value shown under the Context tab of the Web Editor]
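To check locally which tx-context values surround each piece of text before uploading, you could use a small script along the lines of the following minimal sketch, built on Python's standard html.parser module. It assumes well-formed markup and simply reports every enclosing tx-context value, outermost first; how the Web Editor combines nested values is not described here, and the function name list_contexts is hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed onto the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ContextLister(HTMLParser):
    """Collect each non-empty text node with the tx-context values of its ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []    # tx-context value (or None) for each open element
        self.results = []  # (text, [contexts from outermost to innermost])

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(dict(attrs).get("tx-context"))

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            contexts = [c for c in self.stack if c is not None]
            self.results.append((text, contexts))


def list_contexts(html):
    """Return (text, contexts) pairs for every text node in the document."""
    parser = ContextLister()
    parser.feed(html)
    parser.close()
    return parser.results
```

For the example above, list_contexts returns [("Register", ["homepage", "button"])].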


Note

HTML_FRAGMENT is currently in BETA. To learn more about it or to try it out, get in touch with us.

The new HTML_FRAGMENT file format detects and extracts translatable content from any HTML file. Content is classified as translatable or non-translatable according to its type or its position in the document. The format does not validate the formatting and structure of the elements in the HTML file, so it tolerates missing or malformed elements. It extracts content from any file as long as the file uses correct HTML punctuation. Examples where the format fails include a missing '<' or '>', or nested tags such as "<h1 <h2>>".

The parser behaves in the same way as the HTML parser, with the following exceptions:

  1. It can handle HTML documents with malformed HTML "syntax".
  2. It preserves all whitespace and any other "layout" characters of the original input file.

Note: inline tags are treated as part of the text, so they stay inside the translatable string that contains them. For example, the following HTML is treated as a single translatable string:

This is an interestingly <b> bold </b> text.

The complete inline element list is:

'b', 'big', 'i', 'small', 'tt', 'abbr', 'acronym', 'cite', 'code', 'dfn', 'em', 'kbd', 'strong', 'samp', 'var', 'a', 'bdo', 'map', 'object', 'q', 'span', 'sub', 'sup', 'button', 'label', 'select', 'textarea', 'snippet', 'u', 'img', 'input', 'br'.
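To illustrate why inline tags do not split a translatable string, here is a simplified sketch using Python's standard html.parser. It treats every tag outside the list above as a segment boundary and keeps inline tags inside the current segment. This is an assumption made for illustration, not Transifex's actual parser: unlike HTML_FRAGMENT, it strips leading and trailing whitespace from each segment, and the function name segment is hypothetical.

```python
from html.parser import HTMLParser

# Inline elements as listed for HTML_FRAGMENT.
INLINE = {"b", "big", "i", "small", "tt", "abbr", "acronym", "cite", "code",
          "dfn", "em", "kbd", "strong", "samp", "var", "a", "bdo", "map",
          "object", "q", "span", "sub", "sup", "button", "label", "select",
          "textarea", "snippet", "u", "img", "input", "br"}


class Segmenter(HTMLParser):
    """Split HTML into segments; inline tags stay inside the current segment."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self.current = []

    def flush(self):
        # Close the current segment if it holds any non-whitespace text.
        text = "".join(self.current).strip()
        if text:
            self.segments.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in INLINE:
            self.current.append(self.get_starttag_text())  # keep raw inline tag
        else:
            self.flush()  # any other tag ends the current segment

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        if tag in INLINE:
            self.current.append(self.get_starttag_text())
        else:
            self.flush()

    def handle_endtag(self, tag):
        if tag in INLINE:
            self.current.append(f"</{tag}>")
        else:
            self.flush()

    def handle_data(self, data):
        self.current.append(data)


def segment(html):
    """Return the list of translatable segments found in html."""
    parser = Segmenter()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.segments
```

With this sketch, segment("<p>This is an interestingly <b> bold </b> text.</p>") yields one segment, "This is an interestingly <b> bold </b> text.", matching the example above.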

The HTML_FRAGMENT format also recognizes translatable content in HTML attributes, but only in the values of the following attributes:

  • alt
  • label
  • placeholder
  • title
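As an illustration of this rule, the following Python sketch lists which attribute values would be picked up from a piece of HTML. It only mirrors the four attributes above and is not Transifex code; skipping empty values is an assumption, and the function name translatable_attributes is hypothetical.

```python
from html.parser import HTMLParser

# Attributes whose values HTML_FRAGMENT extracts for translation.
TRANSLATABLE_ATTRS = ("alt", "label", "placeholder", "title")


class AttributeCollector(HTMLParser):
    """Collect (tag, attribute, value) for every translatable attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <img ... /> are routed here by default too.
        for name, value in attrs:
            if name in TRANSLATABLE_ATTRS and value:
                self.found.append((tag, name, value))


def translatable_attributes(html):
    """Return the translatable attribute values found in html, in order."""
    collector = AttributeCollector()
    collector.feed(html)
    collector.close()
    return collector.found
```

Note that the src attribute is ignored here because it is not in the list, even though the same image's alt and title values are collected.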

For end users, the parsing and translation workflow stays exactly the same, except for the `download to translate` step. During that step, a custom HTML file is created to enable instant, inline HTML translation. Although the layout of that file is similar to the old one, it includes some extra details to support more complex translation scenarios in the future.

Example:

Macaque in the trees

During export, Transifex adds internally generated hashes to each item so that it can correctly map the translated content back into the Transifex application once the file is uploaded. In the above example, the download to translate file is structured as follows:

Macaque in the trees

Translators should translate only two types of tokens: Text tokens and Attribute Value tokens. Translatable Text tokens are surrounded by the special tx tag:

<tx se_hash="source_hash">...</tx>

Translatable attribute values, on the other hand, are prefixed by a custom se_hash="source_hash" attribute. In this example, you will see the Transifex custom prefix only on the title and src attributes.

Translators should change only the values of the prefixed attributes and the text surrounded by tx tags. Any other change may break the translation of the related resource. Here is how the HTML content inside the file downloaded for translation looks:


Macaque in the trees

And here is how it should look once translations have been filled in:

Macaque in the trees
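Because the hashes are what let Transifex map translations back, a translator may want to confirm that none of them were removed while editing. A minimal sketch of such a pre-upload check follows. It assumes hashes appear as se_hash attributes on tags, as in the tx tag shown above; the helper names collect_hashes and missing_hashes are hypothetical.

```python
from html.parser import HTMLParser


class HashCollector(HTMLParser):
    """Collect the se_hash value of every tag that carries one."""

    def __init__(self):
        super().__init__()
        self.hashes = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("se_hash")
        if value is not None:
            self.hashes.append(value)


def collect_hashes(html):
    """Return all se_hash values in document order."""
    collector = HashCollector()
    collector.feed(html)
    collector.close()
    return collector.hashes


def missing_hashes(downloaded, translated):
    """Return se_hash values present in the downloaded file but not in the translation."""
    kept = set(collect_hashes(translated))
    return [h for h in collect_hashes(downloaded) if h not in kept]
```

An empty result means every hashed item from the downloaded file is still present; it does not check that the text itself was translated.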

XHTML

The XHTML support Transifex provides differs from its HTML support in two ways:

  • You cannot upload only part of an XHTML document.
  • The file must be a valid XHTML document.

For these reasons, we recommend that you always use the HTML file format.

Tip

Please note that big paragraphs in the HTML file result in long source strings, which are hard for translators to work with. Whenever possible, break big paragraphs into smaller ones by adding line breaks.

Detected content

When you load a .html or .xhtml file, anything that is shown to the user is detected and made available for translation. This includes:

  • Content of block-level elements
  • Content appearing inside table cells
  • Attributes such as alt, label, placeholder, title
  • The contents of <a> tags and their href attribute
  • The src attribute of images

Some of this content might be formatted as HTML, and finding links and image sources among the translatable strings might come as a surprise. However, this is sometimes necessary. Some of the images you show to your users might need localization (e.g. a screenshot), so their src needs to be translatable. Some links might need localization too, since you might want to point the user to the URL of a localized page. In most cases, Transifex presents the element to translators in a way that lets them translate only what they should, avoiding the risk of breaking the HTML.

[Screenshot: translatable attributes as shown in the Web Editor]

If you would like to exclude such strings from the translation process, you can use Smart tags to instruct Transifex to "lock" them, which blocks translators from working on them (or accidentally breaking them).

Managing HTML translation files

When you upload an HTML translation file, the parser extracts its content and matches it with the corresponding content of the source HTML file, following the order in which content appears in the source file. Differences in the structure of the HTML elements between the two files will prevent the translation file from being uploaded.

Note that the parser simply goes through each HTML element it identifies and matches translations to source strings in exactly the order found in the source file. Make sure that the structure and content appear in exactly the same order in the source and translation files; any difference will map translations to the wrong source strings.

In the following example, the translation file contains one element fewer than the source file, in the middle of the HTML. This results in translations being mapped to the wrong source strings.

<!-- Source file structure & content -->
...
<p>Some block of text</p>
<p>&nbsp;</p>
<p>Another block of text</p>
<p>Third block of text</p>
...
<!-- End of Source file -->

<!-- Uploaded translation file structure & content -->
...
<p>Translation for "Some block of text"</p>
<!-- missing empty paragraph element -->
<p>Translation for "Another block of text"</p>
<p>Translation for "Third block of text"</p>
...
<!-- End of translation file -->

In this example mapping of translations to strings will be:

  • The source string "Some block of text" gets the translation: Translation for "Some block of text"
  • The source string "&nbsp;" (the empty paragraph) gets the translation: Translation for "Another block of text"
  • The source string "Another block of text" gets the translation: Translation for "Third block of text"
  • The source string "Third block of text" gets whatever element comes next in the HTML translation file.

To avoid this, please double-check HTML translation files before uploading to Transifex.
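One way to double-check is to compare the element structure of the two files before uploading. The following Python sketch compares only the sequence of start-tag names, which is an assumption about what "structure" means here; the helper names are hypothetical. It detects that an element is missing but cannot tell which one: for the example above it reports position 3, where the translation runs out of <p> elements, rather than the empty paragraph at position 1.

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Record the sequence of start tags in a document (comments are ignored)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tags


def first_structural_difference(source, translation):
    """Return the index of the first start tag where the files diverge
    (or where the shorter one ends), or None if the sequences match."""
    src, trn = tag_sequence(source), tag_sequence(translation)
    for i, (a, b) in enumerate(zip(src, trn)):
        if a != b:
            return i
    if len(src) != len(trn):
        return min(len(src), len(trn))
    return None
```

A None result only means the tag sequences match; it does not confirm that each translation belongs to the right source string.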

Handling duplicate content

When you upload an HTML file resource, the HTML format ignores duplicate content entries, so each duplicated text appears only once in your resource content. Duplicate content is identical text contained in two different HTML elements. In the example below, the resource has a single entry for "Dealing with duplicates" and a single entry for "foo":

<a href="#duplicates" target="_self">Dealing with duplicates</a>
<ul>
  <li>foo</li>
  <li>bar</li>
</ul>

<!-- here the anchor link from above and the text of the h3 element are the same. -->
<h3><a name="duplicates">Dealing with duplicates</a></h3>

<!-- in the following list, `foo` is a duplicate -->
<ul>
  <li>foo</li>
  <li>fooBar</li>
</ul>

If you want Transifex to keep duplicate strings as separate source entries instead of ignoring them during source file upload (for example, because you want to translate identical strings differently), you can do so through our API. Specifically, create your HTML resource through the API endpoint described here, setting the option "allow_duplicate_strings" to true:

[Screenshot: API request creating an HTML resource with "allow_duplicate_strings" set to true]


If a translation HTML file contains duplicates, the duplicate entries must be marked in the HTML code; otherwise, parsing the file raises an error.

To mark them, use the data-tx-separate attribute on the elements that contain the duplicated text: find every occurrence of a duplicated text except the first, and add the data-tx-separate attribute to the tag of each of those elements. For the example above, the final code looks like this:

<!-- Add the data-tx-separate attribute to the element of every occurrence of a duplicate text after the first: -->
<a href="#duplicates" target="_self">Dealing with duplicates</a>
<ul>
  <li>foo</li>
  <li>bar</li>
</ul>

<!-- here the anchor link from above and the text of the h3 element are the same. Since the text belongs to the h3 element, we add data-tx-separate to it -->
<h3 data-tx-separate="false"><a name="duplicates">Dealing with duplicates</a></h3>

<!-- in the following list, `foo` is a duplicate, so we add the data-tx-separate attribute to its li element -->
<ul>
  <li data-tx-separate="false">foo</li>
  <li>fooBar</li>
</ul>

This needs to be done only in the translated HTML and not in the source language HTML file.
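Finding every occurrence but the first by hand is error-prone in larger files. The following rough Python sketch lists repeated text nodes in document order. It compares whitespace-trimmed text nodes exactly, which is an assumption about what counts as a duplicate (text split by inline tags is seen as separate nodes), and the function name repeated_occurrences is hypothetical. You still decide which element receives data-tx-separate; in the example above it goes on the h3, not on the inner a.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every non-empty text node in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def repeated_occurrences(html):
    """Return (index, text) for every occurrence of a text after its first one."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    seen = set()
    repeats = []
    for index, text in enumerate(collector.texts):
        if text in seen:
            repeats.append((index, text))
        else:
            seen.add(text)
    return repeats
```

For the duplicates example in this section, the sketch reports the second "Dealing with duplicates" and the second "foo", which are exactly the texts whose elements receive data-tx-separate.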