HTML

  • File extensions: .html, .xhtml
  • i18n_type: HTML or HTML_FRAGMENT (for .html), XHTML (for .xhtml)

HTML

Any HTML document or part of one can be uploaded to Transifex.

Since HTML is not a proper i18n file format, translating offline often causes alignment issues, so HTML documents are best translated with the web editor Transifex provides. You can still translate offline by downloading the file for translation, in which case Transifex inserts extra information that helps it map the content of the translated document back to the original segments.

If a segment has not been translated, the corresponding source string is used instead, so the resulting document is complete even when it is only partially translated.
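The fallback behavior can be pictured with a minimal sketch. This is only an illustration, not how Transifex implements it: the function name `build_output` and the idea of keying translations by source string are assumptions made for the example.

```python
def build_output(segments, translations):
    """Return the text for each segment, falling back to the source string.

    `segments` is a list of source strings; `translations` maps a source
    string to its translation (hypothetical structure, for illustration).
    """
    output = []
    for source in segments:
        translated = translations.get(source)
        # A missing or empty translation falls back to the source string.
        output.append(translated if translated else source)
    return output


segments = ["Hello", "Goodbye"]
translations = {"Hello": "Bonjour"}
print(build_output(segments, translations))
```

The second segment has no translation, so it appears in the output in the source language.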

Note

HTML_FRAGMENT is currently in BETA. To learn more about the solution or to get hands-on experience, get in touch with us.

The new HTML_FRAGMENT file format detects and extracts translatable content from any HTML file. Content is classified as translatable or non-translatable according to its type or position in the document. The file format does not check the HTML file for correct formatting and element structure, so it tolerates missing or malformed elements. It will extract content from any given file as long as the file uses correct HTML punctuation. Examples where the file format will fail include a missing '<' or '>', or nested tag openings such as "<h1 <h2>>".

The parser behaves in the same way as the HTML file parser, with two exceptions:

  1. It can handle HTML documents with malformed HTML "syntax".
  2. It preserves all whitespace and any related "layout" characters present in the original input file.

Note: inline tags are considered to be text, so they are kept inside the translatable string rather than splitting it. For example, the following HTML code is treated as a single translatable string.

This is an interestingly <b> bold </b> text.

The complete inline element list is:

'b', 'big', 'i', 'small', 'tt', 'abbr', 'acronym', 'cite', 'code', 'dfn', 'em', 'kbd', 'strong', 'samp', 'var', 'a', 'bdo', 'map', 'object', 'q', 'span', 'sub', 'sup', 'button', 'label', 'select', 'textarea', 'snippet', 'u', 'img', 'input', 'br'.
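To make the effect of the inline element list concrete, here is a rough sketch using Python's standard `html.parser`. It is not the Transifex parser: the class `SegmentCollector` and the function `extract_segments` are hypothetical names, and the rule "any tag not in the inline list ends the current string" is a simplifying assumption.

```python
from html.parser import HTMLParser

# Inline elements from the list above; any other tag is treated as a boundary.
INLINE_TAGS = {
    "b", "big", "i", "small", "tt", "abbr", "acronym", "cite", "code",
    "dfn", "em", "kbd", "strong", "samp", "var", "a", "bdo", "map",
    "object", "q", "span", "sub", "sup", "button", "label", "select",
    "textarea", "snippet", "u", "img", "input", "br",
}


class SegmentCollector(HTMLParser):
    """Collects text runs, keeping inline tags inside each run."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._buffer = []

    def _flush(self):
        text = "".join(self._buffer).strip()
        self._buffer = []
        if text:
            self.segments.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS:
            # Keep the tag exactly as written in the source.
            self._buffer.append(self.get_starttag_text())
        else:
            self._flush()

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in INLINE_TAGS:
            self._buffer.append(f"</{tag}>")
        else:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)


def extract_segments(html):
    collector = SegmentCollector()
    collector.feed(html)
    collector.close()
    collector._flush()  # emit any trailing text after the last tag
    return collector.segments


print(extract_segments("<p>This is an interestingly <b> bold </b> text.</p>"))
```

With this sketch the sentence containing the `<b>` element comes out as one string, tags included, which mirrors the behavior described above.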

The fragment HTML format also recognizes translatable content in HTML attributes. Only the values of the following attributes are extracted for translation:

  • alt
  • label
  • placeholder
  • title
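As an illustration of attribute extraction, the sketch below collects the values of the four attributes listed above with Python's standard `html.parser`. The names `AttributeCollector` and `extract_attribute_strings` and the sample markup are made up for the example; the actual Transifex extraction logic may differ.

```python
from html.parser import HTMLParser

# Attributes whose values are extracted, as listed above.
TRANSLATABLE_ATTRS = ("alt", "label", "placeholder", "title")


class AttributeCollector(HTMLParser):
    """Records (tag, attribute, value) for every translatable attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Skip attributes without a value, e.g. a bare `title`.
            if name in TRANSLATABLE_ATTRS and value:
                self.found.append((tag, name, value))


def extract_attribute_strings(html):
    collector = AttributeCollector()
    collector.feed(html)
    collector.close()
    return collector.found


sample = (
    '<input type="text" placeholder="Your name" title="Name field">'
    '<img src="logo.png" alt="Company logo">'
)
for tag, name, value in extract_attribute_strings(sample):
    print(tag, name, value)
```

Attributes outside the list, such as `type` and `src` in the sample, are ignored by this sketch.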

For the end user, the parsing and translating workflow stays exactly the same, except for the `download to translate` step. During that step, a custom HTML file is created to enable instant, inline HTML translations. The layout of that file is similar to the old one, but it includes some extra details to support more complex future translation scenarios.

Example:

Macaque in the trees

During export, Transifex adds internally generated hashes to each item so that it can correctly map the translated content back into the Transifex application once the file is uploaded. For the above example, the download to translate file is structured as follows:

Macaque in the trees

Translators should translate only two types of tokens: Text tokens and Attribute Value tokens. The translatable Text tokens are surrounded by the special tx tag:

<tx se_hash= "source_hash" > .... </tx>

Attribute values, in turn, are prefixed by a custom se_hash="source_hash" attribute. You will see the Transifex custom prefix only on title and src attributes.

Translators should update only the values of the prefixed attributes and the text surrounded by tx tags. Any other change may break the translation of the related resource. Here's an example of how the HTML content inside the for-translation file looks:


Macaque in the trees

And here is how it should look once translations have been filled in:

Macaque in the trees
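To show how the tx wrapper lets translated text be matched back to its source, here is a small sketch that reads a filled-in file and builds a mapping from each se_hash value to the text inside its tx tag. It relies only on the `<tx se_hash="..."> ... </tx>` shape described above. The class `TxCollector`, the function `read_translations`, and the hash value `abc123` are hypothetical, and prefixed attribute values are not handled.

```python
from html.parser import HTMLParser


class TxCollector(HTMLParser):
    """Maps each se_hash to the (possibly tagged) text inside its <tx> tag."""

    def __init__(self):
        super().__init__()
        self.translations = {}
        self._current_hash = None
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "tx":
            self._current_hash = dict(attrs).get("se_hash")
            self._parts = []
        elif self._current_hash is not None:
            # Inline tags inside a tx block are kept as part of the text.
            self._parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied once, as written.
        if self._current_hash is not None:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "tx":
            if self._current_hash is not None:
                self.translations[self._current_hash] = "".join(self._parts)
            self._current_hash = None
        elif self._current_hash is not None:
            self._parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._current_hash is not None:
            self._parts.append(data)


def read_translations(html):
    collector = TxCollector()
    collector.feed(html)
    collector.close()
    return collector.translations


translated = '<p><tx se_hash="abc123">Un texte <b>gras</b></tx></p>'
print(read_translations(translated))
```

Because the lookup key is the hash rather than the position in the file, edits outside the tx tags are exactly what can break this mapping, which is why translators are asked to leave everything else untouched.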

XHTML

The XHTML support Transifex provides differs in two ways from the support for HTML:

  • You cannot upload only part of an XHTML document.
  • The file must be a valid XHTML document.

For these reasons, we recommend that you always use the HTML file format.
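If you do need to upload XHTML, a quick local pre-check can catch obvious problems first. The sketch below uses Python's standard `xml.etree.ElementTree` to test whether a document is well-formed XML. This is an assumption-laden shortcut: well-formedness is necessary for valid XHTML but not sufficient, the function name `is_well_formed` is made up, and documents that use named entities such as `&nbsp;` without a DTD will be rejected by this parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if `document` parses as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


good = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hi</p></body></html>'
bad = "<p>Hi<br></p>"  # <br> is never closed, which XML does not allow

print(is_well_formed(good))
print(is_well_formed(bad))
```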

Tip

Please note that big paragraphs in the HTML file will result in long source strings, which are hard for translators to work with. Whenever possible, try to break big paragraphs into smaller ones by simply adding line breaks.

Detected content

When loading a .html or .xhtml file, you'll notice that anything that is shown to the user is detected and made available for translation. This includes items like:

  • Content of block-level elements
  • Content appearing inside table cells
  • Attributes such as alt, label, placeholder, title
  • Content inside <a> tags and their href attribute
  • The src attribute of images

Some of this content might be formatted as HTML, which might come as a surprise. However, in some cases, this is necessary. Some of the images you're showing to your users might need localization (e.g. a screenshot), so their src needs to be translatable. Some of the links might need localization, since you might want to point the user to the appropriate URL of a localized page. In most cases, Transifex will present the element to the user in a way where they can translate only what they should be translating, avoiding the risk of breaking the HTML.

If you would like such strings to be excluded from the translation process, you can instruct Transifex to "lock" them, which blocks translators from working on them (or accidentally breaking them), by using Smart tags.