- File extensions: .html, .xhtml
- i18n_type: HTML, HTML_FRAGMENT, XHTML respectively
Any HTML document or part of one can be uploaded to Transifex.
Since HTML is not a proper i18n file format, translating offline often causes alignment issues, and as such, HTML documents should be translated with the web editor Transifex provides. You're still able to translate offline by downloading the file for translation, in which case Transifex inserts extra information that will help it map the content of the translated document back to the original segments.
In case a segment has not been translated, the corresponding source string will be used instead, so that the resulting document will be complete, even though partially translated.
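The fallback described above can be pictured with a minimal sketch. The function name, the `(segment_id, source_string)` pairs and the `translations` dictionary are hypothetical and only illustrate the idea; they are not Transifex's actual data model.

```python
def build_document(segments, translations):
    """Assemble output text from (segment_id, source_string) pairs."""
    parts = []
    for segment_id, source in segments:
        # Fall back to the source string when no translation exists
        # (a missing key or an empty translation both count as untranslated).
        parts.append(translations.get(segment_id) or source)
    return "".join(parts)
```

A partially translated set of segments therefore still yields a complete document, with the untranslated parts left in the source language.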
HTML_FRAGMENT is currently in BETA. To learn more about the solution or to get hands-on experience, get in touch with us.
The new HTML_FRAGMENT file format detects and extracts translatable content from any HTML file. Content is classified as translatable or non-translatable according to its type or position in the document. The file format does not parse the HTML file to check the formatting and structure of its elements, so missing or misformatted elements are tolerated. The fragment HTML format will extract content from any given file as long as the file uses correct HTML punctuation. Examples where the file format will fail include a missing '<' or '>', or nesting such as "<h1 <h2>>".
The parser behaves in the same way as the HTML file parser, with two exceptions:
- 1. It can handle HTML documents with misformatted HTML "syntax".
- 2. It maintains all whitespace and any related "layout" characters present in the original input file.
Note: inline tags are considered to be text. They are therefore parsed as part of the surrounding text when creating translatable HTML entities. For example, the following HTML code is treated as a single translatable string:
This is an interestingly <b> bold </b> text.
The complete inline element list is:
'b', 'big', 'i', 'small', 'tt', 'abbr', 'acronym', 'cite', 'code', 'dfn', 'em', 'kbd', 'strong', 'samp', 'var', 'a', 'bdo', 'map', 'object', 'q', 'span', 'sub', 'sup', 'button', 'label', 'select', 'textarea', 'snippet', 'u', 'img', 'input', 'br'.
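The inline-element rule above can be illustrated with a minimal sketch built on Python's standard `html.parser`. The `SegmentCollector` class and `extract_segments` function are hypothetical names, and the logic is an assumption about how such grouping could work, not Transifex's actual parser. For brevity it trims surrounding whitespace, which the real HTML_FRAGMENT format is described as preserving.

```python
from html.parser import HTMLParser

# Inline element list copied from the documentation above.
INLINE_TAGS = {
    "b", "big", "i", "small", "tt", "abbr", "acronym", "cite", "code",
    "dfn", "em", "kbd", "strong", "samp", "var", "a", "bdo", "map",
    "object", "q", "span", "sub", "sup", "button", "label", "select",
    "textarea", "snippet", "u", "img", "input", "br",
}


class SegmentCollector(HTMLParser):
    """Hypothetical helper: groups text and inline tags into segments."""

    def __init__(self):
        super().__init__()
        self.segments = []  # finished segments
        self._buffer = []   # pieces of the segment being built

    def flush(self):
        # Simplification: whitespace around a segment is trimmed here.
        text = "".join(self._buffer).strip()
        if text:
            self.segments.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS:
            # Inline tags are treated as text and stay inside the segment.
            self._buffer.append(self.get_starttag_text())
        else:
            # Any other tag closes the current segment.
            self.flush()

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: same rule, no separate end tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in INLINE_TAGS:
            self._buffer.append(f"</{tag}>")
        else:
            self.flush()

    def handle_data(self, data):
        self._buffer.append(data)


def extract_segments(html):
    collector = SegmentCollector()
    collector.feed(html)
    collector.close()
    collector.flush()
    return collector.segments
```

Feeding it `<p>This is an interestingly <b> bold </b> text.</p>` produces one segment that still contains the `<b>` tags, because `b` is in the inline list while `p` is not.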
The fragment HTML format also recognizes translatable content in HTML attributes. Only the values of the listed attributes are recognized for translation:
For the end user, the parsing and translating workflow stays the same except for the `download to translate` step. During that step, a custom HTML file is created to enable instant, inline HTML translations. The layout of that file is similar to the old one, but it carries some extra details to support more complex translation scenarios in the future.
During export, Transifex adds internally created hashes to each item so that translated content can be mapped back correctly in the Transifex application once the file is uploaded. In the above example the download to translate file is going to be structured as follows:
Translators should translate only two types of tokens: Text tokens and Attribute Value tokens. Translatable Text tokens are surrounded by the special tx tags:
<tx se_hash="source_hash"> .... </tx>
Attribute values, in turn, are prefixed by a custom se_hash="source_hash" attribute. You will see the Transifex custom prefix only on title and src attributes.
Translators should update only the values of the prefixed attributes and the text surrounded by tx tags. Any other change may introduce a broken translation into the related resource. Here's an example of how the HTML content inside the for-translation file looks:
And here is how it should look once translations have been filled in:
The XHTML support Transifex provides differs in two ways from the support for HTML:
- You cannot upload only parts of an XHTML document.
- The file must be a valid XHTML document.
For these reasons, we recommend that you always use the HTML file format.
Please note that big paragraphs in the HTML file will result in long source strings, which are hard for translators to work with. Whenever possible, try to break big paragraphs into smaller ones by simply adding line breaks.
When loading a .html or .xhtml file, you'll notice that anything that is shown to the user will be detected and made available for translation. This includes items like:
- Content of block-level elements
- Content appearing inside table cells
- Attributes such as alt, label, placeholder, title
- Content inside <a> tags and their href attribute
- The src attribute of images
Some of this content might be formatted as HTML, which might come as a surprise. However, in some cases, this is necessary. Some of the images you're showing to your users might need localization (e.g. a screenshot), so their src needs to be translatable. Some of the links might need localization, since you might want to point the user to the appropriate URL of a localized page. In most cases, Transifex will present the element to the user in a way where they can translate only what they should be translating, avoiding the risk of breaking the HTML.
If you would like such strings to be excluded from the translation process, you can instruct Transifex to "lock" them by using Smart tags. This blocks translators from working on them or accidentally breaking them.
When you upload an HTML translation file, the parser extracts the content found in the file and matches it with the corresponding content of the source HTML file, following the order in which the content appears in the source file. Any difference in the structure of the HTML elements between the two files will prevent the translation file from being uploaded.
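A rough pre-upload check for structural differences might look like the sketch below. It compares only the ordered sequence of opening and closing tags in the two files, using Python's standard `html.parser`. The helper names are hypothetical, and the comparison rule is an assumption: the exact check Transifex performs is not described here and may be stricter or more lenient.

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Hypothetical helper: records tags in document order, ignoring text."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def tag_sequence(html):
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tags


def same_structure(source_html, translated_html):
    # Text and attribute values may differ; the tag order may not.
    return tag_sequence(source_html) == tag_sequence(translated_html)
```

Running `same_structure` on the source file and the translated file before uploading gives an early hint that an element was added, removed or reordered during translation.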
The HTML format ignores duplicate content entries when you upload an HTML file resource. This means that duplicated content appears as a single entry in your resource.
Duplicate content is defined as the same text contained in two different HTML elements. In the example below, the resource will have one entry for "Dealing with duplicates" and another one for "foo":
<a href="#duplicates" target="_self">Dealing with duplicates</a>
<ul>
  <li>foo</li>
  <li>bar</li>
</ul>
<!-- here the anchor link from above and the text of the h3 element are the same. -->
<h3><a name="duplicates">Dealing with duplicates</a></h3>
<!-- in the following list `foo` is duplicate -->
<ul>
  <li>foo</li>
  <li>fooBar</li>
</ul>
A translation HTML file that contains duplicates needs additional markup in the HTML code to identify the duplicate entries before it is uploaded. If the duplicate entries are not marked, parsing the file will raise an error.
To address that, you can use the data-tx-separate attribute in the elements that contain the duplicated text. Find every occurrence of the duplicated text except the first, and add the data-tx-separate attribute to the tag of each such element. In the example shared above the final code should look like this:
<!-- At the element of each second instance of each duplicate text add the data-tx-separate attribute: -->
<a href="#duplicates" target="_self">Dealing with duplicates</a>
<ul>
  <li>foo</li>
  <li>bar</li>
</ul>
<!-- here the anchor link from above and the text of the h3 element are the same. Since the text belongs to the h3 element we add the data-tx-separate in it -->
<h3 data-tx-separate="false"><a name="duplicates">Dealing with duplicates</a></h3>
<!-- in the following list `foo` is duplicate so we add in the li element the data-tx-separate attribute -->
<ul>
  <li data-tx-separate="false">foo</li>
  <li>fooBar</li>
</ul>
This needs to be done only in the translated HTML and not in the source language HTML file.
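To locate the places in a translated file that may need the data-tx-separate attribute, a small helper along the following lines can list every repeated text together with its innermost enclosing tag. This is a sketch under assumptions: `DuplicateFinder` and `find_repeats` are hypothetical names, "duplicate" is taken to mean exactly equal text after trimming whitespace, and the reported tag is only a hint, since the attribute may belong on an enclosing element (such as the h3 in the example above).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DuplicateFinder(HTMLParser):
    """Hypothetical helper: records repeated text and its enclosing tag."""

    def __init__(self):
        super().__init__()
        self._open_tags = []  # stack of currently open element names
        self._seen = set()    # text already encountered once
        self.repeats = []     # (text, enclosing tag) for later occurrences

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self._open_tags:
            while self._open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if text in self._seen:
            parent = self._open_tags[-1] if self._open_tags else None
            self.repeats.append((text, parent))
        else:
            self._seen.add(text)


def find_repeats(html):
    finder = DuplicateFinder()
    finder.feed(html)
    finder.close()
    return finder.repeats
```

Each reported pair marks a second or later occurrence of a text, which is where the attribute is expected according to the rule above. HTML comments are not treated as text, so the explanatory comments in the example do not affect the result.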