Post Language Project Proposes to Bring Better Support for Multilingual WordPress Content

photo credit: . Entrer dans le rêve - cc

WordPress’ global usage on the web is now at 23%, and this year marked the first time that non-English downloads surpassed the number of English downloads. Major internationalization improvements coming in 4.0 will open up the platform even more for those publishing in different languages.

While discussing the upcoming language-related improvements at WordCamp Seattle this year, Andrew Nacin highlighted the fact that only 5-10% of the world speaks English. It may not be long before the majority of WordPress installations are in Mandarin, Spanish, Hindi, or Arabic.

The need for better ways to support multilingual content is already a concern for many international users and agencies. One thing that WordPress core is currently missing is the ability to easily retrieve the language in which a post or page has been written. German WordPress developer Caspar Hübinger is in the early stages of creating a proposal to add a Post Language feature to core.

Why Does WordPress Need Post Language Support?

In outlining the need for post language support, Hübinger cites WordPress download stats from the end of April, demonstrating that 3.9 had been downloaded roughly 1.35 times as often in other languages as in the default US English:

Total Core downloads: 6,589,287 (100%)
Default English: 2,807,978 (42.6%)
Others: 3,781,309 (57.4%)

(Data from April 29, 2014)

Hübinger wants to add post_language as a property of WP_Post just like post_author, post_excerpt, and the other variables.

“Offering a basic opportunity to users for them to store the language of their content along with other post meta information would provide a new level of empowerment for both, users and developers,” Hübinger contends.

His proposal is based on the premise that the language of post content serves as:

  • a highly relevant piece of post meta information in general
  • one of the most important parameters for plugin and theme developers to tackle the already complex field of language and translation

Many plugins, in the course of providing translation features, require the ability to determine the language a post was written in, but they all go about it in different ways. Portability is abysmal across plugins such as WPML, Polylang, Babble, Multilingual Press, and others that provide a similar functionality.

“All of those plugins, however, do much more than just determining the language of a post,” Hübinger told the Tavern. “They offer UIs for translating content and establishing language relationships between single posts, a field so complex that being built without any core method for language determination, each one of those plugins can become a major headache when a user tries to switch from one plugin to another.”

“As a user you’re pretty much locked in to the solution you choose, since not only are connections between original posts and translations gone when you switch plugins, but also the very marker of which language a post is written in simply vanishes or becomes ineffective,” Hübinger explained. If WordPress had a standard way to determine the language in which a post was written, all of these plugins could potentially provide more portable functionality.

The Proposed Post Language Feature

So what would Post Language look like as a feature implemented in WordPress? In addition to providing developers with more tools to add custom language and translation features, post language would also allow users to assign a language selection in the Publish Post meta box:

[Image: Post Language select box in the Publish meta box]

Hübinger proposes that the select box be populated with the languages previously defined through either the language packs available within the given WordPress install, or a filter. The language selection would return the ISO code for that language and store it in a database field as post meta or an extra field that would have to be added to the database table.
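
As a rough illustration of the storage step, the sketch below checks a submitted language code against the codes offered in the select box before it would be saved. The function name and the sample language list are assumptions made for illustration; the proposal does not define them. Inside WordPress, the returned value could then be handed to update_post_meta() under a meta key of the implementer's choosing.

```php
<?php
/**
 * Return the submitted language code if it is one of the offered codes,
 * otherwise fall back to the site default.
 *
 * Hypothetical helper: the name and signature are not part of the proposal.
 *
 * @param string   $submitted Value received from the select box.
 * @param string[] $available Codes offered to the user, e.g. from language packs.
 * @param string   $fallback  Site-wide default code.
 */
function sketch_sanitize_post_language(string $submitted, array $available, string $fallback): string
{
    $submitted = trim($submitted);

    // Strict comparison so only exact string matches are accepted.
    return in_array($submitted, $available, true) ? $submitted : $fallback;
}

// Example: a site that offers US English and German.
$available = ['en-US', 'de-DE'];
echo sketch_sanitize_post_language('de-DE', $available, 'en-US'), "\n"; // de-DE
echo sketch_sanitize_post_language('xx', $available, 'en-US'), "\n";    // en-US
```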

The value for Post Language could then be used in the following ways:

  • should be made accessible through template tags:
    the_post_language()
    get_the_post_language()
  • should possibly affect
    get_bloginfo( 'language' )
    get_bloginfo( 'text-direction' )
    and thus language_attributes()
  • OR should be implemented via a new attribute on a per-post basis, similar to post_class():
    • post_language()
      <article <?php post_class(); ?> <?php post_language(); ?>>
      // output:
      <article class="foo bar" lang="en-US">
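
To make the last option more concrete, here is a minimal sketch of how a post_language()-style tag might build its output. It assumes the stored value is a WordPress-style locale such as en_US and converts it to the hyphenated form used in the HTML lang attribute. Both function names are hypothetical, and a real implementation would read the stored value from the post rather than take it as an argument.

```php
<?php
/**
 * Convert a WordPress-style locale such as "en_US" into a value suitable
 * for the HTML lang attribute, such as "en-US".
 */
function sketch_locale_to_lang_tag(string $locale): string
{
    // Drop anything that is not a letter, digit, underscore, or hyphen.
    $clean = (string) preg_replace('/[^A-Za-z0-9_-]/', '', $locale);

    return str_replace('_', '-', $clean);
}

/**
 * Build the attribute string a post_language()-style tag might print.
 * Returns an empty string when no language is stored for the post.
 */
function sketch_post_language_attribute(string $stored_language): string
{
    $tag = sketch_locale_to_lang_tag($stored_language);

    return $tag === '' ? '' : 'lang="' . $tag . '"';
}

echo '<article class="foo bar" ' . sketch_post_language_attribute('en_US') . '>', "\n";
// prints: <article class="foo bar" lang="en-US">
```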

Since not all WordPress sites would need this feature, he suggests that it be disabled by default and enabled via a constant, a filter or perhaps an admin setting under Settings > General.
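
One way the opt-in could work is sketched below. The constant POST_LANGUAGE_ENABLED and the filter name post_language_enabled are assumptions made up for this example; the proposal does not name either. The check stays off by default and only consults apply_filters() when WordPress is loaded.

```php
<?php
/**
 * Decide whether the post language feature is active.
 *
 * POST_LANGUAGE_ENABLED and the 'post_language_enabled' filter are
 * hypothetical names; the proposal does not define them.
 */
function sketch_post_language_enabled(): bool
{
    // Off unless the constant is defined and truthy (e.g. in wp-config.php).
    $enabled = defined('POST_LANGUAGE_ENABLED') && constant('POST_LANGUAGE_ENABLED');

    // Inside WordPress, let plugins and themes override the decision.
    if (function_exists('apply_filters')) {
        $enabled = apply_filters('post_language_enabled', $enabled);
    }

    return (bool) $enabled;
}
```

Under these assumptions, a site owner would opt in by adding define( 'POST_LANGUAGE_ENABLED', true ); to wp-config.php.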

Hübinger mentioned his idea in a comment on Andrew Nacin’s roadmap for 4.0 internationalization improvements, but he decided to wait until 4.0 is in place before officially proposing the feature. Adding a new property to WP_Post is a major consideration and will likely encounter a healthy debate.

Post Language Support Falls In Line with WordPress’ Mission to Democratize Publishing

Unlike various other CMSs, such as Drupal and Typo3, WordPress does not provide a core feature to publish translations of original content. “You can’t even just publish single posts in more than one language per site without messing up your markup with false language attributes,” Hübinger notes. “Not a problem? Try to get a machine reading a post to you in any other language than English when its markup says it is written in English. You’ll most certainly hear the problem.”

Hübinger believes that raising awareness is key for the Post Language feature to gain momentum. “Language on a per post basis is generally associated with translation in people’s minds, and rightfully so,” he said. “Translation, though, has always been an edge case scenario for our mainly anglophone WordPress core dev team, and rightfully so as well.” Convincing the WordPress community of the case for adding Post Language to core is the first step to making it a viable possibility.

The lack of a post language field juxtaposed with the existence of post formats in core is a continual source of bewilderment for Hübinger, who comes from a multilingual culture.

“I like to say if we have a visual carnival like post formats in core, it is high time to spend some thought on a language API which potentially will affect and benefit a couple of millions more users than fancy post formats,” he said. “Nothing against post formats; I like them. They just make such good contrast when comparing the importance of core features.”

His proposal makes a compelling case for the international community and appeals to the heart of WordPress’ core mission to democratize publishing.

After all, WordPress is all about publishing content, and content inevitably has to do with language. We can’t honestly claim to ‘democratize publishing’ while we continue to ignore the relevance of linguistic aspects regarding content for WordPress users around the world.

Hübinger believes that a Post Language feature can help the project enter a higher level of maturity with one small API feature addition. “While the whole field of translating and multilingual content rightfully has been and will be outsourced into plugin territory, WordPress core needs to provide at least a basic language-per-post API for plugin authors to work with, thus preventing users from locking themselves in with one solution forever,” he said.

Hübinger readily admits that the feature is beyond his coding capabilities and hopes that other developers will join the effort to establish a path for architecture and implementation.

“I am totally open to any self-respecting developers who would like to contribute, fork the repo, set up their own one for the same idea,” he said. “This is about making WordPress better for millions of non-anglophone users, so let’s just get that language API in there in the most decent manner possible!”

Once WordPress 4.0 is released with improved multilingual support, Hübinger hopes to drum up more support and contributors to work on the project before officially proposing it to core. If you’d like to help develop the Post Language proposal further, you can find the project on GitHub.


25 responses to “Post Language Project Proposes to Bring Better Support for Multilingual WordPress Content”

  1. I hope this discussion will be continued.

    One of the major reasons why we started developing our own plugins was the fact that if we needed complex functionality working in a multilingual environment, there was no solution – more complex plugins (for events management, membership) were just not working with WPML or Polylang (which are definitely the best multilingual plugins in my opinion).

    Providing a core Language API would allow developers to standardize at least some of the methods of handling multilingual content.

  2. This is so important for WordPress. Please help Caspar and make this happen!

    The language selection would return the ISO code for that language and store it in a database field as post meta or an extra field that would have to be added to the database table.

    Some languages (German for example) have the default translation *and* a formal translation, so please do not just use the ISO code. We should have a possibility/parameter to switch these different translations, too!

    • @Torsten While I’ve felt the pain dealing with formal/informal translations, I don’t see that issue within the scope of a language-per-post feature at all. Formal/informal needs to be considered when installing language packs. Given the fact a switch between formal and informal within one and the same site would be an edge case at most (generally you’d pick one and stick with it for the whole site), I don’t see any case right now where bringing that switch to the level of authored content on a per-post basis would add relevant value to user experience in WordPress.

  3. Thank you, Sarah, for covering the topic so thoroughly. Not much to add, but emphasizing again this needs more eyes to be looked upon by in order to make it happen. With all the great effort being put into 4.0 currently and positive feedback for the idea in general, I’m confident we’ll make it happen soon.

  4. I don’t see this happening soon (1-2 years from now) because the statistics that are used to make this point aren’t the right ones. The post language selector only makes sense when you have a multilingual site and that percentage is not that high. That said, the work done in 4.0 is a big step forward.

    I believe to make steps in adding better multilingual support we first need to work on the roadmap for taxonomies so at one point we do have native support for relationships. Also we should work on making the current multilingual plugins better and have user tests done to see what users like from all the plugins out there. They are quite different from each other but we could look if they share common code and try to integrate that in WordPress.

    • Regarding the statistics: When discussing a new feature, should we look at the percentage of users *already using* that feature through third-party implementations, or at the percentage potentially benefiting from it?

      I don’t have numbers for the first part – stats of major multilanguage plugins may provide an estimate. Regarding the potential: even in the US, with just one official language*, about 20% of the population is multilingual. Not speaking even of China or India.

      *Edit: Wikipedia proves me wrong :)

    • @Marko Heijnen

      The post language selector only makes sense when you have a multilingual site and that percentage is not that high.

      What @Manuel says, and: define “multilingual site”. ;) We usually think of translated content when we hear “multilingual”, don’t we? But that’s not the only use case for language-per-post. Matter of fact, that’s not the way we communicate as (mostly) bi- or multilingual human individuals. (I recommend Stephanie Booth’s talk from WordCamp Switzerland this year for an excellent explanation of why confounding multi-/bilingual and translated content is such an unfortunate misconception.)

      To put it clearly: I cannot see any justification for a “multilingual” WordPress core feature that would cover or even touch the broad issue of translation of authored content. Language relationships (and thus translation) imo can and should remain plugin territory.

      Despite the title of this post showing the term “multilingual”, my proposal is not about bringing what we generally refer to as “multilingual functionality” to WordPress. The future feature or API we’re talking about imo should not cover relationships between languages.
      There are several ways of implementing language relationships for specific use cases, none of which should be preferred or excluded from the palette of possible solutions by default.

      • That is the only concept for me that makes sense. The general concepts are blog post in 1 language or having multiple languages and have one post translated to another language (not meaning always the same posts).
        So “language-per-post” targets the first. It targets the users who sometimes post something in another language. Adding a select box in an area that is already crowded doesn’t make too much sense to me.

  5. Sadly, I don’t see this being included in core anytime soon. Most core developers come from monolingual cultures and can’t understand people needing to make their sites available in multiple languages. So, we are stuck with plugins such as WPML (not free and bloated) or Polylang (free but lacking some options).

  6. This is how Drupal does it (table node has a “language” column) and I agree it’s really useful, not only to know in which language the post is written but, more importantly (when having to manage/update the site), to know if the site is multilingual

  7. This is very promising development. Looking forward to a day when WordPress site development & content addition in Hindi would involve a process of only downloading default WP setup, installing and voila! वर्डप्रेस हिंदी मैं :-)

  8. Contrary to what seems to be a kind of it hasn’t happened yet so it won’t happen consensus voiced here, I think reflection about language issues in WP is long overdue.
    Recent WP development has been all about the shiny bits – multimedia (still clunky), the fantastic things for theme developers… but nothing about what should be basic core functionality.
    Language attributes is a start, but I think there should be some sort of thinking about how (even if the nuts and bolts are done through a plugin) multilingual posts could/should be implemented.
    Almost all the plugins I’ve looked at or tried out have tags within the post that wrap the content for each language – I still use one developed by Jennifer Hodgson years ago that works this way – a language attribute would be useless in this case as the post contains both (or more) languages.
    With language attribute metadata, the logic would be that each language version of a post would be a post in itself (in the database) which would then be linked in some way (so that they could be retrieved for comparison and editing and selected at the front end).
    I don’t think you can introduce post_language() without thinking through to end of the process.
    But it is essential that it be done !

      • Caspar, thanks for starting this discussion.

        I was thinking about the possible ways of implementation.

        If we needed a language attribute for posts only the ideal solution would be to add new “post_language” or just “language” column to posts DB table. Then extend the WP_Query with this new parameter and add a couple of post language related functions just like you did in your code. I think it would be the most natural way of handling that, keeping the relation between post and postmeta intact.

        In this logic, WordPress comment, term and user tables should also be extended with a “language” column.

        Unfortunately, that solution is impossible or at least not recommended to do in a plugin – it would require the involvement of the WP core team.

        My second idea is to create 2 new DB tables similar to terms and term_relationships, e.g. languages and language_relationships. The languages table would store the languages you have in your site; language_relationships would contain the relationships between objects (posts, comments, terms, users) and the language. There would be no need to extend existing tables here.
