Tide Project Aims to Audit and Score WordPress Themes and Plugins based on Code Quality

Last week XWP dropped an intriguing preview of a new project called Tide that aims to improve code quality across the WordPress plugin and theme ecosystems. The company has been working with the support of Google, Automattic, and WP Engine on creating a new service that will help users make better plugin decisions and assist developers in writing better code.

XWP’s marketing manager Rob Stinson summarized the project’s direction so far:

Tide is a service, consisting of an API, Audit Server, and Sync Server, working in tandem to run a series of automated tests against the WordPress.org plugin and theme directories. Through the Tide plugin, the results of these tests are delivered as an aggregated score in the WordPress admin that represents the overall code quality of the plugin or theme. A comprehensive report is generated, equipping developers to better understand how they can increase the quality of their code.

The XWP announcement also included a screenshot of how this data might be presented in the WordPress plugin directory:

XWP plans to unveil the service at WordCamp US in Nashville at the Google booth where they will be inviting the community to get involved. Naturally, a project with the potential to have this much impact on the plugin ecosystem raises many questions about who is behind the vision and what kind of metrics will be used.

I contacted Rob Stinson and Luke Carbis at XWP, who are both contributors to the project, to get an inside look at how it started and where they anticipate it going.

“Tide was started at XWP about 12 months ago when one of our service teams pulled together the idea, followed up by a proof of concept, of a tool that ran a series of code quality tests against a package of code (WordPress plugin) and returned the results via an API,” Stinson said. “We shortly after came up with the name Tide, inspired by the proverb ‘A rising tide lifts all boats,’ thinking that if a tool like this could lower the barrier of entry to good quality code for enough developers, it could lift the quality of code across the whole WordPress ecosystem.”

Stinson said XWP ramped up its efforts on Tide during the last few months after beginning to see its potential and sharing the vision with partners.

“Google, Automattic and WP Engine have all helped resource (funds, infrastructure, developer time, advice etc) the project recently as well,” Stinson said. “Their support has really helped us build momentum. Google have been a big part of this since about August. We had been working with them on other projects and when we shared with them the vision for Tide, they loved it and saw how in line it is with the vision they have for a better performant web.”

The Tide service is not currently active, but a beta version will launch at WordCamp US with a WordPress plugin to follow shortly thereafter. Stinson said the team designed the first version to present the possibilities of Tide and encourage feedback and contribution from the community.

“We realize that Tide will be at its best if it’s open sourced,” he said. “There are many moving parts to it and we recognize that the larger the input from the community, the better it will represent and solve the needs of the community around code quality.”

At this phase of the project, nothing has been set in stone. The Tide team is continuing to experiment with different ways of making the plugin audit data available, as well as refining how that data is weighted when delivering a Tide score.

“The star rating is just an idea we have been playing with,” Stinson said. “The purpose of it will be to aggregate the full report that is produced by Tide into a simple and easy to understand metric that WordPress users can refer to when making decisions about plugins and themes. We know we haven’t got this metric and how it is displayed quite right. We’ve had some great feedback from the community already.”

The service is not just designed to output scores but also to make it easy for developers to identify weaknesses in their code and learn how to fix them.

“Lowering the barrier of entry to writing good code was the original inspiration for the idea,” Stinson said.

Tide Project Team Plans to Refine Metrics Used for Audit Score based on Community Feedback

The Tide project website, wptide.org, will launch at WordCamp US and will provide developers with scores, including specifics like line numbers and descriptions of failed sniffs. Plugin developers will be able to use the site to improve their code and WordPress users will be able to quickly check the quality of a plugin. XWP product manager Luke Carbis explained how the Tide score is currently calculated.

“Right now, Tide runs a series of code sniffs across a plugin / theme, takes the results, applies some weighting (potential security issues are more important than tabs vs. spaces), and then averages the results per line of code,” Carbis said. “The output of this is a score out of 100, which is a great indicator of the quality of a plugin or theme. The ‘algorithm’ that determines the score is basically just a series of weightings.”
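Carbis’ description, a weighted penalty per sniff category averaged over lines of code and expressed as a score out of 100, can be sketched roughly as follows. The category names, weights, and function are hypothetical illustrations of the idea, not Tide’s actual algorithm.

```python
# Illustrative sketch of a weighted per-line quality score, as Carbis
# describes it: sniff results are weighted by category, averaged per
# line of code, and mapped to a score out of 100. The categories and
# weights below are hypothetical, not Tide's real values.

# Hypothetical weights: security issues count more than style nits.
WEIGHTS = {"security": 5.0, "performance": 2.0, "standards": 0.5}

def tide_style_score(sniff_results, lines_of_code):
    """sniff_results: list of (category, count) pairs from a sniff run."""
    penalty = sum(WEIGHTS.get(category, 1.0) * count
                  for category, count in sniff_results)
    # Average the weighted penalty per line, then map onto 0-100.
    per_line = penalty / max(lines_of_code, 1)
    return max(0.0, 100.0 * (1.0 - per_line))

# Given the same size, a plugin with only style warnings scores higher
# than one whose sniff failures are security-related.
print(tide_style_score([("standards", 40)], 2000))  # mostly style nits
print(tide_style_score([("security", 40)], 2000))   # security issues
```

The interesting design question is entirely in the weighting table, which is exactly the part the team says it wants the community to refine.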

The weightings the service is currently using were selected as a starting point, but Carbis said the team hopes the WordPress community will help refine them.

“If it makes sense, maybe one day this score could be surfaced in the WordPress admin (on the add new plugin page),” Carbis said. “Or maybe it could influence the search results (higher rated plugins ranked first). Or maybe it just stays on wptide.org. That’s really up to the community to decide.”

In addition to running code sniffs, the Tide service will run two other scans. The first, performed on themes, is a Lighthouse scan using Google’s open-source automated tool for improving the quality of web pages, which Carbis says is a “huge technological accomplishment.”

“For every theme in the directory, we’re spinning up a temporary WordPress install, and running a Lighthouse audit in a headless chrome instance,” Carbis said. “This means we get a detailed report of the theme’s front end output quality, not just the code that powers it.”
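A report like the one Carbis describes could be boiled down to a single theme score along these lines. The report structure here is a simplified stand-in for Lighthouse’s actual JSON output, which is richer and has changed across versions.

```python
# Simplified sketch of consuming a Lighthouse-style report for a theme.
# The dict below is a stand-in for the JSON a real Lighthouse run emits;
# the actual report format is richer and version-dependent.

sample_report = {
    "categories": {
        "performance":    {"score": 0.82},
        "accessibility":  {"score": 0.91},
        "best-practices": {"score": 0.77},
    }
}

def summarize(report):
    """Average the category scores and express them out of 100."""
    scores = [c["score"] for c in report["categories"].values()]
    return round(100 * sum(scores) / len(scores), 1)

print(summarize(sample_report))  # 83.3
```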

The second scan Tide will perform measures PHP compatibility and will apply to both plugins and themes.

“Tide can tell which versions of PHP a plugin or theme will work with,” Carbis said. “For users, this means we could potentially hide results that we know won’t work with their WordPress install (or at least show a warning). For hosts, this means they can easily check the PHP compatibility before upgrading an install to PHP 7 (we think this will cause many more installs to be upgraded – the net effect being a noticeable speed increase, which we find really exciting and motivating).”
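The filtering Carbis describes amounts to comparing the PHP versions a plugin is known to work with against the version running on the user’s install. The plugin records and field names below are hypothetical; a real implementation would pull compatibility data from the Tide API.

```python
# Sketch of hiding (or flagging) plugins whose known-compatible PHP
# versions exclude the version on the user's install. The plugin
# records and field names here are hypothetical.

def parse_version(v):
    return tuple(int(part) for part in v.split("."))

def compatible(plugin, php_version):
    """plugin: dict with 'min_php' and optional 'max_php' version strings."""
    current = parse_version(php_version)
    if current < parse_version(plugin["min_php"]):
        return False
    max_php = plugin.get("max_php")
    if max_php and current > parse_version(max_php):
        return False
    return True

plugins = [
    {"name": "legacy-widget", "min_php": "5.2", "max_php": "5.6"},
    {"name": "modern-blocks", "min_php": "5.6"},
]

# On a PHP 7.0 install, the legacy plugin would be hidden or flagged.
visible = [p["name"] for p in plugins if compatible(p, "7.0")]
print(visible)  # ['modern-blocks']
```

The same check, run from the host’s side, is what would let a host verify an entire install before flipping it to PHP 7.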

Carbis said that the team is currently working in the short term to get the PHP Compatibility piece into the WordPress.org API, which he says could start influencing search results without any changes to WordPress core.

“We’d also like to start engaging with the community to find out whether surfacing a Code Quality score to WordPress users is helpful, and if it is, what does that look like? (e.g. score out of 100, 5 star rating, A/B/C/D, etc.),” Carbis said. “We will release our suggestion for what this could look like as a plugin shortly after WordCamp US.”

More specific information about the metrics Tide is currently using and how it applies to plugins and themes will be available after the service launches in beta. If you are attending WordCamp US and have some suggestions or feedback to offer the team, make sure to stop by the Google sponsorship booth.

22 responses to “Tide Project Aims to Audit and Score WordPress Themes and Plugins based on Code Quality”

    • @Ajay, the source for the sniffs is indeed WPCS.

      Currently it predominantly uses the wordpress-core sniffs from the project, along with the relevant sniffs from wordpress-vip that pertain to performance. The team carefully reviewed the selected sniffs and categorised them into “security”, “performance” and “standards”, so some sniffs have a higher weighting than others.

      As mentioned in the article, the weighting document will be made available so that the community can decide what it should be. Ours is just a first draft.

      The WPCS project is really the source of inspiration for Tide.

  1. This is amazing!
    I think I’ll be first in line to test drive this as soon as it becomes available!

    I’ve been playing around with grading themes at Themetally with Lighthouse (without deploying anything to production yet), but this is going to take it to a whole new level. I can’t wait! 🤓

  2. I have to concur that this is an interesting concept. I’ve already tweeted to you Sarah, but will see what others think: should the stars be implemented based on the screenshot, I’d use green stars instead of grey ones. I think people would react better to that colour.

  3. WPCS is nice, but very far from being flawless. For some things you need to uglify your code in order to reduce the noise coming from it, which is not a great thing. If, for example, you want your code to be easily testable by having functions return HTML instead of outputting it, good luck reducing the number of “non escaped output” warnings.

    Coding standards should be a convenience for the developer, helping them read their old code and search for anything in it. They are not by themselves a good measure of code quality; you can write totally insecure code that obeys all coding standards.

    While it might be a good thing to nudge WordPress developers into using some mostly common coding standards, I am afraid this information will just create a false sense of security with users, and there will obviously be false warnings as well. If this information influences search results, or anything else, all you will end up with is better and more maintainable malware code.

    I assume that applying such a tool to the repository will force Otto to kick his “php in widgets” plugin out of it, which lol is unlikely to happen, so we will end up again with the tired old discussion about arbitrary decisions in the repository :(

    • I agree. But the purpose of Tide is to check against the standards of the ecosystem we work with. We deliberately chose not to compromise on some sniffs that I personally find “silly” to put it mildly, but nevertheless. It was determined by the community and is shaped by the community.

      Once Tide launches the best we all could do to improve the service is to contribute to the WPCS project. Help shape the PHP sniffs, help out with the ESLint projects @netweb is working on, etc.

        • The problem is not with the tool; the problem is with how it is being marketed (at least here). Since it will be very hard to score a 10 for any non-trivial plugin (any plugin/theme which generates inline CSS or JS, for example), malware will be able to safely score a 9 and be considered “perfect”, but it takes only one soft spot in the code that can be abused to make it a horrible plugin. Same goes for performance.

        Security and performance have to be audited by humans who can understand the context and execution path of the code. It is nice to have a helper tool, but if it were possible to trust such tools, we would have heard of them in other contexts as well. wordpress.com VIP, for example, does a manual inspection and does not trust the tool “as is”.

        Targeting such tools, and their results, toward users who do not have the knowledge about the possible pitfalls is just very wrong.

  4. Coding standards are not the same as performance or security. Also, the WP default standard scores lower on readability tests, so there’s that as well. Make your code harder to read, go the WP way. Not sure what can be done about that now though.

  5. Looks like an interesting project!
    Not sure yet how I feel about star ratings as output; it would depend on how accurate it is and what the exact factors are (I don’t think you should get half a star less because someone used spaces instead of tabs).

    Something to consider too: a textual output along the lines of:
    – With risk (red, I try not to say something too negative like ‘bad’)
    – Unconventional (yellow/orang~y)
    – Conventional (green)

    @Derek Herman, @Rheinard Korf, @Luke Carbis might be interesting for us to chat to see if http://codeoversight.com/ can bring any value here. Feel free to reach out to me if you think it’s interesting to chat.

  6. Has some merit, especially the compatibility checking (as it is black and white). Performance is also probably a good metric. And security audits will be useful.

    But there are so many other factors that will be ignored or are subjective, so this scoring could be counterproductive. It is important to remember a well-coded plugin is not the same as a good plugin.

    • Totally agree. Well-written code is not the same as good code, but it’s a start, right? It helps developers not leave gaping holes.

      On the other hand, because Tide is an API and will be open sourced, it leaves room for different kinds of audits to be run on code that we haven’t even thought of. My personal vision is that Tide would eventually allow others to build their own plugins that use the API to surface specific metrics as determined by that plugin. You could even make a subjective review plugin if you feel so inclined.

  7. The code sniffer is a good tool for developers in helping them write better code. It’s not meant for users.

    After using the PHPCS WPCS project for months to review code, I can say with absolute certainty that there are far too many false-positives given. That’s not a problem in terms of using the tool. It simply means, “Hey, you need to check that this is OK; it may or may not be an issue.” It doesn’t necessarily mean, “Hey, this is broken.”

    The code sniffer results should never be used as a metric for end users. Certainly not based on a 5-star rating system. It’d be far too easy to write a quality, secure 1-star plugin that gets matched up against an insecure 5-star plugin.

    I can’t speak to the other two scans mentioned. I’m only familiar with PHPCS WPCS.

    • That’s why we are releasing the scoring metrics for contribution. Not all sniffs are equal… some perhaps should only be run once: “So you did this thing, we recommend you fix it, but we’re not gonna sting you each time we see it in the code.”

      We do not want sniffs like “yoda” conditions, for example, to give a really good plugin a terrible score.

      Also, the star rating is a concept from the design team. It could be badges, it could be broken down into categories. We’re asking for feedback, so this is helpful.

      • I don’t think this addresses the real issue – how to relate any code sniffing effort to plugin quality for a non-technical audience.

        Presuming that plugins with serious security issues or with malware aren’t on the repo, then we’re left with things that are going to be hard to express in a summary rating whether that’s stars, a 1-10 score or whatever. Can we reliably judge, say, the performance impact of a plugin? What if that impact is fine on low traffic sites but starts to compromise a site that has significant traffic? What if the code is high quality but it puts obnoxiously colored promos on its settings page?

        I just don’t see a way to sniff most code issues and relate them to things that non-technical WP users care about which is mostly 1) does this do what I need, 2) does it have any security problems, 3) does it slow down my site?

  8. I like and share your vision of a performant, secure and reliable “open web”. Thus every new tool/method that helps to get closer to this goal and to stick to best practices is a benefit for the WordPress community.

    However, I’m not sure if showing the “tide score” publicly in the item description should be the way to go (false positives, etc.).

    It might be a useful tool to help plugin and theme authors to enhance their code before submission. An important part will be to provide detailed information about spotted “problems” and maybe even possible solutions.

    • Personally I think it’s kind of a wasted effort. The harsh, nasty code mostly does not sit inside the public WP plugin repository (although there are certainly quite a lot of examples of “code crime on humankind”), but in major commercial marketplaces like ThemeFactory.

      Getting the management of THOSE places on board would be of much more use; it would give us, i.e. the developers and designers who work with WP on a daily basis, a much better tool: an indicator of how crappy or tidy the code base of a specific theme or plugin ACTUALLY is, and thus save hundreds of thousands of hours which would normally go into building workarounds for ugly code issues, short-sighted closing of APIs, and so on.

      cu, w0lf.

  9. This is a fantastic initiative. It is needed and will get better with more tools added.

    I think having unit testing and code coverage included over time would be a huge bonus. Anyone doing continuous integration on a plugin will generally have a much more robust product.

    HUGE applause.
