51 Comments

  1. Vova Feldman

    I’m just going to put it here ;) I think it’s very relevant: https://wptavern.com/freemius-insights-enables-plugin-developers-to-make-data-driven-decisions

  2. Omaar Osmaan

    The explanation completely makes sense. Just counting the stats for millions of sites is hard; tracking them at the requests-per-second level would be crazy resource intensive. Unless there are huge “business interests”, extensively collecting data wouldn’t be cost effective at all.

    That being said, certain stats really could be helpful to plugin/theme authors, for example: version counts for plugins/themes and for PHP/WordPress.

  3. Andreas Nurbo

    I want PHP and MySQL versions listed per WP version. Nacin agrees with me https://twitter.com/nacin/status/710689984575971328 :D

    • Jeff Chandler

      While I didn’t mention it in the article, Otto knows about this request and has access to the data. But he’s working on so many other things right now that it’s not high on his priority list. The data does show that many hosts are upgrading PHP. I know that’s not the answer you want to read but that’s what’s going on with that data.

  4. Jason Lemieux

    I see it plainly this way:

    There are 76 million websites powered by WordPress. That number is so high because of the efforts of all of us spending years of our lives and millions of dollars of investment. We built that number together.

    That we, as a community, are not leveraging anonymized data about these sites seems like an enormous waste and disservice to our users… and, well, the entire globe. We are talking about 25% of the web.

    And to think that nobody really cares enough about the data to build anything to crunch it? We should be better than that.

    • Mike nelson

      +1
      If the wp.org maintainers need help achieving this, maybe they could open up to involvement from the community they foster. If the community wants it, they can help build it.

    • Nathan

      It’s not about caring, I can assure you of that. I think that is the point of the discussion here. We all care, but the priority is skewed.

      • Jason Lemieux

        Plugin and theme developers (of course) care. But maybe the folks that actually have access to the data don’t.

        It’s clear from Otto’s comment below that there just isn’t anyone on the .org team that has the resources to pay attention to big data. It’s not a problem of political will, but of resource allocation.

        The data dumps from Wikimedia that Paul mentions would be a fantastic start. If we can get some data released a community will quickly form around making something useful from that.

        We would see the most fantastic bloom of projects since the wporg api itself.

        The internet is the most important creation in the history of humanity. It is the first truly global connecting of cultures and people. We’ve built a quarter of it with WordPress.

        Knowing the technical intricacies, dependencies and limitations of our collective work should be considered essential. Google has data on their slice. Facebook does as well. They and their peers proactively leverage, plan, and develop around this information. Why don’t we?

  5. David McCan

    It would be nice if we could be rid of conspiracy theories and presumptions that good people are up to something. Thanks Jeff and Otto for the information. More communication helps.

    You ask what types of information we’d like to see. PHP and WordPress version usage would be interesting. Maybe a quarterly report … or any report that would allow us to see trends over time.

    • Samuel "Otto" Wood

      It is interesting. And since it seems reasonably accurate, I’ll try to make it public soon. In the meantime, here is my first attempt at showing something.

      [Embedded chart hosted on Cloudup: vpgahovqd5x]

      The thing is, data gathering is hard. It takes time and resources. Yes, we care about certain things. Knowing that hosts are updating php is important to me. So is updating the plugin directory. And the forums. And internationalization. Those Germans need a forum in their own language to work right too. There’s a lot of things happening. Sorry, but there are priorities, and stats don’t quite get there straight up. Unless we need them right now. Then we go and get them, when we need them. I’m more interested in internationalization stats, honestly. Those seem promising. Wish I had them. I don’t, yet.

      • David McCan

        Interesting. Thank you very much. Next time you run it could you please add PHP 7?

        The tasks you mentioned are obviously higher priorities, and I’m sure that list is just the tip of the iceberg.

      • Nathan

        No apologies necessary. The efforts put forth are tremendous. I find it fascinating that you guys provide what you do with the resources at hand. I think what Josh would like to see is a greater effort toward making the aggregates available. Does .org run on WordPress? ;)

      • David Anderson

        Very interesting. The same break-down for MySQL versions would also be interesting. We still see developers who are trying to develop their sites on modern PHP/MySQL versions, and then upload them to live hosting on obsolete ones (the issue with the MySQL utf8mb4 charset change in WP 4.2 is significant here). When we say “don’t do that, WP core doesn’t support that; and launching a new site on obsolete/EOLed tech is a terrible idea”, sometimes they come back and insist that doing such things is common or good practice. To be able to point to the stats can really help them.

  6. Peter Cralen

    There is enough space on the w.org site to publish details about what is sent from every WP site, how often, and for what purpose, and to publish the data itself.

    If these data are not put to good use, stop collecting them from millions of websites, or at least from my sites, please.

    “The idea that we’re hiding things is ludicrous”

    Nope, it’s logical when there is not enough detail/information about a subject.

    “… some of the assumptions people have are not true”

    Obviously, when no information is published, assumptions, conspiracy theories, and other theories are all that will pop up.

  7. Robert Neu

    ?

  8. Nathan

    “Data is stored for two days” on .org? Where does it go? How can it be accessed with ease? Who owns WordPress.org? WordPress.com employees are HALF the community staff? Lots of conspiracy still left up in the air. Maybe a follow up article would be good here, because this has certainly raised some intriguing points.

    This comment stream is busy only with a new data display; nothing addresses access to the aggregates.

  9. paul

    How about some kind of raw data export/dump like https://dumps.wikimedia.org/

    • Jason Lemieux

      This would be a great starting place. +1,000 to this idea.

    • Otto

      A data dump of the raw requests sent for updates would be out of the question, for starters, because privacy. It would need to be fully anonymized, and that would be difficult. However, the big kicker is this: *we don’t have it*. We simply don’t keep it for very long.

      We don’t have any data to dump. Seriously, we don’t collect that type of raw information for any length of time. We collect and gather the data that we display, then toss it. Literally. I don’t know how many more ways I can say this. You’re asking for things that do not actually exist.

      • Peter Cralen

        I am sure you have better things to do than answer every weird comment. Anyway, can you point me to some source, post, or whatever that describes exactly what data are collected?
        I did not find any information about that.

        • Otto

          Define “collected”. We get all the data we display from the update checks made by WordPress for plugins and themes and core. For example, the “Active Installs” count is simply a count of how many sites checked for an update of that plugin/theme yesterday. Download counts are simply like a naive counter of the downloads of the ZIP files. With some filtering to prevent people from spamming them up by repeatedly requesting the same files over and over, of course. We have Google Analytics on every page of the site, but honestly, that sort of thing is over my head. I never could figure out Google Analytics, personally.

          If you want to see what is sent, you can look at the core code. As for what is retained, it’s all posted up on w.org somewhere, once we’re confident that the results from the counts are reasonably accurate. We don’t retain any raw data for more than a couple days, for debugging. Like those download requests. We store the request information for the day, then update the count once a day, then delete the information. Simple as that, really.

          What data, exactly, are you concerned about? I find it difficult to imagine what you think we’re doing, really, that would be in any way concerning. Dion went to a lot of trouble to make the Active Install counts available, for example. Why is that a bad thing? Why all the fuss? I don’t get it, really.
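The counting Otto describes can be sketched roughly as follows. This is a minimal illustration in Python, not the actual WordPress.org code: the `site_id` and `client_id` fields, the function names, and the rule of counting a repeated download once per client per day are all assumptions made for the example.

```python
from collections import defaultdict


def count_active_installs(update_checks):
    """Count distinct sites that checked for an update of each plugin.

    update_checks: iterable of (site_id, plugin_slug) pairs seen in one day.
    site_id is a hypothetical stand-in for however a site is told apart.
    """
    sites_per_plugin = defaultdict(set)
    for site_id, slug in update_checks:
        sites_per_plugin[slug].add(site_id)
    return {slug: len(sites) for slug, sites in sites_per_plugin.items()}


def count_downloads(download_requests):
    """Naive download counter with simple spam filtering.

    download_requests: iterable of (client_id, zip_name) pairs.
    Assumption: a client requesting the same ZIP repeatedly counts once.
    """
    seen = set()
    totals = defaultdict(int)
    for client_id, zip_name in download_requests:
        key = (client_id, zip_name)
        if key in seen:
            continue  # repeated request for the same file, ignore it
        seen.add(key)
        totals[zip_name] += 1
    return dict(totals)


def daily_rollup(running_totals, day_log):
    """Fold one day's requests into the totals, then discard the raw log."""
    for zip_name, n in count_downloads(day_log).items():
        running_totals[zip_name] = running_totals.get(zip_name, 0) + n
    day_log.clear()  # the raw request information is not retained
    return running_totals
```

The only thing that survives across days in this sketch is the running total; the day’s raw log is cleared after the rollup, which mirrors the “update the count once a day, then delete the information” description.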

        • Peter Cralen

          Not a bad thing, Otto, and no fuss. I just want to know what my website sends out into space.
          I don’t care who plays with these data, how, or for how long; that’s not under my control or responsibility.

          No drama. I am just missing the information I asked for, which should be published somewhere without requiring knowledge of the code.

      • Peter Cralen

        Thanks Jeff. I searched a lot, did not find something similar.

      • David

        you talk about privacy, but you don’t even ask for permission/notify users that you collect their data?

        are you Batman?

        • Jeff Chandler

          It’s not up to Otto to notify users, it’s up to WordPress.

        • David

          well, you are right.
          but he is the one who has all the data.
          (and some others like matt, etc)

          I understand your point Jeff,
          but I really wish that matt and the others responsible (maybe) wouldn’t hide behind WP as an organisation or as an open source cms.
          because I feel WP as cms/org is led by several core team members and the org/company behind it. and these people make decisions about WP’s direction/where to go.
          (let’s admit that some have a bigger voice than others)

          it’s not uncommon in open source. e.g.
          chrome and android dev is directed by google.
          (both open source)

  10. James

    Would love to hear what w.org does with the number of registered users on a site, which is submitted with every update check. And: if it’s not used, why is this number part of an update request?

    Plugin data are submitted with their descriptions included, regardless of whether a plugin is self-written or not. But an update check could simply transfer a hash of all these data for a plugin: easier to compare, and it would be a huge reduction in submitted data and therefore almost a blessing for w.org’s old systems. But this won’t happen, Otto. The number of users collected with each update call will remain in core.

    Tell us why. And why users may not have a clue about all this. Why isn’t there a disclaimer that says: these data about your installation will be submitted to w.org twice a day. If you don’t like it, don’t use WordPress. Or even better: every software has a checkbox that says: we collect data about your software usage. If not desired, uncheck. Why isn’t there such a thing in WordPress?

    I have to laugh. Because after all (and there is more), presenting oneself as insulted by conspiracy theories is not really a reason for trust. Maybe Otto is not the only one with hands on w.org.
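James’s hash idea could look something like this. It is a minimal sketch in Python, assuming the plugin header fields are available as a dictionary; the function name `plugin_fingerprint` and the choice of SHA-256 over canonical JSON are illustrative assumptions, not anything WordPress core does today.

```python
import hashlib
import json


def plugin_fingerprint(headers):
    """Return a fixed-length digest of a plugin's header fields.

    headers: dict of header name -> value (e.g. "Name", "Version").
    Keys are sorted so the same data always produces the same digest,
    regardless of the order in which the fields were read.
    """
    canonical = json.dumps(
        headers, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A digest like this is always 64 hexadecimal characters, however long the description is. It would only be useful if the server can compute the same digest for plugins it already hosts, so in principle it identifies hosted plugins while sending nothing readable about self-written ones.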

    • Otto

      User and blog count are included in the check because there is potentially a future case where the updates may need to be targeted.

      Blog count was originally there for WordPress MU, where the thinking was that multisites with large numbers of blogs would need special handling code if there was an extremely big database change. This is because each blog is stored in its own set of tables, therefore the “upgrade” process needs to run independently on each of them. If you have a couple dozen or so, no big deal. If you have several thousand, then you probably want a different way of sending the SQL needed to alter all those tables.

      User count, same principle. If your users table is excessively large or perhaps even split across various databases with HyperDB, then we probably don’t want to send you an update that does a big ALTER to it, potentially taking down the site for an extended period of time.

      So far, those numbers have not been needed for database upgrades. That doesn’t mean they won’t be. There was a change a while back that changed the size of the user_pass field, but testing showed that it was not a game breaker for the users table even on very large tables, so it wasn’t needed. Other alterations, such as moving data to/from usermeta and such, might need special code for larger sites.

      The data is not stored or used for anything else, if that’s what you’re asking. The API currently ignores it. It’s a just-in-case measure, because you can’t predict the future and upgrades are complicated by a huge number of factors.
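The “targeted update” reasoning can be illustrated with a small sketch. Everything in it is hypothetical: Otto states that the API currently ignores these numbers, so the thresholds, the strategy names, and the function `choose_upgrade_strategy` are invented purely to show what such a decision might look like.

```python
def choose_upgrade_strategy(
    blog_count,
    user_count,
    blog_threshold=1000,        # hypothetical cut-off for "several thousand" blogs
    user_threshold=1_000_000,   # hypothetical cut-off for an "excessively large" users table
):
    """Pick how a database upgrade might be delivered to a site.

    Returns one of three made-up strategy names:
      "batched"        - run per-blog table changes in batches
      "deferred_alter" - avoid a big blocking ALTER on the users table
      "standard"       - run the normal upgrade routine
    """
    if blog_count >= blog_threshold:
        return "batched"
    if user_count >= user_threshold:
        return "deferred_alter"
    return "standard"
```

For a site with a couple dozen blogs and a few hundred users this returns `"standard"`, which corresponds to the “no big deal” case described above.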

      • David

        that’s a fishy answer.

      • James

        “User and blog count are included in the check because there is potentially a future case where the updates may need to be targeted.”

        @Otto: w.org ships regular updates 3 times a year. That is opportunity enough to extend data collection when needed. Isn’t this the same argument as saying: install all plugins available on w.org, since one day you might need them? Weird.

        And what WP really lacks is asking users for permission to take their data. You ask: who in the hell is concerned about blog and user count?! Well, what about business owners who offer a site inside a network for money? Or a membership site? Maybe they have competition. For owners like that, these data are a sort of business secret. Definitely not meant to be sent out several times a day. So without permission it’s like sniffing. That’s why it’s really important to ask site owners for permission.

        • Otto

          “You ask: who in the hell is concerned about blog and user count?!”

          I asked no such thing. But I *know* what we do with it, and that is *nothing*. It is not stored. It is not aggregated. We don’t save it, because it is not useful *to us*.

          You want to doubt my word? That’s fine. If that is the case, don’t ask me the question to begin with.

        • James

          Ok. Otto cannot or does not want to answer at least one of my questions. But he said something important.

          “It is not useful *to us*.”

          Nice! To whom are the collected data useful then? IF they are collected they CAN be used, e.g. due to FISA. But why, again, are “not useful” data collected at all?

          Not answering this question ignores user privacy. No permission checkbox, no removal of fetched data that are neither stored nor used. Ah, and, Otto, it’s really not about YOU and whether I believe YOU. It’s about user privacy and what it means to Automattic and what that means to users. Matt seemed to be happy with his last keynote: about making WP sites more secure by applying letsencrypt.org certs to all wordpress.com sites.

          However, this kind of happiness for me overlooks some black holes in WP itself, if we think about user privacy.

        • mark k.

          Otto, I think that James is pointing the finger in the wrong direction, but basically he is right, as no one ever communicates the privacy challenges that come from operating WordPress.

          I 100% trust your words about what you do with the data, but you are unlikely to do the same thing for the rest of your life, so just trusting one person is not good enough. In addition, wordpress.org is a golden hacking target. You know that you are deleting files every two days, but you don’t know what a hacker who copies them (hopefully that doesn’t actually happen) does with them.

          Anything that might impact privacy should be opt-in, not opt-out. WordPress core can use whatever scare tactics they want to make people opt in, but it should be the decision of the site admin. Yes, there are plugins that do that, but if the admin isn’t really aware of the implications of automatic updates, he is not likely to use them.

    • Otto

      Oh, as for the plugin and theme checks sending headers, hey, I want to change that too. My idea was to give every plugin and theme in our system a unique ID (I was thinking UUIDs, to be precise). This UUID could be in the header of the plugin, and it would uniquely identify that plugin. That eliminates all confusion and solves quite a lot of pain for me. I didn’t write the current update check system. If I had, it would be different. :)
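To make the UUID idea concrete, here is a minimal sketch in Python. The `Plugin UUID` header field does not exist in WordPress; it and the helper names are assumptions for illustration, modeled on the existing `Plugin Name:` style of header comment.

```python
import re
import uuid

# Matches "Field Name: value" lines inside a header comment,
# allowing leading comment decoration such as " * " or "//".
HEADER_FIELD = re.compile(
    r"^[ \t/*#@]*([A-Za-z ]+?):[ \t]*(.+?)[ \t]*$", re.MULTILINE
)


def read_plugin_headers(source):
    """Return a dict of header fields found in a plugin file's text."""
    return {
        m.group(1).strip(): m.group(2).strip()
        for m in HEADER_FIELD.finditer(source)
    }


def plugin_uuid(source):
    """Return the plugin's UUID from a hypothetical 'Plugin UUID' header.

    Returns None when the field is missing or is not a valid UUID.
    """
    raw = read_plugin_headers(source).get("Plugin UUID")
    if raw is None:
        return None
    try:
        return uuid.UUID(raw)
    except ValueError:
        return None
```

With an identifier like this, an update check would only need to send the UUID and the installed version, and a plugin without the field could simply be skipped.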

  11. Andreas Nurbo

    Seems a lot of people have missed that there is a sort of privacy policy on wordpress.org https://wordpress.org/about/privacy/

    • Peter Cralen

      That policy is about information collected from visits to the w.org sites, not from WordPress installations.

      • Andreas Nurbo

        “For instance, WordPress.org may reveal how many downloads a particular version got, or say which plugins are most popular based on checks from api.wordpress.org, a web service used by WordPress installations to check for new versions of WordPress and plugins. However, WordPress.org does not disclose personally-identifying information other than as described below.”

  12. John Teague

    @Otto let me know if GEMServers can donate DBaaS resources. We can spin up resources for that. Thinking Cloud Dataflow would be perfect for this. Or DeepSQL HTAP servers.

  13. M

    Wonderful discussion guys. I would like to thank everyone for asking such questions, thank you very much Otto for taking the time to answer, and thanks to you as well, Jeff, for bringing this up.

    On the idea of UUIDs for themes and plugins, I think it would be a nice initiative, but it will be quite hard to implement since a lot of plugins in the repo won’t adopt it for a long time. This brings us to the idea of hashes proposed by James, which should be easier to implement, or to the idea of a tag for “private” plugins/themes which do not need to be (or should not be) checked for updates; that one too could be implemented more or less easily.

Comments are closed.
