Gravatar is fielding questions today after “Have I Been Pwned,” a data breach checker service, tweeted “New scraped data: Gravatar had 167M profiles scraped in Oct last year via an enumeration vector. 114M of the MD5 email address hashes were subsequently cracked and distributed alongside names and usernames.” It claims 72% of these email addresses were already logged with the service.
The tweet referenced a BleepingComputer article from October 2020 titled, “Online avatar service Gravatar allows mass collection of user info,” which explains how the hashes were originally obtained. After Italian security researcher Carlo Di Dato was unable to get an answer from Gravatar, he demonstrated to the publication how one could access user data by using a numeric ID associated with each profile to fetch it. He then wrote a test script that sequentially visits profile URLs from IDs 1 to 5000 and said he was able to collect JSON data of the first 5000 Gravatar users with no issues.
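To make the enumeration vector concrete, here is a minimal sketch of the idea in Python. It is not Di Dato's script and it does not contact Gravatar: `FAKE_PROFILES`, `fetch_profile`, and `enumerate_profiles` are hypothetical names, and the in-memory dictionary stands in for a service that answers requests keyed by a sequential numeric ID.

```python
from typing import Optional

# Hypothetical stand-in for a profile service keyed by sequential numeric IDs.
# Nothing here touches the network; the data is made up for illustration.
FAKE_PROFILES = {
    1: {"id": 1, "username": "alice", "display_name": "Alice A."},
    2: {"id": 2, "username": "bob", "display_name": "Bob B."},
    4: {"id": 4, "username": "carol", "display_name": "Carol C."},
}


def fetch_profile(profile_id: int) -> Optional[dict]:
    """Return the public profile for an ID, or None if the ID is unused."""
    return FAKE_PROFILES.get(profile_id)


def enumerate_profiles(start: int, stop: int) -> list:
    """Walk every ID in [start, stop] and keep whatever the service returns."""
    collected = []
    for profile_id in range(start, stop + 1):
        profile = fetch_profile(profile_id)
        if profile is not None:
            collected.append(profile)
    return collected


if __name__ == "__main__":
    for profile in enumerate_profiles(1, 5):
        print(profile["id"], profile["username"])
```

The point of the sketch is that a predictable ID needs no guessing: counting upward is enough to visit every profile, which is why rate limiting or non-sequential identifiers are the usual mitigations.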
Many Gravatar users were startled and upset by notices from Firefox Monitor and Have I Been Pwned this morning, stating that their information had appeared in a new data breach.
The BleepingComputer article has gained more attention after Have I Been Pwned’s disclosure today, spurring Gravatar to respond on Twitter:
Gravatar helps establish your identity online with an authenticated profile. We’re aware of the conversation online that claims Gravatar was hacked, so we want to clear up the misinformation.
Gravatar was not hacked. Our service gives you control over the data you want to share online. The data you choose to share publicly is made available via our API. Users can choose to share their full name, display name, location, email address, and a short biography.
Last year, a security researcher scraped public Gravatar data – usernames and MD5 hashes of email addresses used to reference users’ avatars by abusing our API. We immediately patched the ability to harvest the public profile data en masse. If you want to learn more about how Gravatar works or adjust the data shared on your profile, please visit Gravatar.com.
Gravatar does not consider the incident to be a data breach, which is why the service did not disclose the changes made in response to the security researcher in 2020.
The Automattic-owned service is used across WordPress websites, GitHub, Stack Overflow, and other places online. Security researchers and privacy advocates have warned about privacy attacks on Gravatar for years. Many have demonstrated how readily available user information is and how easy it is to scrape.
In July 2013, Dominique Bongard spoke at PasswordsCon in Las Vegas about “De-anonymizing Members of French Political Forums.” He explained how a custom crawler could be written to acquire MD5 hashes for forum users and demonstrated that an attack with custom cracking software was able to recover 70% of Gravatar users’ email addresses.
Bongard noted that de-anonymizing members of political forums can be particularly dangerous in places where the forums’ users have no constitutional right to free speech, or where participants may be likely to get harassed or attacked.
Wordfence published an advisory regarding Gravatar in 2016, which referenced Bongard’s research, as well as earlier work done in 2009 in which a researcher proved that he could reverse engineer ~10% of Gravatar hashes into email addresses.
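Why an MD5 hash of an email address offers so little protection can be shown in a few lines. The sketch below assumes Gravatar's long-documented scheme of trimming and lowercasing the address before taking its MD5 digest; the `crack_hashes` helper and the candidate lists are hypothetical, and this is a toy dictionary attack, not the custom cracking software the researchers used.

```python
import hashlib
from itertools import product


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lowercased address (assumed scheme)."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def crack_hashes(target_hashes, usernames, domains):
    """Hash every username@domain guess and report the ones that match.

    MD5 is unsalted here, so a guess either matches a target or it does not;
    no secret is needed to check it.
    """
    targets = set(target_hashes)
    recovered = {}
    for user, domain in product(usernames, domains):
        candidate = f"{user}@{domain}"
        digest = gravatar_hash(candidate)
        if digest in targets:
            recovered[digest] = candidate
    return recovered


if __name__ == "__main__":
    # Hypothetical scraped hashes; the second address is not in the guess list.
    scraped = [
        gravatar_hash("jane.doe@example.com"),
        gravatar_hash("someone@unlisted.example"),
    ]
    guesses = crack_hashes(scraped, ["jane.doe", "john"], ["example.com", "example.org"])
    for digest, email in guesses.items():
        print(digest, "->", email)
```

Because usernames scraped alongside the hashes make good guesses and a handful of mail providers cover most addresses, the candidate space is far smaller than it first appears, which is consistent with the high recovery rates the researchers reported.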
Wordfence founder and CEO Mark Maunder explained how using email address hashes can lead people to google the extracted hash to find other websites and services that an individual is using.
“For example: A user may be comfortable having their full name and profile photo appear on a website about skiing,” Maunder said. “But they may not want their name or identity exposed to the public on a website specializing in a medical condition. Someone researching this individual could extract their Gravatar hash from the skiing website along with their full name. They could then Google the hash and determine that the individual suffers from a medical condition they wanted to keep private.”
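Maunder's scenario works because the hash is a stable identifier: the same email address produces the same avatar URL on every site that embeds it. The following sketch illustrates this under the same assumed hashing scheme (MD5 of the trimmed, lowercased address). The site names, the sample address, and the `link_by_hash` helper are hypothetical.

```python
import hashlib
from collections import defaultdict


def avatar_url(email: str) -> str:
    """Build an avatar URL from the MD5 of the normalized address (assumed scheme)."""
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    return f"https://www.gravatar.com/avatar/{digest}"


def link_by_hash(sightings):
    """Group (site, display_name, avatar_url) records by the hash in the URL.

    Returns only the hashes seen on more than one site, since those are the
    cases where separate accounts can be tied to a single person.
    """
    by_hash = defaultdict(list)
    for site, name, url in sightings:
        # Take the last path segment and drop any query string such as ?s=80.
        digest = url.rsplit("/", 1)[-1].split("?", 1)[0]
        by_hash[digest].append((site, name))
    return {
        digest: records
        for digest, records in by_hash.items()
        if len({site for site, _ in records}) > 1
    }


if __name__ == "__main__":
    sightings = [
        ("skiing-forum.example", "Jane Doe", avatar_url("jane@example.com")),
        ("health-forum.example", "anon_4821", avatar_url("Jane@Example.com") + "?s=80"),
        ("skiing-forum.example", "Sam Roe", avatar_url("sam@example.com")),
    ]
    for digest, records in link_by_hash(sightings).items():
        print(digest, records)
```

In this toy data the pseudonymous account on the second site is tied to a full name on the first without the email address ever being recovered; the shared hash alone is enough.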
Many Gravatar users were not satisfied with the service’s explanation that all of the information users entered was public, which disqualified the incident from being labeled a breach. In the same explanation, however, the service claims the API was abused, instead of admitting that it was vulnerable and could have been better protected.
After years of researchers demonstrating that this was possible, is scraping Gravatar an unethical data acquisition because the scraper is abusing the service’s architecture? Or is it unethical that Gravatar made it possible to harvest profile data en masse for years?
“If someone is able to use an API for other than its intended purpose and can gather information which otherwise wouldn’t be available through ‘standard’ means… it’s a breach,” Twitter user @RegGBlinker commented on the matter.
Gravatar undoubtedly wants to minimize the damage done by the breach notices sent out this morning to its users, but making this an issue of semantics was not reassuring. Most users did not intend to share their Gravatar emails with whoever has the motivation to scrape the data that was exposed for harvesting. Even if that data was dumped through “abuse” of their API, it feels like a breach to those who expected that user data would not be available for distribution elsewhere.
The incident serves as a reminder that, as Gravatar emphasized today, the data users choose to share publicly is made available by the service’s API and is not private. As a user, there are risks to enjoying the convenience of not having to upload your profile photo multiple times across various websites. Publishers who want their sites to offer a more privacy-conscious option should look to alternatives like Local Gravatars or Pixel Avatars.
I generally try to avoid Gravatar, Disqus, and other such services; I do my best to keep all my data siloed. Not easy, I likely do not succeed, but this does not surprise or particularly disturb me.