• Resolved Alexander S. Kunz

    (@antermoia)


    Hello,

    I have read the rather concerning news that WordPress.com and Tumblr are going to provide user-created content to AI companies like OpenAI and MidJourney for training their, erm, stochastic parrots and mimicry machines. (news found on 404 Media, The Verge, and others.)

    I am self-hosting WordPress and try to rigorously defend myself against such abuse of my data, but as a customer using Jetpack’s “Vault” backup, I am concerned that my data will be included through this backdoor — to which I, needless to say, strongly object.

    I would appreciate a clarification.

    Thanks
    Alexander.


Viewing 7 replies - 1 through 7 (of 7 total)
  • Dee Teal

    (@thewebprincess)

    Thanks for asking my first question, @antermoia. I’ll be following your thread with interest…

    Ditto. Automattic needs to answer questions about whether and how data from self-hosted WordPress sites using Jetpack is being collected and sold. If no answers are provided — and soon — I will stop using Jetpack on all my sites, recommend ceasing its use on all my client sites, and publish my own findings about this.

    Thread Starter Alexander S. Kunz

    (@antermoia)

    There’s a blog post by Automattic (with the Orwellian title “Protecting User Choices”) with a clarification: “We are not including content from sites hosted elsewhere even if they use Automattic plugins like Jetpack or WooCommerce.”

    However, there is a feature called “The Firehose”, through which Automattic sold access to posts from self-hosted sites with Jetpack that have “Enhanced Distribution” enabled — this is enabled by default, and can only be deactivated on the list of all Jetpack modules, at:

    (your site) /wp-admin/admin.php?page=jetpack_modules

    The description states that “The Firehose” is “intended for partners like search engines, artificial intelligence (AI) products and market intelligence providers”.

    That’s nice, isn’t it?

    Thanks for that. I had never seen the Modules settings before, cleverly hidden in plain sight in a tiny link at the bottom of the Jetpack dashboard.

    And sure enough, all my sites have Enhanced Distribution enabled, which means their posts have been sold by Automattic for years. I’m disabling Enhanced Distribution on all sites, and I’m going to remove Jetpack completely once I find alternatives to the functions I was using.

    Even if they claim they don’t sell data from self-hosted sites to AI companies, I’m sure that was the plan, given that they were already selling it to anyone else who was willing to pay them for it.

    People have been warning me about Jetpack for years. I should have listened.

    Thanks to everyone for your patience as we are getting back to you.

    With our AI partnerships, we are only sharing public content hosted on WP.com and Tumblr from sites that don’t opt out of content sharing. So, for sites using Jetpack at any other host, their content is not shared.

    We have sold our Firehose to social and data analytics companies, and we have also used some distribution partners (like Socialgist) to sell the Firehose to these types of end users. Neither we nor our distribution partners sell the Firehose to any companies that are training LLMs or to any generative AI companies. They also do not sell access to anyone else who is acting as a further downstream distributor of data and who could put our data in the hands of others without our knowledge.

    If reselling Firehose data happens downstream from our partners, it would be a violation of our contracts, and grounds for immediate termination and further legal action against them. Overall, the Firehose business is a legacy business line that is very small for Automattic, and something we have been actively sunsetting and winding down.

    You can read more about the Firehose and our plans to sunset it here:

    https://jetpack.com/support/privacy/enhanced-distribution/

    Let us know if you have further questions!

    Leaving aside the issue of selling data for the purpose of training LLMs, I have additional questions:

    1. I’ve been using Jetpack on sites I manage for years. I had no idea the Enhanced Distribution setting even existed until Alexander S. Kunz mentioned it above. I’d like confirmation that data from those sites was gathered (without my approval) for the purpose of selling that data.
    2. I would also like to know how long that’s been going on. When did this surreptitious content farming start?
    3. Is there any way to know to whom said farmed data was sold?

    Yes, I’m aware that operating a publicly-accessible web site means that the site’s content will be indexed, scraped, and may be used in various ways. The difference is that you’re selling it.

    Plugin Contributor Stef (a11n)

    (@erania-pinnera)

    Hi there, thanks for waiting for our response on this matter.

    Enhanced distribution is a feature that was released in 2013 with the purpose of driving traffic by giving blogs additional readership in the WordPress.com Reader.

    Content from those sites was gathered with approval by accepting the terms of service.

    Our partners were social and data analytics companies. We can confirm that neither we nor our distribution partners sell the Firehose to companies training LLMs or to generative AI companies. They also do not sell access to anyone else, acting as a further downstream distributor of content.

    Hope that answers your questions!

    • This reply was modified 8 months, 1 week ago by Stef (a11n).
  • The topic ‘Jetpack Backup & AI’ is closed to new replies.