• Resolved 8725z4twhugias

    (@8725z4twhugias)


    Hi there, i finally noticed how many Cookies are mentioned in my cookie policy after a friend asked me “wtf, that doesn’t even fit your business model?”

    So i looked through my cookie policy and wow … everything in there from facebook to youtube to wistia to even more unknown services, which i NEVER installed on my website, not even in distant past.

    Okay. so i opened an incognito window, accepted cookies on my site and clicked around. then i logged in in that same window and started the cookie scan. and voila, only the real ones are there: stripe, woocommerce, kadence, complianz, GA, nothing else

    so i think it’s fixed and go on with my life, but about a DAY later, all this random stuff and even more is in my cookie policy again… what?

    Please understand this as constructive criticism: HOW ON EARTH did you come to the conclusion it would be a good idea to scan the admin's browser storage and add whatever cookies are found there to the cookie policy?
    Do you yourself manage your own site only in an incognito window? Real talk, something’s wrong with this approach

    I don’t want to get rid of complianz, long time user and (except this annoying Bug) very happy.
    So how do i make the scan server side, so it just scans the site and does NOT use any user's browser storage?

    Thank you

Viewing 13 replies - 1 through 13 (of 13 total)
  • Thread Starter 8725z4twhugias

    (@8725z4twhugias)

    it seems that it uses all cookies which are set when i use the wordpress backend, but that does not really translate well to the end user's perspective. when wp-rocket has their tutorial videos embedded with wistia, then this is a cookie that is only set for me as an admin. no end user will ever get to that page.

    having a server side cron run once as logged out user and once as logged in user would be more valuable.
    there could be checkboxes in complianz settings on which user roles should be used for scanning.
    then just show the URL which cron needs to call.

    i just quickly searched for that in google, there are thousands of pages with this exact problem. Every single one i visited uses complianz cookie solution.
    For users who do not want to use server side cron everything can stay as it is now. But for company pages, shops or really anything business related, cron would be a better solution and look way more professional.

    please tell me if you want to move forward with this idea, i’d offer my help as far as my knowledge goes.
    Thanks

    Plugin Contributor Rogier Lankhorst

    (@rogierlankhorst)

    Hi @8725z4twhugias,

    Thanks for your feedback and ideas, much appreciated. Let me explain why the scan works the way it does.

    A lot of services and plugins do not set cookies server side, but with JavaScript, including third parties like Google Analytics and first parties like plugins on the website itself. As a consequence, these cookies will not get detected if you run a server-side scan only. You need to scan for cookies in a browser, where the JavaScript runs. That’s not possible with a cron.
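
    To illustrate the point above, here is a minimal sketch (not Complianz code) of what a purely server-side check can see. It reads only the Set-Cookie response headers, so a cookie that a script writes via document.cookie never shows up. The header, the HTML snippet and both cookie names are made up for the example.

```python
from http.cookies import SimpleCookie


def cookies_from_headers(set_cookie_headers):
    """Return the cookie names found in a list of Set-Cookie header values."""
    names = []
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)  # parses "name=value; Path=/; HttpOnly"
        names.extend(parsed.keys())
    return names


# Hypothetical response: one cookie set by the server, one set by a script.
SET_COOKIE_HEADERS = ["php_demo_session=abc123; Path=/; HttpOnly"]
HTML_BODY = '<script>document.cookie = "js_demo_pref=1; path=/";</script>'

server_side_view = cookies_from_headers(SET_COOKIE_HEADERS)
print(server_side_view)
```

    The script-set cookie only comes into existence once a browser executes the page, which is why the scan described here has to run in one.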

    The solution we have come up with for this is to load the website in an iframe while the admin is active on the back-end. Only front-end pages are loaded in this scan (unless you have selected that you have logged in visitors). But as the scan runs in the admin’s browser, all the admin cookies will get detected too, as they’re set on the website root.

    This method allows us to get both javascript and PHP cookies. For each of these cookies, the information will then be retrieved from cookiedatabase.org.

    The solution we have for the admin only cookies, is cookiedatabase.org. These cookies should be described on cookiedatabase.org, and flagged as ‘admin only’ cookies. Cookies flagged like this are not shown on the cookie policy anymore.

    cookiedatabase.org is an open source project that relies on contributors for the correct cookie information. There are currently 4356 cookies published and described, but a lot still needs to be done. If you have information about these cookies on your site, it would be great if you can share it with us. In our team we have two cookiedatabase.org contributors, who can update cookie and service information. When these cookies are complete and marked as admin only cookies, they will disappear from your cookie policy on the next sync.
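
    As a rough illustration of the admin-only flag described above, the filtering step could look like the sketch below. This is not Complianz or cookiedatabase.org code: the record fields, the flag values and the cookie name example_admin_pref are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CookieInfo:
    """Hypothetical record shape, not the real cookiedatabase.org schema."""
    service: str
    described: bool
    admin_only: bool = False


def policy_cookies(detected, database):
    """Return the detected cookie names that should appear in the policy.

    Cookies flagged admin-only are hidden. Cookies missing from the
    database are kept, since nothing is known about them yet.
    """
    shown = []
    for name in sorted(set(detected)):
        info = database.get(name)
        if info is not None and info.admin_only:
            continue
        shown.append(name)
    return shown


# Illustrative data only: these entries and flags are invented.
DATABASE = {
    "_ga": CookieInfo("Google Analytics", described=True),
    "example_admin_pref": CookieInfo("Example admin tool", described=True, admin_only=True),
}
DETECTED = ["_ga", "example_admin_pref", "tADu"]

print(policy_cookies(DETECTED, DATABASE))
```

    In this model an undescribed cookie such as tADu stays visible until someone describes and flags it, which matches the behaviour reported in this thread.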

    An alternative method to the current scan method would be to build a server that runs a headless browser, where we can load the website, and retrieve the cookies. This is a large project which would require quite some server capacity to scan 300000 websites on a regular basis.

    That is one of the things we have on our roadmap to investigate for version 9.x

    To summarise my long read:

    – Admin cookies will disappear from your policy when described on cookiedatabase.org
    – Investigation of the headless browser solution is on our roadmap

    If you have a workaround/better solution for any of the above, or I missed something, let me know!

    Plugin Contributor Rogier Lankhorst

    (@rogierlankhorst)

    I also did a search on cookiedatabase.org for the cookies that are not described on your site.

    wistia: is registered as being a marketing level cookie. If you are sure it is not used front-end on your site, you can disable the cdb.org sync, then set it as admin cookie.

    tADu, tTE, tMQ, tADe, tAE, tC, tTf, tTDe, tTDu, tnsApp, t3D, tPL, tk_qs, loglevel
    These are registered on cdb.org, but have not been described, due to lack of information. If you know what plugin/service places these, we can investigate further, and complete the description.

    undefined: this is clearly a coding error from one of the services/plugins. I have added it as described cookie, with the flag “no longer active”. This should hide it on your policy on the next sync.

    wordpress_test_cookie: this one is described, and should be fully described after the next sync.

    Especially the cookies starting with a t are of interest, as I can see these are requested quite often.

    Plugin Contributor Rogier Lankhorst

    (@rogierlankhorst)

    The tADu, tTE, tMQ, tADe, tAE, tC, tTf, tTDe, tTDu, tnsApp, t3D, tPL, tk_qs items seem to be localStorage entries from Gutenberg-centered page builders, on your site probably Kadence. These are used front-end, and are functional.

    We will update the information on cookiedatabase.org shortly.

    Thread Starter 8725z4twhugias

    (@8725z4twhugias)

    Yes, these “tXXX” cookies are definitely from Kadence Blocks. i had way more other cookies yesterday but i cleared them and scanned again. they will build up again in some days.

    it is more about the wistia things and other random cookies that are only from being an admin on my site.

    completely marking wistia as “admin” would not be the right way, as it could as well be a frontend cookie when someone enables this embedding.

    the headless browser is exactly the useful solution i am hoping for. if possible, let it be part of the plugin or an extra plugin. then those who wish to use it can run it as cron.

    but what i do not understand from your explanation: when you scan the pages in an iframe, you have all required data to see which cookies are placed on which site and by which script. why don’t you just scan the frontend and exclude everything that is placed when visiting a backend site?

    Thread Starter 8725z4twhugias

    (@8725z4twhugias)

    “why don’t you just scan the frontend and exclude everything that is placed when visiting a backend site?”

    i know that complianz is built as a plugin, but maybe something like this could be implemented much easier as a script outside of wordpress for those with ssh access?

    edit: cookie description for t3D tADe tADu tAE tC tMQ tnsApp tPL tTDe tTDu tTE tTf
    the category/service could be “Website Design”
    english:
    This cookie is part of a bundle of cookies which serve the purpose of content delivery and presentation. The cookies keep the correct state of font, blog/picture sliders, color themes and other website settings.

    german:
    Dieses Cookie ist Teil eines Pakets von Cookies, die der Bereitstellung und Darstellung von Inhalten dienen. Die Cookies behalten den korrekten Status von Schriftarten, Blog-/Bild-Schiebereglern, Farbthemen und anderen Website-Einstellungen bei.

    another edit:
    i am pretty sure i do not use jetpack, i have not installed it and also never used it. the cookie “tk_qs” is still found and mentioned after browsing around in my backend. maybe from some woocommerce dashboard page.

    Plugin Contributor Rogier Lankhorst

    (@rogierlankhorst)

    @8725z4twhugias When retrieving the cookies from the browser, there is no information on where the cookie comes from (admin or front-end). So we can’t distinguish between the two.

    A headless browser can never be included in a plugin, as it needs to run in an entirely different type of server, not apache/php. So this has to be a remote service on a central server, to be connected with through an api.

    For those cookies you wish to mark as admin, even when it’s marked as front-end on cookiedatabase.org you can disable the sync, then mark it as admin only cookie.

    This is better than deleting it, because in that case it will get found on the next run.

    Alternatively, if you delete it in the cookie descriptions, it will stay in your database marked as “deleted”. Which also prevents it from being scanned again.

    Only if you completely clear the cookies on the scan page will all cookies get detected again.

    So for the time being, I would recommend to use the “delete” option on the cookie descriptions page, or unsync them, and mark as admin cookie.

    This will prevent the cookies from showing up in your policy, while preventing them from being added again.
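
    The difference between “delete” and “clear all cookies” described above can be modelled with a small sketch. It is an assumption about the behaviour as explained in this reply, not the plugin's actual code; the status names "active", "deleted" and "admin" are hypothetical.

```python
def merge_scan(stored, scan_result):
    """Merge newly detected cookie names into the stored records.

    `stored` maps cookie name -> status ("active", "deleted" or "admin").
    Names that already have a record keep their status, so a "deleted"
    or "admin" entry is not re-added as active by a later scan.
    """
    merged = dict(stored)
    for name in scan_result:
        merged.setdefault(name, "active")
    return merged


def clear_all(stored):
    """Model "clear all cookies": forget every record, deleted ones included."""
    return {}


def policy_names(stored):
    """Only active cookies are listed in the policy."""
    return sorted(name for name, status in stored.items() if status == "active")


stored = {"_ga": "active", "wistia": "deleted"}
after_rescan = merge_scan(stored, ["_ga", "wistia", "tk_qs"])
print(policy_names(after_rescan))

# After clearing everything, the next scan treats wistia as new again.
after_clear = merge_scan(clear_all(after_rescan), ["_ga", "wistia"])
print(policy_names(after_clear))
```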

    Plugin Contributor Rogier Lankhorst

    (@rogierlankhorst)

    In reply to your last post: with ssh access there’s no browser to run the script in. So that won’t work. The script needs to run in a browser.

    Regarding ‘tk_qs’. If you never used JetPack, it is possible another service or tool uses this cookie name, which is always a risk when plugins use such generic cookie names.

    For third party services, it is possible to have cookies with the same name in different services. With first party services this is currently not possible, but if there is another service using this name, and we know which one it is, we can start working on that as well.

    Kadence cookies: thanks for your input, I’ll pass it on to our contributors so they can update these cookies.

    Thread Starter 8725z4twhugias

    (@8725z4twhugias)

    okay i will delete them now and then not “clear all cookies” in complianz before scanning again.

    “In reply to your last post: with ssh access there’s no browser to run the script in. So that won’t work. The script needs to run in a browser.”

    correct. puppeteer, selenium or similar can run server side, not at the apache/nginx/php hosting level BUT at the server level, e.g. on top of an ubuntu OS,
    hence the ssh root access to the server level, not just the webhosting part

    this would be a zero running-cost solution for you. who wants to use api can still do that as a paid cloud service.

    Thread Starter 8725z4twhugias

    (@8725z4twhugias)

    for the tk_XX cookies, i found this: https://automattic.com/cookies/

    tk_qs is not used for “marketing purposes” by jetpack as stated in my cookie policy, but it’s Automattic Admin statistics. tk_ai is correctly categorized as “admin only”

    tk_ni / tk_ai / tk_qs: “Gathers information for our own, first party analytics tool about how our services are used. A collection of internal metrics for user activity, used to improve user experience.”

    Thread Starter 8725z4twhugias

    (@8725z4twhugias)

    “When retrieving the cookies from the browser, there is no information on where the cookie comes from (admin or front-end). So we can’t distinguish between the two.”

    it seems i am missing some connection here…

    complianz loads the pages in an iframe and then after the load is complete, scans the storage for cookies.
    is there no way to start in “incognito mode” and differentiate or log what cookies were stored by each page?

    Plugin Contributor Rogier Lankhorst

    (@rogierlankhorst)

    Building a solution specifically to run in server environments where a headless browser is possible is not a sufficiently generic solution for us to focus on right now. We’d be happy to accept pull requests with such solutions on github.

    There are several requirements for the scan:
    – it has to run in a browser => as iframe in the user’s browser window
    – it has to be able to send back the detected data to the server => needs to be logged in

    The cookies are read from the browser, not necessarily on the page where they were placed. It is quite possible that a cookie is detected on the homepage although it was already placed earlier on an admin page of the website. So we don’t know where the cookie comes from. We get that information from cookiedatabase.org.
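
    A simplified model of this attribution problem is sketched below. It is not the plugin's scan code: the function, the page list and the cookie name cart_token are hypothetical. A cookie can only be attributed to the page on which it first appears, so anything already in the browser profile before the scan starts has no known origin.

```python
def scan_pages(jar, pages):
    """Visit pages in order and attribute each newly appearing cookie.

    `jar` is the set of cookie names already present in the browser profile.
    `pages` is a list of (url, names_of_cookies_the_page_sets) pairs.
    Returns (attributed, seen): `attributed` maps a cookie name to the URL
    where it first appeared during the scan, `seen` is everything found
    in storage at the end of the scan.
    """
    seen = set(jar)
    attributed = {}
    for url, cookie_names in pages:
        for name in cookie_names:
            if name not in seen:
                attributed[name] = url
                seen.add(name)
    return attributed, seen


# Hypothetical front-end pages and the cookies they set.
PAGES = [("/", ["_ga"]), ("/shop/", ["_ga", "cart_token"])]

# Clean profile (like an incognito window): every cookie has an origin.
fresh_attr, fresh_seen = scan_pages(set(), PAGES)

# Admin profile: wistia was set earlier on a back-end page, so it is
# found in storage but cannot be attributed to any scanned page.
admin_attr, admin_seen = scan_pages({"wistia"}, PAGES)
print(sorted(admin_seen - set(admin_attr)))
```

    In this model the unattributed cookies are exactly the ones that were in the profile before the scan started, which is the case the reply above describes for a logged-in admin.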

    Thread Starter 8725z4twhugias

    (@8725z4twhugias)

    i agree, that is a specific use case which will most likely not apply to the majority.

    just to confirm, setting the mismatched cookies to “show in cookie policy -> off” solved all problems for me.

  • The topic ‘Many false positives. How to fix or make cookie scanning server side?’ is closed to new replies.