PAHC: Data Tracking

Pseudo Anonymous Hit Counter: Data Tracking

      When I set up this website, I decided I wanted to know how many people visited it. Was it only my mother, when I reminded her? Or was half the county showing up every time I posted? While the former is much more likely than the latter, I needed a way to find out.

      I could have simply used Google Analytics. Set it up, let them hold all the data. But that was explicitly not what I wanted. I also could have used the logs provided by my web host. Unfortunately, they didn’t really distinguish between bots and ‘real’ people.

      I also wanted to avoid requiring scripting or anything similar. Sure, I need that functionality to work on my site, but a regular visitor shouldn’t need it. Likewise, it couldn’t be something blocked by the common ad blockers.

      So I needed something that:
Would note when ‘real’ people visited my site, ignoring ‘bots’.
Was stored locally.
Would work without compromising the security of my visitors.
Was free/open source.



      After some searching, the first thing I found that looked to meet all of these was Piwik (which has since renamed itself Matomo, go figure). It installs on my server, keeping its own data. It would sort ‘bots’ from ‘real’ visitors. It was free. The only challenges were that it wanted to use scripting and was blocked by uBlock. I did manage to work around both of those issues.

      While Piwik wants to use scripting, it also works as a static image. That way it can’t track as much (no finding out visitors’ a/s/l/favorite ice cream flavor/mother’s maiden name/etc.), but it does note that they were here. Which was what I wanted, so good there.
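
      For the curious: an image tracker of this kind is just an <img> tag pointing at the tracker’s PHP endpoint with a few query parameters, so the browser fetches it like any other image and no scripting runs. The Python sketch below builds such a tag. It assumes the tracker lives at a hypothetical https://example.com/piwik with a hypothetical site ID of 1, and that the endpoint accepts the `idsite`, `rec`, and `action_name` parameters of the Piwik/Matomo tracking HTTP API. Check the tracker documentation for your own version before relying on it.

```python
import html
from urllib.parse import urlencode

# Hypothetical placeholders: substitute your own tracker location and site ID.
TRACKER_BASE = "https://example.com/piwik"
SITE_ID = 1


def image_tracker_url(base: str, site_id: int, page_title: str) -> str:
    """Build the query URL for a no-JavaScript tracking pixel."""
    params = {
        "idsite": site_id,          # which Piwik site this hit belongs to
        "rec": 1,                   # asks the tracker to actually record the hit
        "action_name": page_title,  # page title shown in the reports
    }
    # Older installs serve piwik.php; newer Matomo installs use matomo.php.
    return f"{base}/piwik.php?{urlencode(params)}"


def image_tag(url: str) -> str:
    """Wrap the URL in an <img> tag, escaping '&' and friends for HTML."""
    return f'<img src="{html.escape(url)}" style="border:0" alt="" />'


print(image_tag(image_tracker_url(TRACKER_BASE, SITE_ID, "Data Tracking")))
```

      Because the request is an ordinary image fetch, the tracker only sees what any web server sees: the time, the page, the IP address, and the User Agent.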

      Getting past uBlock was more challenging. Not from a technical standpoint: uBlock didn’t like the filename, so it blocked the file. Simply rename the file and uBlock stops caring. But from an ethical standpoint: as someone in favor of privacy and minimizing tracking, could I justify such sneakery? If the existence of this article wasn’t enough of a clue, the answer was yes. I figured that since I was gathering the minimal amount of information I could, and was keeping it locked in my little corner, it was OK. Realistically, I’m sure that was rationalization, and I am now counted amongst the ranks of evil data collectors.

      With that being said, what data do I gather? And what does that tell me?



What I Track

      If you visit this site:
I know when you visited.
I know what page you visited.
I know your IP address (although this can be spoofed).
I know your User Agent (although this can also be spoofed).
From your IP, I can guess at your geographic location (which can be wrong, and is only general when it’s right).
From your User Agent, I can guess at your browser and operating system (which can also be wrong).
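
      To show why the browser and operating system guesses can be wrong, here is a deliberately crude Python sketch that guesses both from a User Agent string by looking for telltale substrings. This is not how Piwik does it, since Piwik uses a far more thorough detection approach, and the substrings checked here are only a small sample. Note the ordering: Chrome’s User Agent also mentions Safari, and Android’s also mentions Linux, so the more specific checks have to come first. And since a browser can send any User Agent string it likes, the result is only as honest as the visitor’s browser.

```python
def guess_browser_and_os(user_agent: str) -> tuple[str, str]:
    """Crudely guess (browser, operating system) from a User Agent string."""
    # Order matters: Edge also claims Chrome and Safari, Chrome also claims Safari.
    if "Firefox/" in user_agent:
        browser = "Firefox"
    elif "Edg/" in user_agent:
        browser = "Edge"
    elif "Chrome/" in user_agent:
        browser = "Chrome"
    elif "Safari/" in user_agent:
        browser = "Safari"
    else:
        browser = "Unknown"

    # Order matters here too: Android also claims Linux, iOS claims "Mac OS X".
    if "Android" in user_agent:
        os_name = "Android"
    elif "iPhone" in user_agent or "iPad" in user_agent:
        os_name = "iOS"
    elif "Windows" in user_agent:
        os_name = "Windows"
    elif "Mac OS X" in user_agent:
        os_name = "macOS"
    elif "Linux" in user_agent:
        os_name = "Linux"
    else:
        os_name = "Unknown"
    return browser, os_name


# A typical Firefox-on-Android User Agent (example string, not real visitor data).
ua = "Mozilla/5.0 (Android 13; Mobile; rv:109.0) Gecko/109.0 Firefox/115.0"
print(guess_browser_and_os(ua))  # ('Firefox', 'Android')
```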

      And that’s it. I don’t know anything else about your web browsing, computer usage, etc. I only have a rough idea of your geographic location: town, not street (let alone house number). And half of what I know could have been faked through various methods.

      If someone were to comment on the site, I would know more: the username and email they registered with, as well as the text of their comment. In theory, I could then match that to an IP address, which could tell me what pages that IP visited, and therefore which pages that user has likely read. As no one has commented yet, this is a non-issue. And even if someone had, I don’t care enough to sort through the data that much.



      Once I had Piwik up and running, and had tested it enough to be sure it worked, I told it to ignore my IP address. I didn’t want to pollute the numbers with my own visits. Before that, I was the largest user of my site by a wide margin. I probably still am, not that it matters, since it won’t count me now.
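
      Piwik offers this as a setting (a list of excluded IP addresses), so no code is needed. The Python sketch below only illustrates the idea of checking each visit against an exclusion list before counting it. The addresses come from ranges reserved for documentation, the visit log and page paths are hypothetical, and `should_count` is an illustrative name, not part of Piwik.

```python
import ipaddress

# Hypothetical exclusion list; 203.0.113.0/24 is reserved for documentation.
EXCLUDED_NETWORKS = [ipaddress.ip_network("203.0.113.7/32")]


def should_count(visitor_ip: str, excluded=EXCLUDED_NETWORKS) -> bool:
    """Return False if the visitor's IP falls inside any excluded network."""
    address = ipaddress.ip_address(visitor_ip)
    # Mixing IPv4 and IPv6 simply yields False for membership, not an error.
    return not any(address in network for network in excluded)


# Hypothetical visit log: (IP address, page path).
visits = [
    ("203.0.113.7", "/"),              # the site owner
    ("198.51.100.20", "/some-page/"),  # an actual visitor
]
counted = [visit for visit in visits if should_count(visit[0])]
print(counted)  # [('198.51.100.20', '/some-page/')]
```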

      Since then, Piwik has been quietly noting visits. I should probably do a ‘year in review’ come January, but I don’t see why I can’t review past years here.



Review of traffic before 2017

      I have no meaningful data before 2017. Technically the site existed, as some of the DRAT posts show, but Piwik wasn’t set up until the start of 2017. For that matter, the 2015 posts were technically made through WordPress.com, not my site.



Review of traffic in 2017

      For 2017, once my own traffic is factored out, there is negligible activity until November. Sometime around April I told Piwik to ignore my IP, and thus my visits, which had a noticeable impact on its traffic reports. But there was a surprising jump in November that continued into December. Judging by the pages viewed, I would guess that interest in Muffy’s subdivision caused the increase.

      As for the other statistics, there is barely enough data to evaluate. America, and Honeoye Falls in particular, is where most traffic came from, which should come as no surprise. Firefox was the most used browser, and Android (i.e. smartphone/tablet) the most used operating system. That is a little unusual, but not unreasonable given the small sample size.

      All in all, a bit more traffic than I expected, but not much. I am not a high-traffic part of the internet (imagine that).
