Where to store ratings?

I’m going to add ratings to my site and I’m thinking about where to store them. In the file system or in a database? What would you do?

Maybe every rating has this:

IP
timestamp
id (uri)

Maybe one page has 10 or 1000 votes.

If I save it in the file system, every vote is stored as a page, and that could mean many files and folders. I could also update the same file, but then that file would grow large and I don’t know if it could get corrupted when many people try to update it at the same time.

Thoughts?

Good question - regarding the last issue;

I think database (entries) can also get corrupted - when done wrong :slight_smile:

Last week I created a similar project (logging download-actions from several users at a time).

I decided to store the data in the file-system, while locking the file like this;

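  $logfile = "content/downloads/stats/" . $download_uid . ".log.txt";

  // "c+" opens for reading and writing and creates the file if it is missing
  $fp = fopen($logfile, "c+");

  // flock() blocks until the exclusive lock is granted, so no busy loop is needed
  if ($fp && flock($fp, LOCK_EX)) {

    // read the existing log only after the lock is held,
    // so no other request can change it in between
    $logdata = stream_get_contents($fp);

    // the blueprint builds the new entry in $blueprint
    include_once("site/fields/downloads/inc/blueprint.php");

    // newest entry first
    $logdata = $blueprint . $logdata;

    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, $logdata);
    fflush($fp);
    flock($fp, LOCK_UN);
  }

  if ($fp) {
    fclose($fp);
  }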

Every hit is saved in the same file (one log-file for every downloadable asset) and contains (per hit) something like this;

Date       : Saturday, 2nd of July 2016 - 16:05:11
IP-address : ::1
Map        : (not available)
Country    : (not available)
Region     : (not available)
City       : (not available)
System     : Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
Language   : EN / English
Provider   : MY-PC
Server     : MY-PC
Domain     : localhost / localhost
------------------------------------------------------------------

When things go wrong, or become too big - I’ll consider a database solution myself :stuck_out_tongue:

- edit - The log-files aren’t that important for me right now; I’d consider json or yaml syntax later on; then you can query the contents…
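
A minimal sketch of what the JSON idea could look like, assuming one JSON object per line in the log file. The function names `append_hit` and `query_hits` and the field names are made up for illustration; they are not part of the code above.

```php
<?php
// Hypothetical helpers: append_hit() and query_hits() are illustrative names.

// Append one hit as a single JSON line; LOCK_EX takes an exclusive lock while writing.
function append_hit(string $logfile, array $hit): void
{
    $line = json_encode($hit) . "\n";
    file_put_contents($logfile, $line, FILE_APPEND | LOCK_EX);
}

// Return all hits whose $field equals $value.
function query_hits(string $logfile, string $field, $value): array
{
    $matches = [];
    if (!is_file($logfile)) {
        return $matches;
    }
    $lines = file($logfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
    foreach ($lines as $line) {
        $hit = json_decode($line, true);
        if (is_array($hit) && ($hit[$field] ?? null) === $value) {
            $matches[] = $hit;
        }
    }
    return $matches;
}
```

With a format like this, something like `query_hits($logfile, 'language', 'EN')` would return every logged hit with that language, which is hard to do with the free-text layout.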

I’ll answer my own question. I think I figured out how to do it in the file system after all.

1. Spam protection

To keep bots out I use this as protection:

//forum.getkirby.com/t/kirby-recaptcha/4631

2. Blacklist control

  1. Check if the IP is in a childpage of /current-page/blacklist/*.
  2. If not in blacklist, add an IP + timestamp to the blacklist.
  3. Remove all IP addresses older than today from the blacklist.

That means the blacklist only ever contains IP addresses from today. One vote per IP each day is acceptable.
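
The three steps could be sketched roughly like this in plain PHP. This is not Kirby’s page API: it assumes the blacklist is a plain directory with one empty file per IP, uses the file’s modification time as the timestamp, and uses md5 only to get a filename-safe key. The name `may_vote` is hypothetical. The cleanup runs first here so that stale entries never block a vote.

```php
<?php
// Hypothetical helper, not part of Kirby: returns true if $ip may vote now
// and records the vote in the blacklist directory as a side effect.
function may_vote(string $blacklistDir, string $ip, ?int $now = null): bool
{
    $now = $now ?? time();
    $startOfToday = strtotime('today', $now);

    if (!is_dir($blacklistDir)) {
        mkdir($blacklistDir, 0755, true);
    }

    // Step 3: drop entries that were last touched before today.
    clearstatcache();
    foreach (glob($blacklistDir . '/*') ?: [] as $file) {
        if (is_file($file) && filemtime($file) < $startOfToday) {
            unlink($file);
        }
    }

    // Step 1: is this IP already listed? (md5 is an assumption, used only
    // to keep the file name safe for any address format.)
    $entry = $blacklistDir . '/' . md5($ip);
    if (is_file($entry)) {
        return false;
    }

    // Step 2: add the IP; the modification time acts as the timestamp.
    touch($entry, $now);
    return true;
}
```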

3. Keep track of votes

Store a counter for each rating value to keep track and to be able to calculate an average etc.

1: 24
2: 40
3: 456
4: 76
5: 9

So just keep what’s needed and it could probably grow forever. Untested. What do you think?
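
As a small sketch of why the counters are enough: the average can be calculated from them alone, without keeping the individual votes. The function name `average_rating` is made up for this example.

```php
<?php
// Hypothetical helper: $counters maps a rating value (1-5) to its vote count.
function average_rating(array $counters): ?float
{
    $votes = array_sum($counters);
    if ($votes <= 0) {
        return null; // no votes yet, so there is no average
    }
    $total = 0;
    foreach ($counters as $rating => $count) {
        $total += $rating * $count;
    }
    return $total / $votes;
}

// The counters from the example above: 605 votes, 1821 points in total.
$counters = [1 => 24, 2 => 40, 3 => 456, 4 => 76, 5 => 9];
echo round(average_rating($counters), 2); // prints 3.01
```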

@1n3JgKl9pQ6cUMrW

Interesting approach!

You step out of the Kirby file system and roll your own. What I do fear is that large file.

Maybe it could work to build a folder tree like my revisions plugin (a kind of copy of the file structure) and name each file after the IP, like 123.12.13.145.txt. Looking up an IP would then be very fast.

For a page it would look kind of like this:

123.12.13.145.txt
123.12.14.145.txt
123.12.15.145.txt

Update

If there is a need to clean up the IPs, maybe the best approach is not to do it after a day (by time). Maybe it should be by a limit on the number of files, so that after 100 IPs it starts to clean up.

I’m considering combining these two approaches and naming the files without .txt so Kirby does not use them. That way they will be seen as attached files.
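
A rough sketch of the file-count idea, assuming one file per IP in a single directory and that the oldest files (by modification time) should go first. The name `prune_oldest` and the default limit are only for illustration.

```php
<?php
// Hypothetical helper: keep at most $limit files in $dir, deleting the oldest first.
// Returns the number of files removed.
function prune_oldest(string $dir, int $limit = 100): int
{
    clearstatcache();
    $files = array_filter(glob($dir . '/*') ?: [], 'is_file');
    if (count($files) <= $limit) {
        return 0;
    }
    // Sort by modification time, oldest first.
    usort($files, function ($a, $b) {
        return filemtime($a) <=> filemtime($b);
    });
    $removed = 0;
    foreach (array_slice($files, 0, count($files) - $limit) as $file) {
        if (unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}
```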

Sounds cool :stuck_out_tongue:

Personally, I don’t like the idea of filling up my contents folder with lots of pages, text-files, etc… when they are not content related.

I want to keep things clean and lean; why put 6,000 text-files and 10 directories with 25 subdirectories in your content-folder, when the site itself has only 4 pages (for example).

Maybe, in the future, Kirby needs a solution for this; the ability to save data not in a database, but also not in the content folder.

I have some other third-party plugins (visitor stats, for example) which also create pages / subpages / text-files for their data;

They not only show up in the widget-field (where they belong) but also on the page-overview-panel (while they are not pages at all).

…I don’t like that approach :stuck_out_tongue:


Oh, by the way - when saving IP-addresses, keep in mind that IPv6 has colons in it, so you cannot use the address directly as the name of the text-file (like my own test-IPv6-address).

I want to keep things clean and lean; why put 6,000 text-files and 10 directories with 25 subdirectories in your content-folder, when the site itself has only 4 pages (for example).

With my latest approach I don’t give them a .txt extension and I don’t add any folders other than blacklist. Then blacklist will be a page, but all the IPs will be seen as files, not as content, in that folder.

4 pages (your example) would result in 4 subpages (blacklist) and an average of 1500 tiny IP files per page. It’s better than your predicted nightmare but I agree, it’s still not perfect.

Maybe, in the future, Kirby needs a solution for this; the ability to save data not in a database, but also not in the content folder.

I totally agree with you here. But how do you think that would work? The first thing that comes to mind is something like:

c::set('panel.exclude.slugs', ['blacklist', 'data']);

But then it’s still in the content folders.

Oh, by the way - when saving IP-addresses, keep in mind that IPv6 has colons in it

Oh my… It looks like some hash md5 thing. Thanks for that info! :slight_smile:

Now that I have your IP I can probably find out where you live, your cat’s name and what you ate for breakfast! :wink:

Are you aware of the possibility to hide such pages from the panel?

That’s since Kirby 2.3 - if I am correct?

It’s a test / dynamic IP - through a VPN :stuck_out_tongue:

Exactly …

@texnixe Do you think it’s best to have this kind of data inside or outside of the content folder?

Even if the blacklist subpage (in this case) is hidden in the panel, it’s not hidden from queries in the template.

I don’t care if it’s in the query, except for what it might do to the speed. When getting that page, what else is in that object?

It feels like I’ve read here somewhere that it only goes to its own parent, but I’m not sure. In that case it would not care about its children (sounds terrible ;)). If that’s true it would be good in this case, as the blacklist children would not be a problem.

I think it makes sense to have that sort of information in the content folder. In a way, it is user generated content, isn’t it, that you probably want to store/back up with your other content. If you don’t want it in your content folder, you can easily store this kind of stuff in a database, since it does not have to be manipulated from the panel.

Yes, it’s user generated content, so it might feel natural to put it there then. :slight_smile: If I can avoid a DB I will. Otherwise I need to back up both the DB and the files.

Maybe I should have added this in the first post, but here is hopefully the result (Photoshop mockup):

@jenstornell:
The changelog at https://getkirby.com/changelog/kirby-2-3-0 shows how we can build what you are asking for here:

We can create a new directory in the root of the webserver and build the subdirectories and their files like the new thumbs builder does.
Then we know (we can calculate) the path to the corresponding directory in the new tree for every webpage.

Good luck!

Hint: the webserver needs write permissions in the new tree!

Pointer: the new tree can, but need not, be included in the backup very easily, depending on what the admin needs or wants.
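
A minimal sketch of the "we can calculate the path" part, assuming the page is identified by a URI such as `projects/project-a` and the separate tree lives in a directory of your choosing. The function name `data_dir_for` and the directory name `ratings` are hypothetical; this is not how Kirby’s thumbs builder is implemented, only the same mirroring idea.

```php
<?php
// Hypothetical helper: map a page URI such as "projects/project-a" to the
// matching directory in a separate data tree outside the content folder.
function data_dir_for(string $dataRoot, string $pageUri): string
{
    $parts = [rtrim($dataRoot, '/')];
    foreach (explode('/', $pageUri) as $segment) {
        // Skip empty and relative segments so the path stays inside $dataRoot.
        if ($segment === '' || $segment === '.' || $segment === '..') {
            continue;
        }
        $parts[] = $segment;
    }
    return implode('/', $parts);
}

echo data_dir_for('ratings', 'projects/project-a'); // prints ratings/projects/project-a
```

The resulting directory still has to be created (for example with `mkdir($dir, 0755, true)`) before anything is written to it, which is where the write permissions come in.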

@anon77445132 Thanks! I have still decided to go for a content approach.

@1n3JgKl9pQ6cUMrW About the IPv6, I’ll probably use md5 or something to hash it. I don’t need to know the IP, just that the same unique signature does not appear twice.
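
A small sketch of that signature idea. The function name `ip_signature` is made up; normalizing with `inet_pton()` before hashing is an extra assumption, added so that different spellings of the same IPv6 address end up with the same file name.

```php
<?php
// Hypothetical helper: turn an IPv4 or IPv6 address into a filename-safe signature.
function ip_signature(string $ip): ?string
{
    // inet_pton() converts the address to its packed binary form, so "::1"
    // and "0:0:0:0:0:0:0:1" yield the same signature.
    $packed = @inet_pton($ip);
    if ($packed === false) {
        return null; // not a valid IP address
    }
    // md5() returns 32 hexadecimal characters, so there are no colons or dots.
    return md5($packed);
}
```

The hash is used here only to get a usable file name and to detect a repeated signature, as described above.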

I don’t know how critical your ratings are, but keep in mind that many users can share one IP.

I know companies with about 15,000 workers who all use the same (outbound) IP to reach the internet.

So blocking one IP, will block all of those workers :stuck_out_tongue:

@1n3JgKl9pQ6cUMrW I use md5 just to be able to have that signature as a filename.

Do you think a company with 15000 workers will use the same IP? They will probably get blocked by Google and other services. I worked at a company before with 100 workers and we had 5 IPs. We got blocked by Google from time to time.

Anyway, I get your point. In my case I don’t think a company should be able to vote more than one time on one item. Else a company with 15000 workers could give 15000 votes on a rating of 5 to a single item. Pros and cons to both ways I guess. :slight_smile:

How far have I got with this? It’s not done yet, but I create a blacklist as a childpage of the item (page) and add an empty file named with the md5 hash.

If the file already exists, no vote is added; otherwise it increments the counter in the item’s content like this:

Rating-1: 2
----

Rating-2: 3
----

Rating-3: 2
----

Rating-4: 4
----

Rating-5: 1

The new increment feature was very helpful here:

https://getkirby.com/docs/cheatsheet/page/increment

$page->increment('rating_' . $rating);
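
The whole flow (marker file, then increment) could be sketched like this in plain PHP. It does not use Kirby’s page API: the counters are kept in a small JSON file as a stand-in for the item’s content file, and `register_vote` and `ratings.json` are hypothetical names.

```php
<?php
// Hypothetical helper, plain PHP instead of Kirby's page API.
// Returns true if the vote was counted, false if it was rejected.
function register_vote(string $itemDir, string $ip, int $rating): bool
{
    if ($rating < 1 || $rating > 5) {
        return false;
    }
    $blacklist = $itemDir . '/blacklist';
    if (!is_dir($blacklist)) {
        mkdir($blacklist, 0755, true);
    }

    // Mode "x" creates the file and fails if it already exists, so the
    // "already voted?" check and the marker creation happen in one step.
    $marker = @fopen($blacklist . '/' . md5($ip), 'x');
    if ($marker === false) {
        return false;
    }
    fclose($marker);

    // Assumption: counters live in a JSON file next to the item.
    $fp = fopen($itemDir . '/ratings.json', 'c+');
    flock($fp, LOCK_EX);
    $counters = json_decode(stream_get_contents($fp), true) ?: [];
    $key = 'rating_' . $rating;
    $counters[$key] = ($counters[$key] ?? 0) + 1;
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, json_encode($counters));
    fflush($fp);
    flock($fp, LOCK_UN);
    fclose($fp);
    return true;
}
```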

Good to hear!

I created a rating system myself (like the one you are creating right now) way back in 2003 or so;

I had the same troubles (IP, blacklisting, false entries, etc…) and decided - after a lot of trouble - to fall back on a third-party system;

It took too long to create my own (but that’s just me!).

- update - Oh my God, it came from the deep :stuck_out_tongue: (almost fifteen years ago…).

I had the same troubles (IP, blacklisting, false entries, etc…)

I have blacklisting of IPs now. I hope that Google reCaptcha will keep the bots away.

Self hosted third party system? Or a service? I tried about a year ago to find a good rating service (with some API) but I found nothing. Strange, should be a market for it.

Looks kind of like this now, not Photoshop this time, real code:

I could not even code in 2003. I kind of started with PHP in 2007.

Looks promising, maybe you can create the third-party API yourself :slight_smile:

@jenstornell:
It is quite normal for big companies with far more than fifteen thousand employees to use only one IP!
And that covers everyone who works for them in the whole of Europe and the surrounding countries!