Check if images are identical

My Gilmour audio plugin has the capability of ripping the album art from individual tracks and storing it on the file system. Then it sets it as the cover image in the file meta on each file.

The downside is that most of the time, you end up with 10 or so copies of the same image in the file system. However, the reason I did it this way is that it’s possible to have different covers on each track if the album is a compilation by various artists.

The image comes out of the ID3 tag as base64. I turn that into a JPEG that can be written. Is there an easy way to compare the covers of all the given tracks and only write the ones that are unique?

I guess this is probably more of a PHP question than a Kirby question but if someone could point me down the right path, I would appreciate it.

As far as I can see, you are using a custom file method that only reads the information for a single file at a time. So getting all the tracks is not possible this way, I think.

If multiple tracks have the same cover image, is the filename or base64-encoded string always the same or different?

I’d have to test, but I’m pretty sure the base64 string is identical if the JPEG is the same. Trouble is, I can’t think of an easy way to do it: if I check all the files, the check runs on every file upload, which is wasteful processing. I don’t think there’s a hook that fires once all uploads are complete, is there? It fires on each uploaded file rather than once per multi-drop?

But maybe it’s not worth the effort. Right now I have a toggle so you can turn off the cover image generation and upload your own image to use. Maybe that’s enough. I kind of wanted to do it though, since little problems like this help me learn more PHP. :slight_smile:

I guess I could log the information in a cache file and then check all the values in it against each other. Then, once all uploads have completed, set the meta and generate the images from the cache.

The problem is that once you store your image with a given file name, even if you base64_encode that file again for comparison, the resulting string will be different I think, even if the file contents are the same. But googling might help here to get more information.

And to try and compare images by what they contain might also be pretty processing intensive, so the question is if that is worth it.

I would do it on the direct base64 string from the MP3 file, before doing any manipulation on it at all, so they should be identical, I think, since it’s the raw data. It’s a string comparison, which doesn’t sound too intensive?
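
To illustrate the comparison on the raw data, here is a minimal sketch in plain PHP. It assumes the cover arrives as a data URI (`data:image/jpeg;base64,...`), which is what the `explode(',', ...)` in the plugin code suggests; the sample bytes and the `coverHash()` helper are hypothetical. Base64 encoding is deterministic, so the same bytes always produce the same string, and a SHA1 hash of the decoded bytes gives a short key for the same check.

```php
<?php
// Hypothetical stand-ins for the value returned by the ID3 cover tag.
$jpegBytes = "\xFF\xD8\xFF\xE0 fake jpeg payload";
$coverA = 'data:image/jpeg;base64,' . base64_encode($jpegBytes);
$coverB = 'data:image/jpeg;base64,' . base64_encode($jpegBytes);
$coverC = 'data:image/jpeg;base64,' . base64_encode($jpegBytes . 'different');

// The same bytes always encode to the same base64 string.
$sameString = ($coverA === $coverB);

// Hash the decoded bytes to get a fixed-length key (40 hex characters).
function coverHash(string $dataUri): string
{
    // Assumption: everything after the first comma is the base64 payload.
    $parts = explode(',', $dataUri, 2);
    return sha1(base64_decode($parts[1]));
}

$sameHash      = (coverHash($coverA) === coverHash($coverB));
$differentHash = (coverHash($coverA) !== coverHash($coverC));
```

Comparing 40-character hashes instead of full base64 strings also means only the hash needs to be kept around for later uploads.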

But you’re right, it’s starting to sound like a mammoth effort for a small niggle.

Well, yes, but then you would have to get all the base64 strings of all the files that already sit in the folder again.

Which is why I was thinking of using the file cache: if a file is in the cache already, I don’t need to process it again unless it has been modified.

Man… can of worms…

You mean to store all base64 strings for that page in the cache? Hm, but your cache has to be cleared once in a while…

Yes, that’s what I mean. The default life of the cache is 30 minutes, isn’t it? That’s enough, since the information from the cache gets stamped into the file metas once all the uploads are complete. It’s just a temporary log to do the job. I don’t think it matters that it doesn’t persist.

Oh, I see, you were only talking about mass uploads. I was thinking more generally when files are uploaded at different times.

On a side note, Kirby has a method to get the extension of a file, so you don’t need to use string methods…
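
A small sketch of why avoiding the string methods helps. As far as I know, Kirby file objects offer `extension()` and `name()` (the filename without its extension); the example below uses PHP’s built-in `pathinfo()` instead so that it runs without Kirby, and the track filename is made up.

```php
<?php
// Hypothetical track filename; in the plugin this would come from $this->filename().
$filename = 'comfortably-numb.flac';

// substr(..., 0, -4) assumes a three-letter extension, so a stray dot is left behind here.
$viaSubstr = substr($filename, 0, -4) . '-art.jpg';

// pathinfo() strips the extension whatever its length.
$viaPathinfo = pathinfo($filename, PATHINFO_FILENAME) . '-art.jpg';
```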

Yes, I saw, but for some reason it didn’t work out for me, so I resorted to that. :slight_smile: Maybe I will have another stab at it. Thanks for the tip.

I can’t remember why it didn’t work. I made the plugin a couple of months ago but had to stop, since I found a bug in Kirby that meant I couldn’t finish it until it was fixed, so it’s only just seeing the light of day now.

One option could be:

  1. When storing a cover image, store its SHA1 hash in a meta field (let’s call it cover_hash).
  2. When importing a new audio file, convert the image data from base64 to binary, compute the SHA1 hash and then run $page->images()->findBy('cover_hash', $hash). If successful, you now have the file object and can link to it from the audio file meta. Otherwise, the cover does not already exist, so store it (back to 1).
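
The two steps can be sketched without Kirby by letting a plain array stand in for the `cover_hash` meta fields. This is only an illustration: `storeOrReuseCover()`, the `$index` array and the filenames are hypothetical, and in the plugin the lookup would be the `findBy()` call from step 2.

```php
<?php
// $index maps cover hash => filename; it stands in for the cover_hash meta fields.
function storeOrReuseCover(array &$index, string $base64, string $newFilename): string
{
    // Step 2: convert base64 to binary and compute the SHA1 hash.
    $binary = base64_decode($base64);
    $hash   = sha1($binary);

    // Cover already known: reuse the existing file.
    if (isset($index[$hash])) {
        return $index[$hash];
    }

    // Step 1: the plugin would write $binary to disk here and store the hash in the meta.
    $index[$hash] = $newFilename;
    return $newFilename;
}

$index  = [];
$cover  = base64_encode('same artwork bytes');
$first  = storeOrReuseCover($index, $cover, 'track-01-art.jpg');
$second = storeOrReuseCover($index, $cover, 'track-02-art.jpg');
$third  = storeOrReuseCover($index, base64_encode('other artwork'), 'track-03-art.jpg');
```

The second track ends up pointing at the first track’s cover, while the third track, which has different artwork, gets its own file.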

Thanks for the suggestion @lukasbestle, but that’s probably a bit above my current PHP ninja level… :slight_smile:

I’ve hacked together an example implementation based on your plugin code. Untested, but might work already:

// File Path
$contentpath = $this->parent()->root();

// image file
$audioartfilename = substr($this->filename(), 0, -4) . '-art.jpg';
$audioartfilepath = $contentpath.'/'.$audioartfilename;
$audioart = explode( ',', $this->id3('cover'));
$audioartdecode = base64_decode($audioart[1]);

// Check if the same cover already exists
$hash = sha1($audioartdecode);
if ($image = $this->parent()->images()->findBy('gilmour_hash', $hash)) {
    return $image->filename();
}

// Does not already exist, create it

// Meta File
$audioartmetafilename = substr($this->filename(), 0, -4) . '-art.jpg.txt';
$audioartmetafilepath = $contentpath.'/'.$audioartmetafilename;
$metacontent = [
    'template'     => option('hashandsalt.gilmour.imagetemplate'),
    'gilmour_hash' => $hash
];

// Create Meta
Kirby\Data\Data::write($audioartmetafilepath, $metacontent, 'txt');

// Create Cover File
F::write($audioartfilepath, $audioartdecode);

return $audioartfilename;

Thanks :slight_smile: I’ll give it a shot…

Just out of curiosity, what’s the difference / advantage of using Kirby\Data\Data::write rather than F::write? I used F::write to make the meta originally, but I see you’ve switched it.

The Data class is used internally to write all sorts of data files, e.g. our KirbyData format which is used for the .txt files.

I switched to it because it takes a simple array of data. The F::write() method on the other hand takes the final string to write to the file, which requires writing the KirbyData syntax (field separators etc.) by hand. For that one field you were writing previously it was fine, but because there are now two fields, that implementation is a bit more robust.
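
To show what writing the syntax by hand involves, here is a simplified sketch of turning an array into KirbyData-style text, with fields separated by a line of four dashes. It is not Kirby’s actual encoder, which also takes care of key formatting, escaping and multi-line values; `toKirbyDataSketch()` and the `cover` template name are made up for the illustration.

```php
<?php
// Simplified illustration only, not Kirby's encoder.
function toKirbyDataSketch(array $fields): string
{
    $blocks = [];
    foreach ($fields as $key => $value) {
        // Each field becomes a "Key: value" block.
        $blocks[] = ucfirst($key) . ': ' . $value;
    }
    // Fields are separated by a line containing four dashes.
    return implode("\n\n----\n\n", $blocks);
}

$meta = toKirbyDataSketch([
    'template'     => 'cover', // hypothetical template name
    'gilmour_hash' => sha1('artwork bytes'),
]);
```

With `F::write()` you would have to build a string like `$meta` yourself for every field you add, whereas `Data::write()` takes the array directly.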

Even better would be to use the $page->createFile() method, but that would need some more changes to the code.

Thanks for the explanation, I get it :slight_smile:

I tried your code out, but there’s no difference, other than I now have the hash stamped into each meta. I’m still getting multiple JPG files generated. It just needs a tweak, I think: if the covers are all the same, I need to store a single JPEG as album-name-art.jpg, but if they all differ, as a compilation album might, then I need to generate one for each track, as it does now.
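
The naming rule described here could be sketched as below. It assumes the hashes of all tracks are known at once, which the per-file upload does not give you for free, and it treats any mix of covers as “one per track”; `planCoverFiles()` and the sample data are hypothetical.

```php
<?php
// $hashes maps track filename => cover hash for every track on the page.
function planCoverFiles(array $hashes, string $albumSlug): array
{
    $allSame = count(array_unique(array_values($hashes))) === 1;

    $plan = [];
    foreach ($hashes as $track => $hash) {
        // One shared album cover if every hash matches, otherwise one file per track.
        $plan[$track] = $allSame
            ? $albumSlug . '-art.jpg'
            : pathinfo($track, PATHINFO_FILENAME) . '-art.jpg';
    }
    return $plan;
}

$album = planCoverFiles(
    ['01-intro.mp3' => 'aaa', '02-outro.mp3' => 'aaa'],
    'album-name'
);
$compilation = planCoverFiles(
    ['01-intro.mp3' => 'aaa', '02-outro.mp3' => 'bbb'],
    'album-name'
);
```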

Off for some head scratching…

The issue is that the images are all identical but have different file names.

Are all the hashes identical?

Yup. :slight_smile: