Help needed. Background batch image import

I have a working plugin that imports data from a CSV file and inserts it into a structure field.

I also have a files field inside the structure and want to download images from external sources, create files inside the page, and attach them to the structure field. The main problem is that if the file contains more than 100 items, I get a response-time error … Right now it works with manual batches (whose size I select in another field), and I need to re-save the page until all images are uploaded.

My question is: is there a way to download the images and attach them to the structure field as a background process … and where should I look to make that change?

P.S. It would be a great idea to create a plugin that helps export/import any data to Kirby CMS, as is possible in WordPress or other CMS solutions.

<?php

Kirby::plugin('ggg/gaming-software', [
	'hooks' => [
		'site.update:after' => function () {
			createGamingSoftwarePage();
		},
		'page.update:after' => function ($newPage, $oldPage) {
			processAndDownload($newPage);
		}
	],
	'blueprints' => [
		'pages/gaming-software' => __DIR__ . '/blueprints/pages/gaming-software.yml',
	],
	'templates' => [
		'gaming-software' => __DIR__ . '/templates/gaming-software.php',
	]
]);

function createGamingSoftwarePage() {
	$site = kirby()->site();

	if (!$site->find('gaming-software')) {
		try {
			$gamingSoftwarePage = $site->createChild([
				'slug'     => 'gaming-software',
				'template' => 'gaming-software',
				'content'  => [
					'title' => 'Gaming Software'
				]
			]);
		} catch (Exception $e) {
			throw new Exception('Error creating gaming software page: ' . $e->getMessage());
		}

		if (isset($gamingSoftwarePage)) {
			try {
				$gamingSoftwarePage->changeStatus('unlisted');
			} catch (Exception $e) {
				throw new Exception('Error listing gaming software page: ' . $e->getMessage());
			}
		}
	}
}

function processAndDownload($page) {
	$fileFieldKey = 'gaming_software_files';
	$structureFieldKey = 'gaming_software';
	$mapping = [
		'Software' => 'software_name',
		'Logo' => 'software_logo_temp'
	];

	$fileField = $page->content()->get($fileFieldKey);
	$jsonArray = $page->content()->get($structureFieldKey)->yaml() ?? [];

	$existingSoftwareNames = array_column($jsonArray, 'software_name');

	// Process CSV files
	if ($fileField->isNotEmpty()) {
		$fileIds = $fileField->toFiles();

		foreach ($fileIds as $file) {
			if ($file->extension() === 'csv') {
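				// Read the CSV without trailing newlines or blank lines
				$csvData = array_map('str_getcsv', file($file->root(), FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));
				$headers = array_shift($csvData);

				foreach ($csvData as $row) {
					// Skip malformed rows: array_combine() throws in PHP 8 when the column counts differ
					if (count($row) !== count($headers)) {
						continue;
					}
					$item = array_combine($headers, $row);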
					$mappedItem = [];

					foreach ($mapping as $csvKey => $structureKey) {
						$mappedItem[$structureKey] = $item[$csvKey] ?? '';
					}

					// Check for duplicates before adding
					if (!in_array($mappedItem['software_name'], $existingSoftwareNames)) {
						$mappedItem['_key'] = $item['id'] ?? uniqid();
						$jsonArray[] = $mappedItem;
						$existingSoftwareNames[] = $mappedItem['software_name'];
					}
				}
			}
		}
	}

	// Download images
	$batchSize = max(1, (int)$page->content()->get('import_items')->value()); // Number of items to process per run; at least 1 even if the field is empty
	$count = 0;

	foreach ($jsonArray as &$item) {
		if (!empty($item['software_logo_temp']) && empty($item['software_logo'])) {
			$urlField = $item['software_logo_temp'];
			$softwareName = $item['software_name'] ?? 'unknown';
			$fileId = downloadAndAttachFile($urlField, $softwareName, $page);
			if ($fileId) {
				$item['software_logo'] = $fileId;
				unset($item['software_logo_temp']);
				$count++;
			}
		}
		if ($count >= $batchSize) {
			break;
		}
	}
	unset($item); // Break the reference left over from the by-reference foreach

	// Encode data to YAML format
	$yamlData = yaml::encode($jsonArray);

	// Update the page with the combined data
	try {
		$page->update([
			$structureFieldKey => $yamlData,
			$fileFieldKey => '' // Clear the CSV file field so the same import is not repeated on the next save
		]);
	} catch (Exception $e) {
		throw new Exception('Error updating fields: ' . $e->getMessage());
	}
}

function downloadAndAttachFile($urlField, $softwareName, $page) {
	if (preg_match('/\((https?:\/\/[^\)]+)\)$/', $urlField, $matches)) {
		$url = $matches[1];
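		$path = parse_url($url, PHP_URL_PATH);
		// PATHINFO_EXTENSION yields '' when the URL path has no extension, instead of an undefined-index warning
		$extension = is_string($path) ? strtolower(pathinfo($path, PATHINFO_EXTENSION)) : '';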
		$validExtensions = ['jpg', 'jpeg', 'png', 'gif', 'svg'];

		if (!in_array($extension, $validExtensions)) {
			return ''; // Invalid file type, ignore
		}

		$filename = str_replace(' ', '_', strtolower($softwareName)) . '_logo.' . $extension;

		// Temporary file path within the page directory
		$tempFilePath = $page->root() . '/temp-' . $filename;

		// Download file to temporary path.
		// file_get_contents() returns false on failure instead of throwing, so check the result explicitly
		$contents = @file_get_contents($url);
		if ($contents === false || file_put_contents($tempFilePath, $contents) === false) {
			error_log('Failed to download file: ' . $url);
			return '';
		}

		// Move the file to the content directory
		try {
			$file = $page->createFile([
				'source' => $tempFilePath,
				'template' => 'image', // You can specify a different template if needed
				'filename' => $filename,
			]);

			return $file->id();
		} catch (Exception $e) {
			// Log the error and return an empty string
			error_log('Failed to create file: ' . $e->getMessage());
			return '';
		} finally {
			// Clean up the temporary file whether or not createFile() succeeded
			if (file_exists($tempFilePath)) {
				unlink($tempFilePath);
			}
		}
	}

	return '';
}
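
The URL and filename handling in `downloadAndAttachFile()` could be pulled into one small helper so it is tested once and reused. This is only a sketch: `parseLogoField()` is a hypothetical name, and it assumes the CSV "Logo" cell always ends with the URL in parentheses, as the regex in the plugin expects.

```php
<?php

/**
 * Parse a CSV "Logo" cell that ends with "(https://...)" and return the
 * download URL plus the target filename, or null if the cell is unusable.
 * Hypothetical helper, not part of Kirby.
 */
function parseLogoField(string $field, string $softwareName): ?array
{
    // The URL is expected in parentheses at the very end of the cell
    if (!preg_match('/\((https?:\/\/[^\)]+)\)$/', trim($field), $matches)) {
        return null;
    }

    $url  = $matches[1];
    $path = parse_url($url, PHP_URL_PATH);

    // A URL without a path has no extension, so treat it as invalid
    $extension = is_string($path)
        ? strtolower(pathinfo($path, PATHINFO_EXTENSION))
        : '';

    if (!in_array($extension, ['jpg', 'jpeg', 'png', 'gif', 'svg'], true)) {
        return null;
    }

    // Same naming scheme as in the plugin: "some_name_logo.ext"
    $filename = str_replace(' ', '_', strtolower($softwareName)) . '_logo.' . $extension;

    return ['url' => $url, 'filename' => $filename];
}
```

A `null` result means the row can be skipped, which replaces the separate regex and extension checks.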

I updated the plugin a little bit,

but I still have a question: how can I attach the file IDs to all items in the structure field in one iteration if some of the images might not be uploaded yet?

<?php

use Kirby\Uuid\Uuid;

Kirby::plugin('yourname/gaming-software', [
	'hooks' => [
		'site.update:after' => function () {
			createGamingSoftwarePage();
		},
		'page.update:after' => function ($newPage, $oldPage) {
			processAndDownload($newPage);
		}
	],
	'blueprints' => [
		'pages/gaming-software' => __DIR__ . '/blueprints/pages/gaming-software.yml',
	],
	'templates' => [
		'gaming-software' => __DIR__ . '/templates/gaming-software.php',
	]
]);

function createGamingSoftwarePage() {
	$site = kirby()->site();

	if (!$site->find('gaming-software')) {
		try {
			$gamingSoftwarePage = $site->createChild([
				'slug'     => 'gaming-software',
				'template' => 'gaming-software',
				'content'  => [
					'title' => 'Gaming Software'
				]
			]);
		} catch (Exception $e) {
			throw new Exception('Error creating gaming software page: ' . $e->getMessage());
		}

		if (isset($gamingSoftwarePage)) {
			try {
				$gamingSoftwarePage->changeStatus('unlisted');
			} catch (Exception $e) {
				throw new Exception('Error listing gaming software page: ' . $e->getMessage());
			}
		}
	}
}

function processAndDownload($page) {
	$fileFieldKey = 'gaming_software_files';
	$structureFieldKey = 'gaming_software';
	$mapping = [
		'Software' => 'software_name',
		'Logo' => 'software_logo_temp'
	];

	$fileField = $page->content()->get($fileFieldKey);
	$jsonArray = $page->content()->get($structureFieldKey)->yaml() ?? [];

	$existingSoftwareNames = array_column($jsonArray, 'software_name');

	// Process CSV files
	if ($fileField->isNotEmpty()) {
		$fileIds = $fileField->toFiles();

		foreach ($fileIds as $file) {
			if ($file->extension() === 'csv') {
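				// Read the CSV without trailing newlines or blank lines
				$csvData = array_map('str_getcsv', file($file->root(), FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));
				$headers = array_shift($csvData);

				foreach ($csvData as $row) {
					// Skip malformed rows: array_combine() throws in PHP 8 when the column counts differ
					if (count($row) !== count($headers)) {
						continue;
					}
					$item = array_combine($headers, $row);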
					$mappedItem = [];

					foreach ($mapping as $csvKey => $structureKey) {
						$mappedItem[$structureKey] = $item[$csvKey] ?? '';
					}

					// Extract URL from software_logo_temp and determine the file extension
					if (preg_match('/\((https?:\/\/[^\)]+)\)$/', $mappedItem['software_logo_temp'], $matches)) {
						$imageUrl = $matches[1];
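						$imagePath = parse_url($imageUrl, PHP_URL_PATH);
						// PATHINFO_EXTENSION yields '' when the URL path has no extension, instead of an undefined-index warning
						$extension = is_string($imagePath) ? strtolower(pathinfo($imagePath, PATHINFO_EXTENSION)) : '';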
						$validExtensions = ['jpg', 'jpeg', 'png', 'gif', 'svg'];

						// Check if the file extension is valid
						if (in_array($extension, $validExtensions)) {
							$softwareName = $mappedItem['software_name'];
							$imageFilename = str_replace(' ', '_', strtolower($softwareName)) . '_logo.' . $extension;

							// Check if the file already exists
							if (!file_exists($page->root() . '/' . $imageFilename)) {
								uploadAsync($imageUrl, $page, $imageFilename);
							}

						} else {
							error_log('Invalid file extension: ' . $extension);
						}
					} else {
						error_log('Invalid software_logo_temp format: ' . $mappedItem['software_logo_temp']);
					}

					// Check for duplicates before adding
					if (!in_array($mappedItem['software_name'], $existingSoftwareNames)) {
						$mappedItem['_key'] = $item['id'] ?? uniqid();
						$jsonArray[] = $mappedItem;
						$existingSoftwareNames[] = $mappedItem['software_name'];
					}
				}
			}
		}
	}

	// Attach the image to the structure field after creating txt files with UUIDs
	foreach ($jsonArray as &$mappedItem) {
		$softwareName = $mappedItem['software_name'];
		$softwareLogoTemp = $mappedItem['software_logo_temp'] ?? ''; // Rows entered by hand may not have a temp URL
		if (preg_match('/\((https?:\/\/[^\)]+)\)$/', $softwareLogoTemp, $matches)) {
			$imageUrl = $matches[1];
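			$imagePath = parse_url($imageUrl, PHP_URL_PATH);
			// PATHINFO_EXTENSION yields '' when the URL path has no extension, instead of an undefined-index warning
			$extension = is_string($imagePath) ? strtolower(pathinfo($imagePath, PATHINFO_EXTENSION)) : '';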
			$imageFilename = str_replace(' ', '_', strtolower($softwareName)) . '_logo.' . $extension;
			
			// Attach the image to the mapped item using file ID
			$imageFile = $page->file($imageFilename);
			if ($imageFile) {
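				// Write the content file only once so an existing UUID is not replaced on every save
				$txtFilename = $imageFile->root() . '.txt';
				if (!file_exists($txtFilename)) {
					file_put_contents($txtFilename, 'Uuid: ' . Uuid::generate());
				}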
				$mappedItem['software_logo'] = $imageFile->id();
			} else {
				$mappedItem['software_logo'] = null;
			}
		} else {
			$mappedItem['software_logo'] = null;
		}
	}
	unset($mappedItem); // Break the reference left over from the by-reference foreach

	// Encode data to YAML format
	$yamlData = yaml::encode($jsonArray);

	// Update the page with the combined data
	try {
		$page->update([
			$structureFieldKey => $yamlData,
			//'gaming_software_files' => '' // Clear the file field if necessary
		]);
		error_log('Page updated successfully.');
	} catch (Exception $e) {
		error_log('Error updating fields: ' . $e->getMessage());
		throw new Exception('Error updating fields: ' . $e->getMessage());
	}
}

function uploadAsync($url, $page, $filename) {

	$filePath = $page->root() . '/' . $filename;

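	// Use a background process to download the image.
	// --fail avoids saving an HTTP error page as the image; --location follows redirects
	$cmd = "curl --silent --fail --location -o " . escapeshellarg($filePath) . " " . escapeshellarg($url) . " > /dev/null 2>&1 &";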
	exec($cmd);
}
?>
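
One possible way to handle images that have not finished downloading yet is to link only the rows whose file already exists and leave `software_logo_temp` in place for the others, so a later run can pick them up. The sketch below is plain PHP with no Kirby calls. `attachAvailableLogos()` and its parameters are hypothetical names, and the filename-to-ID map is assumed to be built by the caller.

```php
<?php

/**
 * Link every row whose image is already available and count the rest.
 * Hypothetical helper, not part of Kirby.
 *
 * @param array    $items          structure rows (software_name, software_logo_temp, ...)
 * @param array    $availableFiles map of filename => file ID for files that exist
 * @param callable $filenameFor    returns the expected filename for a row, or null
 * @return array   [updated rows, number of rows still waiting for their image]
 */
function attachAvailableLogos(array $items, array $availableFiles, callable $filenameFor): array
{
    $pending = 0;

    foreach ($items as $index => $item) {
        if (!empty($item['software_logo'])) {
            continue; // already linked in an earlier run
        }

        $filename = $filenameFor($item);
        if ($filename === null) {
            continue; // this row has no image to wait for
        }

        if (isset($availableFiles[$filename])) {
            // The image has arrived: store its ID and drop the temporary URL
            $items[$index]['software_logo'] = $availableFiles[$filename];
            unset($items[$index]['software_logo_temp']);
        } else {
            // Not downloaded yet: keep software_logo_temp so a later run retries
            $pending++;
        }
    }

    return [$items, $pending];
}
```

In the plugin, the map could presumably be built by looping over `$page->files()` and storing `$file->filename() => $file->id()`. As long as the returned pending count is above zero, the same step can simply be run again later without touching the rows that are already linked.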

I would look at the Janitor plugin (Janitor | Kirby CMS Plugins) which will be able to run such a download script more reliably than hooking into page updates.

It looks great, but I don't think I have enough knowledge to work with it.

If I had an empty example…

of how to work with it … that would be great. For example,

I would create:
a button to import all data to the page
a button to import all images from img_temp, assign them to the structure field, and update the page

But I don't understand what should go in config / site/commands…

Have you had a look at the plugin docs? GitHub - bnomei/kirby3-janitor: Kirby 3 Plugin for running commands like cleaning the cache from within the Panel, PHP code, CLI or a cronjob

There @bnomei has put a number of examples.
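
As a starting point, a command file might look roughly like the sketch below. It is modelled on the example command in the Janitor README as far as I remember it, so please check it against the current docs. The file name `import-logos.php` is made up, and `processAndDownload()` is the function from the plugin posted above.

```php
<?php
// site/commands/import-logos.php (hypothetical file name)

use Bnomei\Janitor;
use Kirby\CLI\CLI;

return [
    'description' => 'Download pending logos and attach them to the structure field',
    // Adds the standard Janitor arguments such as --page and --data
    'args' => [] + Janitor::ARGS,
    'command' => static function (CLI $cli): void {
        $page = page($cli->arg('page'));

        if ($page === null) {
            // Report the problem back to the Panel button
            janitor()->data($cli->arg('command'), [
                'status'  => 404,
                'message' => 'Page not found',
            ]);
            return;
        }

        // Reuse the import function defined in the plugin above
        processAndDownload($page);

        janitor()->data($cli->arg('command'), [
            'status'  => 200,
            'message' => 'Import finished',
        ]);
    },
];
```

The matching Panel button would then be a blueprint field with `type: janitor` and something like `command: 'import-logos --page {{ page.uuid }}'`. A button click still runs inside a normal request, so for a long import the same command is probably better started from the command line or a cronjob.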

I worked on this together with ChatGPT 4o ) and it tried to create something, but what it produced does not work at all )