Date: 2023-08-02

Israeli researchers at graphics giant Nvidia have unveiled a new text-to-image (T2I) generating model that is small enough to fit on a floppy diskette.

The first double-sided double-density 5¼-inch floppies with a marketed capacity of 360KB were launched in 1978.

Nvidia’s Perfusion model is just 100KB, taking up less than 30% of the space available on a regular 5¼-inch floppy disk.
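The comparison is simple to check. The short Python sketch below uses only the two sizes quoted above; the function name is illustrative.

```python
# Sizes as quoted in the article, both in kilobytes.
MODEL_SIZE_KB = 100
FLOPPY_CAPACITY_KB = 360


def fraction_used(model_kb: float, disk_kb: float) -> float:
    """Return the share of the disk the model would occupy."""
    return model_kb / disk_kb


share = fraction_used(MODEL_SIZE_KB, FLOPPY_CAPACITY_KB)
print(f"{share:.1%} of the disk")  # 27.8% of the disk
```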

The model can produce images after just four minutes of training.

The researchers said Perfusion could creatively portray personalised objects, with the ability to make significant changes in their appearance while maintaining their identity.

Perfusion can also combine individually learned concepts into a single generated image.

Most popular T2I models, including Stable Diffusion and Dall-E, have billions of parameters, so they occupy multiple gigabytes when stored offline.
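To see why parameter counts in the billions translate into gigabytes, the rough sketch below multiplies a parameter count by the bytes used to store each value. The one-billion figure is a round number chosen for illustration, not the size of any specific model, and the estimate ignores file-format overhead.

```python
def model_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Hypothetical model with one billion parameters (illustrative assumption).
PARAMS = 1e9
size_fp16 = model_size_gb(PARAMS, bytes_per_param=2)  # 16-bit weights
size_fp32 = model_size_gb(PARAMS, bytes_per_param=4)  # 32-bit weights

print(f"16-bit: {size_fp16:.1f} GB, 32-bit: {size_fp32:.1f} GB")
```

Every additional billion parameters adds roughly 2GB at 16-bit precision, or 4GB at 32-bit.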

To develop its light and efficient model, the Nvidia team created a new mechanism called ‘Key-Locking’.

This addresses issues typically faced by T2I models, including maintaining high visual fidelity while allowing creative control and the combination of multiple personalised concepts into a single image.

“Perfusion avoids overfitting by introducing a new mechanism that ‘locks’ new concepts’ cross-attention keys to their superordinate category,” the researchers explained.

“Additionally, we develop a gated rank-1 approach that enables us to control the influence of a learned concept during inference time and to combine multiple concepts.”

“This allows runtime-efficient balancing of visual-fidelity and textual alignment with a single 100KB trained model, which is five orders of magnitude smaller than the current state of the art.”
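For readers who want a feel for the two ideas in the quotes, the toy Python sketch below shows a generic gated rank-1 edit to a projection matrix and uses it to pin a new concept's key to the key of its broader category. It is an illustration under stated assumptions, not Nvidia's implementation: the dimensions, the random vectors, the scalar gate and every name in it are hypothetical, and the paper's actual gating and training procedure are not reproduced here.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gated_rank1_update(W, i_star, target, gate=1.0):
    """Return a copy of W edited so that W_new @ i_star moves toward target.

    gate=0 leaves the output for i_star unchanged; gate=1 maps it exactly
    to target. The change to W is a rank-1 matrix (outer product).
    """
    current = matvec(W, i_star)
    norm_sq = sum(x * x for x in i_star)
    return [
        [w + gate * (t - c) * x / norm_sq for w, x in zip(row, i_star)]
        for row, t, c in zip(W, target, current)
    ]


rng = random.Random(0)
D_TEXT, D_ATTN = 8, 4  # toy dimensions (assumption)


def rand_vec(n):
    return [rng.gauss(0, 1) for _ in range(n)]


W_K = [rand_vec(D_TEXT) for _ in range(D_ATTN)]  # stand-in key projection
W_V = [rand_vec(D_TEXT) for _ in range(D_ATTN)]  # stand-in value projection
e_concept = rand_vec(D_TEXT)  # embedding of the new concept (hypothetical)
e_super = rand_vec(D_TEXT)    # embedding of its category word (hypothetical)

# "Key locking" in this toy: the concept's key is pinned to the category's key.
locked_key = matvec(W_K, e_super)
W_K_new = gated_rank1_update(W_K, e_concept, locked_key, gate=1.0)

# The value pathway gets a concept-specific output; a random vector stands in
# for what training would learn. A gate of 0.5 applies half of the edit.
learned_value = rand_vec(D_ATTN)
W_V_half = gated_rank1_update(W_V, e_concept, learned_value, gate=0.5)
```

In this simplified form the gate is a single number that scales how strongly the learned concept overrides the original weights, which gives a rough sense of how a concept's influence could be dialled up or down when an image is generated.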

A research paper explaining how Perfusion works is available on Nvidia’s research website. The researchers also plan to publish the model’s code online soon.