I'm using the Box/Spout library and it seems that using StyleBuilder with a custom hex color (e.g. 0000FF for blue) uses a ton of memory compared to using pre-defined colors such as Color::BLUE. Why would that be?
Relevant snippet:
//LOW MEMORY
$row[] = WriterEntityFactory::createCell('test', (new StyleBuilder())->setFontColor(Color::BLUE)->build());
//HIGH MEMORY
$row[] = WriterEntityFactory::createCell('test', (new StyleBuilder())->setFontColor($colorHex)->build());
Output:
setFontColor(Color::BLUE): Peak memory usage: 1666 KB
setFontColor($colorHex): Peak memory usage: 189436 KB
Full code:
(For demo purposes I'm loading a small 250x150 image to provide multiple color values)
<?php
require_once 'Spout/Autoloader/autoload.php';
use Box\Spout\Writer\Common\Creator\WriterEntityFactory;
use Box\Spout\Common\Entity\Style\Color;
use Box\Spout\Writer\Common\Creator\Style\StyleBuilder;
//load an image
$img = imagecreatefrompng('input/test250x150.png');
$writer = WriterEntityFactory::createXLSXWriter();
$writer->openToFile('output/MyExcel.xlsx');
//height of the image
for($y=0; $y<150; $y++) {
//create or reset array to hold this row's cells
$row = [];
//width of the image
for($x=0; $x<250; $x++) {
//gets the pixel color
$index = imagecolorat($img, $x, $y);
$colorRGBArr = imagecolorsforindex($img, $index);
$colorHex = sprintf("%02x%02x%02x", $colorRGBArr['red'], $colorRGBArr['green'], $colorRGBArr['blue']);
//LOW MEMORY
//$row[] = WriterEntityFactory::createCell('test', (new StyleBuilder())->setFontColor(Color::BLUE)->build());
//HIGH MEMORY
$row[] = WriterEntityFactory::createCell('test', (new StyleBuilder())->setFontColor($colorHex)->build());
}
$writer->addRow(WriterEntityFactory::createRow($row));
}
$writer->close();
echo 'Peak memory usage: '.round(memory_get_peak_usage() / 1024).' KB';
?>
tl;dr
While the
Spout
can be improved, Excel isn't designed for large quantities of styles, so this isn't really a flaw of the library (you might want to rescind the issue)The Story
Alright, there's a few things at play here.. the code I used to test is at the bottom of my post - the key relevant color functions are
jonEg
andjonEgQuant
(which is a$by
variable inside that can be tuned)Styling in Excel is a little like HTML+CSS. Styles are defined initially in the file (in a header block) and then referenced by name (class) in the sheets. Fundamentally this suggests the Excel storage format was not designed for a plethora of styles, but only a handful that are reused on many cells
Spout
hides the style-definition complexity from the user - if the style's already defined (ie. an exact style already exists) then it reuses that definition, otherwise a new style is definedSpout
builds an index of all styles, by serializing the style object and storing it in an array. At approximately 1749 bytes per style this index chews up memory (in addition to the actual style objects it has in memory to generate the file) - this could be improved inSpout
With this code I'm guaranteeing each cell has a different style (using the row/column position to define the red and green components of the color) - which is more extreme than your image-color-picking, but perhaps not by much (I cannot tell without the sample image). There are 16M colors available with an RGB triplet.. but our eyes cannot always detect a difference of a few points. For example it'll be clear in a red gradient that 255,254,253...128 looks smooth, but a single block of 255,254,253 randomly distributed will probably look like a single color. For anything but a primary color, this variance now applies in three dimensions (r, g, & b). The JPEG format exploits this (a combination of compression, and noise in rebuilding) - so color picking what might look like a uniform block will still return slightly different values each time
When you run my code with
jonEg
you get (100*150+1) 15001 styles, which takes (15001*1749/1024/1024) ~25MB of memory for the index inSpout
alone. Again this index is needed to prevent every single cell in Excel having it's own style (but of course, in this contrived example I've made sure it's pointless - every cell does have it's own style) - this uses ~100MB of memoryWhen you run my code with
jonEgQuant
(leaving$by=16;
) only 71 styles are needed (but this is pretty extreme rounding, allowing only 4096 colors in total), - this uses ~2MB of memoryWhen you set
$by=4
(insideColorBuilder::jonEgQuant
) - you now have up to quarter of a million colours, 989 styles are needed, using ~7MB of memory (more to the point, this looks similar tojonEg
when opened in Excel)Spout
includes a method to convert RGB decimal values into a stringColor::rgb($r,$g,$b)
- but this doesn't change the outcome (spoutDoc
method)A style has many more dimensions than this - I've used background colour, but there's foreground (your example), underline, bold, italic, font face, font-size - all of these multiply the style quantity (reducing the color space here addresses your documented problem, but could be undone by altering other style components)
Take Aways
It's not the use of hex codes, RGB values or constants that's the issue, it's the quantity of styles you're using
Reduce the number of styles you use - Excel doesn't expect a large count,
Spout
likewise isn't optimized for this. Compressing the color (by rounding values as withjonEgQuant
) is one avenueImprove the
Spout
style index mechanism, but notice only some of the memory is consumed by the index - it's not small, but it's not the only contributor - each style object is needed to generate the result. Removing this style cache (which for this contrived example doesn't help) brings memory usage to 40MB (and generates the same Excel file). 60% savings isn't nothing, but more styles makes for more memory usage, larger Excel files, and slower opening/rendering of those filesTesting notes
I dropped the image reading/color harvesting while testing since it may add to memory usage but not to the meat of the problem
With
$colCount=250;
(from your example) you exceed the 128MB default memory limit in PHP, better to set it$colCount=100;
and you can run all the tests without changing that settingI switched to setting background-color, it's easier to see the difference when opened in Excel
For each color generator (
jonEg
,spoutDoc
,fixedEg
,jonEgQuant
) a different XLSX is generated (helps to see the file size and rendering differences)The file sizes match the Excel complexity -
jonEg
is a 179KB file (and Excel struggles to open), whilejonEgQuant
is 43KBCode
Tech:
PHP 7.4.2 CLI, Spout: 3.1.0, Win: 7 x64 (I know), Coffee: Venti Dark