During profiling I came across a function that was taking quite a bit of time, but essentially boiled down to this very simple piece of code:
function GetSubstring(AInput: PChar; AStart, ASubstringLength: Integer): string;
begin
Result := Copy(AInput, AStart, ASubstringLength);
end;
This function returns the expected sub-string, but it doesn't scale very well for longer inputs. I looked at the assembler code in CPU view, and from what I can tell (I'm not normally working at the assembler level), it seems that AInput is implicitly converted into a string before calling Copy.
But since at this point the length of the string/character array is unknown, the conversion code has to walk the length of the PChar until it finds the null terminator. This would explain the terrible scaling for longer inputs.
However, since the caller passes in the length of the PChar, I initially thought could just convert the method to use SetString instead.
function GetSubstring(AInput: PChar; AStart, ASubstringLength: Integer): string;
begin
SetString(Result, AInput + AStart - 1, ASubstringLength);
end;
In addition to SetString working zero-based (not one-based as Copy), there seem to be a number of other little things Copy does in terms of validating its inputs, not all of which are documented (e.g. any start value less than 1 is changed to 1). So the naïve implementation above doesn't always work as the original one.
My goal is to replicate the Copy routine as much as possible, as this function is part of a library and has been used extensively by my colleagues.
I'm wondering whether the following implementation accomplishes that, or whether I need to be aware of any other caveats of Copy. NB: FLength is the actual length of AInput that comes from another part in the module this function is part of. I removed that other part for this example.
function GetSubstring(AInput: PChar; AStart, ASubstringLength: Integer): string;
begin
if (AInput = nil) then begin
Result := '';
end else begin
if (AStart < 1) then begin
AStart := 0;
end else begin
AStart := AStart - 1;
end;
if (ASubstringLength + AStart > FLength) then begin
ASubstringLength := FLength - AStart;
end;
SetString(Result, AInput + AStart, ASubstringLength);
end;
end;
I'm using Delphi 2006, but I assume this isn't much different in other versions of the product (at least the non-Unicode ones).
Let's consider the corner cases. I think they are:
AInputinvalid.AStart < 1.AStart > FLength.ASubstringLength < 0.ASubstringLength + (AStart-1) > FLength.We can ignore case 1 in my opinion. The onus should be on the caller to provide a valid
PChar. Indeed your check thatAInput <> nilis already a step too far in my view becausenilis not a validPChar.Of the rest you have covered 2 and 5, but not 3 and 4. So if the user supplies a value of
AStartthat is too large, then you will read off the end of the string. Likewise, the user can readily supply a negativeASubstringLength. I don't think that you need anyone to write the code to check these cases since you are clearly very competent.Now, if you really care about every last drop of performance, you should not be checking for any of these case. Demand that the user passes valid parameters. In debug mode, with
{$IFOPF D+}orAssertyou might check inputs. Of course, if these arguments come from external sources, then they should be validated.On the other hand, the biggest performance hit the original code suffers is the needless scanning of the entire string, and the copying to an intermediate heap allocated string. Once you've removed those, as you have, then the opportunity for further performance gains is much reduced.