Exactly what rules must a function abide before we can call it "idempotent"?

835 Views Asked by At

A post from another thread says that a function is said to be idempotent if it can be called multiple times without changing the result.

However the terms used (like no-side-effects and return-same-results) are relatively ambiguous. Consider this piece of code:

public class test {

    int x = 0;
    java.util.Random r = new java.util.Random();

    public int F() {
        return x + 1;
    }

    public void F2() {
        x = r.nextInt();
    }
}

Can we say that F() is idempotent because successive calls to F() returns the same value?

Or is it not idempotent since successive calls to F() does not return the same value if F2() is called inbetween?

PS: "idempotent" as defined in computer science, not maths.

4

There are 4 best solutions below

19
On

Your function is not idempotent. It can return different results. Given the same input, an idempotent function always returns the same output.

More explicitly, if f() is idempotent (according to the computer science definition) then:

int result1 = f(x);
// ... some arbitrary code
int result2 = f(x);
assert(result1 == result2);
19
On

I'm going to disagree (actually, I now see that I agree with them!) with the other answers and say that, in the most common computer science use (function calls with side-effects, not functional programming), a function is idempotent if you can safely replace any invocation of the function with calling the function twice and keeping only the second result.

For example, consider two functions for deleting a file by name:

1) A function that returns success if the file does not exist. (Since the purpose of a delete operation is to make the file not exist.)

2) A function that returns a "file does not exist" error if the file does not exist. (Since the file could not be deleted.)

The first one is idempotent. If you call it and ignore the result, you can call it again and still get correct information. The file does not exist when you are done.

The second one is not idempotent. If you call it once and ignore the result, your second call will fail, making you think you did not delete the file.

A function to get the current time is idempotent under this definition, even though the result may be different. There is no harm to calling the function twice.

The concept is important in client-server protocols. Say you send a command and get no reply, maybe the connection breaks, maybe the server crashed. So you send the command again and get a reply. But on the first command, was the command lost or the reply? If the command is idempotent, it doesn't matter. You can just use the result.

If a protocol guarantees that all operations are idempotent, lower-level code can retry failed operations, switch servers, and otherwise try to "make things work" without breaking the semantics of the operations.

It takes some doing to make an idempotent protocol. For example, you might wonder how you make a sensible "delete file" operation. One way is to assign each file a unique ID that changes when the file is deleted and recreated. Then you split a delete into two halves. The first, "get ID from name" is idempotent and fails if the file does not exist. The second, "delete ID if it exists" is idempotent and succeeds if you, or anyone else, deleted the file. (The one quirk is this doesn't tell you for sure that you were the one to delete the file.) The combination of the two idempotent operations provides the desired non-idempotent delete operation.

1
On

Trying to summarize things that have come up in other answers and in comments:

There is only one definition of "idempotent". A function f is idempotent if and only if f(f(x)) equals f(x) for all x in the domain of f.

There is more than one definition of "equals". In a lot of contexts, we have an idea of "equivalence" which stands in for equality, and the definition of "equivalent" might be different in different contexts.

There is more than one definition of "function". In mathematics (with a conventional set-theoretic construction), a function is a set of pairs. The "domain" of the function is the set of all elements that appear in the first position of a pair. No element of the domain appears in the first position of more than one pair in the function. The "range" of the function is the set of all elements that appear in the second position of a pair. Elements of the range may appear more than once. We say that a function "maps" each element of its domain to a particular element of its range, and we write f(x) to mean "the second element of the pair in f that has x as its first element".

So, it is clear that for a function to be idempotent, its range must be a subset of its domain. Otherwise, f(f(x)) is meaningless.

In computing, and particularly in imperative languages, a function is often defined as a sequence of statements/instructions, together with some named inputs and output(s) (in most languages only one output). "Calling" a function is an imperative operation that means to execute the instructions. But instructions in an imperative language can have side-effects: they can change things other than their outputs. This concept is absent from mathematics, as well as from pure functional programming.

These imperative "functions", which I will refer to from now on as "routines", can be reconciled with the mathematical definition of a function in two ways:

  1. ignore the side-effects, and say that the routine is a function whose domain is the set of all possible combinations of argument values, and which maps those to the outputs of the routine. This stands on weak theoretical ground if the function is not "pure", that is to say if its output depends on mutable state beyond its arguments, or if it modifies state beyond its outputs. The reason is that a mathematical function by definition does not map its inputs to different outputs at different times. Nor does a mathematical function "change" things when "called", because mathematical functions aren't "called" a particular number of times. They simply "are".

  2. incorporate the side-effects into a mathematical function that describes the effect that calling the routine has on the complete state of the machine, including the outputs from the routine but also including all global state and whatnot. This is a standard trick in CS, and it means that for every statement, instruction, call to a routine, or whatever, there is a corresponding function that maps the state of the machine before the call, to the state of the machine afterwards.

Now, if we apply the definition of "idempotent" in case 1, we are assessing whether or not the mathematical function that a particular routine was designed to implement is idempotent. If the routine does anything other than implement a mathematical function, for example if it has any side-effects, then we are on very shaky ground here, and will come up with misleading results. For example, the function int f(int i) { puts("hello!"); return i; } could be regarded as being idempotent on the basis that "it's an implementation of the identity function!". And that's true if you ignore the side-effects, but that means the definition is useless for any practical purposes, because once side effects are taken into account, executing the expression f(f(0)) is a different thing from executing the expression f(0). f(f(0)) is not equivalent to f(0) even though their return values are equal, and we could only replace one with the other if we didn't care about (that part of) the output of the program.

If we apply the definition of "idempotent" to the functions of machine states in case 2, we are assessing whether or not a call to the function (with particular arguments) is an idempotent operation on the state of the machine. Then my function f above clearly is not idempotent -- the state of a machine with "hello!\n" written to its output device is not the same as the state of a machine with "hello!\nhello!\n" written to its output device. I think it's also clear that in this case, your function F is idempotent (although it isn't "pure", since its return value depends on state other than its formal parameters, and so it is not simply an implementation of a mathematical function), and your function F2 is not idempotent. If test were immutable, then we could reasonably start describing F as pure. F2 then would be invalid.

As far as I know, when compscis talk about idempotence in imperative languages, they are usually talking about whether the functions of machine states defined by case 2, are or are not idempotent. But usage can vary -- if the routine is pure then they might talk about whether the mathematical function it represents is idempotent. In a pure functional language there is no machine state to talk about, so case 2 would be inappropriate, and any use of the term "idempotent" in relation to a function must be case 1. Functions in pure functional languages are always like mathematical functions.

1
On

I have also been trying to figuring out what idempotency actually means and I realized that there are multiple definitions of idempotency floating around. There are two camps that definitions follow, either the definition for mathematical and function programming functions or the computer science definition.

Mathematical definition: f(f(x)) = f(x) for any value x. In other words, a function is idempotent if the effect of the function is invariant under composition.

Computer Science definition: A function is idempotent if "the side effect of N > 0 identical requests is the same as for a single request". In other words, a function is idempotent if the effect is invariant over the number of calls.

For example, take an increment function defined as int f(int x) { return x+1; }. This function will fail the mathematical definition as it is not invariant under composition because f(f(x)) != f(x). On the other hand, it fits the computer science definition because, as Steve McLeod mentioned,

int result1 = f(x);
// ... some arbitrary code
int result2 = f(x);
assert(result1 == result2);

Now, back to your question, is F() in your example idempotent? I am saying yes F() is idempotent but a sequence of calls may not be. According to the HTTP/1.1 protocol definition of idempotency, "it is possible a sequence of several requests is not idempotent even if all methods executed in that sequence are idempotent".

This is possible because you have to consider the state of the program as a hidden parameter to the function F(). For example, consider an example sequence of requests of F(), F(), F2(), F(). The last F() request will not produce the same result as the first two but that is ok because the request was not identical. You must consider the state of the program as a hidden parameter to the function and in the last request, the state was that x was equal to a new random value, but in the first requests the state of x was initially zero.

Sources: