Pure IO
Pure functions are functions whose result totally depends on its arguments and it doesnt make any side effecting changes. One of the important property of such functions is given same arguments would give back same results. Lot of languages don’t have restrictions on functions to have that property. But its nice to have that in place since its much easier to reason about functions like that. We have enough tools in mathematics to actually talk about those functions, since all mathematical functions are pure.
IO is one of those computational problem which is hard to model it that way. Most programming languages have a function to get a line of input from the user. How can we we implement such a function in a language which supports only pure functions? Is it even possible to model IO as pure functions?
I would try to model something like that step by step.
Get Line
Let’s consider the case of a programming language which allows only one line of input to be read, that too at the start of the program, and prints out a result at the end. We can then easily model that by making the main function of the program take a string argument and produce a string. Runtime of such a programming language would read a string, pass it on to the function and prints out whatever the function returns.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
trait PureIOProgram {
def main(arg:String):String
}
class Runtime {
def run(program:PureIOProgram): Unit = {
println(program.main(new Scanner(System.in).nextLine()))
}
}
object Runtime {
def main(args:Array[String]): Unit = {
new Runtime().run(new SquareNumber())
}
}
And consider this is the starting point of the program, and everything above is done under the hood for you.
1
2
3
class SquareNumber extends PureIOProgram {
override def main(arg: String): String = (arg.toInt * arg.toInt).toString
}
Interaction
IO usually is more interactive than that. Like we take input do some processing, get more input or output and in arbitrary combinations of it. How do we model something like that?
We are going to make use of something we did in last example. Instead of returning the string to output, if we can return some datastructure which lets us know whether to take more input or output. This datastructure would let the runtime exactly know what to do next, in what sequence. Like.,
1
2
3
4
5
sealed trait IO {}
case class InputLine(fn:String => IO) extends IO
case class OutputLine(value:String, fn:() => IO) extends IO
case object Exit extends IO
InputLine
takes a function which takes the read input line and produces more IO
actions. OutputLine
outputs whatever value is and calls the function for more IO
. And Exit
simply exits.
Now our interactive program would look like
1
2
3
4
5
6
7
8
9
class SquareNumber extends PureIOProgram {
override def main = {
OutputLine("Enter a number: ", () => {
InputLine(line => {
OutputLine("The square of the number is " + (line.toInt *line.toInt), () => Exit)
})
})
}
}
Note that none of the functions, even the anonymous functions inside the IO
actions are impure. IO is done in a symbolic way rather than in
And our runtime
1
2
3
4
5
6
7
8
9
10
11
12
class Runtime {
val scanner = new Scanner(System.in)
def run(program:PureIOProgram): Unit = {
exec(program.main)
}
def exec:PartialFunction[IO, Unit] = {
case InputLine(fn) => exec(fn(scanner.nextLine()))
case OutputLine(line, fn) => println(line); exec(fn())
case Exit =>
}
}
Higher order functions
If we look at InputLine
and OutputLine
, the sequencing logic is almost the same. Once we add more io actions like spinning up threads (Yes we can treat them as IO), leaving that responsibility to individual actions doesnt make lot of sense. But thats the only way of composition we provide. In-order to have better ways compose and have proper responsibility segregation we introduce 2 more IO actions.
1
2
case class Sequence[T,K](action:IO[T], fn: T => IO[K]) extends IO[K]
case class Return[T](value:T) extends IO[T]
The use case for Return
becomes explicit when we try to compose couple of more IO
actions. These 2 operations (Return
and Sequence
) together form, what is called a Monad. Which is nothing but any structure which has a flatMap
and map
operation. List, Option, Future etc are all Monad.
After some more restructuring of IO
and addition of flatMap
and map
results in our IO
being
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
case class IOResult[T](value:T)
object IO {
trait IO [T] {
def flatMap[L](fn: T => IO[L]):IO[L] = {
Sequence(this, fn)
}
def map[L](fn: T => L):IO[L] = Sequence(this, {v:T => Return(fn(v))})
def exec:IOResult[T]
}
private case object InputLine extends IO[String] {
val scanner = new Scanner(System.in)
override def exec: IOResult[String] = IOResult(scanner.nextLine())
}
private case class Sequence[T,K](action:IO[T], fn: T => IO[K]) extends IO[K] {
override def exec: IOResult[K] = fn(action.exec.value).exec
}
private case class OutputLine(line:String) extends IO[Unit] {
override def exec: IOResult[Unit] = {
println(line)
IOResult(Unit)
}
}
private case class Return[T](value:T) extends IO[T] {
override def exec: IOResult[T] = IOResult(value)
}
def printLine(line:String):IO[Unit] = OutputLine(line)
def getLine:IO[String] = InputLine
def returnn[T](value:T):IO[T] = Return(value)
}
Note that, we would no longer expose InputLine
or OutputLine
or Sequence
to outside world. We only expose IO
interface outside. Hence its safe to have polymorphic behaviours here.And every IO
action should do the same too. Since exec
is impure, as long as exec
isnt directly used by the programs in our language, they remain completely pure.
1
2
3
4
5
6
7
8
9
class SquareNumber extends PureIOProgram {
override def main = {
for(
_ <- printLine("Enter a number:");
line <- getLine;
_ <- printLine("The square of the number is " + (line.toInt * line.toInt))
) yield ()
}
}
Scala’s for
syntax is transformed to flatMap
s except for the last one which get converted to map
incase there is an yield
. Now our SquareNumber
program is lot more easier to read, except ofcourse the _
to eat away the printLine
result. Haskell does it much better, with its do
syntax we can just ignore results.
Also note that the program is still pure, though it might look like it ain’t. Any reasoning we can apply on pure functions can be applied to any part of the program.
Complete source code is here. Individual sections are available as seperate branches.