---
author: Las Safin
date: "2021-07-23"
keywords:
- haskell
- ghci
- shell
- procex
- guide
title: Using Haskell as my shell
...

Oddly, programmers use one programming language for their shell and another one to write programs.
When we need to run a lot of external commands, we use a shell scripting language,
and when we need to write algorithms, we use a "real" programming language.

The core difference can be summarized as the lack or presence of data structures.
Bash doesn't support data structures well, whereas Haskell, or an imperative language like Python, does.

When we try to use Bash to handle structured data, it quickly goes wrong.
Take a look at [this Stack Overflow answer](https://stackoverflow.com/a/45201229) for how to split a string into an array using ", " (a comma followed by a space) as a delimiter:

```bash
readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; }' <<<"$string, "); unset 'a[-1]';
declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")
```

This is the answer buried underneath many other almost correct yet still incorrect answers.

Quite honestly, this is horrifying. What ought to be a simple function call has been mangled
into something beyond recognition.

Yet, we still use Bash. For our shell, the most important thing isn't how easy it is to split a string,
it's how quickly and easily you can run an external command.
Relying on Python for that won't go well, as evidenced by
[this Stack Overflow question](https://stackoverflow.com/questions/89228/how-to-execute-a-program-or-call-a-system-command);
the clean answers depend on `sh`!

Imagine if we could use one language for both.

There are many alternatives to Bash, but they are all fundamentally boring shells.
[Zsh](https://www.zsh.org/), [Fish](https://fishshell.com), [Oil](https://oilshell.org),
[Elvish](https://elv.sh/), [Nushell](https://www.nushell.sh/), [rc](https://doc.cat-v.org/plan_9/4th_edition/papers/rc),
[es](https://wryun.github.io/es-shell/), and [XS](https://github.com/TieDyedDevil/XS)
are domain-specific languages, and offer no real value as "real" programming languages.
Would you write your IRC client in Elvish?

Instead of making a shell language that can do more than Bash, why don't we go the other way around,
and try making an existing language usable as a shell?

For it to be a good shell, we want to make running external commands as ergonomic as possible.
It shouldn't require multiple lines of code to read a file and pipe it to a process.

There are many languages with a REPL. If we were to make the Python REPL fit for use as a shell,
we could write a library that makes it easier to run external commands, but due to Python's syntax,
executing external processes, which is what we're likely to do the most in our shell, would still carry a great amount of overhead.

We could make a function with syntax similar to this:

```python
r("ls","-l")
```

While it's not a lot of *visual* noise, it's much slower to type than `ls -l`.
The `"`, `,`, and parentheses take up our valuable time.

Haskell, on the other hand, has a much more lightweight syntax.
Over the course of this blog post, I make the following syntax possible:

```hs
ξ#ls#_l
```

While longer than `ls -l`, for arguments without special characters and capital letters,
it grows at the same rate: 1 key stroke of overhead per argument.
Haskell's REPL, GHCi, is also quite featureful, importantly supporting path completion.

To do this, I made my own library, [Procex](https://github.com/L-as/procex).

Why did I make another library when `shh`, `shell-conduit`, `Shelly.hs`, etc. already exist?
The reason is that these solutions are all designed around
[`createProcess`](https://hackage.haskell.org/package/process-1.6.12.0/docs/System-Process.html#v:createProcess).
`createProcess` doesn't support all the features you'd expect on a Unix system, notably
passing arbitrary file descriptors through to the called process.
In addition, it has the issue that launching processes takes a non-trivial amount of time,
around 0.5 seconds on my ODROID N2 (which is unusual hardware, I admit).

On POSIX-like systems, you generally need to close all file descriptors you don't want
to pass on to the new process. These could be handles to files, pipes, etc.; notably,
stdin, stdout, and stderr are the first file descriptors (in that order).
The maximum number of file descriptors depends on your system.
On my system it's `1048576`.
`createProcess` is implemented such that it loops from `stderr+1` up to this limit,
closing every file descriptor in this range. On my ODROID N2, this takes around a second,
meaning I had to wait an extra second for every command to execute. This was not usable.

Procex doesn't have this problem. The specifics are [detailed here](vfork_close_execve.html), though
they are not of great importance to this article.

# Procex basics

Let's start off by loading Procex up in GHCi, after installing the
[`procex` package](https://hackage.haskell.org/package/procex).
This depends on your system, but if you're using cabal it will generally be:

```bash
$ cabal update
$ cabal install procex --lib procex
$ cabal install pretty-simple --lib pretty-simple # Heavily recommended, gives us Text.Pretty.Simple.pPrint
$ ghci -Wall -Wno-type-defaults -XExtendedDefaultRules -XOverloadedStrings -interactive-print Text.Pretty.Simple.pPrint
> import Procex.Prelude
> import Procex.Shell
> import Procex.Shell.Labels
```

GHCi has a couple of problems Procex helps us work around: notably,
stdin is not set to line buffering, and changing directories doesn't affect path completion.
To fix the former, we run:

```hs
initInteractive
```

This is equivalent to `hSetBuffering stdin LineBuffering`.

The latter can be fixed by doing the following:

```hs
:set prompt-function promptFunction
```

Now we can use `cd` from [`Procex.Shell`](https://hackage.haskell.org/package/procex/docs/Procex-Shell.html),
and path completion will follow your working directory,
whereas before, path completion would always be relative to the directory you started GHCi in.
As a side effect, the prompt will also be changed.

Procex has the concept of commands, which represent a process to execute,
along with the arguments and file descriptors we want to pass to it.
To create a command, we can use the [`mq`](https://hackage.haskell.org/package/procex/docs/Procex-Quick.html#v:mq)
function. After `mq` you can write the arguments you want to pass, wrapping them in quotes,
but without any commas, parentheses or similar.

Listing the current directory:

```hs
mq "ls" "-l"
```

Or if you want to use the short syntax:

```hs
mq#ls#_l
```

The labels (prefixed with `#`) are interpreted as strings, where `_` is replaced by `-`,
since `-` is illegal in labels.

The helpers you'll likely be interested in are all in
[`Procex.Quick`](https://hackage.haskell.org/package/procex/docs/Procex-Quick.html).

`diff`-ing two strings, then capturing the output:

```hs
diff :: ByteString -> ByteString -> IO ByteString
diff x y = captureLazyNoThrow $ mq "diff" (pipeArgStrIn x) (pipeArgStrIn y)
```

`cat`-ing a string:

```hs
mq "cat" <<< "Hello World!\n"
```

Piping `curl` to `kak`:

```hs
mq "kak" <| mq "curl" "-sL" "ipinfo.io"
-- The reverse will wait for curl to end instead of kak
```

`stat`-ing all the entries in your directory:

```hs
import System.Directory
listDirectory "." >>= mq "stat"
```

Piping `curl` to a file:

```hs
import qualified Data.ByteString.Lazy as B
captureLazy (mq "curl" "-sL" "ipinfo.io") >>= B.writeFile "./myip.json"
```

Piping `stdout` and `stderr` to different places:

```hs
import qualified Data.ByteString.Lazy as B
mq "nix" "eval" "nixpkgs#hello.name"
  (pipeHOut 1 $ \_ stdout -> B.hGetContents stdout >>= B.putStr)
  (pipeHOut 2 $ \_ stderr -> B.hGetContents stderr >>= B.writeFile "./log")
```

[`pipeHOut`](https://hackage.haskell.org/package/procex/docs/Procex-Process.html#v:pipeHOut)
gives us the raw handle, letting us process the data in Haskell
with all the usual Haskell libraries.

In general, it is a better idea to rely on Haskell alternatives to the tools in `coreutils`,
as those tools are designed for Bash and traditional shells:

- [`createDirectory`](https://hackage.haskell.org/package/directory-1.3.6.2/docs/System-Directory.html#v:createDirectory) instead of `mkdir`
- [`removeFile`](https://hackage.haskell.org/package/directory-1.3.6.2/docs/System-Directory.html#v:removeFile) instead of `rm`
- [`createSymbolicLink`](https://hackage.haskell.org/package/unix-2.7.2.2/docs/System-Posix-Files.html#v:createSymbolicLink) instead of `ln`
- [`replace-megaparsec`](https://hackage.haskell.org/package/replace-megaparsec)'s `streamEdit`, etc. instead of `sed`, `grep`, etc.

# Setting up your shell with Nix

You need to copy [this directory](https://github.com/L-as/procex/tree/master/example-shell),
fix `shellrcSrcPath`, then refer to the derivation built by `default.nix`
in your `environment.systemPackages`, or wherever you prefer.

The derivation produces a single file, `bin/s`, that launches your shell.

The equivalent of your `.bashrc` will be in the `ShellRC.hs` file.
GHCi commands will have to be put directly into `default.nix`.
All the imports in your `ShellRC.hs` file will also be available in the shell.

The `:li` command will reload the `ShellRC.hs` file from source instead
of using the pre-compiled version from the Nix store.

# Setting up your shell without Nix

Let's make a `$HOME/.ghci-shell.hs` file, with the same purpose
as the `.bashrc` file. For now, put this inside:

```hs
:set -Wall -Wno-type-defaults -XExtendedDefaultRules -XOverloadedStrings -interactive-print Text.Pretty.Simple.pPrint
import Procex.Prelude
import Procex.Shell
import Procex.Shell.Labels
:set prompt-function promptFunction
initInteractive
```

You can then launch your shell with:

```sh
env GHCRTS="-c" ghci -ignore-dot-ghci -ghci-script "$HOME/.ghci-shell.hs"
```

This should work fine, but your init script won't be compiled, whereas it will be with Nix.

# Speeding up typing

While the number of characters isn't very different compared to Bash,
there are some tricks to make it faster to type.

I'm using a [Japanese keyboard](https://en.wikipedia.org/wiki/File:Surface_type_cover_JIS_keyboard_layout_blue.jpg)
with the UK layout. I don't use the extra Japanese keys, so I have rebound
the `Hiragana_Katakana` key (2 keys right of space) to `"`, a valuable trick that is applicable
to Bash too and has also saved my fingers from the unnecessary pain of holding down shift.

I've also renamed `mq` to `ξ` as such:

```hs
ξ :: (QuickCmd a, ToByteString b) => b -> a
ξ = mq
```

I've bound my unused `Muhenkan` key (1 key left of space) to `ξ` to save another key stroke.

I recommend omitting extraneous spaces whenever possible, since the code in your shell
is write-once-read-never:

```hs
ξ#nix#build"nixpkgs#hello"#_o#out
```

Since I need to hold down shift to type `_`, I've mapped my unused `Henkan` key (1 key right of space)
to it to save one more key stroke.

My `.XCompose`:

```
<Henkan> : "_"
<Hiragana_Katakana> : "\""
<Muhenkan> : "ξ"
```

You're likely better off doing this by modifying your XKB layout, but I didn't want to
delve into that mess.

With this we're down to 34 key strokes on my keyboard.

The equivalent command in Bash:

```bash
nix build nixpkgs#hello -o out
```

This took me 31 key strokes, surprisingly quite close!

You could save further key strokes by renaming functions in Procex to shorter names.
However, I believe that the user should choose the names,
not just for functions from Procex, but also for other common functions they use.
I have made aliases for [`Data.ByteString.Lazy.UTF8.toString`](https://hackage.haskell.org/package/utf8-string-1.0.2/docs/Data-ByteString-Lazy-UTF8.html#v:toString),
[`Data.ByteString.Lazy.UTF8.fromString`](https://hackage.haskell.org/package/utf8-string-1.0.2/docs/Data-ByteString-Lazy-UTF8.html#v:fromString),
and some other common functions I use a lot.

# Internal design of Procex

The first step was writing my own glue code in C for interfacing with
`vfork` and `execve` to create processes, as [detailed here](vfork_close_execve.html).
You could do this in Haskell if you're careful, but file descriptors in the child,
which would effectively be another Haskell thread, would point to different things than
in the parent. This is problematic, since handles from the environment would suddenly point to different
things, but only in the child. Because of this, the code that runs in the child
before `execve` is written in C.

If you didn't bother reading the above article, the gist is that [the glue code](https://github.com/L-as/procex/blob/master/cbits/glue.c) provides
functions that combine the forking and execution, in addition to allowing file
descriptors to be set up for the child.
These functions are then bound in [`Procex.Execve`](https://github.com/L-as/procex/blob/master/Procex/Execve.hs).

We interface with it from [`Procex.Core`](https://github.com/L-as/procex/blob/master/Procex/Core.hs),
which defines the core `Cmd` type.
`Cmd` is internally `Args -> IO (Async ProcessStatus)`, where `Args` is a record of
the raw arguments to pass as `ByteString`s, the file descriptors to pass (and how to map them),
and what "executor" to use (used to allow `exec`-ing without `fork`-ing).
This design was chosen because it is easy to compose.
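
To make the "command is a function of `Args`" idea concrete, here is a toy model using only `base`. It is a sketch under loud assumptions: `Args`, `Cmd`, `toyMakeCmd`, `toyPassArg`, `toyPassFd`, and `toyRun` are hypothetical stand-ins invented for illustration, not Procex's actual definitions. In particular, the toy `Cmd` returns the final `Args` instead of launching a process, so that the effect of composition can be inspected.

```haskell
import Data.Function ((&))

-- Hypothetical, simplified stand-in for the real Args record.
data Args = Args
  { argv  :: [String]      -- arguments, executable path first
  , fdMap :: [(Int, Int)]  -- (fd number in the child, fd number in the parent)
  } deriving (Show, Eq)

-- Toy Cmd: the real one yields IO (Async ProcessStatus); this one
-- just hands back the Args that would have been used for launching.
type Cmd = Args -> IO Args

emptyArgs :: Args
emptyArgs = Args [] []

-- Root layer: runs last and puts the executable path at the front.
toyMakeCmd :: String -> Cmd
toyMakeCmd path args = pure args { argv = path : argv args }

-- Each modifier wraps the inner command and edits Args on the way in.
toyPassArg :: String -> Cmd -> Cmd
toyPassArg arg cmd args = cmd args { argv = arg : argv args }

toyPassFd :: (Int, Int) -> Cmd -> Cmd
toyPassFd mapping cmd args = cmd args { fdMap = mapping : fdMap args }

-- Running a command means feeding it an empty Args.
toyRun :: Cmd -> IO Args
toyRun cmd = cmd emptyArgs

main :: IO ()
main = do
  final <- toyRun (toyMakeCmd "ls" & toyPassArg "-l" & toyPassFd (3, 7) & toyPassArg "/tmp")
  print (argv final)   -- ["ls","-l","/tmp"]
  print (fdMap final)  -- [(3,7)]
```

Because the outermost layer runs first and every layer prepends, the arguments come out in the order they were written, and any `Cmd -> Cmd` function composes with any other without the layers knowing about each other.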

The exported functions are:

- `makeCmd' :: ByteString -> Cmd`: Takes the path to an executable and gives you a `Cmd`.
- `passArg :: ByteString -> Cmd -> Cmd`: Passes an argument.
- `passFd :: (Fd, Fd) -> Cmd -> Cmd`: Passes the second fd to the command, renaming it to the value of the first fd.
- `passArgFd :: Fd -> Cmd -> Cmd`: Passes an argument that points to the fd, while passing the fd too. This allows process substitution,
  since opening the path (`/proc/self/fd/$fd`) will open what's behind the file descriptor.
- `unIOCmd :: IO Cmd -> Cmd`: Embeds the IO action inside the `Cmd`, executing the IO action when the `Cmd` is run.
- `postCmd :: (Either SomeException (Async ProcessStatus) -> IO ()) -> Cmd -> Cmd`: Runs an IO action just after the process is launched.
- `run' :: Cmd -> IO (Async ProcessStatus)`: Runs the command and gives you the handle to a thread that's waiting for it to finish.
- `runReplace :: Cmd -> IO ()`: Replaces the current process with the process launched by the command.

Notably, `Procex.Core` does not expose any overlapping functionality, since it's only meant to expose the core interface.

These all internally wrap the original function passed, resulting in a new function that takes `Args`.
When we run a command, we simply pass it an empty `Args`; each "layer" then adds what it needs to it,
until we finally reach the root function defined in `makeCmd'`, which calls the functions defined in the glue code
(bound in [`Procex.Execve`](https://github.com/L-as/procex/blob/master/Procex/Execve.hs)).

[`Procex.Process`](https://github.com/L-as/procex/blob/master/Procex/Process.hs) provides
functionality that is commonly needed when executing processes, and wraps over `Procex.Core`.
It defines a family of `pipe*` functions, which make pipes, then pass one end of the pipe (as a file descriptor)
to the process, and the other end to something else.

In principle, we need nothing more, but this is not very ergonomic to use as a shell.
Each argument we want to pass to a process needs a `cmd & passArg "myarg"`, and
`passArg` doesn't even work well when you're in a shell.
Often, in our shell, we'll pass paths as arguments, but if you pass
non-ASCII paths to `passArg` as a literal, they will get mangled.
Each character in the string is simply truncated to its lowest 8 bits by the `IsString` implementation
of `ByteString`, since it's not UTF-8 aware, so it doesn't know how to encode
such characters into the `ByteString`.

To avoid this problem, we need a helper function that takes a `String` instead of a `ByteString`,
so that we don't use `ByteString`'s `IsString` instance.
In [`Procex.Quick`](https://github.com/L-as/procex/blob/master/Procex/Quick.hs) we define a `ToByteString` class
that has a single `toByteString` member. It has an instance for `[a]` where `a ~ Char` (defined this way to aid type defaulting),
such that we can define functions that take any `a` where `ToByteString a`.

To attain a Bash-like syntax that is more concise, a `QuickCmd` class is defined,
with `quickCmd :: QuickCmd a => Cmd -> a`.
It has three instances:

- `QuickCmd Cmd`, which uses `id` for the definition.
- `(a ~ ()) => QuickCmd (IO a)`, which uses `run` for the definition, i.e. it synchronously waits for the process to finish
  and throws if the exit code is non-zero. The reason this isn't an instance for `IO ()` is again to aid type defaulting.
- `(QuickCmdArg a, QuickCmd b) => QuickCmd (a -> b)`, which means `quickCmd cmd` can result in another function
  that takes an `a` where `QuickCmdArg a`, then returns a `b` where `QuickCmd b` again.

`QuickCmdArg` has all the instances you can guess: `String`, `ByteString`, etc.
We can't actually use `ToByteString` for our `QuickCmdArg` instances, as that would
1) require `UndecidableInstances` and 2) make type inference fail in a lot of cases.

Wrapping it all up, we have the `mq` function that wraps `makeCmd` and `quickCmd`, as shown in the basic examples.
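
The three-instance pattern described above can be sketched in isolation. The following toy uses only `base`. `ToyCmd`, `ToyQuickCmd`, `toyQuickCmd`, `toyMq`, and `render` are hypothetical names invented for this illustration, and the "run" instance merely prints the argument list rather than executing anything, so it should be read as a model of the type-class trick and not as Procex's real code.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

-- Toy command: only an argument list, most recently added first.
newtype ToyCmd = ToyCmd [String]

render :: ToyCmd -> [String]
render (ToyCmd revArgs) = reverse revArgs

class ToyQuickCmd r where
  toyQuickCmd :: ToyCmd -> r

-- Case 1: the caller wants the command itself.
instance ToyQuickCmd ToyCmd where
  toyQuickCmd = id

-- Case 2: the caller wants to run it (modelled here by printing argv).
-- Matching on `IO a` and then demanding `a ~ ()` lets inference pick
-- this instance even when the result type is otherwise unconstrained.
instance (a ~ ()) => ToyQuickCmd (IO a) where
  toyQuickCmd = print . render

-- Case 3: the caller supplies one more argument and we recurse.
instance (a ~ Char, ToyQuickCmd r) => ToyQuickCmd ([a] -> r) where
  toyQuickCmd (ToyCmd revArgs) arg = toyQuickCmd (ToyCmd (arg : revArgs))

toyMq :: ToyQuickCmd r => String -> r
toyMq path = toyQuickCmd (ToyCmd [path])

main :: IO ()
main = do
  toyMq "ls" "-l" "/tmp"              -- resolves to the IO instance
  print (render (toyMq "echo" "hi"))  -- resolves to the ToyCmd instance
```

The return type of `toyMq` decides how many arguments it accepts, which is what allows space-separated arguments with no brackets or commas.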

There are also various operators that wrap over `Procex.Process` and call `Data.ByteString.Lazy.hGetContents` for you,
e.g. `<<<`, `|>`, and `<|`.

The lazy `capture*` functions rely on attaching a finalizer to the lazy `ByteString`:

```hs
-- B is Data.ByteString.Lazy, BS is Data.ByteString
attachFinalizer :: IO () -> ByteString -> IO ByteString
attachFinalizer finalizer str = B.fromChunks <$> go (B.toChunks str)
  where
    go' :: [BS.ByteString] -> IO [BS.ByteString]
    go' [] = finalizer >> pure []
    go' (x : xs) = (x :) <$> go xs

    go :: [BS.ByteString] -> IO [BS.ByteString]
    go = unsafeInterleaveIO . go'
```

A `Data.ByteString.Lazy.ByteString` is internally isomorphic to a list of `Data.ByteString.ByteString`.
By converting it to and then from a list of such chunks, we can insert lazy IO into it using `unsafeInterleaveIO`,
executing the finalizer when we reach the nil case.

In practice this works quite well, but sometimes we don't want it to err, for example when we're using `diff`.
`diff` returns a non-zero exit code when the inputs differ, but we want to ignore that, so for each
lazy `capture*` function there is a `-NoThrow` version.
This could be extended to allow filtering which exit codes you want to ignore, but this would complicate the
"quick" module, and if you want more advanced behavior, you'd likely be better off using `Procex.Core` and `Procex.Process`
directly, then passing the resulting `Cmd -> Cmd` to `mq`.

## The label trick

[`Procex.Shell.Labels`](https://github.com/L-as/procex/blob/master/Procex/Shell/Labels.hs) contains this:

```hs
{-# OPTIONS_GHC -Wno-orphans #-}

module Procex.Shell.Labels where

import Data.Functor
import Data.Proxy (Proxy (..))
import GHC.OverloadedLabels (IsLabel (..))
import GHC.TypeLits (KnownSymbol, symbolVal)

instance (a ~ String, KnownSymbol l) => IsLabel l a where
  fromLabel =
    symbolVal (Proxy :: Proxy l) <&> \case
      '_' -> '-'
      x -> x
```

When `-XOverloadedLabels` is enabled, labels like `#label` are translated
into something like `fromLabel @"label"`.
The reason the instance is `IsLabel l a` where `a ~ String` instead of `IsLabel l String`
is that with the latter, type inference wouldn't work properly,
meaning something like `mq #echo` wouldn't type check.
With this instance, `fromLabel @"label"` will be inferred to be of the type
`String`, causing it to be evaluated as `"label"`.

This will likely conflict with other uses of labels, so you might not want it
if you use other libraries that use labels.

# Conclusion after 6 months of using Haskell as my shell

In the beginning it was certainly painful; it was as if I had to relearn how to talk.
Thankfully GHCi provides an escape hatch: `:!` allows you to shell out to `sh` easily.

In the process of switching my shell to Haskell, I also got a lot faster at writing Haskell.

Haskell is now the primary interface through which I use my computers, and it has been very pleasant.
I no longer have to deal with regexes, since I can whip out a full parser combinator library at any time.
You could likely also include a PostgreSQL library in the shell to access databases without going through the `psql` REPL.

I've also been removing my scripts one by one, replacing them with simple Haskell functions
in my `ShellRC.hs`, where they can interface with structured data rather than raw bytes.

## Future work

Advanced completion like in Fish would be quite nice, but unfortunately GHCi is a bit hard to customize
due to its integration into the GHC source code.
Perhaps a GHCi alternative external to GHC could be implemented, or the Idris REPL could be modified instead, since
it seems more amenable to customisation.