A name does not guarantee safety. Haskell and type safety

Haskell developers talk a lot about type safety. The Haskell community champions the ideas of "encoding invariants in the type system" and "making illegal states unrepresentable." It sounds like an inspiring goal! However, it is not entirely clear how to achieve it. Almost a year ago I published the article "Parse, don't validate" as a first step towards filling this gap.



The article was followed by productive discussions, but we never reached a consensus on the correct use of the newtype construct in Haskell. The idea is simple enough: the newtype keyword declares a wrapper type that is distinct in name but representationally equivalent to the type it wraps. At first glance, this is a straightforward way to achieve type safety. For example, consider how a newtype declaration might be used to define the type of an email address:



newtype EmailAddress = EmailAddress Text
      
      





This technique provides some value, and when combined with a smart constructor and an encapsulation boundary, it can even provide some safety. But this is a meaningfully different kind of type safety from the one I described a year ago, and a much weaker one. By itself, a newtype is just a name.



And names are not type safety.
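As a point of reference, here is a minimal sketch of what "a newtype plus a smart constructor" could look like for the email example. The names mkEmailAddress and emailToString are hypothetical, the validation rule is deliberately naive, and String stands in for Text to keep the sketch dependency-free.

```haskell
-- In a real project this would live in its own module whose export list
-- omits the data constructor, for example:
--   module EmailAddress (EmailAddress, mkEmailAddress, emailToString) where

newtype EmailAddress = EmailAddress String
  deriving (Eq, Show)

-- Hypothetical smart constructor. The check is deliberately naive:
-- exactly one '@' with non-empty text on both sides.
mkEmailAddress :: String -> Maybe EmailAddress
mkEmailAddress s =
  case break (== '@') s of
    (local, '@' : domain)
      | not (null local), not (null domain), '@' `notElem` domain
      -> Just (EmailAddress s)
    _ -> Nothing

-- The only sanctioned way to get the underlying string back out.
emailToString :: EmailAddress -> String
emailToString (EmailAddress s) = s
```

Any safety here comes from the hidden constructor and the check inside mkEmailAddress, not from the newtype declaration itself.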



Intrinsic and extrinsic safety



To show the difference between constructive data modeling (discussed in more detail in the previous article) and newtype wrappers, let's look at an example. Suppose we want a type for "an integer from 1 to 5 inclusive". The natural approach in constructive modeling is an enumeration with five cases:



data OneToFive
  = One
  | Two
  | Three
  | Four
  | Five
      
      





Then we would write several functions to convert between Int and the OneToFive type:



toOneToFive :: Int -> Maybe OneToFive
toOneToFive 1 = Just One
toOneToFive 2 = Just Two
toOneToFive 3 = Just Three
toOneToFive 4 = Just Four
toOneToFive 5 = Just Five
toOneToFive _ = Nothing

fromOneToFive :: OneToFive -> Int
fromOneToFive One   = 1
fromOneToFive Two   = 2
fromOneToFive Three = 3
fromOneToFive Four  = 4
fromOneToFive Five  = 5
      
      





This would be quite enough to achieve the stated goal, but in practice such a type is inconvenient to work with. Since we have invented a completely new type, we cannot reuse the usual numeric functions that Haskell provides. Hence, many developers would prefer to use a newtype wrapper instead:



newtype OneToFive = OneToFive Int
      
      





As in the first case, we can declare functions toOneToFive and fromOneToFive with identical types:



toOneToFive :: Int -> Maybe OneToFive
toOneToFive n
  | n >= 1 && n <= 5 = Just $ OneToFive n
  | otherwise        = Nothing

fromOneToFive :: OneToFive -> Int
fromOneToFive (OneToFive n) = n
      
      





If we put these declarations in a separate module and choose not to export the OneToFive constructor, the two APIs are completely interchangeable. It might seem that the newtype version is simpler and just as type-safe. However, this is not quite true.



Let's imagine that we are writing a function that takes a OneToFive value as an argument. With constructive modeling, such a function only needs to pattern-match on each of the five constructors, and GHC will accept the definition as exhaustive:



ordinal :: OneToFive -> Text
ordinal One   = "first"
ordinal Two   = "second"
ordinal Three = "third"
ordinal Four  = "fourth"
ordinal Five  = "fifth"
      
      





The newtype version is different. The newtype is opaque, so the only way to observe it is to convert it back to an Int. Of course, an Int can hold many values besides 1 through 5, so we have to add a catch-all pattern for the remaining possibilities:



ordinal :: OneToFive -> Text
ordinal n = case fromOneToFive n of
  1 -> "first"
  2 -> "second"
  3 -> "third"
  4 -> "fourth"
  5 -> "fifth"
  _ -> error "impossible: bad OneToFive value"
      
      





In this contrived example, you might not see the problem. But it nonetheless demonstrates a key difference in the guarantees provided by the two approaches:



  • The constructive data type captures its invariants in such a way that they are available to downstream code. This frees the ordinal function from handling invalid values, since they are no longer expressible.
  • The newtype wrapper provides a smart constructor that validates the value, but the boolean result of that check is used only for control flow; it is not preserved in the function's result. Accordingly, downstream code cannot take advantage of the check and the restriction it establishes; from then on, it is effectively working with a plain Int.


Losing exhaustiveness checking might seem like a minor detail, but it is not: our use of error has punched a hole right through the type system. If we were to add another constructor to the OneToFive datatype, the version of ordinal that consumes the constructive datatype would immediately be flagged as non-exhaustive at compile time. Meanwhile, the version that uses the newtype wrapper would continue to compile, but would fail at runtime by falling into the "impossible" case.



This is all a consequence of the fact that constructive modeling is intrinsically type-safe; that is, the safety properties are enforced by the type declaration itself. Invalid values really are impossible to represent: there is no way to represent 6 using any of the five constructors.
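One way to see this concretely is to ask GHC to enumerate the type. The sketch below adds derived Enum and Bounded instances, which are my addition for illustration and not part of the definition above.

```haskell
data OneToFive
  = One
  | Two
  | Three
  | Four
  | Five
  deriving (Show, Eq, Enum, Bounded)

-- Every value that can inhabit the type (ignoring bottom values such as
-- undefined). There is simply nothing here that could stand for 6.
allValues :: [OneToFive]
allValues = [minBound .. maxBound]
```

Evaluating allValues yields exactly the five constructors, in declaration order.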



The same is not true of the newtype declaration, since it has no intrinsic semantic difference from Int; its meaning is specified extrinsically through the toOneToFive smart constructor. Any semantic distinction intended by the newtype is invisible to the type system. It exists only in the programmer's mind.
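A short sketch makes the point: wherever the data constructor is in scope (which always includes the defining module), the type checker accepts an out-of-range value without complaint. The name outOfRange is hypothetical, and the derived Show and Eq instances are added only so the value can be inspected.

```haskell
newtype OneToFive = OneToFive Int
  deriving (Show, Eq)

-- The smart constructor enforces the range at runtime...
toOneToFive :: Int -> Maybe OneToFive
toOneToFive n
  | n >= 1 && n <= 5 = Just (OneToFive n)
  | otherwise        = Nothing

-- ...but the type itself knows nothing about that range. This definition
-- type-checks even though it violates the intended invariant.
outOfRange :: OneToFive
outOfRange = OneToFive 6
```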



Revisiting non-empty lists



The OneToFive datatype is contrived, but the same considerations apply to other, more realistic scenarios. Consider the NonEmpty type I wrote about earlier:



data NonEmpty a = a :| [a]
      
      





For the sake of comparison, let's imagine a version of NonEmpty declared as a newtype over ordinary lists. We can use the usual smart constructor strategy to enforce the desired non-emptiness property:



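import Data.Foldable (toList)

newtype NonEmpty a = NonEmpty [a]

nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty [] = Nothing
nonEmpty xs = Just $ NonEmpty xs

instance Foldable NonEmpty where
  -- foldr completes the minimal definition required by Foldable
  foldr f z (NonEmpty xs) = foldr f z xs
  toList (NonEmpty xs) = xs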
      
      





As with OneToFive, we quickly discover the consequences of failing to preserve this information in the type system. We wanted to use NonEmpty to write a safe version of head, but the newtype version requires another assertion:



head :: NonEmpty a -> a
head xs = case toList xs of
  x:_ -> x
  []  -> error "impossible: empty NonEmpty value"
      
      





This may not seem to matter much, since it is so unlikely that such a case could ever occur. But that argument depends entirely on trusting the correctness of the module that defines NonEmpty, while the constructive definition requires trusting only GHC's type checker. Since we generally assume that the type checker works correctly, the latter is far more compelling evidence.
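For comparison, here is a sketch of head written against the constructive definition. The fixity declaration is my addition (it mirrors the one used by Data.List.NonEmpty in base); everything else follows from the data declaration shown earlier.

```haskell
import Prelude hiding (head)

data NonEmpty a = a :| [a]

infixr 5 :|

-- Total by construction: the only pattern is the only constructor,
-- so there is no "impossible" branch and no call to error.
head :: NonEmpty a -> a
head (x :| _) = x
```

No assertion is needed here, because the empty case cannot be written down in the first place.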



Newtypes as tokens



If you are fond of newtypes, this whole argument may be troubling. It may sound as if I am saying that newtypes are hardly better than comments, albeit comments that happen to be meaningful to the type checker. Fortunately, the situation is not that bad: newtypes can provide a kind of safety, just a weaker one.



Abstraction boundaries are the main source of the safety that newtypes provide. If the newtype constructor is not exported, the type becomes opaque to other modules. The module that defines the newtype (its "home module") can take advantage of this to create a trust boundary, where internal invariants are enforced by restricting clients to a safe API.



We can use the NonEmpty example above to illustrate this technique. We refrain from exporting the NonEmpty constructor and provide head and tail operations that we trust never to actually fail:



module Data.List.NonEmpty.Newtype
  ( NonEmpty
  , cons
  , nonEmpty
  , head
  , tail
  ) where

import Prelude hiding (head, tail)

newtype NonEmpty a = NonEmpty [a]

cons :: a -> [a] -> NonEmpty a
cons x xs = NonEmpty (x:xs)

nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty [] = Nothing
nonEmpty xs = Just $ NonEmpty xs

head :: NonEmpty a -> a
head (NonEmpty (x:_)) = x
head (NonEmpty [])    = error "impossible: empty NonEmpty value"

tail :: NonEmpty a -> [a]
tail (NonEmpty (_:xs)) = xs
tail (NonEmpty [])     = error "impossible: empty NonEmpty value"
      
      





Because the only way to create or use NonEmpty values is through the functions exported by the Data.List.NonEmpty.Newtype API, the above implementation prevents clients from violating the non-emptiness invariant. Values of opaque newtypes are like tokens: the implementing module issues tokens through its constructor functions, and these tokens have no intrinsic meaning. The only way to do anything useful with them is to hand them back to the module's accessor functions, in this case head and tail, to retrieve the values they contain.



This approach is fundamentally weaker than using a constructive datatype, because the module author could make a mistake and accidentally provide a way to create the invalid value NonEmpty []. For this reason, the newtype approach to type safety is not in itself a proof that the desired invariant holds.
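To illustrate how easily such a mistake can slip in, consider a hypothetical helper, keepIf, that a maintainer might add to the home module. It is not part of the API shown above; it is only a sketch of the kind of bug the type checker cannot catch. Show and Eq are derived here only so that the result can be inspected.

```haskell
newtype NonEmpty a = NonEmpty [a]
  deriving (Show, Eq)

nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty [] = Nothing
nonEmpty xs = Just (NonEmpty xs)

-- Hypothetical helper. It type-checks, but it silently breaks the
-- invariant whenever no element satisfies the predicate.
keepIf :: (a -> Bool) -> NonEmpty a -> NonEmpty a
keepIf p (NonEmpty xs) = NonEmpty (filter p xs)
```

With this definition, fmap (keepIf even) (nonEmpty [1, 3]) evaluates to Just (NonEmpty []), a value the module was supposed to make impossible.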



However, this approach confines the places where an invariant violation can originate to the defining module. To be reasonably confident that the invariant actually holds, it is enough to test the module's API thoroughly using fuzzing or property-based testing.
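As a rough sketch of what such testing might look like, the block below states two properties of the NonEmpty API and checks them over a small, hand-rolled set of inputs. In practice a library such as QuickCheck would usually generate the inputs; the property names and the sample space are assumptions of mine, chosen to keep the example dependent only on base.

```haskell
import Prelude hiding (head, tail)

newtype NonEmpty a = NonEmpty [a]

cons :: a -> [a] -> NonEmpty a
cons x xs = NonEmpty (x : xs)

nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty [] = Nothing
nonEmpty xs = Just (NonEmpty xs)

head :: NonEmpty a -> a
head (NonEmpty (x : _)) = x
head (NonEmpty [])      = error "impossible: empty NonEmpty value"

tail :: NonEmpty a -> [a]
tail (NonEmpty (_ : xs)) = xs
tail (NonEmpty [])       = error "impossible: empty NonEmpty value"

-- Property: head and tail recover exactly what cons was given.
prop_consRoundTrip :: Int -> [Int] -> Bool
prop_consRoundTrip x xs = head ne == x && tail ne == xs
  where
    ne = cons x xs

-- Property: nonEmpty rejects the empty list and nothing else, and the
-- accessors reconstruct the original list.
prop_nonEmpty :: [Int] -> Bool
prop_nonEmpty xs = case nonEmpty xs of
  Nothing -> null xs
  Just ne -> not (null xs) && head ne : tail ne == xs

-- A small, fixed input space standing in for generated test data.
sampleLists :: [[Int]]
sampleLists = [take n [k ..] | n <- [0 .. 4], k <- [-1, 0, 7]]

allPropertiesHold :: Bool
allPropertiesHold =
  all prop_nonEmpty sampleLists
    && and [prop_consRoundTrip x xs | x <- [-1, 0, 7], xs <- sampleLists]
```

Checks like these raise confidence in the trusted module, but unlike a constructive type they never amount to a proof.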



This tradeoff can be a very good one. Guaranteeing invariants through constructive data modeling can be difficult, so it is not always practical. However, we need to be careful not to accidentally provide a mechanism that breaks the invariant. For example, a developer might take advantage of GHC's convenient typeclass deriving to derive a Generic instance for NonEmpty:



{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic)

newtype NonEmpty a = NonEmpty [a]
  deriving (Generic)
      
      





Just that one line provides a simple mechanism for circumventing the abstraction boundary:



ghci> GHC.Generics.to @(NonEmpty ()) (M1 $ M1 $ M1 $ K1 [])
NonEmpty []
      
      





This is a particularly extreme example, since derived Generic instances fundamentally break abstraction. However, the same problem can arise in other, less obvious ways. For example, with a derived Read instance:



ghci> read @(NonEmpty ()) "NonEmpty []"
NonEmpty []
      
      





To some readers these pitfalls may seem obvious, but safety holes of this kind are very common in practice. This is especially true for data types with more complex invariants, since it can be hard to determine whether the module's implementation actually upholds them. Proper use of this technique requires care and attention:



  • All invariants must be clear to the maintainers of the trusted module. For simple types such as NonEmpty, the invariant is obvious, but for more complex types, comments are needed.
  • Every change to the trusted module needs to be reviewed carefully, since it may weaken the desired invariants.
  • You should refrain from adding unsafe escape hatches that could compromise the invariants if misused.
  • Periodic refactoring may be required to keep the trusted area small. Otherwise, the responsibilities of the trusted module accumulate over time, sharply increasing the likelihood that some subtle interaction will violate an invariant.
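One way to keep a Read instance without opening the hole shown earlier is to write it by hand so that parsing goes through the smart constructor. The sketch below is only one possible design: it assumes the textual format is the plain list syntax (for example "[1,2]") rather than the "NonEmpty [1,2]" form a derived instance would accept.

```haskell
newtype NonEmpty a = NonEmpty [a]
  deriving (Eq)

nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty [] = Nothing
nonEmpty xs = Just (NonEmpty xs)

-- Hand-written instead of derived: parse an ordinary list first, then
-- keep only the parses that the smart constructor accepts.
instance Read a => Read (NonEmpty a) where
  readsPrec d s =
    [ (ne, rest)
    | (xs, rest) <- readsPrec d s
    , Just ne <- [nonEmpty xs]
    ]
```

With this instance, reads "[]" produces no successful parse at type NonEmpty Int, so the invariant survives deserialization.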


In contrast, data types that are correct by construction suffer from none of these problems. The invariant cannot be violated without changing the definition of the data type itself, and such a change ripples through the rest of the program. No developer discipline is required, because the type checker enforces the invariants automatically. There is no "trusted code" for such data types, since all parts of the program are equally subject to the restrictions the data type imposes.



In libraries, it makes sense to rely on the kind of safety that newtypes provide through encapsulation, since libraries often supply the building blocks used to construct more complex data structures. Such libraries also usually receive more scrutiny and care than application code, especially since they change much less frequently.



In application code, these techniques are still useful, but the churn of a production codebase tends to weaken encapsulation boundaries over time, so correctness by construction should be preferred whenever practical.



Other newtype uses, abuses, and misuses



The previous section describes the main way in which newtypes provide type safety. However, in practice, newtypes are routinely used in ways that do not fit the pattern described above. Some of these uses are justified, for example:



  • In Haskell, the notion of typeclass coherence restricts each type to a single instance of any class. For types that admit more than one useful instance, newtypes are the traditional solution and can be used to good effect. For example, the Sum and Product newtypes from Data.Monoid provide useful Monoid instances for numeric types.
  • Likewise, newtypes can be used to introduce or rearrange type parameters. The Flip newtype from Data.Bifunctor.Flip is a simple example: it swaps the arguments of a Bifunctor so that the Functor instance can operate on the other side:


newtype Flip p a b = Flip { runFlip :: p b a }
      
      





Newtypes are necessary for this kind of manipulation because Haskell does not yet support type-level lambda expressions.



  • Transparent newtypes can be used to discourage misuse when a value needs to be passed between distant parts of a program and the intermediate code has no reason to inspect it. For example, a ByteString containing a secret key could be wrapped in a newtype (with the Show instance omitted) to discourage code from accidentally logging or otherwise exposing it.
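The first and last of these uses can be sketched briefly. Sum and Product are the real newtypes from Data.Monoid; SecretKey and revealSecretKey are hypothetical names, and String stands in for ByteString so that the sketch depends only on base.

```haskell
import Data.Monoid (Product (..), Sum (..))

-- Two different Monoid instances for the same numbers, selected by newtype.
total :: Int
total = getSum (foldMap Sum [1, 2, 3, 4])

factorial4 :: Int
factorial4 = getProduct (foldMap Product [1, 2, 3, 4])

-- Hypothetical secret wrapper. No Show instance is derived on purpose,
-- so an accidental `print key` or `show key` fails to compile.
newtype SecretKey = SecretKey String

-- The one explicit, greppable place where the secret is unwrapped.
revealSecretKey :: SecretKey -> String
revealSecretKey (SecretKey k) = k
```

Here total is 10 and factorial4 is 24. Note that SecretKey only discourages exposure: the constructor is still visible, so nothing stops a determined caller from unwrapping it.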


All of these practices are good, but they have nothing to do with type safety. The last point is often mistaken for safety, and it does use the type system to help avoid logic errors. However, it would be wrong to claim that such use actually prevents misuse; any part of the program can inspect the value at any time.



Too often, this illusion of safety leads to outright abuse of newtypes. For example, here is a definition from a codebase I personally work on:



newtype ArgumentName = ArgumentName { unArgumentName :: GraphQL.Name }
  deriving ( Show, Eq, FromJSON, ToJSON, FromJSONKey, ToJSONKey
           , Hashable, ToTxt, Lift, Generic, NFData, Cacheable )
      
      





In this case, the newtype is a pointless step. Functionally, it is completely interchangeable with the underlying Name type, so much so that it derives a dozen typeclasses! Everywhere the newtype is used, it is immediately unwrapped as soon as it is extracted from its enclosing record. So there is no type safety benefit in this case. Moreover, it is not clear why the newtype should be labeled ArgumentName at all when the field name already makes its role clear.



It seems to me that this use of newtypes arises from a desire to use the type system as a taxonomy of the world. An argument name is a more specific concept than a generic name, so surely it should have its own type. This sounds reasonable, but it is misguided: a taxonomy is useful for documenting a domain of interest, but not necessarily useful for modeling it. When programming, we use types for different purposes:



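  • Primarily, types distinguish functional differences between values. A value of type NonEmpty a is functionally different from a value of type [a] because it is fundamentally different in structure and permits additional operations. In this sense, types are structural; they describe what values are inside the programming language.
  • Secondarily, types help us avoid accidental logic errors. We might use separate Distance and Duration types so that we do not accidentally do something nonsensical like adding them together, even though both are represented as numbers.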


Note that both of these goals are pragmatic; they treat the type system as a tool. This is a fairly natural attitude, since a static type system literally is a tool. Nevertheless, this point of view seems surprisingly uncommon, even though using types to classify the world routinely produces useless noise like ArgumentName.
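The second, error-avoiding purpose can be sketched with Distance and Duration wrappers. The units, the Double representation, and the helper names addDistance and averageSpeed are all assumptions made for illustration.

```haskell
newtype Distance = Distance Double -- metres (assumed unit)
  deriving (Show, Eq)

newtype Duration = Duration Double -- seconds (assumed unit)
  deriving (Show, Eq)

-- Adding two distances makes sense, so we provide it.
addDistance :: Distance -> Distance -> Distance
addDistance (Distance a) (Distance b) = Distance (a + b)

-- Average speed in metres per second.
averageSpeed :: Distance -> Duration -> Double
averageSpeed (Distance d) (Duration t) = d / t

-- Rejected by the type checker, which is the whole point:
-- nonsense = addDistance (Distance 100) (Duration 9.58)
```

This catches accidental mix-ups, but it does not make invalid values unrepresentable: both wrappers still admit any Double, including negative ones.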



A newtype is probably not very useful if it is completely transparent and is wrapped and unwrapped at will. In this particular case, I would eliminate the distinction altogether and use Name, but in situations where the different label adds genuine clarity, you can always use a type alias:



type ArgumentName = GraphQL.Name
      
      





Newtypes like these are security blankets. Forcing programmers to jump through a few hoops is not type safety. Trust me, they will happily jump through them without a second thought.



Conclusion and recommended reading



I have long wanted to write an article on this topic. It is probably a rather unusual piece of advice about newtypes in Haskell. I chose to present it this way because I write Haskell for a living and constantly run into these problems in practice. In fact, the main idea runs much deeper.



Newtypes are one of the mechanisms for defining wrapper types. This concept exists in almost every language, even those that use dynamic typing. If you don't write Haskell, much of this article still likely applies to the language of your choice. You could say that this is a continuation of one idea that I have tried to convey in different ways over the past year: type systems are tools. We need to be more conscious and deliberate about what types actually provide and how to use them effectively.



The reason for writing this article was the recently published article "Tagged is not a Newtype". It is a great post, and I fully agree with its main idea. But I thought the author missed an opportunity to make a bigger point. In fact, Tagged is a newtype by definition, so the title of that article points us in the wrong direction. The real problem goes a little deeper.



Newtypes are useful when applied carefully, but safety is not something they provide by default. We do not believe that the plastic a traffic cone is made of provides road safety by itself. What matters is putting the cone in the right context! Without that context, a newtype is just a label, a way of giving something a name.



And a name is not type safety!


