STYLE.md

435 lines | UTF-8 | 13 KB

Blame

Code conventions and style guide

This is a short document describing the preferred coding style for this project. We've tried to cover the major areas of formatting and naming. When something isn't covered by this guide, err on the side of consistency with existing code.

Formatting

We use ormolu to format our code. The top-level Makefile has targets format and check-format that can be used for this. These targets will also format Nix files if nixpkgs-fmt is installed, but this is optional.

Line Length

We do not enforce a hard limit on line length, but we try to keep lines under 80 characters where possible to make it easier to tile files horizontally. It’s okay if long lines spill into the 80–100 character range if wrapping them would harm readability. In particular, we don’t want to discourage descriptive variable names, nor do we want to discourage introducing local variables, so consider 80 characters a target length rather than a limit.

One exception to the above is multiline comments, which should almost always be wrapped to 80 characters. Most editors support automatically reflowing blocks of text with a configurable line length; consider learning the relevant hotkey in your editor of choice.

Naming

Use camel case (e.g. functionName) when naming functions and upper camel case (e.g. DataType) when naming data types.

Prefer full words over abbreviations. For example, prefer the name ColumnType to ColTy and computeAggregateFunction to compAggFn. Exceptions: extremely common, idiomatic abbreviations, like using “err” to mean “error” or “ref” to mean “reference.” When in doubt about whether or not an abbreviation is well-known, prefer the full name.

Functions and variables

Avoid “Hungarian notation,” aka using short variable prefixes or suffixes based on the variable type. For example, do not prefix or suffix variables or functions with the letter “m” to indicate that they are or return Maybe values:

-- Bad
parseArgumentM :: Maybe UserRole -> Value -> Maybe Argument
parseArgumentM userRoleM value = case userRoleM of
  Just userRole -> ...
  Nothing -> ...

Instead, where possible, prefer names that convey why the value is wrapped in a Maybe, or, if Nothing is just an ordinary member of the value’s domain, don’t include any special indication in the name at all:

-- Good
parseOptionalArgument :: Maybe UserRole -> Value -> Maybe Argument
parseOptionalArgument userRole value = case userRole of
  Just knownUserRole -> ...
  Nothing -> ...

When possible, avoid the need to name intermediate results at all so that coming up with several different variants of the same name is unnecessary. For example, instead of writing

-- Bad
argumentM <- parseOptionalArgument userRole value
case argumentM of
  Just argument -> ...
  Nothing -> ...

prefer writing

-- Good
parseOptionalArgument userRole value >>= \case
  Just argument -> ...
  Nothing -> ...

which avoids the need for the extra name entirely. If such an approach isn’t practical or is hard to read, and a type-related prefix or suffix is absolutely necessary, prefer a prefix with the full name of the type over abbreviations:

-- Okay
maybeArgument <- parseOptionalArgument userRole value
doSomethingElse
case maybeArgument of
  Just argument -> ...
  Nothing -> ...

Modules

Use singular, not plural, module names, e.g. use Data.Map and Data.ByteString.Internal instead of Data.Maps and Data.ByteString.Internals.

Data types

Wrap primitives in newtype (or sum types)

It is usually a good idea to introduce a custom data type (sum types or newtype) instead of using a primitive type (like Int, String, Set Text, etc.). This prevents confusion and mistakes of using the wrong types in wrong places.

Prefer sum types to `Bool`

Avoid using Bool to represent two different states, especially if the set of possible states could potentially grow in the future. For example, instead of writing

-- bad
data LiveQuery = LiveQuery
  { ...
  , lqIsPaused :: !Bool
  , ...
  }

prefer a sum type that has meaning even without context:

-- good
data LiveQuery = LiveQuery
  { ...
  , lqState :: !LiveQueryState
  , ...
  }

data LiveQueryState
  = LQActive
  | LQPaused

Also, avoid defining boolean predicates that simply select distinguish between different constructors of a sum type. For example, avoid this:

-- bad
isPaused :: LiveQueryState -> Bool
isPaused LQPaused = True
isPaused _        = False

...

unless (isPaused queryState) do
  handleActiveQuery query

This pattern can easily bite you if you later decide to extend the sum type with a new constructor. For example, suppose LiveQueryState was extended with an LQStopped constructor. Our guard using isPaused would still compile, but it would return False for stopped queries, and we’d call handleActiveQuery even though the query is stopped!

Pattern-matching is much more robust to this kind of code evolution, and it isn’t very much more code:

-- good
case queryState of
  LQActive -> handleActiveQuery query
  LQPaused -> pure ()

Dealing with laziness

By default, use strict data types and lazy functions.

Data types

Constructor fields should be strict, unless there's an explicit reason to make them lazy. This avoids many common pitfalls caused by too much laziness and reduces the number of brain cycles the programmer has to spend thinking about evaluation order.

-- Good
data Point = Point
  { pointX :: !Double
  , pointY :: !Double
  }

-- Bad
data Point = Point
  { pointX :: Double
  , pointY :: Double
  }

Functions

Have function arguments be lazy unless you explicitly need them to be strict.

The most common case when you need strict function arguments is in recursion with an accumulator:

mysum :: [Int] -> Int
mysum = go 0
  where
    go !acc []    = acc
    go acc (x:xs) = go (acc + x) xs

(Though the above function could be written using foldl' instead, which would be preferable.)

Comments and documentation

Formatting

Use line comments (comments that start with --) for short comments (1–3 lines). For longer comments, use multiline comments (comments that begin with {- and end with -}).

Use Haddock syntax for documentation comments. Running Haddock should always complete successfully.

Module headers

Include a short purpose statement at the top of each module, usually about 1–3 sentences long. Try to describe how the contents of the module relate to each other (i.e. what criteria determine which bindings should go in this module?) and to other modules in the project (i.e. what high-level problem do the exports of this module solve?).

Examples:

-- | Types and functions for generating a GraphQL schema from information
-- about Postgres tables.
module Hasura.GraphQL.Context where

-- | Top-level HTTP routes and handlers for the server, provided as a WAI
-- application.
module Hasura.Server.App where

Certain modules may benefit from longer comments, which can serve as internal documentation for high-level architectural design choices that don’t make sense to document anywhere else. For example, the module comment for Hasura.RQL.DDL.Schema describes in a couple paragraphs the high-level concepts behind the schema cache and metadata catalog. As appropriate, link to that module documentation in other comments using the Haddock syntax of enclosing a module name in double quotes; for example:

-- | Types that represent the raw data stored in the catalog. See also the
-- module documentation for "Hasura.RQL.DDL.Schema".
module Hasura.RQL.Types.Catalog

Not every module needs documentation comment—it doesn’t make sense for small helper/utility modules, for example—but most modules should have one. A module that cannot be given a short, precise purpose statement may benefit from being broken up into multiple modules.

Functions and datatypes

Include documentation comments on function and datatype declarations to communicate additional context about their purpose or usage information, but avoid using comments as a replacement for precise names. For example, instead of writing

-- Bad

-- | Configuration options given on the command-line.
data Options = ...

prefer a more self-documenting name:

-- Good
data CommandLineOptions = ...

Likewise, avoid using comments to describe invariants of a function argument or datatype field instead of a more precise type:

-- Bad
newtype AuthenticationToken
  = AuthenticationToken
  { unAuthenticationToken :: Text
  -- ^ Invariant: contains Base64-encoded binary data.
  }

-- Good
import Data.Text.Conversions (Base64)
newtype AuthenticationToken
  = AuthenticationToken
  { unAuthenticationToken :: Base64 ByteString
  }

However, do use comments on functions and datatypes to clarify information that cannot easily be communicated via names or types:

-- Good
resolveTableReference
  :: OMap SchemaName TableName
  -- ^ All tables visible to the current user. Schemas with priority in the
  -- search path should come earlier.
  -> Maybe SchemaName -> TableName -> Maybe QualifiedTable
resolveTableReference = ...

Inline comments

Use inline comments within function bodies to provide additional context about why a function does something a particular way or to call special attention to an easily-overlooked subtlety. For example:

-- Good
allComparableTypes
  -- due to casting, we need to generate both geometry and geography operations
  -- even if just one of the two appears directly in the table schema
  | anyGeoTypes = geoTypes <> columnTypes
  | otherwise = columnTypes

Include links to external resources such as specifications or GitHub issues that provide context for decisions:

-- Good
isValidEnumName name =
  -- see https://graphql.github.io/graphql-spec/June2018/#EnumValue
  isValidName name && name `notElem` ["true", "false", "null"]

Avoid comments that state the obvious (i.e. just restate types or variable names), and prefer using informative variable or function names over comments where possible. For example:

-- Bad
runSelectQuery tables constraints cache shouldPrepare = do
  -- part 1: construct the query
  select <- buildSelect tables
  conditions <- traverse buildCondition constraints
  query <- buildQuery select conditions

  -- part 2: execute the query
  plan <- cacheLookup query (generateQueryPlan query) cache
  -- prepare if necessary
  prepared <- if shouldPrepare then prepareQueryPlan plan else pure plan
  runQueryPlan prepared

runSelectQuery tables constraints cache shouldPrepare =
  constructQuery >>= executeQuery
  where
    constructQuery = do
      select <- buildSelect tables
      conditions <- traverse buildCondition constraints
      query <- buildQuery select conditions

    executeQuery = do
      plan <- cacheLookup query (generateQueryPlan query) cache
      prepared <- if shouldPrepare then prepareQueryPlan plan else pure plan
      runQueryPlan prepared

Out-of-line `Note`s

For especially tricky details that deserve thorough explanation and may need to be referenced from multiple places, we emulate GHC’s Notes. A Note is an out-of-line comment written at the top-level of a module, written with a short title header:

{- Note [Checking metadata consistency in run_sql]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SQL queries executed by run_sql may change the Postgres schema in arbitrary
ways. We attempt to automatically update the metadata to reflect those changes
as much as possible---for example, if a table is renamed, we want to update the
metadata to track the table under its new name instead of its old one.

... -}

At any point where the comment is relevant, we add a short comment referring to the note:

-- see Note [Checking metadata consistency in run_sql]
containsDDLKeyword :: Text -> Bool
containsDDLKeyword = TDFA.match $$(quoteRegex ...)

A key advantage of notes is that they can be referenced from multiple places, so information does not necessarily need to be connected to any particular binding the way it must be for Haddock comments.

When updating a piece of code that includes a reference to a Note, take care to ensure the Note is updated as well if necessary! Inevitably, some will get stale and go out of sync with the code, but that’s okay—just fix them up when you find some information is outdated.

Misc

Point-free style

Avoid over-using point-free style. For example, this is hard to read:

-- Bad:
f = (g .) . h

Acknowledgement/Credits

Parts of this coding style guide have been adapted from:

Code conventions and style guide

Formatting

Line Length

Naming

Functions and variables

Modules

Data types

Wrap primitives in newtype (or sum types)

Prefer sum types to Bool

Dealing with laziness

Data types

Functions

Comments and documentation

Formatting

Module headers

Functions and datatypes

Inline comments

Out-of-line Notes

Misc

Point-free style

Acknowledgement/Credits

Prefer sum types to `Bool`

Out-of-line `Note`s