No more reasons not to try it!

October 25, 2019

Disclaimer

I may be a GraphQL evangelist

A collective-intelligence-driven civic start-up that develops participatory applications.

  • 25 people 🏃
  • 9 devs 👩‍💻

 Consultation

republique-numerique.fr

🇫🇷 Le Grand Débat National

granddebat.fr

  • + 1.9 M contributions
  • + 10,000 local events
  • + 2.7 M visitors

Allows clients to ask what they want.
 

Easily aggregate data from multiple sources.

 

Uses a type system to describe the data.

 

A GraphQL request is called a query and looks like this:

query {
  users {
    name
    votes {
      id
      value
    }
  }
}

➡️

👏 The way we ask for the data is the way we receive it.

{
    "data": {
        "users": [
            {
                "name": "John Doe",
                "votes": [
                    {
                        "id": "1",
                        "value": 1
                    },
                    {
                        "id": "2",
                        "value": 0
                    }
                ]
            },
            {
                "name": "Foo Bar",
                "votes": []
            }
        ]
    }
}

A query supports arguments. For example, to display a specific user, you can pass an id argument to the user field:

query {
  user(id: "user1"){
    id
    name
    votes(first: 1) {
      value
    }
  }
}

✌️ Arguments let us refine our queries: filter, sort, paginate…

{
  "data": {
    "user": {   
        "id": "user1",
        "name": "John Doe",
        "votes": [
            { "value": 1 }
        ]
    }
  }
}


Introducing REST fanboy

GraphQL is a Facebook thing, why should I care…

Some GraphQL users

Some public GraphQL APIs

 

An open and neutral home for the GraphQL community

I already have a REST API…

REST and GraphQL are not enemies

Supporting multiple protocols for a single API is ideal from the consumer’s perspective.

GitHub is using their GraphQL (v4) API to back some parts of their REST (v3) API.

 

It's a very smart way of handling multiple protocols without the maintenance effort!

Dogfooding

eating your own dog food

Do you use your own API?

Welcome to frontend development

Current status of Frontend requirements

Realtime

 

Offline support

 

Local First Architecture

 

Optimistic Updates

{
  "users": [
    {
      "id": "1",
      "votes": [
        {
          "id": "1"
        }
      ]
    }
  ]
}

query {
  users {
    id
    votes {
      id
    }
  }
}

Client Caching

Global Unique Cache Key

Refetch identifier

Opaque to clients

public function resolveUserId(User $user): string
{
   return \base64_encode('User:' . $user->getId());
}

Global Object Identification

 Associate a unique id to each object in order to be able to identify and retrieve it again regardless of its type.

{
  "users": [
    {
      "id": "VXNlcjox",
      "votes": [
        {
          "id": "Vm90ZTox"
        }
      ]
    }
  ]
}

query {
  users {
    id
    votes {
      id
    }
  }
}
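The encoding used by resolveUserId above can be generalized to any type. The following is a minimal sketch, assuming the "Type:id" convention shown on the slide; the helper names toGlobalId and fromGlobalId are hypothetical and not taken from the talk.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: builds an opaque global ID from a type name and a local ID.
function toGlobalId(string $type, string $id): string
{
    return base64_encode($type . ':' . $id);
}

/**
 * Hypothetical helper: splits a global ID back into [type, local ID],
 * so the server can refetch any object from its ID alone.
 *
 * @return array{0: string, 1: string}
 */
function fromGlobalId(string $globalId): array
{
    $decoded = base64_decode($globalId, true);
    if ($decoded === false || strpos($decoded, ':') === false) {
        throw new InvalidArgumentException('Invalid global ID: ' . $globalId);
    }

    // Limit to 2 parts so that local IDs may themselves contain ":".
    return explode(':', $decoded, 2);
}

echo toGlobalId('User', '1'), "\n";                  // VXNlcjox
echo implode(' / ', fromGlobalId('Vm90ZTox')), "\n"; // Vote / 1
```

Clients should keep treating these IDs as opaque strings; only the server decodes them.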

Relay Store

All records are normalized in the store, which is why each record needs a globally unique ID.

GraphQL Response

{
  "users": [
    {
      "id": "VXNlcjox",
      "name": "John Doe",
      "votes": [
        {
          "id": "Vm90ZTox",
          "value": 0
        }
      ]
    }
  ]
}

Client Store

{
  "VXNlcjox": {
    "name": "John Doe",
    "votes": [
      { "__ref": "Vm90ZTox" }
    ]
  },
  "Vm90ZTox": {
    "value": 0
  }
}
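To make the normalization step concrete, here is a minimal sketch that turns the nested response into the flat store shown on the slide. Relay itself is a JavaScript library; PHP is used here only to stay consistent with the other examples, and the function names normalizeValue and normalizeRecord are hypothetical. The sketch assumes every record carries an "id" field.

```php
<?php
declare(strict_types=1);

/**
 * Normalizes any value: scalars are kept, records (arrays with an "id")
 * are moved into the store and replaced by a reference, lists are walked.
 */
function normalizeValue($value, array &$store)
{
    if (!is_array($value)) {
        return $value;
    }
    if (isset($value['id'])) {
        return normalizeRecord($value, $store);
    }
    $items = [];
    foreach ($value as $key => $item) {
        $items[$key] = normalizeValue($item, $store);
    }

    return $items;
}

/** Stores one record under its global ID and returns ["__ref" => id]. */
function normalizeRecord(array $record, array &$store): array
{
    $id = $record['id'];
    // Merge with fields already known for this ID (e.g. from another query).
    $fields = $store[$id] ?? [];
    foreach ($record as $name => $value) {
        if ($name !== 'id') {
            $fields[$name] = normalizeValue($value, $store);
        }
    }
    $store[$id] = $fields;

    return ['__ref' => $id];
}

$response = ['users' => [[
    'id' => 'VXNlcjox',
    'name' => 'John Doe',
    'votes' => [['id' => 'Vm90ZTox', 'value' => 0]],
]]];

$store = [];
foreach ($response['users'] as $user) {
    normalizeRecord($user, $store);
}
```

Because each record lives in exactly one place, updating "Vm90ZTox" once updates it everywhere it is referenced.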

GraphQL Frontend DX

👍 Clients ask for exactly what they want.

💖 The schema facilitates communication with backend devs.

😍 Removes most of the data-fetching code.

🚆 Optimistic UI and cached data.

✅ Generate Flow/TypeScript typings from the strongly-typed schema.

I don't want to rewrite all my code

GraphQL fits very well on a monolith; that is even the case for most users.

It also integrates very well in front of microservices and REST APIs.

Did Facebook rewrite all their code to use GraphQL?

No, because it's only a thin layer.

[Diagram: Client → GraphQL → Business Logic → Storage Layer]

[Diagram: Client → GraphQL Gateway → REST API and Services]

GraphQL breaks caching

What kind of caching?

Server-side caching ✅

Client-side caching ✅

HTTP caching ❓

HTTP Caching

Verb   Operation   Cacheable
GET    Read        Yes
POST   Write       No

Cache-Control: max-age=<seconds>

GraphQL and HTTP

The same URL is called for different queries, producing different results.

 

GraphQL commonly uses the POST verb.

Cache duration depends on the fields in the response.

We have 2 problems: differentiating mutations from queries, and caching POST responses.

Verb   Operation   Cacheable
POST   Query       Yes
POST   Mutation    No

GraphQL doesn't say much about transport; it's up to you. POST is not the only way to use GraphQL.

#1 Differentiating mutations and queries

The Hacker method

Always use POST, but send an extra header with every query:

X-HTTP-Method-Override: GET

#1 Differentiating mutations and queries

The Regex method

You can use a regex to make sure mutations are always sent to the backend.

sub vcl_recv {

  #...

  # Always send GraphQL mutations to the backend.
  if (bodyaccess.rematch_req_body("mutation") == 1) {
      return (pass);
  }

  return (hash);
 
}

#2 POST requests cannot be cached

When everyone says it's impossible, there is one solution left…

 

Responses to POST requests are only cacheable when they include
explicit freshness information (see Section 4.2.1 of [RFC7234]).
However, POST caching is not widely implemented. For cases where an
origin server wishes the client to be able to cache the result of a
POST in a way that can be reused by a later GET, the origin server
MAY send a 200 (OK) response containing the result and a
Content-Location header field that has the same value as the POST's
effective request URI (Section 3.1.4.2).

POST requests cannot be cached

The Fake News method

The solution is to make the request body part of the cache hash and let the normal caching logic happen.

As a result, only clients that supply the same body receive the same reply.

You can still use a regex to make sure mutations are always sent to the backend.

# The std and bodyaccess VMODs must be imported at the top of the VCL file.
import std;
import bodyaccess;

# Called at the beginning of a request, after the complete request has been received and parsed.
sub vcl_recv {

  # Never trust a client-supplied value for our internal header.
  unset req.http.X-Body-Len;

  # Only cache POST GraphQL API requests.
  if (req.method == "POST" && req.url ~ "graphql$") {

        # Will store up to 500 kilobytes of request body.
        std.cache_req_body(500KB);
        set req.http.X-Body-Len = bodyaccess.len_req_body();

        # If a client supplies a very big request (more than 500KB)
        if (req.http.X-Body-Len == "-1") {
            return (pass);
        }

        # Always send GraphQL mutations to the backend.
        if (bodyaccess.rematch_req_body("mutation") == 1) {
            return (pass);
        }

        return (hash);
  }

  #...
}

# Change the hashing function to handle POST requests
sub vcl_hash {
    # To cache POST requests
    if (req.http.X-Body-Len) {
        bodyaccess.hash_req_body();
    } else {
        hash_data("");
    }
}

# On a cache miss Varnish fetches with GET by default, so restore POST.
sub vcl_backend_fetch {
    if (bereq.http.X-Body-Len) {
        set bereq.method = "POST";
    }
}
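The same decision can be sketched outside Varnish, for instance in an application-level cache. This is only an illustration of the idea (body as part of the cache key, mutations bypass the cache); the function name graphqlCacheKey and its parameters are hypothetical, and the "mutation" regex is as naive as the one in the VCL.

```php
<?php
declare(strict_types=1);

/**
 * Returns a cache key for a GraphQL POST request, or null when the
 * request must bypass the cache (oversized body or mutation).
 */
function graphqlCacheKey(string $url, string $body, int $maxBodyBytes = 500 * 1024): ?string
{
    // Very large bodies are not cached, as in the VCL.
    if (strlen($body) > $maxBodyBytes) {
        return null;
    }

    // Naive check: any body containing "mutation" goes to the backend.
    if (preg_match('/mutation/', $body) === 1) {
        return null;
    }

    // The body is part of the key: same URL + same body => same entry.
    return sha1($url . "\n" . $body);
}

$queryBody = '{"query":"query { users { name } }"}';
$mutationBody = '{"query":"mutation { vote(id: \"1\") { value } }"}';

$key = graphqlCacheKey('/graphql', $queryBody);
$bypass = graphqlCacheKey('/graphql', $mutationBody);
```

Note the tradeoff of the naive regex: a query that merely contains the word "mutation" (in a string argument, for example) is never cached, which is safe but slightly wasteful.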

✅ It's quick, simple and it works well.

We have been using this in production for a year. 🙃

But people who read RFCs will not be happy 😅


If using POST is not an option

The Query hash method

GET is a valid way to query a GraphQL server over HTTP. This means that we could indeed cache GraphQL responses.

The only issue with GET is the size limit on the query string, which varies between browsers.


query ($id: ID!) {
   node(id: $id) {
      ... on User {
         name
         createdAt
      }
   }
}

Hash of the query: 4fde400d10bdc6ca010d199cfce6091e3537d6a3

GET /graphql?query=123&id=user1

ℹ️ This requires some frontend and server tooling: the server must know how to turn hashes back into queries.
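The server-side half of that tooling can be sketched as a simple registry. This is a minimal illustration, not the tooling used in the talk: the class PersistedQueryRegistry is hypothetical, and SHA-1 is assumed only because the hash on the slide is 40 hexadecimal characters long.

```php
<?php
declare(strict_types=1);

/** Hypothetical registry mapping query hashes back to query text. */
final class PersistedQueryRegistry
{
    /** @var array<string, string> */
    private array $queries = [];

    /** Registers a query (typically at build time) and returns its hash. */
    public function register(string $query): string
    {
        $hash = sha1($query);
        $this->queries[$hash] = $query;

        return $hash;
    }

    /** Returns the query text for a hash, or null if it is unknown. */
    public function resolve(string $hash): ?string
    {
        return $this->queries[$hash] ?? null;
    }
}

$registry = new PersistedQueryRegistry();
$query = 'query ($id: ID!) { node(id: $id) { ... on User { name createdAt } } }';
$hash = $registry->register($query);

// The client only sends the short hash and the variables in the URL.
$url = '/graphql?' . http_build_query(['query' => $hash, 'id' => 'user1']);
```

A side benefit of this design is that the server can reject any hash it does not know, so clients can only run queries that were registered in advance.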

How do we know cache freshness?

It depends on every field used…

{
  contributions(first: 10) { # Can be cached for 60s
    id
    title
    votes {
      totalCount # Can be cached for 30s
    }
  }
  latestContribution { # Can be cached for 5s
    id
  }
}

type Contribution @cacheControl(maxAge: 240) {
  id: Int!
  title: String
  author: Author
  votesCount: Int @cacheControl(maxAge: 30)
}

type Query {
  latestContribution: Contribution @cacheControl(maxAge: 10)
}

A proposal from Apollo Server: a GraphQL directive describes the cache policy for each field.

The lowest value is used for the entire response.
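The "lowest value wins" rule is simple enough to sketch. The example below assumes the per-field max-age values have already been collected during execution; the function names are hypothetical, and treating a response without any hints as uncacheable (max-age 0) is an assumption of this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Returns the max-age of a whole response: the lowest max-age among
 * the fields that were resolved.
 *
 * @param array<string, int> $fieldMaxAges max-age in seconds, keyed by field path
 */
function responseMaxAge(array $fieldMaxAges): int
{
    return $fieldMaxAges === [] ? 0 : min($fieldMaxAges);
}

/** Builds the HTTP header for the computed policy. */
function cacheControlHeader(array $fieldMaxAges): string
{
    return 'Cache-Control: max-age=' . responseMaxAge($fieldMaxAges);
}

// Values taken from the comments in the query above.
$hints = [
    'contributions' => 60,
    'contributions.votes.totalCount' => 30,
    'latestContribution' => 5,
];

echo cacheControlHeader($hints), "\n"; // Cache-Control: max-age=5
```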

Is HTTP caching worth it?

Have an authenticated-only API? ❌

Have data that changes often? ❌

Highly customizable API?

Consider the tradeoffs.

GraphQL is slow

A resolver has no idea whether its data has already been loaded, whether it will be loaded later, or whether other resolvers will end up asking for the same data.

# Query.users
public function resolveUsers(): array
{
  return $this->userRepository->findAll();
}
# User.name
public function resolveUserName(User $user): string
{
  return $user->getName();
}
# User.votes
public function resolveUserVotes(User $user): array
{
  return $this->votesRepository->findByUser($user);
}

N+1 problem

query {                      
  users {    # fetches users (1 query)
    name       
    votes {  # fetches votes for each user 
      id     # (N queries for N users)
      value
    }
  }
}            # Therefore = N+1 round trips

Data-fetching problem that occurs when you need to fetch related items.

How do I make GraphQL efficient?

Batching: within a single query, run one batch operation instead of many small lookups.

SELECT * FROM votes WHERE id IN ('1', '2', '3');
-- Instead of:
SELECT * FROM votes WHERE id = '1';
SELECT * FROM votes WHERE id = '2';
SELECT * FROM votes WHERE id = '3';

DataLoader, a data loading mechanism.

It has nothing to do with GraphQL but pairs well with it.

//  User.votes
public function resolveUserVotes(User $user): Promise
{
  return $this->userVotesDataLoader->load($user->getId());
}

load() takes the loading key for the data the caller is interested in and returns a promise, which will eventually be fulfilled with the data the caller asked for.

This method is used within resolvers.

class UserVotesDataLoader {
 
 public function load($key): Promise
 {
   // Adds the key to an eventual batch and returns a promise
 }
}

A batch loading function accepts an array of keys and returns a promise that resolves to an array of values.

class UserVotesDataLoader {

 // Receives all keys that were asked to be loaded
 public function all(array $keys): Promise
 {
   // Resolve data using batching
   $votes = $this->repository->findVotesForUsersIds($keys);

   // Create an array of values (key => votes), in the same order as $keys
   $results = array_map(
     function ($key) use ($votes) { /* your logic */ },
     $keys
   );

   // Fulfills resolver promises with the data they asked for.
   return $this->promiseAdapter->all($results);
 }

}

Lazy execution: we take an asynchronous approach to resolvers. Resolvers no longer always return a value; they can return a kind of "incomplete result" (a promise).

 

Are our database engineers happy?

It's better, but what about duplicates…

Memoization caching: within the same query, keep the result of a job in memory to avoid running it twice.

Application caching: we can also use a DataLoader key to cache its result between requests.

DataLoader enables caching by default.
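Batching, lazy execution and memoization can be seen together in a small self-contained sketch. This is not the DataLoader library: the class SimpleBatchLoader is hypothetical, and a plain closure (a "thunk") stands in for the promise that a real loader would return.

```php
<?php
declare(strict_types=1);

/** Hypothetical loader: queues keys, then resolves them in one batch call. */
final class SimpleBatchLoader
{
    /** @var callable(array): array returns values in the same order as the keys */
    private $batchFn;
    /** @var array<int|string, true> keys waiting for the next batch */
    private array $pending = [];
    /** @var array<int|string, mixed> memoized results for this request */
    private array $cache = [];
    public int $batchCalls = 0;

    public function __construct(callable $batchFn)
    {
        $this->batchFn = $batchFn;
    }

    /** Queues the key and returns a thunk; nothing is fetched yet (lazy). */
    public function load($key): callable
    {
        if (!array_key_exists($key, $this->cache)) {
            $this->pending[$key] = true;
        }

        return function () use ($key) {
            $this->dispatch();

            return $this->cache[$key];
        };
    }

    /** Runs the batch function once for all keys queued so far. */
    private function dispatch(): void
    {
        if ($this->pending === []) {
            return;
        }
        $keys = array_keys($this->pending);
        $this->pending = [];
        $this->batchCalls++;
        $values = ($this->batchFn)($keys);
        foreach ($keys as $i => $key) {
            $this->cache[$key] = $values[$i] ?? null;
        }
    }
}

$votesByUser = ['user1' => [1, 0], 'user2' => []];
$loader = new SimpleBatchLoader(function (array $userIds) use ($votesByUser): array {
    // Stands in for a single "WHERE user_id IN (...)" query.
    return array_map(fn ($id) => $votesByUser[$id] ?? [], $userIds);
});

// Three resolvers ask for data; "user1" is requested twice.
$first = $loader->load('user1');
$second = $loader->load('user2');
$again = $loader->load('user1');

$results = [$first(), $second(), $again()];
```

All three requests are served by one call to the batch function, and the duplicate key is answered from the in-memory cache.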

GraphQL lets anyone query for everything

Limiting query complexity and depth

Sending a heavy query can consume too many resources, for example: user ➡️ friends ➡️ friends ➡️ friends …

One way to prevent this is to run a cost analysis before execution and set a limit.

# app/config/config.yml

overblog_graphql:
    security:
        query_max_complexity: 1000
        query_max_depth: 10
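To show what a depth limit measures, here is a minimal sketch that computes the depth of a query before executing it. It does not reproduce how the bundle configured above counts depth: the query is represented as a nested array of field names (an assumption made to keep the example self-contained), and the function names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Returns the nesting depth of a selection, where each key is a field
 * name and each value is the array of its sub-selections.
 */
function queryDepth(array $selection): int
{
    $depth = 0;
    foreach ($selection as $children) {
        $depth = max($depth, 1 + queryDepth($children));
    }

    return $depth;
}

/** Rejects the query before execution if it is nested too deeply. */
function assertDepthAllowed(array $selection, int $maxDepth): void
{
    $depth = queryDepth($selection);
    if ($depth > $maxDepth) {
        throw new RuntimeException("Query depth $depth exceeds the limit of $maxDepth.");
    }
}

// user ➡️ friends ➡️ friends ➡️ friends ➡️ name
$heavyQuery = ['user' => [
    'name' => [],
    'friends' => ['friends' => ['friends' => ['name' => []]]],
]];

$depth = queryDepth($heavyQuery);
```

A complexity limit works the same way, except that each field adds a cost to a running total instead of a level to the depth.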

How do I implement authentication?

Authentication is independent of GraphQL.

How do I implement authorization?

class VoteResolver {

  // Single source of truth for fetching
  public function fetch(string $id): ?Vote {
    $vote = $this->repository->find($id); // Nullable
    if (!$vote) return null;

    return $vote;
  }
}

class VoteResolver {

  // Single source of truth for fetching
  public function fetch(string $id): ?Vote {
      $vote = $this->repository->find($id); // Nullable
      if (!$vote) return null;
      
      // Single source of truth for authorization
      $canSee = checkCanSee($vote);
      return $canSee ? $vote : null;
  }
}

function checkCanSee(Vote $vote): bool {
  return true;
}

class VoteResolver {

  // Single source of truth for fetching
  public function fetch(Viewer $viewer, string $id): ?Vote
  {
      $vote = $this->repository->find($id); // Nullable
      if (!$vote) return null;
      
      // Single source of truth for authorization
      $canSee = checkCanSee($viewer, $vote);
      return $canSee ? $vote : null;
  }
}

function checkCanSee(Viewer $viewer, Vote $vote): bool {
  return $vote->getAuthor()->getId() === $viewer->getId();
}

A Vote can only be seen by its creator.

Authorization workflow

  1. You run the authentication logic 🛡️
  2. You get a viewer, made available to resolvers through the context
  3. Your resolvers must use a single source of truth for fetching (from the business logic layer) 💡
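The workflow above can be exercised end to end with a small runnable sketch. The Viewer, Author and Vote classes and the in-memory array are hypothetical stand-ins for the real entities and repository; only checkCanSee and the shape of fetch come from the slides.

```php
<?php
declare(strict_types=1);

final class Viewer
{
    private string $id;
    public function __construct(string $id) { $this->id = $id; }
    public function getId(): string { return $this->id; }
}

final class Author
{
    private string $id;
    public function __construct(string $id) { $this->id = $id; }
    public function getId(): string { return $this->id; }
}

final class Vote
{
    private Author $author;
    public function __construct(Author $author) { $this->author = $author; }
    public function getAuthor(): Author { return $this->author; }
}

// Single source of truth for authorization.
function checkCanSee(Viewer $viewer, Vote $vote): bool
{
    return $vote->getAuthor()->getId() === $viewer->getId();
}

final class VoteResolver
{
    /** @var array<string, Vote> in-memory stand-in for the repository */
    private array $votes;

    public function __construct(array $votes) { $this->votes = $votes; }

    // Single source of truth for fetching: every resolver goes through here.
    public function fetch(Viewer $viewer, string $id): ?Vote
    {
        $vote = $this->votes[$id] ?? null;
        if ($vote === null) {
            return null;
        }

        return checkCanSee($viewer, $vote) ? $vote : null;
    }
}

$resolver = new VoteResolver(['vote1' => new Vote(new Author('user1'))]);

$seenByAuthor = $resolver->fetch(new Viewer('user1'), 'vote1');
$seenByOther = $resolver->fetch(new Viewer('user2'), 'vote1');
```

Returning null for both "not found" and "not allowed" means a client cannot tell whether a vote it may not see exists at all.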

GraphQL is impossible to monitor

Monitoring: REST vs. GraphQL

For an internal API, it's simple: give each of your queries a name, then use it as the name of the transaction.

💡 There is an ESLint rule for that.

query ProposalListViewPaginatedQuery {
    # your query
}
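On the server, the operation name has to be extracted before it can be reported to a monitoring tool. The following is a rough sketch under stated assumptions: the function names and the "graphql/" prefix are hypothetical, and the regex only handles a query that starts with the operation keyword (a real implementation would read the name from the parsed document instead).

```php
<?php
declare(strict_types=1);

/** Returns the operation name of a GraphQL document, or null if anonymous. */
function operationName(string $query): ?string
{
    $pattern = '/^\s*(?:query|mutation|subscription)\s+([_A-Za-z][_0-9A-Za-z]*)/';

    return preg_match($pattern, $query, $matches) === 1 ? $matches[1] : null;
}

/** Builds the transaction name to report to the monitoring tool. */
function transactionName(string $query): string
{
    return 'graphql/' . (operationName($query) ?? 'anonymous');
}

$named = "query ProposalListViewPaginatedQuery {\n  # your query\n}";
$anonymous = 'query { users { name } }';

echo transactionName($named), "\n";     // graphql/ProposalListViewPaginatedQuery
echo transactionName($anonymous), "\n"; // graphql/anonymous
```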

Inspirations for this talk

Dan Schafer

Marc-André Giroux

Thanks!

Any questions?

We work hard to update our democracy (with GraphQL)… 👍

Aurélien David - @spyl94 - spyl.net - aurelien@cap-collectif.com

GraphQL, no more reasons not to try it!

By Aurélien David
