<!DOCTYPE html>
<html lang="en-US" prefix="og: http://ogp.me/ns#">
<head>
<meta charset="UTF-8" />
<meta name="title" property="og:title" content="Apex" />
<meta
name="description"
property="og:description"
content="Apex is an API proxy for microservices that provides one place to log and control all service-to-service traffic"
/>
<meta name="type" property="og:type" content="website" />
<meta name="url" property="og:url" content="https://apex-api-proxy.github.io/" />
<meta name="image" property="og:image" content="images/logos/apex-logo-linkedin-feature.png" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="author" content="Derick Gross, Kelvin Wong" />
<title>Apex API Proxy</title>
<link
rel="apple-touch-icon"
sizes="180x180"
href="images/icons/favicons/apple-touch-icon.png"
/>
<link
rel="icon"
type="image/png"
sizes="32x32"
href="images/icons/favicons/favicon-32x32.png"
/>
<link
rel="icon"
type="image/png"
sizes="16x16"
href="images/icons/favicons/favicon-16x16.png"
/>
<link rel="manifest" href="/images/icons/favicons/site.webmanifest" />
<link rel="mask-icon" href="/images/icons/favicons/safari-pinned-tab.svg" color="#5bbad5" />
<link rel="shortcut icon" href="/images/icons/favicons/favicon.ico" />
<meta name="msapplication-TileColor" content="#da532c" />
<meta name="msapplication-config" content="/images/icons/favicons/browserconfig.xml" />
<meta name="theme-color" content="#ffffff" />
<!-- <style>reset</style> -->
<link rel="stylesheet" href="stylesheets/reset.css" />
<link
rel="stylesheet"
href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.13.1/styles/gruvbox-dark.min.css"
charset="utf-8"
/>
<!-- <style></style> -->
<link rel="stylesheet" href="stylesheets/main.css" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.13.1/highlight.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<!-- <script></script> -->
<script src="javascripts/application.js"></script>
<style></style>
</head>
<body>
<div class="logo-links">
<p id="apex-logo">MENU</p>
<!-- <a href="https://github.com/apex-api-proxy" target="_blank">
<img src="images/logos/apex-logo.png" alt="Apex logo" id="apex-logo" />
</a> -->
<a href="https://github.com/apex-api-proxy/apex" target="_blank">
<img src="images/logos/github_black.png" alt="github logo" id="github-logo" />
</a>
</div>
<a id="toTop-link" href="#">
<img src="images/logos/back-to-top.png" alt="Back to top" id="toTop-logo" />
</a>
<nav id="site-navigation">
<ul>
<li>
<a href="#home" id="home-link">HOME</a>
</li>
<li>
<a href="#case-study" id="case-study-link">CASE STUDY</a>
<nav id="case-study-mobile">
<ul></ul>
</nav>
</li>
<li>
<a href="#our-team" id="our-team-link">OUR TEAM</a>
</li>
</ul>
</nav>
<header id="home">
<h1>
<img src="images/logos/apex-logo.png" alt="Apex logo" />
<p>API proxy for logging and controlling traffic between microservices</p>
</h1>
</header>
<section class="integration">
<div class="box">
<img
id="banner-deploy"
src="images/diagrams/gifs/apex_query_logs_2.gif"
alt="Query centralized logs for all services"
/>
</div>
<article class="box">
<div class="text-box">
<h1>Centralized logging and tracing for your microservices</h1>
<p>
No additional client libraries needed in your service code
</p>
<!-- <a class="button" href="#case-study">Learn More</a> -->
</div>
</article>
</section>
<section class="integration">
<article class="box">
<div class="text-box">
<h1>Manage all fault-handling logic (e.g. retries) in one place</h1>
<p>
Define both global traffic rules and custom rules for individual services
</p>
<!-- <a class="button" href="#case-study">Learn More</a> -->
</div>
</article>
<div class="box">
<img
id="banner-deploy"
src="images/diagrams/5.4) orders-shipping custom configuration_2.png"
class="softened"
alt="Define custom traffic rules for individual services"
/>
</div>
</section>
<section class="integration">
<div class="box">
<img
id="banner-deploy"
src="images/diagrams/gifs/apex_deploy_2.gif"
class="softened"
alt="Deploy Apex proxy with a few commands"
/>
</div>
<article class="box">
<div class="text-box">
<h1>Deploy to Docker containers with a few commands</h1>
<p>
No changes required in how your services are currently deployed
</p>
<!-- <a class="button" href="#case-study">Learn More</a> -->
</div>
</article>
</section>
<main>
<section id="case-study">
<h1>Case Study</h1>
<!-- <p class="subheader">
One place to log and control service-to-service traffic
</p> -->
<div id="side-nav">
<a href="#home">
<img src="images/logos/apex-logo.png" alt="Apex logo" />
</a>
</div>
<nav>
<ul></ul>
</nav>
<h2 id="introduction">1) Introduction</h2>
<p>
Apex is an API proxy for microservices. It provides one place to log and control
service-to-service traffic.
</p>
<p>
Apex is designed for small teams that have just begun migrating from a monolith to a
microservices architecture. While microservices bring many benefits such as faster
deployment cycles, they also bring a host of new challenges by relying on the network for
service-to-service communication. Because network communication is unreliable and has
latency, faults become more likely, forcing teams to spend more time diagnosing
network-related faults and writing pre-emptive fault-handling logic within each service.
[<a href="#footnote-1">1</a>]
</p>
<p>
Some current solutions exist to help teams perform these tasks faster. Client libraries
can be imported into each service’s code to automate networking concerns, an API gateway
can be inserted in front of all services to handle incoming traffic, and for large
systems, a service mesh is often deployed to abstract away networking concerns from
services altogether. These are all valid solutions with their own set of trade-offs.
</p>
<p>
For a small team running their first few microservices, however, none of the existing
solutions provide the right set of trade-offs: optimized for service-to-service traffic,
and ease of deployment and operation, over high availability and scalability. These are
the trade-offs that underpinned Apex’s design.
</p>
<p>
With Apex, a user can view the logs for all service-to-service traffic by querying just
one table, while grouping all requests and responses that belong to the same workflow.
They can also define and update traffic rules such as the number of times to retry a
request in one configuration store.
</p>
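<p>
As an illustration only - the helper and log fields below are hypothetical, not
Apex&rsquo;s actual schema - recovering one workflow from a single shared log
collection amounts to filtering on a shared correlation id:
</p>

```javascript
// Hypothetical sketch: with every request and response logged to one place,
// an entire workflow can be recovered by filtering on a shared correlation id.
// The entries and field names here are illustrative, not Apex's actual schema.
function traceWorkflow(logs, correlationId) {
  return logs
    .filter((entry) => entry.correlationId === correlationId)
    .sort((a, b) => a.timestamp - b.timestamp);
}

const logs = [
  { correlationId: 'abc', timestamp: 2, service: 'shipping', kind: 'request' },
  { correlationId: 'xyz', timestamp: 1, service: 'orders', kind: 'request' },
  { correlationId: 'abc', timestamp: 1, service: 'orders', kind: 'request' },
  { correlationId: 'abc', timestamp: 3, service: 'shipping', kind: 'response' },
];

// All three entries of workflow 'abc', in order, regardless of which
// service emitted them.
const workflow = traceWorkflow(logs, 'abc');
```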
<h2 id="microservices">2) Microservices</h2>
<p>
To understand how Apex makes it easier to work with microservices, it is important to
first understand what microservices are. This, in turn, requires understanding that the
microservices architecture is a choice, the other choice being, of course, a monolith.
</p>
<h3 id="monolithic-architecture">2.1) Monolithic architecture</h3>
<p>
In a monolithic architecture, there is typically just one application server (the
‘monolith’) that holds all the business logic. In some cases, this application server
alone is already sufficient to serve an application to a user (e.g. a website with just
static HTML). More likely though, the application will also generate some user data that
must be persisted, and so the monolithic application server will also transfer data to and
from a database server.
</p>
<div class="img-wrapper">
<img src="images/diagrams/2.1) Monolith.png" alt="Monolith" />
</div>
<p>
Consider the above example of a monolithic system that serves an e-commerce store to
users. The business logic in the app server can be organized into classes or modules, or
more generally, ‘subsystems’, that encapsulate related functionality e.g. manipulating
customer data, checking and updating inventory, creating shipments. These subsystems can
each expose an interface of methods, or more generally ‘behaviors’, that can be invoked by
each other to facilitate communication between them.
</p>
<p>
As method or function calls take place within the same running process in memory, they are
reliable and very fast, with latency often measured in microseconds [<a href="#footnote-2"
>2</a
>].
</p>
<div class="img-wrapper">
<img
src="images/diagrams/2.1) In-process method calls.png"
alt="In-process method calls are reliable and fast"
/>
</div>
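<p>
The subsystems-as-classes idea can be sketched in a few lines of JavaScript
(the class and method names are hypothetical):
</p>

```javascript
// In a monolith, subsystems are plain classes; a call from one to another is
// an ordinary in-process method invocation: reliable, and very fast.
// These names are illustrative only.
class ShippingSubsystem {
  createShipment(orderId) {
    return { orderId, status: 'created' };
  }
}

class OrdersSubsystem {
  constructor(shipping) {
    this.shipping = shipping;
  }

  placeOrder(orderId) {
    // Direct method call -- no network involved.
    return this.shipping.createShipment(orderId);
  }
}

const orders = new OrdersSubsystem(new ShippingSubsystem());
const shipment = orders.placeOrder(42);
```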
<p>
Another possible monolithic architecture is to further decouple the data store for each
subsystem, by separating it into multiple database servers. For example, the
<code>customers</code> subsystem and the <code>orders</code>
subsystem can be connected to separate database servers, if that is deemed to be e.g. more
flexible or scalable for a particular need.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/2.1) Monolith with multiple databases.png"
alt="Monolith with multiple databases"
/>
</div>
<p>
A simple analogy for a monolithic application is a small business run by just one owner.
The owner has to do everything - sales and marketing, procurement, operations, finance,
IT. There may be one central log book that keeps track of all business data, or the owner
could use several ‘persistent data stores’ in parallel e.g. CRM system for sales data,
accounting software for financial data, ERP system for inventory data, pen and paper for
tax filings.
</p>
<h3 id="microservices-architecture">2.2) Microservices architecture</h3>
<p>
The microservices architecture differs from the monolith in two major ways. First,
subsystems are decoupled even further. Each subsystem is deployed independently to its own
app server as a standalone ‘application’, or ‘service’, and the current best practice is
for every service to have its own database [<a href="#footnote-3">3</a>].
</p>
<div class="img-wrapper">
<img
src="images/diagrams/2.2) Microservices with multiple databases.png"
alt="Monolith architecture vs microservices architecture"
/>
</div>
<p>
Secondly, subsystems now communicate over the network via HTTP requests, rather than
through in-process method invocations. So for example, if our <code>orders</code> service
needs to create a new shipment, it might do this by sending a <code>POST</code> request to
the <code>/shipments</code> endpoint of the <code>shipping</code>
service, and attaching any other relevant information in the request body.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/2.2) Microservices communicating over the network.png"
alt="Microservices communicate over the network"
/>
</div>
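<p>
A sketch of the <code>orders</code> side of this call - the host, path, and
payload shape below are illustrative, not Apex-specific:
</p>

```javascript
// The same orders -> shipping call, now expressed as an HTTP request.
// Host, path, and payload shape are illustrative only.
function buildCreateShipmentRequest(order) {
  return {
    method: 'POST',
    url: 'http://shipping.example.com/shipments',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ orderId: order.id, address: order.address }),
  };
}

const req = buildCreateShipmentRequest({ id: 42, address: '221B Baker St' });
// In a real service, req would then be handed to fetch()/http.request(),
// which is where unreliability and latency enter the picture.
```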
<p>
Going with the same analogy of a small business, the microservices architecture is
comparable to a small team of several members (or ‘services’) who each specialize in one
function. For example, these could include a salesperson, a marketer, an operations
manager, an accountant/bookkeeper, and an IT manager. Now, function-to-function communication
no longer happens in the owner’s head (or ‘in-process’); instead, different team members
must communicate with each other in person, on the phone or by email (or ‘over the
network’) to get things done.
</p>
<p>
As we shall see, the use of the network for communication between subsystems is the key
enabler for many of the benefits of the microservices architecture, but also the main
culprit behind many of its drawbacks.
</p>
<h3 id="microservices-benefits">2.3) Microservices benefits</h3>
<p>
A first benefit of microservices is a wider choice of technologies for service developers.
[<a href="#footnote-4">4</a>] The network boundaries between services free them from
having to use the same technology stack. As long as each service maintains a stable
network interface, or API, for other services talking to it, it is free to choose the
language or framework that is most optimal for implementing its business logic.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/2.3) Microservices with different stacks.png"
alt="Microservices can use different technologies"
/>
</div>
<p>
Arguably the most defining benefit, though, is the option to deploy subsystems
independently of each other. [<a href="#footnote-5">5</a>] With subsystems now deployed to
independent services that each have a smaller scope, redeploying any one subsystem incurs
less overhead and so it becomes practical to redeploy each service more frequently. This
enables teams to ship new features faster and reap the corresponding business benefits
sooner.
</p>
<p>
More concretely, in our e-commerce example app, as soon as a feature in the
<code>orders</code> service is ready, <code>orders</code> can be redeployed. As long as
<code>orders</code>’s API remains the same before and after the deployment, other services
need not even know that a redeployment took place. On the other hand, if
<code>notifications</code>’s logic rarely changes, then that service can simply continue
to operate untouched.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/2.3) Redeploying one microservice.png"
alt="Each microservice can be independently redeployed"
/>
</div>
<p>
Independent redeployment also enables independent scaling. [<a href="#footnote-6">6</a>]
If our <code>orders</code> service is the first to reach its capacity, then we can simply
upgrade <code>orders</code> to a more powerful server or deploy more replicas of
<code>orders</code>, without having to also replicate every other service. Yet again, as
long as the replicated <code>orders</code> service retains the same API before and after
scaling, the other services can continue to operate in the same way as though nothing
happened. The result is fewer large-scale system-wide redeployments and higher utilization
of provisioned resources, leading to savings in engineering time and costs.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/2.3) Scaling microservices.png"
alt="Each microservice can be independently scaled"
/>
</div>
<h2 id="microservices-challenges">3) Microservices challenges</h2>
<p>
We have now seen how the network boundaries between microservices result in several major
benefits over the monolithic architecture. The network, however, comes with baggage, and
relying heavily on it to communicate between subsystems introduces an entire new dimension
of challenges.
</p>
<h3 id="network-unreliability-and-latency">3.1) Network unreliability and latency</h3>
<p>
Recall that in a monolith, subsystems are simply classes or modules that communicate
through method invocations within the same process in memory. In contrast, in a
microservices architecture, equivalent calls are now sent between services using HTTP
requests and responses over the network.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/3.1) In-process method calls vs network hops.png"
alt="Method calls are fast, but network hops are (relatively) slow"
/>
</div>
<p>
As any sufficiently heavy user of the internet will have experienced, the network is
unreliable and has latency. That is, networks can disconnect for any number of reasons,
and network traffic can sometimes take a long time to reach its destination. Even though
in production, services are likely deployed to state-of-the-art data centers run by large
cloud providers, network faults still can and do occur.
</p>
<div class="img-wrapper">
<img src="images/diagrams/3.1) Network faults.png" alt="Network faults" />
</div>
<p>
Such faults introduce a whole new class of problems for developers - not only do they have
to ensure their service code is bug-free, they now also have to diagnose unexpected
network faults, and add logic to service code that preempts such faults by providing
compensating behaviors (e.g. displaying a ‘network down’ page to users, or retrying the
same request a few seconds later).
</p>
<h3 id="diagnosing-network-faults">3.2) Diagnosing network faults</h3>
<p>
Diagnosing a network fault can be especially cumbersome when a single workflow passes
through multiple services. Consider a user placing an order on our e-commerce example app,
and suppose the <code>orders</code> service needs to first update inventory in
<code>inventory</code>, then create a shipment in <code>shipping</code>. This one workflow
involves 3 services with at least 3 network hops between them. If the order placement
eventually fails, what caused that to happen?
</p>
<div class="img-wrapper">
<img
src="images/diagrams/3.2) Network fault in multi-service workflow.png"
alt="Network fault in multi-service workflow"
/>
</div>
<p>
To find out, a developer would have to trace the user’s initial <code>POST</code> request
through the entire system. Since each service generates its own logs, the developer would
have to first access <code>orders</code>’s logs, track down the request that failed,
follow the request to the next service (in our case, the <code>inventory</code> service),
access <code>inventory</code>’s logs, and so on, until they pinpoint the exact request
that failed. This can be a laborious and slow process.
</p>
<h3 id="managing-fault-handling-logic">3.3) Managing fault-handling logic</h3>
<p>
Other times, a network fault may be totally random, and the request should simply be
retried. But how long should the requesting service wait before retrying? How many
times should it retry before giving up? If retries come too soon or too many times,
they could overwhelm the responding service. Such logic must be defined thoughtfully.
</p>
<div class="img-wrapper">
<img src="images/diagrams/3.3) Retry logic.png" alt="Retrying failed requests" />
</div>
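<p>
Such logic is typically expressed as a bounded number of retries with growing
delays between attempts, e.g. exponential backoff. A minimal sketch (the policy
values are arbitrary, not recommendations):
</p>

```javascript
// Minimal retry-with-exponential-backoff sketch; the policy values are
// arbitrary. backoffDelays(3, 100) yields the waits [100, 200, 400] ms.
function backoffDelays(maxRetries, baseMs) {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

async function retry(fn, { maxRetries = 3, baseMs = 100 } = {}) {
  const delays = backoffDelays(maxRetries, baseMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up after the initial attempt plus maxRetries retries.
      if (attempt >= delays.length) throw err;
      await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
    }
  }
}
```

<p>
The doubling delay is what keeps a burst of retries from overwhelming the
responding service while it recovers.
</p>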
<p>
The next question becomes: where should all this logic be defined? For some teams, the
first answer to this question is in HTTP client libraries that are imported into each
service’s code. [<a href="#footnote-7">7</a>] So if the <code>orders</code> service is
written in Ruby, then it would <code>require</code> a gem that provides a configurable
client for making HTTP requests to other services. Another service written in Node might
<code>import</code> a similar package into its code.
</p>
<p>
Often, these libraries can also handle logging, as well as other networking and
infrastructure concerns, such as caching, rate-limiting, authentication etc.
</p>
<p>
Teams with more resources may go further, by having each service’s owner write a client
library for every other service that calls it. This is already common practice when
working with popular external APIs; for example, Stripe provides dozens of official and
third-party libraries in different languages that abstract away the logic for calling its
APIs. [<a href="#footnote-8">8</a>] Similarly, in a large team, each service’s owner may
be tasked with writing a new client library for every requesting service that uses a
different language.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/3.3) New client library for every language.png"
alt="shipping's owner writes a new shipping_client for every language"
/>
</div>
<p>
Needless to say, this solution becomes less and less manageable as the number of services
grows. Every time a new service is built in a new language, every other service owner must
write a new client library in that language. More critically, updating fault-handling
logic now incurs a great deal of repetitive work. Suppose the CTO wishes to update the
global defaults for the retry logic; developers would now have to update the code in
multiple client libraries in every service, then carefully coordinate each service’s
redeployment. The greater the number of services, the slower this process becomes. [<a
href="#footnote-9"
>9</a
>]
</p>
<div class="img-wrapper">
<img
src="images/diagrams/3.3) Microservices with client libraries.png"
alt="As more services are added, client libraries can get out of hand"
/>
</div>
<h2 id="existing-solutions">4) Existing solutions</h2>
<p>
With microservices becoming increasingly popular, a number of solutions have emerged to
help teams overcome these challenges. Here we explain how two of these solutions - the API
gateway and the service mesh - compare with each other.
</p>
<p>
Both of these solutions in fact share the same building block - a proxy server.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4) Proxy server as building block.png"
alt="The proxy server is a building block"
/>
</div>
<h3 id="proxy-server">4.0) Proxy server</h3>
<p>
A proxy is simply a server that sits on the path of network traffic between two
communicating machines, and intercepts all their requests and responses. These machines
could represent a client sending a request to another server, or for our purposes, two
internal services communicating within the same architecture.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.0) Proxy server.png"
alt="Proxy server can intercept and forward HTTP requests and responses"
/>
</div>
<p>
In the above diagram, <code>orders</code> does not send an HTTP request directly to
<code>shipping</code>; instead, it addresses its request to a host belonging to
<code>proxy</code> (i.e. <code>proxy.com</code>). In order for <code>proxy</code> to know
that <code>orders</code> actually wants to send its request to <code>shipping</code>,
<code>orders</code> must specify <code>shipping</code>’s host (i.e.
<code>shipping.com</code>) in another part of the request e.g. in the
<code>Host</code> header value.
</p>
<p>
When <code>proxy</code> receives a response back from <code>shipping</code>, it simply
forwards the same response back to <code>orders</code>.
</p>
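<p>
The routing step described above can be sketched as a small function that
derives the upstream target from the <code>Host</code> header (the names below
are illustrative):
</p>

```javascript
// Sketch of how a proxy derives its forwarding target from the Host header.
// In the example above, orders addresses proxy.com but sets
// Host: shipping.com; the proxy reads that header to pick the real upstream.
function forwardTarget(req) {
  const host = req.headers['host'];
  if (!host) throw new Error('cannot proxy a request without a Host header');
  const [hostname, port] = host.split(':');
  return { hostname, port: port ? Number(port) : 80, path: req.url };
}

const target = forwardTarget({
  url: '/shipments',
  headers: { host: 'shipping.com' },
});
// target -> { hostname: 'shipping.com', port: 80, path: '/shipments' }
```

<p>
A real proxy would pass this target to its HTTP client and relay the upstream
response back to the caller unchanged.
</p>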
<h3 id="api-gateway">4.1) API gateway</h3>
<h4>4.1.1) API gateway features</h4>
<p>
At its core, an API gateway is simply a proxy server (more precisely, a ‘reverse proxy’
[<a href="#footnote-10">10</a>]). When used with microservices, one of its primary
functions is to provide a stable API to clients and route client requests to the
appropriate service. [<a href="#footnote-11">11</a>]
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.1.1) API gateway.png"
alt="An API gateway proxies all incoming requests into a system"
/>
</div>
<p>
It is certainly possible to deploy microservices without an API gateway. In such an
architecture, whenever the client sends a request, it must already know which service to
send the request to, and also the host and port of that service. This tightly couples the
client with internal services, such that any newly added services, or updates to existing
service APIs, must be deployed at the same time as updates in the client code. Such an
architecture can be difficult to manage, as clients cannot always be relied upon to update
immediately (e.g. mobile apps cannot be easily forced to update); even if they can, doing
so would still incur additional engineering effort that could be avoided.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.1.1) Microservices without API gateway.png"
alt="Without an API gateway, a client must know the host port and path of every service it needs to call"
/>
</div>
<p>
With an API gateway, developers are largely free to update internal services while still
providing a stable API to clients.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.1.1) API gateway's stable API.png"
alt="An API gateway provides a stable API for clients, even if services are upgraded, replicated, or removed internally"
/>
</div>
<p>
In addition to routing requests, the API gateway also provides one place to handle many
concerns that are shared between services, such as authentication, caching, rate-limiting,
load-balancing and monitoring.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.1.1) API gateway's features.png"
alt="An API gateway also provides one place to manage other networking concerns"
/>
</div>
<p>
In a way, an API gateway can be thought of as a receptionist at a large company. Any
visitor does not necessarily have to know which employees are present in advance, or how
different teams work together to complete specific tasks. Instead, they simply speak with
the receptionist, who then decides, based on the visitor’s identity and stated purpose,
which company employee to notify, and/or what access to grant to the visitor.
</p>
<h4>4.1.2) API gateway for service-to-service traffic?</h4>
<p>
Let us revisit the challenges that were described back in Section 3: 1) diagnosing faults
in workflows that span multiple microservices, and 2) managing fault-handling logic that
is similar across services.
</p>
<p>
If the API gateway already provides one place to manage networking concerns, perhaps it is
already a sufficient solution to these challenges? For example, instead of deploying it as
a ‘front proxy’ that sits in front of all services, we could deploy it in a different
pattern than it was intended for - as a proxy that sits between services internally. Would
this not already provide the one place to log all service-to-service requests and
responses, and define fault-handling logic like retries?
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.1.2) Deploying API gateway internally.png"
alt="Could we just deploy an API gateway internally between services?"
/>
</div>
<p>
In theory, this is certainly possible, but in practice, existing API gateway solutions are
not ideal options for this.
</p>
<blockquote>
<p>
Optimized to handle [client-server] traffic at the edge of the data center, the API
gateway ... is inefficient for the large volume of [service-to-service] traffic in
distributed microservices environments: it has a large footprint (to maximize the
solution’s appeal to customers with a wide range of use cases), can’t be containerized,
and the constant communication with the database and a configuration server adds
latency.
</p>
</blockquote>
<p>
<cite>
- NGINX, maker of the popular open-source NGINX load balancer and web server [<a
href="#footnote-12"
>12</a
>]
</cite>
</p>
<p>
In short, although the API gateway looks close to the solution we need, existing solutions
on the market come bundled with many extra features that are designed for client-server
traffic, making them a poor fit for managing service-to-service traffic.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.1.2) API gateway's extra features for client-server traffic.png"
alt="API gateway's features for client-server traffic add unnecessary complexity when using it for service-to-service traffic"
/>
</div>
<p>
That is not to say a solution like an API gateway is completely out of the question. As we
shall see in Section 5, the API gateway pattern was a major source of inspiration for
Apex’s solution.
</p>
<h3 id="service-mesh">4.2) Service mesh</h3>
<p>
The service mesh is another existing solution to the challenges with microservices that
were outlined in Section 3. As mentioned previously, it also builds upon the proxy server.
</p>
<h4>4.2.1) Sidecar proxies</h4>
<p>
The service mesh is a highly complex solution, so we once again approach it through the
analogy of a company. Consider a large team of people (analogous to services) who all
communicate directly with each other.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.2.1) Everybody talks to everybody.png"
alt="Communication in a large team (1): everybody talks to everybody"
/>
</div>
<p>
As the team size grows, team members will likely find themselves spending more and more
time handling these scenarios:
</p>
<ul>
<li>
<b>[Retry logic]</b> At any given time a team member may be off sick, so any other
person who wishes to talk to them must retry again later.
</li>
<li>
<b>[Rate-limiting]</b> A team member may be temporarily working reduced hours, and can
only handle a limited number of incoming messages.
</li>
<li>
<b>[Caching]</b> A team member may be asked for the same piece of information multiple
times by other team members.
</li>
<li>
<b>[Encryption]</b> Each team member is required to only use secure communication
channels provided by the company.
</li>
<li>
<b>[Authorization]</b> Some team members may be allowed to access confidential financial
information, while others may not.
</li>
<li>
<b>[Routing]</b> Sometimes a team member may need a particular piece of information, but
does not know who has it, and so has to try several different people before obtaining
it.
</li>
<li>
<b>[Logging]</b> The company may wish to pull all messages from every team member’s
inbox, to create a centralized record for auditing purposes.
</li>
</ul>
<p>
Managing these communication-related issues would take away time and focus from each team
member’s core responsibilities.
</p>
<p>
In this example, adding a service mesh is analogous to giving every team member a personal
assistant (PA), who intercepts all incoming and outgoing messages and handles all the
above tasks. This team structure would free team members from having to handle
communication-related tasks, and allow them to focus more on their core responsibilities.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.2.1) Team members with their own PA.png"
alt="Communication in a large team (2): Every team member talks through their own personal assistant (PA)"
/>
</div>
<p>
In an actual service mesh, the PA would instead be a proxy server, known as a ‘sidecar
proxy’. Each service is deployed alongside its own sidecar proxy, which intercepts all
requests and responses to and from its parent service, and handles all the networking and
infrastructure concerns we listed above, such as retry logic and rate-limiting. As a
result, each service’s code can focus on its main business logic, while outsourcing
networking and infrastructure concerns to the service’s sidecar proxy. [<a
href="#footnote-13"
>13</a
>]
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.2.1) Services and sidecar proxies.png"
alt="In a service mesh, services talk through their own 'sidecar proxies'"
/>
</div>
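<p>
To make the division of labor concrete, here is a minimal, illustrative sketch in
Python (not Apex's or any real mesh's implementation; the <code>Sidecar</code> and
<code>Service</code> names are hypothetical): the sidecar intercepts every request to
its parent service and applies retry logic with exponential backoff, while the service
itself contains only business logic.
</p>

```python
import time

class Service:
    """Parent service: business logic only, no networking concerns."""
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler

class Sidecar:
    """Intercepts requests to its parent service and applies retry logic."""
    def __init__(self, service, max_retries=3, backoff=0.01):
        self.service = service
        self.max_retries = max_retries
        self.backoff = backoff

    def handle(self, request):
        last_error = None
        for attempt in range(self.max_retries):
            try:
                return self.service.handler(request)   # forward to parent service
            except ConnectionError as err:             # transient failure: retry
                last_error = err
                time.sleep(self.backoff * (2 ** attempt))  # exponential backoff
        raise last_error

# A flaky service that fails twice before succeeding.
calls = {"n": 0}
def flaky_handler(request):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("service temporarily unavailable")
    return f"handled {request}"

sidecar = Sidecar(Service("orders", flaky_handler))
print(sidecar.handle("GET /orders/42"))  # retries twice, then succeeds
```

<p>
The caller only ever talks to the sidecar, so the retry policy can later change (or
move to a central configuration server, as described next) without touching the
service's own code.
</p>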
<h4>4.2.2) Configuration server</h4>
<p>
In addition to the sidecar proxies, the service mesh has one other important component: a
central configuration server.
</p>
<p>
Back in our hypothetical company, a configuration server is akin to a centralized folder
containing data on team members and company policies, e.g. who is on leave, who is
working reduced hours, which secure channels to use, and who has access to what
information. Each personal assistant (PA) would keep their own copy of this information
to handle communication quickly, but whenever anything in the centralized folder is
updated, e.g. by the COO or HR Director, the changes are immediately sent to each PA, so
that every PA's copy stays up to date.
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.2.2) Centralized folder with company policies and team info.png"
alt="One centralized folder containing personnel info, with updates automatically copied to each PA's copy of the folder"
/>
</div>
<p>
In the same way, the configuration server in a service mesh provides one place to update
network traffic rules, such as logic for retries, caching, encryption, rate-limiting, and
routing. The configuration server is the source of truth for this information, but each
sidecar proxy also has a cached copy of the information. Whenever the configuration server
gets updated, it propagates the changes to each sidecar proxy, which then applies the
changes to its own cached copy. [<a href="#footnote-14">14</a>]
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.2.2) Services and configuration server.png"
alt="One configuration server containing all routes, retry logic, etc., with updates automatically pushed to each sidecar proxy's cached copy"
/>
</div>
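<p>
This push-based propagation can be sketched in a few lines of Python. The names
(<code>ConfigServer</code>, <code>SidecarCache</code>) are illustrative, not from any
particular mesh: the server holds the source of truth and, on every update, immediately
pushes the change to each subscribed sidecar's local cache.
</p>

```python
class ConfigServer:
    """Source of truth for traffic rules; pushes updates to subscribers."""
    def __init__(self):
        self.rules = {}
        self.subscribers = []

    def subscribe(self, sidecar):
        self.subscribers.append(sidecar)
        sidecar.cache = dict(self.rules)    # initial sync of existing rules

    def update(self, key, value):
        self.rules[key] = value
        for sidecar in self.subscribers:    # propagate the change immediately
            sidecar.cache[key] = value

class SidecarCache:
    """Each sidecar consults its own local copy, never the server directly."""
    def __init__(self):
        self.cache = {}

server = ConfigServer()
a, b = SidecarCache(), SidecarCache()
server.subscribe(a)
server.subscribe(b)

server.update("retry.max_attempts", 3)
server.update("rate_limit.per_second", 100)

print(a.cache["retry.max_attempts"])     # 3 — both caches reflect the update
print(b.cache["rate_limit.per_second"])  # 100
```

<p>
Because each sidecar reads only its cached copy, request handling never waits on the
configuration server, which is why the server is not a runtime bottleneck.
</p>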
<h4>4.2.3) Service mesh trade-offs</h4>
<p>
Again, let us revisit the challenges that were described back in Section 3: 1) diagnosing
faults in workflows that span multiple microservices, and 2) managing fault-handling logic
that is similar across services.
</p>
<p>
The service mesh provides a robust solution to these challenges. The configuration server
provides one place to define and update fault-handling logic; each sidecar proxy can be
responsible for generating logs and forwarding them to a central store, and for executing
fault-handling logic. Moreover, with no single point of failure or bottleneck, the
architecture is resilient and highly scalable. [<a
href="#footnote-15"
>15</a
>]
</p>
<div class="img-wrapper">
<img
src="images/diagrams/4.2.3) Service mesh as solution to microservices challenges.png"
alt="Service mesh as solution to microservices challenges"
/>
</div>
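<p>
The logging half of the solution can be sketched as follows (a simplified illustration
with hypothetical names, not any real mesh's API): each sidecar appends a record, tagged
with a correlation id shared across the workflow, to one central store, so a fault in a
multi-service workflow can be traced from a single place.
</p>

```python
import time

central_log = []  # stands in for a shared log store (e.g. a database)

def log_request(service, request_id, path, status):
    """Called by each sidecar for every request it intercepts."""
    central_log.append({
        "service": service,
        "request_id": request_id,   # correlation id shared across the workflow
        "path": path,
        "status": status,
        "ts": time.time(),
    })

# One workflow ("req-7") spanning three services, with a failure at the end.
log_request("gateway",  "req-7", "/checkout", 200)
log_request("orders",   "req-7", "/create",   200)
log_request("payments", "req-7", "/charge",   500)

# Diagnosing the fault: filter the central store by the correlation id.
trace = [e for e in central_log if e["request_id"] == "req-7"]
failed = [e["service"] for e in trace if e["status"] >= 500]
print(failed)  # ['payments']
```

<p>
Because every sidecar writes to the same store, the full path of a request is visible
in one query, rather than being scattered across each service's separate logs.
</p>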
<p>
However, as with so many tools, rich functionality begets complexity. Implementing a full
service mesh more than doubles the number of components in the architecture that must now
be deployed and operated. In addition, both the sidecar proxy and its parent service are
usually containerized to run alongside each other in the same virtual server. [<a
href="#footnote-16"
>16</a
>] If any existing service is currently deployed without a container, then developers must
now containerize it and redeploy it. More domain expertise must be acquired, and
significant engineering effort expended.
</p>
<h3 id="summary">4.3) Summary</h3>
<p>
As we have seen, solutions certainly exist to handle the challenges we described with
microservices. Each existing solution embodies a different set of trade-offs.
</p>
<ul>
<li>
<b>API gateways’</b> features are designed for client-server, not service-to-service,
traffic.
</li>
<li>
<b>Service meshes</b> check all the boxes, but require teams to acquire more expertise,
operate double the number of components, and redeploy existing services in a different
pattern.
</li>
</ul>
<div class="img-wrapper">
<img
src="images/diagrams/4.3) API gateway and service mesh trade-offs.png"
alt="API gateway and service mesh trade-offs"
/>
</div>
<h2 id="design-architecture">5) Design & architecture</h2>
<h3 id="apex-trade-offs">5.1) Apex trade-offs</h3>
<p>
For some teams, neither an API gateway nor a service mesh provides the right set of
trade-offs. Consider a small team that is just beginning to migrate its monolith by
extracting a few microservices. For ease of deployment, most of the services have been
deployed to Heroku or another platform-as-a-service (PaaS) solution.
</p>
<p>
It is likely that this team will have already experienced the challenges we mentioned back
in Section 3: 1) diagnosing faults in workflows that span multiple microservices, and 2)
managing fault-handling logic that is similar across services.
</p>
<p>
For this team, a solution with the following trade-offs is needed:
</p>
<ul>
<li>Optimized for handling service-to-service traffic</li>
<li>One place to aggregate logs and manage traffic rules</li>
<li>Simple to deploy and operate</li>
<li>Fewer built-in features are acceptable</li>
<li>
Does not require changes to deployment pattern for existing services (e.g. does not
require existing services to be containerized), since this may be difficult or not
possible at all in PaaS solutions
</li>
<li>Lower availability is acceptable</li>
<li>Lower scalability is acceptable</li>
</ul>
<div class="img-wrapper">
<img src="images/diagrams/5.1) Apex trade-offs.png" alt="Apex trade-offs" />
</div>
<p>
These are precisely the trade-offs we chose when building Apex.
</p>
<h3 id="proxy-server-with-middleware-layers">5.2) Proxy server with middleware layers</h3>
<p>
Apex’s architecture includes 5 components:
</p>
<div class="img-wrapper">
<img src="images/diagrams/5.2) Apex architecture.png" alt="Apex architecture" />
</div>
<ol>
<li>Proxy server</li>
<li>Logs database</li>
<li>Configuration store</li>
<li>Admin API</li>
<li>Admin UI</li>