-
Notifications
You must be signed in to change notification settings - Fork 4
/
rpc.tex
286 lines (189 loc) · 27.9 KB
/
rpc.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
\subsection{Relevant Reading}
\begin{itemize}
\item \textit{A Critique of the Remote Procedure Call Paradigm}, Tanenbaum and van Renesse, 1987~\cite{tanenbaum1987critique}.
\item \textit{A Note On Distributed Computing}, Kendall, Waldo, Wollrath, Wyant, 1994~\cite{kendall1994note}.
\item \textit{It's Just A Mapping Problem}, Vinoski, 2003~\cite{vinoski2003s}.
\item \textit{Convenience Over Correctness}, Vinoski, 2008~\cite{vinoski2008convenience}.
\end{itemize}
\subsection{Commentary}
\begin{quote}
``Does developer convenience really trump correctness, scalability,performance, separation of concerns, extensibility, and accidentalcomplexity?''~\cite{vinoski2008convenience}
\end{quote}
\subsubsection{Timeline}
\begin{itemize}
\item{1974:} RFC 674, \\ ``Procedure Call Protocol Documents, Version 2'' \\
RFC 674 attempts to define a general way to share resources across \textbf{all 70 nodes} of the Internet. This work is performed at Bolt, Beranek and Newman (BBN Technologies).\footnote{Fun fact: BBN's Internet division would later become Genuity and then Level3 out of bankruptcy and has autonomous system number 1 (ASN 1).}
\item{1975:} RFC 684, \\ ``A Commentary on Procedure Calling as a Network Protocol'' \\
First outlines the problems of RPC and how they related to fundamental problems in distributed systems.
\item{1976:} RFC 707, \\ ``A High-Level Framework for Network-Based Resource Sharing'' \\
An attempt to ``mechanize'' the TELNET and FTP protocols through a generalization to functions.
\item{1984:} ``Implementing Remote Procedure Calls''~\cite{birrell1984implementing} \\
In this work, the \textbf{Cedar RPC} mechanism is designed at Xerox PARC.
\item{1987:} Distribution and Abstract Types in Emerald~\cite{1702134} \\
One of the first major implementations of distributed objects containing distribution-specific calling conventions, such as \textit{call-by-move}.
\item{1987:} A Critique of the Remote Procedure Call Paradigm~\cite{tanenbaum1987critique} \\
Tanenbaum and van Renesse provide a criticism, similar to the one in RFC 684, on why the RPC model is the wrong model for distributed computing.
\item{1988:} Distributed Programing in Argus~\cite{liskov1988distributed} \\
In Argus, atomic units of computation referred to as ``guardians'' coordinate application of effects and rollback.
\item{1988:} RFC 1057, \\ ``Remote Procedure Call Protocol Specification, Version 2'' \\
Defines \textbf{sunrpc} as a standard along with its supporting infrastructure, such as portmapper, that's used in the creation of NFS.
\item{1991:} CORBA 1.0 \\
CORBA 1.0 introduces distributed objects.
\item{1994:} A Note On Distributed Computing~\cite{kendall1994note} \\
Kendall et al.\ also talk in great length about why the RPC model, extended to objects, is problematic.
\item{1996:} A Distributed Object Model for the Java System~\cite{wollrath1996distributed} \\
Introduces the Java RMI system.
\item{1997:} CORBA 2.0 \\
CORBA 2.0 represents the major release of CORBA that most people are familiar with.
\item{1999-:} EJB, XML-RPC, SOAP, REST, Thrift, Finagle, gRPC, etc. \\
Modern day RPC mechanisms.
\end{itemize}
\subsubsection{Overview}
Remote Procedure Call (RPC) is a general term for executing a subroutine in a different address space without writing the actual code used to perform the remote execution. To provide an example, we can imagine a user wishing to invoke the random number generator function on another machine, but, the only difference between the local and remote invocation is supplying an additional node identifier where it should occur. While not the first implementation, because it was preceded by Apollo Computer's Network Computing System (NCS), the first major implementation to be widely known and adopted was the SunRPC mechanism, from Sun Microsystems, used to back their Network File System (NFS).
Remote Procedure Call mechanisms you may be more familiar with are Java's Remote Method Invocation (RMI), it's predecessor Modula-3's Network Objects, XML-RPC, SOAP, CORBA, Avro, Facebook's, (now Apache) Thrift, Google's Protocol Buffers with Stubby, Twitter's Finagle, and Google's gRPC.
\subsubsection{RFC 684}
\begin{quote}
``Rather, we take exception to PCP’s underlying premise: that the procedure calling discipline is the starting point for building multi-computer systems.''
\end{quote}
RFC 684 is commentary on RFC 674 that introduced the Procedure Call Paradigm, Version 2 (PCP). This commentary highlights, what boils down to, three major points from a critical analysis of the Procedure Call Paradigm.
\begin{itemize}
\item Procedure calling is usually a primitive operation; by primitive, it should be an extremely fast context switch operation performed by the underlying abstraction.
\item Local and remote calls each have different cost profiles; remote calls can be delayed, and in the event of failure, may \textbf{never return}.
\item Asynchronous message passing, or sending a message and waiting for a response when the response is needed, is a much better model because it makes the passing of messages \textbf{explicit}.
\end{itemize}
Following from these three points, we see a series of concerns develop about this programming paradigm, all of which become a common theme across the 40+ years in RPC's history. These are:
\begin{itemize}
\item Difficulty in recovery after malfunction or error. For instance, do we rollback or throw exceptions? How do we handle these errors? Can we just try again?\footnote{Some systems that attempt to address this include Liskov's promises~\cite{liskov1988promises} and Lee's \cite{lee2015implementing} work on a reusable framework for linearizability.}
\item Difficulty in sequencing operations. If all calls are synchronous and some of these calls can fail, it can require a significant amount of code to ensure correct re-execution to preserve order moving forward.
\item Remote Procedure Call forces \textbf{synchronous programming}: a method is invoked and the invoking process waits for a response.
\item Backpressure, or blocking on previous actions completing, load-shedding, or dropping messages on the floor when the system is overloaded, and priority servicing become more difficult with the call-and-response model of Remote Procedure Call.
\end{itemize}
\subsubsection{RFC 707}
\begin{quote}
``Because of this cost differential, the applications programmer must exercise discretion in his use of remote resources, even though the mechanics of their use will have been greatly simplified by the RTE. Like virtual memory, the procedure call model offers great convenience, and therefore power, in exchange for reasonable alertness to the possibilities of abuse.''
\end{quote}
RFC 707 generalizes the ideas from RFC 684 and discusses the problem of resources sharing for services such as TELNET and FTP: each of these services presents a \textit{different} interface for interacting with it, which requires the operator to know the specific protocol for interacting with that service. In realizing that both services like TELNET and FTP follow the call-and-response model, the authors propose an alternative idea: rather than needing to know all of the available commands and protocols on the remote machine, can we define a generic interface for executing a remote procedure that takes an argument list and follows the call-and-response model?
While we can, the problems of control flow and priority servicing outlined in RFC 684 remain; however, not enough that it prevents this model from being adopted by many systems in the future.
\subsubsection{``A Critique of the Remote Procedure Call Paradigm''}
\begin{quote}
``We propose the following test for a general-purpose RPC system. Imagine that two programmers are working on a project. Programmer 1 is writing the main program. Programmer 2 is writing a collection of procedures to be called by the main program. The subject of RPC has never been mentioned and both programmers assume that all their code will be compiled and linked together into a single executable binary program and run on a free-standing computer, not connected to any networks.''
``At the very last minute, after all the code has been thoroughly tested, debugged, and documented and both programmers have quit their jobs and left the country, the project management is forced by unexpected, external circumstances to run the program on a distributed system. The main program must run on one computer, and each procedure must run on a different computer. We also assume that all the stub procedures are produced mechanically by a stub generating program.''
``It is our contention that a large number of things may now go wrong due to the fact that RPC tries to make remote procedure calls look exactly like local ones, but is unable to do it perfectly. Many of the problems can be solved by modifying the code is various ways, but then the transparency is lost. Once we admit that true transparency is impossible, and that programmers must know which calls are remote and which ones are local, we are faced with the question of whether a partially transparent mechanism is really better than one that was designed specifically for remote access and makes no attempt to make remote computations look local at all.''
\end{quote}
Tanenbaum and van Renesse directly attack the Remote Procedure Call paradigm stating that it is fundamentally wrong to treat local and remote calls the same. They state that the transparency that RPC tries to achieve is impossible and posit that a protocol that is designed specifically for remote access is better.
The authors go on to describe an alternative paradigm: a ``virtual circuit''. This alternative comes off sounding very similar to Distributed Erlang running with TCP: non-blocking send and receive operations across a sliding-window network protocol.
\begin{quote}
``An alternative model that does not attempt any transparency in the first place is the virtual circuit model (e.g., the ISO OSI reference model [Zimmermann, 1980]). In this model, a full-duplex virtual circuit using a sliding window protocol is set up between the client and server. If nonblocking SEND and RECEIVE primitives are used, incoming messages can be signalled by interrupts to allow the maximum amount of parallelism between communication and computation.''
\end{quote}
Tanenbaum and van Renesse outline many of the same criticisms as RFC 684: latency, lack of parallelism, lack of streaming, exception handling and failure detection. In addition, they raise a few additional criticisms that we will highlight below.
\paragraph{Unexpected Messages} With a synchronous call-and-response protocol between client and server, how is one supposed to send a message to the client that is unexpected if that client is not waiting for a message?
\paragraph{Single Threaded Servers} How does one handle the situation where the server does not have a response ready for a client immediately -- for instance, if it needs to wait on input from another server? Not only does this block the server, but it also blocks the client from proceeding further with local computation.\footnote{Our section on promises talks about this problem in depth~\cite{liskov1988promises}.}
\begin{quote}
``There is, in fact, no protocol that guarantees that both sides definitely and unambiguously know that the RPC is over in the face of a lossy network.''~\cite{tanenbaum1987critique}
\end{quote}
\paragraph{The Two Army Problem} How do we handle requesting irreplaceable data? (or, more generally have two servers come to agreement that some RPC was successfully executed and the response received.) Well, we can acknowledge the receipt of that information, but then we would also have to acknowledge the acknowledgements to be sure the acknowledgement was delivered, right? This topic, central to the agreement problem, has been discussed extensively in distributed systems literature~\cite{halpern1990knowledge}.
\paragraph{Parameters} Tanenbaum and van Renesse discuss the problem of parameter passing and parameter marshalling. This issue becomes exacerbated with references in object systems such as CORBA, where specific distributed references have to be used to ensure they remain accessible and valid over time.
\begin{quote}
``However, if there are reference parameters or pointers, things are more complicated. While it is obviously possible to copy pointers into the message, when the server tries to use them, it will not work correctly because the object pointed to will not be present.''
``Two possible solutions suggest themselves, each with major drawbacks. The first solution is to have the client stub not only put the pointer itself in the message, but also the thing pointed to. However, if the thing pointed to is the middle of a complex list structure containing pointers in both directions, sublists, etc., copying the entire structure into the message will be expensive. Furthermore, when it arrives, the structure will have to be reassembled at the same memory addresses that it had on the client side, because the server code will just perform indirection operations on the pointers as though it were working on local variables.''
...
``The other solution is just to pass the pointer itself. Every time the pointer is used, a message is sent back to the client to read or write the relevant word. The problem here is that we violate one of the basic rules: the compiler should not have to know that it is dealing with RPC. Normally the code produced for reading from a pointer is just to indirect from it. If remote pointers work differently from local pointers, the transparency of the RPC is lost.''
\end{quote}
\paragraph{Idempotence} Finally, the authors highlight the problem of providing exactly-once semantics across the network and the power of idempotence. Tanenbaum and van Renesse say it very concisely.
\begin{quote}
``Suppose a client does a nonidempotent RPC and the server crashes one machine instruction after finishing the operation, but before the server stub has had a chance to reply. The client stub times out and sends the request again. If the server has rebooted by then, there is a chance that the operation will be performed two or more times and thus fail.''
\end{quote}
\subsubsection{CORBA}
The Common Object Request Broker Architecture (CORBA) is an abstraction for object-oriented languages, popularized by C++, that allows you to communicate between different languages and different address spaces running on different machines. CORBA relied on the use of an Interface Definition Language (IDL) for specifying the interfaces of remote classes of objects; this IDL was used to generate stubs of what the remote systems object interfaces appeared as on the local machine. These IDL's would be used to generate mappings between the abstract interfaces provided by the IDL's and the actual implementations in languages such as C++ and Java.
CORBA attempted to provide several benefits to the application developer: language independence, OS-independence, architecture-independence, static typing through a mapping of abstract types in the IDL to machine and language specific implementations of those types, and object transfer, where objects can be migrated over the wire between different machines. CORBA's promise was that through the use of mapping that remote calls could appear as local calls, and that distributed systems related exceptions could be mapped into local exceptions and handled by local exception handling mechanisms.
However, as Vinoski points out in 2003, the evaluation of programming languages and abstractions based on transparency alone is flawed:
\begin{quote}
``The goal is to merge middleware abstractions directly into the realm of the programming language, minimizing the impedance mismatch between the programming language world and the middleware world. For example, mappings make request invocations on distributed objects and services appear as normal programming-language function calls, and they map distributed system exceptions into native programming language exception-handling mechanisms.''~\cite{vinoski2003s}
\end{quote}
\subsubsection{``A Note On Distributed Computing''}
\begin{quote}
``It is the thesis of this note that this unified view of objects is mistaken.''~\cite{kendall1994note}
\end{quote}
In this pinnacle Waldo paper, they argue that ``it is perilous to ignore the differences'' between local and distributed computing and that the unified view of objects is flawed.~\cite{kendall1994note} They cite two independent groups of work, the systems of Emerald and Argus, and the modern equivalents of those systems, Microsoft's DCOM and OMG's CORBA: all systems that extended the RPC mechanism to objects and method invocation.
We can summarize the ``promise'' of the unified view of objects, as Waldo does in the paper.
\begin{itemize}
\item Applications are designed using interfaces on the local machine.
\item Objects are relocated, because of the transparency of location, to gain the desired application performance.
\item The application is then tested with ``real bullets.''
\end{itemize}
This strategy for the design of a distributed application has two fundamental flaws. First, that the design of an application can be done with interfaces alone, and that this design will be discovered during the development of the application. Second, that application correctness does not depend on object location, but only the interfaces to each object.
Waldo swats down this design with the ``three false principles''~\cite{kendall1994note}:
\begin{quote}
``there is a single natural object-oriented design for a givenapplication, regardless of the context in which that applicationwill be deployed''
\end{quote}
\begin{quote}
``failure and performance issues are tied to the implementation ofthe components of an application, and consideration of theseissues should be left out of an initial design''
\end{quote}
\begin{quote}
```the interface of an object is independent of the context in whichthat object is used''
\end{quote}
\subsubsection{``Every 10 years...''}
\begin{quote}
``The hard problems in distributed computing are not the problems of getting things on and off the wire.''~\cite{kendall1994note}
\end{quote}
Waldo argues that every ten years we approach the problem of attempting to unify the view of local and remote computing and run into the same problems, again and again: \textbf{local and remote computing are fundamentally different.}
\paragraph{Latency} Waldo argues that the most obvious difference should be the issues of latency: if you ignore latency, you will end up directly impacting software performance. He states that is it wrong to ``rely on steadily increasing speed of the underlying hardware'' and that it is not always possible to test with ``real bullets''. Performance analysis and relocation is non-trivial and a design that is optimal at one point will not necessarily stay optimal.
\paragraph{Memory Access} His criticisms of memory access are very specific to CORBA and its predecessors in the object space: objects can retain pointers to objects in the same address space, but once moved these pointers will no longer be valid. He states that one approach to solving the problem is distributed shared memory, but more practically techniques such as marshalling or replacement by CORBA references, which are marshalled for distributed access, are used.
\paragraph{Partial Failure} Finally, the most fundamental problem: partial failure. In local computing, he argues, failures are detectable, total, and result in a return of control. This is not true with distributed computing: independent components may fail, failures are partial and a failure of a link is indistinguishable from a failure of a remote processor.
As always, Waldo says it best:
\begin{quote}
``The question is not `can you make remote method invocation look like local method invocation?' but rather `what is the price of making remote method invocation identical to local method invocation?'''~\cite{kendall1994note}
\end{quote}
Waldo argues that there is only two paths forward if we want to achieve the goal of the unified object model.
\begin{itemize}
\item Treat all objects as local.
\item Treat all objects as remote.
\end{itemize}
However, he states that if the real goal is to ``make distributed computing as simple as local computing'', that the only real path forward is the first. This approach, he believes, is flawed, and that distribution is fundamentally different, and must be treated so.
\begin{quote}
``This approach would also defeat the overall purpose of unifyingthe object models. The real reason for attempting such aunification is to make distributed computing more like localcomputing and thus make distributed computing easier. Thissecond approach to unifying the models makes local computingas complex as distributed computing.''~\cite{kendall1994note}
\end{quote}
The paper provides two examples of where this paradigm is problematic, but we will highlight one case, that builds upon RPC.
\subsubsection{Network File System}
Sun Microsystem's \textbf{Network File System (NFS)}, built upon RPC, is one of the first distributed file systems to gain popularity. Network File System adhered to the existing filesystem API, but introduced an entire new class of failures resulting from network partitions, partial failure, and high latency. Network File System is a stateless protocol implemented in UDP; the decision to implement it this way is motivated by crash recovery avoidance and protocol simplification~\cite{Sandberg:1988:DIS:59309.59338}.
Network File System operated in two modes: soft mounting and hard mounting. Soft mounting introduced a set of new error codes related to the additional ways file operations could fail: these error codes were not known by existing UNIX applications and led to smaller adoption of this approach. Hard mounting introduced the opposite behavior for failures related to the network: \textbf{operations would block until they could be completed successfully}.
\textit{It's just a mapping problem, right?}~\cite{vinoski2003s}
\subsubsection{``Convenience Over Correctness''}
\begin{quote}
``We have a general-purpose imperative programming-languagehammer, so we treat distributed computing as just another nail tobend to fit the programming models.''~\cite{vinoski2008convenience}
\end{quote}
Vinoski highlights three very important points in ``Convenience Over Correctness'' criticism of RPC many years later.
\begin{itemize}
\item \textbf{Interface Definition Languages (IDL) ``impedance mismatch''}: base types may be easy to map, but more complex types may be less so.
\item \textbf{Scalability:} the RPC paradigm does not have any first class support for caching, or mechanisms for mitigating high latency, and is remains a rather primitive operation to build distributed applications with.
\item \textbf{Representational State Transfer (REST)}: REST is good: it specifically addresses the problem of managing distributed resources; but most frameworks built on top of REST alter the abstraction and present something that repeats the problem~\footnote{For instance, if one was to build an object model on top of REST.}.
\end{itemize}
\subsection{Impact and Implementations}
Remote Procedure Call (RPC) has been around for a very long time and while many opponents of RPC have been extremely critical of it, it still remains one of the most widely used way of writing distributed applications. There's an unprecedented amount of use of frameworks for RPC such as Google's Protocol Buffers and Apache's Thrift used in production applications.
Frameworks such as Google's gRPC for HTTP/2.0 and Twitter's Finagle continue to reduce the amount of complexity in building applications with them, attempting to bring RPC to an even wider audience. For instance, Twitter's Finagle is protocol-independent and attempts to deal with the problems of distribution directly. Finagle does this through the use of futures, which allow composition and explicit sequencing; Google's gRPC does similar. These frameworks claim that since they do not attempt to hide the fact that the calls are remote, that it provides a better abstraction. However, now we have returned to the aforementioned problem of soft mounting in NFS: explicitly handling the flow control from the array of possible network exceptions that could occur, though mitigated through the use of promises/futures~\footnote{Our related post on promises and futures challenges whether that abstraction is right either.}.
But, the question we have to ask ourselves is whether the abstraction of an individual method invocation or function call is the correct paradigm for building distributed applications. Is the idea of treating all remote objects as local, and making distribution as transparent as possible, the correct decision moving forward? Does it mask failure modes that will allow developers to build applications that will not operate correctly under partial failure?
When we talk about distributed programming languages today, many developers equate this to \textbf{programming languages} that can be, and have been used, to build \textbf{distributed systems.} For example, any language with concurrency primitives and the ability to open a network socket would suffice to build these systems; this does not imply that these languages are distributed programming languages.
But, a \textbf{distributed programming language} is where the distribution is \textbf{first class}. Languages like Go are more closely related to \textbf{concurrent} languages, where concurrency is first class; and, while concurrency is a requirement for distribution, these are different topics. CORBA is an example of trying to make distribution first class in languages such as C++.
Erlang~\cite{claessen2005semantics, svensson2007more, svensson2007programming} is one language where distribution is first class. Erlang has a RPC mechanism, but prefers the use of asynchronous message passing between processes. In fact, the RPC mechanism in Erlang is implemented using Erlang's native asynchronous message passing. While you can peek under the covers and see \textbf{where} processes are running, Erlang tries to make the programmer assume that each process could be executing on a different node. Motivated by the expressiveness of the design, both Distributed Process, from the Cloud Haskell group, and Akka, in Scala, are examples that attempt to bring Erlang-style semantics to Haskell and Scala, respectively.
One approach taken in the Scala community for distributed programming is serializable closures~\cite{miller2014spores}, or what's known as the function shipping paradigm. In this model, entire functions are moved across the network, where the type system is used to ensure that all of values in scope can be properly serialized or marshalled as these closures move across the network. While this solves some of the problematic points in systems like CORBA and DCOM, it does not have a solution for the problem of how to ensure exactly-once execution of functions, or to handle partial failure where you can not distinguish the failure of the remote node from the network.
Languages like Bloom, and Bloom\textsubscript{L}, and Lasp~\cite{alvaro2011consistency, conway2012logic, meiklejohn2015lasp} take an alternative approach: can we build abstractions that rely on asynchronous programming, very weak ordering and structuring our applications in such a way where they are tolerant to network anomalies such as message duplication and message re-ordering. While this approach is more expensive in terms of state transmission, and more restrictive in what types of computations can be expressed, this style of programming supports the creation of \textit{correct-by-construction} distributed applications. These applications are highly tolerant to anomalies resulting from network failures by assuming all actors in the system are distributed. The restrictions, however, might prohibit wide adoption of these techniques.
So, we ask again:
\begin{quote}
``Does developer convenience really trump correctness, scalability,performance, separation of concerns, extensibility, and accidentalcomplexity?''~\cite{vinoski2008convenience}
\end{quote}