Some thoughts about SOAP versus REST on Security

REST is the architecture underlying the World Wide Web, embodied in its two core specifications, URIs and HTTP. It has been proposed that instead of using new-from-scratch Web Services technologies we can get much more bang for our buck by understanding the full generality of what we've got. A community has arisen around this idea and we spend our time proving that what the Web already has is better than what is being developed. This page addresses the security weaknesses of the SOAP approach.

Simplicity

SOAP is an intrinsically complex specification. It typically runs on top of HTTP and therefore inherits any bugs and security holes in HTTP implementations. The web services world keeps churning so quickly that we cannot expect security experts to keep up with the auditing of either the specifications or the code. Implementors have enough of a problem just getting SOAP toolkits to talk to each other, much less making them secure, stable and scalable! The basic concepts of both HTTP and SOAP are widely misunderstood but most people by now have had experience with HTTP whereas SOAP is still an unknown (and changing) quantity.

Firewalls

SOAP is designed to slip through firewalls as ordinary HTTP traffic. There is no doubt that this is a design goal. Microsoft advertises it as such. Don Box (one of SOAP's inventors) is quite open about this: "if you look at the state of the average organization, they use proxy servers and they use firewalls to prevent normal TCP traffic from making it from one machine to another. Instead, they set up this infrastructure to allow HTTP to work. So part of the problem was replacing the transport, which is the way DCOM does framing, with an HTTP-based transport. That was the first part of the SOAP effort."
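
For illustration (the service and operation names here are invented), this is roughly what a SOAP request looks like on the wire. It is nothing but an HTTP POST to port 80, so to a port-filtering firewall it is indistinguishable from a form submission to an ordinary website:

POST /StockService HTTP/1.1
Host: example.com
Content-Type: text/xml; charset=utf-8
SOAPAction: "urn:example:BuyOrSell"

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <BuyOrSell xmlns="urn:example:stocks">
      <Ticker>ACME</Ticker>
    </BuyOrSell>
  </soap:Body>
</soap:Envelope>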

This is completely ridiculous from a security point of view. Firewalls exist so that corporations can monitor what goes in and out. If they wanted SOAP going in and out then they would configure their firewalls that way. But instead of making the reasoned argument to CIOs that SOAP is safe and should go through firewalls, it is instead tunneled over HTTP. This moves the decision out of the hands of the CIOs and into the hands of the software developer.

As a developer, I know that this is the worst possible place to put it because developers are often more interested in neat technology than security. Now on some days of the week I might consider myself an anarchist/hacker/get-the-job-done developer and decide to do a tunnel like that to get some important functionality deployed...but let's be honest about the fact that I'm circumventing policy and potentially opening up a security hole.

This issue has also been discussed by Bruce Schneier, the noted security expert, and it has received some attention in the media. According to Tim Bray (an editor of the XML specification), "SOAP goes through firewalls like a knife through butter" (though he is otherwise supportive of SOAP). Alan Cox uses the same analogy.

Unfortunately, many people in the media think that security is a boolean proposition. Either you have it or you do not. There are SOAP-related security standards so they think "SOAP has it." But the truth is that security is about not doing a long list of things wrong. SOAP does many things wrong and tunnelling through firewalls over HTTP is the first one.

Filtering SOAP looks very difficult based on my knowledge of the spec. SOAP uses the same port as HTTP. The world already has experience with security holes caused by running multiple services over a single port: Microsoft's RPC protocol, which predates SOAP, is the classic example. SOAP uses a standard HTTP POST method when it should use an extension method. The SOAPAction header is now deprecated. So you can only recognize SOAP by doing XML parsing in the firewall. Okay, so now you know it is SOAP. You want to decide whether it is supposed to be allowed through the firewall. How do you decide that? SOAP has no uniform addressing model or reliable internal structure. Sometimes there is a header, sometimes there is not. Sometimes the body is RPC-encoded, sometimes it is not. Sometimes there is a method name, sometimes there is not. Roy Fielding (one of the major inventors of HTTP) and Jim Whitehead (of WebDAV fame) discussed this problem years ago.

A concrete example

Let's say that a vicious new Trojan horse or virus is sweeping the Internet. A system administrator might decide that they wish to monitor binaries coming in from the outside (whether the download was triggered internally or externally). SMTP and HTTP have well-defined ways for transporting binaries and simple pattern matching can be used to detect the virus. It is therefore trivial to monitor all SMTP or HTTP traffic passing through the firewall.

It is trivial, that is, until the traffic passing through the firewall is SOAP encoded. The SOAP version must go through an extra decoding step because SOAP has (at least) three different ways of transporting binary data. This would not be such a problem if SOAP used its own port because it would be trivial to merely disable that port as you might disable the SSH port or the FTP port. Unfortunately SOAP tunnels through HTTP and overloads the HTTP port.
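
As a sketch of the problem (the element names are invented), here is one such encoding: binary content carried as base64 text inside the envelope. A filter doing simple pattern matching on raw byte signatures will not match the base64 form, and each alternative encoding (MIME-based SOAP with Attachments, for example) requires yet another decoding step:

POST /fileDrop HTTP/1.1
Host: example.com
Content-Type: text/xml; charset=utf-8

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <UploadFile xmlns="urn:example:files">
      <Name>screensaver.exe</Name>
      <!-- the binary payload, base64-encoded -->
      <Data>TVqQAAMAAAAEAAAA...</Data>
    </UploadFile>
  </soap:Body>
</soap:Envelope>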

Now let's say that six months earlier you were developing a P2P application for sharing astronomy data. Call it "Starster". You have three choices for the protocol that Starster uses. One option is to use a totally proprietary protocol on a proprietary port number. This is okay from a security point of view because the paranoid firewall administrator will just turn off that port when they get scared (if they ever turned it on at all, which they would have had to do explicitly).

A second option is to use SOAP. In this case, Starster will run its proprietary protocol as a specialization of SOAP (a set of methods) and SOAP will tunnel through HTTP. This means that the system administrator will not, by default, even know that Starster is running. They also will have no easy way to turn it off. They would have to detect a particular XML namespace buried somewhere in an XML document buried in a SOAP envelope embedded in an HTTP envelope inside of a TCP packet.

A third option is to use HTTP, but use it according to the intent of the HTTP specification, with no extra encodings or obfuscation. In this case the Starster traffic would continue to flow through the firewall because it would be indistinguishable from Web traffic. But more importantly, it would also be as safe as Web traffic, because the system administrator could use standard Web traffic filtering to screen out the virus.
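
For illustration (the URI is invented), the third option's traffic is just ordinary Web traffic:

GET /datasets/m31/plate-0042.fits HTTP/1.1
Host: starster.example.org

HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 2880

...binary image data...

The binary payload sits exactly where every existing HTTP filter expects it, labelled by the standard Content-Type and Content-Length headers, so the screening already applied to browser downloads applies to Starster for free.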

Philosophy of mediation

At the heart of the REST philosophy is the idea that everything of interest should be presented on the Web as a resource with a URI, manipulated through only a very limited number of methods. Resources are never, ever, accessed directly. They may only be accessed through representations. Right at the core of the REST philosophy is the idea that you should never get at the implementation of an object. You only ever ask it about itself and send it information.

Conversely, consider how Microsoft sells SOAP+WSDL+Visual Studio .NET:

  1. We begin by opening Visual Studio and creating a new Visual Basic XML Web service project called "Stocks".
  2. Next we open the XML Web service in Code view and write the following function:
<WebMethod()> Public Function BuyOrSell(ByVal Ticker As String) As String
   ' Business logic exposed directly to the network as a Web method
   If Ticker = "ACME" Then
      Return "BUY"
   Else
      Return "SELL"
   End If
End Function

Microsoft encourages no mediation between business logic and the dangerous outside world. You just write a function, annotate it as a WebMethod and you are off to the races. There is no mediation between interface and implementation. You generate the public interface from the implementation.

This is appropriate on a single desktop or within a single department. What Microsoft does not tell you is that it is a wildly inappropriate style for the public Web or even an enterprise intranet. Microsoft sells Web Services as easy to use. And as they sell them, they are certainly easier than the REST style.

You cannot, in general, move this approach out to the public Internet, for a variety of reasons. Security is one of them. Some level of network data sanity checking should be required before the data gets anywhere near your business logic. Code lacking explicit network mediation is also prone to hacks where the attacker guesses the implementation from the interface.

You might argue that this is just Microsoft, not the actual specification. But what is the virtue of the SOAP RPC model if not to make this sort of thing easy and also to make it easy to wrap existing DCOM and CORBA components in SOAP? Most people agree RPC is not the best model for scalable apps. From my point of view, its only virtue is that it makes networking look like local method calls.

The REST approach to the problem is to address the networking issue first through XML schemas and HTTP resource design. Once that's done you can attack the implementation details. This approach is likely to yield better security, scalability and extensibility.
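
As a sketch of what "resource design first" might mean for the stock example above (the URI and element names are hypothetical): decide what the resources are and what representations they exchange, and only then write the code behind them.

GET /stocks/ACME/recommendation HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
Content-Type: text/xml

<recommendation xmlns="urn:example:stocks">
  <ticker>ACME</ticker>
  <action>BUY</action>
</recommendation>

The representation can be validated against a schema at the network boundary before any business logic runs, and the combination of a URI and the GET method makes the read-only nature of the interaction explicit to every intermediary.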

SOAP encourages re-invention of wheels

OSI defines many layers of networking protocols. It calls the top layer the "application layer". Most definitions of OSI do not go into much detail about the application layer. I guess they presume it is the stuff that doesn't fit in other layers. What that means practically, however, is that the application layer is where data resource/component addressing and manipulation takes place. For the class of problems that REST and RPC are supposed to solve, I claim that this is the primary issue.

HTTP defines an addressing model based on URIs. Any application that uses HTTP "properly" will use this addressing model directly. Systems that build on HTTP's addressing and data manipulation model are not properly considered new protocols but rather extensions or applications of HTTP.

SOAP subverts HTTP's addressing model by hiding all of the data objects behind a component end-point interface. It needs to subvert HTTP's addressing model in order to be "transport agnostic" because SMTP does not (for example) use the same addressing model as HTTP. SOAP also has no addressing and data manipulation model of its own. This means that application programmers must invent their own addressing and data manipulation models. In effect, they must become protocol designers themselves. Essentially they are forced to extend the OSI stack to 9 levels instead of 7. And each layer adds security flaws. There are still likely to be bugs in IIS and even Apache. On top of that will be bugs in the SOAP implementations. On top of that will perch the most vulnerable layer of all: the protocol developed by the business programmer.
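
For illustration (the identifiers are invented), compare where the address of the thing being manipulated lives. In plain HTTP it is the URI itself, visible to every cache, proxy, log file and access-control list along the way:

GET /customers/1234/orders/5678 HTTP/1.1

In the SOAP style the endpoint URI names only the service, and the real address is spread across application-specific elements that the programmer had to invent:

POST /OrderService HTTP/1.1

<GetOrder xmlns="urn:example:orders">
  <CustomerId>1234</CustomerId>
  <OrderId>5678</OrderId>
</GetOrder>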

Inventing new protocols is hard work and involves careful security consideration. Every IETF-defined protocol has a section dedicated to security. Will business developers be as careful about the security implications of their new protocols? Should they be forced to be? Why does SOAP require them to re-invent wheels when HTTP is already a sufficient application protocol and can already meet the needs of exposing services as web resources?

I said that an application protocol consists primarily of data resource addressing and manipulation. Manipulation is done through methods. In HTTP, some methods are defined to write and modify data and some to only read data. That means that barring "tunnelling", firewalls can be configured to make a portion of a network "read only" by filtering methods. Log files can be searched easily for writes that might have corrupted data. Read/write access control can be separated out.
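
Firewalls themselves vary, but the same policy can be expressed at any HTTP-aware intermediary. Here is a minimal sketch, assuming an Apache httpd front end sits in front of the servers being protected; everything under /reports/ becomes read-only because only GET and HEAD are allowed through:

# Allow only safe, read-only methods for this part of the namespace
<Location "/reports/">
    <LimitExcept GET HEAD>
        Order deny,allow
        Deny from all
    </LimitExcept>
</Location>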

SOAP has no such distinction. Any method could be a read method or a write method. Even WSDL does not allow you to make the distinction. Thus a powerful filtering tool has been lost. In general, system administrators can read the logfiles for a wide variety of HTTP-based services because they all look pretty similar: GET, PUT, POST, DELETE and a URI. SOAP, on the other hand, allows messages to look almost however you want. It is designed to be completely free-form. Messages from different services can and will look radically different. A security person's understanding of them requires much more detailed business logic knowledge.
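
For illustration, here is what a (hypothetical) access-log excerpt for a plain HTTP service looks like; an administrator can audit it without knowing anything about the application behind it:

192.0.2.7 - alice [12/Mar/2003:10:04:11 +0000] "GET /stocks/ACME/quotes HTTP/1.1" 200 1843
192.0.2.7 - alice [12/Mar/2003:10:05:02 +0000] "DELETE /stocks/ACME/alerts/17 HTTP/1.1" 204 0

The corresponding SOAP service would log "POST /StockService" for every operation, whatever it does; the method name and its meaning are hidden inside the entity body.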

As a trivial example, GET /getHistoricalStockQuotes?MSFT says to a security person: "okay, it's a GET. Barring tunnelling or a bug, it can't modify the server. Probably returning some kind of report for historical stock quotes. If there is tunnelling or a bug it isn't my fault. We'll fire the programmer."

When he sees getHistoricalStockQuotes("MSFT") he says: "Hmmm. Probably returns a stock quote. But can I be sure it doesn't modify anything on the server? Maybe it's creating a new object that can be queried about different quote dates. If so, who is allowed to create these objects? When are they destroyed? Can a malicious hacker leak them until the server runs out of memory? I had better go read the documentation for this thing because what it does isn't obvious at first glance. Maybe I had better go find the programmer to make sure I understand it."

Of course the two are equally simple: they both return a report. But one is very explicit about a promise not to modify server state. The other is not.

HTTP/REST/The Web has a unified namespace

I have said that for our purposes, application protocols are about addressing and manipulating data and component resources. On the Web, all resources are merged into one massive global namespace: the URI namespace. One great thing about using URIs is that they give you something very tangible to hang your permissions upon. For instance, just as you could imagine adding or removing permissions for a purchase_order file, you could imagine doing the same for a Web purchase_order resource. This is standard Web stuff and has its roots in permissions techniques that people have been using for decades.

The big difference between a resource and a file is that the resource is virtual: it may actually be represented as a row in a database or a field in a comma-delimited file. The permissions you apply are to the virtual purchase order abstraction, not the file.
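
As an illustration (the paths and group names are invented), an access-control table keyed by URI might look like the following, and it works the same whether the purchase orders live in files, in database rows, or are generated on the fly:

/purchase_orders/            read: group accounting     write: group purchasing
/purchase_orders/archive/    read: group auditors       write: nobody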

SOAP has no unifying concept of a resource and every SOAP application invents its own addressing mechanism. Unlike CORBA and COM, there is no pointer mechanism, and unlike HTTP, SOAP data objects are typically not available through a URI. This means that there is nothing to hook permissions or access controls onto. Every application developer will have to invent her own addressing mechanism and figure out how to associate permissions herself.

SOAP Security Literature is Misleading

SOAP puts much more responsibility for security in the laps of developers, but SOAP literature does not prepare them for this responsibility. Consider this quote: "For security, use of the Secure Sockets Layer (SSL) protocol is supported, as well as standard Web authentication techniques." This implies that SOAP "has" security, as if it were a boolean property. It has some security features so it "has" security.

If you read Microsoft's literature on SOAP security, it does not address how to make your service secure. Rather it describes how to limit access to the service to certain domains. But the whole point of web services was that we would put services on the public Web just as we put websites on the public Web. By the way, the article mentions firewalls but does not instruct programmers how to play nicely with them, for instance by using a SOAP-specific port.

SOAP is new and untested

Confidence in code comes through auditing, testing and use. People have tremendous confidence in the Apache Web Server. IIS is considered less robust but is still robust enough to run millions of sites every day. Consider how long it took to get their security to that point. Are you willing to trust that toolkits for Web Services are anywhere near this level of reliability and safety? The Web Services world must slow down and consider the security implications of its creations with much more rigour.