
filecopy

This was one of the projects I did for my Distributed Systems course. The goal was to write a C program that transferred a directory of files over the network. Easy, right? The catch was that we had to do all of our networking through a custom UDP wrapper our professor wrote to be incredibly unreliable; it can be tuned to drop or duplicate packets frequently. On top of that, we also used a file descriptor wrapper that often corrupts reads and writes. The point was to get accustomed to writing robust code under unreliable conditions.

This project primarily demonstrated the end-to-end principle by introducing many intermediate points of failure between nodes, requiring a robust end-to-end check to ensure correctness. My partner and I also used idempotence to simplify our protocol and caching as an optimization.

My partner and I produced a very fast and incredibly reliable program. It was capable of handling the highest "nastiness" levels of both the socket and file descriptor wrappers - meaning the majority of packets were dropped, many duplicated, and disk writing became increasingly unreliable as more data was written at once. However, with our carefully designed protocol, we were able to curb these nasty issues.

We spent at least a week designing our protocol before we wrote a single line of code. It was immediately evident that if we didn't carefully tackle the problem, the program would become a tangled mess of edge case handling and afterthoughts cobbled together. As it turns out, it's really quite tricky to design for tons of dropped and out-of-order packets! An issue we kept coming back to was keeping consistent sequence numbers - a strategy we borrowed from TCP - between the client and the server in the event of a failure. We found that it would be really tricky to reset both the client and the server to a stable state if the end-to-end check failed, and to ensure that whatever reset packet was sent would not be recognized later on if it got duplicated by a networking error. This was complicated by the fact that we really wanted to be able to send multiple packets in parallel to speed up large file transfers, so strict incremental sequence numbers were insufficient.
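To make the rest of this concrete, here's a rough sketch of the kind of packet header such a design implies. This is illustrative only; the field names and layout are assumptions for the sake of explanation, not our actual wire format.

/* Illustrative only: not our actual wire format. The key idea is that every
 * message carries the sequence number of the client request that produced it,
 * plus enough addressing (file, chunk) for the server to judge whether the
 * data is newer than what it already holds. */
#include <stdint.h>

enum MsgType { MSG_PREPARE, MSG_CHUNK, MSG_CHECK_HASH, MSG_ACK, MSG_NACK };

struct PacketHeader {
    uint8_t  type;        /* one of MsgType */
    uint32_t seqno;       /* sequence number of the originating request */
    uint32_t file_id;     /* which file this packet refers to */
    uint32_t chunk_index; /* which chunk of that file (MSG_CHUNK only) */
    uint16_t payload_len; /* bytes of payload following the header */
};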

Our solution ended up revolving around the idea that the client should do most of the decision making, while the server should be very simple and only respond to client requests as it receives them. The server's operations that the client can invoke are all designed to be idempotent under identical sequence numbers, meaning that if a packet from the client was duplicated and triggers the same operation on the server side twice, the result would still be the same. For example, if the server receives a data packet for a chunk of data it already received, it will respond as if it had just received it for the first time. This also produces nice behavior in the event that an ACK (acknowledgement of a client's message) is dropped and the client resends a packet after timing out. This one restriction simplified many parts of our protocol.
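A minimal sketch of what idempotent chunk handling looks like, building on the header sketch above. CHUNK_SIZE and send_ack are placeholders, not our real code.

#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 512  /* placeholder chunk size */

/* Placeholder: sends an acknowledgement back to the client. */
void send_ack(uint32_t seqno, uint32_t file_id, uint32_t chunk_index);

struct Chunk {
    uint32_t seqno;            /* sequence number the chunk was written under */
    int      present;          /* has this chunk been received at all? */
    char     data[CHUNK_SIZE];
};

/* Idempotent: receiving the same (seqno, chunk) pair twice overwrites the
 * buffer with identical bytes and sends the same ACK both times, so duplicated
 * or retransmitted packets are harmless. */
void handle_chunk(struct Chunk *chunks, const struct PacketHeader *hdr,
                  const char *payload)
{
    struct Chunk *c = &chunks[hdr->chunk_index];
    if (!c->present || hdr->seqno >= c->seqno) {
        memcpy(c->data, payload, hdr->payload_len);
        c->seqno   = hdr->seqno;
        c->present = 1;
    }
    /* Respond the same way whether this was the first copy or a duplicate. */
    send_ack(hdr->seqno, hdr->file_id, hdr->chunk_index);
}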

So how did this solve the reset-upon-error issue? Well, once the client believes it has successfully transferred all of the chunks of a file, it asks the server to compare its copy's hash with a hash sent over by the client, and the server responds either positively or negatively based on the result of the comparison. If the result is negative, the client backtracks to the beginning of that file's transfer process. This first involves telling the server that it should expect a file F composed of N parts with sequence number K, at which point the server will prepare a fresh buffer if the previous one was created under an older sequence number. The client then sends all the file parts over again and performs the hash check once finished. The secret sauce is that the server does not keep track of a global sequence number, but instead attaches sequence numbers to file chunks and file buffers, so if it receives a chunk or a preparation request with a sequence number older than the chunk or buffer it already has, it knows to throw it out. This also allows the client to send many file chunks at once without worrying about them arriving out of order: the server isn't waiting for a particular sequence number, just for more up-to-date data than what it currently has.
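The server-side handling of a preparation request then falls out naturally. Again, a hypothetical sketch reusing the types above, not our actual implementation:

#include <stdint.h>
#include <stdlib.h>

struct FileBuffer {
    uint32_t      seqno;      /* sequence number the buffer was prepared under */
    uint32_t      num_chunks; /* the N parts the client told us to expect */
    struct Chunk *chunks;
};

/* A "prepare" request replaces the file's buffer only if it carries a newer
 * sequence number than the buffer was created under. Stale or duplicated
 * prepare packets leave the buffer alone, so a late copy of an old request
 * can never clobber a retried transfer. */
void handle_prepare(struct FileBuffer *buf, const struct PacketHeader *hdr,
                    uint32_t num_chunks)
{
    if (hdr->seqno > buf->seqno) {
        free(buf->chunks);
        buf->chunks     = calloc(num_chunks, sizeof(struct Chunk));
        buf->num_chunks = num_chunks;
        buf->seqno      = hdr->seqno;
    }
    /* ACK either way, so the client isn't left waiting on a dropped response. */
    send_ack(hdr->seqno, hdr->file_id, 0);
}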

This solution also does away with the concept of a "reset" operation, as the server really has no idea where in the process it is. It just operates on its data as the client instructs it to, and if that means preparing a new buffer for a file it already has parts of, then so be it.
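Put together, the client's outer loop for a single file is conceptually just "retry the whole transfer under a fresh sequence number until the hash check passes." A hypothetical sketch, with all the helper functions as illustrative placeholders:

#include <stdint.h>

/* Placeholder helpers: illustrative names only, not our real functions. */
void     send_prepare(uint32_t file_id, uint32_t seqno, uint32_t num_chunks);
void     send_all_chunks(const char *path, uint32_t file_id, uint32_t seqno);
int      request_hash_check(const char *path, uint32_t file_id, uint32_t seqno);
uint32_t count_chunks(const char *path);

/* Retry the whole transfer under a newer sequence number until the end-to-end
 * hash check passes. The bumped seqno is what tells the server that any
 * lingering, duplicated packets from a failed attempt are stale. */
void transfer_file(const char *path, uint32_t file_id)
{
    uint32_t seqno = 0;
    int      ok    = 0;

    while (!ok) {
        seqno++;
        send_prepare(file_id, seqno, count_chunks(path));
        send_all_chunks(path, file_id, seqno);  /* chunks may go out in parallel */
        ok = request_hash_check(path, file_id, seqno);
    }
}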

There are plenty of other minutiae around file saving and packet parsing, but they're less interesting than the rest of the protocol design, in my opinion, and sharing them would perhaps break more academic integrity rules than I already have.

If you're not actively taking Distributed Systems (seriously, kids, no peeking!), then feel free to peruse our code here.

