Assignment 2 – File Encryption

Assignment 2 – File Encryption with the Java Cryptography Architecture
This assignment will teach you how file encryption is performed using the standard cryptoprimitives provided in the Java Cryptography Architecture. Along the way, you’ll explore issues such as padding, block cipher modes and how to generate large, reasonably random keys from passwords (or, better still, passphrases). You can split this task into three easy stages to allow you to check your work, and as you proceed, these instructions ask you several questions – you should record your answers because they are part of your assignment submission.

Download the file from iLearn and import it into an Eclipse Java project called FileEncryptor so that you can work on it. The file is the skeleton of a simple file encrypt/decrypt program. On the command line, the first parameter is “E” or “D” for encrypt or decrypt, respectively, while the second and third parameters are the input file and output file, as you’d expect. However, the fourth parameter is not a key, but a password or passphrase. If a passphrase which includes spaces is used, it must be surrounded by quotes to stop the shell parsing it as multiple parameters, e.g. “My Super Secret Passphrase”.
Your mission, should you choose to accept it – and you don’t really have a lot of choice – is to complete the program by filling in the missing code which instantiates objects of the right classes and then makes the appropriate method calls. I created the skeleton by taking the completed, working version, and deleting code while leaving the explanatory comments – but I’ve been reasonably careful to replace multi-line function calls and blocks of code with the same number of blank lines (there’s usually a blank line below the excised code). So if you see a two-line gap, you can reasonably assume that one line of code will provide the missing functionality – and if you think a single line of code will fill a 15-line gap, you should wonder whether you’re missing something.
Along the way, you will need to refer to the online documentation for the Java Cryptography Architecture, which you will find at . The JavaDoc for the various javax.crypto classes are at while javax.crypto.spec JavaDoc is at
You may also need to refer to the Standard Algorithm Names at and the Oracle Providers documentation at .
Anything else can be found under the main Security Documentation at .
This is a shameless ploy to get you to become at least slightly familiar with the JCA reference documentation – you are quite likely to need it in the real world, as well as for this assignment. However, I will provide some overview guidance in this article.
I’ve also moved the declarations of the required variables and objects to the beginning of the main() method – looking at these will give you some valuable clues.

If you are not a strong Java programmer, you might want to review the skeleton code first, while referring to the notes on Programming Style at the end of this document. However, it should be possible for a non-programmer to work out the required methods and their arguments like algorithm names and transformations from the lectures and the write-up that follows.
The Java Cryptography Architecture
The JCA provides a standard interface which Java programmers can use to both use cryptographic functionality and also implement crypto functionality. Notice the latter point: anybody can develop cryptoprimitives that conform to the JCA and package them, as JAR files, into what are called providers. In this exercise, we will be using one of the default providers from the Oracle Java SE SDK – the SunJCE provider. Others exist and may be preferable if you require some advanced functionality – a good example is the Australian-developed Bouncy Castle package found at
Because the JCA is highly generic and algorithm-independent – you can utilise DES, Blowfish or AES with almost-identical code – in many cases you do not directly instantiate a particular class of cipher. Instead, you call a generator or factory method to get an instance of the required cipher or other cryptoprimitive. So the various base classes – Cipher, SecretKeyFactory, KeyPairGenerator, etc. – all provide a static getInstance() method which you should call, usually with the name of the required algorithm as the first parameter.
This technique allows the JCA runtime to search multiple different providers in order to instantiate the required algorithm, rather than tying your code to a specific provider.
In practice, just the algorithm name alone is insufficient, as discussed in the lectures – we also need to specify what mode the cipher will be used in, as well as a padding mechanism. These options are concatenated with “/” characters as a delimiter, so we arrive at strings like “AES/GCM/NoPadding”, which the JCA documentation calls transformations.

One of the benefits of strongly typed languages like Java is that many errors can be discovered either by the compiler, at compile time, or even by the editor of your IDE. However, in the JCA many different cipher implementations share the same class or interface – and the actual cipher implementation to be used is specified by a string parameter specifying the required transformation.
This means non-existent cipher implementations cannot be discovered at compile time, but only at run time, and so many of the getInstance() calls will need to be surrounded by try/catch blocks. Fortunately, Eclipse will take care of most of that work for you.
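To make the run-time nature of these failures concrete, here is a minimal sketch. The class name TransformationDemo and the helper isAvailable() are my own illustrative inventions, not part of the skeleton; only Cipher.getInstance() and the two exception types come from the JCA.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;
import javax.crypto.NoSuchPaddingException;

// Hypothetical demo class: the names here are illustrative, not from the skeleton.
class TransformationDemo {

    // Returns true if an installed provider can supply the given transformation.
    static boolean isAvailable(String transformation) {
        try {
            Cipher.getInstance(transformation);
            return true;
        } catch (NoSuchAlgorithmException | NoSuchPaddingException e) {
            // The compiler only sees a String, so a bad name surfaces here, at run time.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("AES/CBC/PKCS5Padding available: "
                + isAvailable("AES/CBC/PKCS5Padding"));
        System.out.println("NotACipher/CBC/PKCS5Padding available: "
                + isAvailable("NotACipher/CBC/PKCS5Padding"));
    }
}
```

Both strings compile equally happily; only the call to getInstance() tells them apart.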
The various cryptoprimitives require different sets of their corresponding parameters, and so there are supporting classes such as AlgorithmParameters, KeySpec and its derivatives SecretKeySpec, PBEKeySpec, RSAPrivateKeySpec, etc. that allow programmers to fully specify how they want ciphers configured, keys generated, etc.
In general, the various algorithms will provide default values if a parameter is not specified. However, be aware that the various APIs don’t like null pointers. So, for example, if you don’t want to specify a salt value for a KeySpec, you can’t say
salt = null;
but must write:
salt = new byte[20];
in order to avoid an exception being thrown at run time.
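As a quick illustration of the point about null values, the sketch below tries both salts against the PBEKeySpec constructor. The class name SaltDemo, the helper acceptsSalt(), and the iteration count and key length passed in are assumptions chosen purely for the demonstration.

```java
import javax.crypto.spec.PBEKeySpec;

// Hypothetical demo class: names and parameter values are illustrative only.
class SaltDemo {

    // Returns true if PBEKeySpec accepts the given salt without throwing.
    static boolean acceptsSalt(byte[] salt) {
        try {
            // 1000 iterations and a 128-bit key are arbitrary demo values.
            new PBEKeySpec("passphrase".toCharArray(), salt, 1000, 128);
            return true;
        } catch (NullPointerException | IllegalArgumentException e) {
            // A null salt is rejected with NullPointerException,
            // an empty (zero-length) salt with IllegalArgumentException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("null salt accepted:     " + acceptsSalt(null));
        System.out.println("all-zero salt accepted: " + acceptsSalt(new byte[20]));
    }
}
```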
For this exercise, we’ll use the standard SunJCE provider – the original Sun Java Cryptography Engine. All its documentation is in the standard JCA documentation and JavaDoc linked above, and no installation or configuration is required.
Converting a Passphrase to a Key
We’re going to be encrypting and decrypting using AES with a 128-bit key – but humans are really bad at remembering 128-bit binary strings, so we’re letting the user enter a passphrase instead. As a result, we’ll need an algorithm that converts an arbitrary-length string into a fixed-length binary value.
That really ought to ring bells in your head; we spent a week discussing this type of algorithm.
Right: we need a hash function. Fortunately, there’s an algorithm that is purpose-built for this task – taking a passphrase and turning it into a key. It’s called PBKDF2, which stands for “Password-Based Key Derivation Function 2”, and it’s based on repeated hashing of the input text, typically using SHA-1 (strictly speaking, HMAC-SHA-1 keyed with the passphrase) thousands of times over – the first time to hash the input text, and then to hash the previous hash value. It can also add in a little salt (something we’ll discuss in a later lecture, in connection with password hashing and storage).
You may have already used PBKDF2 – it’s the function that VeraCrypt uses to turn a passphrase into a key, and it’s also the way wi-fi access points and routers create a 256-bit key for use in WPA2. When setting up WPA2, you could enter a 64-digit hex string as a specific 256-bit key – or a passphrase of 8 to 63 characters, which will be passed through PBKDF2, using 4096 iterations, to produce what is called the Pairwise Master Key. In addition, the network SSID is used as salt, so that the same passphrase produces different key values on different networks.
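The WPA2 recipe just described can be sketched in a few lines, mainly to show the effect of the salt. The class name Wpa2KeyDemo, the helper names and the example SSIDs are all made up for illustration; the algorithm name PBKDF2WithHmacSHA1 is taken from the Standard Algorithm Names, and UTF-8 encoding of the SSID is my assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical demo class: a sketch of the WPA2-style derivation, not a wi-fi tool.
class Wpa2KeyDemo {

    // Derives a 256-bit key from a passphrase, salted with the SSID, 4096 iterations.
    static byte[] derive(String passphrase, String ssid) throws GeneralSecurityException {
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
        PBEKeySpec spec = new PBEKeySpec(
                passphrase.toCharArray(),
                ssid.getBytes(StandardCharsets.UTF_8), // the SSID acts as the salt
                4096,                                  // iteration count
                256);                                  // key length in bits
        return factory.generateSecret(spec).getEncoded();
    }

    // Formats a byte array as lowercase hex for printing.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        String passphrase = "My Super Secret Passphrase";
        // Same passphrase, two made-up network names: the derived keys differ.
        System.out.println(toHex(derive(passphrase, "HomeNetwork")));
        System.out.println(toHex(derive(passphrase, "CoffeeShop")));
    }
}
```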
Here’s an outline of the missing code:
To create a PBKDF2 key factory, call the getInstance() method of SecretKeyFactory with the right algorithm name (which you get from the Standard Algorithm Names page linked above).
Next, you’ll need to create a key specification, which sets up the number of iterations, the required key size, the salt string, and – very importantly – the passphrase the key will be derived from. There are various types of KeySpec, but you’ll need one for Password-Based Encryption (this is a big hint – I’ve already mentioned this type of KeySpec).
Finally, you get the key by calling your key factory’s generateSecret() method with the KeySpec object as parameter.
Notice that the key is returned as a SecretKey, which is a class that wraps around the actual key value in order to provide type safety and also deal with various types of key encodings – it is difficult to do things like attaching a key to an email or pasting it into a web form if it is in a purely binary representation, and so it may be saved in formats like ASN.1 DER (Abstract Syntax Notation One, Distinguished Encoding Rules). To get the actual key as an array of bytes, call the getEncoded() method on the SecretKey object.

Finally, you’ve got a key!
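Putting the three steps of the outline together, one possible shape for the code is sketched here. Treat it as an illustration rather than the model answer: the class name PassphraseKeyDemo, the method name deriveAesKey() and the iteration count of 10,000 are my assumptions, and the all-zero salt simply mirrors the placeholder discussed earlier.

```java
import java.security.GeneralSecurityException;
import java.security.spec.KeySpec;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical demo class: one way the outline could look, under stated assumptions.
class PassphraseKeyDemo {

    static byte[] deriveAesKey(char[] passphrase) throws GeneralSecurityException {
        // Step 1: obtain a PBKDF2 key factory by algorithm name.
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");

        // Step 2: describe the key we want. The salt is an all-zero placeholder
        // and 10,000 iterations is an assumed value, not one set by the assignment.
        byte[] salt = new byte[20];
        KeySpec spec = new PBEKeySpec(passphrase, salt, 10_000, 128);

        // Step 3: generate the key and unwrap it to raw bytes.
        SecretKey key = factory.generateSecret(spec);
        return key.getEncoded();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        byte[] raw = deriveAesKey("My Super Secret Passphrase".toCharArray());
        System.out.println("Derived key length in bits: " + raw.length * 8);
    }
}
```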
Performing Symmetric Encryption
The SunJCE’s Cipher implementations support a wide variety of cryptoprimitives, including DES, RC4, Blowfish and AES along with various modes and padding types.
To use a Cipher, the general sequence of events is this:

Create a Cipher of the required type by calling Cipher.getInstance() while specifying the appropriate transformation.
Initialise the Cipher object by calling its init() method, specifying the encryption mode (encrypt or decrypt), a key spec and optionally an IVSpec.
Loop around, calling the cipher object’s update() method which returns blocks of ciphertext.
Finally, call the cipher object’s doFinal() method, which allows it to perform padding if it has a partial block left over from the previous update.
Note that the JCA Cipher interface does not have separate encrypt() and decrypt() methods – instead, the same update() and doFinal() methods are used to both encrypt and decrypt, and the operating mode is set in the init() method call. You can see the static final constants that are passed to init() in the skeleton code.
The other thing that is passed to init() is a KeySpec – this is a wrapper that goes around the raw bytes of the key. In this case, you’ll need to set up a SecretKeySpec – take a look at its constructor in the JCA documentation.
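The getInstance / init / update / doFinal sequence can be sketched as follows, using in-memory streams so that it runs on its own. The class CipherLoopDemo, its method names, the all-zero demo key and the choice of the AES/ECB/PKCS5Padding transformation are assumptions for illustration; the skeleton’s own variable names and structure will differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class: a sketch of the Cipher life cycle, not the skeleton itself.
class CipherLoopDemo {

    // Encrypts or decrypts everything from 'in' to 'out', depending on 'mode'.
    static void process(int mode, byte[] rawKey, InputStream in, OutputStream out)
            throws GeneralSecurityException, IOException {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(mode, new SecretKeySpec(rawKey, "AES"));

        byte[] buffer = new byte[128];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) > 0) {
            // update() may return null if no complete block is ready yet.
            byte[] chunk = cipher.update(buffer, 0, bytesRead);
            if (chunk != null) {
                out.write(chunk);
            }
        }
        // doFinal() flushes the last block, adding or stripping the padding.
        out.write(cipher.doFinal());
    }

    // Convenience wrapper that runs process() over byte arrays.
    static byte[] run(int mode, byte[] rawKey, byte[] input)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        process(mode, rawKey, new ByteArrayInputStream(input), out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] rawKey = new byte[16]; // all-zero demo key; never use this for real data
        byte[] plain = "Attack at dawn".getBytes(StandardCharsets.UTF_8);
        byte[] crypto = run(Cipher.ENCRYPT_MODE, rawKey, plain);
        byte[] back = run(Cipher.DECRYPT_MODE, rawKey, crypto);
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```

Notice that the only difference between encrypting and decrypting is the mode constant passed to init().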
ECB mode (Version 1)
For the first version of your program, you should implement the encryption using AES with 128-bit keys in ECB mode and PKCS #5 padding. This means that no IVSpec will be required.

PKCS #5 padding is discussed in the lecture – it basically means that if the final block falls n bytes short of the AES 128-bit block size, we fill that space with n bytes, each with the value n. There’s a curious Java weirdness here: PKCS #5 padding is only defined for ciphers which use an 8-byte – that’s 64 bits, of course – block size, which means DES. PKCS #7 defines this style of padding for arbitrary block sizes, but the JCA won’t accept that as part of a transformation string, and so we’re forced to call it PKCS #5 padding, even though it’s really #7.

In any case, the good news is that you don’t have to write any code to do the padding manually, because the Cipher will take care of this for you, if you specify it as part of the transformation string – that’s the point of the doFinal() method.
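If you want to see the padding rule in numbers before you start experimenting with files, a small sketch like this one will do it. The class PaddingSizeDemo, its method name and the all-zero demo key are illustrative assumptions; the lengths printed follow directly from the rule above, since the padding always adds between 1 and 16 bytes.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class: measures ciphertext length for a given plaintext length.
class PaddingSizeDemo {

    static int ciphertextLength(int plaintextLength) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        // All-zero demo key: fine for measuring sizes, useless for secrecy.
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(new byte[16], "AES"));
        return cipher.doFinal(new byte[plaintextLength]).length;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        for (int n : new int[] {15, 16, 17}) {
            // Expected by the padding rule: (n / 16 + 1) * 16
            System.out.println(n + " plaintext bytes -> "
                    + ciphertextLength(n) + " ciphertext bytes");
        }
    }
}
```

A plaintext that already fills its last block still gains a whole block of padding, because otherwise the decryptor could not tell padding from data.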

Once you’ve written your code, test your program, using increasingly long text files. You may need to insert some statements to print values to System.err so that you can see what’s going on.
Once your program is working, encrypt the file infile20.txt, using a passphrase of your choosing:
java FileEncryptor e infile20.txt crypto.bin passphrase
The ciphertext is written to the file crypto.bin – open it in Notepad++ or a similar editor. What do you notice about the ciphertext? Why do you think this is?
Experiment with encrypting small files of various sizes – say, 50 bytes, 60 bytes and 64 bytes – and comparing the size of the generated ciphertext file. Record your results. What do you notice?
CBC Mode With Default IV (Version 2)
Now edit your code to operate in Cipher Block Chaining mode. You will need to change the transformation string, and also create an IVSpec (the JCA class is IvParameterSpec). Once again, check the online documentation for this class – especially its constructors. Notice that if you do not provide an initialization vector, the Cipher will provide a default. However, remember the note about default values above – passing a null value will generally cause an exception, so for this version supply an all-zero IV explicitly:
byte[] iv = new byte[16];
Once you have completed your program and tested it with some small text files, try encrypting infile20.txt once again and viewing the ciphertext in Notepad++. Does it look different?
Once again, try encrypting small files and comparing the size of the input plaintext and generated ciphertext. Record your results – what do you find?
Obviously, decryption has to be performed with the same key (i.e. the same passphrase) but also with the same initialization vector. How is the initialization vector being passed between encryption and decryption? Use the getIV() method of the Cipher interface to get the actual default initialization vector being used and print it – do you think this is particularly secure?
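For reference, here is a minimal sketch of CBC mode with a fixed IV. The class name CbcFixedIvDemo, the method name and the all-zero demo key are my own assumptions; IvParameterSpec and Cipher.getIV() are the real JCA names.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class: CBC encryption with a caller-supplied IV.
class CbcFixedIvDemo {

    static byte[] encrypt(byte[] rawKey, byte[] iv, byte[] plaintext)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec(rawKey, "AES"),
                new IvParameterSpec(iv));
        // getIV() reports the IV this Cipher instance is actually using.
        System.err.println("IV in use: " + Arrays.toString(cipher.getIV()));
        return cipher.doFinal(plaintext);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        byte[] rawKey = new byte[16]; // all-zero demo key, for illustration only
        byte[] iv = new byte[16];     // the all-zero IV from the text
        byte[] plaintext = new byte[32];
        byte[] first = encrypt(rawKey, iv, plaintext);
        byte[] second = encrypt(rawKey, iv, plaintext);
        // Same key, same IV, same plaintext: the two ciphertexts are identical.
        System.out.println("Identical ciphertexts: " + Arrays.equals(first, second));
    }
}
```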
Final Challenge: CBC Mode With Random IV (Version 3)
The answer to the last question above should suggest a further weakness with using a default IV – setting aside the obviously problematic value, every file is being encrypted using the same IV value and quite possibly the same passphrase, leaving us increasingly open to a known-ciphertext attack as we encrypt more and more files.
We really need a randomly-generated IV. Fortunately, the JCA provides a cryptographically adequate pseudo-random number generator class, in SecureRandom. Once you’ve created a 16-byte array, iv, as above, getting it filled in with a random value is as easy as:
SecureRandom random = new SecureRandom();
random.nextBytes(iv);
Now modify your CBC-mode file encryption program so that, instead of using the same constant value for an IV, it uses a “genuinely” random IV.
But before you copy and paste the two lines above into your program and set about compiling it, consider that question asked in the previous task: How is the initialization vector being passed between encryption and decryption? Whenever you encrypt a file, a new, random value for the IV is generated – and the file has to be decrypted using the same IV value. If this is going to work, you’re going to have to persistently save the IV somewhere.
Test your program by encrypting infile20.txt and decrypting the resultant ciphertext – obviously, you should get back the same file. Now, repeat the test of encrypting some small files and comparing the sizes of the generated ciphertext files. Your ciphertext should be a bit larger – and you should know exactly how much larger and why.
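One common way of saving the IV, and I stress it is only one option and an assumption on my part, is to write it at the front of the ciphertext, since an IV does not need to be secret. The sketch below does this in memory; the class RandomIvDemo, its method names and the all-zero demo key are illustrative.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class: stores a random IV as the first 16 bytes of the output.
class RandomIvDemo {
    private static final int IV_LENGTH = 16; // AES block size in bytes

    static byte[] encrypt(byte[] rawKey, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(rawKey, "AES"),
                new IvParameterSpec(iv));
        byte[] body = cipher.doFinal(plaintext);

        // Output layout: [ 16-byte IV | ciphertext ]
        byte[] out = new byte[IV_LENGTH + body.length];
        System.arraycopy(iv, 0, out, 0, IV_LENGTH);
        System.arraycopy(body, 0, out, IV_LENGTH, body.length);
        return out;
    }

    static byte[] decrypt(byte[] rawKey, byte[] ivAndCiphertext)
            throws GeneralSecurityException {
        // Recover the IV from the front of the data, then decrypt the remainder.
        IvParameterSpec ivSpec = new IvParameterSpec(ivAndCiphertext, 0, IV_LENGTH);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(rawKey, "AES"), ivSpec);
        return cipher.doFinal(ivAndCiphertext, IV_LENGTH,
                ivAndCiphertext.length - IV_LENGTH);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        byte[] rawKey = new byte[16]; // all-zero demo key, for illustration only
        byte[] plaintext = new byte[20];
        byte[] stored = encrypt(rawKey, plaintext);
        System.out.println("Stored length: " + stored.length);
        System.out.println("Recovered length: " + decrypt(rawKey, stored).length);
    }
}
```

In a file-based program the same idea applies: the IV is written to the output file before the first block of ciphertext, and read back from it before decryption starts.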
Hopefully, you can see that initialization vectors aren’t as straightforward as people often naively assume. In fact, many secure systems have failed, not because DES was cracked or AES was cracked, but because a naive programmer chose a weak way of generating initialization vectors. And you should also be able to see why initialization vectors for encrypting disk sectors are particularly challenging, as discussed in the lecture, and we have to come up with schemes like ESSIV and XTS.
Notes on Coding Style
Some general comments on coding style – for this demo program, I opted to let Eclipse generate try/catch blocks around various method calls, then replaced the generated printStackTrace() with a call to a routine which prints a more meaningful error message – in particular, a message that gives a clue as to where things went pear-shaped. This is usually followed by a call to System.exit() to exit the program, setting the errorlevel to an appropriate value. I use this technique a lot in “exploratory” programming, but for larger and more polished programs I tend to use larger try/catch blocks and throw to a higher-level error handler.
The main processing loop is written in a slightly verbose style, just to make what is happening clear to the less experienced programmers. The paradigm used is:
read_from_file();
while (something was successfully read) {
    process the block that was read;
    read_from_file();
}
However, it’s quite common for C/C++/Java programmers to combine the read_from_file() with the while loop test. In this example, it would read like this:
while ((bytesRead = read_from_file()) > 0) {
This paradigm would make the program a few lines shorter, as both calls to read_from_file() are consolidated into a single call.
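In real Java, with InputStream.read() standing in for read_from_file(), the two styles look roughly like this. The class ReadLoopDemo and its method names are made up for the comparison, and the loops merely count bytes rather than encrypting them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class: the same read loop written in both styles.
class ReadLoopDemo {

    // Verbose style: one read before the loop, another at the end of the body.
    static int countVerbose(InputStream in) throws IOException {
        byte[] buffer = new byte[128];
        int total = 0;
        int bytesRead = in.read(buffer);
        while (bytesRead > 0) {
            total += bytesRead;          // "process" the block
            bytesRead = in.read(buffer); // read the next block
        }
        return total;
    }

    // Compact style: the read is folded into the loop condition.
    static int countCompact(InputStream in) throws IOException {
        byte[] buffer = new byte[128];
        int total = 0;
        int bytesRead;
        while ((bytesRead = in.read(buffer)) > 0) {
            total += bytesRead;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[300];
        System.out.println(countVerbose(new ByteArrayInputStream(data)));
        System.out.println(countCompact(new ByteArrayInputStream(data)));
    }
}
```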
Notice that the file I/O reads binary blocks of data, rather than using a stream reader to read lines of text. There are two reasons for this: firstly, the program cannot be restricted to simply encrypting text files – you might want to encrypt an Excel spreadsheet, a JPEG image, or any other kind of data. And secondly, the ciphertext written to the encrypted file will definitely not be text – and the program needs to be able to read it back in to decrypt it.
You may have noticed that I’ve used buffered streams for file input and output. This means that the block size for reading and writing has virtually no impact on performance: reading lots of small blocks of data will not cause lots of slow disk accesses because the Java runtime will buffer all I/O. The buffer size of 128 bytes set at the beginning of the program can be changed arbitrarily.
