Archive for July, 2009

Generation of MD5/SHA File Hashes in Java

Sunday, July 26th, 2009 | Java, Tech-savvy | 2 Comments

This post is about generating file hashes in Java. I needed them for a media application I am working on, where I wanted a way to identify duplicate files. The best way, IMHO, is a hash of the file's contents: it has a constant size (even for large files) and can easily be compared with other hashes, which makes identifying dupes a breeze.

Java provides a .hashCode() method for all objects, inherited from java.lang.Object, but this is not what we are looking for, as this excerpt from the Java SE 6 API doc states:

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

To be perfectly honest, it would be quite silly to believe that the .hashCode() method of java.lang.Object would be sufficient for generating file hashes. We might be lucky, though, and the .hashCode() method of java.io.File might override the default behaviour of Object with something more suitable for files. Well, it does indeed, but this is still not what we want (API doc excerpt):

Computes a hash code for this abstract pathname.

Well, java.io.File.hashCode() computes a hash based on the pathname. Again, not suitable.
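A quick sketch to illustrate the point (the class name and the file names are made up for this example, and none of the files needs to exist): the hash code only follows the pathname, so it says nothing about what is inside the file.

```java
import java.io.File;

class FileHashCodeDemo {

    /** True if both File objects report the same hashCode(). */
    static boolean sameHashCode(File a, File b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // A File is only an abstract pathname, so nothing is read from disk here.
        File first = new File("holiday.avi");
        File alsoFirst = new File("holiday.avi");
        File copy = new File("backup/holiday.avi");

        // Equal pathnames are equal objects, so the contract guarantees equal hash codes.
        System.out.println(first.equals(alsoFirst) && sameHashCode(first, alsoFirst)); // prints "true"

        // A byte-for-byte copy under another path is still a different File.
        System.out.println(first.equals(copy)); // prints "false"
    }
}
```

The hash codes of the two different paths will normally differ as well, but the contract quoted above does not promise that, so the sketch does not print them.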

What we really need is a method that reads all the bytes of a file and computes a hash of the file contents, not of some metadata. This is how we do it (not my work, just the first snippet Google provided):

public static String generateHash(File file)
		throws NoSuchAlgorithmException,
		FileNotFoundException, IOException {
 
	MessageDigest md = MessageDigest.getInstance("SHA"); // SHA or MD5
	String hash = "";
 
	byte[] data = new byte[(int)file.length()];
	FileInputStream fis = new FileInputStream(file);
	fis.read(data);
	fis.close();
 
	// Reads it all at one go. Might be better to chunk it.
	md.update(data);
 
	byte[] digest = md.digest();
 
	for (int i = 0; i < digest.length; i++) {
		String hex = Integer.toHexString(digest[i]);
		if (hex.length() == 1)
			hex = "0" + hex;
		hex = hex.substring(hex.length() - 2);
		hash += hex;
	}
 
	return hash;
}

This worked for me, but there are (at least) two things I don’t like about this solution. First, as the comment already states, this method reads the whole file at once, which will give you a java.lang.OutOfMemoryError: Java heap space quite fast with large files. Second, the for loop tinkers with the String representation of the hash, which is error-prone and not easily maintainable.
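To show why the String tinkering in that loop is needed at all, here is a small sketch (HexDemo and toHex are names I made up for this example). A byte is signed in Java, so Integer.toHexString() first widens a negative byte to a negative int and returns eight hex digits. Masking with 0xff before formatting avoids the substring trick:

```java
class HexDemo {

    /** Formats each byte as exactly two lowercase hex digits. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // b & 0xff maps the signed byte (-128..127) to 0..255 before formatting
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte b = (byte) 0xfa;
        // The byte is widened to the int -6 first, hence eight hex digits.
        System.out.println(Integer.toHexString(b));          // prints "fffffffa"
        System.out.println(Integer.toHexString(b & 0xff));   // prints "fa"
        System.out.println(toHex(new byte[] {0x0a, b, 0}));  // prints "0afa00"
    }
}
```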

So I looked further and came across this solution:

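import java.io.*;
import java.math.BigInteger;
import java.security.*;

public static String generateBufferedHash(File file)
	throws NoSuchAlgorithmException,
	FileNotFoundException, IOException {

	MessageDigest md = MessageDigest.getInstance("MD5");

	// try-with-resources (Java 7+) closes the stream even if reading fails
	try (InputStream is = new FileInputStream(file)) {
		byte[] buffer = new byte[8192];
		int read;

		// read() returns -1 at the end of the stream
		while ((read = is.read(buffer)) != -1)
			md.update(buffer, 0, read);
	}

	byte[] md5 = md.digest();
	BigInteger bi = new BigInteger(1, md5);

	// toString(16) drops leading zeros, so pad to two hex digits per byte
	String hash = bi.toString(16);
	while (hash.length() < md5.length * 2)
		hash = "0" + hash;

	return hash;
}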

Wow, just a small helper method: a buffered read loop that hashes large files without taking too much memory, and a ready-made toString() method for the hex output. This is just what I was looking for. I hope this post saves some people out there some time when implementing their own file hash solution. Happy coding!
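Coming back to the dupe detection that started all this, a minimal sketch of how such a hash could be used might look like the following. DuplicateFinder, hashOf and groupByHash are names I invented for this example, and it assumes Java 7 or later for try-with-resources. Files are grouped by their content hash, and every group with more than one entry is a set of likely duplicates:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateFinder {

    /** Streams the file through the digest and returns it as zero-padded hex. */
    static String hashOf(File file, String algorithm)
            throws NoSuchAlgorithmException, IOException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream is = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = is.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** Groups files by content hash; a group with more than one entry holds likely duplicates. */
    static Map<String, List<File>> groupByHash(List<File> files, String algorithm)
            throws NoSuchAlgorithmException, IOException {
        Map<String, List<File>> groups = new HashMap<>();
        for (File file : files) {
            String hash = hashOf(file, algorithm);
            List<File> group = groups.get(hash);
            if (group == null) {
                group = new ArrayList<>();
                groups.put(hash, group);
            }
            group.add(file);
        }
        return groups;
    }

    public static void main(String[] args) throws Exception {
        // Pass the files to compare as command line arguments.
        List<File> files = new ArrayList<>();
        for (String path : args) {
            files.add(new File(path));
        }
        for (Map.Entry<String, List<File>> entry : groupByHash(files, "MD5").entrySet()) {
            if (entry.getValue().size() > 1) {
                System.out.println(entry.getKey() + " -> " + entry.getValue());
            }
        }
    }
}
```

Equal hashes are a strong hint, not a proof: MD5 collisions can be constructed on purpose, so if a false match would hurt, compare the bytes of the candidates before deleting anything.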

P.S.: If you care about the hash algorithm used (e.g. MD5 or SHA), have a look at java.security.Security.getProviders() and the .getInfo() of each Provider it returns.
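A minimal sketch of that lookup, in case it helps (DigestAlgorithms and available are names I made up). It prints the info string of every installed provider and collects the algorithm names that each provider registers for MessageDigest. The exact list depends on your JVM:

```java
import java.security.Provider;
import java.security.Security;
import java.util.Set;
import java.util.TreeSet;

class DigestAlgorithms {

    /** Collects the MessageDigest algorithm names offered by all installed providers. */
    static Set<String> available() {
        Set<String> names = new TreeSet<>();
        for (Provider provider : Security.getProviders()) {
            for (Provider.Service service : provider.getServices()) {
                if ("MessageDigest".equals(service.getType())) {
                    names.add(service.getAlgorithm());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (Provider provider : Security.getProviders()) {
            System.out.println(provider.getName() + ": " + provider.getInfo());
        }
        System.out.println("MessageDigest algorithms: " + available());
    }
}
```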


SpringSource Certified Spring Professional

Friday, July 24th, 2009 | Certification, Java | No Comments

As of today I am a SpringSource Certified Spring Professional. I am totally happy that I made it; this was one of the most challenging certifications I have ever achieved (okay, the Certified JBoss Developer was even harder, but it is “open book”).

I can’t go into the details right now (and I am not allowed to) because I have some friends over for a beer. Just one thing:  JavaBlackBelt provides a great summary of the topics that will be in the exam. The Spring Reference Documentation is just awesome and will give you all the information that you need. It is a lot of information, but it’s worth the effort. Okay, gotta go partying 🙂


“Hey, that was a great movie, let’s remake it!”

Friday, July 24th, 2009 | Movies, Reviews | No Comments

I already complained about Hollywood remakes in recent posts. Today my wife asked me whether I already knew that “A Tale of Two Sisters” was also remade (as The Uninvited, 2009). Funny thing is that I was going to watch the film quite soon anyway.

She gave me an interesting link that lists all the blockbuster remakes of Asian cinema since 1980. See for yourself if you have been fooled by a rip-off of an Asian original. (Okay, sometimes the remake is better than the original; the exception proves the rule.) @see Remakes of Asian Movies


Review: Dolan’s Cadillac (2009)

Sunday, July 19th, 2009 | Movies, Reviews | 1 Comment

Science teacher Tom methodically plots revenge against Las Vegas crime boss James Dolan (Christian Slater), who is behind the murder of Tom’s wife Francey, a witness to a mob execution.

I like Stephen King adaptations. Even though they are often TV-only or straight-to-VHS/DVD productions, some of them are really enjoyable on a rainy Sunday afternoon.

This one was quite nice, but it doesn’t have the potential to become a low-budget classic such as The Stand, It or even Sleepwalkers (90’s trash :D, yeah! Don’t get me wrong: one of my favourite films (and IMDb’s #1!!) is a 90’s King adaptation. Can you guess what it is?).

I guess the absence of a typical Stephen King evil force / parallel reality condemns this flick to being just another crime story. This is *not* a mystery or horror film!

6.5/10 – quite nice, but nothing special


eclipse IDE Subversion integration – Galileo still foobared

Sunday, July 19th, 2009 | IDE, Java, Tech-savvy | No Comments

Another post on getting Subversion running in eclipse; this time it’s the brand new Galileo release.

To cut a long story short: SVN integration is still a tedious manual process. I don’t want to complain too much this time, now that I know that legal issues are the reason eclipse does not come with out-of-the-box SVN support: even though the Subversive team provider is part of eclipse, the connectors cannot be published together with Subversive because the eclipse legal rules don’t support the connectors’ license (just Google it for more information).

So here is how you do it:

Help -> Install New Software… -> Galileo -> Collaboration -> Subversive SVN Team Provider (Incubation)

Help -> Install New Software… -> Add -> http://community.polarion.com/projects/subversive/download/eclipse/2.0/galileo-site/ -> SVN Connectors

Install all of them or just the connector you really need. My favourite one is SVNKit, because it works fine and has svn+ssh:// support.

If you chose to install all connectors, you can change the implementation in Window -> Preferences -> Team -> SVN.
