This should not be here … or Notes to a bioinformatician – two years later

I was contacted a couple of months ago (maybe more) to write a guest entry for their blog. I did, and on March 11th I sent the text you see below. It's not my best creation (and not the worst, believe me), but as they're taking so long to publish it and I don't want it to go to waste, I'm publishing it here.
—
Exactly two years ago (or almost exactly), I posted a follow-up blog entry to Notes to a young computational biologist. I still believe that 97% of all advice is worthless, but I also believe it's worth sharing your experiences, as they might be useful to someone in the future.
In this guest post, I will revisit those two-year-old posts, mixing, matching, remixing, adding and deleting things. Consider this my four cents on the subject.
1- Scripting language – Python and Ruby are gaining ground in the biological sciences; stick to one of the two. But don't forget Perl, as it has the best bio-related functionality (BioPerl). And I still stand by my original point: shell scripting is key. Bash (or any other shell), awk and sed are a must for working with large amounts of data and text. They will also improve your command-line skills and make your computer usage more efficient.
2- Programmable statistics package – In 2007 I agreed with Bosco and just added R to the list, but now I partially disagree with him. Stick with free applications/packages such as R or NumPy, and throw away your Matlab, Excel or any other proprietary solution. I'm not saying you shouldn't learn them: have at least some basic knowledge of how to use them, and take advantage if your university or boss (or you yourself) has a licence for a proprietary package. It might be handy in the future, but I would stick to the free stuff, especially in this economy.
3- Text files, lab notebook – Yes, as a computational biologist you don't need a physical notebook; you need a wiki or project management software. Something that is easy to update and manage, so you have a good timeline of what you're doing and what you have done in the past. I used to keep a small log file in each directory, so every time an action was performed, the log was updated. Now you can do the same by updating a wiki, or any project management software, from the command line. There are free wikis that can be installed and managed locally on your desktop/laptop, or remotely.
4- Config files – Two years ago I cared about config files, but not today. I prefer to keep the different parameters in one script that starts the application. That's why we have scripting languages and shell scripting: so we don't need to add a third element to the mix. Learn how to run external processes from your favourite scripting/shell language and you're set.
5- Delete data you don't need – You need all your data. Storage is cheap, cheaper than in 2007. Buy an external disk and don't bother deleting data you think you don't need; it might be useful in the future. And you can help stimulate the economy by buying storage. I stick to my 2007 advice: back up, compress and keep your data.
6- Command line plotting program – R and Matplotlib (for the Python crowd), and there are great packages for Ruby and Perl too. You don't need anything else. If you are into Java, Processing is all you need to create beautiful plots; just ask Ben Fry.
7- Code editor – Find something you're comfortable with, be it Vim, Emacs or Kate. Nowadays I use TextMate and Eclipse. Whatever you choose, learn it inside and out.
8- Version control system – Learn one, be it Darcs, Mercurial, Subversion or Git. Stay away from CVS (does anyone still use it?).
Get an account on FriendFeed, and read other scientists' blogs, their comments, experiences and difficulties. Being a scientist is also about being social. Get out of the cocoon and into the open.
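Items 3 and 4 above describe two habits that fit together: keeping the run parameters inside the script that launches a program (instead of in a separate config file), and keeping a small log in each working directory that records every action. Below is a minimal Python sketch of that workflow under stated assumptions, not a prescribed implementation. The log file name `log.txt` and the helpers `log_action` and `run_and_log` are hypothetical, and the demo launches the Python interpreter itself as a stand-in for a real external tool such as an aligner or assembler.

```python
"""Minimal sketch: parameters live in the script, the external program is
launched from it, and every run is appended to a log in the working directory.

Hypothetical names (not from any library): LOG_NAME, log_action, run_and_log.
"""
from __future__ import annotations

import datetime
import shlex
import subprocess
import sys
from pathlib import Path

LOG_NAME = "log.txt"  # hypothetical per-directory log file name


def log_action(workdir: Path, message: str) -> None:
    """Append one timestamped, tab-separated line to workdir/LOG_NAME."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(workdir / LOG_NAME, "a", encoding="utf-8") as fh:
        fh.write(f"{stamp}\t{message}\n")


def run_and_log(cmd: list[str], workdir: Path) -> subprocess.CompletedProcess:
    """Run `cmd` inside `workdir`, capture its output and log the call."""
    result = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
    # Record the exit status and the exact command line that was executed.
    log_action(workdir, f"exit={result.returncode}\t{shlex.join(cmd)}")
    return result


if __name__ == "__main__":
    # Parameters are kept here, in the script, instead of in a config file.
    params = {"greeting": "hello from the external process"}
    workdir = Path.cwd()

    # Stand-in for a real tool: the Python interpreter printing a message.
    cmd = [sys.executable, "-c", f"print({params['greeting']!r})"]
    result = run_and_log(cmd, workdir)
    print(result.stdout, end="")
```

Swapping the stand-in for a real program only means editing `params` and `cmd`; the log then gives a plain-text timeline of what ran, in which directory, and with which exit status.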
In essence, not much has changed in two years, but I see an ever greater online presence of science groups, and more and more collaboration among them. Be in the know.
About this entry
You’re currently reading “This should not be here … or Notes to a bioinformatician – two years later,” an entry on Blind.Scientist
- Published: 04.17.09 / 11am
- Category: Bioinformatics - opinion