Jul 01

Via Bioinformatics Zen:

Jun 11

Chelsea


Now I have at least one reason to root for any other team in England but Chelsea.

May 26

Thanks to Pawel, I got introduced to the world of Processing and I must say I am impressed. It is a nice tool for creating data displays. For the picture below I used source code that Pawel sent me, modified to display the BLAST hits (larger than 5 kilobases) of duplicons (larger than 30 kilobases) on a small arm of a human chromosome. Of course, given the number of BLAST hits, the different duplicons involved and the length of the chromosome arm (60 Mb), the arcs and the colours are not easy to format. Still, the display carries a good amount of information about hit identity and about similar duplicons in different chromosomal regions.

Processing and duplicons on human chromosome

May 10

I have no source, but the creation date of the text file is August 7th, 1995. Good luck.

Final Exam

Instructions: Read each question carefully. Answer all questions. Time Limit: 4 hours. Begin immediately.

History

Describe the history of the papacy from its origins to the present day, concentrating especially, but not exclusively, on its social, political, economic, religious, and philosophical impact on Europe, Asia, America, and Africa. Be brief, concise, and specific.

Medicine

You have been provided with a razor blade, a piece of gauze, and a bottle of Scotch. Remove your appendix. Do not suture until your work has been inspected. You have 15 minutes.

Public speaking

Twenty-five hundred riot-crazed aborigines are storming the classroom. Calm them. You may use any ancient language except Latin or Greek.

Biology

Create life. Estimate the differences in subsequent human culture if this form of life had developed 500 million years earlier, with special attention to its probable effect on the English parliamentary system. Prove your thesis.

Music

Write a piano concerto. Orchestrate and perform it with flute and drum. You will find a piano under your seat.

Psychology

Based on your degree of knowledge of their works, evaluate the emotional stability, degree of adjustment, and repressed frustrations of each of the following: Alexander of Aphrodisias, Rameses II, Gregory of Nicea, Hammurabi. Support your evaluations with quotations from each man’s work, making appropriate references. It is not necessary to translate.

Sociology

Estimate the sociological problems which might accompany the end of the world. Construct an experiment to test your theory.

Management science

Define management. Define science. How do they relate? Why? Create a generalized algorithm to optimize all managerial decisions. Assuming an 1130 CPU supporting 50 terminals, each terminal to activate your algorithm; design the communications interface and all necessary control programs.

Engineering

The disassembled parts of a high-powered rifle have been placed in a box on your desk. You will also find an instruction manual, printed in Swahili. In ten minutes a hungry Bengal tiger will be admitted to the room. Take whatever action you feel is appropriate. Be prepared to justify your decision.

Economics

Develop a realistic plan for refinancing the national debt. Trace the possible effects of your plan in the following areas: Cubism, the Donatist controversy, the wave theory of light. Outline a method for preventing these effects. Criticize this method from all possible points of view. Point out the deficiencies in your point of view, as demonstrated in your answer to the last question.

Political science

There is a red telephone on the desk beside you. Start World War III. Report at length on its socio-political effects, if any.

Epistemology

Take a position for or against truth. Prove the validity of your position.

Physics

Explain the nature of matter. Include in your answer an evaluation of the impact of the development of mathematics on science.

Philosophy

Sketch the development of human thought; estimate its significance. Compare with the development of any other kind of thought.

General knowledge

Describe in detail. Be objective and specific.

* * E X T R A C R E D I T * *

Define the universe; give three examples.

Apr 23

Do you know the answer to the above question? No? Me neither, but I can offer some suggestions. On a daily basis, a bioinformatician is exposed to hundreds of applications, computer languages, websites, you name it. Some of them are commercial, some of them free and open source. Some academia-developed software is open source, some is not.

A good portion of academia-developed software is published in scientific journals, as a 2-page application note in Bioinformatics or as a longer paper in BMC Bioinformatics, just to name two of the journals in the field.

I cannot complain about non-published applications. Usually they are free, open, and were developed in someone's spare time. I have the option of not using them, of modifying them, or of helping the developer to improve them. I cannot complain about lack of documentation, bugs, minor errors or even the lack of an interface.

On the other hand, I can complain about published applications. Usually they are also free, but not always open, and they were developed with a publication in mind, or at least as a means towards a publication. They should have proper documentation, be reasonably portable (yes, that's important, so if you are developing your next groundbreaking phylogenetic tool in OCaml, distribute the executable, don't ask me to "compile" or install OCaml) and be easy or moderately easy to use. I give them a break on the lack of an interface.

And why should I expect such things from a published application? Because, apart from the developers, at least one editor and two reviewers have (supposedly) tested (or glanced over) the application. Usually, scientific journals ask the authors to provide a copy of the package that is being submitted for publication. Basically, it should include everything that is going to be installed, compiled, etc. So a simple manual or readme, installation instructions, source code, executable(s), you name it, should be in the package.

I can say, from personal experience, that the majority of the published applications will have most of the items required for a user-friendly experience. But many fail in at least one of these aspects:

  • poor (or nonexistent) documentation
  • errors and bugs, too evident to be missed by the reviewers/editor
  • far from being user-friendly, even the command-line ones
  • non-portable (even between Linux distros)

This makes me ask myself: are the editors/reviewers testing, checking or using these applications at all before accepting the manuscript? I guess not, or at least it seems that they are not. Some of the errors you see are easy to fix and wouldn't harm the program's merit, but fixing them would greatly improve user satisfaction and require fewer email exchanges with developers/scientists about bugs and errors. After all, you already published it, nobody told you that you have to support it, right?

This brings us to the original question: how to improve scientific software? Simple … no, not really.

Far from having the ideal solution, I will add my two cents:

1- Journals can create a more rigorous publication process for applications, which would include more testing by the editor/reviewers. That would make the review process slower and would make editors and reviewers spend time, which they already have in short supply, on a job that they are not compensated for. Everybody will be unhappy and the process fails. So, if journals are willing to publish application-only manuscripts, why not have an in-house testing facility, at least to check the basic things that the authors claim their software should do? Too expensive for the journals? Maybe, maybe not.

2- Publication of applications per se should be abolished. If you want to release an application and publish it, be sure you prove its merits with a publication that includes the software, its application and results that are scientifically relevant. This way we also abolish the publication-of-the-main-project, along with the paper on the application-that-was-used-to-generate-the-results and the paper about the program-that-was-used-to-display-the-results-generated-by-the-other-application. This way we would also foster collaboration between pure bioinformatics/application development groups and wet-lab groups.

3- Create a centralized scientific software repository, something like Sourceforge, and let people contribute, collaborate, code and develop and when the application is mature enough, publish it and give credit to everyone that had some input. Depending on the quality and amount of time dedicated to a project, the person would be a co-author or cited in the acknowledgments. This would also increase collaboration, people interested can learn and teach, exchange ideas and develop good application development standards, different computer languages, etc. Maybe you might not even need to publish the software in a journal, the user input and evaluations would be enough to solidify a good developer/scientist CV.

The options above are not mutually exclusive: they can be implemented at the same time, or it may turn out that they can't be implemented at all. Would Open Science benefit from this? I bet so.

Apr 21

He got it wrong. Nobody is “wrong” on the internets, everyone is right. Always.

Apr 21

jModelTest


Tested on a Pentium 4, 3 GHz, running Fedora Core 8

David Posada released some time ago jModelTest, a program that tries to be the ultimate maximum likelihood model selector for phylogenies. Written in Java, jModelTest is cross-platform but, unlike other recent Java-based programs that ship as executables, it is distributed as a jar file. Other programs like TreeStat have "compiled" versions for Macs and Windows, so why not jModelTest?

The manual mentions that it was created using Xcode under OS X and, even though Xcode provides a great environment for interface design, the jModelTest interface is very simple, even poor. It is basically a non-shrinkable window with a text editor/log occupying its entirety. All actions are accessed through menus, with no buttons or other widgets at first. Some example files are bundled with the package, and they show the versatility of jModelTest, which accepts different input file types, phylip and nexus among others. This is an excellent aspect of the application that should be followed by other packages and programs.

After firing up jModelTest, I decided to load one of the files, example.phy, composed of 10 sequences of 1000 nucleotides. After using the File menu, it was easy to notice a major problem with the menu items. All of them have menu accelerators, simple key combinations to easily access menu items from the keyboard, but they are shown as Meta + letter (I am having trouble uploading images on Wordpress, I will find a solution later). OK, the Meta (option) key on a Mac is easy to find, but what about Windows and Linux? The file loaded fine, so my next guess (yes, I don't read manuals upfront) was to use the Analysis menu. Luckily only one item was enabled, the one to calculate the likelihood scores.

ModelTest, which can be called a predecessor of jModelTest, used PAUP externally to calculate the likelihood scores for different models. Now Phyml is used in the background, hence the average to good speed of jModelTest calculations. After selecting Analysis->Compute likelihood scores, the program presented a simple dialog with some options to set the likelihood calculation parameters. I tested the ML optimized topology with 11 substitution schemes (default). It took roughly 4 minutes to calculate the 88 available models for the example.phy file, with a good amount of log information output to the main screen. After that, three other items were enabled in the Analysis menu, allowing for different statistical calculations of the best ML model. It was possible to run AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion) and DT (Decision Theory) to determine the best (or ideal) ML evolutionary model, while the hLRT calculation was not enabled. All three were performed with default parameters, with AIC and BIC being very fast and DT very slow and memory hungry.
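For readers who have not met these criteria: AIC and BIC both start from the log-likelihood and add a penalty for the number of free parameters, and the model with the lowest score is preferred. Below is a minimal Python sketch of that idea. It is not jModelTest's code: the log-likelihood values are invented for illustration, the parameter counts cover the substitution model only (branch lengths are ignored), and using the alignment length of 1000 sites as the BIC sample size is an assumption.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 lnL."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 lnL."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical scores: model name -> (lnL, number of free parameters).
# The lnL values are invented; k counts substitution-model parameters only.
models = {
    "JC": (-5200.0, 0),
    "HKY+G": (-5050.0, 5),
    "GTR+I+G": (-5046.0, 10),
}
n_sites = 1000  # assumed sample size: the alignment length


def best_model(score):
    """Return the name of the model with the lowest score."""
    return min(models, key=lambda name: score(*models[name]))


best_aic = best_model(aic)
best_bic = best_model(lambda lnl, k: bic(lnl, k, n_sites))
print("best by AIC:", best_aic)
print("best by BIC:", best_bic)
```

The point of the toy numbers is that the richest model has the highest likelihood but does not win, because its extra parameters cost more than the likelihood gain.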

The results from the statistical analysis were displayed in the same log text box and were in some places very confusing:

There are 88 models in the 100% confidence interval: [ HKY+G TPM2uf+G TrN+G TIM2+G TPM1uf+G TIM1+G TIM3+G TPM3uf+G GTR+G TVM+G TVM+I+G GTR+I+G TPM3uf+I+G TIM3+I+G TPM2uf+I+G TIM2+I+G HKY+I+G TrN+I+G TPM1uf+I+G TIM1+I+G TPM1+I+G K80+I+G TPM2+I+G TPM3+I+G TrNef+I+G TIM1ef+I+G TVMef+I+G TIM3ef+I+G TIM2ef+I+G SYM+I+G K80+G TPM1+G TrNef+G TIM1ef+G TPM3+G TPM2+G TIM3ef+G TPM1uf+I TPM2uf+I TVMef+G TIM2ef+G HKY+I TIM2+I TPM3uf+I TrN+I TVM+I TIM1+I SYM+G GTR+I TIM3+I K80+I TPM1+I TPM3+I TrNef+I TPM2+I TIM1ef+I TIM3ef+I TVMef+I TIM2ef+I SYM+I F81+I+G F81+G JC+I+G JC+G F81+I JC+I TIM1 TIM3 TrN TPM3uf TPM1uf TVM TIM2 HKY GTR TPM2uf TIM1ef TrNef TPM1 K80 TIM3ef TPM3 F81 TIM2ef TPM2 SYM TVMef JC ]

and in other places misleading:

Model selection results also available at the “Model > Show model table” menu

as there is no such menu entry. Maybe the developer was referring to the Results->Show Results table menu, but you never know. This results table presented all the values calculated for all tested models, pointing out the suggested model in red. It was clearer to check the values in a grid than in the main log window/text interface. There were some other tools in jModelTest to calculate the likelihood ratio test and a model-averaged phylogeny. Also, most calculations allowed the generation of a PAUP block, which is still widely used.

As a package, jModelTest is somewhat solid, even though the interface is bland and buggy (menu accelerators, wrong information in logs, it does not access help in Linux), computation of certain tasks is time consuming (Java?) and the manual contains some wrong terms (Operative (sic) Systems). It is easy to use and should be a good replacement for ModelTest (and PAUP).

Apart from the ease of use and the analytical capabilities, there are some issues with jModelTest:

  • the program is distributed under the GPL, but there is no source code in the package and no information on how to get the source in the manual or on the webpage. Should I write to the author?
  • it is indeed great that jModelTest makes it possible to break free from the PAUP relationship, but at the same time it is a double standard to use other applications under the hood now when ModelTest was not allowed to be distributed the same way in the past. As an excuse to control all possible copies and downloads of ModelTest, Posada didn’t allow me to distribute the ModelTest executable with MrMTgui. So much for GPL’ing your programs
  • the download process is very ‘95ish, where you have to enter name, email and institution in order to get the link. Someone should invent access logs …

Posada, D. (2008). jModelTest: Phylogenetic Model Averaging. Molecular Biology and Evolution DOI: 10.1093/molbev/msn083

Apr 15

We are releasing multiGUI (Windows Vista and XP) as a limited beta. If you are interested in trying the software, send an email to multigui at genedrift dot org. We intend to give a group of 20-30 users the opportunity to try the program, which would allow us to work on future improvements, kill some bugs that weren’t found in the development/test phase and gauge the level of interest in our application in the scientific community.

Apr 14

Twitter


Attila Csordas created @biotecher on Twitter so we, you, everyone can follow what every other bioinformatician, scientist, human being is doing. Just join Twitter and follow @biotecher, and me if you want.

Apr 14

I had problems installing Google’s AppEngine on Vista. I had Python 2.5.1 installed on my machine, but every time I tried to install the msi package it failed, claiming that Python was not present, even though C:\Python25 was in the path. The AppEngine issues site did not help much either: the “solution” listed there was to make sure Python was in the path.

So, I decided to start over. I removed Python (and ActiveState Python, which I had installed earlier to see if AppEngine would work) and re-installed it, or tried to. Strangely, Python’s msi package kept installing into the root of the C drive, not under Python25. For half an hour I tested all possible combinations, versions and tricks to have it installed in the proper directory/folder. Then I remembered msiexec, a command line tool that runs msi packages. Running msiexec with no parameters shows a window with options, but basically /i is the only one required. /i tells msiexec which package to install, while /qb makes the installation quiet, with a basic interface. msiexec worked flawlessly and then Python was in its right place.

msiexec /i python-2.5.2.msi TARGETDIR="C:\Python25" /qb

Then the same “trick” was used with AppEngine, but without a TARGETDIR. Bingo, it installed perfectly (I am assuming that).
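If you have to repeat this on more than one machine, the two calls can be wrapped in a few lines of Python. This is only a sketch: it uses nothing beyond the /i and /qb switches and the TARGETDIR property mentioned above, the helper name msiexec_command is mine, and GoogleAppEngine.msi is a placeholder file name, not necessarily what the real installer is called. The script only builds and prints the commands; the comment at the end shows how one might actually run them on Windows.

```python
def msiexec_command(package, target_dir=None):
    """Build an msiexec argument list for a quiet install with a basic UI.

    package    -- path to the .msi file (passed after /i)
    target_dir -- optional install directory, passed as the TARGETDIR property
    """
    cmd = ["msiexec", "/i", package]
    if target_dir is not None:
        cmd.append("TARGETDIR=" + target_dir)
    cmd.append("/qb")
    return cmd


# Python goes into an explicit directory; AppEngine keeps its default.
python_cmd = msiexec_command("python-2.5.2.msi", r"C:\Python25")
# Placeholder file name, replace with the msi you actually downloaded.
appengine_cmd = msiexec_command("GoogleAppEngine.msi")

for cmd in (python_cmd, appengine_cmd):
    print(" ".join(cmd))

# To really install (Windows only), something like:
#   import subprocess
#   subprocess.run(python_cmd, check=True)
```

Keeping the command as a list avoids quoting headaches if the target directory ever contains spaces.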

Now, I just have to wait for my AppEngine account.