How much programming should scientists know

What you should know as a bioinformatician - Part 2

In the spirit of the first part, we would like to summarize what we currently see as required “basic knowledge” for bioinformaticians. Perhaps it will help a self-learner or two.
First things first: English. Even if you somehow manage to land a job where everyone speaks German (and the chances are slim), 99.9% of the specialist literature is in English. This applies to bioinformatics itself as well as to biology and computer science. As far as we know, there are no German-language blogs that seriously deal with bioinformatics. More or less everything that happens in science is communicated in English. Like it or not, it's the only way to communicate with colleagues.

Which leads directly to the next point: communication skills. Because the subject is interdisciplinary, it is practically impossible to work alone in your office and emerge years later with exciting results. Instead, you will almost certainly collaborate with other people, often with a background in biology or computer science. Sooner or later you will have to rely on help from both groups, so you need to be able to communicate with them. Translating from the language of one field into that of the other is often required. Retiring to a monastery in the manner of Mendel is therefore not an option.
How much biological knowledge is needed is hard to pin down; the field of bioinformatics itself is too diverse for that.

It ranges from areas closer to molecular biology, which work largely with DNA and protein sequences, to automated image analysis. Anyone who works with plants, like Philipp, does not need to know much about epigenetics in the human genome or how human eye color is determined; instead, they need to know, for example, what photosynthesis has to do with salt resistance. In general, you should know how genomes work, how cells process and pass on information, and how the entire process from DNA to protein works. That said, it is generally easier to learn more computer science than more biology. How-tos, blogs, and question-and-answer sites such as Stack Overflow with a focus on programming are a dime a dozen, but figuring out why a PCR failed is much harder without at least rudimentary laboratory experience. Practical experience in molecular biology in particular can later help you identify why the input data for your analyses look so terrible. And last but not least, biological expertise helps you check whether your results make sense in context at all.

Mick Watson, who also picks up on the curriculum ideas, goes one step further and concludes in his blog post: “I may appear as if I’m being mean, but actually biological knowledge, and knowing how to apply it, is the most important ‘competency’ (aka skill) that a bioinformatician can possess. In a field full of techies, the thing that will make you stand out is your biological knowledge, not your impressive array of awk one-liners.”

Then there is computer science knowledge. Most bioinformatics software only runs on Linux, so you should be familiar with it and not be afraid of the command line. In addition, you need Bash (it makes any work under Linux easier), a scripting language such as Python or Perl, and, for speed, a compiled language such as C++, C (if you like shooting yourself in the foot), or, more recently, Go and D (if you want to experiment a little more). You should also know about algorithms and data structures: for example, why is looking up data in a Python dictionary so much faster than in a list? Anyone who deals with statistics a lot (and in bioinformatics that is usually the case) should also learn R, although you don't have to be able to program in R to use it. If you only use publicly available packages or methods in R, you don't need to know, for example, the difference between the “S3” and “S4” object systems.
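To make the dictionary question concrete, here is a minimal Python sketch; the made-up sequence IDs and the list size are assumptions chosen only for illustration. A membership test or `list.index()` on a list compares the query against the elements one by one (linear time), whereas a dict hashes the key and jumps almost directly to the right entry (constant time on average).

```python
import timeit

# Hypothetical data: 100,000 made-up sequence identifiers
ids = [f"seq{i}" for i in range(100_000)]

# The same identifiers as dict keys, mapping each ID to its position in the list
id_to_pos = {name: pos for pos, name in enumerate(ids)}

query = "seq99999"  # last element: worst case for a linear scan of the list

# list.index() walks the list from the start until it finds a match: O(n)
pos_from_list = ids.index(query)
# A dict lookup hashes the key and goes (almost) straight to the entry: O(1) on average
pos_from_dict = id_to_pos[query]

# Time 100 membership tests on each structure
t_list = timeit.timeit(lambda: query in ids, number=100)
t_dict = timeit.timeit(lambda: query in id_to_pos, number=100)

print(f"position via list: {pos_from_list}, via dict: {pos_from_dict}")
print(f"100 membership tests: list {t_list:.4f} s, dict {t_dict:.6f} s")
```

Exact timings depend on the machine, but the gap grows with the size of the list. If you only need fast membership tests and no value per key, a `set` works the same way.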

There are many, many sources for self-learners nowadays. Books are a dime a dozen, but it is currently difficult to name “the” book on bioinformatics: there are too many subfields, and the field is changing too quickly. MOOC platforms such as Coursera offer bioinformatics courses, e.g. Bioinformatics Algorithms.
Those who prefer “doing” to reading will find a collection of programming tasks tailored to bioinformatics at Rosalind. Project Euler is a similar collection but requires more mathematical knowledge, and HackerRank offers tasks that can only be solved with knowledge of “advanced” algorithms and data structures. None of the three projects prescribes a programming language.
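For a flavor of what such tasks look like, here is a Python sketch in the style of Rosalind's introductory problem of counting the nucleotides in a DNA string. The input sequence is made up, and the output format (the counts of A, C, G and T separated by spaces) follows that problem's statement as far as we know; check the actual problem page before submitting anything.

```python
from collections import Counter


def count_nucleotides(dna: str) -> str:
    """Count A, C, G and T in a DNA string; return the counts separated by spaces."""
    counts = Counter(dna.upper())  # Counter yields 0 for symbols that never occur
    return " ".join(str(counts[base]) for base in "ACGT")


# Made-up example sequence
print(count_nucleotides("AGCTTTTCATTCTGACTGCA"))  # prints "4 5 3 8"
```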

Speaking of which: statistics, as a central component of bioinformatics, should not be missing either. Philipp's new favorite book is “Intuitive Biostatistics” by Motulsky, which has just come out in its 3rd edition. As far as we know, there is no German translation. It covers most of what a biologist or bioinformatician should know while staying comfortably far away from formulas. The book describes the most common methods in light of their underlying assumptions (e.g. anyone who compares two populations using a t-test assumes that all measurements are independent of one another and that both populations are normally distributed) and shows how to interpret the results and what can go wrong along the way. Almost no one calculates statistics by hand anyway; there is R, for example. The book is also worthwhile for “normal” biologists.
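To see where those assumptions enter, here is a minimal Python sketch of the classic two-sample Student's t statistic with pooled variance. It is our own illustration, not taken from the book, and the measurements are made up. Besides independence and normality, this classic version also assumes equal variances in both populations. For real analyses, use a tested implementation that also reports the p-value; R's `t.test()`, for example, uses Welch's variant (which drops the equal-variance assumption) unless you pass `var.equal = TRUE`.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance.

    Assumes independent measurements, approximately normally distributed
    populations and equal population variances.
    """
    n_a, n_b = len(a), len(b)
    # statistics.variance is the sample variance (divides by n - 1)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(a) - mean(b)) / std_err
    df = n_a + n_b - 2  # degrees of freedom
    return t, df


# Made-up measurements from two groups
t, df = student_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"t = {t:.3f}, df = {df}")
```

The t statistic alone says nothing about significance; it has to be compared against the t distribution with `df` degrees of freedom, which is exactly what statistics software does for you.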
Self-organization skills are also mandatory. Anyone who has ever worked in a laboratory will be used to writing down every step in the lab notebook. The same care naturally applies to bioinformatics when you work at your computer. What was done with which data must be documented, just like the software being written. Nobody wants analyses that later can no longer be traced (and are therefore worthless). And if you look at your code two years later to fix a bug, you will thank yourself for every comment you left. It is just as annoying to have raw data scattered somewhere on the hard drive.
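One lightweight way to keep such a “lab notebook” for computational work is to append a small record for every analysis step: which input file (identified by a checksum, so later changes to it are noticed), which parameters, and when. The sketch below is hypothetical: the function names `sha256_of` and `log_step`, the JSON-lines log file and the field names are our own choices, not a standard or an existing tool.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum of a file, read in chunks so large raw data files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def log_step(log_file: Path, step: str, input_file: Path, params: dict) -> dict:
    """Append one analysis step as a JSON line: what was done, to which data, with which settings."""
    record = {
        "time": time.strftime("%Y-%m-%dT%H:%M:%S"),  # local time
        "step": step,
        "input": str(input_file),
        "input_sha256": sha256_of(input_file),
        "params": params,
    }
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        reads = Path(tmp) / "reads.fastq"  # made-up input file
        reads.write_text("@read1\nACGT\n+\nIIII\n")
        rec = log_step(Path(tmp) / "analysis_log.jsonl", "quality_filter", reads, {"min_quality": 20})
        print(rec["step"], rec["input_sha256"][:12])
```

Because the log is plain text with one JSON object per line, it can sit under version control next to the scripts that produced the results.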

Philipp is involved with Software Carpentry, a non-profit organization whose goal is to teach scientists “proper” programming: programming (including object-oriented programming) in Python or R, version control with git, reproducible workflows with Make, data management with SQL, documentation, and so on. All of this is covered in two- to three-day workshops.
If you would like to take part in a workshop, you can find a list of upcoming workshops on the Software Carpentry website. Participation costs vary between workshops, depending on whether the venue charges for the rooms, whether people have to be flown in, etc.
Those who prefer to learn self-organization in an even less formal setting can also consider helping out with open source projects. Besides programming experience, you gain organizational and communication skills there. We both gained a good part of our own experience that way with openSNP, by learning by doing (and failing).

Besides self-organization, self-help is also called for. As already mentioned, for many of the problems you run into sooner or later (here, too: mostly sooner), at least on the IT side, solutions already exist somewhere in the vastness of the web; you just have to find them. Solid Google skills are therefore imperative, whether to solve problems in your own software or just to install third-party software that causes trouble (which, unfortunately, is still the rule rather than the exception in bioinformatics). Formulating search queries effectively is just as necessary as perseverance in searching. And if that doesn't help, you can use your communication skills to write reasonable requests for help to other developers or communities.

Learning critical or scientific thinking “directly” is difficult. Those who regularly browse subject-specific blogs (such as Getting Genetics Done, Living In An Ivory Basement, The Genome Factory, opiniomics, and many more) also get to see critical and scientific thinkers at work. Maybe some of it will rub off.
Anyone who is put off by this list of required skills should also know that you don't have to be an absolute high-flyer in any of these areas from the start. We weren't professional bioinformaticians in 2008 either, as our first (also jointly written) programs show. Fortunately, they have survived to this day: scripts written in German that were, of course, not under version control at the time, and that merely reproduced functionality already available as standard in software packages back then (which we didn't know because we couldn't google properly). So one more item belongs on the list: willingness to learn.

In the words of Samuel Beckett: “Ever tried. Ever failed. No matter. Try again. Fail again. Fail better.”

Philipp has a bachelor's degree in biology and a graduate certificate in IT, and is currently studying for a master's degree in IT in an absurdly large country full of spiders and sheep. For beerology, he mostly writes about biology, evolution, and everything else that washes up at the edges of these fields.