Basic bioinformatics using command line

As sequencing technologies become more advanced and widespread, so does the need to be able to analyse the vast amount of data these technologies produce. There are numerous freely available online tools that can be used to analyse bacterial sequence data, many of which can be found as resources on the course pages for the Module 1 and 2 courses or are taught during those courses. However, working with and analysing genomic data using command-line tools is faster, easier and more efficient: it lets you automate repetitive tasks and combine smaller tasks into more powerful workflows.
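As a small illustration of automating a repetitive task and chaining small tools into a workflow, the sketch below counts the sequences in every FASTA file in a folder and sorts the result. The file names and sequences are made up for the example, and everything happens in a temporary directory; treat it as a minimal sketch rather than a recommended pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing on your computer is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Create two tiny, made-up FASTA files (hypothetical isolates, illustration only).
printf '>contig1\nACGT\n>contig2\nGGCC\n' > isolate_A.fasta
printf '>contig1\nTTAA\n' > isolate_B.fasta

# Repetitive task: count the header lines (starting with ">") in every file.
for fasta in *.fasta; do
    n=$(grep -c '^>' "$fasta")
    printf '%s\t%s\n' "$fasta" "$n"
done > counts.tsv

# Combine small tools: sort the table by the count column, largest first.
sort -k2,2nr counts.tsv > counts_sorted.tsv
cat counts_sorted.tsv
```

The same loop works unchanged whether the folder holds two files or two thousand, which is where the time saving comes from.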

It can be very daunting to start down the path of bioinformatics using the command line without a strong background in computing and/or computer science. Here we have put together a list of introductory command-line courses that are already freely available online, a list of sites that offer in-person or online courses for a fee, and links to useful tools and guides. We especially recommend enrolling in courses at online universities.

Learning the command line is like learning a new language, and the learning curve can be steep. Do not be discouraged, and remember: "Google is your friend!" Whenever you encounter a problem, someone else has probably encountered it before, and searching online can be a great help.

This recent paper also provides ten simple rules that can help you when starting your command line journey:

Brandies and Hogg (2021). Ten simple rules for getting started with command-line bioinformatics. PLOS Computational Biology.

The first step

The first part of working with the command line is to find the terminal on your computer and gain an understanding of how to navigate it. For this we recommend The Unix Shell from Software Carpentry. This free site starts with "Setup", where the user is shown how to open a Unix shell on the different operating systems and download the data needed for the lessons. Afterwards the user is walked through topics such as navigation, directories, pipes, loops and scripts. The site does not introduce any bioinformatic topics but rather gives a step-by-step guide to installing and working in a Unix shell.

A similar alternative is Unix Tutorials for beginners. The tutorial is made up of eight lessons of written instructions and picture guides, which guide you through the main subjects needed to use Unix. However, there is only very limited guidance on how to set up a Unix shell.
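To give a flavour of what these tutorials cover, here is a minimal sketch of navigation, directories, pipes and scripts. It is not taken from either tutorial; the folder name `project`, the file `samples.txt` and the script `count_samples.sh` are hypothetical names chosen for illustration, and everything runs in a temporary practice directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Start in a temporary practice directory (its path is chosen by mktemp).
practice=$(mktemp -d)
cd "$practice"
pwd                              # print the directory you are currently in

# Directories: make nested folders, move into them, and come back.
mkdir -p project/data
cd project/data
printf 'sample1\nsample2\nsample3\n' > samples.txt   # made-up sample names
cd ../..                         # go two levels up, back to the practice directory
ls project/data                  # list what is inside the folder

# Pipes: send the output of one command into the next.
n_samples=$(cat project/data/samples.txt | wc -l | tr -d ' ')
echo "Number of samples: $n_samples"

# Scripts: save commands in a file so they can be re-run later.
cat > project/count_samples.sh <<'EOF'
#!/usr/bin/env bash
# Print the number of lines in the file given as the first argument.
wc -l < "$1"
EOF
bash project/count_samples.sh project/data/samples.txt
```

Typing these commands one at a time in your own terminal, and checking the result of each with `ls` or `pwd`, is a good way to get comfortable before starting the full lessons.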

Beginning to work with computer science

Beginning with command-line bioinformatics is like learning a new language and a new way of thinking. Understanding how to utilize the computational power available on your PC or HPC, and the terms used by other bioinformaticians, is essential. This video goes through 10 rules for starting to work with the command line. The video explains some of the computational expressions and considerations a biologist/microbiologist would need to know when beginning to use command-line bioinformatics.

Recommended Coursera courses

Coursera offers courses from leading universities, providing a learning platform where students can watch lectures online at any time, take quizzes and read course material. You can either enter single courses or enroll in a Coursera Specialization. If you enter the Bioinformatics Specialization, you will automatically be enrolled in a series of bioinformatic courses with a natural progression in difficulty level and topics. The courses recommended by SEQAFRICA are free; however, certificates upon completion of the courses are provided for a fee. To receive a certificate for completing a Specialization you will also have to complete a hands-on project. Coursera provides scholarships and financial aid to cover the cost of certificates for a single course or for an entire Coursera Specialization if you cannot afford it yourself. A guide to who qualifies for the scholarships/financial aid and to the application process can be found here.

Biology Meets Programming: Bioinformatics for Beginners - University of California, San Diego introduces programming in Python in a bioinformatic context in this course. No previous coding experience is required to follow the course. The course uses an interactive textbook and exercises to walk the students through biological problems. The course provider recommends beginning with this course and completing your bioinformatic training with their Bioinformatics Specialization.
Bioinformatics Specialization (Coursera Specialization) - University of California, San Diego provides this Bioinformatics Specialization for a fee, which can be covered by a Coursera Scholarship. The Specialization is made up of 7 courses in which the student is introduced to topics such as sequencing, DNA replication and molecular evolution from a bioinformatic view. If you follow the courses in the "Honors Track", you will also get exercises that apply the course topics to computational challenges. We recommend this Specialization if you need to brush up on, or are unfamiliar with, bioinformatic topics and would like to begin using command-line tools.
Genomic Data Science Specialization (Coursera Specialization) - Johns Hopkins University offers this Specialization for a fee, which can be covered by a Coursera Scholarship. The Genomic Data Science Specialization teaches the student the tools and techniques needed to work with NGS data on the command line. It introduces the student to working in a Unix environment and using software, such as R and Python, for managing big biological data. A total of 6 courses are included in the Specialization and we especially recommend Course 4: Command Line Tools for Genomic Data Science. We suggest this Specialization to students already familiar with bioinformatic topics who want to begin using the command line, as the courses in the Specialization focus on the practical aspect of using command-line tools for analyzing bioinformatic data and less on the theoretical part.
The Unix Workbench (Coursera course) - Johns Hopkins University offers this course to introduce students to using Unix and working in a command-line interface. The course is designed for students who do not have any prior programming experience. It does not focus on bioinformatics but rather provides the student with a foundation for working with Unix in any context. The student is also introduced to GitHub and bash scripts.

Other courses

Some of the courses on these sites are freely available, others have participation fees.

NGS Academy - for the Africa CDC Pathogen Genomics Initiative. The site provides an overview of, and links to, a variety of pathogen surveillance courses for NGS. The site has live online courses as well as previously held courses. It also provides links to sessions from other organizations that cover working with NGS.
H3ABioNet - Pan African Bioinformatics Network for H3Africa provides courses on bioinformatics on a regular basis and has other resources available on their site. The site offers training both online and in local classrooms in Africa. The training is run in live sessions, and students will need to be available regularly over a period of time to take part in the courses.

Tools and resources

Rosalind - A free bioinformatic learning platform. It guides you through installing, setting up, and using Python and provides exercises to accompany the book Bioinformatics Algorithms: An Active-Learning Approach by Phillip Compeau & Pavel Pevzner, which is freely available in a non-interactive version.
Explain shell - A site where you can enter lines of code and have them broken down into the separate commands with explanations.
Command-line cheat sheets - Single-sheet papers for download/print containing most of the basic commands.
Command-line cheat sheet for Linux - A single sheet containing the most used commands for Linux. The page is set up for easy printing.

Disclaimer

Information presented on this website is considered public information (unless otherwise noted) and may be distributed or copied. Use of appropriate byline/photo/image credit is requested. While The National Food Institute makes every effort to provide accurate and complete information, various data such as names, telephone numbers, etc. may change prior to updating. The National Food Institute welcomes suggestions on how to improve our home page and correct errors. The National Food Institute provides no warranty, expressed or implied, as to the accuracy, reliability or completeness of furnished data.

Some of the documents on this server may contain live references (or pointers) to information created and maintained by other organizations. Please note that The National Food Institute does not control and cannot guarantee the relevance, timeliness, or accuracy of these outside materials.
