Book "A Computer Science Guide for Every Programmer"

imageHello Habitants! A colossus with feet of clay is what you might call a programmer without a background in Computer Science. Confident mastery of the basics allows you to "not reinvent the wheel" and build effective solutions into the software architecture. All this eliminates errors and excessive costs for testing and refactoring.



It doesn't matter if you feel out of place when other programmers are discussing the approximate limit. Even experienced experts make mistakes because they have forgotten Computer Science.



Why this book is needed



Many of my (author's) fellow developers came into the profession from a wide variety of fields. Some have higher education in Computer Science; others studied photography, mathematics, or did not even graduate from university.



In recent years, I've noticed that programmers are increasingly eager to learn Computer Science for a number of reasons:



  • to become good programmers;
  • to answer questions about algorithms during interviews;
  • to satisfy their curiosity in the field of Computer Science, or finally to stop regretting that at one time they did not have the opportunity to master this subject.

    This book is for all of you.


Many will find here topics that are interesting in their own right. I tried to show in what real (non-academic) situations this knowledge will be useful. After reading this book, I want you to gain the same knowledge as after studying the basic course in Computer Science, and also learn how to apply it.



Simply put, the purpose of this book is to help you become a more skilled and experienced programmer through a better understanding of Computer Science. I can't squeeze 20 years of college teaching and professional experience into one book ... but I'll try to do my best. I hope that you will find at least one topic here, about which you can say: “Yes, now I understand this,” and apply knowledge in your work.



What you won't find in the edition



The meaning of the book is to help the reader better understand Computer Science and apply knowledge in practice, and not at all to completely replace four years of study.



In particular, this is not a book of proof. Indeed, in the second volume of the book , methods of proof are considered, but standard algorithms are usually presented here without proofs. The idea is for the reader to know about the existence of these algorithms and learn how to use them without going into details. As a proof book written for undergraduate and graduate students, I highly recommend Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein (authors are usually grouped under abbreviation CLRS).



This is not a programming book either: you won't find guidelines on when to use int and when to use double, or an explanation of what a loop is. It is assumed that the reader will be able to understand the pseudocode listings used to describe the algorithms (all programs in this book are written in pseudocode based on the C language). The purpose of the book is to connect the concepts of Computer Science with programming techniques already familiar to the reader.



10. Dynamic programming

10.1. Missing fields problem



Suppose we have an n × n chessboard missing several squares. How to find the largest k × k piece of board without missing fields?

If you haven't faced such a problem before, take a few minutes to write a solution and determine the execution time of your algorithm.



Faced with this problem, I reasoned as follows. Each checkerboard field can belong to many of the largest areas, but only in one of them it can be the upper left corner. If I mark each box with the size of the largest intact area of ​​which it is the upper left corner, then the field with the largest such mark will correspond to the desired area.



Suppose the board is represented as an n × n matrix in which each cell contains 1 if the corresponding field is present and 0 if it is missing. If the current cell value is 0, then the corresponding field is absent and cannot be part of a contiguous segment, so it does not need to be changed. If the value is 1, then we can replace it with a number that is one more than the minimum value of the three cells located below and to the right.



We change each cell of the matrix once, including checking if the cell's value is zero, checking the values ​​of up to three more cells, and writing the new cell value. Each of these operations takes O (1) time, so the time it takes to process the entire checkerboard isimage



Note that this is linear, not quadratic, execution time of the algorithm - on a chessboard n (we assume that n is the total number of fields, and we adhere to the usual convention that n is the amount of input data) fields (some of them are absent), therefore the total time spent by the algorithm is proportional to the number of fields. If we more accurately denote the size of the chessboard as √ n × √ n, then we get n fields and the total execution time is O (n).



10.2. Working with overlapping subtasks



The approach used in this section is called dynamic programming. It is used when a problem can be divided into several subtasks, the solution of each of which will be used several times. This approach differs from the principle of "divide and conquer", when the problem is divided into subtasks that are solved independently of each other. In the chessboard problem, each subproblem depended on the solutions to three other problems, and the solutions to all the subproblems were saved for further use.



Dynamic programming is usually done by building tables as shown above. This means solving a problem using a bottom-up approach, where we start by solving the smallest subproblems and work our way up until we come to an answer. Another method is memoization, where we go from top to bottom, solving sub-problems as needed and caching the results for reuse. Building tables is the preferred option when you need to solve each subproblem (in my example with a chessboard, it was necessary to find the largest intact area for each field of the board), since the costs of this method are less than with memoization. If some subtasks from the solution area do not have to be solved, then memoization allows you to solve only those subtasks that are really necessary.



image


The Key Point



Where divide and conquer involves dividing a task into several independent subtasks, dynamic programming means dividing a task into several overlapping subtasks. The solution to each subproblem is cached for later reuse.


10.3. Dynamic programming and shortest paths



Consider the problem of finding the shortest path: for a given graph with weighted edges, you need to find a path from one node to another that has the lowest cost.

Definition An



edge-weighted graph is a graph in which each edge has its own weight (cost). The cost of a path from one node to another is determined by the sum of the cost of all traversed edges.


Suppose we find a path between nodes s and t that contains the third node v. Then the path from s to t must contain the shortest path from s to v, because otherwise we could replace this segment with a shorter path and reduce the length of the shortest path from s to t, which contradicts the initial condition (this is Bellman's optimality principle).





( ) , . , . , .



, , .



The problems of finding the shortest path are striking examples of dynamic programming, since the optimal property of a substructure is intuitively clear - it is obvious that the fastest way to go from point A to point C through point B is also the fastest way to go from point A to point B and from point B to point C. The number of algorithms based on this principle includes the Bellman - Ford method, which finds the shortest path from the starting point to any vertex of the graph (or from any vertex of the graph to the end point) and the Floyd - Warshall method - with its help the shortest path between each pair of graph vertices is calculated. In both cases, the idea is to start with a small subset of nodes close to the nodes of interest, and gradually expand that set using the nodes already found to calculate new distances.



image


10.4.

10.4.1. Git merge



Another task that is commonly used to demonstrate dynamic programming is finding the Longest Common Subsequence. The task is to find, for two given strings A and B, the longest sequence that occurs in both strings while maintaining the sequence of characters. Characters in strings do not have to be in a row; for example, if A = {acdbef} and B = {babdef}, then {adef} will be their common subsequence.



When merging changes in Git (merge), it searches for the largest common subsequence for master and working branches. Characters present in master but not present in the largest common subsequence are removed; characters that are in the working branch but are not in this subsequence are added to master.



10.4.2. LATEX



The LATEX document preparation system is often used to create technical documents. One of the tasks of the typing system is to align text to the right and left at the same time; to do this, the spaces between words and characters are stretched or compressed so that all lines are the same length. Another way to align text is to wrap the last word so that part of the word appears on the next line. LATEX (From a technical point of view, the TEX text input system does almost all the work; LATEX is built on top of TEX. Here I use LATEX for simplicity reasons) tries to find optimal line break points to make the text look nice. If this fails, then several lines in a row will end with a word hyphen, or the distance between words in different lines will differ.LATEX has a set of rules for evaluating alignment failure. The program seeks to find the "least bad" option.



If there are n possible line break points in a paragraph, then there are imagepossible options for breaking the text. The “failure” of the selection for each break point depends on which break points were selected before it. Hence, we have overlapping sub-tasks again. The use of dynamic programming techniques reduces the execution time to imagewhich can be improved with additional methods.



»More details about the book can be found on the website of the publishing house

» Table of Contents

» Excerpt



For Habitants a 25% discount on coupon - Computer Science



Upon payment for the paper version of the book, an e-book is sent by e-mail.



All Articles