Programming Guidelines
My opinionated programming guidelines.
1. Introduction
About this README
I was born in 1976. I started coding with BASIC and assembler when I was 13, later Turbo Pascal. From 1996 to 2001 I studied computer science at HTW-Dresden (Germany). I learned Shell, Perl, Prolog, C, C++, Java, PHP, and finally Python.
Sometimes I see young and talented programmers wasting time. There are two ways to learn: Make mistakes yourself, or learn from the mistakes other people made.
This list summarises a lot of mistakes I made in the past. I wrote it to help you avoid them.
It's my personal opinion and feeling. No facts, no single truth.
I need your feedback
If you have a general question, please start a new discussion.
If you think something is wrong or missing, feel free to open an issue or pull request.
Relaxed focus on your monitor
Do not look at the keyboard while you type. Have a relaxed focus on your monitor.
I type with ten fingers. It's like flying if you learned it. Your eyes can stay on the rubbish you type, and you don't need to move your eyes down (to keyboard) and up (to monitor) several hundred times per day. This saves a lot of energy. This is a simple tool to help you to learn touch typing: tipp10
Measure your typing speed: 10fastfingers.com
Avoid switching between mouse and keyboard too much.
I like Lenovo keyboards with TrackPoint. If you want more grip, then read Desktop Tips "Keyboard".
Once I was fascinated by the copy+paste history of Emacs and PyCharm. But then I thought to myself: "I want more. I am hungry. I want a copy+paste history not only in one application, but I also want it for the whole desktop". The solution is very simple, but somehow only a few people use it. The solution is called a clipboard manager. I use CopyQ. I use ctrl+alt+v to open the list of last copy+paste texts. CopyQ supports regex searches in the history.
Avoid searching with your eyes
Avoid searching with your eyes. Search with the tools of your IDE. You should be able to use it "blind". You should be able to move the cursor to the matching position in your code without looking at your keyboard, without grabbing your mouse/touchpad/TrackPoint and without looking up/down on your screen.
Compare two files with a diff tool, otherwise, you might get this ugly skeptical frown.
How often per day do you search for the mouse cursor on your screen? Support your eyes by increasing the cursor size. If you use Ubuntu, you can do it via Universal Access / Cursor Size
Increase font size
During daily work, you often jump from one information snippet to the next information snippet.
When was the last time you read a text with more than 20 sentences?
I think from time to time you should do so. Slow down, focus on one text, and read slowly. It helps to increase the font size. ctrl-+ is your friend.
KISS
Keep it simple and stupid. The most boring and most obvious solution is often the best. Although it sometimes takes months until you know which solution it is.
From the book "Site Reliability Engineering" (O'Reilly Media 2016) https://landing.google.com/sre/book/chapters/simplicity.html
Quote:
The Virtue of Boring
Unlike just about everything else in life, "boring" is a positive attribute when it comes to software! We don’t want our programs to be spontaneous and interesting; we want them to stick to the script and predictably accomplish their business goals.
Example: Pure Functions are great. They are stateless, their output can be cached forever, they are easy to test.
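A minimal sketch of what that means in Python. The function name `gross_price` and the VAT example are made up for illustration; the only library feature used is `functools.lru_cache` from the standard library.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # safe only because the function is pure
def gross_price(net_cents: int, vat_percent: int) -> int:
    """Return the gross price in cents.

    Pure: the result depends only on the arguments.
    No I/O, no global state, no hidden input like the current time.
    """
    return net_cents + net_cents * vat_percent // 100


# Testing needs no setup, no mocks, no database.
print(gross_price(1000, 19))  # 1190
```

Because the output depends on nothing but the arguments, the cache never needs to be invalidated.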
Increase the obviousness
But it is not only about code. It is about the experience of all stakeholders: Users, salespeople, support hotline, developers,...
It is hard work to keep it simple.
One thing I love to do: "Increase the obviousness".
One tool to get there: Use a central wiki (without spaces), and define terms. Related text from me: Documentation in Intranets: My point of view
Avoid redundancy
See heading.
Premature optimization is the root of all evil.
The famous quote "premature optimization is the root of all evil" is true. You can read more about this here: When to optimize.
MVP
You should know what an MVP (minimum viable product) is. Building an MVP means bringing something usable to your customer, and then listening to their feedback. Care for their needs, not for your vision of a super performant application.
Avoid i18n in an MVP. German is my mother tongue. If I develop an MVP for German users, then I won't do i18n. This can be done later, if needed.
2. Data structures
Introduction
"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." -- Linus Torvalds (creator and developer of the Linux kernel and the version control system git)
Cache vs Database
There is a fundamental fact which you need to understand: The difference between a cache and a database.
Remember the basic Input-Process-Output pattern.
In a cache you store data which is output. That's handy since you can access the output without doing the processing again. But cache invalidation is hard. Maybe the input has changed, and the value in the cache is outdated? Who knows? If possible, avoid caching: without a cache you never get outdated data. You don't need to back up your cache data. You can create it again.
In a database you store data which is input. Usually it was entered by a human by hand, or generated by measuring some real-world data. You can use the data in the database to create a nice HTML page. It is important to back up your valuable database data, since you can't create it again. The generated output (HTML, JSON, ...) has no value.
Data which is input usually has value. Data which is output has only little value, since you can re-create it.
Relational Database
I know SQL is..... It is either obvious or incomprehensible. And, yes, it is boring.
A relational database is a rock-solid data storage. Use it.
When I studied computer science, I disliked SQL. I thought it was an outdated solution. I tried to store data in files in XML format, I used an in-memory Berkeley DB, I used an object-oriented database written in Python (ZODB), I used NoSQL .... And finally, I realized that boring SQL is the best solution for most cases.
I use PostgreSQL.
I don't like NoSQL, except for caching (simple key-value DB).
The PostgreSQL Documentation contains an introduction to SQL and is easy to read.
If you want to share small SQL snippets, you can use https://dbfiddle.uk/
Cardinality
It does not matter how you work with your data (struct in C, classes in OOP, tables in SQL, ...). Cardinality is very important. Using 0..* is often easier to implement than 0..1. The first can be handled by a simple loop. The second is often a nullable column/attribute. You need conditions (IFs) to handle nullable columns/attributes.
https://en.wikipedia.org/wiki/Cardinality_(data_modeling)
If this is new to you, I will give you two examples:
- 1:N --> One invoice has several invoice positions. For example, you buy three books in one order, the invoice will have three invoice positions. This is a 1:N relationship. The invoice position is contained in exactly one invoice.
- N:M --> If you look at tags, for example at the Question+Answer site StackOverflow: One question can be related to several tags/topics and of course a topic can be set on several questions. For example, if you have a strange UnicodeError in Python, then you can set the tags "python" and "unicode" on your question. This is an N:M relationship. One well-known example of N:M is users and groups.
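The two examples above can be sketched as tables. This is an illustration only: all table and column names are invented here, and SQLite (via Python's standard `sqlite3` module) stands in for a real database so that the snippet runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1:N: each position points to exactly one invoice
    CREATE TABLE invoice (id INTEGER PRIMARY KEY);
    CREATE TABLE invoice_position (
        id INTEGER PRIMARY KEY,
        invoice_id INTEGER NOT NULL REFERENCES invoice(id),
        title TEXT NOT NULL
    );

    -- N:M: a separate link table connects questions and tags
    CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE question_tag (
        question_id INTEGER NOT NULL REFERENCES question(id),
        tag_id INTEGER NOT NULL REFERENCES tag(id),
        PRIMARY KEY (question_id, tag_id)
    );

    INSERT INTO invoice (id) VALUES (1);
    INSERT INTO invoice_position (invoice_id, title)
        VALUES (1, 'Book A'), (1, 'Book B'), (1, 'Book C');
    INSERT INTO question (id, title) VALUES (1, 'Strange UnicodeError');
    INSERT INTO tag (id, name) VALUES (1, 'python'), (2, 'unicode');
    INSERT INTO question_tag (question_id, tag_id) VALUES (1, 1), (1, 2);
""")


def position_titles(invoice_id):
    """0..*: the caller just loops; an empty list needs no special case."""
    sql = "SELECT title FROM invoice_position WHERE invoice_id = ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (invoice_id,))]


def tag_names(question_id):
    """Resolve the N:M relationship through the link table."""
    sql = """SELECT name FROM tag WHERE id IN
             (SELECT tag_id FROM question_tag WHERE question_id = ?)
             ORDER BY name"""
    return [row[0] for row in conn.execute(sql, (question_id,))]
```

An invoice without positions simply yields an empty list, which is the reason 0..* is often easier to handle than 0..1.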
Conditionless Data Structures
If you have no conditions in your data structures, then the coding for the input/output of your data will be much easier.
Avoid nullable Foreign Keys
Imagine you have a table "meeting" and a table "place". The table "meeting" has a ForeignKey to table "place". In the beginning, it might not be clear where the meeting will be. Most developers will make the ForeignKey optional (nullable). WAIT: This will create a condition in your data structure. There is a much easier solution: Create a place called "unknown". Use this sentinel value as default. This data structure (without a nullable ForeignKey) makes implementing the GUI much easier.
In other words: If there is no NULL in your data, then there will be fewer NullPointerExceptions in your source code while processing the data :-)
Fewer conditions, fewer bugs.
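A small sketch of the "unknown" place, again with SQLite standing in for a real database. The column names and the choice of id 1 for the sentinel row are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The sentinel row. Its id (1) is used as the column default below.
    INSERT INTO place (id, name) VALUES (1, 'unknown');

    CREATE TABLE meeting (
        id INTEGER PRIMARY KEY,
        subject TEXT NOT NULL,
        place_id INTEGER NOT NULL DEFAULT 1 REFERENCES place(id)
    );

    INSERT INTO place (id, name) VALUES (2, 'Room 1');
    INSERT INTO meeting (subject, place_id) VALUES ('Kickoff', 2);
    INSERT INTO meeting (subject) VALUES ('Retro');  -- place not known yet
""")


def meetings_with_place():
    """Every meeting has a place, so no outer join and no None check is needed."""
    sql = """SELECT m.subject, p.name
             FROM meeting m JOIN place p ON p.id = m.place_id
             ORDER BY m.id"""
    return conn.execute(sql).fetchall()


print(meetings_with_place())  # [('Kickoff', 'Room 1'), ('Retro', 'unknown')]
```

The code which renders the list of meetings does not need an `if place is None` branch.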
Avoid nullable boolean columns
[True, False, Unknown] is not a nullable Boolean Column.
If you want to store data in a SQL database that has three states (True, False, Unknown), then you might think a nullable boolean column (here "my_column") is the right choice. But I think it is not. Do you think the SQL statement "select * from my_table where my_column = %s" works? No, it won't work for the third state, since "select * from my_table where my_column = NULL" will never return a single row: a comparison with NULL is never true, you would need "IS NULL" instead. If you don't believe me, read: Effect of NULL in WHERE clauses (Wikipedia). If you like typing, you can work around this in your application, but I prefer straightforward solutions with only a few conditions.
If you want to store True, False, Unknown: Use text, integer, or a new table and a foreign key.
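The following sketch shows the difference. The table names `nullable_flag` and `text_state` are invented for this example, and SQLite is used only to keep the snippet self-contained; the NULL comparison behaves the same way in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant 1: nullable boolean, NULL means "unknown"
    CREATE TABLE nullable_flag (id INTEGER PRIMARY KEY, my_column BOOLEAN);
    INSERT INTO nullable_flag (my_column) VALUES (1), (0), (NULL);

    -- Variant 2: three explicit states, no NULL allowed
    CREATE TABLE text_state (
        id INTEGER PRIMARY KEY,
        my_column TEXT NOT NULL
            CHECK (my_column IN ('true', 'false', 'unknown'))
    );
    INSERT INTO text_state (my_column) VALUES ('true'), ('false'), ('unknown');
""")


def count_matches(table, value):
    """Same query for every state. `table` must be a trusted, hard-coded name."""
    sql = f"SELECT count(*) FROM {table} WHERE my_column = ?"
    return conn.execute(sql, (value,)).fetchone()[0]


print(count_matches("nullable_flag", 1))       # 1
print(count_matches("nullable_flag", None))    # 0, "= NULL" never matches
print(count_matches("text_state", "unknown"))  # 1
```

With the text column, one query handles all three states without a special case for "unknown".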
Avoid nullable character columns
If you allow NULL in a character column, then you have two ways to express "empty":
- NULL
- empty string
Avoid it if possible. In most cases, you just need one variant of "empty". Simplest solution: do not allow NULL in a column holding character data.
If you think the character column should be allowed to be NULL (for example you want a unique, but optional identifier for rows), then consider a constraint: If the character string in the column is not NULL, then the string must not be empty. This way you ensure that there is only one variant of "empty".
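Such a constraint could look like this. The table `item` and the column `external_id` are hypothetical, and SQLite is used as a stand-in; a CHECK constraint of this form is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        -- optional but unique; if set, it must not be the empty string
        external_id TEXT UNIQUE CHECK (external_id <> '')
    )
""")


def try_insert(value):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO item (external_id) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("A-1"))  # True
print(try_insert(None))   # True, NULL is the only allowed "empty"
print(try_insert(""))     # False, rejected by the CHECK constraint
```

NULL passes the check because the comparison `NULL <> ''` is not false, and a UNIQUE column may hold several NULL rows.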
SQL: I prefer subqueries to joins
In most cases, I use an ORM to access data and don't write SQL by hand.
If I do write SQL by hand, then I often prefer SQL Subqueries to SQL Joins.
Have a look at this example:
SELECT id, name
FROM products
WHERE category_id IN
(SELECT id
FROM categories
WHERE expired = True)
I can translate this to human language easily: Select all products, which belong to a category that has expired.
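For comparison, here is the same question answered with a subquery and with a join. The sample rows are invented, and `expired` is stored as 0/1 because the snippet uses SQLite to stay self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL, expired INTEGER NOT NULL
    );
    CREATE TABLE products (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL,
        category_id INTEGER NOT NULL REFERENCES categories(id)
    );
    INSERT INTO categories (id, name, expired)
        VALUES (1, 'clearance', 1), (2, 'current', 0);
    INSERT INTO products (id, name, category_id)
        VALUES (1, 'Old phone', 1), (2, 'New phone', 2), (3, 'Old cable', 1);
""")

# Reads top-down: products whose category is in the set of expired categories.
SUBQUERY = """
    SELECT id, name
    FROM products
    WHERE category_id IN
        (SELECT id FROM categories WHERE expired = 1)
    ORDER BY id
"""

# Same result, but the reader has to combine two tables in their head first.
JOIN = """
    SELECT products.id, products.name
    FROM products
    JOIN categories ON categories.id = products.category_id
    WHERE categories.expired = 1
    ORDER BY products.id
"""


def run(sql):
    return conn.execute(sql).fetchall()


print(run(SUBQUERY))  # [(1, 'Old phone'), (3, 'Old cable')]
```

Both queries return the same rows here; the preference is about readability, not about the result.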
Use all features PostgreSQL does offer
If you want to store structured data, then PostgreSQL is a safe default choice. It fits in most cases. Use all features PostgreSQL does offer. Don't constrain yourself to use only the portable SQL features. It's ok if your code does work only with PostgreSQL and no other database if this will solve your current needs. If there is a need to support other databases in the future, then handle this problem in the future, not today. PostgreSQL is great, and you waste time if you don't use its features.
Imagine there is a Meta-Programming-Language META (AFAIK this does not exist) and it is an official standard created by the ISO (like SQL). You can compile this Meta-Programming-Language to Java, Python, C, and other languages. But this Meta-Programming-Language would only support 70% of all features of the underlying programming languages. Would it make sense to say "My code must be portable, you must use META, you must not use implementation-specific stuff!"?. No, I think it would make no sense.
My conclusion: Use all features PostgreSQL has. Don't make your life more complicated than necessary and don't restrict yourself to use only portable SQL.
Great features PG has, which you might not know yet:
- Insert/Update/Delete Trigger
- "SELECT FOR UPDATE .... SKIP LOCKED" gives you the perfect foundation for a task-queue. For example: Procrastinate
- pgAdmin: a nice GUI to configure your databases.
- Fulltext Search
There is just one hint: Avoid storing binary data in PostgreSQL. An S3 service like minio is a better choice.
Where to not use PostgreSQL?
- For embedded systems SQLite may fit better. Prefer SQLite if there will only be one process accessing the database at a time. As soon as there are multiple users/connections, you need to consider going elsewhere.
- TB-scale full-text search systems.
- Scientific number crunching: hdf5
- Caching: Redis fits better
- Go with the flow: If you are wearing the admin hat (instead of the dev hat), and you should install (instead of developing) a product, then try the default DB (sometimes MySQL) first.
Source: PostgreSQL general mailing list: https://www.postgresql.org/message-id/5ded060e-866e-6c70-1754-349767234bbd%40thomas-guettler.de
Transactions do not nest
I love nested function calls and recursion. This way you can write easy-to-read code. For example, recursion in quicksort is great.
Nested transactions ... sounds great. But stop: What is ACID about? This is about:
- Atomicity
- Consistency
- Isolation
- Durability
Database transactions are atomic. If the transaction was successful, then it is Durable.
Imagine you have one outer-transaction and two inner transactions.
- Transaction OUTER starts
- Transaction INNER1 starts
- Transaction INNER1 commits
- Transaction INNER2 starts
- Transaction INNER2 raises an exception.
Is the result of INNER1 durable or not?
Conclusion: Transactions do not nest
Related: http://stackoverflow.com/questions/39719567/not-nesting-version-of-atomic-in-django
The "partial transaction" concept in PostgreSQL is called savepoints. https://www.postgresql.org/docs/devel/sql-savepoint.html They capture linear portions of a transaction's work. Your use of them may be able to express a hierarchical expression of updates that may be preserved or rolled back, but the concept in PostgreSQL is not itself hierarchical.
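The OUTER/INNER1/INNER2 scenario from above can be played through with savepoints. This is only a sketch: it uses SQLite instead of PostgreSQL so that it runs without a server, and the table `log` is made up. The point it illustrates is the same: releasing a savepoint does not make anything durable.

```python
import sqlite3

# isolation_level=None: no implicit transactions, we issue BEGIN ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (msg TEXT NOT NULL)")


def rows():
    return [r[0] for r in conn.execute("SELECT msg FROM log ORDER BY rowid")]


conn.execute("BEGIN")                          # OUTER starts
conn.execute("INSERT INTO log VALUES ('outer')")

conn.execute("SAVEPOINT inner1")               # INNER1 starts
conn.execute("INSERT INTO log VALUES ('inner1')")
conn.execute("RELEASE SAVEPOINT inner1")       # INNER1 "commits"

conn.execute("SAVEPOINT inner2")               # INNER2 starts
conn.execute("INSERT INTO log VALUES ('inner2')")
conn.execute("ROLLBACK TO SAVEPOINT inner2")   # INNER2 fails
conn.execute("RELEASE SAVEPOINT inner2")

after_inner2 = rows()            # work of INNER1 is still visible ...
conn.execute("ROLLBACK")         # ... until OUTER is rolled back
after_outer_rollback = rows()

print(after_inner2)          # ['outer', 'inner1']
print(after_outer_rollback)  # []
```

The "commit" of INNER1 was not durable: it vanished together with the outer transaction.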
My customer wants to extend the data schema...
Imagine you created some kind of issue-tracking system. Up until now, you provide attributes like "subject", "description", "datetime created", "datetime last-modified", "tags", "related issues", "priority", ...
Now the customer wants to add some new attributes to issues. It would be quite easy for you to update the database schema and update the code.
Maybe you are lucky and you have 100 customers. Then you would prefer to spend your time improving the core product. You don't want to spend too much time on features which only one customer wants.
Or the customer wants to update the schema on their own.
What can you do now?
One solution is EAV: The Entity–attribute–value model
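A minimal EAV sketch for the issue tracker. The table `issue_attribute` and the helper functions are invented for illustration, and SQLite is used so that the snippet runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity: the fixed core schema
    CREATE TABLE issue (id INTEGER PRIMARY KEY, subject TEXT NOT NULL);

    -- Attribute + Value: one row per custom attribute of an issue
    CREATE TABLE issue_attribute (
        issue_id INTEGER NOT NULL REFERENCES issue(id),
        name TEXT NOT NULL,
        value TEXT NOT NULL,
        PRIMARY KEY (issue_id, name)
    );

    INSERT INTO issue (id, subject) VALUES (1, 'Printer is on fire');
""")


def set_attribute(issue_id, name, value):
    """Add or overwrite a custom attribute without changing the schema."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO issue_attribute (issue_id, name, value) "
            "VALUES (?, ?, ?)",
            (issue_id, name, value),
        )


def get_attributes(issue_id):
    sql = "SELECT name, value FROM issue_attribute WHERE issue_id = ?"
    return dict(conn.execute(sql, (issue_id,)))


set_attribute(1, "building", "B2")
set_attribute(1, "cost_center", "4711")
print(get_attributes(1))
```

A trade-off which is visible in this sketch: every value is stored as text, so the database no longer checks the type of a custom attribute.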
Why I don't want to work with MongoDB
MongoDB is a cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. (Wikipedia)
One document in a collection can differ in its structure from the others. For example, almost all documents in a collection have an integer value on the attribute "foo", but for unknown reasons, one document has a float instead of an integer. Grrr.
What does the solution look like?
return try {
this.getLong(key)
} catch (e: ClassCastException) {
if (this[key] is Double) this.getDouble(key).toLong() else null
}
No! I want a clear schema where all values in a column are of the same type.
Of course, my wish has a draw-back: If you want