Photo by Erik Mclean on Unsplash
🚡 Programming languages, under-the-hood stuff: Documentary
Documentary of my week-long research and learning of under-the-hood stuff in programming languages. Just out of curiosity 🫣
How I ended up writing this blog
It was a time when I could not handle a neverending list of unanswered questions saved in my brain. It was clear that they were taking up a lot of space, so I wrote them on a piece of paper to free my brain storage. 🧠🧟
After a week of sleepless nights, the moment came when every dot suddenly started to connect. I didn't go through compiler design books but still managed to understand the fundamentals, which is enough to code up my first programming language haha. BTW I'm not kidding! 🐸
The modern way to explore and learn
Before going into the path that I took to understand the concepts, I want to clarify one thing. More than any theory, what drove me was a restless, curious mind and a strong hunger to understand how languages work from the inside.
Traditionally, things have been taught as if they were detached from reality. Hell, seriously. Lemme give you an example here: to learn operating system concepts, the only thing you need to do is `build your operating system`. Yup, it's damn simple. Why do people complicate it with generalized learning approaches where you get a random toss of never-used terms?
I'm not opposing the traditional approach; it's just not the best place to start when you don't yet have the fundamentals clear enough to build more advanced stuff upon.
Considering my magical way of exploring stuff that works as fast as Neuralink 🥹, I decided to learn the concepts by putting my list of favorite languages on the inspection table 😉
Language | Compiled | Interpreted |
C/C++ --> Carbon | ✅ | ❌ |
Golang | ✅ | ❌ |
Rust | ✅ | ❌ |
Python | ❌ | ✅ |
Java/C# | ✅ (to bytecode) | ✅ |
JavaScript | ❌ | ✅ |
Ruby | ❌ | ✅ |
Zig | ✅ | ❌ |
OCaml | ✅ | ✅ |
Mojo | ✅ | ✅ |
The first question in mind
To date, I've been using multiple programming languages, a mix of compiled and interpreted ones. I wondered how a piece of software can convert a highly human-readable language with symbols into the processor's native language, or machine code.
So I began my exploration in a top-down fashion. The languages I'm familiar with gave me enough motivation to start with them, because I use them. Already knowing a bit about compiled languages saved me a ton of time. While going through them, I took real-life examples like:
C/C++
Golang
Rust
Zig
...
Along the way I explored multiple terms, including Ahead of time and Natively compiled. Things got pretty serious when it was tough to wrap my head around unanswered questions like:
Can an executable ( code.exe ) run on the same operating system ( Mac ) but on a different instruction set architecture?
If an operating system is available for multiple processor architectures ( ISA ), why do we still need to compile for a specific architecture?
These questions led me to a term called CROSS-COMPILATION, it had all the answers I needed. WoW, problem solved then, Yeah kinda 🥹
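Cross-compilation lets us compile programs for a different processor architecture ( ISA ) or operating system while working on a single machine. It exists because an executable contains machine code for one specific ISA, so a binary built for one architecture won't run natively on another, even under the same operating system. The above diagram illustrates this incredibly ✨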
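To make the host-versus-target idea a bit more concrete, here is a tiny Python sketch. It only asks the machine what it is and compares that with a target pair. The `ARCH_ALIASES` table and both function names are made up for this post, and the mapping covers just a few common cases, so treat it as an illustration and not a complete list.

```python
import platform

# Assumed, incomplete mapping from platform.machine() strings to
# Go-style architecture names. Anything unknown is passed through as-is.
ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
    "i386": "386",
    "i686": "386",
}


def host_target():
    """Return an (os, arch) pair describing the machine running this script."""
    os_name = platform.system().lower()    # e.g. "linux", "darwin", "windows"
    machine = platform.machine().lower()   # e.g. "x86_64", "arm64"
    return os_name, ARCH_ALIASES.get(machine, machine)


def is_cross_compile(target, host=None):
    """True when the requested (os, arch) pair differs from the host pair."""
    if host is None:
        host = host_target()
    return tuple(target) != tuple(host)


if __name__ == "__main__":
    print("host:", host_target())
    print("windows/amd64 is a cross build:", is_cross_compile(("windows", "amd64")))
```

If the two pairs differ, the compiler has to emit machine code ( and an executable format ) for a machine other than the one it is running on, and that is all cross-compilation means.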
Lemme give you a real-life example of cross-compilation in Golang. Run the command below in a terminal to compile Caddy for Windows on the amd64 instruction set architecture.
env GOOS=windows GOARCH=amd64 go build github.com/mholt/caddy/
The executable will be created in the current directory, using the package name as its name. However, since we built this executable for Windows, the name ends with the suffix `.exe`. Run the below command to verify the created file.
ls | grep caddy
output
caddy.exe
The `env` command runs a program in a modified environment. This lets you set environment variables for the current command execution only; your shell's own environment is left unchanged once the command finishes.
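The same "only for this one command" behaviour can be sketched in Python with `subprocess.run` and its `env` argument. The helper name `run_with_env` is mine, and the child process here is just a stand-in that prints what it sees; it is not the Go toolchain.

```python
import os
import subprocess
import sys


def run_with_env(argv, **overrides):
    """Run argv with extra environment variables, similar to `env VAR=value cmd`."""
    # Build a copy of the environment; the parent's os.environ is not modified.
    child_env = {**os.environ, **overrides}
    result = subprocess.run(
        argv, env=child_env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Stand-in child process that reports the variables it received.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['GOOS'], os.environ['GOARCH'])",
    ]
    print(run_with_env(child, GOOS="windows", GOARCH="amd64"))
```

The child sees `GOOS` and `GOARCH`, while the parent process keeps whatever environment it started with.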
The following table shows some of the possible combinations of `GOOS` and `GOARCH` you can use:
GOOS - Target operating system | GOARCH - Target instruction set architecture |
android | arm |
dragonfly | amd64 |
linux | ppc64 |
netbsd | 386 |
windows | amd64 |
What's the matter with JAVA?
I have written Java code before. Initially, I used to run my code in an IDE ( Integrated development environment ), Eclipse, which gave me a sweet abstraction layer that kept me from looking into the internal execution process. This was long ago, in my high school days, but 🍑 things have changed now 🥲
This time I played with the SDK instead and did my terminal magic to understand the under-the-hood stuff. The very next moment I got confused between `javac` and `java` on the command line. Why do we need a two-step process to execute Java code? Why does a `.class` file get generated after the first step? My curious mind needs answers now 🤥
# Doing the javac thing first...
javac demo.java
# Doing the java thing second...
java demo
Did some digging and finally landed on the question, 'Is Java compiled or interpreted?' 🤔 I found out that Java is both compiled and interpreted, but how? Okie, things get pretty juicy here cuz the Java architecture is entirely different from natively compiled languages like Zig or Rust. Let's look at the bigger picture together.
Phase 1 - Compilation
If we look at things closely, the Java compiler `javac` is functionally very similar to the compiler of a natively compiled language. The only difference is its output, which is bytecode instead of machine code. Bytecode is also called intermediate code: it is closer to the machine than source code, but not as close as a native binary.
The question that arises here is why we even need to compile into bytecode just to have an interpreter execute it afterwards. We could just interpret the source directly, like LISP in the old days. BTW LISP is often cited as one of the first interpreted programming languages. There are a few reasons:
Performance: Bytecode is a lower-level representation of the code compared to the original source code. This allows the JVM to perform various optimizations during the interpretation or Just-In-Time (JIT) compilation process. These optimizations can result in improved performance and execution speed compared to directly interpreting the source code.
Portability: Java aims to be platform-independent, allowing Java programs to run on any system that has a compatible JVM. By compiling Java code into bytecode, which is a standardized intermediate representation, it can be executed on different operating systems and architectures without the need for recompilation.
Interoperability: By using bytecode as an intermediate representation, Java programs can seamlessly interact with other languages that target the JVM, such as Kotlin, Scala, and Groovy. These languages can also be compiled into bytecode, allowing them to utilize Java libraries and frameworks.
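You can peek at the "source in, bytecode out" step without leaving a script. The sketch below uses Python and not Java, because CPython also compiles source to bytecode before running it and ships a `dis` module to look at the result. The variable names and the tiny `SOURCE` snippet are just for illustration. ( For Java itself, `javap -c` prints the bytecode stored in a `.class` file. )

```python
import dis

SOURCE = "total = price * quantity"

# compile() plays roughly the role javac plays for Java:
# source text goes in, a code object holding bytecode comes out.
code = compile(SOURCE, "<demo>", "exec")


def opnames(code_obj):
    """Return the opcode names the compiler emitted, in order."""
    return [ins.opname for ins in dis.get_instructions(code_obj)]


if __name__ == "__main__":
    print(type(code.co_code), len(code.co_code), "bytes of bytecode")
    # Human-readable listing; the exact opcodes differ between Python versions.
    dis.dis(code)

    # The same bytecode can then be handed to the virtual machine to run.
    namespace = {"price": 3, "quantity": 4}
    exec(code, namespace)
    print(namespace["total"])
```

The listing is much closer to what a machine does ( load, multiply, store ) than the original line of source, and that is the whole point of an intermediate representation.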
Phase 2 - Interpretation
We now have bytecode in hand, and we just need to pass it on to the interpreter. The interpreter reads the bytecode instruction by instruction and executes each one directly, producing the program's output instead of an executable file { specific to the operating system }.
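Here is what that fetch-and-execute loop looks like in its smallest form. This is a toy stack machine with four invented opcodes ( `PUSH`, `ADD`, `MUL`, `PRINT` ); real virtual machines like the JVM have far more instructions, but the shape of the loop is similar.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs on a small stack machine."""
    stack = []
    output = []
    pc = 0  # program counter: index of the next instruction to execute
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {op}")
    return output


# Bytecode for the expression (2 + 3) * 4
PROGRAM = [
    ("PUSH", 2), ("PUSH", 3), ("ADD", None),
    ("PUSH", 4), ("MUL", None),
    ("PRINT", None),
]

if __name__ == "__main__":
    print(run(PROGRAM))  # prints [20]
```

Notice that nothing gets written to disk: the interpreter produces the result directly, which is the difference from a native compiler.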
Variations in Interpreters
Like native compilers, interpreters keep evolving over time. Earlier we used to interpret source code directly, for example in LISP, but things have gotten more advanced since. Let me introduce you to JIT, which stands for just in time.
Here's the thing: interpreted code is slower than what native compilers produce, because in most cases the interpreter decodes and executes bytecode one instruction at a time, and that overhead is paid again on every run. That's why we have JIT, which watches the program at runtime and compiles the frequently executed ( hot ) parts into optimized machine code on the fly. A plain interpreter has very little room for this kind of optimization because it only looks at one instruction at a time before executing it.
Code optimization lets us execute the same code much faster than before. We won't go deeper into optimization for now; that's a topic for another blog, but you get the point.
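The "interpret first, compile once it gets hot" idea can be mimicked in a few lines. To be loud about the assumptions: this is a toy, the `HotExpr` class and its threshold are invented for this post, and the "compiled" form is just a Python lambda standing in for machine code. A real JIT emits native instructions and does far smarter profiling.

```python
OPS = {"+", "*"}  # the only operators this toy expression language knows


def interpret(expr, env):
    """Walk the expression tree on every call: simple, but the work repeats."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]  # variable lookup
    op, left, right = expr
    if op not in OPS:
        raise ValueError(f"unknown operator: {op}")
    a, b = interpret(left, env), interpret(right, env)
    return a + b if op == "+" else a * b


def to_source(expr):
    """Translate the tree into Python source text (stand-in for machine code)."""
    if isinstance(expr, int):
        return repr(expr)
    if isinstance(expr, str):
        return f"env[{expr!r}]"
    op, left, right = expr
    if op not in OPS:
        raise ValueError(f"unknown operator: {op}")
    return f"({to_source(left)} {op} {to_source(right)})"


class HotExpr:
    """Interpret until the expression is 'hot', then switch to compiled code."""

    def __init__(self, expr, threshold=3):
        self.expr = expr
        self.threshold = threshold
        self.calls = 0
        self.compiled = None

    def __call__(self, env):
        if self.compiled is not None:
            return self.compiled(env)  # fast path: no tree walking
        self.calls += 1
        if self.calls >= self.threshold:
            # Hot enough: translate once, reuse on every later call.
            self.compiled = eval(f"lambda env: {to_source(self.expr)}")
        return interpret(self.expr, env)


if __name__ == "__main__":
    f = HotExpr(("+", "x", ("*", 2, "y")))
    print([f({"x": 1, "y": i}) for i in range(5)])
    print("compiled:", f.compiled is not None)
```

The first few calls pay the tree-walking cost; after the threshold, every call goes through the translated version and gives the same answers.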
Okie here are the Interpreter's variations in terms of architecture that you will see in most cases:
Variations | Example language |
Direct Interpreter without JIT | LISP |
Direct Interpreter with JIT | |
Bytecode Interpreter without JIT | Python ( CPython ) |
Bytecode Interpreter with JIT | JAVA, MOJO |
In the background, even compiled programs ( C, C++ etc. ) are interpreted in a sense. The binary file is run by an interpreter that is implemented by the underlying processor. It's just not commonly put that way.
A CPU can be viewed as a hardware-based interpreter for its machine code.
A VM can be viewed as a software-based interpreter.
Architectures
Now that I had a broad understanding of a couple of programming language architectures, I dug further into language architectures and found:
Natively compiled ✅
Direct Interpreter ( with or without JIT ) ✅
Bytecode Interpreter ( with or without JIT ) ✅
Transpiler
Multiple Implementations of the same language
Let's take it back to the '90s. We all know the C language, which is epic for systems programming but doesn't have an official implementation the way Python ( python.org ) or Ruby ( ruby-lang.org ) does. If you type C programming language into Google, you won't get a website dedicated to the C language; instead you have to download a compiler that suits your operating system.
Platform | Implementation |
Windows | MinGW, Cygwin |
Mac/Linux | clang, GCC |
List of C compiler Implementations ... https://en.wikipedia.org/wiki/List_of_compilers#C_compilers
Let's roll towards an interpreted language like Python. Earlier, I said Python has an official implementation, so why am I talking about it in this section? In addition to the official implementation ( CPython ), it has some unofficial implementations like PyPy.
List of Python Implementations ...
https://wiki.python.org/moin/PythonImplementations?action=show&redirect=implementation
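If you're curious which implementation is actually running your code, Python can tell you itself. A small sketch using the standard library; the function name `implementation_info` is just something I made up for this post.

```python
import platform
import sys


def implementation_info():
    """Describe which Python implementation is executing this script."""
    return {
        "name": sys.implementation.name,                      # e.g. "cpython" or "pypy"
        "implementation": platform.python_implementation(),   # e.g. "CPython" or "PyPy"
        "language_version": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in implementation_info().items():
        print(f"{key}: {value}")
```

Run the same script under two different implementations and only the reported values change, which is a nice reminder that the language and its implementation are two separate things.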
List of implementations
For learning Implementations of other languages refer to this Link: