The assembly language of the YASEP

Introduction

This page describes the YASEP's JavaScript assembler (the code that translates the textual instructions into binary code) and the language conventions (syntax) used when writing software for the YASEP. This information is specific to the yasep.org tools; third-party tools (such as defora's) might differ significantly, so check their respective documentation carefully. For information about specific instructions, see the opcode map.

This page covers the following points :

The Instruction-Level Assembler
How to write numbers
The pseudo-instructions (like DB, DW or YASEP)
The reserved keywords
The instruction aliases
The flags
The default values
The high level assembler

Architecture

The YASEP's assembler is split into two layers :

The line-by-line, or instruction-level assembler is a stateless routine that assembles one instruction at a time. Its interface can be accessed through the corresponding menu or when you click on instructions that look like this: add d2, r3. It is not very sophisticated but very useful in countless places.
The high-level assembler takes a whole file, splits it into single lines, then feeds them to the above routine. It doesn't deal with the instructions directly but assembles the results, manages the symbol tables, evaluates/computes values, creates the binaries, handles preprocessing...

Anatomy of an instruction

An instruction is the basic unit of software, like an atom.
* For the YASEP, it is a 16-bit or 32-bit word that contains several fields that describe what to do with what.
* For the assembler, an instruction is a line containing all this information in readable, symbolic form.

An instruction typically contains

An opcode (short for operation code) that describes what operation to perform (add, subtract, copy...),
One immediate value, a signed number (4, 6, 16 or 20 bits) included by the software,
One or two register operand(s), these are data already stored inside the processor,
One destination register, that tells where to write the result of the current instruction,
One optional condition code that describes under which condition to abort the instruction.

The binary structure of the instruction is explained there.

Instruction-level assembly

The YASEP's instructions are quite simple but they are not always practical, so some opcodes introduce a few modifications, as indicated by opcode flags (internal properties of the opcodes, listed here). They are usually harmless and don't impact the architecture, but they make the instructions handier, which makes programs easier to write and read.

The assembler uses these flags when transforming the source text into binary codes. They keep the source code readable and coherent, independently from any hardware tricks, exceptions or processor versions.

The purpose of assembly language is to let the software developer write code without having to remember all the architectural details and their subtleties. The developer only needs to remember these five rules :

1 : The opcode is always written first.
2 : When a numerical value (some immediate data) is needed, it comes just after the opcode.
3 : Then the source registers are written, to provide the other operand(s).
4 : After the operands comes the destination register (where the result will be sent).
5 : The last words are the condition (that can use another register).

These rules are enough to understand the meaning of all the instructions, even if some have variations or exceptions.
For example : add 3 r2 d3 LSB1 a2

"add" is the opcode, which specifies which operation is performed
"3" is the immediate number, data that is directly included inside the instruction
"r2" is the name of a register, its value will be added to the other operand "3".
"d3" is the name of a register in which the YASEP will write the result of the addition.
"LSB1 a2" means that the result will be written to d3 only if the Least Significant Bit of a2 is set to 1.

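The five rules can be sketched as a small routine that splits a line into fields. This is only an illustration under simple assumptions: splitFields is a hypothetical name, the register and condition names are the reserved symbols listed on this page, and the variations and exceptions of real opcodes are ignored. It is not the code used by the yasep.org tools.

```javascript
// Hypothetical helper (not the yasep.org implementation): splits one
// instruction into fields by following the five rules literally.
const REGISTERS = new Set(["PC", "R1", "R2", "R3", "R4", "R5",
  "A1", "D1", "A2", "D2", "A3", "D3", "A4", "D4", "A5", "D5"]);
const CONDITIONS = new Set(["LSB0", "LSB1", "MSB0", "MSB1", "ZERO", "NZ"]);

function splitFields(line) {
  const words = line.trim().toUpperCase().split(/\s+/);
  const fields = {
    opcode: words.shift(),          // rule 1: the opcode comes first
    immediate: null,
    sources: [],
    destination: null,
    condition: null
  };
  // Rule 2: an optional immediate value comes just after the opcode.
  if (words.length > 0 && /^(-?[0-9]+|[0-9A-F]+H|[01]+B)$/.test(words[0])) {
    fields.immediate = words.shift();
  }
  // Rules 3 and 4: registers follow; the last one is taken as the destination.
  const regs = [];
  while (words.length > 0 && REGISTERS.has(words[0])) {
    regs.push(words.shift());
  }
  if (regs.length > 0) fields.destination = regs.pop();
  fields.sources = regs;
  // Rule 5: an optional condition keyword, followed by its register.
  if (words.length > 0 && CONDITIONS.has(words[0])) {
    fields.condition = { code: words.shift(), register: words.shift() || null };
  }
  if (words.length > 0) throw new Error("Unexpected word: " + words[0]);
  return fields;
}
```

With the example above, splitFields("add 3 r2 d3 LSB1 a2") reports ADD as the opcode, 3 as the immediate, R2 as the source, D3 as the destination and LSB1 A2 as the condition. When a single register is written, this sketch reports it as the destination only.
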
Input processing

The assembly routine takes a single line (a character string with no "end of line" like '\n', '\r'...)
The line may end with a comment, started with ';'. Starting from this delimiter, all the following characters are flushed.
The leading and trailing spaces are removed, the commas and tabs are replaced by spaces, and duplicated spaces are merged. You are free to write add 234h d2 r3, add 234h d2, r3 or add 234h, d2, r3
The whole line is converted to upper case. So it does not matter if you write adD 234h D2 r3 or ADd 234h d2 R3 because it will become ADD 234H D2 R3 internally.
The first word must be the opcode (rule #1). Its properties (flags and forms) are looked up in the internal JavaScript tables.

An immediate value is looked up as the first operand. If one is found, its size (the number of significant digits) will determine which forms are allowed next. For example, an 8-bit number (including sign bit) disables "short immediate" forms that can only hold 4 or 6 bits.

Depending on the instruction, register names and keywords follow. The order and function of these parameters are defined by the "form".

And that's all there is to say about the "input syntax".
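The clean-up steps described in this section can be sketched in a few lines of JavaScript. The function name normalizeLine is hypothetical and this is not the actual yasep.org code. It also ignores the special case of DU, whose string may contain a ';' character.

```javascript
// Hypothetical sketch of the input clean-up; not the actual yasep.org code.
function normalizeLine(line) {
  // Everything from the ';' delimiter onwards is a comment and is dropped.
  const commentStart = line.indexOf(";");
  if (commentStart !== -1) line = line.slice(0, commentStart);
  return line
    .replace(/[,\t]/g, " ")   // commas and tabs become spaces
    .replace(/ +/g, " ")      // duplicated spaces are merged
    .trim()                   // leading and trailing spaces are removed
    .toUpperCase();           // the rest of the assembler sees upper case only
}
```

All the spellings mentioned above give the same result: normalizeLine("add 234h, d2, r3") and normalizeLine("adD 234h D2 r3") both return "ADD 234H D2 R3". The trimming is done after the replacements here, which gives the same string as trimming first.
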

Number formatting

The numbers are accepted in 3 formats :

Hexadecimal ([0-9A-F]* with a trailing 'h'), for example dw 1234h (the 0x prefix or the $ prefix is never used)
Decimal ([0-9]* without prefix or suffix) : dw 1234.
This is the only format that accepts a leading minus sign : dw -1234
Binary ([01]* with a trailing 'b') : dw 01101101b

Numbers are mostly used in contexts where the number of significant bits is bounded by the size of the container (usually, the immediate fields of instructions). When too many digits are given, the assembler might keep only the required number of LSBs and discard the MSBs. The following example shows how the number might be truncated : db 1234h.

Note : the disassembler always uses hexadecimal as output format.
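The three formats and the truncation can be illustrated with the following sketch. The names parseNumber and truncate are hypothetical, at least one digit is required (stricter than the patterns above), and the truncation assumes that only the low-order bits are kept, which the real assembler might or might not do in every context.

```javascript
// Hypothetical helpers; not the parser used by the yasep.org tools.
function parseNumber(text) {
  const t = text.trim().toUpperCase();
  if (/^[0-9A-F]+H$/.test(t)) return parseInt(t.slice(0, -1), 16); // 1234h
  if (/^[01]+B$/.test(t)) return parseInt(t.slice(0, -1), 2);      // 01101101b
  if (/^-?[0-9]+$/.test(t)) return parseInt(t, 10);                // 1234, -1234
  throw new Error("Not a recognised number: " + text);
}

// Keeps only the 'bits' least significant bits. Negative values come out
// in two's complement form.
function truncate(value, bits) {
  const modulus = 2 ** bits;
  return ((value % modulus) + modulus) % modulus;
}
```

For db 1234h, truncate(parseNumber("1234h"), 8) gives 34h: the upper byte is discarded. A prefixed number such as 0x12 is rejected by this sketch, as the prefix is never used.
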

Pseudo-instructions

The ability to output arbitrary numbers is critical for many uses so the assembler has the following four pseudo-instructions :

DB : "Data Byte"
outputs 8 bits : db 12h

DH : "Data Half-word"
outputs 16 bits : dh 1234h

DW : "Data Word"
outputs 32 bits : dw 12345678h

DU : "Data Unicode"
outputs a string of UTF8-encoded characters : du "you can write almost\b anything : ϗãþßǢƱ\n"
- the preprocessor is disabled for the whole line
- the string must be surrounded by quotation marks
- the recognised escape sequences are \b \f \n \r \t \\ \/ \"
- if you want something more complex, you can still use DB
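As an illustration of what DU has to do, the following sketch converts a quoted string into UTF-8 bytes. The name encodeDU and the error messages are hypothetical. The sketch assumes that \n is among the escape sequences (as the example string suggests) and it relies on the standard TextEncoder object rather than on the assembler's own code.

```javascript
// Hypothetical sketch of DU; not the actual yasep.org implementation.
const DU_ESCAPES = {
  b: "\b", f: "\f", n: "\n", r: "\r", t: "\t",
  "\\": "\\", "/": "/", '"': '"'
};

function encodeDU(quoted) {
  if (quoted.length < 2 || !quoted.startsWith('"') || !quoted.endsWith('"')) {
    throw new Error("DU: the string must be surrounded by quotation marks");
  }
  const body = quoted.slice(1, -1);
  // Replace each backslash sequence by the character it stands for.
  const text = body.replace(/\\(.)/gs, (match, c) => {
    if (DU_ESCAPES[c] === undefined) {
      throw new Error("DU: unknown escape sequence " + match);
    }
    return DU_ESCAPES[c];
  });
  // TextEncoder always produces UTF-8.
  return Array.from(new TextEncoder().encode(text));
}
```

A character such as ã takes two bytes in the output (C3h A3h), while plain ASCII characters and the escape sequences take one byte each.
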

An unbounded number of literal numbers is accepted; the limit depends on the JavaScript engine, not on the assembler's design. The assembler can provide all the output's numbers as a stream when the emit_bin() function is defined.

Core datapath width

Since 2008-08, the YASEP exists in 16-bit and 32-bit variants. The opcodes don't change but a few of them are pointless in 16-bit mode or 32-bit mode. The source code can specify that a certain width is used so a warning is issued when an invalid instruction (depending on the CPU) is assembled.

YASEP16 specifies that the targeted CPU has a 16-bit datapath. All 32-bit only instructions generate a warning.

YASEP32 specifies that the targeted CPU has a 32-bit datapath. All 16-bit only instructions generate a warning.

YASEP resets the target CPU to generic/undefined.

These pseudo-instructions don't generate any code and can be used in any order, as they simply control an internal flag. This flag is compared with each instruction's flag (see YASEP32_ONLY and YASEP16_ONLY).
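The comparison between the internal flag and each instruction's flags could look like the following sketch. Only the names YASEP16, YASEP32, YASEP, YASEP16_ONLY and YASEP32_ONLY come from this page; makeWidthChecker, its methods and the warning texts are assumptions made for the illustration.

```javascript
// Hypothetical sketch of the datapath width check; not the yasep.org code.
function makeWidthChecker() {
  let target = null; // null = generic/undefined, otherwise 16 or 32
  return {
    // Returns true when 'word' is one of the width pseudo-instructions.
    // No code is generated, only the internal flag changes.
    pseudo(word) {
      if (word === "YASEP16") target = 16;
      else if (word === "YASEP32") target = 32;
      else if (word === "YASEP") target = null;
      else return false;
      return true;
    },
    // Returns a warning message, or null when the instruction is acceptable.
    check(opcode, flags) {
      if (target === 16 && flags.includes("YASEP32_ONLY")) {
        return "Warning: " + opcode + " is only valid with a 32-bit datapath";
      }
      if (target === 32 && flags.includes("YASEP16_ONLY")) {
        return "Warning: " + opcode + " is only valid with a 16-bit datapath";
      }
      return null;
    }
  };
}
```

Because the flag is a simple variable, the pseudo-instructions can appear anywhere in the source, and the generic setting never produces a warning.
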

Since 2012-02, these pseudo-instructions are deprecated because the datapath width is one of the parameters that make a "CPU profile". You can select a CPU profile with the .profile keyword in your source code. You can also check or create profiles in the dedicated graphic interface.

Reserved symbols

The instruction-level assembler recognizes the following symbols, and rejects anything else :

Register names (PC R1 R2 R3 R4 R5 A1 D1 A2 D2 A3 D3 A4 D4 A5 D5)
Pseudo-instructions (DB DH DW YASEP16 YASEP32 YASEP)
The condition codes (LSB0 LSB1 MSB0 MSB1 ZERO NZ)
The instruction opcodes and the opcode aliases (NOP...)

Aliases

The assembler eases instruction coding (letting the programmer think about what to do, rather than about how to do it) with two types of aliases : form aliases (see ALIAS_RR) and instruction aliases (they are listed under the opcode map). This section is about instruction aliases.

Internally, they can be used like normal instructions, but they provide different forms and/or different semantics. However, they use real opcodes of other instructions.

The substitution is handled at the assembly level and the disassembler probably won't infer the originally assembled alias. So don't be surprised if instructions like NOT or NEG assemble correctly, but the disassembly returns a different opcode.

Instruction Forms and flags

Despite the very simple instruction format, the assembly language instructions appear in several different forms. This is usually due to reasons such as :

The instruction does not make sense with one form, for example HALT does not require an argument,

The assembly language must be easy to read and understand, hence some fields are written in a different order internally.

The various instructions forms are described in the forms page.

The flags are listed in their own page too.

Default values

All uninitialized data, fields or values are cleared (zero). For example, the condition codes and the update fields are 0 when not needed (the instruction always executes with no update).

The instructions that use the Imm20 form always fill the 20 bits, even when the 4 MSB are not decoded or used by YASEP16. A warning might appear during assembly.

The high level assembler

Since 2012-02, the tools include a "high level assembler" called YASMed. This is a graphical user interface that handles instructions line-by-line, with little respect for the underlying low-level CPU architecture. You can start a new instance with the ASM menu or by clicking on code zones like this:

  NOP ; source code example

YASMed is not a classic multi-pass assembler, as it resolves references at edit time, which can happen out of order. In case of unresolved symbols, hit the "re-assemble" button to pop up a new window with an updated symbol table.

Currently, YASMed recognises certain keywords with a leading dot at the start of the line:

Note that the "dot" directives must start at the first character of the line, no leading space is allowed.

`.name` BlockName

Defines the name of the function, or code block, as well as the file's name when the editor's contents are saved.

`.profile` ProfileName

Indicates the type of the target CPU so the low-level assembler knows which opcodes are valid (among other details).

`.subst` NewWord Words to substitute

USE WITH CAUTION, WILL BE DEPRECATED

Lazy preprocessing like #define in C. This line defines a substitution of the first word with all the words that follow.

`.` NewWord

Defines a substitution where NewWord is replaced by the numerical value of the current address (this is a kind of "label").

`.org` Address

Usually, a program starts at address 0 but it is possible to force the address of the first byte of a file with the .org directive, which should be used before any byte is assembled.

The special value auto (instead of a positive number) declares that the file's code is position-independent, and can be relocated anywhere the assembler needs.

`.align` Number

Adds padding bytes (cleared to 0) so the next line's address is a multiple of "Number". "Number" must be a power of 2 (1, 2, 4, 8, 16, 32, 64, 128, 256, 512). The YASEP's instructions must be at even addresses, equivalent to .align 2.
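The number of padding bytes follows from simple arithmetic, sketched here with a hypothetical alignPadding function. The sketch only checks that "Number" is a power of 2; whether YASMed accepts values above 512 is not stated on this page.

```javascript
// Hypothetical sketch of the .align arithmetic; not the YASMed code.
function alignPadding(address, boundary) {
  const isPowerOfTwo = Number.isInteger(boundary) && boundary >= 1 &&
    (boundary & (boundary - 1)) === 0;
  if (!isPowerOfTwo) {
    throw new Error(".align: the boundary must be a power of 2");
  }
  // Bytes (cleared to 0) to add so that address + padding is a multiple
  // of the boundary; 0 when the address is already aligned.
  return (boundary - (address % boundary)) % boundary;
}
```

For instance, at address 22, .align 32 needs alignPadding(22, 32) = 10 bytes, so the next line starts at address 32. An odd address needs one byte to satisfy the even-address rule of instructions.
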

Here is a simple example that uses the above keywords:

.name Dumb_Example ; this code will be saved to
; a file named Dumb_Example.yas

.profile YASEP16 ; This program expects to run
; on a generic 16-bit version of the YASEP

.subst Counter R1 ; substitute variable names
.subst tmp R2 ; with actual register names

.org 22 ; locate the code at address 22

  mov 0 Counter
. LabelLoop ; Loop entry label

  ; Loop body of any size

.align 32 ; the next instruction
 ; will be aligned to a 32-byte boundary

; Loop 65536 times :
  add 1 Counter ; increments the counter

  mov LabelLoop tmp ; load the loop address in R2
  mov tmp PC NZ Counter ; Loop if the counter is not 0

  HALT 1 ; End of program, hang the CPU
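
The word substitutions used in this example (.subst and the "." label) can be pictured with the following sketch. makeSubstitutions and its methods are hypothetical names, the matching is assumed to be case-sensitive, and the address given to the label is an arbitrary value chosen for the illustration, not the one YASMed would compute.

```javascript
// Hypothetical sketch of the substitutions; not the YASMed implementation.
function makeSubstitutions() {
  const table = new Map();
  return {
    // ".subst NewWord Words to substitute"
    subst(newWord, replacementWords) {
      table.set(newWord, replacementWords.join(" "));
    },
    // ". NewWord" : the word stands for the numerical value of an address.
    label(newWord, address) {
      table.set(newWord, String(address));
    },
    // Replaces every known word of a line, leaving the others untouched.
    apply(line) {
      return line
        .split(/\s+/)
        .filter(Boolean)
        .map((word) => (table.has(word) ? table.get(word) : word))
        .join(" ");
    }
  };
}

const s = makeSubstitutions();
s.subst("Counter", ["R1"]);
s.subst("tmp", ["R2"]);
s.label("LabelLoop", 24); // 24 is an arbitrary address for this illustration
const expanded = s.apply("mov tmp PC NZ Counter");
```

After these definitions, expanded contains "mov R2 PC NZ R1", which is the line that the instruction-level assembler would actually receive.
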