| Symbol | Description |
|---|---|
| | Core: an execution unit that has up to 8 associated lanes for executing program code against data. It has a physical ID (0, 1, 2, 3, ...), and each lane associated with it also has a physical ID relative to the core itself (so there will be as many lane 0s as there are cores). This toy example GPU has one core with four lanes.[1] |
| >(in textarea) | Program Counter (PC): a pointer into the program text associated with a core that indicates the next operation to make progress. For the purposes of our model, each core has only one PC (per program[2]). That choice is more or less what defines the SIMT model of computation: a core will dispatch as many operations as it has (active) lanes at the same time, but those operations will complete independently[3] as the computation proceeds. |
| Not Pictured | a sense of overall progress against the entire logical space (i.e. all {group id, work id, ...} coordinates); this leaves that part of the "work mapping" entirely up to the reader, unless their problem space is sized 1:1 with a single core. |
| | an operation that will produce results with ID %x; these results have physical {core, lane} coordinates as well as logical {group id, work id, ...} coordinates. |
| Not Pictured | the "type" of the operation (more specifically: the set of architectural hazards that may delay completion relative to dispatch), such as "memory" or "not-memory". |
| | an operation that will not produce directly identified results, such as `OpStore` (which stores to memory) or `OpReturn` (which signals the exit of a program invocation). |
| ◦ | a dispatched operation that will produce a single result. The operation "belongs" to a {core, lane} pair. |
| | a completed operation that has produced a result. |
| • | a result that is ready to be produced; i.e. all of its dependencies are available. When enough results are ready, the core will execute LANE_WIDTH operations to complete them in parallel. |
| •• ... (in textarea) | two results that were computed simultaneously. |
| | two results that were computed sequentially (here, one tick apart). |
| Not Pictured | this toy GPU's memory controller, which only lets one operation through per tick (per core, probably?) but has infinite bandwidth per operation. |
| | a view into the GPU's memory for the 16 bytes associated with buffer "a" by the `OpBufferTALVOS` metadata opcode, formatted as an array of 4 elements (ideally: with the help of the associated SPIR-V type; currently: that's just what you get). Each element is identified by its index (offset): `.[1] = ...` indicates the value of the four bytes at element offset 1 (i.e. four bytes into the memory range), interpreted as an unsigned 32-bit integer; here, the element with index 1 has value 4. The view also tracks the most recently held value, and displays that previous value when the memory changed in the most recent interaction (either a tick or a step), as seen here in elements 2 and 3. |
| Not Pictured | the type metadata, also associated by way of the `OpBufferTALVOS` opcode, that indicates the layout of each element of a Buffer view. |
| Not Pictured | memory safety, and especially the bidirectional impact of parallel scalability on it (i.e. the `if (i < N)` bit in most CUDA examples). |
| Not Pictured | tracking uninitialized memory, and visualizing incorrect access bounds. |
| (the whole textarea) | a program in SPIR-V (with Talvos-specific extensions) that will be interpreted by the virtual GPU. |
| `%x = OpXyz %1 %2 %bar` | a SPIR-V operation (in text form) that will produce a result with ID %x by `Xyz`ing its arguments %1, %2, and %bar. |
| Not Pictured | the whole SPIR-V spec[4] (plus associated references such as the Vulkan API[5]), which describes what an OpXyz does, or why it needs to take arguments. |
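The single-PC, lockstep-dispatch behavior described in the Core and PC rows above can be sketched in a few lines. Everything here (the `run` function, the per-lane register dict, representing operations as plain functions) is invented for illustration; it is not Talvos's implementation:

```python
# A minimal sketch of the SIMT model: one core, one program counter shared
# by all lanes, LANE_WIDTH operations dispatched on every tick.
LANE_WIDTH = 4

def run(program, inputs):
    """Execute `program` (a list of per-lane functions) over `inputs`,
    one instruction per tick, with all lanes advancing in lockstep."""
    lanes = [dict(x=v) for v in inputs]   # per-lane register state
    pc = 0                                # a single PC for the whole core
    trace = []
    while pc < len(program):
        op = program[pc]
        # Dispatch the same operation on every active lane at once.
        for regs in lanes:
            op(regs)
        trace.append((pc, [regs["x"] for regs in lanes]))
        pc += 1                           # every lane moves to the next op together
    return lanes, trace

lanes, trace = run(
    program=[lambda r: r.update(x=r["x"] * 2),   # tick 0: double
             lambda r: r.update(x=r["x"] + 1)],  # tick 1: increment
    inputs=[0, 1, 2, 3],
)
```

Note that in this sketch every op completes within its tick; the "memory" vs. "not-memory" hazard distinction from the table (operations completing some time after dispatch) is exactly what this simplification leaves out.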
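The buffer view's formatting (16 bytes shown as 4 little-endian u32 elements, with the previous value displayed for elements that changed in the last tick or step) can be approximated like so. The function name and the exact output strings are assumptions for illustration, not what Talvos actually prints:

```python
import struct

def buffer_view(raw, prev=None):
    """Format 16 raw bytes as 4 little-endian u32 elements.

    `prev` is the previous snapshot (a 4-tuple); elements that differ
    from it are annotated with their most recently held value.
    """
    vals = struct.unpack("<4I", raw)   # 4 unsigned 32-bit ints, little-endian
    lines = []
    for i, v in enumerate(vals):
        line = f".[{i}] = {v}"
        if prev is not None and prev[i] != v:
            line += f"  (was {prev[i]})"   # value held before the last change
        lines.append(line)
    return vals, lines

# First snapshot: element 1 holds 4, elements 2 and 3 are still zero.
before, _ = buffer_view(struct.pack("<4I", 1, 4, 0, 0))
# After a tick that wrote elements 2 and 3:
after, lines = buffer_view(struct.pack("<4I", 1, 4, 9, 16), prev=before)
```

This mirrors the table's example: `.[1] = 4` for the unchanged element, with "was" annotations appearing only on elements 2 and 3.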
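The `if (i < N)` guard mentioned in the memory-safety row exists because launches are rounded up to whole cores/groups, so the trailing lanes compute indices past the end of the data. A sketch of that interaction, using the 4-lane core from above (the function and names are hypothetical, not from any real API):

```python
LANE_WIDTH = 4

def saxpy_guarded(a, xs, ys, n):
    """out[i] = a * xs[i] + ys[i], launched over whole groups of lanes."""
    out = list(ys)
    groups = -(-n // LANE_WIDTH)   # ceil(n / LANE_WIDTH): round up to whole cores
    for group in range(groups):
        for lane in range(LANE_WIDTH):
            i = group * LANE_WIDTH + lane   # logical work id for this lane
            if i < n:                       # lanes past the end must do nothing
                out[i] = a * xs[i] + out[i]
    return out

# n = 6 is not a multiple of LANE_WIDTH, so the second group launches with
# two lanes (i = 6, 7) that the guard masks off.
res = saxpy_guarded(2, [1, 2, 3, 4, 5, 6], [0] * 6, 6)
```

Without the guard, lanes 6 and 7 would index past the 6-element buffers; that out-of-bounds case is exactly what the "Not Pictured" rows on memory safety and access bounds are deferring.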
Footnotes