Design Notes¶
Warning¶
This part of the documentation is where I store my ramblings about ECS design for future reference.
Why ECS?¶
Structuring things like this has certain advantages over classic
object-oriented programming: * Data and logic are separated, modular,
readable and reusable. Admittedly, that is also the sales pitch of OOP
itself. ECS is a further measure in controlling the mess that is game
development. * Compositional inheritance becomes the norm. One doesn’t
use inheritance to build a tree of classes, each more specialized then
the last. Instead, a game object can be said to have a type based on
what components it has at any given time. * Systems work only on
entities that have sets of Components that the System works on.
Thus they can be added and removed during an Entity’s life time,
without tripping up the code working on them. * In games, time is
occurring in discrete steps anyway, inspiring the round-robin nature of
Systems. * Systems are insulated from one another unless they
work on the same component types. If they do, they need to agree with
all other Systems on the semantics of the fields of that
Component. As long as they do that, Systems can be developed
with very little danger of side effects. * Being mostly safe from side
effects allows to add and remove new systems at low development cost and
technical debt, which in turn allows for free experimentation and
organic development during both initial development and long product
lifetimes. At its extreme, game mechanics can be added and removed at
runtime, making downtime in persistent multiplayer servers obsolete. *
A well-insulated game mechanic is also transferrable between many games,
cutting down development time, especially during prototyping.
Deferred Component addition / removal¶
Motivation¶
Imagine that you have a shooter game where two characters each have a gun that removes the gun that removes the gun of the target. At the same moment, those characters shoot at each other.
For some reason that is realized by removing the target’s Gun
component; Good software design is secondary here, I want to make a
point about how to work with components. Then there is a system
Shoot that resolves the shots. Its filter applies to all entities
with a Gun component.
When Shoot’s update() runs, it iterates over the first entity
in its filter, then the second. The order can be considered random and
should not matter.
When it processes the first entity, it looks up its Gun, sees the
shot, finds the target (the second entity), removes its Gun, and is
done.
When it processes the second entity, it looks up its Gun, and an
Exception would be thrown at this point, since the entity does not
have that component anymore. It should not even be in the filter
anymore. But that’s just bad semantics that shouldn’t be fixed by hackin
around the problem. The “true” semantics is that a system is given a
state of aspects of the world, and should transform that into its
successor state, and do so in as small, compact, and easy to reason over
units; It should not make the developer deal with intermediate states.
It has turned out that simply iterating over all entities in a filter is
the building block of update(). During no iteration should the
developer have to think about what happened in any other iteration. The
entity should be presented to the code within the loop as it was at the
beginning of the iteration in all aspects that are read from it.
Luckily, there is a solution… -ish thing.
Deferring changes¶
When we add or remove components, we do not process those changes while
the system is running. Instead, we simply treat these actions as
requests that we queue up an will process at a later time. A good time
for that is “any time right before a system is updated”, since it cleans
up any dirty state, like the setup of a specific world’s state in the
beginning leaves. Another good time is right after a system, since a
developer may choose to let code outside of Systems interact with
the World’s state, so it’d be helpful to leave a clean state. And
since cleaning up a clean state is close to a null operation, there is
no downside to doing both.
When deferring component updates, it is necessary to define the semantics of getting and setting components well. After some experimentation with “more intelligent” I’ll be returning to “more simple” for v0.2. “More intelligent” mostly just breeds complexity, and is designed for edge cases that may never happen.
So, additions / removals of components are deferred, and only take
effect once world._flush_component_updates() is called, which
happens automatically when (before) a System is being run. Thus you
can let a System process one entity, let that processing cause the
removal of a component of another entity, then let the System go on
to do the same the other way around, even if the system depends on the
presence of the now “removed” component. That way, when processing each
Entity, the system is presented with the state (of component
presence) from when the system is started.
However, if you add components, on one hand, they won’t be added to
any filter immediately, just like they won’t be removed by dropping a
component (since those updates are deferred). But you can still access
them through ComponentType in entity and entity[ComponentType],
as those functions use the sets of both existing and newly added
components. NOTE: Not in v0.2 anymore. Needing to access those is that
unlikely-to-occur edge case.
Do also note that none of this magic holds true for the values of state;
You’re on your own in that regard. If a System processes an
Entity and changes some state, then processes another Entity,
that process will not see the original state, only the current state.
Splitting systems into “This can be done”, “This is being done”, and “Now we’re cleaning up weird states that could have come about” seems to be a workable pattern.
Implementational detail: Optimizing type filtering performance¶
NOTE: I’ve ended up implementing it simpler, but the end result is still
that each System maintains a mapping of {Filter: set(Entity)}.
When an Entity is changed, it will be tested against all
Filters. While this could be optimized further (by testing only
against filters that test for the changed components), it is already a
rather lightweight operation and unlikely to be a game’s bottleneck.
As the number of Entities grows through expanding the game world or getting more players, and the number of Systems grows due to new features in the game, the complexity to filter the entities by a system’s type list increases with O(m n). To prevent that from happening each frame, I propose to use mappings of A = {Entity: [Systems]} * B = {System: [Components]} that is modified when an Component or System is added or removed from or to the World.
- Adding a Component to an Entity
- Each System’s type list is tested against the Entity’s Components. If it matches, the System will from now on process this Component that is on this Entity (adding A and B mappings), and the System’s init_components() will be called for the Component.
- Removing a Component from an Entity.
- For each System from mapping A, we match its type list against the
Entity. If it does not match anymore, we need to
- remove the system from A
- run the System’s destroy_component() on he Component
- remove the component from B
- For each System from mapping A, we match its type list against the
Entity. If it does not match anymore, we need to
- Adding a System to the World
- Its type list is tested against any Entity in the World. For each match, A and B mappings are added, and the System’s init_components() is called with the matching Components.
- Removing a System from the World
- All Components from B are used to determine the set of Entities that they are on, to remove the System from A.
- On each Component from B, the System’s destroy_component() is run
- The System is removed from B.
- Running game logic
- This step now requires merely one lookup per system in B to have the set of Components readily available.
There is an edge case where this approach runs into an issue with overselection. Assume there’s a system that processes the components of two sets of entities, X and Y. It processes Y only if processing X has yielded a certain result. In timesteps where that result does not come about, Y does not need to be processed, thus not be filtered for in the first place. In an “everything happens in RAM” situation, this is not a problem; references to the Y set are available, whether they are needed or not, without any penalty incurred. If the data is stored in a DB or over a network, however, the transfer of data that makes it available to the system, however, should be avoided, since this data transfer is a slow operation.
An upshot of eshewing dynamic querying for Components is that Systems have to be upfront about what Component types they process, leading to a clear and programmatically extractable understanding of System-Component dependencies.
Components referencing each other¶
NOTE: This has been implemented using the “Unique values” approach
described below, with the references referring to Entities. The
confusing use of API probably stems from the original design phase of
WECS.
In a world, there is a thing, and it has the property of being a room:
entity = world.create_entity()
entity.add_property(Room())
In the world, there is another thing, and it’s Bob:
bob = world.create_entity()
Bob has the property of being in a room:
bob.add_component(RoomPresence(room=...))
And at “…” the problem starts.
###Observer pattern
If I just use a reference
RoomPresence(room=entity.get_component(Room))
that’s bad, because there’s no cleanup mechanism if entity gets
removed from the world. We could use the observer pattern to do that.
Now Room has a list of references, observers.
RoomPresence(room=room) adds itself to that list. When
entity.destroy() is called, it destroys its components, calling
Room.destroy(), which calls all the observers. Thing is, now we
experience the problem in reverse. So RoomPresence.destroy() now
must take care to clean up the observer lists of all components that
it is observing. You see how this tends to get complicated?
On the upside, we now have a bus over which we can also send more general events, though this will bring complications of its own. But like spells that affect every affectable entity in the room could be implemented that way.
However, this upside is actually a downside. When we introduce inter-component messaging, we now have components processing data, and have broken the fundamental paradigm of ECS:
- Components are data
- Systems are processing
- System-System interaction should happen by manipulating data
So, what can we do instead?
###Unique values
If we use unique values
room.add_component(Room(uid="Balcony"))
bob.add_component(RoomPresence(room="Balcony"))
then we have I have to make sure that those UIDs are in fact unique. That isn’t too difficult:
room.add_component(Room())
bob.add_component(RoomPresence(room=room.get_component(Room)._uid))
The Room._uid is generated automatically during add_component(),
and then the component is registered with the World. Now when a
System CastSpellOnRoom runs and sees that Bob does indeed cast a
spell on the room that he is in, so it tries
room_uid = RoomPresence(room=room.get_component(Room)._uid
room = world.get_component(_uid)
to do something with the room, but if the room has already been
destroyed, world.get_component() will raise a
KeyError("No such component"). It’s now up to the system how to deal
with that, and how to bring Bob’s RoomPresence component back into a
consistent state.
However, it’s not this system’s job to clean up after
RoomPresences, it is to cast a spell. What it can do, or what should
ideally happen automatically, is that Bob gets marked as needing cleanup
(e.g. bob.add_component(CleanUpRoom)), and that a dedicated system
deals with what consistency means in the game (e.g. just removing the
component, or setting the referenced room to an empty void for the
player to enjoy). This in turn leads to possible race conditions; when
does that transition happen during a frame? On the other hand, since
all systems that can’t work properly anymore due to this inconsistent
situation should deal predictably and fail-safe (mark and proceed with
other entities) with it, this should be of preventable impact.
Implementational detail: Size of GUIDs¶
NOTE: Theoretical for now, there are no GUIDs being used yet.
TL;DR: 64 bit is the right answer
Entities act as nothing more than a label, and are usually implemented as a simple integer as a globally unique identifier (GUID). The question arises: How many of those do we need?
Assume a game of five million concurrent players, and a thousand Entities in the game world per player. Thus we arrive at five billion Entities in the game world. This is just above 2**32 numbers (4.29 billion). 64 bit offers over 18 quintillion IDs, which should be enough for even the largest player base with a staggering amount of per-player content of the game world.
Implementational detail: Systems threading¶
CURRENT STATE: When a system is added, an int is provided.
world.update() will run the task in order of ascending numbers.
One advantage of ECSes seems to be parallelization. Systems can run in parallel as they are independent of each other. I think that that’s Snake Oil, and I won’t buy it that easily.
- There is time. The basic time unit of a game is usually a frame on the client’s side, and a tick on the server side. A system may run as fast and as often as it pleases if all that is does is triggered by state changes on components, and thus effectively does event-driven processing on them. However, if that is not the case for a given system, then it will likely need to run once per frame/tick. Thus, a synchronization point between systems is needed.
- There are cause-and-effect dependencies. Consider input, physics, and rendering. In any given frame, these need to happen in a defined sequence, so that the player is presented with a consistent game state.
- There is no time. Every now and then, a System may need to perform a time-costly operation, like loading a model from disk or, even worse, over the network. This would bring the advantages of parallelity to the forefront, as only this specific system would stall, and all others would keep running and pick up on the results of the operation once it is available. This, however, can only happen if there are no synchronization points between those other systems and the stalling one.
I have no idea how to square these with each other elegantly, though within Panda3D, the task manager can solve this. Long-running systems into separate task chains to run asynchronous, while “every frame” tasks are put into the default task chain.
Note: Component Inheritance Considered Dangerous¶
One basic design feature of an ECS is the separation of data from the code that processes it. One could now get the idea “Excellent, then I can have a class hierarchy of Components, and the Systems will process those Components that subclass their component types.” The perceived upshot here is that as gameplay is iterated on, Components can be enriched with new functionality (implemented in Systems) that add to existing behavior, while old behavior runs on unaltered for those components that are still of the base component’s type. This is unnecessary and potentially dangerous. The alternative is to just add a new component type, and a system that runs on entities where both components are present. This leads to easy management of the system:
- Old items should be individually upgraded, or need to be upgraded for the new rollout? Just add the new component, no upgrade mechanism necessary.
- The feature wasn’t fun after all? Remove the new Components from the game. No need to have a downgrade mechanism for components of the new type. Systems that fall into disuse can be identified automatically.
- You end up with two Systems anyway.
- Having an inheritance tree between component types also leads to a semantic problem: What kinds of components can exist on the same entity? If a base class A is used for feature 1, a child class B for feature 1+2, and C for 1+3, how can an entity partake in 1+2+3? It’d have to have both B and C, which would mean having two instances of the shared set of fields, and an ambiguity concerning which component should be used when another system uses the base type A.
- If you’re doing hierarchical inheritance on the component types, you incur the penalties outlined above. If you’re doing compositional inheritance, you’re just replicating what ECS does anyway when you add a new component type to an entity.
Composing templates for generic entities¶
Note: I’ve implemented Aspects with slightly different API; no
MetaAspects until I actually need them.
To set up entities individually, giving them their components and the starting values of those, is repetitive and inefficient. Even writing a factory function for each type of entity in your game is repetitive, because in all likelihood, some kinds of entities will be very similar to each other.
Thus we need factory functions that create entities from sets of building blocks, and allow for overriding the given default values on a per-entity basis. The question is: How are those building blocks put together?
Two approaches offer themselves:
Archetypes: Just use Python’s inheritance system. Conflicts due to diamond inheritance should be resolved by the usual linearization rules. Frankly I have not thought too deeply about this approach.
- Pros: People who know Python know how this works
- Cons: Didn’t we just get rid of OOP inheritance for reasons?
Aspects: We’ll do EC-like composition again on a higher level.
An aspect is a set of component types (and values diverging from the defaults) and parent aspects. When you create an entity from a set of aspects, all component types get pooled. Unlike to Archetypes, each type must be provided only once. This disallows diamond inheritance and forces a pure tree inheritance.
This can already be checked on aspect creation
It also allows for testing at runtime whether an entity still fulfills an archetype.
This in turn allows for removing and adding aspects at runtime while insuring that aspects lower down in the hierarchy still match. An entity can be given several different instances of an archetype, only one of which can be active at any given time, but can be swapped out for another one.
API draft:
Aspect(aspects_or_components, overrides=None)- Creates an aspect.
- Calling an aspect returns a set of new component instances.
- ‘overrides’ provides default values to use instead of the ones
- on the provided aspects. Calling an aspect with overrides does not invalidate, but possibly override, an aspect’s overrides.
moveset = [WalkingMovement, InertialMovement, BumpingMovement, FallingMovement] walker = Aspect(moveset) slider = Aspect(moveset, {InertialMovement: dict(acceleration=5.0)}) world.create_entity(slider()) # A walker with acceleration of 5.0 world.create_entity(slider({InertialMovement: dict(rotated_inertia=1.0)}))) # Acceleration is still 5.0
add_aspect(entity, components) Just a bit of syntactic sugar, and may perform a check whether component clashes would occur before adding any component.
MetaAspect(list_of_aspects) A MetaAspect is a list of aspects. Components can not be created from a MetaAspect. Its purpose is to serve as a flexible filter when removing aspects from an entity. Instead of of one aspect, it is given a list of them, which is matched in order against the entity. The first one to match is then removed from the entity.
remove_aspect(entity, aspect_or_metaaspect) Returns the set of removed components.
Example for Aspects:
# For readability, default values are omitted, and the
# The minimum that a character can be is a disembodied character...
character = Aspect([Clock, Position, Scene, CharacterController])
# ...until it gets a body.
avatar = Aspect([character, Model, WalkingMovement, Stamina])
spectator = Aspect([character, Model, FloatingMovement])
# A player has a camera with which to see into the world.
first_person = Aspect([FirstPersonCamera])
third_person = Aspect([ThirdPersonCamera])
camera = MetaAspect([first_person, third_person])
# Most characters have logic that controls their actions.
input = Aspect([Input])
ai = Aspect([ConstantCharacterAI])
control = MetaAspect([input, ai])
# To make our lives easier, a high-level abstractions...
# (This is the one case that makes MetaAspects necessary.)
mind = MetaAspect([Aspect([control, camera]), control])
# ...and templates.
player_character = Aspect([avatar, input, first_person])
non_player_character = Aspect([avatar, ai])
# Now let's create some entities!
player_entity = world.create_entity(player_character())
npc_entity = world.create_entity(npc_character())
# What if "minds" that control characters could swap bodies?
def swap_minds(entity_a, entity_b):
mind_a = remove_aspect(entity_a, mind)
mind_b = remove_aspect(entity_b, mind)
add_aspect(entity_a, mind_b)
add_aspect(entity_b, mind_a)
swap_minds(player_entity, npc_entity) # This will get confusing...
swap_minds(player_entity, npc_entity) # Much better.
# Now let's force a 3rd Person camera on the player.
remove_aspect(npc_entity, camera)
add_aspect(npc_entity, third_person())
- Pros:
- Looks like it might work; Further research is warranted.
- Cons:
- Does this actually reduce complexity?
- What kind of type-theoretical implications does it have?