Ars brevis
  • Easily define json convertible enum types in python.


    When working with real-world data, we often encounter values that are encoded as strings even though they are actually constrained to a certain set of valid states. In these situations, an enumerated type has the advantage of directly constraining the possible states of our value to the valid ones.

    While this works great conceptually, enums can become awkward as soon as we work on more complex programs with data I/O needs, because we have to convert them each time we interact with a storage medium.

    If your needs are simple, storing data as JSON seems straightforward, yet the following Python script will fail:

    # example.py
    import json
    import enum
    
    
    class Color(enum.Enum):
        RED = "red"
        GREEN = "green"
        YELLOW = "yellow"
    
    
    raw_a = "red"
    a = Color(raw_a)
    
    
    with open("test.json", "w") as f:
        json.dump(a, f)

    $ python example.py
    Traceback (most recent call last):
      File "example.py", line 12, in <module>
        json.dump(a, f)
      File "/usr/local/Cellar/python@2/2.7.17_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 189, in dump
        for chunk in iterable:
      File "/usr/local/Cellar/python@2/2.7.17_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 442, in _iterencode
        o = _default(o)
      File "/usr/local/Cellar/python@2/2.7.17_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 184, in default
        raise TypeError(repr(o) + " is not JSON serializable")
    TypeError: <Color.RED: 1> is not JSON serializable

    Our custom enum type unfortunately is not JSON serializable! Now this is understandable, as the json module cannot know how we would like to encode the given object as a JSON representation.

    Now we could either define a custom JSON encoder, which defines the conversion we need, or alternatively we could use multiple inheritance.

    This multiple inheritance approach was first described by Justin Carter on Stack Overflow. In this post I attempt to provide a more in-depth explanation of why it works and whether this approach has limitations compared to doing it properly by defining an encoder.
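
    For comparison, the encoder route could look roughly like the following sketch. EnumEncoder is a hypothetical name, not something the json module ships, and the sketch assumes that every enum member should be stored as its value.

```python
import enum
import json


class Color(enum.Enum):
    RED = "red"
    GREEN = "green"
    YELLOW = "yellow"


class EnumEncoder(json.JSONEncoder):
    """Hypothetical encoder that stores any enum member as its value."""

    def default(self, o):
        # default() is only called for objects json cannot encode itself.
        if isinstance(o, enum.Enum):
            return o.value
        # Anything else falls through to the standard TypeError.
        return super().default(o)


encoded = json.dumps({"color": Color.RED}, cls=EnumEncoder)
print(encoded)
```

    The downside is that every call to json.dump or json.dumps has to pass cls=EnumEncoder, which is easy to forget once the calls are spread across a larger program.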

    Multiple inheritance fix

    Instead of just inheriting from enum.Enum in our Color enum, we will first inherit from str.

    # example_mi.py
    import json
    import enum
    
    
    class Color(str, enum.Enum):
        RED = "red"
        GREEN = "green"
        YELLOW = "yellow"
    
    
    raw_a = "red"
    a = Color(raw_a)
    
    
    with open("test.json", "w") as f:
        json.dump(a, f)

    $ python3 example_mi.py
    $ cat test.json
    "red"

    Success! This seems to work properly, as json.dump now stores our enum as its string value.

    But if anything, this solution should surprise you. Is it safe to use multiple-inheritance here? Could this usage break something else we are doing?

    First, you should notice that writing the bases in the reversed order, enum.Enum, str instead of str, enum.Enum, will produce an error:

    $ python3 example_mi_wmro.py
    Traceback (most recent call last):
      File "/Users/max/Code/arsbrevis/code-samples/python_enum_json/example_mi_wmro.py", line 5, in <module>
        class Color(enum.Enum, str):
      File "/usr/local/Cellar/python@3.9/3.9.0_3/Frameworks/Python.framework/Versions/3.9/lib/python3.9/enum.py", line 131, in __prepare__
        member_type, first_enum = metacls._get_mixins_(cls, bases)
      File "/usr/local/Cellar/python@3.9/3.9.0_3/Frameworks/Python.framework/Versions/3.9/lib/python3.9/enum.py", line 521, in _get_mixins_
        raise TypeError("new enumerations should be created as "
    TypeError: new enumerations should be created as `EnumName([mixin_type, ...] [data_type,] enum_type)`

    But why do we get an error here? This has everything to do with how multiple inheritance works in Python and how the Python Enum class works in particular.

    When we use multiple inheritance in Python, attributes are searched left to right. For example, with str, enum.Enum any attribute that exists in str is used and the search never reaches enum.Enum. So why does the converse not work?

    Reading the Python 3 documentation on enums provides us with the answer: the Enum class contains some special behavior for multiple inheritance, which is implemented via a metaclass. This metaclass expects the enum class to come last in our list of base classes.

    When we use str, enum.Enum we are actually creating a derived enum whose members also fully work as instances of the additional data type. This is an expected usage of the Enum class and as such fully within the scope of the library. In fact, the documentation itself gives str, enum.Enum as a possible use case for derived enumerations.
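
    A quick way to convince ourselves is to poke at such a derived enum directly. The following sketch uses a two-member version of the Color enum from above and only standard library calls; the variable names are purely illustrative.

```python
import enum


class Color(str, enum.Enum):
    RED = "red"
    GREEN = "green"


member = Color.RED

# A member is both a string and an enum member at the same time.
is_string = isinstance(member, str)
is_enum = isinstance(member, enum.Enum)

# Comparison and string methods come from str.
equals_plain = member == "red"
upper = member.upper()

# str is listed before enum.Enum in the method resolution order.
mro = Color.__mro__
str_before_enum = mro.index(str) < mro.index(enum.Enum)

# Lookup by value still works as for any other enum.
looked_up = Color("green")

print(is_string, is_enum, equals_plain, upper, looked_up is Color.GREEN)
```

    Values outside the defined set are still rejected: Color("blue") raises a ValueError, so the constraint we wanted from the enum is kept.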

    json encoding

    Now how does this mesh with the JSON encoding itself? The JSON encoder simply does an isinstance check against the str class, so our enum member is handled as a string. Because each member of the derived enum really is a str instance holding its value, the encoder writes out that underlying string rather than the enum's repr, and the encoding works as we would expect.

    This can be seen in the pure-Python implementation of the encoder.
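
    To see the effect end to end, here is a small round-trip sketch. The payload dictionary is made up for illustration. Note that decoding gives back plain strings, so turning them into Color members again is an explicit step we have to take ourselves.

```python
import enum
import json


class Color(str, enum.Enum):
    RED = "red"
    GREEN = "green"
    YELLOW = "yellow"


# Hypothetical payload mixing enum members into ordinary containers.
payload = {"favorite": Color.RED, "others": [Color.GREEN, Color.YELLOW]}

# Members pass the encoder's str check and are written as their values.
encoded = json.dumps(payload)

# Decoding knows nothing about Color and returns plain strings.
decoded = json.loads(encoded)

# Converting back is explicit and validates the value at the same time.
restored = Color(decoded["favorite"])

print(encoded)
print(type(decoded["favorite"]).__name__, restored is Color.RED)
```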

  • Setup homebrew llvm/clang for compiling C++ on Mac OSX Big Sur


    If you install llvm via homebrew and use it to compile a C++ program without any additional configuration, it will likely fail with the following error:

    /usr/local/opt/llvm/bin/../include/c++/v1/wchar.h:119:15: fatal error: 'wchar.h' file not found
    #include_next <wchar.h>
                  ^~~~~~~~~
    1 error generated.

    This issue is mainly caused by the removal of the /usr/include directory in MacOSX 10.15 (Catalina).

    There are a couple of Stack Overflow questions and answers on this issue. Unfortunately, most of them are rather incomplete, do not explain how to fix C++ compilation, and rely on sysroot-based configuration. That works, but requires passing additional arguments to the clang++/g++ command.

    Libraries and Linking in C/C++

    Any programming language will require working with external libraries at some point. In interpreted languages such as Python or JavaScript, these might just be text files similar to the code you write yourself, whereas compiled languages often use compiled versions of libraries. For C/C++ code, a library is split into the library binary itself and a header file containing the declarations for the library.

    In order for us to use a library, the compiler needs to know both the location of the header for the given library and the location of the actual library binary. On conventional UNIX-like systems, system headers are located in /usr/include and libraries are found in /usr/lib. In a standard setup, these paths are searched automatically when code is compiled without additional arguments.

    Compilation and Toolchains in MacOSX

    This conventional way of storing libraries and header files worked well as long as one could assume that the architecture of the target system matched that of the local development system, which was mostly the case when working on x86-based architectures. Once mobile development and development for multiple architectures become a concern, we need a way to store versions of a library for several architectures. As developers, we might also want to compile against different versions of a target system, which may require slightly different versions of the target libraries.

    On Mac, the compilation toolchain is packaged inside Xcode, which contains the entire set of libraries for each platform and for different versions of the target platform. Before MacOSX 10.15, running xcode-select --install would install a set of headers and libraries matching the local system into the /usr path, allowing us to use the traditional convention once again.

    Fixing third-party compilers in Catalina

    In order to make our compiler work correctly on Catalina, we need to add back these two library sources:

    • /usr/local/include and /usr/local/lib
    • /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk with /usr/include and /usr/lib

    This will include libraries installed by homebrew and system libraries needed for Mac OS.

    For example, we could store these in environment variables, so that we can use them without adding command line arguments each time or relying on a configured build tool.

    CPATH is the environment variable used by LLVM for specifying header directories. LIBRARY_PATH is, strictly speaking, not documented by LLVM, but was added for drop-in compatibility with gcc. Using it should therefore pose no problem as long as gcc keeps using this variable.

    export CPATH="/usr/local/include:/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include"
    export LIBRARY_PATH="/usr/local/lib:/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib"
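
    Both exports follow the same pattern: each search root contributes an include and a lib directory, and the entries are joined with colons. The following sketch, in Python to match the other post's examples, only illustrates that composition. build_search_paths is a hypothetical helper, and the defaults assume homebrew lives under /usr/local and Xcode in its default location, as in the exports above.

```python
# Default Xcode SDK location, copied from the exports above.
SDK_ROOT = (
    "/Applications/Xcode.app/Contents/Developer/Platforms/"
    "MacOSX.platform/Developer/SDKs/MacOSX.sdk"
)


def build_search_paths(homebrew_prefix="/usr/local", sdk_root=SDK_ROOT):
    """Hypothetical helper: compose CPATH and LIBRARY_PATH values."""
    # Homebrew entries come first, followed by the SDK's /usr tree.
    roots = [homebrew_prefix, sdk_root + "/usr"]
    return {
        "CPATH": ":".join(root + "/include" for root in roots),
        "LIBRARY_PATH": ":".join(root + "/lib" for root in roots),
    }


paths = build_search_paths()
for name, value in paths.items():
    # Print lines in the same form as the exports above.
    print(f'export {name}="{value}"')
```

    If Xcode or homebrew are installed somewhere else, only the two arguments change while the structure of the variables stays the same.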