HowTo: Quickly resolve what an Sling/OSGi bundle needs.

30 10 2012

Resolving dependencies for an OSGi bundle can be hard at times, especially if working with legacy code. The sure-fire way of finding all the dependencies is to spin the bundle up in an OSGi container, but that requires building the bundle and deploying it. Here is a quick way of doing it with maven, that may at first sound odd.

If your building your bundle with maven, you will be using the BND tool via the maven-bundle-plugin. This analyses all the byte code that is going into the bundle to work out what will cross over the class-loader boundary. BND via the maven-bundle-plugin has a default import rule of ‘*’. ie import everything. If you are trying to control which dependencies are embedded, which are ignored and which are imported, this can be a hinderance. Strange though it sounds, if you remove it life will be easier. BND will immediately report everything that it needs to import that can’t be imported. It will refuse to build which is a lot faster than generating a build that won’t deploy. The way BND reports is also useful. It tells you exactly what it can’t find and this gives you a list of packages to import, ignore or embed. Once you think you have your list of package imports down to a set that you expect to come from other bundles in your container, turn the ‘*’ import back on and away you go.

In maven that means editing the pom.xml eg:

         <!-- add ignore packages before the * as required eg. !org.testng.annotations, -->
         * <!-- comment the * out to cause BND to report everything its not been told to import -->
         <!-- add packages that you want to appear as raw classes in the jar as private packages Note, they dont have to source code in the project, they can be anywhere on the classpath for the project, but be careful about resources eg* -->

           <!-- embed dependencies (by artifact ID, including transitives if Embed-Transitive is true) that you dont want exposed to OSGi -->

The OSGi purists will tell us that it’s heresy to embed anything but sometimes with legacy systems it’s just too painful to deal with the classloader issues.

There is probably a better way of doing this, if so, do tell.


OSGi and SPI

13 12 2011

OSGi provides a nice simple model to build components in and the classloader policies enable reasonably sophisticated isolation between packages and versions that make it possible to consider multiple versions of an API, and implementations of those APIs within a single container. Where OSGi starts to become unstuck is for SPI or Service Provider Interfaces. It’s not so much the SPI that’s a problem, rather the implementation. SPI’s normally allow a deployer to replace the internal implementation of some feature of a service. In Shindig there is a SPI for the various Social services that allow deployers to take Shindig’s implementation of OpenSocial and graft that implementation onto their existing Social graph. In other places the SPI might cover a lower level concept. Something as simple as storage. In almost all cases the SPI implementation needs some sort of access to the internals of the service that it is supporting, and that’s where the problem starts. I most of the models I have seen, OSGi bundles Export packages that represent the APIs they provide. Those APIs provide a communications conduit to the internal implementation of the services that the API describes without exposing the API. That allows the developer of the API to stabilise the API whilst allowing the implementation to evolve. The OSGi classloader policy gives that developer some certainty that well-behaved clients (ie the ones that don’t circumvent the OSGi classloader policies) wont be binding to the internals of the implementation.

SPIs, by contrast are part of the internal implementation. Exposing an SPI as an export from a bundle is one approach, however it would allow any client to bind to the internal workings of the Service implementation, exposed as an API and that would probably be a mistake. Normal, well-behaved clients, could easily become clients of the SPI. That places additional, unwanted burdens on the SPI interface as it can no longer be fully trusted by the consumer of the SPI or its implementation.

A workable solution appears to be to use OSGi Fragment bundles that bind to a Fragment Host, the Service implementation bundle containing the SPI to be implemented. Fragment bundles different to normal bundles in nature. Its probable best to think of them as a jar that gets added to the classpath of bundle identified as the Fragment Host on activation, so that the Fragment bundles contents become available to the Fragment Hosts classloader. Naturally there are some rules that need to be observed.

Unlike an OSGi bundle a Fragment bundle can’t make any changes to imports and exports of the Fragment Host classloader. In fact if the manifest of the fragment contains any Import-Package, or Export-Package statements, the Fragment will not be bound to the Fragment Host. The Fragment can’t perform activation and the fragment can’t provide classes in  a package that already exists in the Fragment Host bundle, although it appears that a Fragment host can provide unique resources in the same package location. This combination of restrictions cuts off almost all the possible routes for extension, converting the OSGi bundle from something that can be activated, into a simple jar on the classloaders search path.

There is one loophole that does appear to work. If the Fragment Host bundle specifies a Service-Component manifest entry that specifies a service component xml file that is not in the Fragment Host bundle, then that file can be provided by the Fragment bundle. If you are using the BND (or Felix Bundle plugin) tool to specify the Service-Component header, either explicitly or explicitly you will find that your route is blocked. This tool checks that any file specified exists. If the file does not exist when the bundle is being built, BND refuses to generate the manifest. There may be some logic somewhere in that decision, but I havent found an official BND way of overriding the behaviour. The solution is to ask the BND tool to put an empty Service-Component manifest header in, then merge the manifest produced with some supplied headers when the jar is constructed. This allow you to build the bundle leveraging the analysis tools within BND and have a Service-Component header that contains non-existent server component xml files.

On startup, if there is no Fragment bundle adding the extra service component xml file to the Fragment Host classloader, then an error is logged and loading continues. If the Fragment bundle provides the extra service component xml file, then its loaded by the standard Declarative Service Manager that comes with OSGi. In that xml file, the implementor of the SPI can specify the internal services that implement the SPI, and allow the services inside the Fragment Host to satisfy their references from those components. This way, a relatively simple OSGi Fragment bundle can be used to provide an SPI implementation that has access to the full Fragment Host bundle internal packages, avoiding exposing those SPI interfaces to all bundles.

In SparseMap, I am using this mechanism to provide storage drivers for several RDBMs’s via JDBC based drivers and a handful of Column DBs (Cassandra, HBase, MongoDB). The JDBC based drivers imply contain SQL and DDL configuration as well as a simple declarative service and the relevant JDBC driver jar. This is because the JDBC driver implementation is part of the Fragment Host bundle, where it lies inactive. The ColumnDB Fragment bundles all contain the relevant implementation and client libraries to make the driver work. SparseMap was beginning to be a dumping ground for every dependency under the sun. Formalising a storage SPI and extracting implementations into SPI Fragment bundles has made SpraseMap storage independently extensible without having to expose the SPI to all bundles.

This will be in the 1.4 release of SparseMap due in a few days. For those using SparseMap, they will have to ensure that the SPI Fragment bundle is present in the OSGi container when the SparseMap Fragment Host bundle becomes active. If its not present, the repository in SparseMap will fail to start and an error will be logged indicating that OSGI-INF/serviceComponent.xml is missing.


25 11 2011

In spare moments between real work, I’ve been experimenting with a light weight content server for user generated content. In short, that means content in a hierarchical tree that is shallow and very wide. It doesn’t preclude deep narrow trees, but wide and shallow is what it does best. Here are some of the things I wanted to do.

I wanted to support the same type of RESTfull interface as seen in Sakai OAE’s Nakamura and standards like Atom. By that I mean where the URL points to a resource, and actions expressed by the http methods, http protocol and markers in the URL modify what a RESTfull request does. In short, along the lines of the arguments in which probably drove the thinking behind Sling on which Nakamura is based. I mention Atom, simply because when you read the standard  it talks about the payload of a response, but makes no mention of how the URL should be structured to get that payload. It reinforces the earlier desire.

I wanted the server to start as quickly as possible, and use as little memory  as possible. Ideally < 10s and < 20MB. Java applications have got a bad name for bloat but there is no reason they have to be huge to serve load. Why so small (in Java terms)? Why not, contrary to what most apps appear to do, memory is not there to waste?

I wanted the core server to be support some standard protocols. eg WebDav, but I wanted to make it easy to extend. JAX-RS (RestEasy) inside OSGi (Minimal Sling bootstrap + Apache Felix)

I wanted the request processing to be efficient. Stream all requests (commons-upload 1.2.1 with streaming, no writing to intermediate file or byte[] all of which involve high GC traffic and slow processing), all things only processed once and available via an Adaptable pattern, a concept strong in Sling. And requests handled by response objects, not servlets. Why ? So the response state can be thread unsafe, so a request can be suspended in memory and unbound from the thread. And the resolution binding requests to resources to responses to be handled entirely in memory by pointer avoiding iteration. Ok so the lookup of a resource might go through a cache, but the resolution through to resource is an in memory pointer operation.

Where content is static, I wanted to keep it static. OS’s have file systems that are efficient at storing files, efficient at loading those file from disk and eliminating disk access completely, so if the bulk of the static files that my application needs really are static, why not use the filesystem. Many applications seem to confuse statically deterministic and dynamic. If the all possibilities of can be computed at build time, and the resources requires to create and serve are not excessive, then the content is static. Whats excessive ? A production build that takes 15 minutes to process all possibilities once a day is better than continually wasting heat and power doing it all the time. I might be a bit more extreem in that view accepting that filling a TB disk with compiled state is better than continually rebuilding that state incrementally in user facing production requests. If a deployer wants to do something special (SAN, NAS, something cloud like) with that filesystem there are plenty of options. All of Httpd/Tomcat/Jetty are capable of serving static files in high 1000s of requests per second concurrent, so why not use that ability. Browser based apps need every bit of speed they can get for static data.

The downside of all of this minimalism is a server that doesn’t have lots of ways of doing the same thing. Unlike Nakamura, you can’t write JSPs or JRuby servlets. It barely uses the OSGi Event system and has none of the sophistication of Jackrabbit. The core container is Apache Felix with the the Felix HttpSerivice running a minimalist Jetty. The Content System is Sparse Content Map, the search component is Solr as an OSGi bundle. Webdav is provided by Milton and Jax-RS by RestEasy. Cacheing is provided by EhCache. It starts in 8Mb in 12s, and after load drops back to about 10MB.

Additional RESTfull services are creating in one of three ways.

  1. Registering a servlet with the Felix Http Service (whiteboard), which binds to a URL, breaking the desire that nothing should bind to fixed URLs.
  2. Creating a component that provides a marker service, picked up by the OSGi extension to RestEasy that registers that service as a JAX-RS bean.
  3. Creating a factory service that emits JAX-RS annotated classes that act as response objects. The factory is annotated with the type of requests it can deal with, and the response objects tell JAX-RS what they can do with the request. The annotations are discovered when the factory is registered with OSGi, and those annotations are compiled into a one step memory lookup. (single concurrent hashmap get)

Methods 1 and 2 have complete control over the protocol and are wide open to abuse, method 3 follows a processing pattern closely related to Sling.

Integration testing

Well unit testing is obvious, we do it and we try and get 100% coverage of every use case that matters. In fact, if you work on a time an materials basis for anyone, you should read your contract carefully to work out if you have to fix mistakes at your own expense. If you do, then you will probably start writing more tests to prove your client that what you did works. Its no surprise, in other branches of Engineering, that acceptance testing is part of many contracts. I dont think an airline would take delivery of a new plane without starting the engines, or a shipping line take delivery of a super tanker without checking it floats. I am bemused that software engineers often get away with saying “its done”, when clearly its not. Sure we all make mistakes, but delivering code without test coverage is like handing over a ship that sinks.

Integration testing is less obvious. In Sling there is a set of integration tests that test just about everything against a running server. Its part of the standard build but lives in its one project. Its absolutely solid and ensures that nothing builds that is broken, but as an average mortal, I found it scary since when thing did break I had to work hard to find out why. Thats why in Nakamura we wrote all integration tests in scripts. Initially bash and perl then later Ruby. With hindsight this was a huge mistake. First, you had to configure your machine to run Ruby and all the extensions needed. Not too hard on Linux, but for a time, those on OSX would wait forever for ports to finish building some base library. Dependencies gone mad. Fine if you were one of the few who created the system and pulled everything in over many months, but hell for the newcomer. Mostly, the newcomer walks away, or tweets something that everyone ignores.

The devs also get off the hook. New ones dont know where to write the tests, or have to learn Ruby (replace Ruby with whatever the script is). Old devs can sweep them under the carpet and when it gets to release time ignore the fact that 10% of the tests are still broken… because the didn’t have time to maintain them 3 fridays ago at 18:45, just before they went to a party. The party where they zapped 1% of their brain cells including the ones that were remembering what they should have done at 18:49. Still they had a good time, the evening raised their morale, started a great weekend ready for the next week and besides, they had no intention of boarding the ship.

So the integration testing here is done as java unit tests. If this was a c++ project they would be c++ unit tests. They are in the bundle where where the code they test is. They are run by “mvn -Pintegration test”. Even the command says what is going to happen. It starts a full instance of the server (now 12s becomes an age), or uses one thats already running and runs the tests.  If your in eclipse, they can be run in eclipse, just as another test might, and being OSGi, the new code in the bundle can be redeployed to the running OSGi container. That way the dev creating the bundle can put their tests in their bundle and do integration testing with the same tools they did unit testing. No excuse. “find . -type d  -name integration | grep src/test  ” finds  all integration tests, and by omission ships that sink.

Linear Classloaders and OSGi

4 09 2008

OSGi does not remove all classloader problems as can be seen from and where the Peter Kriens notes that 

“Hibernate manipulates the classpath, and programs like that usually do not work well together with OSGi based systems. The reason is that in many systems the class visibility between modules is more or less unrestricted. In OSGi frameworks, the classpath is well defined and restricted. This gives us a lot of good features but it also gives us pain when we want to use a library that has aspirations to become a classloader when it grows up.”

It turns out that some JPA solutions are OSGi friendly, others are not. It all depends on what is done to load the persistence.xml and the related classes, and then the proxy classes cause further classloader problems.
I guess, since the author is Peter is OSGi Director of Technology, he knows what he is talking about.
Apparently EclipseLink was written to be OSGi friendly, and non-proxy, classloader clever ORM solutions also work, Cayenne falls into this group, and reportedly works OK in side OSGi, although I don’t know if that’s v2 or v3