Configuring Logging in Nakamura/Sling, at runtime

12 03 2010

One of the most frustrating things about trying to find out what is broken in a running application is that the information in the logs is not informative enough, and when it is, there is too much of it to be useful. This is often because the logging configuration has to be set before the server starts, which leaves you with two options: configure logging at debug level just in case there is a problem, or restart the server when a problem appears. Neither is really practical in production. Debug on everything kills the server, and if you can predict where to put debug on (i.e. the future), you are definitely in the wrong profession; you should be a banker.

Fortunately Sling/Felix have a nifty feature, exposed in Sakai Nakamura, that allows a sysadmin to reconfigure logging at runtime. If you go to the Configuration console plugin at http://localhost:8080/system/console/configMgr and find the configuration factory called “org.apache.sling.commons.log.LogManager.factory.config”, you can create a new configuration. In this configuration you can specify a level, a target file, and best of all, a package sub-path. For instance, I just needed to look at the way ACLs were being resolved in Nakamura, so I created a configuration logging to logs/acl.log at level Debug with the package name org.apache.jackrabbit.core.security.authorization.acl. Now all the ACL activity happening in the class DynamicACLProvider appears in the log file logs/acl.log. When I am done, I can delete the configuration. Neat. No server restart, and highly targeted logging with no need to go round to each node in a cluster and configure each one differently. I could probably make it all stream out to syslogd on demand and collect the results from a cluster centrally. So if you are sinking under mountains of log files full of useless debug statements that you can't get rid of until the next scheduled downtime, there is another way.
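As a rough sketch, the configuration you create through the console corresponds to a set of properties on the factory PID, something like the fragment below. The property names come from the Sling Commons Log bundle; check the exact names against the version you are running:

```properties
# Factory PID: org.apache.sling.commons.log.LogManager.factory.config
# Log level for the loggers named below
org.apache.sling.commons.log.level = debug
# Target file, relative to the Sling home directory
org.apache.sling.commons.log.file = logs/acl.log
# Comma-separated logger (package) names this configuration applies to
org.apache.sling.commons.log.names = org.apache.jackrabbit.core.security.authorization.acl
```

Because these are just OSGi Configuration Admin properties, the same configuration could also be pushed programmatically or via a provisioning tool, rather than clicked through the web console.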





ACL extensions just got easier

9 03 2010

Extending the types of ACLs in Jackrabbit 1.x was hard. Even after 1.5, where a reasonable ACL implementation appeared, much of the code that managed this area was buried deep within inner classes inside the DefaultAccessManager and related classes of Jackrabbit 1.5/1.6. In Jackrabbit 2, as part of the improvements to the UserManager (I guess), it has become much easier to make extensions. I had a patch that modified the core classes ACLEditor, ACLProvider and ACLTemplate in JR1.5, allowing the injection of a class that would control the way in which Access Control Entries were collected for a node. This allowed things like dynamic principal membership to be implemented (e.g. membership of a group that is determined by a rule set, and not a membership record). The upside was that this was possible in 1.5; the downside was that it was hard, and required re-implementing some of the core classes in the same package space to get round private and protected members, and even code blocks keyed on class names. So the patch was huge and hard to maintain.

In JR2, inside the acl.ACLProvider there are protected methods designed to be overridden. You still have to do this inside the same package as the native JR2 classes, in order to instantiate private internal classes, but at least extending the default ACL resolution in Jackrabbit is now a case of extending a few classes. Put all of this inside an OSGi environment and it becomes possible to re-bundle an embedded Jackrabbit server with custom extensions on the core, with no need for a massive patch set… another reason to upgrade to JR2.





Jackrabbit 2 UserManager Scaling

7 03 2010

In Jackrabbit 1.x the UserManager (at least after 1.5) stored users in a security workspace within the repository. This was great, and works well up to about 1000 users. However, it uses a structure where users are nested under the user that created them. So if “admin” creates all your users, there will be 1000 child nodes under the admin user node. Sadly, additions become slower the more child nodes there are, and pretty soon a number of things happen. First, the number of child nodes causes updates to become so slow that it is hard to add users (>5K users). This can be addressed by sharding the child node path, avoiding large numbers of child nodes. Secondly (and this is harder to solve), the query used to find a user, or to check that the user doesn't already exist somewhere, becomes progressively more expensive, so that by about 25K users the create-user operation has slowed by an order of magnitude. That may not sound too bad, since it's not often that you want to create a user; however, retrieval of a user becomes slower as well, since you can't calculate the location of the user node from the userID, and since this needs to be done on almost every request, it slows everything.

Fortunately it looks like all of this has been fixed in Jackrabbit 2. The UserManager no longer stores users in a nested form, it has an auto-scaling sharding mechanism, and the search query produces far more direct results. Some of this is not enabled by default, but here is the config that makes it work.

In repository.xml

<!DOCTYPE Repository 
       PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
"http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
<Repository>
     <FileSystem>
        <param name="path" value="${rep.home}/repository"/>
     </FileSystem>
     <Security appName="Jackrabbit">
        <SecurityManager workspaceName="security">
           <UserManager>
             <param name="defaultDepth" value="4" />
             <param name="autoExpandTree" value="true" />
             <param name="autoExpandSize" value="500" />
           </UserManager>
        </SecurityManager>
        <AccessManager>
        </AccessManager>
...

In tests in Sling I see no slowdown in user creation up to 10K nodes, with HTTP request times ranging from 12 to 25 ms. Adding those 10K nodes from a single request thread takes 6m4s, a rate of about 27/s. The same test performed with Sling on Jackrabbit 1.6 averaged 6/s over 10K nodes. Concurrent add operations tend to speed this up further as the HTTP cost is factored out. My laptop runs out of steam at about 70/s.

User node paths that are generated are of the form

.../t/te/testuser201003070946420-300

With auto-scaling in operation they become

.../t/te/tes/test/testu/testus/testuse/testuser/testuser2/testuser20/testuser201/testuser2010/testuser20100/testuser201003071035370-4998

However, this is an extreme case where all the user IDs are almost identical.
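The prefix-based layout above is easy to sketch in a few lines of Java. This is an illustrative reconstruction of the idea, not Jackrabbit's actual implementation; the ShardedUserPath class, its shardedPath method and the depth parameter are invented for the example:

```java
public class ShardedUserPath {
    // Build the intermediate path for a user node by taking prefixes of the
    // userID of increasing length, one per level, so sibling counts stay
    // bounded at every level of the tree.
    // Illustrative sketch only -- not Jackrabbit's exact algorithm.
    public static String shardedPath(String userId, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= depth && i < userId.length(); i++) {
            sb.append(userId, 0, i).append('/');
        }
        return sb.append(userId).toString();
    }

    public static void main(String[] args) {
        // prints t/te/testuser201003070946420-300
        System.out.println(shardedPath("testuser201003070946420-300", 2));
        // prints t/te/tes/test/testuser201003070946420-300
        System.out.println(shardedPath("testuser201003070946420-300", 4));
    }
}
```

Because the path is a pure function of the userID, a lookup can jump straight to the node instead of running a query, which is why retrieval no longer degrades as the user count grows.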
Conclusion

If you have an app with a lot of users, use Jackrabbit 2 rather than Jackrabbit 1.6: it scales better.