The Virtual Tarball - Second Draft

by Jon Davis 22. November 2007 17:50

In prototyping yesterday's blogged idea with .vtb / .mrr files, I've run into some design flaws with the proposed "schema". The main problem among them is that directory structures are typically described not as flat lists but as nested <ul> trees. A local file name should not be described as

<li>dir</li>
<li>dir/subdir</li>
<li>dir/subdir/file.ext</li>

.. but rather as..

<ul class="mrrdir">
  <li>dir
     <ul>
       <li class="mrrdir">subdir
       <ul>
          <li class="mrrfile">file</li>
       </ul></li>
     </ul>
   </li>
 </ul>

This makes more sense because, when rendered in HTML, it is more legible and more maintainable in the DOM.

  • dir
    • subdir
      • file.ext

Imagine if this were "fuzzy" and not strict. If the filename could be "subsubdir/file.ext", or worse, "C:/windows/system32/file.ext", you would run into all sorts of problems trying to target the download destination path. Directory separators are therefore completely disallowed in the text value of the file's <li> entity.

This changes the programming a bit on the Windows app side, in both easier and more difficult ways. It becomes easier to manage the directories, but now the files' download names have to be managed within the directories virtually. Note that by "difficult" I mean a few extra minutes, not a few extra hours; on the other hand, thinking this through, I've already lost a few hours and decided to start over in my code while it's still a brand new and barely written prototype codebase.

Meanwhile, an href value must always be resolved against the base URI of the entire document, not the base URI of the listed directory.

Here's a proposed valid sample .mrr doc, where the base URI is: http://cachefile.net/  

<ul class="mrr">
 <li class="mrrdir">
  <a href="scripts">scripts</a>
  <ul class="mrrdir">
   <li class="mrrdir">
    <a href="scripts/jquery">jquery</a>
    <ul class="mrrdir">
     <li class="mrrdir">
      <a href="scripts/jquery/1.2.1">1.2.1</a>
      <ul class="mrrdir">
       <li class="mrrfile">
        <a href="scripts/jquery/1.2.1/jquery-1.2.1.js">jquery-1.2.1.js</a>
       </li>
       <li class="mrrfile">
        <a href="scripts/jquery/1.2.1/jquery-1.2.1.min.js">jquery-1.2.1.js</a>
       </li>
       <li class="mrrfile">
        <a href="scripts/jquery/1.2.1/jquery-1.2.1.pack.js">jquery-1.2.1.pack.js</a>
       </li>
      </ul>
     </li>
    </ul>
   </li>
  </ul>
 </li> 
</ul>

Rendered in plain HTML:

  • scripts
    • jquery
      • 1.2.1
        • jquery-1.2.1.js
        • jquery-1.2.1.min.js
        • jquery-1.2.1.pack.js
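
In the meantime, here's a rough sketch (not the actual Mrrki code; the class name, file name, and XPath choices are just placeholders) of how the processor might walk a nested .mrr tree like the one above: resolve each href against the document-wide base URI, and build the local save path from the nested <li>/<a> text so that directory separators never appear in a file name.

// Sketch only: assumes the base URI is known out-of-band (http://cachefile.net/
// here) and that the .mrr document parses as well-formed XML.
using System;
using System.IO;
using System.Net;
using System.Xml;

class MrrTreeWalker // hypothetical name, not the real Mrrki type
{
    static void Main()
    {
        Uri baseUri = new Uri("http://cachefile.net/");
        XmlDocument doc = new XmlDocument();
        doc.Load("sample.mrr"); // placeholder file name

        foreach (XmlElement ul in doc.SelectNodes("//ul[@class='mrr']"))
            WalkList(ul, baseUri, Environment.CurrentDirectory);
    }

    static void WalkList(XmlElement ul, Uri baseUri, string localDir)
    {
        foreach (XmlElement li in ul.SelectNodes("li"))
        {
            string cls = li.GetAttribute("class");
            XmlElement a = (XmlElement)li.SelectSingleNode("a");

            // The <a> text (or the <li>'s own text node, for a bare directory
            // entry) is the local name; directory separators are disallowed.
            string name;
            if (a != null)
                name = a.InnerText.Trim();
            else
            {
                XmlNode text = li.SelectSingleNode("text()");
                name = text == null ? "" : text.Value.Trim();
            }

            if (cls == "mrrdir")
            {
                string subDir = Path.Combine(localDir, name);
                Directory.CreateDirectory(subDir);
                foreach (XmlElement childUl in li.SelectNodes("ul"))
                    WalkList(childUl, baseUri, subDir); // recurse into the nested <ul>
            }
            else if (cls == "mrrfile" && a != null)
            {
                // hrefs are always resolved against the document base URI,
                // never against the enclosing directory.
                Uri source = new Uri(baseUri, a.GetAttribute("href"));
                using (WebClient client = new WebClient())
                    client.DownloadFile(source.AbsoluteUri, Path.Combine(localDir, name));
            }
        }
    }
}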

I'll update this post with a revised Windows app (C#) prototype soon.

The Virtual Tarball

by Jon Davis 21. November 2007 16:11

AFAIK, no one has done this, at least not in this specific way. I have a need for it, and I can see it being used everywhere. So I'm proposing it, and I'm going to implement it.

My idea: The virtual tarball. (Or something?) A file extension of something like .vtb, or .mrr (mirror file). Inside, it looks like it's just an XML file with XHTML-renderable hyperlinks, but the file type is used by an executable that pulls the files down into the specified directory with the <a> tags' text as the save-to file name.

Example contents:

<ul class="mrr">
 <li class="mrrfile">
  <a
 href="
http://cachefile.net/file_a.bin">file_a.bin</a>
 </li>
 <li class="mrrfile">
  <a
 href="
http://cachefile.net/file_b.bin">file_b.bin</a>
 </li>
 <li class="mrrdir">dir1</li>
 <li class="mrrfile">
  <a
 href="
http://cachefile.net/dir1/file_c.bin">dir1/file_c.bin</a>
 </li>
 <li class="mrrfile-alternate">
  <a
 href="
http://otherurl.net/dir1/file_c.bin">dir1/file_c.bin</a>
 </li>
</ul>

Given this sample, here's the XML Schema file that Visual Studio output from an automatic conversion:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified"
 elementFormDefault="qualified"
 xmlns:xs="
http://www.w3.org/2001/XMLSchema">
 <xs:element name="ul">
  <xs:complexType>
   <xs:sequence>
    <xs:element maxOccurs="unbounded" name="li">
     <xs:complexType mixed="true">
      <xs:sequence minOccurs="0">
       <xs:element name="a">
        <xs:complexType>
         <xs:simpleContent>
          <xs:extension base="xs:string">
           <xs:attribute name="href"
             type="xs:string" use="required" />
          </xs:extension>
         </xs:simpleContent>
        </xs:complexType>
       </xs:element>
      </xs:sequence>
      <xs:attribute name="class" type="xs:string" use="required" />
     </xs:complexType>
    </xs:element>
   </xs:sequence>
   <xs:attribute name="class" type="xs:string" use="required" />
  </xs:complexType>
 </xs:element>
</xs:schema>
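
Just as a sanity check (this isn't part of any spec; the file names below are placeholders), a schema like that can be used to validate an .mrr document with the stock .NET XML reader:

// Minimal validation sketch; "mrr.xsd" and "sample.mrr" are placeholder names.
using System;
using System.Xml;
using System.Xml.Schema;

class MrrValidator
{
    static void Main()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, "mrr.xsd"); // generated schema has no target namespace
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            Console.WriteLine("{0}: {1}", e.Severity, e.Message);
        };

        using (XmlReader reader = XmlReader.Create("sample.mrr", settings))
        {
            while (reader.Read()) { } // read to the end; the handler reports violations
        }
    }
}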

The point of this is that it would look like HTML but could be processed like a .zip file. The only difference between a .mrr file and a .zip file, other than the fact that a .zip file is compressed and isn't human-readable when inspected, is that a .zip contains the file contents, whereas a .mrr file contains only hyperlinks to the downloadable files. In the above example, I also have an "-alternate" class so that the processor can treat that entry as a mirror of the same file.
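
To make the processing concrete, here's a rough sketch of the kind of console logic I have in mind for the flat-list draft above (this isn't the actual Mrrki source; the names and the fallback behavior are just one plausible way to do it): create each mrrdir, download each mrrfile, and only fall back to an mrrfile-alternate entry when the primary URL for that relative path failed.

// Rough sketch of a flat-list processor; not the actual Mrrki code. The
// "-alternate" fallback behavior is an assumption about how mirrors should work.
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Xml;

class MrrFlatProcessor
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.Load("sample.mrr"); // placeholder file name

        // Tracks which relative paths have already been fetched, so that
        // "-alternate" entries are only attempted as fallback mirrors.
        Dictionary<string, bool> downloaded = new Dictionary<string, bool>();

        foreach (XmlElement li in doc.SelectNodes("/ul[@class='mrr']/li"))
        {
            string cls = li.GetAttribute("class");

            if (cls == "mrrdir")
            {
                Directory.CreateDirectory(li.InnerText.Trim());
            }
            else if (cls == "mrrfile" || cls == "mrrfile-alternate")
            {
                XmlElement a = (XmlElement)li.SelectSingleNode("a");
                if (a == null) continue;

                // The <a> text is the save-to path, e.g. "dir1/file_c.bin".
                string relativePath = a.InnerText.Trim();
                if (cls == "mrrfile-alternate" && downloaded.ContainsKey(relativePath))
                    continue; // primary copy already succeeded; skip the mirror

                try
                {
                    using (WebClient client = new WebClient())
                        client.DownloadFile(a.GetAttribute("href"), relativePath);
                    downloaded[relativePath] = true;
                }
                catch (WebException ex)
                {
                    Console.WriteLine("Failed {0}: {1}", a.GetAttribute("href"), ex.Message);
                }
            }
        }
    }
}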

Oh, and yeah, the point of the XHTML compatibility is partly for inspection and previewing, but also for Javascript DOM support. I'm thinking this could be my "engine" for a web browser script library pre-loader page idea I have for adding as a new feature for cachefile.net.

I'm going to get to work on an open source C# console application for Windows, as well as a Javascript browser caching implementation.

Update: I've spent most of the night prototyping the C# app. I'm calling it Mrrki ("murky"), and settled on .mrr (for "mirror"). Here's my first rough draft build: http://www.jondavis.net/codeprojects/Mrrki/0.1/Mrrki.zip.


Introducing CacheFile.net - The Central Repository for Common Internet Resources

by Jon Davis 19. November 2007 05:23

Alright, so maybe it's a crappy generic domain name, but it's a start. The idea is simple: get all the popular JavaScript scripts and RSS feed graphics onto one common URI. If all web sites can point to that URI for common resources, and the URI never changes for the specified resources, then their users are guaranteed a faster and more productive user experience than if all the web sites each had their own copies of the same resources.

It's simple math, really. And a site like this is much needed in the Internet community. Personally I'm shocked and amazed that no one has done it already.

http://cachefile.net/

So then, now it's done. And I'm gonna go get back to work on my other productive projects.




Inline XAML

by Jon Davis 12. November 2007 03:53

I was whining earlier in some technology mailing lists that inline XAML isn’t supported by Silverlight. No one corrected me on that, but it looks like it is supported, rather cleanly and elegantly, except for a stupid Firefox bug. Thx to Jon Galloway for pointing all this out.

http://msdn2.microsoft.com/en-us/library/bb687962.aspx
http://weblogs.asp.net/jgalloway/archive/2007/10/31/silverlight-doesn-t-require-any-javascript.aspx

<html>
  <head>
  </head>
  <body>
    <script type="text/xaml" id="xamlContent">
        <?xml version="1.0"?>
        <Canvas ... >
            ...
        </Canvas>
    </script>

    <div id="controlHost">
        <object id="silverlightControl"
            type="application/x-silverlight"
            height="400"
            width="400">
          <param name="Source" value="#xamlContent" />
        </object>
    </div>
  </body>
</html>



Why Not Big RAM, RAM Disk, and 10 Gigabit Switches and Adapters?

by Jon Davis 9. November 2007 14:54

This is just something I've been pondering lately. I've been doing a lot of work this year with Lucene.Net (a port of Lucene, which is written in Java, to .NET) to manage a search engine. In our configuration, it uses RAMDirectory objects to retain the indexes in memory, then it searches the indexed content as though it were on disk. It takes up a lot of RAM, but it's very performant. A search query, including the network load of transferring the XML-based query and the XML-based result set (over Windows Communication Foundation), typically takes about 0.05 seconds over a gigabit switch on standard, low-end, modern server hardware.

We don't just spider our sites with this stuff as with Google (or Nutch). We manually index our content using real field names and values per index, very similar to SQL Server tables, except that you can have multiple same-name fields with different values in the same index record ("document"), which is great for multiple keywords. If we could get it to properly join indexes on fields, the way you can join tables on fields in SQL Server, as well as perform arbitrary functions or delegates as query parameters (which is DOABLE!!), we'd have something useful enough for us to consider throwing SQL Server completely out the window for read-only tasks and getting a hundredfold performance boost. Yes, I just said that!!
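
For context, the multi-valued keyword trick looks roughly like this against an in-memory index. This is a simplified sketch written against the Lucene.Net 2.x-era API, not our production indexing code, and the field names and values are made up for illustration:

// Simplified sketch against the Lucene.Net 2.x-era API -- not our production
// indexer. Field names, values, and the query are invented for illustration.
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

class RamIndexDemo
{
    static void Main()
    {
        RAMDirectory dir = new RAMDirectory(); // the index lives entirely in RAM

        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.Add(new Field("title", "Widget pricing guide",
            Field.Store.YES, Field.Index.TOKENIZED));
        // Multiple same-name fields on one document -- the part that has no
        // clean equivalent in a single SQL Server table row.
        doc.Add(new Field("keyword", "widgets", Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.Add(new Field("keyword", "pricing", Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.AddDocument(doc);
        writer.Optimize();
        writer.Close();

        IndexSearcher searcher = new IndexSearcher(dir);
        Hits hits = searcher.Search(new TermQuery(new Term("keyword", "widgets")));
        for (int i = 0; i < hits.Length(); i++)
            Console.WriteLine(hits.Doc(i).Get("title")); // matches on any of the keyword values
        searcher.Close();
    }
}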

Because of the load we put on RAM, trying to keep the I/O off the SCSI adapter and limit it to the memory bus, all of this has led me to question why network and RAM capacities have not evolved nearly as fast as hard drive capacities. It seems to me that a natural and clean way of managing the performance of any high-traffic, database-driven web site is to minimize the I/O contention, period. I hear about people spending big money on redundant database servers with terabytes of storage space, but then only, say, 16 GB of RAM and a gigabit switch. And that's fine, I guess, considering that when the scale goes much higher than that, the prices escalate out of control.

That, then, is my frustration. I want 10 gigabit switches and adapters NOW. I want 128GB RAM on a single motherboard NOW. I want 512GB solid state drives NOW. And I want it all for less than fifteen grand. Come on, industry. Hurry up. :P

But assuming the hardware became available, this kind of architectural shift would indeed be a shift that would also directly affect how server-side software is constructed. Microsoft Windows and SQL Server, in my opinion, should be overhauled. Windows should natively support RAM disks. Microsoft yanked an in-memory OLE DB database provider a few years ago and I never understood why. And while I realize that SQL Server needs to be rock-solid for reliably persisting committed database transactions to long-term storage, there should be greater design flexibility in the database configuration, and greater runtime flexibility, such as in the Transact-SQL language, for determining how transactions persist (lazily or atomically).

Maybe I missed stuff that's already there, which is actually quite likely. I'm not exactly an extreme expert on SQL Server. I just find these particular aspects of data service optimizations an area of curiosity.


 



