[4suite] [Amara] performance of pushbind on large files?
Uche Ogbuji
uche at ogbuji.net
Fri Dec 22 10:01:34 MST 2006
Robert Casties wrote:
> I have a little program that reads FileMaker FMPXMLRESULT files (one of
> the worst XML formats I've seen) and writes the data into a database.
>
> Because the files are rather big (340MB) I wrote the first version of my
> program using Python pulldom. The result was not bad but it still takes
> 80 minutes (on a 1.6GHz Mac G5) to churn through 190,000 ROW elements
> with 86 COL elements each.
>
> So I thought maybe amara.pushbind (or pushdom) would do better and wrote
> a version of the program using pushbind. The resulting program runs
> nicely on small files (50 ROWs) but it takes forever to run on big files
> (190k ROWs). The program essentially stops at the first call of
>
> for f in amara.pushbind(filename, u'fm:METADATA/fm:FIELD', prefixes=fm_ns):
>     fn = f.NAME
>
> for 30 minutes, using almost full CPU (while the METADATA tag is at the
> beginning of the file). After the stall it seems to run OK though I
> haven't timed it.
>
> Is this a known problem and is pushbind the wrong solution or am I doing
> something wrong?
Actually, months ago I used pushbind on huge files, around 100MB (though
not 340MB), with no problems. A few weeks ago I needed to process an
80MB document for a client, and pushbind did just what you describe.
I've been so busy recently that I haven't had time to go back and
investigate, but I suspect I introduced a bug into pushbind at some
point, and I need to fix it.

Thanks for this reminder. I'll have a look today or this weekend, and
I'll include your example in my testing and report back.

FWIW, pushbind and all other bindery operations should become a lot
faster with the architectural changes planned for Amara 2.0. Even now,
though, pushbind should handle these use cases at least as well as
pulldom; if it doesn't, I consider that a bug.
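
For comparison, here is a minimal sketch of the kind of pulldom baseline
discussed above, using only the standard library's xml.dom.pulldom. It is
not Robert's actual program: the function names (iter_rows, local_name,
text_of) are made up for illustration, and the ROW/COL layout is assumed
from the description in the quoted message. Only one ROW subtree is
expanded at a time, so the rest of the file is never built into a DOM.

```python
from xml.dom import pulldom


def local_name(node):
    # Strip any namespace prefix, so "fm:ROW" and "ROW" both give "ROW".
    return node.tagName.split(":")[-1]


def text_of(node):
    # Concatenate all text found beneath an already expanded node.
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(text_of(child))
    return "".join(parts)


def iter_rows(source):
    # source is a filename or a binary file-like object (hypothetical input).
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and local_name(node) == "ROW":
            # Pull in just this ROW's children; other elements stay unexpanded.
            events.expandNode(node)
            yield [
                text_of(col)
                for col in node.childNodes
                if col.nodeType == col.ELEMENT_NODE and local_name(col) == "COL"
            ]
```

Each yielded list holds the text of one ROW's COL elements in document
order, so the caller can write a row to the database and discard it before
the next one is parsed.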
--
Uche Ogbuji Work: The Kadomo Group, Inc.
http://uche.ogbuji.net http://kadomo.com
http://copia.ogbuji.net Lead dev at http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/