The attached ZIP file contains a Word document with some simple VBA code, plus two screen snapshots.
The goal was to produce a laid-out index of every word in a document, except for words on a list of prohibited words and words shorter than a specified length.
The report is placed in a new document listing each word, its count of occurrences, and each occurrence as a Page Number (within the document), Paragraph Number (within the document) and Line Number (within the page).
The output is sorted in Order of First Appearance within the document.
The report is tab-delimited, so a plain Convert Text to Table operation turns it into a table.
If I were counting this as a favour to The Lounge(s), the score would read something like: Lions 23, Christians 4,567.
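The report described above can be sketched outside Word. This is a minimal Python sketch, not the code in the attached document, and it rests on several assumptions: the host application has already supplied each laid-out line with its page, paragraph and line number (the hypothetical `Line` record below; in the VBA original Word itself provides these), a word is whatever the hypothetical pattern `WORD_RE` matches, and the report has one row per occurrence so that every row has the same number of columns for Convert Text to Table.

```python
import re
from typing import NamedTuple


class Line(NamedTuple):
    """One laid-out line of text (hypothetical record; in Word the
    positions would come from the document's layout)."""
    page: int        # page number within the document
    paragraph: int   # paragraph number within the document
    line: int        # line number within the page
    text: str


WORD_RE = re.compile(r"[A-Za-z']+")  # assumed definition of a "word"


def build_index(lines, prohibited, min_length):
    """Map each word to its (page, paragraph, line) occurrences.

    Python dicts keep insertion order, so the keys come out in order of
    first appearance. Comparison is case-sensitive, as in the original.
    """
    index = {}
    for ln in lines:
        for word in WORD_RE.findall(ln.text):
            if len(word) < min_length or word in prohibited:
                continue
            index.setdefault(word, []).append((ln.page, ln.paragraph, ln.line))
    return index


def report(index):
    """Render the index as tab-delimited text, one row per occurrence."""
    rows = ["Word\tCount\tPage\tParagraph\tLine"]
    for word, hits in index.items():
        for page, para, line in hits:
            rows.append("\t".join([word, str(len(hits)), str(page), str(para), str(line)]))
    return "\n".join(rows)
```

Repeating the word and count on every row keeps the table rectangular; a fancier report could blank them after the first occurrence.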
Indexing Words by page, paragraph, line
- PlutoniumLounger
- Posts: 15651
- Joined: 24 Jan 2010, 23:23
- Location: brings.slot.perky
He who plants a seed, plants life.
- Administrator
- Posts: 78625
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Indexing Words by page, paragraph, line
Thanks - it works well.
Just a few remarks:
- It's cute to see the code work its way through the source document, but for "production" purposes it might be better to set Application.ScreenUpdating to False before processing the document, and to True again afterwards.
- Text comparison is case sensitive, so for example "mine", "Mine" and "MINE" end up as different entries. In my humble opinion, it would be better to convert everything to lower case, or to provide an extra parameter to specify case (in)sensitivity.
- It would also be nice to have an option to sort the words alphabetically, but I realize that would be more work, and it's not crucial.
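The second and third points amount to two switches on the counting step. A minimal Python sketch, assuming a simple regular-expression word definition; `count_words` and its `case_sensitive` and `alphabetical` parameters are hypothetical names, not options of the original macro. Folding uses `str.casefold()`, which for plain English text behaves like converting to lower case.

```python
import re

WORD_RE = re.compile(r"[A-Za-z']+")  # assumed definition of a "word"


def count_words(text, case_sensitive=True, alphabetical=False):
    """Return (word, count) pairs, in first-appearance order by default."""
    counts = {}
    for word in WORD_RE.findall(text):
        # Case-insensitive mode folds "mine", "Mine" and "MINE" into one key
        key = word if case_sensitive else word.casefold()
        counts[key] = counts.get(key, 0) + 1
    items = list(counts.items())
    if alphabetical:
        # Sort ignoring case first, then by the exact spelling to break ties
        items.sort(key=lambda kv: (kv[0].casefold(), kv[0]))
    return items
```

For example, `count_words("mine Mine MINE", case_sensitive=False)` gives `[("mine", 3)]`, while the default keeps three separate entries.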
Best wishes,
Hans
- PlutoniumLounger
- Posts: 15651
- Joined: 24 Jan 2010, 23:23
- Location: brings.slot.perky
Re: Indexing Words by page, paragraph, line
HansV wrote:
- set Application.ScreenUpdating to False
- Text comparison is case sensitive,
- sort the words alphabetically,
True, true and true.
And THANKS!
I ran this up early this morning for a friend - it's more a proof-of-concept than anything else.
It has none of my usual bells and whistles, and mirabile dictu it does NOT make use of my utility library UW.dot.
I certainly prefer Range over Selection, but this being a P-O-C I wanted the user to see das blinken lights. Also, for now, a developer can step through the code and see how it works.
I usually have an INI file with an associated GUI form (supported by UW.dot) where the user can set options such as case-sensitivity, word-length, prohibited words and so on. Again, for this P-O-C not essential.
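The UW.dot INI handling isn't shown in the thread, so the following only illustrates the idea, using Python's standard `configparser`. The `[Index]` section and the key names `case_sensitive`, `min_word_length` and `prohibited_words` are hypothetical; missing keys fall back to defaults that index every word case-sensitively.

```python
import configparser

# Hypothetical INI layout; section and key names are illustrative only.
SAMPLE_INI = """
[Index]
case_sensitive = no
min_word_length = 4
prohibited_words = the, and, with
"""


def load_options(ini_text):
    """Read indexing options from INI text, using defaults for missing keys."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = "Index"
    case_sensitive = parser.getboolean(section, "case_sensitive", fallback=True)
    min_length = parser.getint(section, "min_word_length", fallback=1)
    raw = parser.get(section, "prohibited_words", fallback="")
    # Comma-separated list; surrounding spaces and empty entries are dropped
    prohibited = {w.strip() for w in raw.split(",") if w.strip()}
    return case_sensitive, min_length, prohibited
```

A GUI form would then only need to read and write these same keys.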
Sorting is not so hard. I use the original QSort algorithm I
Anyway, all perfectly valid points.
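The QSort routine mentioned above isn't included in the post. As a stand-in, here is a generic in-place quicksort using Hoare's partition scheme with a middle pivot, written in Python; it is an assumed illustration, not the author's code. VBA has no built-in array sort, which is one reason hand-written sorts turn up in VBA projects. The `key` parameter is where a case-insensitive alphabetical order would plug in. Like most quicksorts, it is not stable.

```python
def quicksort(items, key=lambda x: x, lo=0, hi=None):
    """Sort items[lo..hi] in place using Hoare's partition scheme."""
    if hi is None:
        hi = len(items) - 1
    while lo < hi:
        pivot = key(items[(lo + hi) // 2])
        i, j = lo - 1, hi + 1
        while True:
            # Advance i past items that belong left of the pivot
            i += 1
            while key(items[i]) < pivot:
                i += 1
            # Retreat j past items that belong right of the pivot
            j -= 1
            while key(items[j]) > pivot:
                j -= 1
            if i >= j:
                break
            items[i], items[j] = items[j], items[i]
        # Recurse into the smaller part and loop on the larger one,
        # which keeps the recursion depth logarithmic
        if j - lo < hi - j:
            quicksort(items, key, lo, j)
            lo = j + 1
        else:
            quicksort(items, key, j + 1, hi)
            hi = j
```

Calling `quicksort(words, key=str.casefold)` sorts a word list alphabetically while ignoring case.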
I'm not even sure if we have a line-of-page thread on Eileen's Lounge.
Yet! (grin!)
P.S. I had to use a fine-toothed saw to trim the ZIP to 100K; first time I've been so close that a pixel made itself known ...
He who plants a seed, plants life.
- Administrator
- Posts: 78625
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Indexing Words by page, paragraph, line
ChrisGreaves wrote:
P.S. I had to use a fine-toothed saw to trim the ZIP to 100K; first time I've been so close that a pixel made itself known ...
100 KB was the size limit for attachments in Woody's Lounge before it moved to different software. Here in Eileen's Lounge, the size limit is 256 KB.
Best wishes,
Hans
- PlutoniumLounger
- Posts: 15651
- Joined: 24 Jan 2010, 23:23
- Location: brings.slot.perky
Re: Indexing Words by page, paragraph, line
HansV wrote:
Here in Eileen's Lounge, the size limit is 256 KB.
Correct. My old eyes ...
I just re-zipped the original and it looks like 272 KB. I must have seen the "over-limit" messages and mentally reverted to the 100K limit.
He who plants a seed, plants life.