
Specifies the Word. This page and associated content may be updated frequently. We recommend you subscribe to the RSS feed to receive update notifications. From time to time, Microsoft may publish a preview, or pre-release, version of an Open Specifications technical document for community review and feedback.

To submit feedback for a preview version of a technical document, please follow any instructions specified for that document. If no instructions are indicated for the document, please provide feedback by using the Open Specification Forums.

The preview period for a technical document varies. Additionally, not every technical document will be published for preview. A preview version of this document may be available on the Office File Formats – Preview Documents page.

After the preview period, the most current version of the document is available on this page. Find resources for creating interoperable solutions for Microsoft software, services, hardware, and non-Microsoft products.

Technical Documentation. Additionally, overview documents cover inter-protocol relationships and interactions. This documentation is covered by Microsoft copyrights. Regardless of any other terms that are contained in the terms of use for the Microsoft website that hosts this documentation, you can make copies of it in order to develop implementations of the technologies that are described in this documentation and can distribute portions of it in your implementations that use these technologies or in your documentation as necessary to properly document the implementation.

You can also distribute in your implementation, with or without modification, any schemas, IDLs, or code samples that are included in the documentation.

This permission also applies to any documents that are referenced in the Open Specifications documentation. No Trade Secrets. Microsoft does not claim any trade secret rights in this documentation. Microsoft has patents that might cover your implementations of the technologies described in the Open Specifications documentation. Neither this notice nor Microsoft’s delivery of this documentation grants any licenses under those patents or any other Microsoft patents.

If you would prefer a written license, or if the technologies described in this documentation are not covered by the Open Specifications Promise or Community Promise, as applicable, patent licenses are available by contacting iplg microsoft. License Programs. To see all of the protocols in scope under a specific license program and the associated patents, visit the Patent Map.

The names of companies and products contained in this documentation might be covered by trademarks or similar intellectual property rights. This notice does not grant any licenses under those rights.

For a list of Microsoft trademarks, visit www. Fictitious Names. The example companies, organizations, products, domain names, email addresses, logos, people, places, and events that are depicted in this documentation are fictitious.

No association with any real company, organization, product, domain name, email address, logo, person, place, or event is intended or should be inferred. Reservation of Rights.

All other rights are reserved, and this notice does not grant any rights other than as specifically described above, whether by implication, estoppel, or otherwise. The Open Specifications documentation does not require the use of Microsoft programming tools or programming environments in order for you to develop an implementation.

If you have access to Microsoft programming tools and environments, you are free to take advantage of them. Certain Open Specifications documents are intended for use in conjunction with publicly available standards specifications and network programming art and, as such, assume that the reader either is familiar with the aforementioned material or has immediate access to it. For questions and support, please contact Microsoft.


 
 

Microsoft Office Word 97-2007 Binary File Format (.doc) Specification

 

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in these materials.

Except as expressly provided in the Microsoft Open Specification Promise and this notice, the furnishing of these materials does not give you any license to these patents, trademarks, copyrights, or other intellectual property. The information contained in this document represents the point-in-time view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of authoring.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred.

All rights reserved.

Many discussions in this document use the name of the internal structure when the file-specific structure is what is really being referred to.

Additions to Word 2007. There were several additions to the binary file format with the release of Microsoft Office Word 2007. Word 2007 introduces a new XML-based file format. While this new format is the default format for documents saved by Word 2007, Word 2007 also provides the capability to save files to the binary Word file format used in previous versions.

This release of the Word binary file format documentation includes information about the custom XML storage. Word 2007 adds several new records to the binary file format to store information about documents created in Word 2007. Each of these records stores information about features specific to Word 2007. This data is preserved in the binary format so that when reopened in Word 2007, documents retain data and features that are only available in the newer version.

Word and .doc Files. The binary format for Microsoft Word 97 and later versions is based on a structure referred to as a. A Word. The object stream contains binary data for embedded objects. Word has no knowledge of the contents of this stream. The majority of this document describes the contents of the main stream and the table stream.

In a complex fast-saved Word document, FKP pages are intermingled with pages of text in a random pattern which reflects the history of past fast saves.

A bookmark is frequently used as an operand in field code instructions within a field. A bookmark is represented by three parallel data structures: the sttbBkmk, the plcbkf, and the plcbkl. The sttbBkmk is a string table which contains the name of each defined bookmark. The plcbkf records the beginning CP position of each bookmark. The plcbkl records the limit CP position that delimits the end of a bookmark.

Since bookmarks may be nested within one another to any level, the BKF structure stored in the plcbkf consists of a single index that identifies which plcbkl marks the end of the bookmark.
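As a rough illustration of how the three parallel bookmark structures fit together, the sketch below pairs each bookmark name with its beginning CP and, through the BKF index, with its limit CP. The names `Bookmark` and `resolve_bookmarks` are hypothetical, and plain Python lists stand in for the on-disk sttbBkmk, plcbkf, and plcbkl; the binary structures themselves are not parsed here.

```python
from typing import List, NamedTuple


class Bookmark(NamedTuple):
    name: str      # entry from the sttbBkmk string table
    cp_first: int  # beginning CP recorded in the plcbkf
    cp_lim: int    # limit CP recorded in the plcbkl


def resolve_bookmarks(names: List[str],
                      start_cps: List[int],
                      bkf_indexes: List[int],
                      limit_cps: List[int]) -> List[Bookmark]:
    """Join the parallel bookmark arrays into (name, start, limit) records.

    names, start_cps and bkf_indexes are parallel: entry i of each describes
    bookmark i. bkf_indexes[i] is the index stored in the BKF and selects
    which limit_cps entry ends bookmark i (assumed in-memory layout).
    """
    if not (len(names) == len(start_cps) == len(bkf_indexes)):
        raise ValueError("bookmark arrays must have the same length")
    bookmarks = []
    for name, cp_first, ibkl in zip(names, start_cps, bkf_indexes):
        if not 0 <= ibkl < len(limit_cps):
            raise ValueError(f"BKF index {ibkl} is out of range")
        bookmarks.append(Bookmark(name, cp_first, limit_cps[ibkl]))
    return bookmarks
```

Because the start entries carry an explicit index into the limit array, a nested pair such as an outer bookmark spanning CPs 0 to 20 and an inner one spanning 5 to 10 resolves correctly even though the limit CPs are stored in a different order from the start CPs.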

When a run of text is tagged with a particular character style, a chpx defined for the character style is applied to the character properties defined for the paragraph style of the paragraph that contains the text. This means that the character style can change one or more of the character property field settings specified by the paragraph style of a paragraph to a particular setting without changing the value of any other field.

In Word 6. By applying a CHPX to the character properties (CHP) inherited by a particular paragraph from its style, it is possible to reconstitute the CHP for the portion of the character run that intersects that paragraph. The high-order byte must be zero. The maximum value for a single byte is 0xFF. The intensity for each argument is in the range 0 through 255. If all three intensities are zero, the result is black.

If all three intensities are 255, the result is white. CP (Character Position): A four-byte integer specifying the position coordinate of a character of text within the logical text stream of a document. Word files are. To locate the native data for embedded objects, scan the plc of field codes for the mother, header, footnote and annotation, textbox and header textbox documents fib.

For each separator field, get the chp. If chp. The file location of the object data is stored in chp. At the specified location an object header is stored followed by the native data for the object. A piece table must be stored in the file to describe the text stream of the document.

Due to Unicode compression to a code page, all files (simple and complex) now contain a piece table. FC (File Character position): A four-byte integer which is the byte offset of a character or other object from the beginning of a stream of the. Before a file has been edited i. After a file has been edited i. Begins at offset 0 in the file.

This gives the beginning offset and lengths of the document’s text stream and subsidiary data structures within the file. Also stores other file status information.

The first part of the structure contains field codes which instruct Word to insert text into the second part of the structure, the field result. Fields in Word are used to insert text from an external file or to quote another part of a document, to mark index and table of contents entries and produce indexes and tables of contents, maintain DDE links to other programs, to produce dates, times, page numbers, sequence numbers, etc.

There are 91 different field types. A field begin mark delimits the beginning of a field and precedes any of the field codes stored in the field.

The end of the field codes and the beginning of the field result is marked with the field separator, and the field result and the field itself are terminated by a field end mark.
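The begin, separator, and end marks can be pictured with a small scanner over an already-decoded text stream. This is only a sketch: it assumes the three marks are the characters 0x13 (begin), 0x14 (separator), and 0x15 (end), values that the text above does not state, and the function name `parse_fields` is made up for the example. It reads the marks straight from the text rather than from the recorded mark positions.

```python
FIELD_BEGIN = "\x13"  # assumed value of the field begin mark
FIELD_SEP = "\x14"    # assumed value of the field separator
FIELD_END = "\x15"    # assumed value of the field end mark
MAX_NESTING = 20      # nesting limit for fields


def parse_fields(text):
    """Return (field_codes, field_result, depth) for every field in text.

    Fields are reported in the order their end marks are met, so a nested
    field appears before the field that contains it. A nested field
    contributes its result text to the enclosing field.
    """
    fields = []
    stack = []  # one entry per currently open field
    for ch in text:
        if ch == FIELD_BEGIN:
            if len(stack) >= MAX_NESTING:
                raise ValueError("fields nested deeper than 20 levels")
            stack.append({"code": [], "result": [], "in_result": False})
        elif ch == FIELD_SEP:
            if not stack:
                raise ValueError("field separator outside a field")
            stack[-1]["in_result"] = True
        elif ch == FIELD_END:
            if not stack:
                raise ValueError("field end mark without a begin mark")
            done = stack.pop()
            code = "".join(done["code"])
            result = "".join(done["result"])
            fields.append((code, result, len(stack) + 1))
            if stack:
                part = "result" if stack[-1]["in_result"] else "code"
                stack[-1][part].append(result)
        elif stack:
            part = "result" if stack[-1]["in_result"] else "code"
            stack[-1][part].append(ch)
    if stack:
        raise ValueError("unterminated field")
    return fields
```

A field that has no separator is simply reported with an empty result string.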

The CP locations of the field begin mark, field separator, and field end mark are recorded in plcfld data structures that are maintained for the main document and all of the subdocuments of the main document whenever a field is inserted or edited. A field can be dead, in which case it has no field separator, no field result, and no entry in the plcfld.

See the definition of the FLD structure for a list of possible dead field code strings. An array of two-byte FLD structures is stored in the plcfld in a 1-to-1 correspondence with the recorded CP entries. An FLD associated with a field begin mark records the type of the field.

An FLD associated with the field end mark records the current status of the field i. Fields may be nested. Twenty (20) levels of nesting are permitted. FKP (Formatted disK Page): A data structure that fits in one page that encodes either the character properties or the paragraph properties of a certain portion of a Word.

An FKP consists of four components: 1 a count of the number of runs or paragraphs described by the page. Each BX begins with an offset that locates the properties of the paragraph that begins at a particular FC. Then search through the bin table for the type of property you want to produce, to find the FKP in the document stream whose array of FCs encompasses the FC of the document character.

Add this offset to the beginning address of the FKP in memory. The text stream of a non-complex file can be described by an fc (an offset from the beginning of the file to mark where the text begins) and a ccp (count of CPs) to record how many characters are stored in the text stream.
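The FKP lookup described above (find the run whose FC range encompasses the character, then add the stored offset to the start of the page) might look like the following for a character-property page. The layout used here is an assumption not spelled out in the text: a 512-byte page, the run count in the last byte, the FC array at the start of the page, one offset byte per run counted in two-byte words, and each CHPX stored as a length byte followed by its grpprl. The function name `chpx_grpprl_for_fc` is hypothetical.

```python
import struct
from bisect import bisect_right

FKP_SIZE = 512  # assumed page size


def chpx_grpprl_for_fc(page: bytes, fc: int):
    """Return the grpprl bytes of the run containing fc, or None.

    Assumed layout: crun in the last byte, crun + 1 little-endian 32-bit
    FCs at offset 0, then crun one-byte word offsets to the CHPXs.
    """
    if len(page) != FKP_SIZE:
        raise ValueError("an FKP must be exactly one page long")
    crun = page[FKP_SIZE - 1]
    rgfc = struct.unpack_from(f"<{crun + 1}I", page, 0)
    # Run i covers the half-open range rgfc[i] <= fc < rgfc[i + 1].
    if not rgfc[0] <= fc < rgfc[crun]:
        return None
    i = bisect_right(rgfc, fc) - 1
    word_offset = page[4 * (crun + 1) + i]
    if word_offset == 0:
        return b""  # assumed meaning: no exceptions to the style
    start = word_offset * 2  # offset is added to the start of the page
    cb = page[start]         # length of the grpprl that follows
    return bytes(page[start + 1:start + 1 + cb])
```

Choosing which page to load in the first place is the bin table's job and is left out of this sketch.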

However, a full-saved piece table will not have property modifiers (prms) and all text in the file is referenced by the piece table. The 0th sprm is recorded at offset 0 of the structure. Any succeeding sprms are recorded immediately after the end of the preceding sprm.
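Since each sprm starts where the previous one ends, a reader has to know every sprm's length to walk the list. The sketch below assumes the two-byte sprm opcodes used from Word 97 onward, in which the top three bits of the opcode select the operand size; that encoding is not described in the text above, and the name `iter_sprms` is made up for the example. A few sprms that need special length handling are not covered.

```python
import struct

# Assumed mapping from the top three opcode bits to the operand size in bytes.
# The value 6 means "variable": a one-byte length precedes the operand.
_OPERAND_SIZE = {0: 1, 1: 1, 2: 2, 3: 4, 4: 2, 5: 2, 7: 3}


def iter_sprms(grpprl: bytes):
    """Yield (opcode, operand_bytes) for each sprm stored back to back."""
    pos = 0
    while pos < len(grpprl):
        if pos + 2 > len(grpprl):
            raise ValueError("truncated sprm opcode")
        (opcode,) = struct.unpack_from("<H", grpprl, pos)
        pos += 2
        size_code = opcode >> 13
        if size_code == 6:
            if pos >= len(grpprl):
                raise ValueError("truncated operand length")
            size = grpprl[pos]
            pos += 1
        else:
            size = _OPERAND_SIZE[size_code]
        operand = grpprl[pos:pos + size]
        if len(operand) != size:
            raise ValueError("truncated sprm operand")
        yield opcode, bytes(operand)
        pos += size  # the next sprm begins immediately after this one
```

For instance, `list(iter_sprms(bytes.fromhex("350801434a1800")))` yields two entries, one with a one-byte operand and one with a two-byte operand.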

OLE 2. Only main documents and header documents contain Office Drawing objects. The native data for an Office Drawing object may be obtained by taking the CP for the special character and using this to find the corresponding entry in the plcspa.

An entry in this plc consists of a FSPA structure, which is described elsewhere in this document. Office Drawing objects can have text attached to them. Text for the textboxes is stored separately in the textbox subdocument of the main or header document.

Textboxes can be linked in chains of up to 32 textboxes. Ordering of textboxes in the subdocument is completely unrelated to the document structure due to the nature of textbox linking. This contains an index itxbxs into plctxbxs and a sequence number in the chain of linked textboxes. So, for each entry in the plctxbxs there is a corresponding entry in the plctxbxBkd at the same CP, and there may be additional entries in the plctxbxBkd to describe the breaks from one textbox to the next in linked textbox chains.

In Word data structures, an unsigned two-byte integer page number is given the acronym PN (for Page Number). The PAPX contains an ISTD (a style code to identify the style in control of the paragraph) and a grpprl which specifies how the style’s paragraph properties must be changed to produce the paragraph properties of the paragraph.

A paragraph style provides a set of character and paragraph property defaults for the text of any paragraph tagged with that style. When a new paragraph is created and given a particular style, newly typed text is set to the character and paragraph properties of that style unless the user makes an exception to the paragraph style definition by performing other editing operations.

The fcPic is a byte offset into the data stream. Beginning at the position recorded in chp. If the picture is an Office shape, a Windows metafile or a bitmap, the shape, metafile or bitmap will immediately follow the PIC. Pictures that are a reference to an Office shape file will include both the filename and the shape in that order.

Pictures inserted with Word 97 and later versions are in the new Office shape format documented elsewhere. However, pictures can be copied from older files into newer ones and their old format will persist until the picture is edited or displayed.

See Appendix B for a discussion of this technique. The array of CPs in the plcfpcd defines a partitioning of the Word document into disjoint pieces. The second array is an array of PCDs (Piece Descriptors) which is in 1-to-1 correspondence to the array of CPs that records the physical location in the Word file where the corresponding piece begins.

 


 
 


I think they will have some interesting things to say about it. Wow, I just realised I can access this from Word and post, edit etc. Jun There are 91 different field types. A field begin mark delimits the beginning of a field and precedes any of the field codes stored in the field. The end of the field codes and the beginning of the field result is marked with the field separator and the field result and the field itself are terminated by a field end mark.

The CP locations of the field begin mark, field separator, and field end mark are recorded in plcfld data structures that are maintained for the main document and all of the subdocuments of the main document whenever a field is inserted or edited. A field can be dead, in which case it has no field separator, no field result, and no entry in the plcfld.

See the definition of the FLD structure for a list of possible dead field code strings. An array of two-byte FLD structures is stored in the plcfld in a 1-to-1 correspondence with the recorded CP entries. An FLD associated with a field begin mark records the type of the field. An FLD associated with the field end mark records the current status of the field i.

Fields may be nested. Twenty 20 levels of nesting are permitted. FKP Formatted disK Page : A data structure that fits in one byte page that encodes either the character properties or the paragraph properties of a certain portion of a Word. An FKP consists of four components: 1 a count of the number of runs or paragraphs described by the page. Each BX begins with an offset that locates the properties of the paragraph that begins at a particular FC.

Then search through the bin table for the type of property you want to produce, to find the FKP in the document stream whose array of FCs encompasses the FC of the document character. Add this offset to the beginning address of the FKP in memory. The text stream of a non-complex file can be described by an fc an offset from the beginning of the file to mark where the text begins and a ccp count of CPs to record how many characters are stored in the text stream.

However, a full-saved piece table will not have property modifiers prms and all text in the file is referenced by the piece table. The 0th sprm is recorded at offset 0 of the structure. Any succeeding sprms are recorded immediately after the end of the preceding sprm. OLE 2. Only main documents and header documents contain Office Drawing objects. The native data for an Office Drawing object may be obtained by taking the CP for the special character and using this to find the corresponding entry in the plcspa.

An entry in this plc consists of a FSPA structure, which is described elsewhere in this document. Office Drawing objects can have text attached to them. Text for the textboxes is stored separately in the textbox subdocument of the main or header document. Textboxes can be linked in chains of up to 32 textboxes. Ordering of textboxes in the subdocument is completely unrelated to the document structure due to the nature of textbox linking.

This contains an index itxbxs into plctxbxs and a sequence number in the chain of linked textboxes. So, for each entry in the plctxbxs there is a corresponding entry in the plctxbxBkd at the same CP, and there may be additional entries in the plctxbxBkd to describe the breaks from one textbox to the next in linked textbox chains.

In Word data structures, an unsigned two-byte integer page number is given the acronym PN for Page Number. The PAPX contains an ISTD a style code to identify the style in control of the paragraph and a grpprl which specifies how the style’s paragraph properties must be changed to produce the paragraph properties of the paragraph.

A paragraph style provides a set of character and paragraph property defaults for the text of any paragraph tagged with that style. When a new paragraph is created and given a particular style, newly typed text is set to the character and paragraph properties of that style unless the user makes an exception to the paragraph style definition by performing other editing operations.

The fcPic is a byte offset into the data stream. Beginning at the position recorded in chp. If the picture is an Office shape, a Windows metafile or a bitmap, the shape, metafile or bitmap will immediately follow the PIC. Pictures that are a reference to an Office shape file will include both the filename and the shape, in that order. Pictures inserted with Word 97 and later versions are in the new Office shape format documented elsewhere.

However, pictures can be copied from older files into newer ones and their old format will persist until the picture is edited or displayed.

See Appendix B for a discussion of this technique. The array of CPs in the plcfpcd defines a partitioning of the Word document into disjoint pieces. The second array is an array of PCDs (Piece Descriptors), in 1-to-1 correspondence with the array of CPs, that records the physical location in the Word file where the corresponding piece begins. To find the physical location of a particular logical character in a Word document, take the CP coordinate of that character within the document and find the piece that contains that character.

Finally, add the offset of the desired character from the beginning of its piece to the FC of the beginning of the piece. If the second most significant bit is clear, the FC is the actual file offset of the Unicode character (two bytes). If the second most significant bit is set, the actual address of the code page compressed version of the Unicode character (one byte) is the offset obtained by clearing this bit and dividing by two.
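
The lookup described above can be sketched in Python. This is only an illustration under stated assumptions: the FC is taken to be a 32-bit value (so the second most significant bit is bit 30), the piece table is assumed to be already parsed into a list of CP boundaries and a list of raw FCs, and the offset within a Unicode piece is assumed to advance two bytes per character. The names `decode_fc` and `cp_to_file_offset` are hypothetical and do not come from the specification.

```python
from bisect import bisect_right

COMPRESSED_FLAG = 1 << 30  # second most significant bit of a 32-bit FC


def decode_fc(raw_fc):
    """Return (byte_offset, bytes_per_char) for a raw FC taken from a PCD."""
    if raw_fc & COMPRESSED_FLAG:
        # Flag set: clear it and divide by two to get the real offset of
        # one-byte (code page compressed) characters.
        return (raw_fc & ~COMPRESSED_FLAG) // 2, 1
    # Flag clear: the FC is the actual offset of two-byte Unicode characters.
    return raw_fc, 2


def cp_to_file_offset(cps, raw_fcs, cp):
    """Map a CP to (file_offset, bytes_per_char).

    cps     -- piece boundaries, len(raw_fcs) + 1 ascending entries
    raw_fcs -- one raw FC per piece, as stored in its PCD
    """
    if not cps[0] <= cp < cps[-1]:
        raise ValueError("CP lies outside the piece table")
    piece = bisect_right(cps, cp) - 1  # index of the piece containing the CP
    start, width = decode_fc(raw_fcs[piece])
    return start + (cp - cps[piece]) * width, width
```

A binary search is used because the CP array is sorted; a linear scan would work equally well for small piece tables.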

The lengths of the data structures stored in PLCFs within Word files are listed later in this document. PLF (PLex stored in File): a data structure consisting of an array of structures preceded by a long count of structures. If the user has made only a small change to formatting that can be expressed as a single 1 or 2-byte sprm, that sprm is stored within the prm. A single run may cross paragraph boundaries and may encompass the entire document.

Users frequently treat sections as the equivalent of a chapter in a book. The boundaries of sections mark locations where the layout rules for a document (number of columns, text of headers and footers to use, whether page numbers should be displayed, etc.) change.

The array of CPs in the plcfsed records the boundaries of sections in the Word document. If the FC stored in a SED is -1, the section properties of the section are exactly equal to the standard section properties.

Use this index to locate the SED in the plcfsed which describes the section. It consists of an operation code which identifies the field(s) to be changed, and an operand which gives the value that a particular field is changed to or a parameter passed to a procedure to change the field or fields. A prl (property modifier stored in a list) is a sprm plus its operand. Every PAPX for every paragraph recorded in a document contains an ISTD which identifies the style from which a paragraph inherited its default character and paragraph properties.

STTBFs consist of an optional short containing 0xFFFF, indicating that the strings are extended character strings, a short indicating how many strings are included in the string table, another short indicating the size in bytes of the extra data stored with each string and each string followed by the extra data.

Non-extended character Pascal strings begin with a single byte length count which describes how many characters follow the length byte in the string. Extra data associated with a string may also be stored in an sttbf. Extended character strings are stored just the same, except they have a double byte length count and each extended character occupies two bytes.
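
As a rough illustration of the STTBF layout described above, the following Python sketch reads the optional 0xFFFF marker, the string count, the extra-data size, and then each string with its extra data. It assumes little-endian integers, UTF-16LE for extended characters, and code page 1252 for non-extended characters; the function name `parse_sttbf` and the `narrow_encoding` parameter are hypothetical.

```python
import struct


def parse_sttbf(data, narrow_encoding="cp1252"):
    """Parse an STTBF into a list of (string, extra_data) pairs."""
    pos = 0
    extended = False
    (first,) = struct.unpack_from("<H", data, pos)
    if first == 0xFFFF:
        # Optional marker: the strings are extended character strings.
        extended = True
        pos += 2
    count, cb_extra = struct.unpack_from("<HH", data, pos)
    pos += 4
    entries = []
    for _ in range(count):
        if extended:
            # Double-byte length count, two bytes per character.
            (cch,) = struct.unpack_from("<H", data, pos)
            pos += 2
            text = data[pos:pos + 2 * cch].decode("utf-16-le")
            pos += 2 * cch
        else:
            # Single-byte length count, one byte per character.
            cch = data[pos]
            pos += 1
            text = data[pos:pos + cch].decode(narrow_encoding)
            pos += cch
        entries.append((text, data[pos:pos + cb_extra]))
        pos += cb_extra
    return entries
```

The sketch does no bounds checking beyond what slicing and `struct` provide, so a truncated table would need extra validation in real use.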

Each subdocument has its own CP coordinate space. In other words, data structures are stored in Word files that are components of these subdocuments. In full-saved documents, a simple calculation with values stored in the FIB produces the file offset of the beginning of the subdocument text streams if they exist.

The length of these streams is also stored. In fast-saved documents, the piece tables of subdocuments are concatenated to the end of the main document piece table.

In this case, to identify the beginning of subdocument text, you must sum the length of the main document text stream with the lengths of any subdocument text streams stored ahead of the subdocument information stored in the FIB and treat this sum as a CP coordinate.

To retrieve the text of the subdocument, you must do lookups in the piece table, starting with the piece that contains the beginning CP coordinate, to find the physical location of each piece of the subdocument text stream.
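
The summation described above amounts to a running total of text stream lengths. A minimal Python sketch follows, assuming the per-subdocument character counts have already been read from the FIB; the subdocument names and the function `subdocument_cp_ranges` are placeholders chosen for illustration, not field names from the specification.

```python
from itertools import accumulate


def subdocument_cp_ranges(lengths):
    """Return {name: (cp_first, cp_lim)} for subdocuments stored in order.

    lengths -- ordered (name, character_count) pairs, main document first
    """
    counts = [count for _, count in lengths]
    # Each subdocument starts where the preceding text streams end.
    starts = [0] + list(accumulate(counts))[:-1]
    return {
        name: (start, start + count)
        for (name, count), start in zip(lengths, starts)
    }
```

The resulting cp_first of a subdocument is the CP at which the piece table lookup for that subdocument's text would begin.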

The last paragraph of each cell is terminated by a special paragraph mark called a cell mark. Following the cell mark that ends the last cell of a table row, the table row is terminated by a special paragraph mark called a row mark.

When Word displays a table row, it assigns a rectangular shaped display area to each cell in the row. The leftmost display area in a table row is assigned to the 0th cell of the row; the next display area to the right is assigned to the 1st cell of the row, etc. The text of the cell is wrapped to fit its display area. As more text is added to the cell, the cell display area extends downward. A set of table properties that determine how many cells are in a row, where the horizontal boundaries of cell display areas are, and what borders are drawn around each cell in the table is stored for the row mark that marks the end of the table row.

The information in the TAP for a table row is stored in a Word file as a list of sprms that modify a TAP which has been cleared to zeros. This list of table sprms is appended to the grpprl of paragraph sprms that is recorded in the PAPX for the row mark that delimits the end of a table row. Note: in this document, bit 0 is the low-order bit.

Structures are described as they would be declared in C for the Intel architecture. When numbering bytes in a word from low offset towards high offset, two-byte integers have their least significant eight bits stored in byte 0 and most significant eight bits in byte 1. If bit 31 is the most significant bit in a four-byte integer, bits 31 through 24 are stored in byte 3 of a four-byte integer, bits 23 through 16 are stored in byte 2, bits 15 through 8 will be stored in byte 1, and bits 7 through 0 are stored in byte 0.
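
The byte ordering described above is little-endian. A short Python sketch shows the same two-byte and four-byte values decoded with the standard `struct` module and assembled by hand; the sample bytes are made up for illustration.

```python
import struct

raw = bytes([0x34, 0x12, 0x78, 0x56, 0x34, 0x12])

# "<" selects little-endian: byte 0 holds the least significant eight bits.
(two_byte,) = struct.unpack_from("<H", raw, 0)
(four_byte,) = struct.unpack_from("<I", raw, 2)

# The same values assembled by hand from the individual bytes.
manual_two = raw[0] | (raw[1] << 8)
manual_four = raw[2] | (raw[3] << 8) | (raw[4] << 16) | (raw[5] << 24)
```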

Naming Conventions: The field names in Word data structures usually consist of a prefix of lower case characters followed by an optional upper case modifier. The following tags are used in the lower case prefix of field names to document the data type of the field: b: Used to name a 1 byte integer value. c: Prefix used to signify that an integer value is a count of some number of objects. Always a 4 byte quantity. The two following modifiers are used occasionally in this documentation: First: Means that the variable marks the first of a range of objects.

For example, cpFirst would mark the first character position of a range of characters in a document. Lim: Means the variable marks the limit of a range of objects i. For example, cpLim would be the limit CP of a range of characters in a document. SummaryInformation and DocumentSummaryInformation are widely understood. FIB Stored at the beginning of page 0 of the file.

Text of body, footnotes, headers Text begins at the position recorded in fib. Previous versions of Word wrote them in contiguous blocks. SEPXs are no longer guaranteed to start on a page boundary if it would span a boundary when placed immediately after the preceding SEPX.

FIB Stored at beginning of page 0 of the file. Text of body, footnotes, headers stored during last full save Text begins at the position recorded in fib. Ordinarily a file will contain only one table stream. However, in some unusual circumstances e.

This field only appears in auto saved files. These files are normal Word documents in every other way. For example, an auto saved file is typically longer than the equivalent Word document. This is recorded in all Word documents. Format is described in the Office drawing group format document. This is recorded in all Word documents formFldSttbs form field dropdown string tables Written immediately after the previously recorded table, if the document contains form field dropdown controls.

This undocumented structure maps LID and grammar checker type to grammar checking options. This is immediately followed by the allocated data hanging off the LSTFs. Only written during a fast save. Recorded in all Word documents plcfspl spelling state table Written immediately after the previously recorded table.

This is a string table containing the list names for each list. It is parallel with the plcflst, and may contain null strings if the corresponding LST does not have a list name. The sttbfffn is an sttbf where each string is instead an FFN structure (note that, just as for a Pascal-style string, the first byte in the FFN records the total number of bytes, not counting the count byte itself).

The names of the fonts correspond to the ftc codes in the CHP structure. Format of the Data Stream embedded objects-native data Word embedded object structures are sequentially concatenated if the document contains embedded objects. Within this fstorage, zero or more custom XML parts can exist each in their own storage. Each of these storages is stamped with a unique identifier as its storage name. An instance of one of these storages contains two streams within it: 1.

A stream named item 2. FIB The FIB contains a “magic word” and pointers to the various other parts of the file, as well as information about the length of the file. The FIB starts at the beginning of the file. The FIB is defined in the structure definition section of this document. Text The text of the file starts at fib. No other occurrences of this character sequence are allowed.

Other line break or word wrap information is not stored. The following ASCII codes are treated as “special” characters when they have the character property special on chp.

Note The end of a section is also the end of a paragraph. The last character of a section is a section mark which stands in place of the paragraph mark normally required to end a paragraph.

An exception is made for the last character of a document which is always a paragraph mark although the end of a document is always an implicit end of section. Otherwise, the document is represented by the piece table stored in the file in the data beginning at fib. The document text stream includes text that is part of the main document, plus any text that exists for the footnote, header, macro, or annotation subdocuments.

The sizes of the main document and the header, footnote, macro and annotation subdocuments are stored in the fib, in variables: fib. Character and Paragraph Formatting Properties Character and paragraph properties in Word documents are stored in a compressed format.

The information stored on disk is not the properties of a particular sequence of text but the difference of the properties from a specific reference property. The PAP is a data structure that holds uncompressed paragraph property information; the CHP (pronounced “chip”) is a structure that holds uncompressed character property information. Each paragraph in a Word document inherits a default set of paragraph and character properties from one of the paragraph styles recorded in the style sheet data structure (STSH).

A particular PAP is converted into its compressed form, the PAPX, by first comparing the PAP for a paragraph with the PAP stored in the style sheet for the paragraph’s style. Any properties in the paragraph’s PAP that are different from those stored in the style sheet PAP are encoded as a list of sprms (grpprl). The PAPX contains an istd (index to style descriptor), which specifies which style entry in the style sheet contains the default paragraph and character properties for the paragraph, paragraph height information, and the list of difference sprms.

If the only difference between the paragraph’s PAP and the style’s PAP were in the justification code field, which is one byte long, one two-byte sprm, sprmPJc, would be generated to express that difference; thus the total PAPX size would be 5 bytes.
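
The difference-based storage described above can be pictured with a small Python sketch. Properties are modelled as plain dictionaries instead of real PAP structures and sprms, so this shows only the idea of recording differences from the style; the function names and the property names used with them are illustrative assumptions.

```python
def diff_properties(style_props, actual_props):
    """Return the (name, value) pairs in which actual_props differs from style_props."""
    return [
        (name, value)
        for name, value in actual_props.items()
        if style_props.get(name) != value
    ]


def apply_differences(style_props, differences):
    """Rebuild the full property set from the style defaults plus the differences."""
    props = dict(style_props)
    props.update(differences)
    return props
```

A paragraph that differs from its style only in justification yields a single recorded difference, and applying that difference to the style's properties reproduces the paragraph's full property set.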

This is better than compression since the total size of a PAP is bytes. To convert a CHP for a sequence of characters contained within a single paragraph into its compressed form, the CHPX, it’s first necessary to know the paragraph style assigned to the paragraph containing those characters and any character style that may be tagging the character run.

The character properties inherited from the paragraph style are moved into a buffer. If the chp. Any properties in the paragraph’s CHP that are different from those stored in the generated CHP are encoded as a list of sprms (grpprl). The sprms express how the content of the CHP generated from the paragraph and character styles should be transformed to create the character properties for the text run.

If one of the bit fields in the CHP to be compressed (such as fBold) is different from the reference CHP, you would build a difference sprm using sprmCFBold in the first byte and the byte pattern 0x81 in the second byte, which signifies that the value of the bit in the CHP to be compressed is the opposite of the value stored in the reference CHP.

If there was no difference, sprmCFBold would not be recorded in the grpprl to be generated. If there were a difference in a field larger than a single bit such as the chp. If a sequence of characters has the same character properties and the sequence spans more than one paragraph, it’s necessary to examine each paragraph’s properties and to generate a different CHPX every time there is a change of style.
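
The 0x81 operand described above can be illustrated with a short Python sketch. Only the 0x81 case comes from the text; the handling of 0x00, 0x01 and 0x80 (off, on, same as the style) is an assumption about the other toggle operand values, and both function names are hypothetical.

```python
def apply_toggle(style_value, operand):
    """Resolve a one-byte toggle operand against the style's value."""
    if operand == 0x00:
        return False            # assumed: property explicitly off
    if operand == 0x01:
        return True             # assumed: property explicitly on
    if operand == 0x80:
        return style_value      # assumed: same value as the style
    if operand == 0x81:
        return not style_value  # opposite of the value in the reference CHP
    raise ValueError(f"unexpected toggle operand {operand:#x}")


def encode_toggle(style_value, actual_value):
    """Return the operand to record, or None when no sprm is needed."""
    return None if actual_value == style_value else 0x81
```

Recording "opposite of the style" instead of an absolute value means the run keeps its contrast with the style if the style's own bold setting later changes.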

In Word documents, the fundamental unit of text for which character exception information is kept is the run of exception text, a contiguous sequence of characters stored on disk that all have the same exception properties with respect to their underlying style character properties. If a user never changed the character properties inherited from the styles used in the document and did a complete save of the document, although each of those styles may have different properties, the entire document stream would be one large run of exception text and one CHPX would suffice to describe the character properties of the entire document.

The fundamental unit of text for which paragraph properties are recorded is the paragraph. An FKP is a byte data structure that is stored in one page of a Word file. This byte array, named rgb, is in 1-to-1 correspondence with the rgfc. This array called the rgbx is in 1-to-1 correspondence with the rgfc. Word uses this optimization.

An rgb or rgbx[]. When an rgb or rgbx[]. For CHPX FKPs a 0 rgb value means the properties of the run of text were exactly equal to the character properties inherited from the style of the paragraph it was in. The new FC is added at the end of the rgfc. Bin Tables A bin table plcfbte partitions the total extent of the Word file that contains text characters into a set of contiguous intervals marked by an fcFirst and an fcLim.

The fcFirst for the nth interval would be plcfbte. Associated with each interval is a BTE. Even though a sequence of text may be between two paragraph end marks, it may reside in a paragraph different from the one defined by the next paragraph end mark, because the text may have been moved by the user into a different paragraph.
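
The bin table lookup can be sketched in Python as a search over the interval boundaries. The sketch assumes the plcfbte has already been split into a list of boundary FCs and a parallel list of page numbers taken from the BTEs, and that an FKP page is 512 bytes; the page size, the function name and its parameters are assumptions made for illustration.

```python
from bisect import bisect_right


def find_fkp_offset(fcs, page_numbers, fc, page_size=512):
    """Return the file offset of the FKP whose interval contains fc.

    fcs          -- interval boundaries, len(page_numbers) + 1 ascending FCs
    page_numbers -- the PN held by the BTE of each interval
    page_size    -- assumed FKP page size in bytes
    """
    if not fcs[0] <= fc < fcs[-1]:
        raise ValueError("FC lies outside the bin table")
    index = bisect_right(fcs, fc) - 1  # interval with fcFirst <= fc < fcLim
    return page_numbers[index] * page_size
```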

In the logical text stream represented by the document’s piece table, the paragraph mark that follows the moved text is stored in a non-adjacent physical location in the file. Style Sheet A style sheet is a collection of styles. In Word, each document has its own style sheet. A style is a collection of formatting information with a name.

Word 6. Versions of Word prior to 6. Character styles have just character formatting. Paragraph styles have both character and paragraph formatting. The style sheet establishes a correspondence between a style code and a style definition. Note: the storage and behavior of styles has changed considerably since WinWord 2. The range of the stc was , with as the null style. The styles for a document both paragraph and character styles are stored in an array in each document.

The array can have unused slots. Some slots at the beginning of the array are reserved for specific styles, whether they were created yet or not. Istd are Heading Istd 10 is Default Paragraph Font. Istd are reserved. So the first non-fixed index is 15 see stshi. Each document has a separate array, so the same style will usually have a different istd in two different documents (styles in fixed locations in the style sheet have the same istd in all documents). Thus style matching between documents must be done by name, or by sti if the styles are built-in.

Styles are usually referred to using an istd. A (doc, istd) pair uniquely identifies a style because it tells which style is in which array. Built-in styles have a unique sti to indicate which built-in style they reference. User-defined styles use stiUser. Every paragraph has a paragraph style. Every character has a character style. The default paragraph style is Normal (stiNormal, istdNormal).

The formatting of a paragraph (the PAP) and of a character (the CHP) depends on the paragraph and character styles applied to them, as well as any additional formatting stored in the FKPs.

For a CHP: 1. Properties from the character’s style the UPX. The STSHI contains general information about the following style sheet, including how many styles are in it. The cbStshi to use for those file versions is 4 bytes. Then for each style in the style sheet stshi. The current definition of the STSHI structure might be longer or shorter than that stored in the file, so the style sheet reader routine needs to take this into account.

There will be stshi. Note: styles can be empty, i. The stshi. If the STD base is grown in a future version, the file format doesn’t change, because the style sheet reader can discard parts it doesn’t know about, or use defaults if the file’s STD is not as large as it was expecting. Currently, stshi. Note: the built-in style names may need to be “regenerated” if the file is opened in a different language or if stshi. This indicates the number of fixed-index positions reserved in the style sheet when it was saved.

If not, the built-in style names need to be “regenerated”, i. See notes on sprmCRgftcX for details. Introduced in Word stshi. The index into mpstilsd corresponds to the index of the style that the LSD structure affects see std. A cb of zero indicates an empty slot in the style array, i. Note: the STD structure may be longer or shorter than the one stored in the file; stshi. The style sheet reader routine must take this into account.

The variable-length part of the STD has three variable-length subparts, the xstzName, the grupx, and the grupe. An sti is intended to be permanent throughout versions of Word, although new sti’s may be added in new versions. The types currently in use are: stkPara (1), a paragraph style; stkChar (2), a character style; stkTable (3), a table style; stkList (4), a list style. More style types may exist in the future, so styles of an unknown type should be discarded.

A style is always based on another style or the null style (istdNil). Following a “chain” of based-on styles will always end at the null style, because a based-on chain cannot have a loop in it. A style can have up to 11 “ancestors” in its based-on chain, including the null style. A style’s definition is built up from the style that it is based on. See std. For a paragraph style, this is the style to apply when Enter is pressed at the end of a paragraph. For a character style, the next style is essentially ignored, but should be the same as the current style.
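
Walking a based-on chain as described above could look like the following Python sketch. It assumes the null style is represented by the sentinel 0x0FFF and that the based-on links have already been read into a dictionary; `ISTD_NIL`, `base_of` and `based_on_chain` are names chosen for illustration. The ancestor limit doubles as a guard against malformed files that contain a loop.

```python
ISTD_NIL = 0x0FFF   # assumed sentinel value for the null style
MAX_ANCESTORS = 11  # limit stated in the text, null style included


def based_on_chain(base_of, istd):
    """Return the ancestors of istd, nearest first, ending with ISTD_NIL.

    base_of -- hypothetical mapping from an istd to its based-on istd
    """
    chain = []
    current = istd
    while current != ISTD_NIL:
        current = base_of[current]
        chain.append(current)
        if len(chain) > MAX_ANCESTORS:
            raise ValueError("based-on chain is too long or contains a loop")
    return chain
```

Since a style's definition is built up from the style it is based on, a reader would typically apply property differences in the reverse of the returned order, starting from the null style.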

The name is stored as an xstz preceded by a length byte, followed by a null-terminator. A style name can contain multiple “aliases”, separated by commas. Aliases are alternate names for the same style e. WinWord 2. If a style is a built-in style, the built-in style name is always stored first. All names and aliases must be unique within a style sheet e.

A style name including all its aliases and comma separators can be up to characters long. So the xstz format of that name can be up to characters.

Style names are case sensitive. The built-in style names corresponding to each sti listed previously are defined for each language version of Word. See below. This array begins after the variable-length xstzName field, at the next even-byte offset within the STD.

A UPX (Universal Property eXception) describes the difference in formatting of this style as compared to its based-on style. The meaning of each UPX depends on the style type std. For a paragraph style, std. For a character style, std. Note that new UPXs may be added in the future, so std. Any UPXs past those expected should be discarded.

For a list style, std. For a table style, std. In addition, each style type can contain an additional UPX containing revision mark information, which is not documented. The grpprl within each UPX contains the differences of this property type for this style from the UPE of that property type for the based on style.

Even if the grpprl is empty, the istd is still needed. These are not stored in the file! Rather, they are constructed using the std. The std. Note: UPEs are not stored in the file. The meaning of each UPE depends on the style type std.

In addition, each style type can contain an additional UPE containing revision mark information, which is not documented. UPE needs to be constructed first. Eventually by following the based-on chain, a style will be based on the null style istdNil. It can be constructed by starting with the first UPE from the based-on style std.

To apply a UPX. Note: a UPE. Merging grpprls can be difficult, but for character styles it is easy because no prls in character style grpprls should interact with each other. Each prl from the source the UPX. Merging grpprls can be difficult. List Tables Word 97 and later versions store paragraph numbering information very differently from Word 6. In Word 97 and later versions, the pap only contains two values: a short (ilfo) and a byte (ilvl), which indicate which list the paragraph belongs to and which level of that list it is part of, respectively.

There are three list tables in a Word document: the rglst, the hpllfo, and the hsttbListNames. An LST consists of two main parts: 1. An LVL structure contains two parts: 1. An LVLF, which stores all static data such as the start-at value for the list level, the numbering type (arabic or roman), the alignment (left, right or centered) of the number, and several Word 6.

A set of pointers to variable length data: (a) a grpprlChpx, which sets character formatting to the paragraph number text, (b) a grpprlPapx, which sets paragraph formatting to the paragraph containing the number, such as indenting and tab information, and (c) the number text itself.

Word writes out the rglst as the plcflst by writing out a short integer containing the number of LST structures to be written; followed by an enumeration of the rglst, writing out each LSTF structure. It then writes the appropriate number of LVL structures as described below. List Names and the sttbListNames The string table containing the List Names is by far the least significant of the three list tables. If this list has a name, however, it is in this table: the table is a parallel array with the rglst above, and will contain an empty string for any list which does not have a list name.

An LFOLVL contains a set of flags to indicate whether just the start-at value of the LST is overridden, or whether just the formatting is overridden, or both, as well as either a start-at value or a pointer to a LVL record, depending upon the values of the flags.

Note: if the LFOLVL says the start-at value should be overridden, what that means is that the FIRST paragraph in the document with this LFO should have a number equal exactly to that start-at value, but any subsequent paragraphs should just follow the previous paragraph in the sequence. Using the pap.

Using the LFO, and the pap. If the override does not pertain to either formatting or start-at value, look up the LST for this list. Once the correct LVL record is obtained, apply the lvl. It may adjust the indents and tab settings for the paragraph. Use the other information in the LVL, such as the start at, number text, and grpprlChpx, to determine the appearance of the actual paragraph number text.

A sprm is a two-byte opcode at offset 0 which identifies the operation to be performed. If necessary information for the operation can always be expressed with a fixed length parameter, the fixed length parameter is recorded immediately after the opcode beginning at offset 2. If the parameter for the sprm is variable length, the count of bytes of the following parameter is stored in the byte at offset 2, followed by the parameter at offset 3.
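
The general sprm layout described above can be walked with a short Python sketch. One point is an assumption not stated in the text: the operand size is taken from the top three bits of the opcode using the table below. Sprms with special length rules, such as sprmPChgTabs and sprmTDefTable, are not handled. The name `iter_prls` is hypothetical.

```python
import struct

# Assumed operand sizes keyed by the top three bits of the opcode;
# None marks a variable-length operand preceded by a count byte.
OPERAND_SIZE = {0: 1, 1: 1, 2: 2, 3: 4, 4: 2, 5: 2, 6: None, 7: 3}


def iter_prls(grpprl):
    """Yield (opcode, operand_bytes) for each prl in a grpprl."""
    pos = 0
    while pos + 2 <= len(grpprl):
        (opcode,) = struct.unpack_from("<H", grpprl, pos)
        pos += 2                    # fixed-length operand starts at offset 2
        size = OPERAND_SIZE[opcode >> 13]
        if size is None:
            size = grpprl[pos]      # count byte at offset 2
            pos += 1                # variable-length operand starts at offset 3
        operand = grpprl[pos:pos + size]
        if len(operand) != size:
            raise ValueError("grpprl ends inside an operand")
        yield opcode, operand
        pos += size
```

Because each prl is recorded immediately after the preceding one, the position after one operand is the opcode offset of the next.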

The method for calculating the length of sprmPChgTabs is recorded below with the description of the sprm.

For sprmTDefTable and sprmTDefTable10, the length of the parameter plus 1 is recorded in the two bytes beginning at offset 2. Objects within the set are referenced by their index in the set. UINT2: Unsigned two-byte integer value. UINT4: Unsigned four-byte integer value. SINT2: Signed two-byte integer value. SINT4: Signed four-byte integer value. Additions for PowerPoint Several records were added to the binary file format with the release of PowerPoint The persistent directory is encoded as follows: 12 bit value which is 20 bit value indicates current reference number number of sequential offsets documentRef: Reference to the document atom.

Containers: Records that keep atoms and other containers in a logical and organized way. AnimationAtom12 Added in PowerPoint It contains: 1. AnimationInfoAtom 2. Otherwise index ID in SoundCollection list. Clients using Programmable Tags to store version dependent binary file format extensions: 1.

Document 2. Handout 3. MainMaster 4. Notes 5. Slide 6. BlipEntity BlipEntity A container for information about a single picture bullet: It contains: BlipEntity Fields Offset Type Name Contents 0 ubyte winBlipType Preferred format for this picture on Windows operating systems 1 ubyte macBlipType Preferred format for this picture on Macintosh operating systems. Following these, starting at offset 2, is a variable-length record containing the binary picture data.

When the presentation has bookmarks, in addition it contains a set of a BookmarkEntityAtom and a CString for each bookmark: 1. BookmarkEntityAtom 3. CString , containing the value of the bookmark BookmarkEntityAtom Atom that tracks bookmarks.

BroadCastDocInfo9 A container for per-document broadcast information. CString , Instance Title 1 , optional 2. CString , Instance Description 2 , optional 3. CString , Instance Speaker 3 , optional 4.

CString , Instance Contact 4, optional 5. CString , Instance EmailAddress 6 , optional 7. CString , Instance EmailName 7 , optional 8. CString , Instance UserName 16 , optional CString , Instance PresentationName 18 , optional Build IDs are generated incrementally. ChartBuild , optional 2.

DiagramBuild , optional 3. BuildAtom 2. CString , Instance Author 0 : Author of the comment 2. CString , Instance Text 1 : Text of the comment 3. CommentAtom10 CommentAtom10 An atom for information about specific comments. CString , Instance Author 0 : Last author adding comments 2. CurrentUserAtom This is written to the current user stream. DiffAtom10 DiffAtom10 An atom for collaboration info. CString : Name of the reviewer this collaboration information was created by 2.

Atom that tracks the Document level flags added in PowerPoint DocumentAtom 2. ExObjList , optional 3. Environment , Instance: DocEnvironment 0 4. SoundCollection , Instance: Sounds 5 , optional 5. PPDrawingGroup 6. List , Instance: DocInfoList 0 8. SmartTagStore11 , optional 9. OutlineTextProps11 , optional FontCollection10 , optional TxMasterStyle10Atom , optional TextDefaults10Atom , optional GridSpacingAtom10 CommentIndex10 , optional FontEmbedFlags10 , optional CString , Instance: Copyright 1 , optional CString , Instance: Keywords 2 , optional FilterPrivacyFlags10 , optional OutlineTextProps10 , optional DocToolbarStatesAtom , optional SlideListTable10 , optional DiffTree10 , optional CString , Instance: ModifyPswd 3 , optional PhotoAlbumInfoAtom , optional TxMasterStyle9Atom , optional BlipCollection , optional TextDefaults9Atom , optional SrKinsoku , optional ExHyperlink9 , optional PresAdvisoryFlags9 , optional BroadcastDocInfo9 , optional SSDocInfoAtom , optional NamedShows , optional Summary , Instance: BookmarkCollecton 0 , optional PrintOptions , optional EndDocument DocFlags12 , optional It has no content.

Environment The container for shared text entities, such as fonts, styles, rulers, etc. This container has: 1. SrKinsoku , Instance DocKinsoku 2 , optional 2. FontCollection , optional 3. TxCFExceptionAtom , optional 4. TxPFExceptionAtom , optional 5. DefaultRulerAtom , optional 6. TxSpecialInfoAtom , optional 7. ExMediaAtom 2. ExControlAtom 2. ExOleObjAtom 3. CString , Instance ClipboardName 3 that appears in the paste special dialog.

MetaFile , optional ExControlAtom Contains a long integer, slideID, which stores the unique slide identifier of the slide where this control resides. Valid values are: 0 – does not follow the color scheme 1 – follows the entire color scheme 2 – follows the text and background scheme 4 bool1 cantLockServerB Set if the embedded server cannot be locked 5 bool1 noSizeToServerB Set if the dimension does not need to be sent to the embedded object 6 Bool1 isTable Set if the object is a Word table

ExHyperlinkAtom 2. ExLinkAtom 2. ExObjListAtom 2. ExAviMovie , optional 3. ExCDAudio , optional 4. ExControl , optional 5. ExEmbed , optional 6. ExHyperlink , optional 7. ExLink , optional 8. ExMCIMovie , optional 9. ExQuickTimeMovie , optional ExSubscription , optional See subType Values table below.

ExVideo 2. ExVideo A container for Video external object related information. CString , Instance 0: Path of the multimedia file. ExWavAudioEmbeddedAtom , optional 3. It contains: FilterPrivacyFlags10 fields Offset Type Name Content 0 sint4 flags Bit 1: If set, personal information gets removed upon save FontCollection A container holding information about all the fonts in the presentation.

FontEntityAtom , optional 2. FontEmbedData , optional FontCollection10 A container holding additional information about fonts in the presentation. GPointAtom This atom keeps the master coordinates of a point. If not, it contains an index into the color scheme, with each value describing a color in the Scheme Colors dialog : See Scheme Colors table below for valid values. This field can have a value of 0xFF if the color is undefined.

Handout This is a container that keeps the information about the handout master. It contains 1. PPDrawing 2.
