SRP_Extract_XML

charlesd · January 2023

I'm wondering what the XPath is supposed to be.
If the command is
Records = Srp_Extract_XML(XML, XPath)
then the variable XML contains the contents of the XML file, and
the variable XPath contains the tags in the format "/vendorPlacementData/vendorPlacementDataList/vendorPlacementDataRecord".
I was comparing this to a program I wrote several years ago.

DonBakke · January 2023

Is this path defined within a namespace?

KevinFournier · January 2023

XPath is a standard means of navigating to a specific point in an XML document. You can read up on it here.

charlesd · January 2023

Kevin, I looked at the syntax on the W3Schools web pages, and it matches what I was using.
Below is the beginning of the XML data:

Looking at the data, I was thinking that the XPath should have been "vendorPlacementData/vendorPlacementDataList/vendorPlacementDataRecord".
The main node is "vendorPlacementData".
The child node is "vendorPlacementDataList" (Bookstore in the W3Schools example).
The next child node is "vendorPlacementDataRecord" (Book in the W3Schools example).
Am I missing something?

MattCrozier · January 2023

Is the second line shown there valid? (specifically the extra '<' at the beginning), or even needed? It seems to be redundant.

KevinFournier · January 2023

Is your XML actually valid? There are two headers in that screenshot. If that's what the actual XML looks like, it won't parse.

charlesd · January 2023

This is the XML file I received from the client, which makes me ask the following question:
Can I remove all the data prior to the first proper tag/node, vendorPlacementData?

KevinFournier · January 2023

Yes, SRP_Extract_XML can parse any well-formed XML snippet. Still, let your client know this is happening. It is not your responsibility to parse invalid XML.

charlesd · January 2023

I removed the two lines prior to the data node, and I am still not getting data in my variable.

KevinFournier · January 2023

I can't troubleshoot with a partial screenshot. Does the remaining XML pass a validation test? There are XML syntax validators on the web.
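For a quick local check before reaching for a web validator, a well-formedness test can be sketched in a few lines. This is only an illustration in Python using the standard library's xml.etree.ElementTree, not part of SRP Utilities. The helper name is_well_formed and both sample strings are made up for the example; the second sample assumes the problem is a repeated XML declaration, as the screenshot discussion suggests.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return False, str(err)


# Hypothetical samples: one clean document, and one with a second XML
# declaration (a declaration is only allowed at the very start).
good = '<?xml version="1.0"?><root><item>1</item></root>'
bad = '<?xml version="1.0"?>\n<?xml version="1.0"?><root><item>1</item></root>'

print(is_well_formed(good))
print(is_well_formed(bad))
```

Note that this only checks well-formedness. It would not have caught the namespace mismatch between the document and the XPath query.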

charlesd · January 2023

Send me an email to charles.dehaas@ncbi.com so I can send you the sample file they gave us.

KevinFournier · January 2023

I figured out your issue. XPath doesn't like default namespaces. So, you have to name the default namespace and then use that name in your XPath query.

	Path = "/x:vendorPlacementData/x:vendorPlacementDataList/x:vendorPlacementDataRecord"
	Ans = SRP_Extract_Xml(XML, Path, 'xmlns:x="http://"')

charlesd · January 2023

Thank you. I will test when I'm back in the office.

charlesd · January 2023

Good morning, Kevin.
I added the namespace as you suggested but got the same result.

DonBakke · January 2023

You didn't update your XPath argument to match the one Kevin included.

charlesd · January 2023

I stand corrected.
The command works with the correct XPath.
Can I get an explanation of the "/x:" in the XPath?

KevinFournier · January 2023

Microsoft's docs state:

If you want to [XPath] query against a namespace in an XML document, even if it is the default namespace, you need to define a prefix for it.

So, let's say we have the following XML:

<books xmlns="http://">  
    <book>  
        <title>Title</title>  
        <author>Author Name</author>  
        <price>5.50</price>  
    </book>  
</books>

So, books and everything inside of it is in the default namespace, which is normally null. In this case, the default namespace has been renamed to "http://". In order to use SRP_Extract_XML, we have to assign that namespace to a prefix of our choosing. We can use whatever we want. So, I used 'x' when I added that namespace parameter:

SRP_Extract_Xml(XML, Path, 'xmlns:x="http://"')

See "xmlns" followed by a colon and an 'x'? Everything after the colon is our chosen prefix. We could have used 'd' for default. Some XML files have a whole bunch of namespaces, and SRP_Extract_XML allows us to provide custom prefixes for all of them. The caveat, for XPath, is that we have to use those invented prefixes inside the XPath query. If I use the following XPath:

"/books/book/title"

I get nothing because XPath can't find them. Why? Because this XPath query is telling SRP_Extract_XML to look in the null namespace for books, then in the null namespace for book, then in the null namespace for title. But there is no null namespace in the XML document. There is a default namespace, but it has been named "http://", not null. So, we use this instead:

"/x:books/x:book/x:title"

This query says look in the "http://" namespace for books, then in the "http://" namespace for book, then in the "http://" namespace for title.
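The same behavior can be reproduced outside OpenInsight. The sketch below uses Python's standard xml.etree.ElementTree purely as an illustration; it is not how SRP_Extract_XML is implemented. It assumes the sample XML above verbatim, and the prefix 'x' is again an arbitrary choice. ElementTree evaluates paths relative to the root element, so the leading books step is left out of both queries.

```python
import xml.etree.ElementTree as ET

XML = """<books xmlns="http://">
    <book>
        <title>Title</title>
        <author>Author Name</author>
        <price>5.50</price>
    </book>
</books>"""

root = ET.fromstring(XML)  # root is the <books> element

# Unprefixed names are looked up in no namespace, so nothing matches.
unprefixed = root.findall("book/title")

# Map our invented prefix 'x' to the document's default namespace URI.
ns = {"x": "http://"}
prefixed = root.findall("x:book/x:title", ns)

print(len(unprefixed))
print([t.text for t in prefixed])
```

The prefix only has to agree between the mapping and the query. The XML document itself never mentions 'x'.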

charlesd · January 2023

Thanks, Kevin.
