
SRP_Extract_XML

I am wondering what the XPath is supposed to be.
If the command is
Records = Srp_Extract_XML(XML, XPath)
then the variable XML contains the data of the XML file, and
the variable XPath contains the tags in the format "/vendorPlacementData/vendorPlacementDataList/vendorPlacementDataRecord".
I was comparing this to a program I wrote several years ago.

Comments

  • Is this path defined within a namespace?
  • XPath is a standard means of navigating to a specific point in an XML document. You can read up on it here.
  • Kevin, I looked at the syntax on the W3Schools web pages, and it matches what I was using.
    Below is the beginning of the XML data:

    Looking at the data, I was thinking that the XPath should have been "vendorPlacementData/vendorPlacementDataList/vendorPlacementDataRecord".
    The main node is "vendorPlacementData".
    The child node is "vendorPlacementDataList" (Bookstore at W3Schools).
    The next child node is "vendorPlacementDataRecord" (Book at W3Schools).
    Am I missing something?
  • Is the second line shown there valid (specifically the extra '<' at the beginning), or even needed? It seems to be redundant.
  • Is your XML actually valid? There are two headers in that screenshot. If that's what the actual XML looks like, it won't parse.
  • This is the XML file I received from the client, which makes me ask the following question:
    Can I remove all the data prior to the first proper tag / node vendorPlacementData?
  • Yes, SRP_Extract_XML can parse any well-formed XML snippet. Still, let your client know this is happening. It is not your responsibility to parse invalid XML.
  • I removed the two lines prior to the data node, and I am still not getting data in my variable.
  • I can't troubleshoot with a partial screenshot. Does the remaining XML pass a validation test? There are XML syntax validators on the web.
  • Send me an email to charles.dehaas@ncbi.com so I can send you the sample file they gave us.
  • I figured out your issue. XPath doesn't like default namespaces. So, you have to name the default namespace and then use that name in your XPath query.

    Path = "/x:vendorPlacementData/x:vendorPlacementDataList/x:vendorPlacementDataRecord"
    Ans = SRP_Extract_Xml(XML, Path, 'xmlns:x="http://"')
  • Thank you. I will test when I am back in the office.
  • Good morning Kevin
    I added the namespace as you suggested but got the same result.

  • You didn't update your XPath argument to match the one Kevin included.
  • I stand corrected.
    The command works with the correct XPath.
    Can I get an explanation of the "/x:" in the XPath?
  • Microsoft's docs state:
    If you want to [XPath] query against a namespace in an XML document, even if it is the default namespace, you need to define a prefix for it.


    So, let's say we have the following XML:

    <books xmlns="http://">
        <book>
            <title>Title</title>
            <author>Author Name</author>
            <price>5.50</price>
        </book>
    </books>

    So, books and everything inside of it is in the default namespace, which is normally null. In this case, the default namespace has been renamed to "http://". In order to use SRP_Extract_XML, we have to assign that namespace to a prefix of our choosing. We can use whatever we want. So, I used 'x' when I added that namespace parameter:

    SRP_Extract_Xml(XML, Path, 'xmlns:x="http://"')

    See "xmlns" followed by a colon and an 'x'? Everything after the colon is our chosen prefix. We could have used 'd' for default. Some XML files have a whole bunch of namespaces, and SRP_Extract_XML allows us to provide custom prefixes for all of them. The caveat, for XPath, is that we have to use those prefixes we invented inside the XPath query. If I use the following XPath:

    "/books/book/title"

    I get nothing because XPath can't find them. Why? Because this XPath query is telling SRP_Extract_XML to look in the null namespace for books, then in the null namespace for book, then in the null namespace for title. But there is no null namespace in the XML document. There is a default namespace, but it has been named "http://", not null. So, we use this instead:

    "/x:books/x:book/x:title"

    This query says look in the "http://" namespace for books, then in the "http://" namespace for book, then in the "http://" namespace for title.
  • thanks Kevin
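
For anyone who wants to see the default-namespace behaviour described above outside of OpenInsight, here is a rough sketch using Python's standard xml.etree.ElementTree module. It is only an illustration of the XPath rule, not of how SRP_Extract_XML works internally. It reuses the books sample and the "http://" namespace string exactly as they appear in the thread. The variable names are made up for the example, and the paths are written relative to the root element because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Namespace string copied as shown in the thread.
NS_URI = "http://"

XML = f"""<books xmlns="{NS_URI}">
    <book>
        <title>Title</title>
        <author>Author Name</author>
        <price>5.50</price>
    </book>
</books>"""

# root is the <books> element, so the paths below start at its children.
root = ET.fromstring(XML)

# Unprefixed path: asks for book/title in no namespace, so nothing matches.
unprefixed = root.findall("./book/title")

# Prefixed path: 'x' is a prefix we invent and bind to the document's namespace.
prefixed = root.findall("./x:book/x:title", {"x": NS_URI})

# The prefix itself is arbitrary; 'd' works just as well as long as it is bound.
other_prefix = root.findall("./d:book/d:title", {"d": NS_URI})

titles = [el.text for el in prefixed]
print(len(unprefixed), titles)
```

The unprefixed query comes back empty while both prefixed queries find the title element, which matches what happened with the original XPath in this thread.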